<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" xmlns:px="http://www.daisy.org/ns/pipeline/xproc" xmlns:d="http://www.daisy.org/ns/pipeline/data" xmlns:cx="http://xmlcalabash.com/ns/extensions" type="px:epub3-to-epub3.script" version="1.0" exclude-inline-prefixes="#all" name="main">
<p:documentation xmlns="http://www.w3.org/1999/xhtml">
<h1 px:role="name">EPUB 3 Enhancer</h1>
<p px:role="desc">Transforms an EPUB 3 publication into an EPUB 3 publication with audio and/or a braille rendition.</p>
<a px:role="homepage" href="http://daisy.github.io/pipeline/Get-Help/User-Guide/Scripts/epub3-to-epub3/">
Online documentation
</a>
<address>
Authors:
<dl px:role="author">
<dt>Name:</dt>
<dd px:role="name">Bert Frees</dd>
<dt>E-mail:</dt>
<dd><a px:role="contact" href="mailto:bertfrees@gmail.com">bertfrees@gmail.com</a></dd>
</dl>
</address>
</p:documentation>
<p:option name="source" required="true" px:type="anyFileURI" px:media-type="application/epub+zip text/plain">
<p:documentation>
<h2 px:role="name">Input EPUB 3</h2>
<p px:role="desc" xml:space="preserve">The EPUB you want to convert.
You may alternatively use the "mimetype" document if your input is an unzipped ("exploded") version of an EPUB.</p>
</p:documentation>
</p:option>
<p:input port="metadata" primary="false" sequence="true" px:media-type="application/xml">
<p:documentation xmlns="http://www.w3.org/1999/xhtml">
<h2 px:role="name">Metadata</h2>
<p px:role="desc" xml:space="preserve">Metadata to be included in the EPUB.
If specified, the document must be a single
[`metadata`](https://www.w3.org/publishing/epub3/epub-packages.html#sec-metadata-elem) or
[`package`](https://www.w3.org/publishing/epub3/epub-packages.html#sec-package-elem) element. A
[`prefix`](https://www.w3.org/publishing/epub3/epub-packages.html#sec-prefix-attr) attribute is
allowed on the root element. The metadata will be injected in the EPUB's package document, possibly
overwriting existing metadata. This works as follows:
- All (valid) fields in the provided metadata document end up in the output EPUB. More than one
field with the same property is allowed. `meta` elements with a `refines` attribute must refine
elements within the metadata document itself. Elements that refine elements in the EPUB's package
document will be dropped.
- Any metadata fields in the input EPUB that have matching fields (same property in case of `meta`
fields, same element name in case of `dc:*` fields) in the provided metadata document are omitted,
together with any `meta` elements that refine them.
- Metadata fields in the input that do not have any matching fields in the provided metadata
document are preserved in the output.
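
As an illustration, a provided metadata document could look like the sketch below. All values (identifier, title, creator) are made up for the example; only the element and attribute names come from the EPUB 3 package specification.

~~~xml
&lt;metadata xmlns="http://www.idpf.org/2007/opf"
          xmlns:dc="http://purl.org/dc/elements/1.1/"&gt;
  &lt;!-- hypothetical values, for illustration only --&gt;
  &lt;dc:identifier&gt;urn:uuid:00000000-0000-0000-0000-000000000000&lt;/dc:identifier&gt;
  &lt;dc:title&gt;Example title&lt;/dc:title&gt;
  &lt;dc:language&gt;en&lt;/dc:language&gt;
  &lt;dc:creator id="creator"&gt;Jane Doe&lt;/dc:creator&gt;
  &lt;!-- refines an element inside this same document, so it is not dropped --&gt;
  &lt;meta refines="#creator" property="role" scheme="marc:relators"&gt;aut&lt;/meta&gt;
&lt;/metadata&gt;
~~~

With such a document, any `dc:identifier`, `dc:title`, `dc:language` and `dc:creator` fields already present in the input EPUB would be replaced, while other existing fields would be preserved.
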
There are a number of fields that result in additional changes in the EPUB (apart from an updated
`metadata` section in the package document):
- If the provided metadata document contains one or more
[`dc:identifier`](https://www.w3.org/publishing/epub3/epub-packages.html#sec-opf-dcidentifier)
fields, the first one without a `refines` attribute will be used to update the
[`unique-identifier`](https://www.w3.org/publishing/epub3/epub-packages.html#attrdef-package-unique-identifier)
attribute on the package document. The `dc:identifier` metadata in the content documents can also be
aligned with it. This behavior can be enabled or disabled with the "Update &lt;meta
name='dc:identifier'&gt; elements based on EPUB metadata" option.
- If the provided metadata document contains one or more
[`dc:title`](https://www.w3.org/publishing/epub3/epub-packages.html#sec-opf-dctitle) fields, the
first one can also be used as the `title` element in the content documents. This behavior can be
enabled or disabled with the "Update &lt;title&gt; elements based on EPUB metadata" option.
- If the provided metadata document contains exactly one
[`dc:language`](https://www.w3.org/publishing/epub3/epub-packages.html#sec-opf-dclanguage) field,
it can be used to update the `xml:lang` and `lang` attributes of the content documents. This
behavior can be enabled or disabled with the "Update 'lang' attributes based on metadata" option.
Some fields are ignored:
- The [`dcterms:modified`](https://www.w3.org/publishing/epub3/epub-packages.html#last-modified-date)
field gets updated whenever Pipeline produces an EPUB. As a consequence, any `dcterms:modified`
fields in the provided metadata document are ignored.</p>
</p:documentation>
<p:empty/>
</p:input>
<p:option name="update-lang-attributes" required="false" px:type="boolean" select="'false'">
<p:documentation xmlns="http://www.w3.org/1999/xhtml">
<h2 px:role="name">Update 'lang' attributes based on metadata</h2>
<p px:role="desc" xml:space="preserve">Whether to update 'lang' and 'xml:lang' attributes of content documents based on metadata in the package document.
If there is exactly one
[`dc:language`](https://www.w3.org/publishing/epub3/epub-packages.html#sec-opf-dclanguage) element,
use its value to create `xml:lang` and `lang` attributes on the root elements of all content
documents (overwriting any existing attributes).
If the "Metadata" option is used to inject new metadata into the EPUB, the resulting metadata is
used to generate the attributes.</p>
</p:documentation>
</p:option>
<p:option name="update-identifier-in-content-docs" required="false" px:type="boolean" select="'false'">
<p:documentation xmlns="http://www.w3.org/1999/xhtml">
<h2 px:role="name">Update &lt;meta name='dc:identifier'&gt; elements based on EPUB metadata</h2>
<p px:role="desc" xml:space="preserve">Whether to update &lt;meta name='dc:identifier'&gt; elements of content documents based on metadata in the package document.
Use the primary identifier (provided by the
[`dc:identifier`](https://www.w3.org/publishing/epub3/epub-packages.html#sec-opf-dcidentifier)
element identified by the
[`unique-identifier`](https://www.w3.org/publishing/epub3/epub-packages.html#attrdef-package-unique-identifier)
attribute) to create a `&lt;meta name='dc:identifier'&gt;` element in all content documents
(overwriting any existing elements with the same name).
If the "Metadata" option is used to inject new metadata into the EPUB, the resulting metadata is
used to generate the elements.</p>
</p:documentation>
</p:option>
<p:option name="update-title-in-content-docs" required="false" px:type="boolean" select="'false'">
<p:documentation xmlns="http://www.w3.org/1999/xhtml">
<h2 px:role="name">Update &lt;title&gt; elements based on EPUB metadata</h2>
<p px:role="desc" xml:space="preserve">Whether to update &lt;title&gt; elements of content documents based on metadata in the package document.
If there are one or more
[`dc:title`](https://www.w3.org/publishing/epub3/epub-packages.html#sec-opf-dctitle) elements, use
the value of the first one to create a `&lt;title&gt;` element in all content documents (overwriting
any existing elements with the same name).
If the "Metadata" option is used to inject new metadata into the EPUB, the resulting metadata is
used to generate the elements.</p>
</p:documentation>
</p:option>
<p:option name="ensure-pagenum-text" required="false" select="'false'">
<p:pipeinfo>
<px:type>
<choice xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0">
<value>true</value>
<a:documentation xml:lang="en">Yes</a:documentation>
<value>false</value>
<a:documentation xml:lang="en">No</a:documentation>
<value>hidden</value>
<a:documentation xml:lang="en">Yes, but not visible</a:documentation>
</choice>
</px:type>
</p:pipeinfo>
<p:documentation xmlns="http://www.w3.org/1999/xhtml">
<h2 px:role="name">Ensure text content for page numbers</h2>
<p px:role="desc" xml:space="preserve">Whether to fix empty page number elements.
Page number elements (elements with a `doc-pagebreak` `role` or `pagebreak` `epub:type`) that have
no child text node can be given one. The text can be generated based on
- the element's `aria-label` attribute,
- the element's `title` attribute, or
- the text used by the corresponding page link in the navigation document.
These options are tried in the listed order. If neither attribute exists and the page is not linked
from the navigation document, no text is generated.</p>
</p:documentation>
</p:option>
<p:option name="ensure-section-headings" required="false" px:type="boolean" select="'false'">
<p:documentation xmlns="http://www.w3.org/1999/xhtml">
<h2 px:role="name">Ensure headings for all sections</h2>
<p px:role="desc" xml:space="preserve">Whether to generate a heading element for sections that don't have one.
For sectioning elements that don't have a heading element, one can be created. The headings are
generated based on the section element's [`aria-label`](https://www.w3.org/TR/wai-aria/#aria-label)
attribute. If the `aria-label` attribute is not present, no heading element is generated. When an
`aria-label` is used to generate a heading, it is replaced with a
[`aria-labelledby`](https://www.w3.org/TR/wai-aria/#aria-labelledby) attribute that points to the
new heading. The rank of the generated heading matches the depth of the corresponding TOC item in
the navigation document.</p>
</p:documentation>
</p:option>
<p:option name="braille" required="false" px:type="boolean" select="'false'">
<p:documentation xmlns="http://www.w3.org/1999/xhtml">
<h2 px:role="name">Translate to braille</h2>
<p px:role="desc">Whether to produce a braille rendition.</p>
</p:documentation>
</p:option>
<p:option name="tts" required="false" select="'default'">
<p:pipeinfo>
<px:type>
<choice xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0">
<value>true</value>
<a:documentation xml:lang="en">Yes</a:documentation>
<value>false</value>
<a:documentation xml:lang="en">No</a:documentation>
<value>default</value>
<a:documentation xml:lang="en">If publication has no media overlays yet</a:documentation>
</choice>
</px:type>
</p:pipeinfo>
<p:documentation xmlns="http://www.w3.org/1999/xhtml">
<h2 px:role="name">Perform text-to-speech</h2>
<p px:role="desc" xml:space="preserve">Whether to use a speech synthesizer to produce media overlays.
This will remove any existing media overlays in the EPUB.</p>
</p:documentation>
</p:option>
<p:option name="sentence-detection" required="false" px:type="boolean" select="'false'">
<p:documentation xmlns="http://www.w3.org/1999/xhtml">
<h2 px:role="name">Perform sentence detection</h2>
<p px:role="desc" xml:space="preserve">Whether to add markup (span elements) for sentences.
This setting has no effect when text-to-speech is also enabled. In that case sentences are always
marked up.</p>
</p:documentation>
</p:option>
<p:option name="braille-translator" required="false" px:type="transform-query" select="'(translator:liblouis)'">
<p:documentation>
<h2 px:role="name">Braille translator query</h2>
</p:documentation>
</p:option>
<p:option name="stylesheet" select="''" required="false" px:type="anyURI" px:sequence="true" px:separator=" " px:media-type="text/css text/x-scss application/xslt+xml">
<p:documentation xmlns="http://www.w3.org/1999/xhtml">
<h2 px:role="name">Style sheets</h2>
<p px:role="desc" xml:space="preserve">A list of CSS/Sass style sheets to take into account.
A list of CSS/Sass style sheets to take into account, both for braille transcription (if a braille
rendition is requested), and for text-to-speech (if text-to-speech is enabled).
Must be a space separated list of URIs, absolute or relative to the input.
All style sheets are applied at once, but the order in which they are specified has an influence on
the [cascading order](https://www.w3.org/TR/CSS2/cascade.html#cascading-order).
If the "Apply author style sheets" option is enabled, [author style
sheets](https://www.w3.org/TR/CSS2/cascade.html#cascade) will be taken into account and will take
precedence over any style sheets specified through this option ([user style
sheets](https://www.w3.org/TR/CSS2/cascade.html#cascade)).
When generating the braille rendition, style sheets are interpreted according to [braille
CSS](http://braillespecs.github.io/braille-css) rules. When performing text-to-speech, they are
interpreted as [aural CSS](https://www.w3.org/TR/CSS2/aural.html).
For info on how to use Sass (Syntactically Awesome StyleSheets) see the [Sass
manual](http://sass-lang.com/documentation/file.SASS_REFERENCE.html).
</p>
</p:documentation>
</p:option>
<p:option name="stylesheet-parameters" select="''" required="false" px:type="stylesheet-parameters">
<p:documentation xmlns="http://www.w3.org/1999/xhtml">
<h2 px:role="name">Style sheet parameters</h2>
<p px:role="desc" xml:space="preserve">A list of parameters passed to the style sheets.
Style sheets, whether they're user style sheets (specified with the "stylesheet" option) or author
style sheets (associated with the source), may have parameters (Sass variables). The
"stylesheet-parameters" option, which takes a comma-separated list of key-value pairs enclosed in
parentheses, can be used to set these variables.
For example, if a style sheet uses the Sass variable "foo":
~~~sass
@if $foo {
/* some style that should only be enabled when "foo" is truthy */
}
~~~
you can control that variable with the following parameters list: `(foo:true)`.</p>
</p:documentation>
</p:option>
<p:option name="lexicon" select="p:system-property('d:org.daisy.pipeline.tts.default-lexicon')" required="false" px:type="anyURI" px:sequence="true" px:separator=" " px:media-type="application/pls+xml">
<p:documentation xmlns="http://www.w3.org/1999/xhtml">
<h2 px:role="name">Lexicons</h2>
<p px:role="desc" xml:space="preserve">A list of PLS lexicons to take into account.
Must be a space separated list of URIs, absolute or relative to the input.
Lexicons can also be attached to the source document, using a ['link'
element](http://kb.daisy.org/publishing/docs/text-to-speech/pls.html#ex-07).
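
For instance, a content document might reference a lexicon as in the sketch below. The file name `lexicon.pls` and the title are placeholders; the `rel` and `type` values are the ones EPUB 3 defines for pronunciation lexicons.

~~~xml
&lt;head&gt;
  &lt;title&gt;Chapter 1&lt;/title&gt;
  &lt;!-- "lexicon.pls" is a hypothetical path, relative to the content document --&gt;
  &lt;link rel="pronunciation" type="application/pls+xml" hreflang="en" href="lexicon.pls"/&gt;
&lt;/head&gt;
~~~
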
PLS lexicons allow you to define custom pronunciations of words. It is
meant to help TTS processors deal with ambiguous abbreviations and
pronunciation of proper names. When a word is defined in a lexicon,
the processor will use the provided pronunciation instead of the
default rendering.
The syntax of a PLS lexicon is defined in [Pronunciation Lexicon
Specification (PLS) Version
1.0](https://www.w3.org/TR/pronunciation-lexicon), extended with
regular expression matching. To enable regular expression matching,
add the "regex" attribute, as follows:
~~~xml
&lt;lexicon xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" version="1.0"
         alphabet="ipa" xml:lang="en"&gt;
  &lt;lexeme regex="true"&gt;
    &lt;grapheme&gt;([0-9]+)-([0-9]+)&lt;/grapheme&gt;
    &lt;alias&gt;between $1 and $2&lt;/alias&gt;
  &lt;/lexeme&gt;
&lt;/lexicon&gt;
~~~
The regex feature works only with alias-based substitutions. The regex
syntax used is that from [XQuery 1.0 and XPath
2.0](https://www.w3.org/TR/xpath-functions/#regex-syntax).
Whether or not the regex attribute is set to "true", the grapheme
matching can be made more accurate by specifying the
"positive-lookahead" and "negative-lookahead" attributes:
~~~xml
&lt;lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en"&gt;
  &lt;lexeme&gt;
    &lt;grapheme positive-lookahead="[ ]+is"&gt;SB&lt;/grapheme&gt;
    &lt;alias&gt;somebody&lt;/alias&gt;
  &lt;/lexeme&gt;
  &lt;lexeme&gt;
    &lt;grapheme&gt;SB&lt;/grapheme&gt;
    &lt;alias&gt;should be&lt;/alias&gt;
  &lt;/lexeme&gt;
  &lt;lexeme xml:lang="fr"&gt;
    &lt;grapheme positive-lookahead="[ ]+[cC]ity"&gt;boston&lt;/grapheme&gt;
    &lt;phoneme&gt;bɔstøn&lt;/phoneme&gt;
  &lt;/lexeme&gt;
&lt;/lexicon&gt;
~~~
Graphemes with "positive-lookahead" will match if the beginning of
what follows matches the "positive-lookahead" pattern. Graphemes with
"negative-lookahead" will match if the beginning of what follows does
not match the "negative-lookahead" pattern. The lookaheads are
case-sensitive while the grapheme contents are not.
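
A "negative-lookahead" can be sketched by mirroring the example above. The lexicon below is only an illustration (the grapheme and alias are reused from the previous example): "SB" is read as "should be" unless it is followed by one or more spaces and "is".

~~~xml
&lt;lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en"&gt;
  &lt;lexeme&gt;
    &lt;!-- matches "SB" only when what follows does NOT start with " is" --&gt;
    &lt;grapheme negative-lookahead="[ ]+is"&gt;SB&lt;/grapheme&gt;
    &lt;alias&gt;should be&lt;/alias&gt;
  &lt;/lexeme&gt;
&lt;/lexicon&gt;
~~~
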
The lexemes are matched in this order:
1. Graphemes with regex="false" come first, no matter if there is a lookahead or not;
2. then come graphemes with regex="true" and no lookahead;
3. then graphemes with regex="true" and one or two lookaheads.
Within these categories, lexemes are matched in the same order as they
appear in the lexicons.</p>
</p:documentation>
</p:option>
<p:option name="apply-document-specific-stylesheets" px:type="boolean" select="'false'">
<p:documentation xmlns="http://www.w3.org/1999/xhtml">
<h2 px:role="name">Apply author CSS style sheets</h2>
<p px:role="desc" xml:space="preserve">If this option is enabled, any CSS style sheets attached to the EPUB's content documents for media "embossed" or "speech" will be taken into account.
The EPUB's content documents may contain CSS ([author style
sheets](https://www.w3.org/TR/CSS2/cascade.html#cascade)) that apply to "embossed" or
"[speech](https://www.w3.org/TR/CSS2/aural.html)" media. Style sheets can be associated with an HTML
file in several ways: linked (using an 'xml-stylesheet' processing instruction or a 'link' element),
embedded (using a 'style' element) and/or inlined (using 'style' attributes).
Author style sheets take precedence over user style sheets (any CSS provided through the "Style
sheets" option). For instance, if the EPUB already contains the rule `p { padding-left: 2; }`, and
using this script the rule `p#docauthor { padding-left: 4; }` is provided, then the `padding-left`
property will get the value `2` because that's what was defined in the EPUB, even though the
provided CSS is more specific.
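
The ways of associating a style sheet with a content document could look as follows. This is only a sketch: the file name `braille.css` and the rules are made up, and the `media` value "embossed" is assumed for a braille style sheet.

~~~xml
&lt;html xmlns="http://www.w3.org/1999/xhtml"&gt;
  &lt;head&gt;
    &lt;title&gt;Example&lt;/title&gt;
    &lt;!-- linked: "braille.css" is a hypothetical file name --&gt;
    &lt;link rel="stylesheet" type="text/css" media="embossed" href="braille.css"/&gt;
    &lt;!-- embedded --&gt;
    &lt;style type="text/css" media="embossed"&gt;
      p { padding-left: 2; }
    &lt;/style&gt;
  &lt;/head&gt;
  &lt;body&gt;
    &lt;!-- inlined --&gt;
    &lt;p style="padding-left: 2"&gt;...&lt;/p&gt;
  &lt;/body&gt;
&lt;/html&gt;
~~~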
</p>
</p:documentation>
</p:option>
<p:option name="set-default-rendition-to-braille" px:type="boolean" select="'false'">
<p:documentation xmlns="http://www.w3.org/1999/xhtml">
<h2 px:role="name">Set default rendition to braille.</h2>
<p px:role="desc">Make the generated braille rendition the default rendition.</p>
</p:documentation>
</p:option>
<p:input port="tts-config" primary="false" px:media-type="application/vnd.pipeline.tts-config+xml">
<p:documentation xmlns="http://www.w3.org/1999/xhtml">
<h2 px:role="name">Text-to-speech configuration file</h2>
<p px:role="desc" xml:space="preserve">Configuration file for text-to-speech.
[More details on the configuration file format](http://daisy.github.io/pipeline/Get-Help/User-Guide/Text-To-Speech/).</p>
</p:documentation>
<p:inline><d:config/></p:inline>
</p:input>
<p:option name="sentence-class" required="false" select="''">
<p:documentation xmlns="http://www.w3.org/1999/xhtml">
<h2 px:role="name">Sentence class</h2>
<p px:role="desc" xml:space="preserve">Class attribute to mark sentences with.
When sentence detection is enabled, this option may be used to add a class attribute to the `span`
elements that represent the sentences.</p>
</p:documentation>
</p:option>
<p:option name="result" required="true" px:output="result" px:type="anyDirURI" px:media-type="text">
<p:documentation>
<h2 px:role="name">Output EPUB 3</h2>
<p xmlns="http://www.w3.org/1999/xhtml" px:role="desc">The output EPUB 3 publication.</p>
</p:documentation>
</p:option>
<p:option name="include-tts-log" select="p:system-property('d:org.daisy.pipeline.tts.log')" px:type="boolean">
<p:documentation xmlns="http://www.w3.org/1999/xhtml">
<h2 px:role="name">Enable TTS log</h2>
<p px:role="desc" xml:space="preserve">Whether or not to make the TTS log available.
The TTS log contains a great deal of additional information that is not present in the main job log
and that is helpful for troubleshooting. Most of the log entries concern particular chunks of text
of the input document.
The default can be changed using the
[`org.daisy.pipeline.tts.log`](http://daisy.github.io/pipeline/Get-Help/User-Guide/Text-To-Speech/#common-settings)
property.
</p>
</p:documentation>
</p:option>
<p:output port="tts-log" sequence="true">
<p:documentation xmlns="http://www.w3.org/1999/xhtml">
<h2 px:role="name">TTS log</h2>
<p px:role="desc">Log file with information about the text-to-speech process.</p>
</p:documentation>
<p:pipe step="convert" port="tts-log"/>
</p:output>
<p:import href="http://www.daisy.org/pipeline/modules/epub-utils/library.xpl">
<p:documentation>
px:epub-load
</p:documentation>
</p:import>
<p:import href="epub3-to-epub3.convert.xpl">
<p:documentation>
px:epub3-to-epub3
</p:documentation>
</p:import>
<p:import href="http://www.daisy.org/pipeline/modules/fileset-utils/library.xpl">
<p:documentation>
px:fileset-store
px:fileset-delete
</p:documentation>
</p:import>
<px:epub-load version="3" store-to-disk="true" name="load" px:progress="0.1" px:message="Loading EPUB">
<p:with-option name="href" select="$source"/>
<p:with-option name="temp-dir" select="concat($temp-dir,'load/')"/>
</px:epub-load>
<px:epub3-to-epub3 name="convert" px:progress="0.8">
<p:input port="source.in-memory">
<p:pipe step="load" port="result.in-memory"/>
</p:input>
<p:with-option name="result-base" select="concat($result,'/',replace(replace($source,'(\.epub|/mimetype)$',''),'^.*/([^/]+)$','$1'),'.epub!/')"/>
<p:input port="metadata">
<p:pipe port="metadata" step="main"/>
</p:input>
<p:with-option name="braille-translator" select="$braille-translator"/>
<p:with-option name="stylesheet" select="string-join(for $s in tokenize($stylesheet,'\s+')[not(.='')] return resolve-uri($s,$source), ' ')"/>
<p:with-option name="stylesheet-parameters" select="$stylesheet-parameters"/>
<p:with-option name="lexicon" select="for $l in tokenize($lexicon,'\s+')[not(.='')] return resolve-uri($l,$source)"/>
<p:with-option name="apply-document-specific-stylesheets" select="$apply-document-specific-stylesheets"/>
<p:with-option name="set-default-rendition-to-braille" select="$set-default-rendition-to-braille"/>
<p:with-option name="braille" select="$braille"/>
<p:with-option name="tts" select="$tts"/>
<p:with-option name="sentence-detection" select="$sentence-detection"/>
<p:with-option name="update-lang-attributes" select="$update-lang-attributes"/>
<p:with-option name="update-identifier-in-content-docs" select="$update-identifier-in-content-docs"/>
<p:with-option name="update-title-in-content-docs" select="$update-title-in-content-docs"/>
<p:with-option name="ensure-pagenum-text" select="$ensure-pagenum-text"/>
<p:with-option name="ensure-section-headings" select="$ensure-section-headings"/>
<p:with-option name="sentence-class" select="$sentence-class"/>
<p:with-option name="include-tts-log" select="$include-tts-log"/>
<p:input port="tts-config">
<p:pipe step="main" port="tts-config"/>
</p:input>
<p:with-option name="temp-dir" select="concat($temp-dir,'convert/')"/>
</px:epub3-to-epub3>
<px:fileset-store name="store" px:progress="0.1" px:message="Storing EPUB">
<p:input port="in-memory.in">
<p:pipe step="convert" port="result.in-memory"/>
</p:input>
</px:fileset-store>
<px:fileset-delete cx:depends-on="store">
<p:input port="source">
<p:pipe step="convert" port="temp-audio-files"/>
</p:input>
</px:fileset-delete>
</p:declare-step>