<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" xmlns:px="http://www.daisy.org/ns/pipeline/xproc" xmlns:d="http://www.daisy.org/ns/pipeline/data" xmlns:cx="http://xmlcalabash.com/ns/extensions" type="px:epub3-to-epub3.script" version="1.0" exclude-inline-prefixes="#all" name="main" px:input-filesets="epub3" px:output-filesets="epub3"> <p:documentation xmlns="http://www.w3.org/1999/xhtml"> <h1 px:role="name">EPUB 3 Enhancer</h1> <p px:role="desc">Transforms an EPUB 3 publication into an EPUB 3 publication with audio and/or a braille rendition.</p> <a px:role="homepage" href="http://daisy.github.io/pipeline/Get-Help/User-Guide/Scripts/epub3-to-epub3/"> Online documentation </a> <address> Authors: <dl px:role="author"> <dt>Name:</dt> <dd px:role="name">Bert Frees</dd> <dt>E-mail:</dt> <dd><a px:role="contact" href="mailto:bertfrees@gmail.com">bertfrees@gmail.com</a></dd> </dl> </address> </p:documentation> <p:option name="source" required="true" px:type="anyFileURI" px:media-type="application/epub+zip text/plain"> <p:documentation> <h2 px:role="name">Input EPUB 3</h2> <p px:role="desc" xml:space="preserve">The EPUB you want to convert.

You may alternatively use the "mimetype" document if your input is an unzipped ("exploded") version of an EPUB.</p> </p:documentation> </p:option> <p:input port="metadata" primary="false" sequence="true" px:media-type="application/xml"> <p:documentation xmlns="http://www.w3.org/1999/xhtml"> <h2 px:role="name">Metadata</h2> <p px:role="desc" xml:space="preserve">Metadata to be included in the EPUB.

If specified, the document must be a single [`metadata`](https://www.w3.org/publishing/epub3/epub-packages.html#sec-metadata-elem) or [`package`](https://www.w3.org/publishing/epub3/epub-packages.html#sec-package-elem) element. A [`prefix`](https://www.w3.org/publishing/epub3/epub-packages.html#sec-prefix-attr) attribute is allowed on the root element.
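
For instance, a minimal metadata document might look like the sketch below. The identifier, title and language values are placeholders. The `meta` element refines the `dc:title` element of the same document, which is the only kind of refinement that is kept.

~~~xml
<metadata xmlns="http://www.idpf.org/2007/opf" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:identifier id="pub-id">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
  <dc:title id="title">Example Title</dc:title>
  <meta refines="#title" property="title-type">main</meta>
  <dc:language>en</dc:language>
</metadata>
~~~
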

The metadata will be injected into the EPUB's package document, possibly overwriting existing metadata. This works as follows:

- All (valid) fields in the provided metadata document end up in the output EPUB. More than one field with the same property is allowed. `meta` elements with a `refines` attribute must refine elements within the metadata document itself. Elements that refine elements in the EPUB's package document are dropped.
- Any metadata fields in the input EPUB that have matching fields (same property in the case of `meta` fields, same element name in the case of `dc:*` fields) in the provided metadata document are omitted, together with any `meta` elements that refine them.
- Metadata fields in the input that do not have any matching fields in the provided metadata document are preserved in the output.

A number of fields result in additional changes in the EPUB (apart from an updated `metadata` section in the package document):

- If the provided metadata document contains one or more [`dc:identifier`](https://www.w3.org/publishing/epub3/epub-packages.html#sec-opf-dcidentifier) fields, the first one without a `refines` attribute will be used to update the [`unique-identifier`](https://www.w3.org/publishing/epub3/epub-packages.html#attrdef-package-unique-identifier) attribute on the package document. The `dc:identifier` metadata in the content documents can also be aligned with it. This behavior can be enabled or disabled with the "Update &lt;meta name='dc:identifier'&gt; elements based on EPUB metadata" option.
- If the provided metadata document contains one or more [`dc:title`](https://www.w3.org/publishing/epub3/epub-packages.html#sec-opf-dctitle) fields, the first one can also be used as the `title` element in the content documents. This behavior can be enabled or disabled with the "Update &lt;title&gt; elements based on EPUB metadata" option.
- If the provided metadata document contains exactly one [`dc:language`](https://www.w3.org/publishing/epub3/epub-packages.html#sec-opf-dclanguage) field, it can be used to update the `xml:lang` and `lang` attributes of the content documents. This behavior can be enabled or disabled with the "Update 'lang' attributes based on metadata" option.

Some fields are ignored:

- The [`dcterms:modified`](https://www.w3.org/publishing/epub3/epub-packages.html#last-modified-date) field gets updated whenever Pipeline produces an EPUB. As a consequence, any `dcterms:modified` fields in the provided metadata document are ignored.</p> </p:documentation> <p:empty/> </p:input> <p:option name="update-lang-attributes" required="false" px:type="boolean" select="'false'"> <p:documentation xmlns="http://www.w3.org/1999/xhtml"> <h2 px:role="name">Update 'lang' attributes based on metadata</h2> <p px:role="desc" xml:space="preserve">Whether to update 'lang' and 'xml:lang' attributes of content documents based on metadata in the package document.

If the package document contains exactly one [`dc:language`](https://www.w3.org/publishing/epub3/epub-packages.html#sec-opf-dclanguage) element, its value is used to create `xml:lang` and `lang` attributes on the root elements of all content documents (overwriting any existing attributes). If the "Metadata" option is used to inject new metadata into the EPUB, the resulting metadata is used to generate the attributes.</p> </p:documentation> </p:option> <p:option name="update-identifier-in-content-docs" required="false" px:type="boolean" select="'false'"> <p:documentation xmlns="http://www.w3.org/1999/xhtml"> <h2 px:role="name">Update &lt;meta name='dc:identifier'&gt; elements based on EPUB metadata</h2> <p px:role="desc" xml:space="preserve">Whether to update `&lt;meta name='dc:identifier'&gt;` elements of content documents based on metadata in the package document.

The primary identifier (provided by the [`dc:identifier`](https://www.w3.org/publishing/epub3/epub-packages.html#sec-opf-dcidentifier) element identified by the [`unique-identifier`](https://www.w3.org/publishing/epub3/epub-packages.html#attrdef-package-unique-identifier) attribute) is used to create a `&lt;meta name='dc:identifier'&gt;` element in all content documents (overwriting any existing elements with the same name). If the "Metadata" option is used to inject new metadata into the EPUB, the resulting metadata is used to generate these elements.</p> </p:documentation> </p:option> <p:option name="update-title-in-content-docs" required="false" px:type="boolean" select="'false'"> <p:documentation xmlns="http://www.w3.org/1999/xhtml"> <h2 px:role="name">Update &lt;title&gt; elements based on EPUB metadata</h2> <p px:role="desc" xml:space="preserve">Whether to update `&lt;title&gt;` elements of content documents based on metadata in the package document.

If there are one or more [`dc:title`](https://www.w3.org/publishing/epub3/epub-packages.html#sec-opf-dctitle) elements, the value of the first one is used to create a `&lt;title&gt;` element in all content documents (overwriting any existing elements with the same name).
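
As an illustration (the title text is a placeholder), with this option enabled, a package document containing the following field:

~~~xml
<dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">Example Title</dc:title>
~~~

would be expected to give every content document a `title` element like the one below, replacing any `title` element that was already there:

~~~xml
<title xmlns="http://www.w3.org/1999/xhtml">Example Title</title>
~~~
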
If the "Metadata" option is used to inject new metadata into the EPUB, the resulting metadata is used to generate these elements.</p> </p:documentation> </p:option> <p:option name="ensure-pagenum-text" required="false" select="'false'"> <p:pipeinfo> <px:type> <choice xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"> <value>true</value> <a:documentation xml:lang="en">Yes</a:documentation> <value>false</value> <a:documentation xml:lang="en">No</a:documentation> <value>hidden</value> <a:documentation xml:lang="en">Yes, but not visible</a:documentation> </choice> </px:type> </p:pipeinfo> <p:documentation xmlns="http://www.w3.org/1999/xhtml"> <h2 px:role="name">Ensure text content for page numbers</h2> <p px:role="desc" xml:space="preserve">Whether to fix empty page number elements.

Page number elements (elements with a `doc-pagebreak` `role` or a `pagebreak` `epub:type`) that have no child text node can be given one. The text can be generated based on

- the element's `aria-label` attribute,
- the element's `title` attribute, or
- the text used by the corresponding page link in the navigation document.

These sources are tried in the listed order. If none of the attributes exist and the page is not linked from the navigation document, no text is generated.</p> </p:documentation> </p:option> <p:option name="ensure-section-headings" required="false" px:type="boolean" select="'false'"> <p:documentation xmlns="http://www.w3.org/1999/xhtml"> <h2 px:role="name">Ensure headings for all sections</h2> <p px:role="desc" xml:space="preserve">Whether to generate a heading element for sections that don't have one.

For sectioning elements that don't have a heading element, one can be created. The headings are generated based on the section element's [`aria-label`](https://www.w3.org/TR/wai-aria/#aria-label) attribute. If the `aria-label` attribute is not present, no heading element is generated.
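
For instance, take a section whose entry in the navigation document is a second-level TOC item. The `id` value below is hypothetical, and the exact markup produced may differ. A section such as:

~~~xml
<section xmlns="http://www.w3.org/1999/xhtml" aria-label="Acknowledgements">
  <p>...</p>
</section>
~~~

could end up as:

~~~xml
<section xmlns="http://www.w3.org/1999/xhtml" aria-labelledby="acknowledgements-heading">
  <h2 id="acknowledgements-heading">Acknowledgements</h2>
  <p>...</p>
</section>
~~~
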
When an `aria-label` is used to generate a heading, it is replaced with an [`aria-labelledby`](https://www.w3.org/TR/wai-aria/#aria-labelledby) attribute that points to the new heading. The rank of the generated heading matches the depth of the corresponding TOC item in the navigation document.</p> </p:documentation> </p:option> <p:option name="braille" required="false" px:type="boolean" select="'false'"> <p:documentation xmlns="http://www.w3.org/1999/xhtml"> <h2 px:role="name">Translate to braille</h2> <p px:role="desc">Whether to produce a braille rendition.</p> </p:documentation> </p:option> <p:option name="audio" required="false" select="'default'"> <p:pipeinfo> <px:type> <choice xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"> <value>true</value> <a:documentation xml:lang="en">Yes</a:documentation> <value>false</value> <a:documentation xml:lang="en">No</a:documentation> <value>default</value> <a:documentation xml:lang="en">If the publication has no media overlays yet</a:documentation> </choice> </px:type> </p:pipeinfo> <p:documentation xmlns="http://www.w3.org/1999/xhtml"> <h2 px:role="name">Perform text-to-speech</h2> <p px:role="desc" xml:space="preserve">Whether to use a speech synthesizer to produce media overlays.

This will remove any existing media overlays in the EPUB.</p> </p:documentation> </p:option> <p:option name="sentence-detection" required="false" px:type="boolean" select="'false'"> <p:documentation xmlns="http://www.w3.org/1999/xhtml"> <h2 px:role="name">Perform sentence detection</h2> <p px:role="desc" xml:space="preserve">Whether to add markup (span elements) for sentences.

This setting has no effect when text-to-speech is also enabled.
In that case, sentences are always marked up.</p> </p:documentation> </p:option> <p:option name="braille-translator" required="false" px:type="transform-query" select="'(translator:liblouis)'"> <p:documentation> <h2 px:role="name">Braille translator query</h2> </p:documentation> </p:option> <p:option name="stylesheet" select="''" required="false" px:type="anyURI" px:sequence="true" px:separator=" " px:media-type="text/css text/x-scss"> <p:documentation xmlns="http://www.w3.org/1999/xhtml"> <h2 px:role="name">Style sheets</h2> <p px:role="desc" xml:space="preserve">A list of CSS/Sass style sheets to take into account.

The style sheets are used both for braille transcription (if a braille rendition is requested) and for text-to-speech (if text-to-speech is enabled). Must be a space-separated list of URIs, absolute or relative to the input.

All style sheets are applied at once, but the order in which they are specified has an influence on the [cascading order](https://www.w3.org/TR/CSS2/cascade.html#cascading-order). If the "Apply author CSS style sheets" option is enabled, [author style sheets](https://www.w3.org/TR/CSS2/cascade.html#cascade) will be taken into account and will take precedence over any style sheets specified through this option ([user style sheets](https://www.w3.org/TR/CSS2/cascade.html#cascade)).

When generating the braille rendition, style sheets are interpreted according to [braille CSS](http://braillespecs.github.io/braille-css) rules. When performing text-to-speech, they are interpreted as [aural CSS](https://www.w3.org/TR/CSS2/aural.html). For information on how to use Sass (Syntactically Awesome StyleSheets), see the [Sass manual](http://sass-lang.com/documentation/file.SASS_REFERENCE.html).
</p> </p:documentation> </p:option> <p:option name="stylesheet-parameters" select="'()'" required="false" px:type="stylesheet-parameters"><p:documentation xmlns="http://www.w3.org/1999/xhtml"> <h2 px:role="name">Style sheet parameters</h2> <p px:role="desc" xml:space="preserve">A list of parameters passed to the style sheets.

Style sheets, whether they're user style sheets (specified with the "Style sheets" option) or author style sheets (associated with the source), may have parameters (Sass variables). This option, which takes a comma-separated list of key-value pairs enclosed in parentheses, can be used to set these variables.

For example, if a style sheet uses the Sass variable "foo":

~~~sass
@if $foo {
   /* some style that should only be enabled when "foo" is truthy */
}
~~~

you can control that variable with the following parameters list: `(foo:true)`.</p> </p:documentation> </p:option> <p:option name="lexicon" select="p:system-property('d:org.daisy.pipeline.tts.default-lexicon')" required="false" px:type="anyURI" px:sequence="true" px:separator=" " px:media-type="application/pls+xml"><p:documentation xmlns="http://www.w3.org/1999/xhtml"> <h2 px:role="name">Lexicons</h2> <p px:role="desc" xml:space="preserve">A list of PLS lexicons to take into account.

Must be a space-separated list of URIs, absolute or relative to the input. Lexicons can also be attached to the source document, using a ['link' element](http://kb.daisy.org/publishing/docs/text-to-speech/pls.html#ex-07).

PLS lexicons allow you to define custom pronunciations of words. They are meant to help TTS processors deal with ambiguous abbreviations and the pronunciation of proper names. When a word is defined in a lexicon, the processor will use the provided pronunciation instead of the default rendering.

The syntax of a PLS lexicon is defined in [Pronunciation Lexicon Specification (PLS) Version 1.0](https://www.w3.org/TR/pronunciation-lexicon), extended with regular expression matching.
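
Without the extension, a lexicon is a plain PLS document. The following is a minimal sketch, with example entries only, that defines an alias for an abbreviation and an IPA pronunciation for a word:

~~~xml
<lexicon xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" version="1.0" alphabet="ipa" xml:lang="en">
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
  <lexeme>
    <grapheme>tomato</grapheme>
    <phoneme>təˈmɑːtəʊ</phoneme>
  </lexeme>
</lexicon>
~~~
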
To enable regular expression matching, add the "regex" attribute, as follows:

~~~xml
<lexicon xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" version="1.0" alphabet="ipa" xml:lang="en">
  <lexeme regex="true">
    <grapheme>([0-9]+)-([0-9]+)</grapheme>
    <alias>between $1 and $2</alias>
  </lexeme>
</lexicon>
~~~

The regex feature works only with alias-based substitutions. The regex syntax used is that from [XQuery 1.0 and XPath 2.0](https://www.w3.org/TR/xpath-functions/#regex-syntax).

Whether or not the regex attribute is set to "true", the grapheme matching can be made more accurate by specifying the "positive-lookahead" and "negative-lookahead" attributes:

~~~xml
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" alphabet="ipa" xml:lang="en">
  <lexeme>
    <grapheme positive-lookahead="[ ]+is">SB</grapheme>
    <alias>somebody</alias>
  </lexeme>
  <lexeme>
    <grapheme>SB</grapheme>
    <alias>should be</alias>
  </lexeme>
  <lexeme xml:lang="fr">
    <grapheme positive-lookahead="[ ]+[cC]ity">boston</grapheme>
    <phoneme>bɔstøn</phoneme>
  </lexeme>
</lexicon>
~~~

Graphemes with "positive-lookahead" will match if the beginning of what follows matches the "positive-lookahead" pattern. Graphemes with "negative-lookahead" will match if the beginning of what follows does not match the "negative-lookahead" pattern. The lookaheads are case-sensitive while the grapheme contents are not.

The lexemes are matched in this order:

1. Graphemes with regex="false" come first, no matter if there is a lookahead or not;
2. then come graphemes with regex="true" and no lookahead;
3. then graphemes with regex="true" and one or two lookaheads.

Within these categories, lexemes are matched in the same order as they appear in the lexicons.</p> </p:documentation> </p:option> <p:option name="apply-document-specific-stylesheets" px:type="boolean" select="'false'"> <p:documentation xmlns="http://www.w3.org/1999/xhtml"> <h2 px:role="name">Apply author CSS style sheets</h2> <p px:role="desc" xml:space="preserve">If this option is enabled, any CSS style sheets attached to the EPUB's content documents for media "embossed" or "speech" will be taken into account.

The EPUB's content documents may contain CSS ([author style sheets](https://www.w3.org/TR/CSS2/cascade.html#cascade)) that apply to "embossed" or "[speech](https://www.w3.org/TR/CSS2/aural.html)" media. Style sheets can be associated with an HTML file in several ways: linked (using an 'xml-stylesheet' processing instruction or a 'link' element), embedded (using a 'style' element) and/or inlined (using 'style' attributes).

Author style sheets take precedence over user style sheets (any CSS provided through the "Style sheets" option). For instance, if the EPUB already contains the rule `p { padding-left: 2; }`, and the rule `p#docauthor { padding-left: 4; }` is provided through the "Style sheets" option, then the `padding-left` property will get the value `2`, because that is what the EPUB defines, even though the provided rule is more specific.</p> </p:documentation> </p:option> <p:option name="set-default-rendition-to-braille" px:type="boolean" select="'false'"> <p:documentation xmlns="http://www.w3.org/1999/xhtml"> <h2 px:role="name">Set default rendition to braille</h2> <p px:role="desc">Make the generated braille rendition the default rendition.</p> </p:documentation> </p:option> <p:input port="tts-config" primary="false" px:media-type="application/vnd.pipeline.tts-config+xml"><p:documentation xmlns="http://www.w3.org/1999/xhtml"> <h2 px:role="name">Text-to-speech configuration file</h2> <p px:role="desc" xml:space="preserve">Configuration file for text-to-speech.

[More details on the configuration file format](http://daisy.github.io/pipeline/Get-Help/User-Guide/Text-To-Speech/).</p> </p:documentation> <p:inline><d:config/></p:inline> </p:input> <p:option name="sentence-class" required="false" select="''"> <p:documentation xmlns="http://www.w3.org/1999/xhtml"> <h2 px:role="name">Sentence class</h2> <p px:role="desc" xml:space="preserve">Class attribute to mark sentences with.

When sentence detection is enabled, this option may be used to add a class attribute to the `span` elements that represent the sentences.</p> </p:documentation> </p:option> <p:option name="result" required="true" px:output="result" px:type="anyDirURI" px:media-type="text"> <p:documentation> <h2 px:role="name">Output EPUB 3</h2> <p xmlns="http://www.w3.org/1999/xhtml" px:role="desc">The directory where the resulting EPUB 3 publication is stored.</p></p:documentation> </p:option> <p:option name="include-tts-log" select="p:system-property('d:org.daisy.pipeline.tts.log')" px:type="boolean"><p:documentation xmlns="http://www.w3.org/1999/xhtml"> <h2 px:role="name">Enable TTS log</h2> <p px:role="desc" xml:space="preserve">Whether or not to make the TTS log available.

The TTS log contains a great deal of additional information that is not present in the main job log and that is helpful for troubleshooting. Most of the log entries concern particular chunks of text of the input document. The default can be changed using the [`org.daisy.pipeline.tts.log`](http://daisy.github.io/pipeline/Get-Help/User-Guide/Text-To-Speech/#common-settings) property.
</p> </p:documentation> </p:option> <p:output port="tts-log" sequence="true"><p:documentation xmlns="http://www.w3.org/1999/xhtml"> <h2 px:role="name">TTS log</h2> <p px:role="desc">Log file with information about the text-to-speech process.</p> </p:documentation> <p:pipe step="convert" port="tts-log"/> </p:output> <p:serialization port="tts-log" indent="true" omit-xml-declaration="false"/> <p:option name="temp-dir" required="true" px:output="temp" px:type="anyDirURI"> <!-- directory used for temporary files; referenced as $temp-dir by the steps below --> </p:option> <p:import href="http://www.daisy.org/pipeline/modules/epub-utils/library.xpl"> <p:documentation> px:epub-load </p:documentation> </p:import> <p:import href="epub3-to-epub3.convert.xpl"> <p:documentation> px:epub3-to-epub3 </p:documentation> </p:import> <p:import href="http://www.daisy.org/pipeline/modules/fileset-utils/library.xpl"> <p:documentation> px:fileset-store px:fileset-delete </p:documentation> </p:import> <px:epub-load version="3" store-to-disk="true" name="load" px:progress="0.1" px:message="Loading EPUB"> <p:with-option name="href" select="$source"/> <p:with-option name="temp-dir" select="concat($temp-dir,'load/')"/> </px:epub-load> <px:epub3-to-epub3 name="convert" px:progress="0.8"> <p:input port="source.in-memory"> <p:pipe step="load" port="result.in-memory"/> </p:input> <p:with-option name="result-base" select="concat($result,'/',replace(replace($source,'(\.epub|/mimetype)$',''),'^.*/([^/]+)$','$1'),'.epub!/')"/> <p:input port="metadata"> <p:pipe port="metadata" step="main"/> </p:input> <p:with-option name="braille-translator" select="$braille-translator"/> <p:with-option name="stylesheet" select="string-join( for $s in tokenize($stylesheet,'\s+')[not(.='')] return resolve-uri($s,$source), ' ')"/> <p:with-option name="stylesheet-parameters" select="$stylesheet-parameters"/> <p:with-option name="lexicon" select="for $l in tokenize($lexicon,'\s+')[not(.='')] return resolve-uri($l,$source)"/> <p:with-option name="apply-document-specific-stylesheets" select="$apply-document-specific-stylesheets"/> <p:with-option name="set-default-rendition-to-braille" 
select="$set-default-rendition-to-braille"/> <p:with-option name="braille" select="$braille"/> <p:with-option name="tts" select="$audio"/> <p:with-option name="sentence-detection" select="$sentence-detection"/> <p:with-option name="update-lang-attributes" select="$update-lang-attributes"/> <p:with-option name="update-identifier-in-content-docs" select="$update-identifier-in-content-docs"/> <p:with-option name="update-title-in-content-docs" select="$update-title-in-content-docs"/> <p:with-option name="ensure-pagenum-text" select="$ensure-pagenum-text"/> <p:with-option name="ensure-section-headings" select="$ensure-section-headings"/> <p:with-option name="sentence-class" select="$sentence-class"/> <p:with-option name="include-tts-log" select="$include-tts-log"/> <p:input port="tts-config"> <p:pipe step="main" port="tts-config"/> </p:input> <p:with-option name="temp-dir" select="concat($temp-dir,'convert/')"/> </px:epub3-to-epub3> <px:fileset-store name="store" px:progress="0.1" px:message="Storing EPUB"> <p:input port="in-memory.in"> <p:pipe step="convert" port="result.in-memory"/> </p:input> </px:fileset-store> <px:fileset-delete cx:depends-on="store"> <p:input port="source"> <p:pipe step="convert" port="temp-audio-files"/> </p:input> </px:fileset-delete> </p:declare-step>