EPUB

This page provides an overview of the EPUB document format and sketches out how AxAp might provide access to EPUB documents.

Overview

An EPUB file is basically a "web site in a can". Specifically, it's a renamed ZIP archive containing a variety of web-related assets (e.g., CSS, HTML) and some associated metadata. For a gentle introduction, see Matt Garrish's excellent publications from O'Reilly.

For the nitty-gritty details, see the IDPF standards documents.

We are currently analyzing some example EPUB documents we have on hand, attempting to discover issues that we'll need to address. For more information, see our Examples pages.
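Because an EPUB is a renamed ZIP archive, we can do a quick sanity check on a document without unpacking it. The sketch below assumes the archive follows the OCF packaging rule that an uncompressed `mimetype` entry comes first, which puts known bytes at known offsets. The function name `looks_like_epub` is hypothetical; it is not part of AxAp.

```shell
# looks_like_epub FILE
# Succeeds if FILE starts the way an OCF-conformant EPUB archive should.
looks_like_epub() {
  local file="$1" magic tag

  # Bytes 0-3 hold the ZIP local file header signature, "PK\003\004".
  magic=$(dd if="$file" bs=1 count=4 2>/dev/null | od -An -tx1 | tr -d ' \n')
  [ "$magic" = "504b0304" ] || return 1

  # The fixed part of the header is 30 bytes long. In a conformant EPUB
  # it is followed by the entry name "mimetype" and then the stored
  # (uncompressed) content "application/epub+zip": 28 bytes in all.
  tag=$(dd if="$file" bs=1 skip=30 count=28 2>/dev/null)
  [ "$tag" = "mimetypeapplication/epub+zip" ]
}
```

For example, `looks_like_epub book.epub && echo "plausible EPUB"` would report on a (hypothetical) file named `book.epub`. A failed check does not prove the file is unusable; it only says the file is not packaged the way the standard requires.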

Approach

Here's a simplified guess at an approach:

  • obtain an EPUB document
  • import it, if necessary
  • provide it to a browser

Importing

The "import" step clearly deserves some clarification. Basically, we're using Git to retain immutable cached versions of the reference data (e.g., the document and its generated file tree). We save processing time and storage space by assuming that identical documents (i.e., documents that Git hashes to the same blob) will produce identical file trees. Here's some pseudo-code:

  • commit the document to Git
  • if any new data blobs were created:
    • unpack a copy of the document
    • commit the file tree to Git
    • add auxiliary files for browsers
    • commit the file tree again
  • return the commit ID to Sinatra
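The pseudo-code above can be sketched in shell, using Git's own content hashes to decide whether a document has been seen before. This is only a guess at one workable layout, not a description of AxAp's actual code: the function names, the `archives/` and `files/` directories keyed by blob ID, the `import-<blob>` tags, and the placeholder file list are all assumptions made for illustration. It also assumes that `REPO` is an existing Git work tree with `user.name` and `user.email` configured, and that no Git filters alter the document when it is added.

```shell
# document_is_new REPO DOC
# Succeeds if REPO holds no blob with DOC's exact content yet.
document_is_new() {
  local repo="$1" doc="$2" blob
  blob=$(git hash-object "$doc")
  ! git -C "$repo" cat-file -e "$blob" 2>/dev/null
}

# import_epub REPO DOC
# Prints the ID of the commit that holds DOC's unpacked file tree.
import_epub() {
  local repo="$1" doc="$2" blob
  blob=$(git hash-object "$doc")

  if document_is_new "$repo" "$doc"; then
    # Commit the document itself (as a renamed copy).
    mkdir -p "$repo/archives" "$repo/files" || return 1
    cp "$doc" "$repo/archives/$blob.zip" || return 1
    git -C "$repo" add "archives/$blob.zip" &&
      git -C "$repo" commit -q -m "Add document $blob" || return 1

    # Unpack a copy and commit the file tree.
    unzip -q -d "$repo/files/$blob" "$repo/archives/$blob.zip" || return 1
    git -C "$repo" add "files/$blob" &&
      git -C "$repo" commit -q -m "Unpack document $blob" || return 1

    # Add auxiliary files for browsers (here, just a placeholder
    # list of the unpacked files) and commit the tree again.
    (cd "$repo/files/$blob" && find . -type f | LC_ALL=C sort) \
      > "$repo/files/$blob.files.txt"
    git -C "$repo" add "files/$blob.files.txt" &&
      git -C "$repo" commit -q -m "Add auxiliary files for $blob" || return 1

    # Remember which commit belongs to this document.
    git -C "$repo" tag "import-$blob" || return 1
  fi

  # Works for both fresh and previously imported documents.
  git -C "$repo" rev-parse "refs/tags/import-$blob"
}
```

Keying everything by blob ID means that two documents with the same file name cannot collide, and a second import of the same document skips straight to the tag lookup. The sketch prints the commit ID in both cases, so the caller (e.g., Sinatra) always gets something to work with.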

Although the unpacked HTML files can be displayed by a web browser, the bare file tree provides no support for indexing or navigation. Some of this can be provided by means of added HTML files; alternatively, JSON files can be supplied for use by a client-side app.
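As a rough illustration of the "added HTML files" idea, the sketch below writes a single index page that links to every HTML or XHTML file in an unpacked tree. The function name `make_index` and the output file name `axap_index.html` are hypothetical. The sketch lists files in plain alphabetical order, which is not the book's reading order (that would have to come from the EPUB's own metadata), and it does no escaping, so it assumes file names free of characters such as `&`, `<`, and `"`.

```shell
# make_index ROOT
# Writes ROOT/axap_index.html, a list of links to each (X)HTML file under ROOT.
make_index() {
  local root="$1" out="axap_index.html" path

  {
    printf '<!DOCTYPE html>\n<html>\n'
    printf '<head><meta charset="utf-8"><title>Contents</title></head>\n'
    printf '<body>\n<ul>\n'

    # List the (X)HTML files, skipping the index page we are writing.
    (cd "$root" &&
      find . -type f \( -name '*.html' -o -name '*.xhtml' \) \
        ! -path "./$out" | LC_ALL=C sort) |
    while IFS= read -r path; do
      path=${path#./}    # drop the leading "./" that find adds
      printf '  <li><a href="%s">%s</a></li>\n' "$path" "$path"
    done

    printf '</ul>\n</body>\n</html>\n'
  } > "$root/$out"
}
```

Running `make_index ../files` on an unpacked document would leave the original files untouched and add one new page next to them.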

Note: Although there are many differences between static documents and remote web sites, they can be handled in a similar manner, using cached file trees, Git, etc.

Unpacking

As a minor detail, we need to make a copy of the archive (renamed with a .zip extension) and unpack it into a directory tree. Here are some Bash commands that do the trick:

$ base='programming-ruby-1-9-2-0_p2_0'
$ mkdir -p ex_pr/archives
$ cd ex_pr/archives
$ cp .../"$base.epub" "$base.zip"
$ unzip -d ../files "$base.zip"
$ cd ../files
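Once the tree is unpacked, the usual next question is where the book's package document (the main metadata file) lives. The EPUB container format records this in `META-INF/container.xml`, as the `full-path` attribute of a `rootfile` element. The sketch below pulls that value out with `sed`. It is not a real XML parser: it assumes that the attribute is double-quoted and that each `rootfile` element sits on its own line. The function name `package_path` is hypothetical.

```shell
# package_path ROOT
# Prints the path (relative to ROOT) of the first package document
# listed in ROOT/META-INF/container.xml.
package_path() {
  local root="$1"
  sed -n 's/.*full-path="\([^"]*\)".*/\1/p' "$root/META-INF/container.xml" |
    head -n 1
}
```

From inside the unpacked `files` directory, `package_path .` would print something like `OEBPS/content.opf`, though the actual name and location vary from one document to the next.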


This wiki page is maintained by Rich Morin, an independent consultant specializing in software design, development, and documentation. Please feel free to email comments, inquiries, suggestions, etc!

Topic revision: r21 - 26 Oct 2016, RichMorin