This page discusses the generation and use of "Table of Contents" (TOC) lists.
TOC lists can be very useful for navigation
within and among web pages.
There are many pages (and other documents)
that could benefit from generated (or simply improved) TOCs.
A TOC can be generated for a web page by harvesting the section headers.
For example, the TOC at the top of this page was created by the wiki engine.
In a similar manner, it should be possible to generate TOCs
for pages that we generate (e.g., from code listings),
retrieve from the web, or extract from EPUB documents.
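Harvesting section headers can be sketched in a few lines of Python. This is a minimal sketch, not the wiki engine's code. It assumes reasonably well-formed HTML and that headings carry id attributes usable as link targets. The names HeadingHarvester, harvest_headings, and render_toc are hypothetical.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}

class HeadingHarvester(HTMLParser):
    """Collect level, id, and text for each h1-h6 element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.headings = []    # harvested headings, in document order
        self._current = None  # heading currently being collected, if any

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._current = {"level": HEADING_TAGS[tag],
                             "id": dict(attrs).get("id"),
                             "title": ""}

    def handle_data(self, data):
        # Text inside nested inline tags (e.g. <em>) also lands here.
        if self._current is not None:
            self._current["title"] += data

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._current is not None:
            # Collapse runs of whitespace in the collected title.
            self._current["title"] = " ".join(self._current["title"].split())
            self.headings.append(self._current)
            self._current = None

def harvest_headings(html):
    parser = HeadingHarvester()
    parser.feed(html)
    parser.close()
    return parser.headings

def render_toc(headings):
    """Render headings as an indented plain-text TOC with #fragment targets."""
    lines = []
    for h in headings:
        indent = "  " * (h["level"] - 1)
        target = f"#{h['id']}" if h["id"] else ""
        lines.append(f"{indent}- {h['title']} {target}".rstrip())
    return "\n".join(lines)
```

For example, '<h1 id="top">Intro</h1><h2 id="a">Part <em>A</em></h2>' yields two entries, with "Part A" indented under "Intro".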
EPUB documents may contain a variable set of information sources,
including the OPF file and one or both of the Nav and NCX files.
Each file has a different format and the reported information
may well be conflicting and/or erroneous.
Some of the issues include:
- omitting mention of pages in lists, etc.
- using ordered lists for already numbered titles
- treating a "part" page as if it were a chapter
- treating sections as if they were chapters
The Navigation (Nav) file, defined in EPUB version 3, uses XHTML,
so it can be used directly as a TOC page or harvested for information.
It contains a list (or possibly a tree) of page titles and links.
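Harvesting the Nav file's list might look like the following sketch. It uses Python's standard xml.etree.ElementTree, looks for the nav element whose epub:type includes "toc", and flattens its nested ol/li tree. The function names and the flat {level, title, path} hash format are assumptions for illustration, not the page's actual harvesting code.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
EPUB_TYPE = "{http://www.idpf.org/2007/ops}type"  # the epub:type attribute

def harvest_nav(nav_xml):
    """Flatten the toc nav into {level, title, path} hashes (hypothetical format)."""
    root = ET.fromstring(nav_xml)
    toc = None
    for nav in root.iter(f"{XHTML}nav"):
        if "toc" in nav.get(EPUB_TYPE, "").split():
            toc = nav
            break
    items = []
    if toc is not None:
        top_ol = toc.find(f"{XHTML}ol")
        if top_ol is not None:
            _walk_ol(top_ol, 1, items)
    return items

def _walk_ol(ol, level, items):
    for li in ol.findall(f"{XHTML}li"):
        label = li.find(f"{XHTML}a")
        if label is None:
            label = li.find(f"{XHTML}span")  # unlinked heading entries
        if label is not None:
            items.append({
                "level": level,
                "title": " ".join("".join(label.itertext()).split()),
                "path": label.get("href"),  # None for a span
            })
        child = li.find(f"{XHTML}ol")
        if child is not None:
            _walk_ol(child, level + 1, items)  # nested list = deeper level
```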
The Navigation Control XML (NCX) file was borrowed
from the Daisy Digital Talking Book (DTB) standard.
It was defined for EPUB version 2 and is not required for version 3 documents.
However, it is often included for compatibility with older reading software.
The navMap element contains a tree of navPoint elements,
each of which holds a page title and a file path.
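The NCX tree can be flattened in much the same way. In this sketch each navPoint supplies its title (navLabel/text) and file path (content/@src), and nested navPoints become deeper levels. As above, the function names and hash keys are hypothetical.

```python
import xml.etree.ElementTree as ET

NCX = "{http://www.daisy.org/z3986/2005/ncx/}"

def harvest_ncx(ncx_xml):
    """Flatten the navMap tree into {level, title, path} hashes, in pre-order."""
    root = ET.fromstring(ncx_xml)
    items = []
    nav_map = root.find(f"{NCX}navMap")
    if nav_map is not None:
        _walk_points(nav_map, 1, items)
    return items

def _walk_points(parent, level, items):
    for point in parent.findall(f"{NCX}navPoint"):
        text = point.find(f"{NCX}navLabel/{NCX}text")
        content = point.find(f"{NCX}content")
        items.append({
            "level": level,
            "title": (text.text or "").strip() if text is not None else "",
            "path": content.get("src") if content is not None else None,
        })
        # navPoints nested inside this one are its children in the tree.
        _walk_points(point, level + 1, items)
```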
The Open Packaging Format (OPF) file contains information
which can be used to generate a serial "reading list".
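The manifest element is an unsorted collection of item elements,
each of which has id, href, and media-type attributes.
The spine element is a sorted list of itemref elements,
each of which has an idref attribute referring to a manifest item.
Although the title element of a page file can be harvested,
it isn't always useful.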
In some EPUB documents, it may be a fixed string or simply missing.
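A serial reading list falls out of the OPF by indexing the manifest by id and then walking the spine in order. This sketch, with a hypothetical harvest_opf function and hash format, leaves titles empty, since (as noted above) page titles may be unreliable and must come from elsewhere.

```python
import xml.etree.ElementTree as ET

OPF = "{http://www.idpf.org/2007/opf}"

def harvest_opf(opf_xml):
    """Return the spine as a serial reading list of {level, title, path} hashes."""
    root = ET.fromstring(opf_xml)
    manifest = root.find(f"{OPF}manifest")
    spine = root.find(f"{OPF}spine")
    if manifest is None or spine is None:
        return []
    # The manifest is unordered, so index its items by id.
    by_id = {item.get("id"): item for item in manifest.findall(f"{OPF}item")}
    items = []
    for ref in spine.findall(f"{OPF}itemref"):  # spine order = reading order
        item = by_id.get(ref.get("idref"))
        if item is None:
            continue  # dangling idref; a real tool might report it
        items.append({
            "level": 1,     # the spine is flat
            "title": None,  # the OPF carries no per-page titles
            "path": item.get("href"),
        })
    return items
```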
We'd like to integrate these sources into a more complete and polished TOC,
reconciling the harvested information along the way.
Our expected approach is to collect any available information,
then produce a corrected, integrated, and reconciled result.
We currently harvest lists of hashes from the Nav, NCX, and OPF files,
using a consistent and convenient format.
Using this data, we generate TOC pages (Nav', NCX', and OPF'),
containing trees of links.
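Turning a flat, level-tagged list into a tree of links is a small stack exercise. This sketch assumes each hash has level, title, and path keys (the page's actual format isn't reproduced here) and renders nested ol/li markup. build_tree and render_html are hypothetical names.

```python
from html import escape

def build_tree(items):
    """Nest a flat list of {level, title, path} hashes (levels start at 1)."""
    root = {"children": []}
    stack = [(0, root)]  # (level, node) pairs along the current branch
    for item in items:
        node = {"title": item["title"], "path": item["path"], "children": []}
        while stack[-1][0] >= item["level"]:
            stack.pop()  # climb back up to this item's parent
        stack[-1][1]["children"].append(node)
        stack.append((item["level"], node))
    return root["children"]

def _link(node):
    title = escape(node["title"] or node["path"] or "")
    if node["path"]:
        return f'<a href="{escape(node["path"])}">{title}</a>'
    return title

def render_html(nodes, indent=0):
    """Render the tree as nested <ol> lists of links."""
    pad = "  " * indent
    lines = [f"{pad}<ol>"]
    for node in nodes:
        if node["children"]:
            lines.append(f"{pad}  <li>{_link(node)}")
            lines.append(render_html(node["children"], indent + 2))
            lines.append(f"{pad}  </li>")
        else:
            lines.append(f"{pad}  <li>{_link(node)}</li>")
    lines.append(f"{pad}</ol>")
    return "\n".join(lines)
```

If a level is skipped (e.g. 1 then 3), the deeper item simply attaches to the nearest shallower one.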
Given that the information sources are imperfect,
we have ample opportunity for ambiguity.
Here is a simple example of the diamond problem:
- The OPF spine begins with items A, B1, and C.
- The Nav tree begins with items A, B2, and C.
- The NCX tree begins with items A, B3, and C.
- Items B1-B3 should all be between A and C.
- However, what order should they be in?
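To see why the order is genuinely undetermined, one can treat each source's consecutive entries as "x comes before y" constraints and enumerate every ordering that satisfies all of them. This brute-force sketch (hypothetical, and only practical for tiny examples) finds six valid orders: A first, C last, and B1-B3 in any arrangement.

```python
from itertools import permutations

def consistent_orders(sequences):
    """All orderings of the items that respect every source's local order."""
    items = sorted({x for seq in sequences for x in seq})
    # Each source contributes "a before b" for each consecutive pair.
    before = {(a, b) for seq in sequences for a, b in zip(seq, seq[1:])}
    orders = []
    for perm in permutations(items):
        pos = {x: i for i, x in enumerate(perm)}
        if all(pos[a] < pos[b] for a, b in before):
            orders.append(list(perm))
    return orders

sources = {
    "OPF": ["A", "B1", "C"],
    "Nav": ["A", "B2", "C"],
    "NCX": ["A", "B3", "C"],
}
orders = consistent_orders(list(sources.values()))
print(len(orders))  # prints 6
```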
Because we don't see any clean and reliable way to merge the TOC entries,
we have decided to take an entirely different approach.
The main TOC is based on the best available data set,
using the preference order: Nav, NCX, OPF.
Any other items are listed as "extras".
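The preference rule can be sketched as: take the first non-empty source in the order Nav, NCX, OPF, then collect any pages the other sources mention that the main TOC does not. Comparing pages by file path with the #fragment stripped is an assumption of this sketch, as are the function name and hash keys.

```python
PREFERENCE = ("nav", "ncx", "opf")

def _page(path):
    # "ch1.xhtml#sec2" and "ch1.xhtml" refer to the same page file.
    return (path or "").split("#", 1)[0]

def choose_main_toc(sources):
    """sources maps 'nav'/'ncx'/'opf' to harvested item lists (possibly empty)."""
    main_name = next((n for n in PREFERENCE if sources.get(n)), None)
    if main_name is None:
        return None, [], []
    main = sources[main_name]
    seen = {_page(item["path"]) for item in main}
    extras = []
    for name in PREFERENCE:
        if name == main_name:
            continue
        for item in sources.get(name, []):
            page = _page(item["path"])
            if page not in seen:
                seen.add(page)  # list each extra page only once
                extras.append({**item, "source": name})
    return main_name, main, extras
```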
Some EPUB documents take liberties in reporting their structure.
For example, Pragmatic Bookshelf's "Programming Ruby 1.9 & 2.0"
lists Part instances (and some Section instances) as if they were Chapters.
This isn't a problem, visually, but it distorts the hierarchy
and gets in the way of our attempts at providing navigation aids.
So, let's examine a possible remedy.
The first task is to determine whether the document has this issue.
The harvested TOC data is a list of hashes summarizing TOC items.
By scanning this list, we can locate any instances where items
with different levels in the structure are reported at the same level.
If we don't find any such instances, we're done!
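The scan can be sketched by grouping items by reported level and checking whether any level holds more than one structural kind. This assumes each hash carries a hypothetical "kind" key (e.g. derived from title patterns such as "Part ..." or "Chapter ..."); the page does not say how kinds are determined.

```python
STRUCTURAL = ("part", "chapter", "section", "subsection")

def mixed_levels(items):
    """Map each reported level to the structural kinds found at it,
    keeping only levels where more than one kind appears."""
    kinds_at = {}
    for item in items:
        kind = item.get("kind")  # hypothetical key; non-structural kinds are ignored
        if kind in STRUCTURAL:
            kinds_at.setdefault(item["level"], set()).add(kind)
    return {level: kinds for level, kinds in kinds_at.items() if len(kinds) > 1}
```

An empty result means the reported levels are consistent and no repair is needed.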
Otherwise, we need to impose some rules, e.g.:
- Chapter, section, and subsection items are assumed to be subsumed under the preceding Part item. So, if need be, we increment their level numbers.
- Section and subsection items are assumed to be subsumed under the preceding Chapter item. So, if need be, we increment their level numbers.
- Any item (e.g., Appendix, Index) which is not one of the above terminates the range of the Part.
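The rules above might be applied in a single pass, as in this sketch. It again assumes a hypothetical "kind" key. A Chapter inside a Part's range is pushed one level below the Part. Sections and subsections first move along with their Chapter, then are pushed below it if still too shallow. Any other kind ends the Part's range. This is one possible reading of the rules, not a settled design.

```python
def normalize_levels(items):
    """Return copies of the items with Part/Chapter nesting imposed on levels."""
    fixed = []
    part_level = None     # level of the enclosing Part, if any
    chapter_level = None  # adjusted level of the enclosing Chapter, if any
    shift = 0             # increment applied to the enclosing Chapter
    for item in items:
        item = dict(item)  # don't mutate the caller's hashes
        kind = item.get("kind")
        if kind == "part":
            part_level, chapter_level, shift = item["level"], None, 0
        elif kind == "chapter":
            shift = 0
            if part_level is not None and item["level"] <= part_level:
                shift = part_level + 1 - item["level"]
            item["level"] += shift
            chapter_level = item["level"]
        elif kind in ("section", "subsection"):
            item["level"] += shift  # move along with the enclosing Chapter
            floor = chapter_level if chapter_level is not None else part_level
            if floor is not None and item["level"] <= floor:
                item["level"] = floor + 1
        else:
            # Appendix, Index, etc. terminate the Part's range.
            part_level, chapter_level, shift = None, None, 0
        fixed.append(item)
    return fixed
```

For Part I, Chapter 1, and a Section all reported at level 1, this yields levels 1, 2, and 3.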
To be continued...
This wiki page is maintained by Rich Morin,
an independent consultant specializing in software design, development, and documentation.
Please feel free to email
comments, inquiries, suggestions, etc.!