Last tended on 14 May, 2021 (first created 8 May, 2021)

The proof-of-concept book list I made in OPML (also see these additional remarks) currently has the following structure:

  • It follows the OPML 2 specification.
  • It uses schema.org specifications w.r.t. ‘thing’, ‘creative work’, ‘collection’ and ‘book’ for outline elements and data attributes within them, with a few exceptions.

The file

  • A booklist file is in OPML format, and has a .opml file extension.
  • It opens by declaring XML version 1.0 and UTF-8 encoding.
  • It declares an XSL stylesheet, for which the URL is specified, which allows HTML rendering of the file. I think it’s important to package an OPML to HTML parser with the booklist file, so that regardless of data structure, anyone can see what data is contained within it.
  • It declares OPML version 2.0.
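
As a rough illustration of the three declarations above, here is a minimal Python sketch that emits an empty booklist skeleton and checks that it parses. The stylesheet URL and the function name `booklist_skeleton` are placeholders of mine, not taken from the actual booklist file.

```python
import xml.etree.ElementTree as ET

# Hypothetical stylesheet location; the real URL is whatever the file owner uses.
STYLESHEET_URL = "https://example.com/booklist.xsl"


def booklist_skeleton(stylesheet_url: str = STYLESHEET_URL) -> str:
    """Return an empty booklist document carrying the three declarations."""
    return (
        '<?xml version="1.0" encoding="utf-8"?>\n'
        f'<?xml-stylesheet type="text/xsl" href="{stylesheet_url}"?>\n'
        '<opml version="2.0">\n'
        '  <head/>\n'
        '  <body/>\n'
        '</opml>\n'
    )


skeleton = booklist_skeleton()
# Parsing raises xml.etree.ElementTree.ParseError if the text is not well-formed.
root = ET.fromstring(skeleton.encode("utf-8"))
print(root.tag, root.get("version"))
```

Saved with a .opml extension, such a file should render in a browser through the stylesheet it points to, provided that stylesheet exists at the given URL.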

The HEADER section
In the HEADER section of the OPML file the following fields are used:

  • title: mandatory, the name of this booklist file, or of the owner’s main list of lists if this is a sublist
  • url: mandatory, the url of the booklist file meant in the title
  • dateCreated: date created, optional
  • dateModified: date modified, optional
  • ownerName: mandatory, name of the list owner
  • ownerId: the url of the owner, optional
  • ownerEmail: email address of the owner, optional

The OPML HEADER fields for expansion state, vertical scroll state, and for window location are not used (and ignored by the included XSL parser if present).
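
The field rules above could be sketched in Python roughly as follows. The helper name `build_head` and the example values are assumptions for illustration only. Fields outside the list are silently dropped, mirroring how the unused OPML fields are ignored.

```python
import xml.etree.ElementTree as ET

# Field names as listed above.
MANDATORY_HEAD = ("title", "url", "ownerName")
OPTIONAL_HEAD = ("dateCreated", "dateModified", "ownerId", "ownerEmail")


def build_head(fields: dict) -> ET.Element:
    """Build a <head> element from a dict of field name -> string value.

    Raises ValueError when a mandatory field is missing or empty.
    Fields that are not part of the booklist structure are skipped.
    """
    missing = [name for name in MANDATORY_HEAD if not fields.get(name)]
    if missing:
        raise ValueError(f"missing mandatory head fields: {missing}")
    head = ET.Element("head")
    for name in MANDATORY_HEAD + OPTIONAL_HEAD:
        if fields.get(name):
            ET.SubElement(head, name).text = fields[name]
    return head


# Example values are placeholders.
head = build_head({
    "title": "Example booklist",
    "url": "https://example.com/books.opml",
    "ownerName": "Example Owner",
    "expansionState": "1,2",  # not used by the booklist structure, so dropped
})
print(ET.tostring(head, encoding="unicode"))
```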

The BODY section

The body section contains one or more outline elements, with a number of attributes. Each attribute can exist only once within an outline element.

  • type="collection": At least one is needed. A collection is a single booklist. It has the following data attributes, which are all strings:
    • text: mandatory, the name of the booklist
    • author: the name of the creator of the booklist, expected
    • url: the URL of the collection, if it has its own URL, optional if the current file outlines books within the collection
    • comment: a brief description of the list, optional
  • type="book": A book is always part of a collection. If a collection has its own URL attribute (different from the URL of the current file), it does not need to contain any book within the file where the collection is listed. If a collection does not have its own URL attribute (or its URL is the current file’s URL), it is expected to have at least one book (otherwise it’s simply an empty collection). It has the following data attributes:
    • text: mandatory, a string “[title of book] by [name of author(s)/editor]”
    • name: mandatory, the title of the book
    • author: mandatory, the name of the author(s) or editor of the book
    • isbn: the ISBN number of the book, optional
    • comment: a short comment by the booklist owner about the inclusion of the book in the list, optional
    • url: a URL for the book itself, optional
    • authorurl: the url to the website of the book’s author. This attribute is not listed as part of schema.org. Optional
    • referencelisturl: the url of a list by a different owner, where this list’s owner found the book. This attribute is not listed as part of schema.org. Optional.
    • referenceurl: the url of a posting or a person’s url that served as recommendation or motivation for the inclusion of the book in this list by its owner. This attribute is not listed as part of schema.org. Optional.
    • inLanguage: the language in which the book is written as ISO-639(-1/2/3) code, optional
    • category: a list of tags, comma separated, optional
  • type="rss": optional. A booklist OPML file can point to one or more RSS feeds, e.g. someone’s book review site and its feed. These feeds are not booklists or collections but content streams, to which the booklist file owner may want to point. Multiple rss-type nodes can be grouped together by nesting them in a typeless outline node with only a text attribute for the name of the group. An rss node is never a node within a ‘collection’, nor a sub node of a ‘book’. It has the following data attributes:
    • text: mandatory, the name of the feed
    • xmlUrl: mandatory, the url of the RSS feed
    • htmlUrl: the url of the website the RSS feed originates from, optional
    • author: the author of the RSS feed, optional. I use it mostly to mark my own feeds in the XSL style sheet, so I can display them differently from feeds I subscribe to myself.
  • type="include": optional, and at this point only foreseen, not implemented. It points to an OPML file, preferably a booklist file, that should then be included at this point in this booklist file. In booklist files it is only to be used at the top level, not as a sub node of a ‘collection’ or ‘book’. It has the following attributes:
    • text: mandatory, description or title of the file to be included. This is what is shown in outliners and HTML renderings.
    • url: mandatory, the link to the OPML file to be included; the linked file must be an .opml file.
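
To make the BODY rules more concrete, the following Python sketch checks the mandatory attributes per outline type and the placement rules described above. It is only one possible reading: the function name `check_body` and the sample data are hypothetical, and it assumes a book must be a direct child of a collection, which the description does not state explicitly. The rule that each attribute exists only once needs no separate check, since an XML parser already rejects duplicate attributes on an element.

```python
import xml.etree.ElementTree as ET

# Mandatory attributes per outline type, as listed above.
MANDATORY = {
    "collection": ("text",),
    "book": ("text", "name", "author"),
    "rss": ("text", "xmlUrl"),
    "include": ("text", "url"),
}


def check_body(body: ET.Element) -> list:
    """Return a list of human-readable problems found in a booklist <body>."""
    problems = []

    def visit(node, parent_type, depth):
        node_type = node.get("type")
        label = node.get("text", "(no text)")
        # Typeless grouping nodes only need a text attribute.
        for attr in MANDATORY.get(node_type, ("text",)):
            if not node.get(attr):
                problems.append(f"{label}: {node_type} outline lacks '{attr}'")
        # Assumption: "part of a collection" means a direct child of one.
        if node_type == "book" and parent_type != "collection":
            problems.append(f"{label}: book is not inside a collection")
        if node_type == "rss" and parent_type in ("collection", "book"):
            problems.append(f"{label}: rss node inside a {parent_type}")
        if node_type == "include" and depth > 0:
            problems.append(f"{label}: include node is not at the top level")
        for child in node.findall("outline"):
            visit(child, node_type, depth + 1)

    for top in body.findall("outline"):
        visit(top, None, 0)
    if not any(o.get("type") == "collection" for o in body.iter("outline")):
        problems.append("no collection found; at least one is needed")
    return problems


# Placeholder data: one valid collection, one feed group, one misplaced book.
SAMPLE = """
<body>
  <outline type="collection" text="Example list" author="Example Owner">
    <outline type="book" text="Example Title by A. Writer"
             name="Example Title" author="A. Writer" inLanguage="en"
             category="example, placeholder"/>
  </outline>
  <outline text="Feeds">
    <outline type="rss" text="Example reviews"
             xmlUrl="https://example.com/feed.xml"/>
  </outline>
  <outline type="book" text="Stray by Nobody" name="Stray"/>
</body>
"""

for problem in check_body(ET.fromstring(SAMPLE)):
    print(problem)
```

In this sample the last book is reported twice: once for the missing author attribute and once for sitting outside any collection.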

20 reactions on “Booklist OPML Data Structure”

  1. Saw it, will use it, thank you—and know OPML from waaaaay back in the RSS wars… I mean days… / I saw that Dave W. already jumped into the mentions too. 🙂


  2. @ton I’m a bit confused / unclear on what your workflow / sources / tools / process / product(s) / goal(s) are here.

     Yes, Pandoc is a conversion tool, which converts to and from numerous document-ish formats. It is often used with Markdown as an input format, but by no means exclusively, and supports numerous other light-weight and heavy-weight markup languages, as well as numerous application-specific formats, for both input and output. (Input and output formats vary somewhat, be aware.)

     Pandoc is immensely flexible and powerful, and I strongly recommend reading at least an overview. The project documentation is excellent.

     Pandoc is not what you use to author or create a document, but once you’ve created one through the editor (or process) of your choice, you can then flexibly convert between formats with Pandoc.

     If you’re looking to outline directly from an existing source (e.g., notetaking from a book or article), could you specify that a bit more clearly?

     Otherwise, I’m assuming you’re using an editor to create an outline and can annotate that with the specific attributes you’d like to output. Again, depending on the input markup used, you should have quite extensive flexibility, and Pandoc itself can be further extended through templates, preprocessors, and postprocessors.

     I’m familiar with Linux / Unix and its toolchains, mostly command-line / terminal-based tools, and prefer those. If you prefer other tools, please specify. If you want specific suggestions on the Linuxy / Unixy side, I can make suggestions or discuss options.

     @edsu
