Gutenberg posts aren’t HTML…

…and my eraser won’t work with my ink pen!

Lately I’ve been hearing plenty of conversations about the way that Gutenberg posts are saved inside of post_content. If this is all mumbo-jumbo to you, Gutenberg is the project building an entirely new editing experience for WordPress, the software which powers this blog as well as 28% of the internet. Gutenberg is an experimental, risky, and visionary rethinking of how people should be able to create content and structure a website. It’s a break from traditional editors as well as over a decade of tooling and plugins to support that.

Gutenberg also stores necessary information in HTML comments.

<!-- wp:dmsnell/demo { "format": "serializedHtml" } -->
What the heck is going on?
<!-- /wp:dmsnell/demo -->

Block Editing: playing with Legos

The prevailing idea behind Gutenberg is that if we build independent and useful website pieces then authors can put them together, rearrange them, experiment with them, and have an easier go at putting together a usable web page. These pieces might be paragraphs of formatted text or rich media types like images with captions or interactive JavaScript panels; they could indicate columns or grid-like layouts; or they could be any number of other kinds of content within the page, each potentially with a distinct behavior and meaning.

Let’s take a step back. What is a Gutenberg post? It’s a collection of blocks. What isn’t a Gutenberg post? It’s not HTML. I mean, it is HTML, but it’s not HTML. Confusing, right? I think this is pretty generally confusing in the wild, so I want to try and elucidate what this means and what it implies for those working with the project (i.e. working with WordPress content editing).

Why not HTML?

HTML is the language of the web. It’s a robust document markup format and has been used to describe content as simple as unformatted paragraphs of text and as complex as entire application interfaces. It has done a fairly impressive job of enabling people to separate the visualization of their content from its inherent semantic meaning. That is to say, HTML has the power to indicate that a segment of text should be emphasized, but it doesn’t imply how that text should grab attention: italicized fonts, blinking colors, animated arrows pointing at it, etc. (this detail is handled in stylesheets, possibly in combination with JavaScript).

But HTML sadly isn’t everything we’d want it to be when creating a website. HTML is mostly specific to formatting text and inlaying media. It’s also a low-level language; there are some pretty common constructions – such as an image with a caption – that require repetitions of similar HTML markup. Further, HTML has historically been an incredibly lenient language and a significant number of existing documents and tools deal with technically invalid or ambiguous code. This code, even when valid, can be incredibly tricky and complicated to parse – to understand.

Blocks do two things: blocks raise the level of abstraction from a single document to a collection of meaningful sub-documents; they also replace ambiguity with explicit structure.

Blocks are higher-level than HTML

Blocks are pieces of a web page. Even though the end result is HTML in a browser, a “block” connotes more meaning than the HTML it generates. For example, an image block intrinsically supports a caption. An image block is little more than an image and an optional caption. The meaning of that block is something like “the juxtaposition of an image with a caption explaining the image or providing context for it.” Note that this meaning is different than “the construction of <figure>, <img>, and <figcaption> elements.” The block has flexibility to change and an editor need only know that it needs these two attributes and their types.
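As a rough sketch of that idea, a block can be described by its attributes and their types, with the HTML it emits kept as an implementation detail. The names `imageBlock`, `attributes`, and `save` below are made up for illustration and are not meant to mirror Gutenberg's actual block API; escaping of the values is also left out for brevity.

```javascript
// Hypothetical block description, not Gutenberg's real registration API.
const imageBlock = {
    type: 'core/image',
    // The editor only needs to know these two attributes and their types.
    attributes: {
        url: 'string',
        caption: 'string' // optional
    },
    // How the attributes become HTML is an implementation detail that can change.
    // Note: values are not escaped here, which a real block would have to do.
    save({ url, caption }) {
        const figcaption = caption ? `<figcaption>${caption}</figcaption>` : '';
        return `<figure><img src="${url}">${figcaption}</figure>`;
    }
};

console.log(imageBlock.save({ url: 'cat.jpg', caption: 'A cat' }));
console.log(imageBlock.save({ url: 'cat.jpg' }));
```

The meaning lives in `url` and `caption`; the `<figure>` markup could be swapped for something else without the editor noticing.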

A simple block could render in quite complicated ways. We can think of blocks as the evolution of shortcodes. We could imagine a map block which roughly means, “a map centered at a given address.” The generated HTML could load Google maps and the editor could either be a text input to enter an address or it could also be an actual interactive map where you can click on the location to save the address. The map block may only have a single property – its address – so an editor can be as simple or as complicated as necessary. The map block may also actually save an image generated at edit time as a loading placeholder until an associated JavaScript snippet replaces it dynamically after the page loads.

I hope you can start to see how blocks can contain an escalation of complexity which would otherwise be needed when shuffling around bits of HTML soup. We can strip away everything that’s incidental to the type of content and focus on the bare essentials. We can even easily create modal editing, so that while one block is being edited the other blocks in the editor render as they would on the webpage.

Blocks are unambiguous

<table><td>Cats</td><td>Dogs</td></table>

In the above snippet, what is the purpose of the markup? Are we creating a tabular arrangement of data related to cats and dogs or are we trying to force a two-column visual layout? If we ignore the fact that this is a bad example (because we probably should be trying to get our two-column layouts with CSS instead) it does demonstrate how we have lost meaning when we move down to HTML alone. In fact, in many cases the same snippet of HTML could have been the result of several different “types” of content generating the same end markup: it’s a one-way street.

Additionally, how do we even know this came from our editor? Maybe someone snuck it in by hand when trying to quickly jump in and change the page. The structure of the higher-level meaning is implicit and indistinguishable from the same markup when entered manually.

{
    type: 'dmsnell/multi-column',
    children: [
        { type: 'core/text', children: 'Cats' },
        { type: 'core/text', children: 'Dogs' }
    ]
}

The above, of course, is unambiguously a multi-column region and explicitly designated so. We know that the editor wanted to create this too because it’s not even HTML.

Gutenberg is a tree

Thus, a Gutenberg post isn’t HTML, but a tree of objects and associated attributes. Gutenberg relies on a structure-preserving data model so that the editors and views for specific block types can remain independent from the final rendered HTML. It’s a tree similar to how HTML is a tree, though at the top-level it’s just a list of nodes – it needs no “root node.”

[
    {
        type: 'core/cover-image',
        attributes: {
            url: 'my-hero.jpg',
            align: 'full',
            hasParallax: false,
            hasBackgroundDim: true
        },
        children: [
            "Gutenberg posts aren't HTML"
        ]
    },
    {
        type: 'core/paragraph',
        children: [
            "Lately I've been hearing plen…"
        ]
    }
]

The specific data structure is slightly different from what I’m showing here, but I have only altered it to convey more clearly what is actually happening “under the hood.” Blocks can contain arbitrary attributes and children. Blocks can even contain other blocks as their children! Just imagine that two-column layout: one half might include a list of things while the other half explains the list as a paragraph – the list is a kind of block.

{
    type: 'dmsnell/multi-column',
    children: [
        {
            type: 'core/list',
            children: [
                'strong data model',
                'well-defined persistence'
            ]
        },
        'Gutenberg is an open book'
    ]
}

Easy peasy. In fact, we can see a couple of insights here: the multi-column layout is using the number of children to determine how many columns we need; “simple” child nodes can be a plain string. As a disclaimer, Gutenberg doesn’t officially support nesting blocks like this yet, but that’s purely because the user interface part of tackling that problem is much more difficult than the technical side of supporting it. The original data model supported it from before Gutenberg was even a project.
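To make those two insights concrete, here is a small sketch that walks a tree in the simplified shape used in this post. The helper names `columnCount` and `blockTypes` are invented for the example and don't correspond to anything in Gutenberg.

```javascript
// Illustrative tree in the simplified shape used in this post.
const layout = {
    type: 'dmsnell/multi-column',
    children: [
        { type: 'core/list', children: ['strong data model', 'well-defined persistence'] },
        'Gutenberg is an open book'
    ]
};

// A multi-column block needs one column per child.
const columnCount = (block) => block.children.length;

// Collect every block type in the tree, parent before children.
// Plain strings are leaf content, not blocks, so they contribute nothing.
function blockTypes(node) {
    if (typeof node === 'string') return [];
    return [node.type, ...node.children.flatMap(blockTypes)];
}

console.log(columnCount(layout));
console.log(blockTypes(layout));
```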

Gutenberg lives in memory

At this point you may be wanting to scream at me because it could seem like I’m lying to you. That tree isn’t what you see in post_content! Gutenberg’s data model however is something that lives in memory while editing a post. It’s not visible to the page viewer when rendered (though nothing is preventing someone from building a clever live-editor for logged-in page views).

Early on in the project’s development we asked ourselves a question: “can we build a new editor which lets us do things we were never able to do before without breaking the internet?” It became clear through many long discussions that as much as possible we should be able to create and edit something in Gutenberg while being able to view it in any HTML renderer. We settled on the idea of having a kind of “fallback render” which would be a fully rendered HTML document produced by the editor which supplemented “the real” document (the tree).

There were many different ideas presented for consideration. A variety of propositions dealt with storing the tree as JSON somewhere: in post_meta, in a new database table, at the top of the document’s post_content, at the bottom, via a REST endpoint, and others. Common to all was the desire to reliably store and load the object tree when editing. Two primary drawbacks of storing the tree separately from the fallback render, however, were the risk of the post_content and the tree getting out of sync and the duplication of data in both places.

Long story short (and I encourage you to scour the discussions in Slack and in the forums and in GitHub issues): we decided to take a hybrid approach, serializing the tree as HTML and using HTML comments as explicit block delimiters which could contain the attributes in non-HTML form and children as we would with normal HTML.
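A minimal sketch of that hybrid approach might look like the following. It is not Gutenberg's real serializer: it assumes the simplified `{ type, attributes, children }` nodes from earlier in this post, treats string children as ready-made HTML, and ignores escaping entirely.

```javascript
// Turn one node of the simplified tree into comment-delimited HTML.
function serialize(node) {
    // Plain strings are treated as HTML content and written out as-is.
    if (typeof node === 'string') return node;

    const { type, attributes = {}, children = [] } = node;
    // Attributes travel in the opening comment as JSON (omitted when empty).
    const attrs = Object.keys(attributes).length
        ? ` ${JSON.stringify(attributes)}`
        : '';
    // Children are serialized the same way, so nested blocks come for free.
    const inner = children.map((child) => serialize(child)).join('\n');

    return `<!-- wp:${type}${attrs} -->\n${inner}\n<!-- /wp:${type} -->`;
}

const post = [
    { type: 'core/paragraph', attributes: { align: 'right' }, children: ['<p>Hello</p>'] },
    { type: 'core/separator', children: ['<hr>'] }
];

console.log(post.map((block) => serialize(block)).join('\n'));
```

Any HTML renderer that ignores comments sees only the `<p>` and `<hr>`, which is the fallback render.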

Gutenberg is parsed and serialized for persistence

We didn’t want to mess this up. Comments are a bizarre mechanism for storing structured data. Thankfully, storing structured data in text form is something computer programmers have been doing successfully for decades and we can lean on all the research that has gone into making this viable.

A formal grammar defines how the serialized representation of a Gutenberg post should be loaded, just as some basic rules define how to turn the tree into an HTML-like string. Gutenberg posts aren’t designed to be edited by hand; they aren’t designed to be edited as HTML documents; Gutenberg posts aren’t HTML. They just happen, incidentally, to be stored inside of post_content in a way in which they require no transformation in order to be viewable by any legacy system. When the stored HTML is loaded into a browser without the corresponding machinery, the experience may degrade: dynamic elements may not load, server-generated content may not appear, interactive content may remain static. However, it at least keeps Gutenberg posts viewable on themes and installations which are Gutenberg-unaware.

That is to say, if we are trying to edit a Gutenberg post by manipulating its generated HTML then we aren’t understanding Gutenberg’s data model or how it works. Sure, we might get away with a number of reasonably simple edits inside the HTML directly, but in reality we ought to consider the generated HTML as throwaway code. There’s nothing preventing a block editor from completely wiping out the existing block code and replacing it with a new format. In fact, we expect this to happen as blocks are updated to keep up with web trends. For example, we might create a type of block for embedding a 3D simulation and today it might generate an <iframe> due to a lack of better native HTML support, but next year the new <simulation3d> tag might just be available and the block can “upgrade” the webpage without anyone needing to mangle HTML code.
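The "upgrade" scenario can be sketched in a few lines. Everything here is hypothetical: the block name, the `src` attribute, and the two `save` functions are invented, and `<simulation3d>` is the imaginary tag from the paragraph above rather than a real element.

```javascript
// The attributes are the data; the HTML is disposable.
const stored = { type: 'dmsnell/simulation', attributes: { src: 'solar-system.sim' } };

// Today's version of the block falls back to an <iframe>.
const saveToday = ({ src }) => `<iframe src="${src}"></iframe>`;

// A later version can emit different markup from the very same attributes.
const saveNextYear = ({ src }) => `<simulation3d src="${src}"></simulation3d>`;

console.log(saveToday(stored.attributes));
console.log(saveNextYear(stored.attributes));
```

Nothing about `stored` changes between the two renders, which is why hand edits to the generated HTML are not safe to rely on.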

Thus the workflow for editing a Gutenberg post starts with taking the persisted version of the document and generating the in-memory tree. It ends with the reverse: serialization into post_content.

HTML comments are explicit delimiters

There are a few plausible ways to store structural information alongside renderable HTML. If we created a new syntax and associated filter then our posts would no longer render on older versions of WordPress or any which lack the necessary filter. We chose instead to try and find a way to bring that formality, explicitness, and unambiguity into the existing HTML syntax. Within the HTML there are a number of options:

Of these options a novel approach was suggested: by storing data in HTML comments we would know that we wouldn’t break the rest of the HTML in the document, that browsers should ignore it, and that we could simplify our approach to parsing the document.

Data attributes are another feasible way of storing information but there was some concern that tools (especially older tools) would strip them out and that we would need a full HTML parser in order to properly find them.

Unique to comments is that they cannot legitimately exist in ambiguous places, such as inside of HTML attributes – <img alt='data-id="14"'>. Comments are also quite permissive. Whereas HTML attributes are complicated to parse properly, a comment is easily described: a leading <!-- followed by anything except -- until the first -->. This simplicity and permissiveness mean that the parser can be implemented in several ways without needing to understand HTML properly, and we have the liberty to use more convenient syntax inside of the comment – we only need to escape double-hyphen sequences. We take advantage of this in how we store block attributes: JSON literals inside the comment:
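One way to handle the double-hyphen escaping is to lean on JSON's own unicode escapes, so that the reader needs nothing beyond `JSON.parse`. This is a sketch under that assumption; `encodeAttributes` is a name invented for the example.

```javascript
// Encode attributes so the JSON can sit safely inside an HTML comment.
function encodeAttributes(attributes) {
    // "--" may not appear inside a comment, so write each pair of hyphens
    // as JSON unicode escapes. JSON.parse turns them back into "--".
    return JSON.stringify(attributes).replace(/--/g, '\\u002d\\u002d');
}

const attributes = { note: 'a --> b -- c' };
const encoded = encodeAttributes(attributes);

console.log(`<!-- wp:dmsnell/demo ${encoded} -->`);
console.log(JSON.parse(encoded).note === attributes.note);
```

Hyphens can only pair up inside JSON strings, so the escape always lands somewhere `JSON.parse` will decode it.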

<!-- wp:core/code {
    "language": "haskell",
    "indent": [ "\t", 1 ]
} -->
<pre><code>
sum :: (Foldable t, Num a) => t a -> a
sum = foldl (+) 0
</code></pre>
<!-- /wp:core/code -->

After running this through the parser we’re left with a simple object we can manipulate idiomatically and we don’t have to worry about escaping or unescaping the data. It’s handled for us through the serialization process.
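For a feel of what "a simple object" means here, the sketch below reads one serialized block with a regular expression. The real parser is defined by a formal grammar, which this only imitates; `parseBlock` and the `innerHTML` field are names assumed for the example.

```javascript
// Matches one block: opening comment, optional JSON attributes, content, closing comment.
const BLOCK = /<!--\s+wp:([a-z][a-z0-9\/-]*)\s+(\{[\s\S]*?\}\s+)?-->([\s\S]*?)<!--\s+\/wp:\1\s+-->/;

function parseBlock(source) {
    const match = BLOCK.exec(source);
    if (!match) return null;
    const [, type, json, innerHTML] = match;
    return {
        type,
        // JSON.parse does the unescaping, so callers get plain values back.
        attributes: json ? JSON.parse(json) : {},
        innerHTML: innerHTML.trim()
    };
}

const source = `<!-- wp:core/code {
    "language": "haskell",
    "indent": [ "\\t", 1 ]
} -->
<pre><code>
sum = foldl (+) 0
</code></pre>
<!-- /wp:core/code -->`;

console.log(parseBlock(source));
```

The `"\t"` in the stored JSON comes back as an actual tab character in `attributes.indent[0]`, with no manual unescaping.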

Because the comments are so different from other HTML tags and because we can perform a first pass to extract the top-level blocks, we don’t actually depend on having fully valid HTML. We frankly don’t care, and that has dramatic implications for how simple and performant we can make our parser. These explicit boundaries also prevent damage in a single block from bleeding into other blocks or tarnishing the entire document. There are drawbacks too, of course, as the block boundaries become points of weakness. However, Gutenberg posts are designed to be edited in a Gutenberg editor (or compliant editor). It is simply not possible to be fully backwards-compatible and do things which the old editors could never do.
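Here is a sketch of such a first pass, assuming flat (non-nested) blocks and using an invented helper named `splitBlocks`. It looks only at the comment delimiters and never tries to understand the HTML between them, so a broken tag in one block cannot leak into the next.

```javascript
// First pass: find top-level blocks by their comment delimiters only.
// The HTML between delimiters is kept as an opaque string and never parsed.
// Assumption: blocks of the same type are not nested inside each other.
function splitBlocks(source) {
    const pattern = /<!--\s+wp:(\S+)[\s\S]*?-->([\s\S]*?)<!--\s+\/wp:\1\s+-->/g;
    return [...source.matchAll(pattern)].map(([, type, innerHTML]) => ({
        type,
        innerHTML: innerHTML.trim()
    }));
}

const source = [
    '<!-- wp:core/paragraph -->',
    '<p>An unclosed <em>tag', // invalid HTML stays inside its own block
    '<!-- /wp:core/paragraph -->',
    '<!-- wp:core/list -->',
    '<ul><li>still intact</li></ul>',
    '<!-- /wp:core/list -->'
].join('\n');

const blocks = splitBlocks(source);
console.log(blocks);
```

A full HTML parser would have to decide where the unclosed `<em>` ends; this pass never asks the question.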

Recap

In summary, a Gutenberg post is built upon an in-memory data structure which gets persisted somehow in a fully isomorphic way. Right now that persistence is via a serializer/parser pair, but it could just as easily be replaced through a plugin which stores the data structure as a JSON blob somewhere else.
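The point about interchangeable persistence can be illustrated with two toy stores that share a `save`/`load` shape. Both are assumptions made for this example (flat blocks, no escaping, invented names `commentStore` and `jsonStore`), not how a real plugin would hook in.

```javascript
// Two hypothetical persistence strategies for the same in-memory blocks.
// Blocks are flat { type, attributes, innerHTML } objects for brevity.
const commentStore = {
    save: (blocks) => blocks
        .map(({ type, attributes, innerHTML }) =>
            `<!-- wp:${type} ${JSON.stringify(attributes)} -->\n${innerHTML}\n<!-- /wp:${type} -->`)
        .join('\n'),
    load: (text) => [...text.matchAll(
        /<!-- wp:(\S+) (\{.*?\}) -->\n([\s\S]*?)\n<!-- \/wp:\1 -->/g
    )].map(([, type, json, innerHTML]) => ({ type, attributes: JSON.parse(json), innerHTML }))
};

const jsonStore = {
    save: (blocks) => JSON.stringify(blocks),
    load: (text) => JSON.parse(text)
};

const blocks = [
    { type: 'core/heading', attributes: { level: 2 }, innerHTML: '<h2>Recap</h2>' },
    { type: 'core/paragraph', attributes: {}, innerHTML: '<p>Same tree, different storage.</p>' }
];

// Loading what was saved should give back an equivalent structure.
const roundTrips = (store) =>
    JSON.stringify(store.load(store.save(blocks))) === JSON.stringify(blocks);

console.log(roundTrips(commentStore), roundTrips(jsonStore));
```

Only `commentStore` produces something a browser can render directly, which is the trade-off the rest of this post is about.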

There was no clearly dominant solution for the problem of storing the Gutenberg data structure in a way which preserved backwards compatibility.

The views in this post aren’t all mine and there are likely inaccuracies contained within. I have tried to blend my own opinions with a terse summary of the project discussions I was aware of. I’m not completely persuaded that we have chosen the best option (I doubt anyone is) but I do believe that we have a solution that has been working well despite being unexpected.

If we had decided to store Gutenberg posts in a custom binary format it would have been obvious that they needed to be handled in a custom editor. As it turns out we found a mechanism which is HTML-compliant and can provide a default render without any strings attached. Maybe on account of the familiarity of this storage format it gives the impression that it is something other than what it is, but Gutenberg posts are not HTML.


Please don’t hesitate to point out the mistakes I have made here or to uncover the major flaws in the design. Please though, do it nicely. It is understandable if you missed the development discussions, debates, and designs when they happened, but they were carried out in the open and they are still publicly available.

I know that for several people who have been very active in contributing to the project it can be tiring to deal with the same critiques over and over again. In fact, most of the critiques running around were probably initially raised by the very same people who are defending the choices that were made. Personally I have tugged at both ends of this proverbial rope.

As always, the very best way to communicate a better software idea is to present a working and clean implementation of the idea.

34 thoughts on “Gutenberg posts aren’t HTML…”

  2. Great post, thank you for sharing how Gutenberg is going to work behind the scenes. I must admit it’s pretty neat how using HTML comments enables backward compatibility for the existing posts and pages. It is even more interesting that this approach allows to iterate on blocks without risking that individual block is going to break whenever its implementation changes. I guess it will still require to provide stable block API, in effect the param names will have to remain unchanged once they were exposed to the public.

    Could you confirm if I understand correctly how all this parsing stuff works? This is how I got it. When you use block for the first time it saves all properties with data and options as serialized JSON in comment and outputs generated HTML. When it is loaded again by Gutenberg then only the comment is parsed and all existing properties are used to generate block from scratch and HTML part is skipped. When post is displayed then only HTML is consumed.

    1. Not all properties are saved as JSON in the comment, some are extracted from the inner HTML. The block’s author decides how to serialize/parse each attribute. (source property here http://gutenberg-devdoc.surge.sh/block-api/#attributes)

      block authors could decide to store all the attributes in the comment.

    2. Thanks for the reply Grzegorz and the response Riad!

      I guess it will still require to provide stable block API, in effect the param names will have to remain unchanged once they were exposed to the public.

      Not entirely. I expect more widely used blocks to adopt versioning and update schemes. A block could store its version number in the attributes and maintain a function to update from version X to version Y. Best of course is a stable API, but it’s not required.

      only the comment is parsed and all existing properties are used to generate block from scratch and HTML part is skipped

      Many attributes are currently serialized into HTML and inferred from its structure. For example, currently an image block infers the caption by means of finding the inside of the <figcaption> tag. It doesn’t have to, but it’s doing so right now to prevent duplication of data. So many blocks have chosen to store attributes which closely match the HTML-level of abstraction in the HTML itself, so to speak, even though in memory that’s still a tree representing HTML nodes; the higher-level attributes are stored as JSON.

      So actually it’s still depending on the HTML inside the block comments in a per-block manner and there is a query library built in to provide easy inference from the HTML structure.

      Skipping the HTML entirely is clearly safer from breakage. We have actually discussed using the structural diff of the HTML section to detect if someone or something has messed with the block, determine if it breaks the block, and provide an undo action for the user to revert the accidental or uninformed change. This is still stuff in the pipeline that hasn’t been implemented yet but it was part of the original vision. I’ve been a big proponent for treating the HTML entirely as “write-only” as you describe, but for practical purposes most blocks are also reading from it. In some, such as the code block or paragraph block, it’s hard to justify not simply treating all the contained content as the actual data itself.

      1. Kudos for you both for in-depth explanation on this topic 🙂

        Not entirely. I expect more widely used blocks to adopt versioning and update schemes.

        Yes, it makes perfect sense.

        I’ve been a big proponent for treating the HTML entirely as “write-only” as you describe, but for practical purposes most blocks are also reading from it. In some, such as the code block or paragraph block, it’s hard to justify not simply treating all the contained content as the actual data itself.

        I see what has happened and I think it is a good compromise in case when there is an easy way to get data from HTML using the well-established patterns created by jQuery or XSLT.

  3. Excellent post Dennis, I will use some of your ideas in a talk. Thanks!

  4. Gutenberg posts aren’t designed to be edited by hand; they aren’t designed to be edited as HTML documents

    The problem I have with this thinking is that if true, then text mode and third-party editors aren’t really supported at all, they’re just “tolerated”. And if the goal is just to tolerate third party tools, that could be done with a live-rendered shortcode structure that would satisfy their existing knowledge of wordpress syntax. Meanwhile the actual data could be stored in a universally recognized storage standard, like JSON.

    1. Thanks for the note. It’s my impression that “tolerate” is a reasonable word. How can we do new things that the old editing experience couldn’t provide (or at best made quite difficult)? Can we do it without entirely breaking old flows? Should we break old flows?

      the actual data could be stored in a universally recognized storage standard, like JSON

      Having an all-JSON structure is something I personally like. Having a formal grammar+parser+printer means we can have this structure be “the real” data structure while storing it in a way that doesn’t break existing themes and doesn’t require processing on render. Admittedly, most of our blocks are not only storing block meta in attributes but also in a hybrid way with HTML strings; I’d say we can expect continued iteration on how that’s done and finding what’s appropriate given all the tradeoffs.

      From early on using JSON was discussed; I don’t want to rehash all of the conversations leading up to why the current implementation was chosen, but I’d be happy to share my thoughts on specific points. Caveat emptor, I’m just one of many contributors and this was all done in the open. Feel free to ping me @dmsnell on #core-editor in the WordPress Slack team.

      that could be done with a live-rendered shortcode structure that would satisfy their existing knowledge of wordpress syntax

      This is a good point to raise. Shortcodes are familiar but they are also quite problematic. Some of the same issues you have raised apply at least equally, if not more, to shortcodes: they have an ill-defined and ambiguous specification; they are trapped unreliably through RegExps which are attempting to parse HTML; they produce documents which can’t be displayed outside of a WordPress context.

      A stronger contrast to HTML comments as delimiters is the use of <div> tags as delimiters. While offering strong advantages that they live within the HTML world and within the HTML language, they carry with them one strong drawback: adding “container-like” <div>s changes the semantic markup of a document while comments are semantically neutral.

      So the big question is, if we’re going to try and preserve some notion of backwards compatibility then how much is essential and in what contexts will people need them? Who is going to be wanting those? Are there deficiencies in the editing experience which have been the reason people generally jump back to the text editor instead of the visual editor?


      These discussions were carried out at length in the earlier days of the project. Some amount of breakage is essential in my opinion simply because we’re trying to do things that weren’t possible before. How much breakage and in what ways are the questions answered by HTML comments in post_content. PRs are gladly welcomed.

      1. and, of course, if a PEG parser is gonna be used, why not make one for shortcodes, instead?

    2. In an attempt to maintain the tenor of discussion here I have unapproved a comment tree discussing some merits and drawbacks of different approaches to growing Gutenberg and formalizing the storage format. This post is a place to hold technical discussions concerning the level of abstraction that Gutenberg posts are intended to represent but I want to try and prevent rehashing long discussions which are recorded openly in the project channels and more importantly to discourage overly-critical or demeaning behaviors.

      WordPress depends on our ability as a community to engage in civil discussions with respect for one another and in my subjective opinion I felt that the comments below this point crossed over an important boundary of disrespect; thus I have removed them.

      It is not my intention to hide or repress any specific arguments, so I will welcome resubmissions which maintain mutual respect, refrain from casting judgment on others, and abstain from ascribing intentions to others.
