Real-time Collaboration: architecture

The goal of this issue is to lay down the technical architecture to implement real-time collaboration in WP Admin (Post and Site editor). See the following bootstrap post for more information on the scope and features of the project.

There are different use cases we want to enable, as outlined in the post, but all of them come down to these two fundamental aspects:

  • Data needs to be shared and synchronized between users connected to the same WP-Admin.
  • If the network is disconnected, the data is persisted locally and synchronized back to the server once the network is restored.

In addition to that I’d like to add another guiding principle to the technical solution we try to conceptualize: 

Ideally, developers working on the UI of WP-Admin/editors shouldn’t have to think about whether the data is local/remote/synchronized/merged… Developers declare their “data requirements” and these requirements are fulfilled for them by a separate and automated layer.

This principle is important for multiple reasons:

  • It frees developers from thinking about the synchronization and the collaboration for every new feature and piece of UI that they add. Additional data will be collaborative by default and offline-ready.
  • It is also important to ensure the backward compatibility of our existing public-facing APIs to access and manipulate WordPress Data.

This is also the same guiding principle that we put in place when we initially developed the @wordpress/data package to address the local and remote data needs.

Current architecture

The following schema represents how the data flows in the site and post editor code-bases.

The architecture is separated into two sections: 

  • The UI layer: A developer working on this is typically writing components, declaring their data needs using selectors (like the `getEntityRecord` selector in the example above), and is notified of data changes automatically. Developers can also make use of actions to perform mutations (editing posts, saving posts…).
  • The Core Data layer: Responsible for addressing the data needs, retrieving the data from the server or locally, caching data if needed and notifying the consumers (UI layer) of any changes.
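
To make the split between the two layers concrete, here is a minimal sketch of a selector-driven data layer. It is not the @wordpress/data implementation: `TinyDataLayer`, `PostRecord` and the synchronous `fetchRecord` callback are hypothetical stand-ins, and only the `getEntityRecord` and `editEntityRecord` names are borrowed from core-data for familiarity.

```typescript
// Hypothetical miniature of a selector-driven data layer; not the @wordpress/data API.
type PostRecord = { id: number; title: string };
type Listener = () => void;
type Fetcher = (kind: string, id: number) => PostRecord;

class TinyDataLayer {
  private cache = new Map<string, PostRecord>();
  private listeners = new Set<Listener>();
  private fetchRecord: Fetcher;

  // `fetchRecord` stands in for a REST call; it is synchronous here for brevity.
  constructor(fetchRecord: Fetcher) {
    this.fetchRecord = fetchRecord;
  }

  // Selector: returns the cached record, fetching it on first access.
  getEntityRecord(kind: string, id: number): PostRecord {
    const key = `${kind}:${id}`;
    let record = this.cache.get(key);
    if (record === undefined) {
      record = this.fetchRecord(kind, id);
      this.cache.set(key, record);
    }
    return record;
  }

  // Action: applies an edit to the cached record and notifies every subscriber.
  editEntityRecord(kind: string, id: number, edits: Partial<PostRecord>): void {
    const current = this.getEntityRecord(kind, id);
    this.cache.set(`${kind}:${id}`, { ...current, ...edits });
    this.listeners.forEach((listener) => listener());
  }

  // Returns a function that removes the listener again.
  subscribe(listener: Listener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }
}

// UI side: declare the data need, re-run on every change.
const dataLayer = new TinyDataLayer((_kind, id) => ({ id, title: "Hello world" }));
const renders: string[] = [];
const render = (): void => {
  renders.push(dataLayer.getEntityRecord("post", 1).title);
};
dataLayer.subscribe(render);
render();
dataLayer.editEntityRecord("post", 1, { title: "Edited title" });
```

The component never asks where the record came from. That is the property the proposal wants to preserve when a sync engine is added underneath.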

Proposal

So the goal here is to introduce data synchronization between remote peers (collaborators) while also persisting the data locally, without impacting the UI layer. To do so we can introduce a sync engine.

Prior art: Take a look at this resource if you want to read more about offline sync engines in SPAs and prior art (Figma, Linear…)

To understand better how such a sync engine would work, let’s take a look at a small example. First thing to note is that all the data that is rendered / shared / persisted can be represented as a simple list of documents / objects. So the role of the sync engine is to fetch / retrieve / persist locally any changes happening to these objects separately.

  • The user (UI components) asks for the post with id 1 by calling the `getEntityRecord` API of our core-data package.
  • Internally, Core Data asks the sync engine to bootstrap a document of type post with identifier 1.
  • First, the sync engine creates a “document” in memory that will represent the source of truth.
  • The sync engine tries to load that document from the local database. If not found, it creates an empty database to persist any future changes of that particular document.
  • The sync engine then performs what we call the handshake: it connects to the remote peers (all the collaborators working on the same document: type post, identifier 1) and merges the local copy with the remote peers’ copies.
  • The sync engine also asynchronously triggers a fetch call to retrieve the document from the WordPress backend, and refreshes the local copy or initializes it if there was no local copy and no remote peer connected.
  • Finally, any change to the local document, regardless of its source (user-triggered change, loaded from the local database, change triggered by a remote peer), triggers the same change handler in the core data package.
  • Once the change handler is called, the UI component is going to be notified (re-rendered).
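
The steps above can be sketched roughly as follows. Every name here (`SyncedDocument`, `SyncSources`, `bootstrap`) is hypothetical, all calls are synchronous for brevity even though the backend fetch would be asynchronous in practice, and merging is a naive field overwrite standing in for a real merge strategy.

```typescript
// All names below are hypothetical; this is not an existing WordPress or Yjs API.
type Fields = Record<string, string>;
type ChangeSource = "user" | "local-db" | "peer" | "server";
type ChangeHandler = (fields: Fields, source: ChangeSource) => void;

interface SyncSources {
  loadLocal(key: string): Fields | undefined; // local database lookup
  peerCopies(key: string): Fields[]; // copies obtained during the handshake
  fetchRemote(key: string): Fields; // WordPress backend
}

class SyncedDocument {
  private fields: Fields = {};
  private onChange: ChangeHandler;

  constructor(onChange: ChangeHandler) {
    this.onChange = onChange;
  }

  // Single entry point: every source of change ends up in the same handler.
  apply(changes: Fields, source: ChangeSource): void {
    this.fields = { ...this.fields, ...changes }; // naive overwrite, not a CRDT merge
    this.onChange({ ...this.fields }, source);
  }

  snapshot(): Fields {
    return { ...this.fields };
  }
}

function bootstrap(
  type: string,
  id: number,
  sources: SyncSources,
  onChange: ChangeHandler
): SyncedDocument {
  const key = `${type}:${id}`;
  const doc = new SyncedDocument(onChange); // in-memory source of truth
  const local = sources.loadLocal(key); // try the local database first
  if (local !== undefined) {
    doc.apply(local, "local-db");
  }
  for (const copy of sources.peerCopies(key)) {
    doc.apply(copy, "peer"); // handshake with connected collaborators
  }
  doc.apply(sources.fetchRemote(key), "server"); // refresh from the backend
  return doc;
}
```

Core Data would pass a single `onChange` callback and later call `doc.apply( edits, "user" )` for user edits, so the UI is notified the same way whatever the origin of a change.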

Introducing Yjs

We can split the implementation of the proposed sync engine into the following steps:

  1. Introduce the observable document objects: in-memory document objects with an API to make an update and to subscribe to changes.
  2. Support merging changes from multiple sources into the observable documents. Note that there are essentially two ways to synchronize changes: a conflict-free replicated data type (CRDT) and operational transformation (OT). Previous explorations have shown that OT is too complex to implement on top of our existing architecture.
  3. Loading and persisting document changes to a local database.
  4. A communication layer between all the users connected to the same WordPress Admin.  
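
As an illustration of steps 1 and 3, here is a minimal sketch of an observable document with a persistence subscriber attached. `ObservableDocument` and `persistTo` are hypothetical names, the `Map` stands in for a real local database such as IndexedDB, and updates are a plain field overwrite with no merge logic.

```typescript
// Hypothetical names throughout; a sketch of the shape, not a proposed API.
type Doc = Record<string, unknown>;
type Subscriber = (doc: Doc, origin: string) => void;

class ObservableDocument {
  private state: Doc = {};
  private subscribers: Subscriber[] = [];

  // Returns a copy so callers cannot mutate the document behind its back.
  get(): Doc {
    return { ...this.state };
  }

  // `origin` labels where the update came from, so subscribers can avoid echoing it back.
  update(changes: Doc, origin: string): void {
    this.state = { ...this.state, ...changes };
    for (const subscriber of this.subscribers.slice()) {
      subscriber(this.get(), origin);
    }
  }

  // Returns a function that removes the subscriber again.
  subscribe(subscriber: Subscriber): () => void {
    this.subscribers.push(subscriber);
    return () => {
      this.subscribers = this.subscribers.filter((s) => s !== subscriber);
    };
  }
}

// Step 3 as a subscriber: persist every change that did not come from storage itself.
function persistTo(
  storage: Map<string, string>, // stand-in for a local database
  key: string,
  doc: ObservableDocument
): () => void {
  return doc.subscribe((state, origin) => {
    if (origin !== "storage") {
      storage.set(key, JSON.stringify(state));
    }
  });
}
```

Tagging each update with its origin is one way to let the persistence and communication layers listen to the same document without writing back what they have just loaded.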

Fortunately, there are some open source solutions that can help us implement the proposed sync engine and address most of these requirements. The most promising one that has been used in the previous explorations is Yjs.

Yjs is an implementation of CRDT. You can think of it as a data structure that you can use to represent the objects being synchronized. Once you use that special representation, Yjs offers adapters to address all the requirements above: observe changes, merge changes from different sources, persist into a local database and potentially communicate with other peers.
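
To give an intuition for what “conflict-free” means, here is a hand-rolled sketch of one of the simplest CRDTs, a last-writer-wins map. This is not Yjs code and not the algorithm Yjs uses (Yjs is considerably more sophisticated and also handles text and nested structures); `LwwMap`, `setField` and `mergeMaps` are names made up for this illustration.

```typescript
// A last-writer-wins map: each field stores its value with a (clock, peer) stamp.
type Stamped = { value: string; clock: number; peer: string };
type LwwMap = Record<string, Stamped>;

// Higher clock wins; ties are broken by peer id so every replica picks the same winner.
function wins(a: Stamped, b: Stamped): boolean {
  return a.clock > b.clock || (a.clock === b.clock && a.peer > b.peer);
}

// Local edit: bump the field's clock and record which peer wrote it.
function setField(map: LwwMap, peer: string, field: string, value: string): LwwMap {
  const previous = map[field];
  const clock = previous ? previous.clock + 1 : 1;
  return { ...map, [field]: { value, clock, peer } };
}

// Merge keeps, for each field, whichever stamped value wins.
function mergeMaps(a: LwwMap, b: LwwMap): LwwMap {
  const merged: LwwMap = { ...a };
  for (const field of Object.keys(b)) {
    const incoming = b[field];
    if (incoming === undefined) continue;
    const existing = merged[field];
    if (existing === undefined || wins(incoming, existing)) {
      merged[field] = incoming;
    }
  }
  return merged;
}

// Two collaborators edit the same field concurrently, starting from a shared copy.
const base = setField({}, "alice", "title", "Hello");
const alice = setField(base, "alice", "title", "Hello from Alice");
const bob = setField(base, "bob", "title", "Hello from Bob");
const onAlice = mergeMaps(alice, bob);
const onBob = mergeMaps(bob, alice);
// Both replicas end up with the same title, whichever order the merge ran in.
```

Because the merge gives the same result regardless of order or repetition, peers can exchange changes through any channel and still converge. This is the property that makes a CRDT a good fit for local persistence and unreliable transports.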

Q&A

While this library solves most of our requirements, explorations have shown that the devil is in the details. Writing sync engines is a challenging project and there are a lot of questions that need to be addressed.

What about the performance impact of the added observable objects and the back and forth transformations from our regular objects into their equivalent CRDT format?

The sync engine is most likely going to have a small performance impact, even on local changes. It remains to be seen whether this impact is going to be a blocker or not. We do have the tools (performance metrics) to help assess the question once everything is in place.

Yjs allows using WebRTC or WebSockets by default to synchronize documents between peers. Which communication layer is best suited to WordPress?

As mentioned in the bootstrap post, this is one of the biggest challenges for us. The most performant transport layers rely on a centralized web socket server, and a lot of PHP hosting providers don’t support communication over web sockets.

This means that we need an alternative communication layer that can work with any WordPress install by default while allowing the communication layer to be replaceable by plugins/hosts.
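
One way to picture a replaceable communication layer is a small transport interface with a registry around it. The sketch below is purely illustrative: `SyncTransport`, `registerTransport` and `getTransport` are hypothetical names, and the default transport is an in-memory loopback standing in for whatever ships by default.

```typescript
// Hypothetical interface: the smallest surface a transport would need to expose.
type UpdateHandler = (update: string) => void;

interface SyncTransport {
  name: string;
  send(room: string, update: string): void;
  onReceive(room: string, handler: UpdateHandler): void;
}

// In-memory loopback used as a placeholder for a real default transport.
function createLoopbackTransport(name: string): SyncTransport {
  const handlers: Record<string, UpdateHandler[]> = {};
  return {
    name,
    send(room, update) {
      // Deliver only to handlers registered for this room.
      (handlers[room] || []).forEach((handler) => handler(update));
    },
    onReceive(room, handler) {
      (handlers[room] = handlers[room] || []).push(handler);
    },
  };
}

let activeTransport: SyncTransport = createLoopbackTransport("default-polling");

// What a plugin or host would call to swap in, say, a web socket based transport.
function registerTransport(transport: SyncTransport): void {
  activeTransport = transport;
}

// What the sync engine would call, without knowing which transport is active.
function getTransport(): SyncTransport {
  return activeTransport;
}
```

The sync engine only ever talks to `getTransport()`, so a host with web socket support could register a faster transport without any change to the engine or the UI.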

Can we use WebRTC as the default communication layer as it’s P2P and should work with all WordPress installs?

While WebRTC is indeed P2P, most existing implementations of WebRTC, including the default Yjs WebRTC adapter, rely on a centralized signaling server. This is a very light server used to perform the handshake (for the peers to discover each other), and it generally relies on web sockets.

Since we can’t rely on web sockets, there are three possibilities that we can explore:

  • Build a public signaling server (on .org infrastructure for instance).
  • Implement WebRTC signaling using long polling instead of web sockets (supported in all WordPress instances) and a public STUN server (there are existing public STUN servers and .org infrastructure can also ship one if needed). This also involves providing a custom Yjs adapter to support the polling-based signaling server.
  • Avoid WebRTC by default and use long polling to perform both the handshake and then data changes. In this case, no public server is needed but performance can suffer.
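
The polling-based options can be pictured with a small simulation. This is a sketch under several assumptions: `PollingRelay` stands in for server-side endpoints that do not exist today, the log lives in memory instead of the WordPress database, and each `pollOnce` call is a plain poll (true long polling would hold the request open until new updates arrive).

```typescript
// Hypothetical server-side state: an append-only log of updates per document.
type Update = { from: string; payload: string };

class PollingRelay {
  private logs: Record<string, Update[]> = {};

  // Would be a POST endpoint: append an update to the document's log.
  push(docKey: string, update: Update): void {
    (this.logs[docKey] = this.logs[docKey] || []).push(update);
  }

  // Would be a GET endpoint: return everything after `cursor`, plus the new cursor.
  poll(docKey: string, cursor: number): { updates: Update[]; cursor: number } {
    const log = this.logs[docKey] || [];
    return { updates: log.slice(cursor), cursor: log.length };
  }
}

class PollingClient {
  received: string[] = [];
  private cursor = 0;
  private relay: PollingRelay;
  private docKey: string;
  private peerId: string;

  constructor(relay: PollingRelay, docKey: string, peerId: string) {
    this.relay = relay;
    this.docKey = docKey;
    this.peerId = peerId;
  }

  send(payload: string): void {
    this.relay.push(this.docKey, { from: this.peerId, payload });
  }

  // One polling round; a real client would run this on a timer or keep the request open.
  pollOnce(): void {
    const result = this.relay.poll(this.docKey, this.cursor);
    this.cursor = result.cursor;
    for (const update of result.updates) {
      if (update.from !== this.peerId) {
        this.received.push(update.payload); // skip our own echoes
      }
    }
  }
}
```

The cursor keeps each round cheap because a client only receives what it has not seen yet. Latency is bounded by the polling interval, which is where the performance concern mentioned above comes from.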

What else should we consider as part of the project?

  • Performance impact.
  • Memory footprint of the shared documents.
  • Storage footprint of the local database (Yjs stores the history of changes, with some optimizations).
  • Security: Prevent access to peers without the necessary rights.
  • Undo/Redo.

Get involved!

We’re just getting started on this journey and there are a number of open questions. If you are interested in the challenges or want to leave any feedback on the project, please chime in.

#gutenberg, #phase-3