WordPress.org

The goal of this issue is to lay down the technical architecture to implement real-time collaboration in WP Admin (Post and Site editors). See the bootstrap post for more information on the scope and features of the project.

There are different use cases we want to enable, as outlined in the post, but all of them come down to these two fundamental aspects:

Data needs to be shared and synchronized between users connected to the same WP-Admin.
If the network is disconnected, the data is persisted locally and synchronized back to the server when the network is restored.

In addition to that, I’d like to add another guiding principle to the technical solution we’re trying to conceptualize:

Ideally, developers working on the UI of WP-Admin/editors shouldn’t have to think about whether the data is local/remote/synchronized/merged… Developers declare their “data requirements” and these requirements are fulfilled for them by a separate and automated layer.

This principle is important for multiple reasons:

It frees developers from thinking about the synchronization and the collaboration for every new feature and piece of UI that they add. Additional data will be collaborative by default and offline-ready.
It is also important to ensure the backward compatibility of our existing public facing APIs to access and manipulate WordPress Data.

This is also the same guiding principle that we put in place when we initially developed the @wordpress/data package to address the local and remote data needs.

Current architecture

The following schema represents how the data flows in the site and post editor code-bases.

The architecture is separated into two sections:

The UI layer: A developer working on this layer typically writes components, declares their data needs using selectors (like the `getEntityRecord` selector in the example above), and is notified of data changes automatically. Developers can also use actions to perform mutations (editing posts, saving posts…).
The Core Data layer: Responsible for addressing the data needs, retrieving the data from the server or locally, caching data if needed and notifying the consumers (UI layer) of any changes.
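To make the split between the two layers concrete, here is a toy TypeScript sketch of the contract between them. `ToyEntityStore`, `receiveEntityRecord` and `editEntityRecord` are hypothetical stand-ins that only borrow the `getEntityRecord` name. This is not the actual @wordpress/data or @wordpress/core-data implementation, where selectors are resolved asynchronously and components subscribe through hooks. The "component" only reads through a selector and re-renders when notified, and it never knows where the record came from.

```typescript
// Toy illustration of the UI layer / Core Data layer split.
// Hypothetical names; not the real @wordpress/core-data API.
type PostRecord = { id: number; title: string };
type StoreListener = () => void;

class ToyEntityStore {
  private records = new Map<string, PostRecord>();
  private listeners = new Set<StoreListener>();

  private static key(kind: string, name: string, id: number): string {
    return `${kind}:${name}:${id}`;
  }

  // Selector: the UI declares what it needs and reads the current value.
  getEntityRecord(kind: string, name: string, id: number): PostRecord | undefined {
    return this.records.get(ToyEntityStore.key(kind, name, id));
  }

  // Called by the data layer once a record arrives (server, cache, ...).
  receiveEntityRecord(kind: string, name: string, record: PostRecord): void {
    this.records.set(ToyEntityStore.key(kind, name, record.id), record);
    this.notify();
  }

  // Action: a mutation requested by the UI.
  editEntityRecord(kind: string, name: string, id: number, edits: Partial<PostRecord>): void {
    const current = this.getEntityRecord(kind, name, id);
    if (current) {
      this.receiveEntityRecord(kind, name, { ...current, ...edits, id });
    }
  }

  subscribe(listener: StoreListener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  private notify(): void {
    this.listeners.forEach((listener) => listener());
  }
}

// A "component": reads through the selector, re-renders on every change.
const store = new ToyEntityStore();
const renders: string[] = [];
const renderTitle = (): void => {
  const post = store.getEntityRecord("postType", "post", 1);
  renders.push(post ? post.title : "(loading)");
};
store.subscribe(renderTitle);
renderTitle();
store.receiveEntityRecord("postType", "post", { id: 1, title: "Hello" });
store.editEntityRecord("postType", "post", 1, { title: "Hello, world" });
// renders is now ["(loading)", "Hello", "Hello, world"]
```

Swapping the data source behind `receiveEntityRecord` (for example, for a sync engine) leaves `renderTitle` untouched, which is the guiding principle stated earlier.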

Proposal

So the goal here is to introduce data synchronization between remote peers (collaborators) while also persisting the data locally, without impacting the UI layer. To do so, we can introduce a sync engine.

Prior art: Take a look at this resource if you want to read more about offline sync engines in SPAs and prior art (Figma, Linear…)

To better understand how such a sync engine would work, let’s take a look at a small example. The first thing to note is that all the data that is rendered / shared / persisted can be represented as a simple list of documents / objects. So the role of the sync engine is to fetch / retrieve / persist locally any changes happening to these objects separately.

The user (UI components) asks for the post with id 1 by calling the `getEntityRecord` API of our core-data package.
Internally, Core Data asks the sync engine to bootstrap a document of type post with identifier 1.
First, the sync engine creates a “document” in memory that will represent the source of truth.
The sync engine tries to load that document from the local database. If not found, it creates an empty database to persist any future changes of that particular document.
The sync engine then performs what we call the handshake: it connects to the remote peers (all the collaborators working on the same document, type post, identifier 1) and merges the local copy with the remote peers’ copies.
The sync engine also asynchronously triggers a fetch call to retrieve the document from the WordPress backend and refreshes the local copy or initializes it if there were no local copy or remote peers connected.
Finally, any changes that happen to the local document, regardless of the source of the change (user triggered change, loaded from the local database, change triggered by a remote peer) all trigger the same change handler in the core data package.
Once the change handler is called, the UI component is going to be notified (re-rendered).
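The bootstrap sequence above can be sketched in TypeScript as follows. Everything here is hypothetical: `SyncEngine`, `LocalDatabase`, `RemotePeer` and the `"post:1"` key format are illustrative names. The steps run synchronously for readability (the proposal makes the server fetch asynchronous), and the merge is a naive field overwrite where the real engine would use a CRDT. Because the post leaves this open, the sketch also assumes that the server copy only initializes a document that neither the local database nor a peer could provide. The property to notice is the single `apply` path: local-database loads, peer updates, server data and user edits all reach the same change handler.

```typescript
// Hypothetical sketch of the sync engine bootstrap; names are illustrative.
type DocFields = Record<string, string>;
type ChangeOrigin = "local-db" | "peer" | "server" | "user";
type ChangeHandler = (key: string, fields: DocFields, origin: ChangeOrigin) => void;

interface LocalDatabase {
  load(key: string): DocFields | undefined;
  save(key: string, fields: DocFields): void;
}

interface RemotePeer {
  getCopy(key: string): DocFields | undefined;
}

class SyncEngine {
  private docs = new Map<string, DocFields>();

  constructor(
    private db: LocalDatabase,
    private peers: RemotePeer[],
    private fetchFromServer: (key: string) => DocFields,
    private onChange: ChangeHandler,
  ) {}

  // Single entry point for every change, whatever its source.
  apply(key: string, update: DocFields, origin: ChangeOrigin): void {
    const merged = { ...(this.docs.get(key) ?? {}), ...update }; // naive merge, not a CRDT
    this.docs.set(key, merged);
    this.db.save(key, merged); // persist locally
    this.onChange(key, merged, origin); // Core Data would notify the UI here
  }

  bootstrap(type: string, id: number): string {
    const key = `${type}:${id}`;
    this.docs.set(key, {}); // 1. in-memory source of truth
    const stored = this.db.load(key); // 2. local database, if any
    if (stored) this.apply(key, stored, "local-db");
    for (const peer of this.peers) {
      // 3. handshake: merge each connected peer's copy
      const copy = peer.getCopy(key);
      if (copy) this.apply(key, copy, "peer");
    }
    // 4. Server fetch. Assumption: it only initializes an empty document.
    if (Object.keys(this.docs.get(key) ?? {}).length === 0) {
      this.apply(key, this.fetchFromServer(key), "server");
    }
    return key;
  }

  get(key: string): DocFields | undefined {
    return this.docs.get(key);
  }
}

// Usage: an in-memory "local database" and one connected collaborator.
const saved = new Map<string, DocFields>();
const memoryDb: LocalDatabase = {
  load: (key) => saved.get(key),
  save: (key, fields) => {
    saved.set(key, { ...fields });
  },
};
const collaborator: RemotePeer = { getCopy: () => ({ title: "Draft from a collaborator" }) };
const origins: ChangeOrigin[] = [];
const engine = new SyncEngine(memoryDb, [collaborator], () => ({ title: "From server" }), (_key, _fields, origin) => {
  origins.push(origin);
});
const docKey = engine.bootstrap("post", 1);
engine.apply(docKey, { content: "New paragraph" }, "user");
// origins is now ["peer", "user"]
```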

Introducing Yjs

We can split the implementation of the proposed sync engine into the following steps:

Introduce the observable document objects: in-memory document objects with an API to make updates and to subscribe to changes.
Support merging changes from multiple sources into the observable documents. Note that there are essentially two ways to synchronize changes: a conflict-free replicated data type (CRDT) and operational transformation (OT). Previous explorations have shown that OT is too complex to implement on top of our existing architecture.
Loading and persisting document changes to a local database.
A communication layer between all the users connected to the same WordPress Admin.
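The first step, an observable document object, is the piece the other steps build on. Below is a minimal generic sketch in TypeScript with a hypothetical `ObservableDocument` class (Yjs offers its own equivalent through its shared types and their observers). Updates replace fields, subscribers learn which keys changed, and no-op updates notify nobody.

```typescript
// Hypothetical minimal observable document: update + subscribe.
type DocListener<T> = (doc: Readonly<T>, changedKeys: Array<keyof T>) => void;

class ObservableDocument<T extends object> {
  private listeners = new Set<DocListener<T>>();

  constructor(private state: T) {}

  get(): Readonly<T> {
    return this.state;
  }

  update(patch: Partial<T>): void {
    const changed = (Object.keys(patch) as Array<keyof T>).filter(
      (key) => patch[key] !== undefined && patch[key] !== this.state[key],
    );
    if (changed.length === 0) return; // nothing actually changed: no notification
    const next = { ...this.state };
    for (const key of changed) {
      next[key] = patch[key] as T[keyof T];
    }
    this.state = next;
    this.listeners.forEach((listener) => listener(this.state, changed));
  }

  subscribe(listener: DocListener<T>): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }
}

const postDoc = new ObservableDocument({ title: "Hello", content: "" });
const events: string[] = [];
const unsubscribeFromPost = postDoc.subscribe((doc, keys) => {
  events.push(`${keys.join(",")} -> ${doc.title}`);
});
postDoc.update({ title: "Hello, world" }); // notifies: "title -> Hello, world"
postDoc.update({ title: "Hello, world" }); // same value: no notification
unsubscribeFromPost();
postDoc.update({ content: "Body" }); // applied, but nobody is listening
```

Replacing the state object on each update (rather than mutating it) keeps earlier snapshots intact, which is convenient for UI layers that compare references to decide whether to re-render.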

Fortunately, there are some open source solutions that can help us implement the proposed sync engine and address most of these requirements. The most promising one, which has been used in the previous explorations, is Yjs.

Yjs is an implementation of CRDT. You can think of it as a data structure that you can use to represent the objects being synchronized. Once you use that special representation, Yjs offers adapters to address all the requirements above: observe changes, merge changes from different sources, persist into a local database and potentially communicate with other peers.
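To give a feel for what “merge changes from different sources” means for a CRDT, here is a deliberately small one in TypeScript: a last-writer-wins map with Lamport clocks. This is not the algorithm Yjs uses. Yjs's data structures are considerably more sophisticated, notably for concurrent text edits, where last-writer-wins would simply drop one user's typing. `LWWMap` and its methods are hypothetical. What it shares with Yjs is the key property: merging is commutative, associative and idempotent, so replicas converge regardless of the order in which they exchange state.

```typescript
// A tiny state-based CRDT (last-writer-wins map), for illustration only.
type Stamp = { clock: number; replica: string };
type LWWEntry = { value: string; stamp: Stamp };

// Total order on stamps: higher Lamport clock wins, replica id breaks ties.
function isNewer(a: Stamp, b: Stamp): boolean {
  return a.clock !== b.clock ? a.clock > b.clock : a.replica > b.replica;
}

class LWWMap {
  private entries = new Map<string, LWWEntry>();
  private clock = 0;

  constructor(private readonly replica: string) {}

  set(key: string, value: string): void {
    this.clock += 1; // local event: advance the Lamport clock
    this.entries.set(key, { value, stamp: { clock: this.clock, replica: this.replica } });
  }

  get(key: string): string | undefined {
    return this.entries.get(key)?.value;
  }

  // Keep, per key, whichever entry carries the larger stamp.
  merge(other: LWWMap): void {
    other.entries.forEach((theirs, key) => {
      const mine = this.entries.get(key);
      if (!mine || isNewer(theirs.stamp, mine.stamp)) {
        this.entries.set(key, theirs);
      }
      this.clock = Math.max(this.clock, theirs.stamp.clock);
    });
  }
}

const alice = new LWWMap("alice");
const bob = new LWWMap("bob");
alice.set("title", "Hello"); // concurrent edits to the same field...
bob.set("title", "Hi there"); // ...both at clock 1
alice.set("excerpt", "Short");
alice.merge(bob);
bob.merge(alice);
// Both replicas now read title "Hi there" (tie broken by replica id) and excerpt "Short".
```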

Q&A

While this library solves most of our requirements, explorations have shown that the devil is in the details. Writing sync engines is a challenging project and there are a lot of questions that need to be addressed.

What about the performance impact of the added observable objects and the back and forth transformations from our regular objects into their equivalent CRDT format?

The sync engine is most likely going to have a small performance impact, even on local changes; it remains to be seen whether this impact is going to be a blocker or not. We do have the tools in place (performance metrics) to help assess the question once the engine is implemented.

Yjs allows using WebRTC or WebSockets by default to synchronize documents between peers, what communication layer is the most adapted to WordPress?

As mentioned in the bootstrap post, this is one of the biggest challenges for us. The most performant transport layers rely on a centralized web socket server, and a lot of PHP hosting providers don’t support communication over web sockets.

This means that we need an alternative communication layer that can work with any WordPress install by default while allowing the communication layer to be replaceable by plugins/hosts.
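One way to keep the communication layer replaceable is to have the sync engine depend only on a small transport contract, with a default implementation that plugins or hosts can swap out. The TypeScript sketch below is hypothetical throughout: `SyncTransport`, `registerTransport` and `createTransport` are illustrative names, and the in-memory transport merely stands in for whatever default (for example, long polling) ends up shipping. In WordPress itself, the override could plausibly go through a filter (for instance `applyFilters` from @wordpress/hooks) rather than a bespoke registry.

```typescript
// Hypothetical transport contract the sync engine would depend on.
type MessageHandler = (message: string) => void;

interface SyncTransport {
  readonly name: string;
  join(room: string, onMessage: MessageHandler): void;
  broadcast(room: string, message: string): void;
}

// Stand-in default: delivers messages to every client joined to a room
// within the same process (including the sender).
class InMemoryTransport implements SyncTransport {
  readonly name = "in-memory";
  private rooms = new Map<string, MessageHandler[]>();

  join(room: string, onMessage: MessageHandler): void {
    const handlers = this.rooms.get(room) ?? [];
    handlers.push(onMessage);
    this.rooms.set(room, handlers);
  }

  broadcast(room: string, message: string): void {
    (this.rooms.get(room) ?? []).forEach((handler) => handler(message));
  }
}

// Registry that plugins/hosts could use to provide a better transport.
const transportFactories = new Map<string, () => SyncTransport>();

function registerTransport(name: string, factory: () => SyncTransport): void {
  transportFactories.set(name, factory);
}

function createTransport(preferred?: string): SyncTransport {
  const factory = preferred === undefined ? undefined : transportFactories.get(preferred);
  return factory ? factory() : new InMemoryTransport(); // fall back to the default
}

// A host with web socket support might register its own implementation.
registerTransport("host-websocket", () => ({
  name: "host-websocket",
  join: () => {},
  broadcast: () => {},
}));

const hostTransport = createTransport("host-websocket"); // the host's transport
const fallback = createTransport("not-installed"); // default transport
const inbox: string[] = [];
fallback.join("post:1", (message) => inbox.push(message));
fallback.broadcast("post:1", "update");
```

The sync engine never references a concrete transport, so a host can upgrade to web sockets without any change to the engine or the UI layer.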

Can we use WebRTC as the default communication layer as it’s P2P and should work with all WordPress installs?

While WebRTC is indeed P2P, most existing implementations of WebRTC, including the default Yjs WebRTC adapter, rely on a centralized signaling server. This is a very light server used to perform the handshake (so that peers can discover each other), and it generally relies on web sockets.

Since we can’t rely on web sockets, there are three possibilities that we can explore:

Build a public signaling server (on .org infrastructure for instance).
Implement WebRTC signaling using long polling instead of web sockets (supported in all WordPress instances) and a public STUN server (there are existing public STUN servers, and .org infrastructure can also ship one if needed). This also involves providing a custom Yjs adapter to support the polling-based signaling server.
Avoid WebRTC by default and use long polling to perform both the handshake and then data changes. In this case, no public server is needed but performance can suffer.
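For the last two options, the core mechanic is long polling: the client asks for everything after a cursor, and the server either answers immediately or holds the request open until something new arrives. The TypeScript sketch below simulates both sides in memory. `LongPollRoom` and `pollLoop` are hypothetical names, the `await room.poll(...)` call stands in for an HTTP request to some WordPress endpoint, and the request timeout a real endpoint needs is omitted. The same loop could carry either signaling messages for a WebRTC handshake or the document updates themselves.

```typescript
// In-memory simulation of a long-polling channel; names are hypothetical.
type PollResult = { cursor: number; messages: string[] };

class LongPollRoom {
  private log: string[] = [];
  private waiters: Array<() => void> = [];

  publish(message: string): void {
    this.log.push(message);
    const waiting = this.waiters;
    this.waiters = [];
    waiting.forEach((wake) => wake()); // release held requests
  }

  // Answer at once if the client is behind; otherwise hold the request
  // until the next publish. (A real endpoint would also time out.)
  poll(since: number): Promise<PollResult> {
    const answer = (): PollResult => ({ cursor: this.log.length, messages: this.log.slice(since) });
    if (since < this.log.length) {
      return Promise.resolve(answer());
    }
    return new Promise<PollResult>((resolve) => {
      this.waiters.push(() => resolve(answer()));
    });
  }
}

// Client loop: each iteration would be one HTTP request in practice.
async function pollLoop(room: LongPollRoom, rounds: number, onMessage: (message: string) => void): Promise<number> {
  let cursor = 0;
  for (let i = 0; i < rounds; i++) {
    const result = await room.poll(cursor);
    result.messages.forEach((message) => onMessage(message));
    cursor = result.cursor;
  }
  return cursor;
}

const room = new LongPollRoom();
const received: string[] = [];
const finished = pollLoop(room, 2, (message) => {
  received.push(message);
});
room.publish("update-1"); // releases the held first request
room.publish("update-2"); // picked up by the second request
```

The cursor makes the exchange resumable: a client that reconnects after going offline simply polls from its last cursor and receives everything it missed.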

What else should we consider as part of the project?

Performance impact.
Memory footprint of the shared documents.
Storage footprint of the local database (Yjs stores the history of changes, with some optimizations).
Security: Prevent access to peers without the necessary rights.
Undo/Redo.

Get involved!

We’re just getting started on this journey, and there are a number of open questions. If you are interested in the challenges or want to leave any feedback on the project, please chime in.

#gutenberg, #phase-3

Pascal Birchler 10:49 am on July 13, 2023

If anyone’s wondering about those abbreviations:

CRDT: Conflict-Free Replicated Data Types
OT: Operational Transformation
- Francesca Marano 10:55 am on July 13, 2023
  
  Thank you!
- Riad Benguella 11:00 am on July 13, 2023
  
  Thanks for the clarifications, @swissspidy. I’ve updated the post with links to the corresponding Wikipedia pages.
Damon Cook 2:30 pm on July 13, 2023

This is a fascinating read. A bit out of my skill set, but glad to see it shared, and I commend those endeavoring to chip away at a scalable solution. ❤️
Phil Johnston 3:19 pm on July 13, 2023

This initiative looks really cool, and also like it’s pretty complex! If you had to estimate, how much complexity does this add to existing block-editor complexity? Is it 2x? Less than that? More?

Is there any way for me to understand how this decision was arrived at for the Gutenberg roadmap? Is there research or data that was done which could be shared to help me understand why this is the next most important thing for core?

Does that research suggest that the work and complexity increase required by this initiative will “pay off” in some way?

For example, did it increase the time users spent using WP? Did it reduce the amount of churn/abandonment of WP by some amount?

Are there a list of outcomes this initiative hopes to achieve outside of technical ones? For example, a technical goal/outcome might be “people can edit collaboratively with WP”. But a non-technical goal/outcome might be “WP users choose WordPress over Google Docs 60% more often”.

Is there any type of research like that which could be shared with the community to help us get on board more easily?
- Riad Benguella 8:58 pm on July 13, 2023
  
  > This initiative looks really cool, and also like it’s pretty complex! If you had to estimate, how much complexity does this add to existing block-editor complexity? Is it 2x? Less than that? More?
  
  It’s not easy to put a number here, in terms of architectural complexity of the data flows. The goal of the proposal is to actually limit the complexity by moving it into a low-level package that is independent of any business/UI logic.
  
  The devil here is going to lie in the details, though. It’s probably not going to be too complex to have an initial version, but making it performant, scalable and secure is going to take some time, and only experimentation and trial can give us the correct answers here.
  
  > Is there any way for me to understand how this decision was arrived at for the Gutenberg roadmap?
  
  As for the research, I don’t know much. This is one of the four phases that were initially set out for the Gutenberg project. I personally don’t have all the answers here, but what I can say is that it came up often in the feedback about Gutenberg, along with the need to avoid the Google Doc –> paste workflow that a lot of people rely on.
- Matias Ventura 8:09 am on July 14, 2023
  
  From previous explorations, it could be a very self-contained layer. There’s also some benefits in restructuring a few lower level components, like undo/redo stack, to be easier to reason about across blocks. Generally, some of the collaboration concerns can enforce some discipline in the data flows that can help make the system overall more robust — for example, offline support without involving multiple concurrent users.
  
  Good questions about the overall vision! Do these posts help clarify some of the intent and goals? Real-time is one aspect of the overall “multi-player” goal established when Gutenberg was kicked off. There are many levels to it: WordPress sites tend to have multiple people in various roles; the ability to collaborate on the web has become sort of a table stakes proposition for modern web software; workflow demands generally involve multiple people doing different tasks over the same post or area on a site, which is heavily constrained right now; a lot of user feedback indicates that people keep their work outside of WP until the very end if they know they will need feedback from others (for example, in Google Docs).