Gathering WordPress performance data in the field

TL;DR: This Colab provides a comprehensive introduction on how to query HTTP Archive and CrUX for WordPress performance data with BigQuery.

While profiling and benchmarking during development help assess the potential performance impact of a change, such assessments are typically based only on synthetic test sites in a development environment, not on real WordPress production sites accessed by many different end users. In other words, profiling and benchmarking typically happen in what can be referred to as a “lab environment”, or “in the lab”. To assess the actual performance impact of a change and validate the lab assessment, it is imperative to also analyze the performance experienced by real users accessing WordPress sites that run the changed logic, i.e. “in the field”.

Obtaining field data comes with its own set of challenges. Most importantly, since almost no developer has access to a large number of different WordPress sites with millions of users, it is practically impossible to run real A/B tests in the field the way we can when profiling or benchmarking. Instead, we have to work with extremely large datasets of real WordPress sites and their performance metrics, which we can segment in certain ways to get an impression of the performance impact of a change. It is not always possible to come to a reliable conclusion in this process, as every metric is affected by many different aspects of a site. But by using large datasets of millions of sites, it is often possible to correlate certain functionality with certain performance trends and thus draw conclusions about its impact.

A publicly available dataset with performance data on millions of sites is the Chrome User Experience Report (CrUX), which can be queried using BigQuery and GoogleSQL. When joined with the public HTTP Archive dataset, the data can be segmented granularly, e.g. to only include WordPress sites that use a certain version or a certain feature.
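To make the idea of joining the two datasets more concrete, here is a minimal sketch in Python that only builds a GoogleSQL query string and does not run anything. The helper name `build_wordpress_lcp_query` is hypothetical. The table names (`chrome-ux-report.materialized.device_summary`, `httparchive.all.pages`), their columns, and the join on the origin plus a trailing slash are assumptions about the current public schemas and should be checked against the datasets before use. The 2500 ms cutoff is the "good" threshold for Largest Contentful Paint.

```python
from datetime import date

# Assumed mapping from HTTP Archive's `client` values to CrUX's `device` values.
CLIENT_TO_DEVICE = {"mobile": "phone", "desktop": "desktop"}


def build_wordpress_lcp_query(month: str, client: str = "mobile") -> str:
    """Build (but do not run) a query for the share of WordPress origins
    whose 75th percentile LCP is at most 2500 ms.

    `month` is an ISO date such as "2024-06-01". The sketch assumes both
    datasets expose data under that same date.
    """
    # Validate inputs so that arbitrary text never ends up in the SQL string.
    day = date.fromisoformat(month).isoformat()
    if client not in CLIENT_TO_DEVICE:
        raise ValueError(f"unknown client: {client!r}")
    device = CLIENT_TO_DEVICE[client]

    return f"""
SELECT
  COUNT(DISTINCT crux.origin) AS origins,
  COUNTIF(crux.p75_lcp <= 2500) / COUNT(0) AS pct_good_lcp
FROM
  `chrome-ux-report.materialized.device_summary` AS crux
JOIN (
  -- Home pages that HTTP Archive detected as running WordPress (assumed schema).
  SELECT DISTINCT page
  FROM `httparchive.all.pages`, UNNEST(technologies) AS t
  WHERE date = DATE '{day}'
    AND client = '{client}'
    AND is_root_page
    AND t.technology = 'WordPress'
) AS wp
ON CONCAT(crux.origin, '/') = wp.page
WHERE
  crux.date = DATE '{day}'
  AND crux.device = '{device}'
"""


if __name__ == "__main__":
    print(build_wordpress_lcp_query("2024-06-01"))
```

In a Colab notebook, the resulting string could then be passed to a BigQuery client. Queries against these datasets can scan a lot of data, so it is worth checking the estimated bytes processed before running them.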

This Colab provides a comprehensive introduction on how to query HTTP Archive and CrUX for WordPress performance data with BigQuery. The Colab was created by members of the WordPress Performance Team and of the HTTP Archive team. As it contains a lot of content, please feel free to work through it in multiple sessions.
