Analyzing the Core Web Vitals performance impact of WordPress 6.3 in the field

As highlighted in the WordPress 6.3 performance summary post, the 6.3 release included numerous performance enhancements. Based on the lab benchmarks cited in that post, the test sites used with WordPress core were loading 27% faster for block themes and 18% faster for classic themes based on the Largest Contentful Paint (LCP) metric.

While lab benchmarks are great for estimating the projected performance impact of a release, the tests are not representative of the average WordPress site and real-world traffic. Therefore, it is crucial to further review and attempt to validate the impact in the field, i.e. on actual production sites using WordPress, at scale. Last week, three analyses were conducted to assess the performance impact of WordPress 6.3, using the public datasets from HTTP Archive and the Chrome User Experience Report.

Highlights of the WordPress 6.3 performance analysis findings

Before diving into the results, the term “passing rate” should be briefly explained here. It denotes the percentage of sites in a dataset for which a specific Web Vitals metric meets the threshold value that is considered “good”. For LCP, that encompasses all sites in the dataset whose LCP is 2.5 seconds or faster. For example, if 600,000 out of 1,000,000 URLs have an LCP faster than or equal to 2.5 seconds, the LCP passing rate is 60%.
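To make the definition concrete, here is a minimal sketch of the calculation. It assumes that each URL is represented by a single LCP value in milliseconds; the function name and the sample values are hypothetical and only serve as illustration.

```python
# Threshold (in milliseconds) at or below which LCP counts as "good".
GOOD_LCP_THRESHOLD_MS = 2500


def passing_rate(lcp_values_ms, threshold_ms=GOOD_LCP_THRESHOLD_MS):
    """Return the percentage of values that are at or below the threshold."""
    if not lcp_values_ms:
        raise ValueError("at least one LCP value is required")
    passing = sum(1 for value in lcp_values_ms if value <= threshold_ms)
    return 100 * passing / len(lcp_values_ms)


# Hypothetical per-URL LCP values: 6 of the 10 are at or below 2.5 seconds.
sample_lcp_ms = [1200, 1800, 2100, 2400, 2500, 2500, 2600, 3100, 4000, 5200]
print(passing_rate(sample_lcp_ms))  # 60.0
```

Note that a value of exactly 2.5 seconds counts as passing in this sketch, matching the “faster than or equal to” wording in the example above.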

The results from the analyses indicate that WordPress 6.3 is indeed a great success from a performance perspective, as indicated by the lab benchmarks. Some notable findings to highlight include:

  • Looking at all applicable sites in the dataset, the Largest Contentful Paint (LCP) passing rate has improved in relative terms by 5.6% for classic theme sites and by 2.7% for block theme sites for mobile viewports. In terms of the absolute LCP passing rate, for classic theme sites this means a bump from 31.3% to 33%, while for block theme sites it means a bump from 42.8% to 44%. For desktop viewports, the improvements are not as pronounced, yet they are still positive. See the source for overall LCP passing rate changes.
  • When segmenting between sites that use the emoji loader script and the sites that have disabled it, the impact of the improvements to the emoji loader script is clearly visible. The Largest Contentful Paint (LCP) boost for classic theme sites using the emoji loader script is 3.4% to 7% higher than for those that don’t use it, and for block themes it’s 0.7% to 4.5% better as well. To outline the numbers behind that more clearly, classic theme sites using the emoji loader script see a relative LCP boost of 8.4% on phone and 2.4% on desktop, compared to only 1.4% and -0.8% for those that don’t use the emoji loader script. Similarly, for block theme sites using the emoji loader script the relative LCP boost amounts to 4.2% on phone and 0.8% on desktop, compared to only -0.3% and 0.1% for those that don’t use the emoji loader script. See the source for LCP passing rate differences between sites using vs not using the emoji loader script.
  • When looking at the impact of more accurate lazy-loading heuristics and support for fetchpriority="high", segmentation is especially important, since the enhancements themselves have a varying degree of accuracy. As a reminder, the LCP image of a URL should not be lazy-loaded, but it should have fetchpriority="high". When looking at only the sites where that is the case and which were still lazy-loading the LCP image with WordPress 6.2, the LCP performance impact amounts to a massive 16% to 21% improvement for mobile viewports and 6% to 9% on desktop. Even in absolute LCP passing rate numbers, this is a jump of 4.3% for classic theme sites and 8% for block theme sites, which is nothing short of amazing. See the source for LCP passing rate changes for sites that no longer lazy-load LCP image and use fetchpriority correctly.
  • Of course, this only applies to a subset of sites. However, the accuracy of the lazy-loading heuristics has notably improved as well: In WordPress 6.3, only 9–10% of sites still lazy-load their LCP image for classic theme sites (down from 27–28% in 6.2) while for block theme sites it’s 5–8% (down from 17–29% in 6.2), so this multiplies the above LCP improvements horizontally. See the source for the accuracy comparison of how many sites (correctly) no longer lazy-load their LCP image.
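Since the findings above mix relative improvements with absolute passing rates, the following sketch shows how the two relate. The function names are hypothetical. The inputs are the rounded mobile passing rates quoted in the first finding, so the relative results (roughly 5.4% and 2.8%) differ slightly from the 5.6% and 2.7% reported there, presumably because the published passing rates are rounded.

```python
def absolute_change(before_pct, after_pct):
    """Change of a passing rate in percentage points."""
    return after_pct - before_pct


def relative_change(before_pct, after_pct):
    """Change of a passing rate relative to its previous value, in percent."""
    return 100 * (after_pct - before_pct) / before_pct


# Rounded mobile LCP passing rates from the first finding above.
for label, before, after in [
    ("classic themes", 31.3, 33.0),
    ("block themes", 42.8, 44.0),
]:
    print(
        f"{label}: +{absolute_change(before, after):.1f} points, "
        f"+{relative_change(before, after):.1f}% relative"
    )
```

A small change in percentage points can therefore correspond to a noticeably larger relative change when the starting passing rate is low.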

Explaining the metrics

Tooling used

HTTP Archive is an open-source project that runs a pipeline across millions of URLs every month to monitor the state of the web, recording aspects like which technologies are used, how specific web features are being leveraged, how many HTML tags or attributes of a specific kind are present on pages, and much more. The Core Performance Team has been heavily relying on this tool to measure the success of specific features or enhancements in WordPress core releases. In fact, HTTP Archive even monitors a few metrics that are specific to WordPress.

The Chrome User Experience Report (“CrUX” for short) exposes Core Web Vitals (CWV) performance data for millions of URLs, based on how real-world Chrome users experience visiting those URLs. While the tool can be used for individual sites to monitor their Web Vitals (e.g. via PageSpeed Insights), the data can also be aggregated at a larger scale. While CrUX does not contain much data other than the actual Web Vitals metrics, intersecting its dataset with that of HTTP Archive allows gathering valuable insights. For example, it becomes possible to group sites into specific segments (such as all sites that use WordPress) and measure their CWV passing rates.
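The idea of intersecting the two datasets can be sketched as a simple join on the URL. The rows, field names, and function name below are hypothetical and do not reflect the actual HTTP Archive or CrUX schemas; the sketch only illustrates how segments and passing rates fall out of such a join.

```python
from collections import defaultdict

# Hypothetical HTTP Archive-style rows: which technology a URL uses.
technology_rows = [
    {"url": "https://a.example/", "technology": "WordPress"},
    {"url": "https://b.example/", "technology": "WordPress"},
    {"url": "https://c.example/", "technology": "Other CMS"},
]

# Hypothetical CrUX-style rows: whether the URL's LCP is considered "good".
vitals_rows = [
    {"url": "https://a.example/", "good_lcp": True},
    {"url": "https://b.example/", "good_lcp": False},
    {"url": "https://c.example/", "good_lcp": True},
    {"url": "https://d.example/", "good_lcp": True},  # no technology data
]


def passing_rate_by_segment(technology_rows, vitals_rows):
    """Inner-join both datasets on URL and return the LCP passing rate per segment."""
    good_lcp_by_url = {row["url"]: row["good_lcp"] for row in vitals_rows}
    totals = defaultdict(int)
    passing = defaultdict(int)
    for row in technology_rows:
        url = row["url"]
        if url not in good_lcp_by_url:
            continue  # only URLs present in both datasets are counted
        totals[row["technology"]] += 1
        if good_lcp_by_url[url]:
            passing[row["technology"]] += 1
    return {segment: 100 * passing[segment] / totals[segment] for segment in totals}


rates = passing_rate_by_segment(technology_rows, vitals_rows)
print(rates)  # {'WordPress': 50.0, 'Other CMS': 100.0}
```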

Both HTTP Archive and CrUX expose data aggregated on a monthly basis.

Joining data from HTTP Archive with data from CrUX is the foundation for tools like the Core Web Vitals Technology Report, which displays CWV passing rates for numerous technologies over time. The dashboard also includes WordPress-specific passing rates, which provide a quick overview of how WordPress sites are performing on the web. However, those numbers are quite broad, since the passing rates are based on all WordPress sites in the dataset, regardless of the version used or any other factors. Therefore, in order to assess the impact of a specific WordPress release such as 6.3, a more granular approach is needed.

Methodology

The WordPress 6.3 performance summary post highlighted two client-side performance enhancements as the main sources of the improved LCP performance: the optimizations of the emoji loader script (see #58472), and the lazy-loading fixes plus the newly added support for the fetchpriority attribute, which are closely related (see the WordPress 6.3 image performance enhancements post). To assess whether those enhancements resulted in the anticipated LCP improvement, two analyses were conducted specific to those efforts.
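As a rough illustration of the condition the image enhancements aim for (an LCP image that is not lazy-loaded and that carries fetchpriority="high"), the check can be sketched as follows. The function name and the representation of the image as a plain attribute dictionary are assumptions made for this example; this is not how the analyses or WordPress core detect it.

```python
def lcp_image_issues(attributes):
    """Return the problems found on an LCP image's attributes (empty if none)."""
    issues = []
    # The LCP image should load eagerly, so loading="lazy" counts as a problem.
    if attributes.get("loading", "").lower() == "lazy":
        issues.append("lazy-loaded")
    # The LCP image should be prioritized via fetchpriority="high".
    if attributes.get("fetchpriority", "").lower() != "high":
        issues.append("missing fetchpriority=high")
    return issues


# Hypothetical attribute sets for an LCP image before and after the fixes.
print(lcp_image_issues({"src": "hero.jpg", "loading": "lazy"}))
print(lcp_image_issues({"src": "hero.jpg", "fetchpriority": "high"}))
```

The first call reports both problems, while the second returns an empty list because the image is neither lazy-loaded nor missing the attribute.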

Additionally, a broader analysis was conducted to compare the LCP performance of WordPress 6.3 and WordPress 6.2 sites overall, as well as their Time to First Byte (TTFB) performance, which directly impacts LCP as well. While it is impossible to directly connect the results of a broader analysis like this one to specific enhancements or fixes that launched as part of the release, it is crucial to also look at the performance impact as a whole to get an idea of how successful the release is at scale, regardless of how a specific feature is being used.

The analyses were conducted by running various BigQuery queries against the intersection of HTTP Archive and CrUX datasets, specifically zooming in on only the sites that were using WordPress 6.2 in July 2023 and WordPress 6.3 in August 2023. To present the approach, queries, and results transparently, the research tool Colab was used.
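The actual cohort selection happens in the BigQuery queries in the Colabs. Purely to illustrate the criterion, here is a minimal sketch that assumes two hypothetical mappings from URL to the WordPress version detected in each month; all names and sample values are made up for this example.

```python
def is_release(version, branch):
    """True if the version belongs to the branch, e.g. "6.2" or "6.2.2" for "6.2"."""
    return version == branch or version.startswith(branch + ".")


def upgraded_cohort(july_versions, august_versions):
    """Return URLs detected on 6.2.x in July and on 6.3.x in August, sorted."""
    return sorted(
        url
        for url, version in july_versions.items()
        if is_release(version, "6.2")
        and is_release(august_versions.get(url, ""), "6.3")
    )


# Hypothetical detections of the WordPress version per URL and month.
july_versions = {
    "https://a.example/": "6.2.2",
    "https://b.example/": "6.2",
    "https://c.example/": "6.1.3",
    "https://d.example/": "6.2.2",
}
august_versions = {
    "https://a.example/": "6.3",
    "https://b.example/": "6.2.2",  # did not update yet
    "https://c.example/": "6.3",  # was not on 6.2 in July
    # d.example is missing from the August data
}

print(upgraded_cohort(july_versions, august_versions))  # ['https://a.example/']
```

Restricting the comparison to the same URLs in both months keeps the two groups comparable, although it cannot rule out other changes to those sites.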

The links below point to the three Colabs with the analyses. They are quite detailed, so for a quick summary you may want to continue reading this post first. Please feel free to dive into the individual Colabs and their details, which you can also use to validate the summary below. Potentially you will find other notable metrics to highlight, or additional conclusions to draw.

It should be noted that any field metrics need to be interpreted carefully as they always contain some degree of noise. Websites change over time in many ways, and it is impossible to eliminate external factors from the data. For example, a WordPress site may be slower with WordPress 6.3 than it was in 6.2 simply because it activated a new plugin in the meantime that impacts performance. Such scenarios cannot be reliably detected and are therefore part of the metrics as well. Fortunately, the number of WordPress sites in the dataset is quite large: Looking at only the WordPress sites in the dataset that match the aforementioned criteria, we are looking at more than 500,000 WordPress home page URLs. This means that such specific side effects of individual sites usually have only negligible impact when looking at the overall data. Still, this is something to keep in mind: While field data is the closest there is available to assess the actual performance impact of a change, field data cannot be used to confidently claim that something is true or false; it has to be interpreted.

Conclusion

The large positive LCP impact confirms that the 6.3 release is an important milestone for WordPress performance. The numbers are particularly impressive on the sites for which the lazy-loading behavior was fixed and where fetchpriority support was correctly added. This shows the potential vertical impact that a few specific changes like that can have. Of course the overall LCP improvements are not as high, but it confirms this is a large opportunity: By further improving the heuristics so that they apply correctly to more WordPress sites, the horizontal impact of the change can be increased so that in the future the large LCP benefits may scale to even more sites.

Another meta observation worth noting is that the LCP passing rate improvements in WordPress 6.3 compared to 6.2 for the correct behavior above (16-21% higher LCP passing rate) are actually not too far off from the lab benchmarks measured for 6.3 a few months ago (18-27% faster LCP). This makes sense, given that for lab benchmarks the test site was a simulated scenario where lazy-loading and fetchpriority were behaving correctly. It is great to know that the lab benchmarks carry some weight even when compared to the field impact.

Last but not least, there are also two points to be highlighted which show that there is still room for improvement:

  • The accuracy with which fetchpriority="high" is applied to the LCP image is only around 50% across all scenarios. While this is okay for the newly added support of the attribute, it is clearly something to follow up on. Getting the heuristics for applying fetchpriority right is even more challenging than not lazy-loading the LCP image, especially since the LCP image may differ between different viewports, but it’s safe to say there should be more that WordPress core can do in that area. At least, it is a relief to see that the negative LCP impact of adding fetchpriority="high" to the wrong image is fairly low, compared to the negative LCP impact of lazy-loading the LCP image. See the source for fetchpriority accuracy against the LCP image and the source for LCP passing rate changes for sites that no longer lazy-load LCP image but use fetchpriority incorrectly.
  • At a higher level, the Time to First Byte (TTFB) passing rate is not seeing much of an improvement and in parts is even regressing: For mobile viewports, the TTFB passing rate is improving by 1.6-1.7%, while for desktop viewports it is regressing by ~4.9% for classic theme sites and ~9% for block theme sites. It’s impossible to connect that to specific changes that landed in WordPress 6.3, and as mentioned before it could be affected by external factors, but it makes clear that server-side performance needs to continue to be a point of focus. See the source for overall TTFB passing rate changes.

Please feel free to take a closer look at the analyses and leave your feedback as comments on this post. Additional thoughts, observations and questions are much appreciated.

Props @joemcgill @westonruter for proofreading.
