Profiling a web engine

One topic that interests me endlessly is profiling. I’ve covered this topic many times in this blog, but not enough to risk sounding like a broken record yet. So here we are again!

Not everyone may know this but GNOME has its own browser, Web (a.k.a. Epiphany, or Ephy for the intimates). It’s a fairly old project, descendant of Galeon. It uses the GTK port of WebKit as its web engine.

The recent announcement that WebKit on Linux (both WebKitGTK and WPE WebKit) switched to Skia for rendering brought with it a renewed interest in measuring the performance of WebKit.

And that was only natural; prior to that, WebKit on Linux was using Cairo, which is entirely CPU-based, whereas Skia had both CPU and GPU-based rendering easily available. The CPU renderer mostly matches Cairo in terms of performance and resource usage. Thus one of the big promises of switching to Skia was better hardware utilization and better overall performance by switching to the GPU renderer.

A Note About Cairo

Even though nowadays we often talk about Cairo as a legacy piece of software, there’s no denying that Cairo is really good at what it does. Cairo can and often is extremely fast at 2D rendering on the CPU, specially for small images with simple rendering. Cairo has received optimizations and improvements for this specific use case for almost 20 years, and it is definitely not a low bar to beat.

I think it’s important to keep this in mind because, as tempting as it may sound, simply switching to use GPU rendering doesn’t necessarily imply better performance.

Guesswork is a No-No

Optimizations should always be a byproduct of excellent profiling. Categorically speaking, meaningful optimizations are a consequence of instrumenting the code so much that the bottlenecks become obvious.

I think the most important and practical lesson I’ve learned is: when I’m guessing what are the performance issues of my code, I will be wrong pretty much 100% of the time. The only reliable way to optimize anything is to have hard data about the behavior of the app.

I mean, so many people – myself included – were convinced that GNOME Software was slow due to Flatpak that nobody thought about looking at app icons loading.

Enter the Profiler

Thanks to the fantastic work of Søren Sandmann, Christian Hergert, et al, we have a fantastic modern system profiler: Sysprof.

Sysprof offers a variety of instruments to profile the system. The most basic one uses perf to gather stack traces of the processes that are running. Sysprof also supports time marks, which allow plotting specific events and timings in a timeline. Sysprof also offers extra instrumentation for more specific metrics, such as network usage, graphics, storage, and more.

  • Screenshot of Sysprof's callgraph view
  • Screenshot of Sysprof's flamegraphs view
  • Screenshot of Sysprof's mark chart view
  • Screenshot of Sysprof's waterfall view

All these metrics are super valuable when profiling any app, but they’re particularly useful for profiling WebKit.

One challenging aspect of WebKit is that, well, it’s not exactly a small project. A WebKit build can easily take 30~50min. You need a fairly beefy machine to even be able to build a debug build of WebKit. The debug symbols can take hundreds of megabytes. This makes WebKit particularly challenging to profile.

Another problem is that Sysprof marks require integration code. Apps have to purposefully link against, and use, libsysprof-capture to send these marks to Sysprof.

Integrating with Sysprof

As a first step, Adrian brought the libsysprof-capture code into the WebKit tree. As libsysprof-capture is a static library with minimal dependencies, this was relatively easy. We’re probably going to eventually remove the in-tree copy and switch to host system libsysprof-capture, but having it in-tree was enough to kickstart the whole process.

Originally I started sprinkling Sysprof code all around the WebKit codebase, and to some degree, it worked. But eventually I learned that WebKit has its own macro-based tracing mechanism that is only ever implemented for Apple builds.

Looking at it, it didn’t seem impossible to implement these macros using Sysprof, and that’s what I’ve been doing for the past few weeks. The review was lengthy but behold, WebKit now reports Sysprof marks!

Screenshot of Sysprof with WebKit marks highlighted

Right now these marks cover a variety of JavaScript events, layout and rendering events, and web page resources. This all came for free from integrating with the preexisting tracing mechanism!

This gives us a decent understanding of how the Web process behaves. It’s not yet complete enough, but it’s a good start. I think the most interesting data to me is correlating frame timings across the whole stack, from the kernel driver to the compositor to GTK to WebKit’s UI process to WebKit’s Web process, and back:

Screenshot of Sysprof with lots of compositor and GTK and WebKit marks

But as interesting as it may be, oftentimes the fun part of profiling is being surprised by the data you collect.

For example, in WebKit, one specific, seemingly innocuous, completely bland method is in the top 3 of the callgraph chart:

Screenshot of Sysprof showing the callgraph view with an interesting result highlighted

Why is WebCore::FloatRect::contains so high in the profiling? That’s what I’m investigating right now. Who guessed this specific method would be there? Nobody, as far as I know.

Once this is out in a stable release, anyone will be able to grab a copy of GNOME Web, and run it with Sysprof, and help find out any performance issues that only reproduce in particular combinations of hardware.

Next Plans

To me this already is a game changer for WebKit, but of course we can do more. Besides the rectangular surprise, and one particular slowdown that comes from GTK loading Vulkan on startup, no other big obvious data point popped up. Specially in the marks, I think their coverage is still fairly small compared to what it could have been.

We need more data.

Some ideas that are floating right now:

  • Track individual frames and correlate them with Sysprof marks
  • Measure top-to-bottom-to-top latency
  • Measure input delay
  • Integrate with multimedia frames

Perhaps this will allow us to make WebKit the prime web engine for Linux, with top-tier performance, excellent system integration, and more. Maybe we can even redesign the whole rendering architecture of WebKit on Linux to be more GPU friendly now. I can dream high, can’t I? 🙂

In any case, I think we have a promising and exciting time ahead for WebKit on Linux!


One response to “Profiling a web engine”

  1. Zaedus Avatar

    This is so awesome to read about, and even cooler to see the ENTIRE call stack all the way down the compositor. There’s no doubt that this is going to be huge.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.