Progressively enhanced caching of JavaScript modules without bundling using import maps

I went to ffconf 2022 a couple of weeks ago, and two of the talks in particular resonated with me (more did, really, but these two felt actionable).

I really like having my own place on the web, and I've already put a fairly substantial amount of effort into making it as gentle on the environment as possible:

  • Semantic markup without lots of nesting reduces the size of HTML over the wire (and probably makes it slightly less demanding for the browser to parse).
  • Few images, served in the most efficient formats supported by the browser making the request.
  • Lazy-loaded images. This is important for my notes stream.
  • CSS is minified, hashed, and has immutable cache headers. This means that your browser will make a single network request for it and reuse it every time you visit, unless your browser evicts it from the cache, or the CSS changes (which is reflected by a change of the hash in the CSS file name).
  • The site uses a static site generator, so requests for pages resolve to files which are easily cached. There's no database serving queries behind the scenes.
  • As little JS as possible. Most pages have only enough JS to register a service worker as a progressive enhancement for offline access (see the sketch after this list). These pages work with JS disabled (you're reading one now).
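
That service worker registration is itself tiny and guarded. A minimal sketch (the worker path here is hypothetical):

if ('serviceWorker' in navigator) {
  // Register only when supported; the page works fine without it.
  navigator.serviceWorker.register('/service-worker.js');
}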

I was inspired to put some time into reducing my footprint even further!

Caching JavaScript

The small elephant in the room is my JavaScript experiments (for example, generative art stuff). I like to build with small, reusable modules, and until now I haven't bothered bundling. The most common solution is to bundle the JS for each page, put a hash of the content in the file name, and serve it with immutable[1] cache headers and a long max-age, like the CSS mentioned above.
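
For reference, such a response header looks something like this (a year-long max-age is typical):

Cache-Control: public, max-age=31536000, immutable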

Why though? Once cached, an immutably cached resource triggers no further network requests. That means fewer requests (especially on subsequent page views), less energy used to transmit files, and less opportunity for network latency or failure to delay or break features which rely on JS. It's also kinder to mobile users and their data plans.

For this little site the savings are going to be pretty modest, but it proves that the approach works. Sites which use a lot more JS with frequent small changes stand to benefit a lot more.

Bundling versus small modules

I'd still rather not bundle, though. The pool of small modules I've written for my experiments is intended for reuse. The bundle for each page would significantly overlap with the bundles of other pages, but each would be cached independently. A single-character change to any module would invalidate every bundle which includes it.

It would be far better to cache these modules independently. Consider this tree of dependencies (imagine that /a.js is the entry point):

/a.js
├── /b.js
└── /c.js
    ├── /d.js
    │   └── /e.js
    └── /e.js

This is more or less how all browsers saw JS on this site until recently. It suffers from some problems. You can cache using ETags, but each module still costs a conditional request to revalidate, and that's about as far as you can get without transforming the JS in the modules. The browser also only discovers each dependency as it encounters it in import statements. We could do some preloading, but whatever we do it's still a request per module, even when the module has been downloaded before and nothing has changed.

We could try hashing the content of each file and putting the hash in the file name (the cache-busting pattern):

/a-63efa7.js
├── /b-913d04.js
└── /c-f61966.js
    ├── /d-732056.js
    │   └── /e-d2df5d.js
    └── /e-d2df5d.js

This lets us cache each file immutably. If there is a change to a file, then the hash changes and the new file is downloaded.

But there's a problem with this... when a file changes, its name will change. When its name changes, any files which depend on it will change too, since their imports need to be updated! For example, if /e.js changes, the imports in /c.js and /d.js must be updated, which leads to a change in /a.js too. The changes cascade all the way from the updated module to the top!

/a-5b5be8.js
├── /b-913d04.js
└── /c-7534a8.js
    ├── /d-25255e.js
    │   └── /e-bb438d.js
    └── /e-bb438d.js

This is a problem. I don't want to invalidate caching for modules which have no meaningful code changes.

Import maps provide a way out of this quandary. They let us tell the browser how to resolve imports. This means that instead of rewriting the imports of each module, we keep them as they originally were. The engine looks in the import map and resolves accordingly. Changing the content of a module will mean only its entry in the import map gets updated.

{
  "imports": {
    "/a.js": "/a-5b5be8.js",
    "/b.js": "/b-913d04.js",
    "/c.js": "/c-7534a8.js",
    "/d.js": "/d-25255e.js",
    "/e.js": "/e-bb438d.js"
  }
}

For caching this is really nice. Your whole JS application can be cached, and whenever a module is updated the browser only needs to make a request for the updated content for that one module.
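
In the page itself, the import map sits in an inline script element before any module scripts, and the entry script tag points directly at the hashed file (a sketch using the example names above, since import maps don't apply to script src attributes):

<script type="importmap">
{
  "imports": {
    "/a.js": "/a-5b5be8.js",
    "/b.js": "/b-913d04.js",
    "/c.js": "/c-7534a8.js",
    "/d.js": "/d-25255e.js",
    "/e.js": "/e-bb438d.js"
  }
}
</script>
<script type="module" src="/a-5b5be8.js"></script>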

Implementation

For a static site generator the setup is:

  1. Copy the files with their original names into the target directory.
  2. Copy each file a second time to the same target, but include a prefix and the hash in the file name.
  3. Add an import map to each page which uses JS.
  4. Update the script entry point to use the hashed file.

The first point allows browsers which don't understand import maps yet to work as they did before. Temporary redirects would work too, but they mean more requests. The prefix on the files with content hashes in their names allows immutable cache headers to be added when they're served (the second point).
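
A minimal sketch of those first two points, assuming Node (the helper name, hash algorithm, and six-character hash length are my guesses, not the exact build code):

import { createHash } from 'node:crypto';
import { readFile, writeFile } from 'node:fs/promises';
import path from 'node:path';

// Copy a module under its original name, then again with a hashed-
// prefix and a short content hash in the file name.
async function copyWithHash(sourcePath, targetDir) {
  const content = await readFile(sourcePath);
  const hash = createHash('sha256').update(content).digest('hex').slice(0, 6);
  const { name, ext } = path.parse(sourcePath);

  await writeFile(path.join(targetDir, name + ext), content);

  const hashedFileName = `hashed-${name}-${hash}${ext}`;
  await writeFile(path.join(targetDir, hashedFileName), content);

  return hashedFileName;
}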

The last point was unexpected, but it's to spec (not just a Chrome quirk): import maps resolve module specifiers in import statements, not URLs in HTML, so the src of the entry script element has to point at the hashed file directly. Browsers which don't understand import maps actually benefit from this change, because they can at least cache the entry point.

The third point hides a lot of pain. As I add more experiments, I don't want the import maps of other pages to grow, so I don't want just one import map for all pages, but rather an import map for each page.

When the files are copied and hashed, a big map of every module is generated. The code I've written also parses (but does not modify) each module to determine its dependencies.

{
  "/a.js": {
    "hashedFileName": "/hashed-a-5b5be8.js",
    "dependencies": ["/b.js", "/c.js"]
  },
  "/b.js": {
    "hashedFileName": "/hashed-b-913d04.js",
    "dependencies": []
  },
  "/c.js": {
    "hashedFileName": "/hashed-c-7534a8.js",
    "dependencies": ["/d.js", "e.js"]
  },
  "/d.js": {
    "hashedFileName": "/hashed-d-25255e.js",
    "dependencies": ["e.js"]
  },
  "/e.js": {
    "hashedFileName": "/hashed-e-bb438d.js",
    "dependencies": []
  }
}

When the map for a page is generated, the entry point is looked up in the big map, and its dependencies are recursively added too. The hashed entry point is used in the entry script tag.
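
A sketch of that walk, assuming the big map above is available as manifest (the function name is mine):

// Build the import map for a page by walking the manifest
// breadth-first from the page's entry point.
function buildImportMap(manifest, entryPoint) {
  const imports = {};
  const queue = [entryPoint];

  while (queue.length) {
    const modulePath = queue.shift();

    if (imports[modulePath]) {
      continue; // Already visited (modules may share dependencies).
    }

    imports[modulePath] = manifest[modulePath].hashedFileName;
    queue.push(...manifest[modulePath].dependencies);
  }

  return { imports };
}

Called as buildImportMap(manifest, '/a.js'), this produces an import map like the one shown earlier (with the hashed- prefixed file names).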

Content Security Policy

While the import map specification allows import maps to be external to the HTML of a page, no browsers currently implement this[2]. The HTML pages of my static site have quite strict Content Security Policy (CSP) headers. These headers prevent the execution of inline scripts and inline CSS, and I prefer not to relax them.

The way to resolve the conflict is to add the hash of the import map to the CSP header of a page. This is like the server telling the browser: you can't run scripts inlined in the page, except for those matching this hash.
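
The hash is a base64 encoded SHA-256 digest of the exact text between the script tags. A sketch of computing the CSP source expression in Node (the function is mine):

import { createHash } from 'node:crypto';

// Hash the inline import map exactly as it appears between the
// <script type="importmap"> tags.
function cspSourceFor(inlineScriptText) {
  const digest = createHash('sha256').update(inlineScriptText, 'utf8').digest('base64');

  return `'sha256-${digest}'`;
}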

I once used Netlify edge functions to accomplish this[3], but now I template a headers file with an entry for each HTML page which needs a custom CSP header for an import map hash.
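
Since the site is hosted on Netlify, that templated file can be a _headers file with an entry per page (the path and hash below are placeholders):

/experiments/some-page/
  Content-Security-Policy: script-src 'self' 'sha256-<hash of the import map>'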

Preloading

One problem with lots of small modules I've not addressed yet is that on first load, the browser only discovers what it needs to fetch as it reads the imports of each module.

Since the import maps are created specifically for each page, their values are exactly the list of hashed file URLs the page will need. It's possible to preload those modules by adding module preload links in the head. As with the entry point, the browser won't consult the import map for URLs in HTML, so these links must reference the hashed file names directly.

<link rel="modulepreload" href="/hashed-every-90ec60.js">
<link rel="modulepreload" href="/hashed-javascript-dc77dc.js">
<link rel="modulepreload" href="/hashed-module-8d541a.js">
<link rel="modulepreload" href="/hashed-needed-54bb3e.js">
<link rel="modulepreload" href="/hashed-later-05fbf2.js">

The solution isn't perfect though. A hypothetical problem (there are no browsers like this at the time of writing) is when module preload is supported, but import maps are not. This would lead to modules being fetched twice (hashed and original file names). One way out of that would be to have the original filenames respond with a temporary redirect to the file names with the hashes in.
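
On Netlify such redirects could live in a _redirects file, one line per module (a sketch using the earlier example names):

/e.js /hashed-e-bb438d.js 302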

One potential problem is the head of a document growing very large due to lots of modules mapping to lots of links. In my case each experiment uses a handful of modules, so the increase in the size of the head is not problematic. The module preloads spec allows (but does not require) the browser to resolve the child imports of preloaded modules, so a middle ground may be to preload just some modules.

One small saving I make is to exclude preload links for scripts which will appear in script tags. This includes the index.js file which applies to all pages (mostly there to install a service worker), the entry point(s) of any experiments, and the ruby state persistence script on any page with ruby annotations. This means that most pages don't have an import map or any preload links.
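
A sketch of generating the preload links from a page's import map while skipping script tag sources (the function is mine):

// Turn a page's import map into modulepreload links, excluding
// modules which already appear as script tag sources.
function preloadLinks(importMap, scriptSources) {
  return Object.values(importMap.imports)
    .filter(href => !scriptSources.includes(href))
    .map(href => `<link rel="modulepreload" href="${href}">`)
    .join('\n');
}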

Interactions with the service worker

When a file is served with the immutable cache-control header, it'll either be in the browser cache or it won't be (and will trigger a request to populate the cache). The service worker isn't needed for such files. It's only there to make decisions about more weakly cached files when browsing offline.

The service worker intercepts all requests, but we don't want it to handle ones cached immutably. By returning early from the fetch event handler without calling respondWith, the handler delegates the request back to the browser, and the browser cache is used as if the service worker weren't there at all.

Since I'm prefixing immutable resources with hashed-, I can check the URL of the request in the fetch event and use the prefix to know not to continue.

addEventListener('fetch', fetchEvent => {
  'use strict';

  // Immutable resources carry a hashed- prefix in their file names.
  const isHashed = fetchEvent.request.url
    .split('/')
    .pop()
    .startsWith('hashed-');

  if (isHashed) {
    return; // Delegate to the browser and its cache.
  }

  // ...
});

I like this. My site only uses a service worker to handle caching, and I'd rather not have one at all. The less it does the better! In an ideal world, all resources would be immutable, and only the HTML document would not be. In such a world, a request for a page resulting in a 304 (not modified) response would also mean that all resources are already cached, and no further requests are needed.

Conclusion and future work

When I first published this article, the only browsers to support import maps were those based on Chrome. Firefox shipped support about a month later, and Safari finally followed in March 2023. The big browsers now all support import maps!

This work so far handles all JavaScript files I deploy except for the service worker. The service worker may be tricky to handle (the URL of the service worker script is important for the running worker and its cache). I'll update this post as I improve coverage.

For smaller applications which differ in composition from page to page (while sharing many source modules) I think this solution has advantages over bundling. There's less to cache (eliminates bundle overlap) and less to fetch when there are changes (only fetch changed modules, not a whole bundle).


  1. Chrome doesn't support immutable, but since it's paired with a long max-age the result in Chrome is very similar.
  2. It's possible to dynamically create an importmap element with an external script, but that element will be considered an inline script and banned by the CSP!
  3. I used to place the hash of the import map in each HTML page, and use a Netlify edge function with an elaborate caching strategy to pluck the hash out and append CSP headers per page (the function would run once per page per deployment, to avoid unnecessary invocations and cost), but in the end edge functions turned out to be poorly documented and buggy (they broke on more than one occasion through no action on my part).
