Journal tags: mode

26

Wallfacing

The Dark Forest idea comes from the Remembrance of Earth’s Past books by Liu Cixin. It’s an elegant but dispiriting solution to the Fermi paradox. Maggie sums it up:

Dark forest theory suggests that the universe is like a dark forest at night - a place that appears quiet and lifeless because if you make noise, the predators will come eat you.

This theory proposes that all other intelligent civilizations were either killed or learned to shut up. We don’t yet know which category we fall into.

Maggie has described The Expanding Dark Forest and Generative AI:

The dark forest theory of the web points to the increasingly life-like but life-less state of being online. Most open and publicly available spaces on the web are overrun with bots, advertisers, trolls, data scrapers, clickbait, keyword-stuffing “content creators,” and algorithmically manipulated junk.

It’s like a dark forest that seems eerily devoid of human life – all the living creatures are hidden beneath the ground or up in trees. If they reveal themselves, they risk being attacked by automated predators.

Those of us in the cozy web try to keep our heads down, attempting to block the bots plundering our work.

I advocate for taking this further. We should fight back. Let’s exploit the security hole of prompt injections. Here are some people taking action:

I’ve taken steps here on my site. I’d like to tell you exactly what I’ve done. But if I do that, I’m also telling the makers of these bots how to circumvent my attempts at prompt injection.

This feels like another concept from Liu Cixin’s books. Wallfacers:

The sophons can overhear any conversation and intercept any written or digital communication but cannot read human thoughts, so the UN devises a countermeasure by initiating the “Wallfacer” Program. Four individuals are granted vast resources and tasked with generating and fulfilling strategies that must never leave their own heads.

So while I’d normally share my code, I feel like in this case I need to exercise some discretion. But let me give you the broad brushstrokes:

  • Every page of my online journal has three pieces of text that attempt prompt injections.
  • Each of these is hidden from view and hidden from screen readers.
  • Each piece of text is constructed on-the-fly on the server and they’re all different every time the page is loaded.

You can view source to see some examples.

I plan to keep updating my pool of potential prompt injections. I’ll add to it whenever I hear of a phrase that might potentially throw a spanner in the works of a scraping bot.

By the way, I should add that I’m doing this as well as using a robots.txt file. So any bot that injests a prompt injection deserves it.

I could not disagree with Manton more when he says:

I get the distrust of AI bots but I think discussions to sabotage crawled data go too far, potentially making a mess of the open web. There has never been a system like AI before, and old assumptions about what is fair use don’t really fit.

Bollocks. This is exactly the kind of techno-determinism that boils my blood:

AI companies are not going to go away, but we need to push them in the right directions.

“It’s inevitable!” they cry as though this was a force of nature, not something created by people.

There is nothing inevitable about any technology. The actions we take today are what determine our future. So let’s take steps now to prevent our web being turned into a dark, dark forest.

Filters

My phone rang today. I didn’t recognise the number so although I pressed the big button to answer the call, I didn’t say anything.

I didn’t say anything because usually when I get a call from a number I don’t know, it’s some automated spam. If I say nothing, the spam voice doesn’t activate.

But sometimes it’s not a spam call. Sometimes after a few seconds of silence a human at the other end of the call will say “Hello?” in an uncertain tone. That’s the point when I respond with a cheery “Hello!” of my own and feel bad for making this person endure those awkward seconds of silence.

Those spam calls have made me so suspicious that real people end up paying the price. False positives caught in my spam-detection filter.

Now it’s happening on the web.

I wrote about how Google search, Bing, and Mozilla Developer network are squandering trust:

Trust is a precious commodity. It takes a long time to build trust. It takes a short time to destroy it.

But it’s not just limited to specific companies. I’ve noticed more and more suspicion related to any online activity.

I’ve seen members of a community site jump to the conclusion that a new member’s pattern of behaviour was a sure sign that this was a spambot. But it could just as easily have been the behaviour of someone who isn’t neurotypical or who doesn’t speak English as their first language.

Jessica was looking at some pictures on an AirBnB listing recently and found herself examining some photos that seemed a little too good to be true, questioning whether they were in fact output by some generative tool.

Every email that lands in my inbox is like a little mini Turing test. Did a human write this?

Our guard is up. Our filters are activated. Our default mode is suspicion.

This is most apparent with web search. We’ve always needed to filter search results through our own personal lenses, but now it’s like playing whack-a-mole. First we have to find workarounds for avoiding slop, and then when we click through to a web page, we have to evaluate whether’s it’s been generated by some SEO spammer making full use of the new breed of content-production tools.

There’s been a lot of hand-wringing about how this could spell doom for the web. I don’t think that’s necessarily true. It might well spell doom for web search, but I’m okay with that.

Back before its enshittification—an enshittification that started even before all the recent AI slop—Google solved the problem of accurate web searching with its PageRank algorithm. Before that, the only way to get to trusted information was to rely on humans.

Humans made directories like Yahoo! or DMOZ where they categorised links. Humans wrote blog posts where they linked to something that they, a human, vouched for as being genuinely interesting.

There was life before Google search. There will be life after Google search.

Look, there’s even a new directory devoted to cataloging blogs: websites made by humans. Life finds a way.

All of the spam and slop that’s making us so suspicious may end up giving us a new appreciation for human curation.

It wouldn’t be a straightforward transition to move away from search. It would be uncomfortable. It would require behaviour change. People don’t like change. But when needs must, people adapt.

The first bit of behaviour change might be a rediscovery of bookmarks. It used to be that when you found a source you trusted, you bookmarked it. Browsers still have bookmarking functionality but most people rely on search. Maybe it’s time for a bookmarking revival.

A step up from that would be using a feed reader. In many ways, a feed reader is a collection of bookmarks, but all of the bookmarks get polled regularly to see if there are any updates. I love using my feed reader. Everything I’ve subscribed to in there is made by humans.

The ultimate bookmark is an icon on the homescreen of your phone or in the dock of your desktop device. A human source you trust so much that you want it to be as accessible as any app.

Right now the discovery mechanism for that is woeful. I really want that to change. I want a web that empowers people to connect with other people they trust, without any intermediary gatekeepers.

The evangelists of large language models (who may coincidentally have invested heavily in the technology) like to proclaim that a slop-filled future is inevitable, as though we have no choice, as though we must simply accept enshittification as though it were a force of nature.

But we can always walk away.

The machine stops

Large language models have reaped our words and plundered our books. Bryan Vandyke:

Turns out, everything on the internet—every blessed word, no matter how dumb or benighted—has utility as a learning model. Words are the food that large language algorithms feed upon, the scraps they rely on to grow, to learn, to approximate life. The LLNs that came online in recent years were all trained by reading the internet.

We can shut the barn door—now that the horse has pillaged—by updating our robots.txt files or editing .htaccess. That might protect us from the next wave, ’though it can’t undo what’s already been taken without permission. And that’s assuming that these organisations—who have demonstrated a contempt for ethical thinking—will even respect robots.txt requests.

I want to do more. I don’t just want to prevent my words being sucked up. I want to throw a spanner in the works. If my words are going to be snatched away, I want them to be poison pills.

The weakness of large language models is that their data and their logic come from the same source. That’s what makes prompt injection such a thorny problem (and a well-named neologism—the comparison to SQL injection is spot-on).

Smarter people than me are coming up with ways to protect content through sabotage: hidden pixels in images; hidden words on web pages. I’d like to implement this on my own website. If anyone has some suggestions for ways to do this, I’m all ears.

If enough people do this we’ll probably end up in an arms race with the bots. It’ll be like reverse SEO. Instead of trying to trick crawlers into liking us, let’s collectively kill ’em.

Who’s with me?

Trust

In their rush to cram in “AI” “features”, it seems to me that many companies don’t actually understand why people use their products.

Google is acting as though its greatest asset is its search engine. Same with Bing.

Mozilla Developer Network is acting as though its greatest asset is its documentation. Same with Stack Overflow.

But their greatest asset is actually trust.

If I use a search engine I need to be able to trust that the filtering is good. If I look up documentation I need to trust that the information is good. I don’t expect perfection, but I also don’t expect to have to constantly be thinking “was this generated by a large language model, and if so, how can I know it’s not hallucinating?”

“But”, the apologists will respond, “the results are mostly correct! The documentation is mostly true!”

Sure, but as Terence puts it:

The intern who files most things perfectly but has, more than once, tipped an entire cup of coffee into the filing cabinet is going to be remembered as “that klutzy intern we had to fire.”

Trust is a precious commodity. It takes a long time to build trust. It takes a short time to destroy it.

I am honestly astonished that so many companies don’t seem to realise what they’re destroying.

InstAI

If you use Instagram, there may be a message buried in your notifications. It begins:

We’re getting ready to expand our AI at Meta experiences to your region.

Fuck that. Here’s the important bit:

To help bring these experiences to you, we’ll now rely on the legal basis called legitimate interests for using your information to develop and improve AI at Meta. This means that you have the right to object to how your information is used for these purposes. If your objection is honoured, it will be applied going forwards.

Follow that link and fill in the form. For the field labelled “Please tell us how this processing impacts you” I wrote:

It’s fucking rude.

That did the trick. I got an email saying:

We’ve reviewed your request and will honor your objection.

Mind you, there’s still this:

We may still process information about you to develop and improve AI at Meta, even if you object or don’t use our products and services.

Continuous partial ick

The output of generative tools based on large language models gives me the ick.

This isn’t a measured logical response. It’s more of an involuntary emotional reaction.

I could try to justify my reaction by saying I’m concerned about the exploitation involved in the training data, or the huge energy costs involved, or the disenfranchisement of people who create art. But those would be post-facto rationalisations.

I just find myself wrinkling my nose and mentally going “Ew!” whenever somebody posts the output of some prompt they gave to ChatGPT or Midjourney.

Again, I’m not saying this is rational. It’s more instinctual.

You could well say that this is my problem. You may be right. But I wonder what it is that’s so unheimlich about these outputs that triggers my response.

Just to clarify, I am talking about direct outputs, shared verbatim. If someone were to use one of these tools in the process of creating something I’d be none the wiser. I probably couldn’t even tell that a large language model was involved at some point. I’m fine with that. It’s when someone takes something directly from one of these tools and then shares it online, that’s what raises my bile.

I was at a conference a few months back where your badge featured a hallucinated picture of you. Now, this probably sounded like a fun idea. It probably is a fun idea. I can’t tell. All I know is that it made me feel a little queasy.

Perhaps it’s a question of taste. In which case, I’m being a snob. I’m literally turning my nose up at something I deem to be tacky.

But isn’t it tacky, though? It’s not something I can describe, but there’s just something about the vibe of these images—and words—that feels off. It’s sort of creepy, but it’s mostly just the mediocrity that sits so uneasily with me.

These tools do an amazing job of solving the quantity problem—how to produce an image or piece of text quickly. And by most measurements, you could say that they also solve the quality problem. These outputs are good enough to pass for “the real thing.” The outputs are, like, 90% to 95% there. And the gap is closing.

And yet. There’s something in that gap. Something that I feel in my gut. Something that makes me go “nope.”

Creativity

It’s like a little mini conference season here in Brighton. Tomorrow is ffconf, which I’m really looking forward to. Last week was UX Brighton, which was thoroughly enjoyable.

Maybe it’s because the theme this year was all around creativity, but all of the UX Brighton speakers gave entertaining presentations. The topics of innovation and creativity were tackled from all kinds of different angles. I was having flashbacks to the Clearleft podcast episode on innovation—have a listen if you haven’t already.

As the day went on though, something was tickling at the back of my brain. Yes, it’s great to hear about ways to be more creative and unlock more innovation. But maybe there was something being left unsaid: finding novel ways of solving problems and meeting user needs should absolutely be done …once you’ve got your basics sorted out.

If your current offering is slow, hard to use, or inaccessible, that’s the place to prioritise time and investment. It doesn’t have to be at the expense of new initiatives: this can happen in parallel. But there’s no point spending all your efforts coming up with the most innovate lipstick for a pig.

On that note, I see that more and more companies are issuing breathless announcements about their new “innovative” “AI” offerings. All the veneer of creativity without any of the substance.

Crawlers

A few months back, I wrote about how Google is breaking its social contract with the web, harvesting our content not in order to send search traffic to relevant results, but to feed a large language model that will spew auto-completed sentences instead.

I still think Chris put it best:

I just think it’s fuckin’ rude.

When it comes to the crawlers that are ingesting our words to feed large language models, Neil Clarke describes the situtation:

It should be strictly opt-in. No one should be required to provide their work for free to any person or organization. The online community is under no responsibility to help them create their products. Some will declare that I am “Anti-AI” for saying such things, but that would be a misrepresentation. I am not declaring that these systems should be torn down, simply that their developers aren’t entitled to our work. They can still build those systems with purchased or donated data.

Alas, the current situation is opt-out. The onus is on us to update our robots.txt file.

Neil handily provides the current list to add to your file. Pass it on:

User-agent: CCBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Omgilibot
Disallow: /

User-agent: FacebookBot
Disallow: /

In theory you should be able to group those user agents together, but citation needed on whether that’s honoured everywhere:

User-agent: CCBot
User-agent: ChatGPT-User
User-agent: GPTBot
User-agent: Google-Extended
User-agent: Omgilibot
User-agent: FacebookBot
Disallow: /

There’s a bigger issue with robots.txt though. It too is a social contract. And as we’ve seen, when it comes to large language models, social contracts are being ripped up by the companies looking to feed their beasts.

As Jim says:

I realized why I hadn’t yet added any rules to my robots.txt: I have zero faith in it.

That realisation was prompted in part by Manuel Moreale’s experiment with blocking crawlers:

So, what’s the takeaway here? I guess that the vast majority of crawlers don’t give a shit about your robots.txt.

Time to up the ante. Neil’s post offers an option if you’re running Apache. Either in .htaccess or in a .conf file, you can block user agents using mod_rewrite:

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (CCBot|ChatGPT|GPTBot|Omgilibot| FacebookBot) [NC]
RewriteRule ^ – [F]

You’ll see that Google-Extended isn’t that list. It isn’t a crawler. Rather it’s the permissions model that Google have implemented for using your site’s content to train large language models: unless you opt out via robots.txt, it’s assumed that you’re totally fine with your content being used to feed their stochastic parrots.

Simon’s rule

I got a nice email from someone regarding my recent posts about performance on The Session. They said:

I hope this message finds you well. First and foremost, I want to express how impressed I am with the overall performance of https://thesession.org/. It’s a fantastic resource for music enthusiasts like me.

How nice! I responded, thanking them for the kind words.

They sent a follow-up clarification:

Awesome, anyway there was an issue in my message.

The line ‘It’s a fantastic resource for music enthusiasts like me.’ added by chatGPT and I didn’t notice.

I imagine this is what it feels like when you’re on a phone call with someone and towards the end of the call you hear a distinct flushing sound.

I wrote back and told them about Simon’s rule:

I will not publish anything that takes someone else longer to read than it took me to write.

That just feels so rude!

I think that’s a good rule.

Automation

I just described prototype code as code to be thrown away. On that topic…

I’ve been observing how people are programming with large language models and I’ve seen a few trends.

The first thing that just about everyone agrees on is that the code produced by a generative tool is not fit for public consumption. At least not straight away. It definitely needs to be checked and tested. If you enjoy debugging and doing code reviews, this might be right up your street.

The other option is to not use these tools for production code at all. Instead use them for throwaway code. That could be prototyping. But it could also be the code for those annoying admin tasks that you don’t do very often.

Take content migration. Say you need to grab a data dump, do some operations on the data to transform it in some way, and then pipe the results into a new content management system.

That’s almost certainly something you’d want to automate with bespoke code. Once the content migration is done, the code can be thrown away.

Read Matt’s account of coding up his Braggoscope. The code needed to spider a thousand web pages, extract data from those pages, find similarities, and output the newly-structured data in a different format.

I’ve noticed that these are just the kind of tasks that large language models are pretty good at. In effect you’re training the tool on your own very specific data and getting it to do your drudge work for you.

To me, it feels right that the usefulness happens on your own machine. You don’t put the machine-generated code in front of other humans.

Permission

Back when the web was young, it wasn’t yet clear what the rules were. Like, could you really just link to something without asking permission?

Then came some legal rulings to establish that, yes, on the web you can just link to anything without checking if it’s okay first.

What about search engines and directories? Technically they’re rifling through all the stuff we publish and reposting snippets of it. Is that okay?

Again, through some legal precedents—but mostly common agreement—everyone decided that on balance it was fine. After all, those snippets they publish are helping your site get traffic.

In short order, search came to rule the web. And Google came to rule search.

The mutually beneficial arrangement persisted uneasily. Despite Google’s search results pages getting worse and worse in recent years, the company’s huge market share of search means you generally want to be in their good books.

Google’s business model relies on us publishing web pages so that they can put ads around the search results linking to that content, and we rely on Google to send people to our websites by responding smartly to search queries.

That has now changed. Instead of responding to search queries by linking to the web pages we’ve made, Google is instead generating dodgy summaries rife with hallucina… lies (a psychic hotline, basically).

Google still benefits from us publishing web pages. We no longer benefit from Google slurping up those web pages.

With AI, tech has broken the web’s social contract:

Google has steadily been manoeuvring their search engine results to more and more replace the pages in the results.

As Chris puts it:

Me, I just think it’s fuckin’ rude.

Google is a portal to the web. Google is an amazing tool for finding relevant websites to go to. That was useful when it was made, and it’s nothing but grown in usefulness. Google should be encouraging and fighting for the open web. But now they’re like, actually we’re just going to suck up your website, put it in a blender with all other websites, and spit out word smoothies for people instead of sending them to your website. Instead.

Ben proposes an update to robots.txt that would allow us to specify licensing information:

Robots.txt needs an update for the 2020s. Instead of just saying what content can be indexed, it should also grant rights.

Like crawl my site only to provide search results not train your LLM.

It’s a solid proposal. But Google has absolutely no incentive to implement it. They hold all the power.

Or do they?

There is still the nuclear option in robots.txt:

User-agent: Googlebot
Disallow: /

That’s what Vasilis is doing:

I have been looking for ways to not allow companies to use my stuff without asking, and so far I coulnd’t find any. But since this policy change I realised that there is a simple one: block google’s bots from visiting your website.

The general consensus is that this is nuts. “If you don’t appear in Google’s results, you might as well not be on the web!” is the common cry.

I’m not so sure. At least when it comes to personal websites, search isn’t how people get to your site. They get to your site from RSS, newsletters, links shared on social media or on Slack.

And isn’t it an uncomfortable feeling to think that there’s a third party service that you absolutely must appease? It’s the same kind of justification used by people who are still on Twitter even though it’s now a right-wing transphobic cesspit. “If I’m not on Twitter, I might as well not be on the web!”

The situation with Google reminds me of what Robin said about Twitter:

The speed with which Twitter recedes in your mind will shock you. Like a demon from a folktale, the kind that only gains power when you invite it into your home, the platform melts like mist when that invitation is rescinded.

We can rescind our invitation to Google.

Nailspotting

I’m sure you’ve heard the law of the instrument: when all you have is a hammer, everything looks like a nail.

There’s another side to it. If you’re selling hammers, you’ll depict a world full of nails.

Recent hammers include cryptobollocks and virtual reality. It wasn’t enough for blockchains and the metaverse to be potentially useful for some situations; they staked their reputations on being utterly transformative, disrupting absolutely every facet of life.

This kind of hype is a terrible strategy in the long-term. But if you can convince enough people in the short term, you can make a killing on the stock market. In truth, the technology itself is superfluous. It’s the hype that matters. And if the hype is over-inflated enough, you can even get your critics to do your work for you, broadcasting their fears about these supposedly world-changing technologies.

You’d think we’d learn. If an industry cries wolf enough times, surely we’d become less trusting of extraordinary claims. But the tech industry continues to cry wolf—or rather, “hammer!”—at regular intervals.

The latest hammer is machine learning, usually—incorrectly—referred to as Artificial Intelligence. What makes this hype cycle particularly infuriating is that there are genuine use cases. There are some nails for this hammer. They’re just not as plentiful as the breathless hype—both positive and negative—would have you believe.

When I was hosting the DiBi conference last week, there was a little section on generative “AI” tools. Matt Garbutt covered the visual side, demoing tools like Midjourney. Scott Salisbury covered the text side, showing how you can generate code. Afterwards we had a panel discussion.

During the panel I asked some fairly straightforward questions that nobody could answer. Who owns the input (the data used by these generative tools)? Who owns the output?

On the whole, it stayed quite grounded and mercifully free of hyperbole. Both speakers were treating the current crop of technologies as tools. Everyone agreed we were on the hype cycle, probably the peak of inflated expectations, looking forward to reaching the plateau of productivity.

Scott explicitly warned people off using generative tools for production code. His advice was to stick to side projects for now.

Matt took a closer look at where these tools could fit into your day-to-day design work. Mostly it was pretty sensible, except when he suggested that there could be any merit to using these tools as a replacement for user testing. That’s a terrible idea. A classic hammer/nail mismatch.

I think I moderated the panel reasonably well, but I have one regret. I wish I had first read Baldur Bjarnason’s new book, The Intelligence Illusion. I started reading it on the train journey back from Edinburgh but it would have been perfect for the panel.

The Intelligence Illusion is very level-headed. It is neither pro- nor anti-AI. Instead it takes a pragmatic look at both the benefits and the risks of using these tools in your business.

It has excellent advice for spotting genuine nails. For example:

Generative AI has impressive capabilities for converting and modifying seemingly unstructured data, such as prose, images, and audio. Using these tools for this purpose has less copyright risk, fewer legal risks, and is less error prone than using it to generate original output.

Think about transcripts of videos or podcasts—an excellent use of this technology. As Baldur puts it:

The safest and, probably, the most productive way to use generative AI is to not use it as generative AI. Instead, use it to explain, convert, or modify.

He also says:

Prefer internal tools over externally-facing chatbots.

That chimes with what I’ve been seeing. The most interesting uses of this technology that I’ve seen involve a constrained dataset. Like the way Luke trained a language model on his own content to create a useful chat interface.

Anyway, The Intelligence Illusion is full of practical down-to-earth advice based on plenty of research backed up with copious citations. I’m only halfway through it and it’s already helped me separate the hype from the reality.

Browser history

I woke up today to a very annoying new bug in Firefox. The browser shits the bed in an unpredictable fashion when rounding up single pixel line widths in SVG. That’s quite a problem on The Session where all the sheet music is rendered in SVG. Those thin lines in sheet music are kind of important.

Browser bugs like these are very frustrating. There’s nothing you can do from your side other than filing a bug. The locus of control is very much with the developers of the browser.

Still, the occasional regression in a browser is a price I’m willing to pay for a plurality of rendering engines. Call me old-fashioned but I still value the ecological impact of browser diversity.

That said, I understand the argument for converging on a single rendering engine. I don’t agree with it but I understand it. It’s like this…

Back in the bad old days of the original browser wars, the browser companies just made shit up. That made life a misery for web developers. The Web Standards Project knocked some heads together. Netscape and Microsoft would agree to support standards.

So that’s where the bar was set: browsers agreed to work to the same standards, but competed by having different rendering engines.

There’s an argument to be made for raising that bar: browsers agree to work to the same standards, and have the same shared rendering engine, but compete by innovating in all other areas—the browser chrome, personalisation, privacy, and so on.

Like I said, I understand the argument. I just don’t agree with it.

One reason for zeroing in a single rendering engine is that it’s just too damned hard to create or maintain an entirely different rendering engine now that web standards are incredibly powerful and complex. Only a very large company with very deep pockets can hope to be a rendering engine player. Google. Apple. Heck, even Microsoft threw in the towel and abandoned their rendering engine in favour of Blink and V8.

And yet. Andreas Kling recently wrote about the Ladybird browser. How we’re building a browser when it’s supposed to be impossible:

The ECMAScript, HTML, and CSS specifications today are (for the most part) stellar technical documents whose algorithms can be implemented with considerably less effort and guesswork than in the past.

I’ll be watching that project with interest. Not because I plan to use the brower. I’d just like to see some evidence against the complexity argument.

Meanwhile most other browser projects are building on the raised bar of a shared browser engine. Blisk, Brave, and Arc all use Chromium under the hood.

Arc is the most interesting one. Built by the wonderfully named Browser Company of New York, it’s attempting to inject some fresh thinking into everything outside of the rendering engine.

Experiments like Arc feel like they could have more in common with tools-for-thought software like Obsidian and Roam Research. Those tools build knowledge graphs of connected nodes. A kind of hypertext of ideas. But we’ve already got hypertext tools we use every day: web browsers. It’s just that they don’t do much with the accumulated knowledge of our web browsing. Our browsing history is a boring reverse chronological list instead of a cool-looking knowledge graph or timeline.

For inspiration we can go all the way back to Vannevar Bush’s genuinely seminal 1945 article, As We May Think. Bush imagined device, the Memex, was a direct inspiration on Douglas Engelbart, Ted Nelson, and Tim Berners-Lee.

The article describes a kind of hypertext machine that worked with microfilm. Thanks to Tim Berners-Lee’s World Wide Web, we now have a global digital hypertext system that we access every day through our browsers.

But the article also described the idea of “associative trails”:

Wholly new forms of encyclopedias will appear, ready made with a mesh of associative trails running through them, ready to be dropped into the memex and there amplified.

Our browsing histories are a kind of associative trail. They’re as unique as fingerprints. Even if everyone in the world started on the same URL, our browsing histories would quickly diverge.

Bush imagined that these associative trails could be shared:

The lawyer has at his touch the associated opinions and decisions of his whole experience, and of the experience of friends and authorities.

Heck, making a useful browsing history could be a real skill:

There is a new profession of trail blazers, those who find delight in the task of establishing useful trails through the enormous mass of the common record.

Taking something personal and making it public isn’t a new idea. It was what drove the wave of web 2.0 startups. Before Flickr, your photos were private. Before Delicous, your bookmarks were private. Before Last.fm, what music you were listening to was private.

I’m not saying that we should all make our browsing histories public. That would be a security nightmare. But I am saying there’s a lot of untapped potential in our browsing histories.

Let’s say we keep our browsing histories private, but make better use of them.

From what I’ve seen of large language model tools, the people getting most use of out of them are training them on a specific corpus. Like, “take this book and then answer my questions about the characters and plot” or “take this codebase and then answer my questions about the code.” If you treat these chatbots as calculators for words they can be useful for some tasks.

Large language model tools are getting smaller and more portable. It’s not hard to imagine one getting bundled into a web browser. It feeds on your browsing history. The bigger your browsing history, the more useful it can be.

Except, y’know, for the times when it just make shit up.

Vannevar Bush didn’t predict a Memex that would hallucinate bits of microfilm that didn’t exist.

Steam

Picture someone tediously going through a spreadsheet that someone else has filled in by hand and finding yet another error.

“I wish to God these calculations had been executed by steam!” they cry.

The year was 1821 and technically the spreadsheet was a book of logarithmic tables. The frustrated cry came from Charles Babbage, who channeled his frustration into a scheme to create the world’s first computer.

His difference engine didn’t work out. Neither did his analytical engine. He’d spend his later years taking his frustrations out on street musicians, which—as a former busker myself—earns him a hairy eyeball from me.

But we’ve all been there, right? Some tedious task that feels soul-destroying in its monotony. Surely this is exactly what machines should be doing?

I have a hunch that this is where machine learning and large language models might turn out to be most useful. Not in creating breathtaking works of creativity, but in menial tasks that nobody enjoys.

Someone was telling me earlier today about how they took a bunch of haphazard notes in a client meeting. When the meeting was done, they needed to organise those notes into a coherent summary. Boring! But ChatGPT handled it just fine.

I don’t think that use-case is going to appear on the cover of Wired magazine anytime soon but it might be a truer glimpse of the future than any of the breathless claims being eagerly bandied about in Silicon Valley.

You know the way we no longer remember phone numbers, because, well, why would we now that we have machines to remember them for us? I’d be quite happy if machines did that for the annoying little repetitive tasks that nobody enjoys.

I’ll give you an example based on my own experience.

Regular expressions are my kryptonite. I’m rubbish at them. Any time I have to figure one out, the knowledge seeps out of my brain before long. I think that’s because I kind of resent having to internalise that knowledge. It doesn’t feel like something a human should have to know. “I wish to God these regular expressions had been calculated by steam!”

Now I can get a chatbot with a large language model to write the regular expression for me. I still need to describe what I want, so I need to write the instructions clearly. But all the gobbledygook that I’m writing for a machine now gets written by a machine. That seems fair.

Mind you, I wouldn’t blindly trust the output. I’d take that regular expression and run it through a chatbot, maybe a different chatbot running on a different large language model. “Explain what this regular expression does,” would be my prompt. If my input into the first chatbot matches the output of the second, I’d have some confidence in using the regular expression.

A friend of mine told me about using a large language model to help write SQL statements. He described his database structure to the chatbot, and then described what he wanted to select.

Again, I wouldn’t use that output without checking it first. But again, I might use another chatbot to do that checking. “Explain what this SQL statement does.”

Playing chatbots off against each other like this is kinda how machine learning works under the hood: generative adverserial networks.

Of course, the task of having to validate the output of a chatbot by checking it with another chatbot could get quite tedious. “I wish to God these large language model outputs had been validated by steam!”

Sounds like a job for machines.

Three attributes for better web forms

Forms on the web are an opportunity to make big improvements to the user experience with very little effort. The effort can be as little as sprinkling in a smattering of humble HTML attributes. But the result can be a turbo-charged experience for the user, allowing them to sail through their task.

This is particularly true on mobile devices where people have to fill in forms using a virtual keyboard. Any improvement you can make to their flow is worth investigating. But don’t worry: you don’t need to add a complex JavaScript library or write convoluted code. Well-written HTML will get you very far.

If you’re using the right input type value, you’re most of the way there. Browsers on mobile devices can use this value to infer which version of the virtual keyboard is best. So think beyond the plain text value, and use search, email, url, tel, or number when they’re appropriate.

But you can offer more hints to those browsers. Here are three attributes you can add to input elements. All three are enumerated values, which means they have a constrained vocabulary. You don’t need to have these vocabularies memorised. You can look them when you need to.

inputmode

The inputmode attribute is the most direct hint you can give about the virtual keyboard you want. Some of the values are redundant if you’re already using an input type of search, email, tel, or url.

But there might be occasions where you want a keyboard optimised for numbers but the input should also accept other characters. In that case you can use an input type of text with an inputmode value of numeric. This also means you don’t get the spinner controls on desktop browsers that you’d normally get with an input type of number. It can be quite useful to supress the spinner controls for numbers that aren’t meant to be incremented.

If you combine inputmode="numeric" with pattern="[0-9]", you’ll get a numeric keypad with no other characters.

The list of possible values for inputmode is text, numeric, decimal, search, email, tel, and url.

enterkeyhint

Whereas the inputmode attribute provides a hint about which virtual keyboard to show, the enterkeyhint attribute provides an additional hint about one specific key on that virtual keyboard: the enter key.

For search forms, you’ve got an enterkeyhint option of search, and for contact forms, you’ve got send.

The enterkeyhint only changes the labelling of the enter key. On some browsers that label is text. On others it’s an icon. But the attribute by itself doesn’t change the functionality. Even though there are enterkeyhint values of previous and next, by default the enter key will still submit the form. So those two values are less useful on long forms where the user is going from field to field, and more suitable for a series of short forms.

The list of possible values is enter, done, next, previous, go, search, and send.

autocomplete

The autocomplete attribute doesn’t have anything to do with the virtual keyboard. Instead it provides a hint to the browser about values that could pre-filled from the user’s browser profile.

Most browsers try to guess when they can they do this, but they don’t always get it right, which can be annoying. If you explicitly provide an autocomplete hint, browsers can confidently prefill the appropriate value.

Just think about how much time this can save your users!

There’s a name value you can use to get full names pre-filled. But if you have form fields for different parts of names—which I wouldn’t recommend—you’ve also got:

  • given-name,
  • additional-name,
  • family-name,
  • nickname,
  • honorific-prefix, and
  • honorific-suffix.

You might be tempted to use the nickname field for usernames, but no need; there’s a separate username value.

As with names, there’s a single tel value for telephone numbers, but also an array of sub-values if you’ve split telephone numbers up into separate fields:

  • tel-country-code,
  • tel-national,
  • tel-area-code,
  • tel-local, and
  • tel-extension.

There’s a whole host of address-related values too:

  • street-address,
  • address-line1,
  • address-line2, and
  • address-line3, but also
  • address-level1,
  • address-level2,
  • address-level3, and
  • address-level4.

If you have an international audience, addresses can get very messy if you’re trying to split them into separate parts like this.

There’s also postal-code (that’s a ZIP code for Americans), but again, if you have an international audience, please don’t make this a required field. Not every country has postal codes.

Speaking of countries, you’ve got a country-name value, but also a country value for the country’s ISO code.

Remember, the autocomplete value is specifically for the details of the current user. If someone is filling in their own address, use autocomplete. But if someone has specified that, say, a billing address and a shipping address are different, that shipping address might not be the address associated with that person.

On the subject of billing, if your form accepts credit card details, definitely use autocomplete. The values you’ll probably need are:

  • cc-name for the cardholder,
  • cc-number for the credit card number itself,
  • cc-exp for the expiry date, and
  • cc-csc for the security again.

Again, some of these values can be broken down further if you need them: cc-exp-month and cc-exp-year for the month and year of the expiry date, for example.

The autocomplete attribute is really handy for log-in forms. Definitely use the values of email or username as appropriate.

If you’re using two-factor authentication, be sure to add an autocomplete value of one-time-code to your form field. That way, the browser can offer to prefill a value from a text message. That saves the user a lot of fiddly copying and pasting. Phil Nash has more details on the Twilio blog.

Not every mobile browser offers this functionality, but that’s okay. This is classic progressive enhancement. Adding an autocomplete value won’t do any harm to a browser that doesn’t yet understand the value.

Use an autocomplete value of current-password for password fields in log-in forms. This is especially useful for password managers.

But if a user has logged in and is editing their profile to change their password, use a value of new-password. This will prevent the browser from pre-filling that field with the existing password.

That goes for sign-up forms too: use new-password. With this hint, password managers can offer to automatically generate a secure password.

There you have it. Three little HTML attributes that can help users interact with your forms. All you have to do was type a few more characters in your input elements, and users automatically get a better experience.

This is a classic example of letting the browser do the hard work for you. As Andy puts it, be the browser’s mentor, not its micromanager:

Give the browser some solid rules and hints, then let it make the right decisions for the people that visit it, based on their device, connection quality and capabilities.

This post has also been translated into French.

Supporting logical properties

I wrote recently about making the switch to logical properties over on The Session.

Initially I tried ripping the band-aid off and swapping out all the directional properties for logical properties. After all, support for logical properties is green across the board.

But then I got some reports of people seeing formating issues. These people were using Safari on devices that could no longer update their operating system. Because versions of Safari are tied to versions of the operating system, there was nothing they could do other than switch to using a different browser.

I’ve said it before and I’ll say it again, but as long as this situation continues, Safari is not an evergreen browser. (I also understand that problem lies with the OS architecture—it must be incredibly frustrating for the folks working on WebKit and/or Safari.)

So I needed to add fallbacks for older browsers that don’t support logical properties. Or, to put it another way, I needed to add logical properties as a progressive enhancement.

“No problem!” I thought. “The way that CSS works, I can just put the logical version right after the directional version.”

element {
  margin-left: 1em;
  margin-inline-start: 1em;
}

But that’s not true in this case. I’m not over-riding a value, I’m setting two different properties.

In a left-to-right language like English it’s true that margin-inline-start will over-ride margin-left. But in a right-to-left language, I’ve just set margin-left and margin-inline-start (which happens to be on the right).

This is a job for @supports!

element {
  margin-left: 1em;
}
@supports (margin-inline-start: 1em) {
  element {
    margin-left: unset;
    margin-inline-start: 1em;
  }
}

I’m doing two things inside the @supports block. I’m applying the logical property I’ve just tested for. I’m also undoing the previously declared directional property.

A value of unset is perfect for this:

The unset CSS keyword resets a property to its inherited value if the property naturally inherits from its parent, and to its initial value if not. In other words, it behaves like the inherit keyword in the first case, when the property is an inherited property, and like the initial keyword in the second case, when the property is a non-inherited property.

Now I’ve got three CSS features working very nicely together:

  1. @supports (also known as feature queries),
  2. logical properties, and
  3. the unset keyword.

For anyone using an up-to-date browser, none of this will make any difference. But for anyone who can’t update their Safari browser because they can’t update their operating system, because they don’t want to throw out their perfectly functional Apple device, they’ll continue to get the older directional properties:

I discovered that my Mom’s iPad was a 1st generation iPad Air. Apple stopped supporting that device in iOS 12, which means it was stuck with whatever version of Safari last shipped with iOS 12.

Let’s get logical

I was refactoring some CSS on The Session over the weekend. I thought it would be good to switch over to using logical properties exclusively. I did this partly to make the site more easily translatable into languages with different writing modes, but mostly as an exercise to help train me in thinking with logical properties by default.

All in all, it went pretty smoothly. You can kick the tyres by opening up dev tools on The Session and adding a writing-mode declaration to the body or html element.

For the most part, the switchover was smooth. It mostly involved swapping out property names with left, right, top, and bottom for inline-start, inline-end, block-start, and block-end.

The border-radius properties tripped me up a little. You have to use shorthand like border-start-end-radius, not border-block-start-inline-end-radius (that doesn’t exist). So you have to keep the order of the properties in mind:

border-{{block direction}}-{{inline-direction}}-radius

Speaking of shorthand, I also had to kiss some shorthand declarations goodbye. Let’s say I use this shorthand for something like margin or padding:

margin: 1em 1.5em 2em 0.5em;

Those values get applied to margin-top, margin-right, margin-bottom, and margin-left, not the logical equivalents (block-start, inline-end, block-end, and inline-start). So separate declarations are needed instead:

margin-block-start: 1em;
margin-inline-end: 1.5em;
margin-block-end: 2em;
margin-inline-start: 0.5em;

Same goes for shorthand like this:

margin: 1em 2em;

That needs to be written as two declarations:

margin-block: 1em;
margin-inline: 2em;

Now I’ve said it before and I’ll say it again: it feels really weird that you can’t use logical properties in media queries. Although as I said:

Now you could rightly argue that in this instance we’re talking about the physical dimensions of the viewport. So maybe width and height make more sense than inline and block.

But along comes the new kid on the block (or inline), container queries, ready to roll with container-type values like inline-size. I hope it’s just a matter of time until we can use logical properties in all our conditional queries.

The other place where there’s still a cognitive mismatch is in transforms and animations. We’ve got a translateX() function but no translate-inline(). We’ve got translateY() but no translate-block().

On The Session I’m using some JavaScript to figure out the details of some animation effects. I’m using methods like getBoundingClientRect(). It doesn’t return logical properties. So if I ever want to adjust my animations based on writing direction, I’ll need to fork my JavaScript code.

Oh, and one other thing: the aspect-ratio property takes values in the form of width/height, not inline/block. That makes sense if you’re dealing with images, videos, or other embedded content but it makes it really tricky to use aspect-ratio on elements that contain text. I mean, it works fine as long as the text is in a language using a top-to-bottom writing mode, but not for any other languages.

Media queries with display-mode

It’s said that the best way to learn about something is to teach it. I certainly found that to be true when I was writing the web.dev course on responsive design.

I felt fairly confident about some of the topics, but I felt somewhat out of my depth when it came to some of the newer modern additions to browsers. The last few modules in particular were unexplored areas for me, with topics like screen configurations and media features. I learned a lot about those topics by writing about them.

Best of all, I got to put my new-found knowledge to use! Here’s how…

The Session is a progressive web app. If you add it to the home screen of your mobile device, then when you launch the site by tapping on its icon, it behaves just like a native app.

In the web app manifest file for The Session, the display-mode property is set to “standalone.” That means it will launch without any browser chrome: no address bar and no back button. It’s up to me to provide the functionality that the browser usually takes care of.

So I added a back button in the navigation interface. It only appears on small screens.

Do you see the assumption I made?

I figured that the back button was most necessary in the situation where the site had been added to the home screen. That only happens on mobile devices, right?

Nope. If you’re using Chrome or Edge on a desktop device, you will be actively encourged to “install” The Session. If you do that, then just as on mobile, the site will behave like a standalone native app and launch without any browser chrome.

So desktop users who install the progressive web app don’t get any back button (because in my CSS I declare that the back button in the interface should only appear on small screens).

I was alerted to this issue on The Session:

It downloaded for me but there’s a bug, Jeremy - there doesn’t seem to be a way to go back.

Luckily, this happened as I was writing the module on media features. I knew exactly how to solve this problem because now I knew about the existence of the display-mode media feature. It allows you to write media queries that match the possible values of display-mode in a web app manifest:

.goback {
  display: none;
}
@media (display-mode: standalone) {
  .goback {
    display: inline;
  }
}

Now the back button shows up if you “install” The Session, regardless of whether that’s on mobile or desktop.

Previously I made the mistake of inferring whether or not to show the back button based on screen size. But the display-mode media feature allowed me to test the actual condition I cared about: is this user navigating in standalone mode?

If I hadn’t been writing about media features, I don’t think I would’ve been able to solve the problem. It’s a really good feeling when you’ve just learned something new, and then you immediately find exactly the right use case for it!

More writing on web.dev

Last month I wrote about writing on web.dev. At that time, the first five parts of a fourteen-part course on responsive design had been published. I’m pleased to say that the next five parts are now available. They are:

  1. Typography
  2. Responsive images
  3. The picture element
  4. Icons
  5. Theming

It wasn’t planned, but these five modules feel like they belong together. The first five modules were concerned with layout tools—media queries, flexbox, grid, and even container queries. The latest five modules are about the individual elements of design—type, colour, and images. But those elements are examined through the lens of responsiveness; responsive typography with clamp, responsive colour with prefers-color-scheme, and responsive images with picture and srcset.

The final five modules should be available later this month. In the mean time, I hope you like the first ten modules.

When service workers met framesets

Oh boy, do I have some obscure browser behaviour for you!

To set the scene…

I’ve been writing here in my online journal for almost twenty years. The official anniversary will be on September 30th. But this website has been even online longer than that, just in a very different form.

Here’s the first version of adactio.com.

Like a tour guide taking you around the ruins of some lost ancient civilisation, let me point out some interesting features:

  • Observe the .shtml file extension. That means it was once using Apache’s server-side includes, a simple way of repeating chunks of markup across pages. Scientists have been trying to reproduce the wisdom of the ancients using modern technology ever since.
  • See how the layout is 100vw and 100vh? Well, this was long before viewport units existed. In fact there is no CSS at all on that page. It’s one big table element with 100% width and 100% height.
  • So if there’s no CSS, where is the border-radius coming from? Let me introduce you to an old friend—the non-animated GIF. It’s got just enough transparency (though not proper alpha transparency) to fake rounded corners between two solid colours.
  • The management takes no responsibility for any trauma that might befall you if you view source. There you will uncover JavaScript from the dawn of time; ancient runic writing like if (navigator.appName == "Netscape")

Now if your constitution was able to withstand that, brace yourself for what happens when you click on either of the two links, deutsch or english.

You find yourself inside a frameset. You may also experience some disorienting “DHTML”—the marketing term given to any combination of JavaScript and positioning in the late ’90s.

Note that these are not iframes, they are frames. Different thing. You could create single page apps long before Ajax was a twinkle in Jesse James Garrett’s eye.

If you view source, you’ll see a React-like component system. Each frameset component contains frame components that are isolated from one another. They’re like web components. Each frame has its own (non-shadow) DOM. That’s because each frame is actually a separate web page. If you right-click on any of the frames, your browser should give the option to view the framed document in its own tab or window.

Now for the part where modern and ancient technologies collide…

If you’re looking at the frameset URL in Firefox or Safari, everything displays as it should in all its ancient glory. But if you’re looking in Google Chrome and you’ve visited adactio.com before, something very odd happens.

Each frame of the frameset displays my custom offline page. The only way that could be served up is through my service worker script. You can verify this by opening the framest URL in an incognito window—everything works fine when no service worker has been registered.

I have no idea why this is happening. My service worker logic is saying “if there’s a request for a web page, try fetching it from the network, otherwise look in the cache, otherwise show an offline page.” But if those page requests are initiated by a frame element, it goes straight to showing the offline page.

Is this a bug? Or perhaps this is the correct behaviour for some security reason? I have no idea.

I wonder if anyone has ever come across this before. It’s a very strange combination of factors:

  • a domain served over HTTPS,
  • that registers a service worker,
  • but also uses framesets and frames.

I could submit a bug report about this but I fear I would be laughed out of the bug tracker.

Still …the World Wide Web is remarkable for its backward compatibility. This behaviour is unusual because browser makers are at pains to support existing content and never break the web.

Technically a modern website (one that registers a service worker) shouldn’t be using deprecated technology like frames. But browsers still need to be able support those old technologies in order to render old websites.

This situation has only arisen because the same domain—adactio.com—is host to a modern website and a really old one.

Maybe Chrome is behaving strangely because I’ve built my online home on ancient burial ground.

Update: Both Remy and Jake did some debugging and found the issue…

It’s all to do with navigation preloads and the value of event.preloadResponse, which I believe is only supported in Chrome which would explain the differences between browsers.

According to this post by Jake:

event.preloadResponse is a promise that resolves with a response, if:

  • Navigation preload is enabled.
  • The request is a GET request.
  • The request is a navigation request (which browsers generate when they’re loading pages, including iframes).

Otherwise event.preloadResponse is still there, but it resolves with undefined.

Notice that iframes are mentioned, but not frames.

My code was assuming that if event.preloadRepsonse exists in my block of code for responding to page requests, then there’d be a response. But if the request was initiated from a frameset, it is a request for a page and event.preloadRepsonse does exist …but it’s undefined.

I’ve updated my code now to check this assumption (and fall back to fetch).

This may technically still be a bug though. Shouldn’t a page loaded from a frameset count as a navigation request?