Journal tags: mission

10

Crawlers

A few months back, I wrote about how Google is breaking its social contract with the web, harvesting our content not in order to send search traffic to relevant results, but to feed a large language model that will spew auto-completed sentences instead.

I still think Chris put it best:

I just think it’s fuckin’ rude.

When it comes to the crawlers that are ingesting our words to feed large language models, Neil Clarke describes the situation:

It should be strictly opt-in. No one should be required to provide their work for free to any person or organization. The online community is under no responsibility to help them create their products. Some will declare that I am “Anti-AI” for saying such things, but that would be a misrepresentation. I am not declaring that these systems should be torn down, simply that their developers aren’t entitled to our work. They can still build those systems with purchased or donated data.

Alas, the current situation is opt-out. The onus is on us to update our robots.txt file.

Neil handily provides the current list to add to your file. Pass it on:

User-agent: CCBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Omgilibot
Disallow: /

User-agent: FacebookBot
Disallow: /

In theory you should be able to group those user agents together, but citation needed on whether that’s honoured everywhere:

User-agent: CCBot
User-agent: ChatGPT-User
User-agent: GPTBot
User-agent: Google-Extended
User-agent: Omgilibot
User-agent: FacebookBot
Disallow: /
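For what it’s worth, the grouped form is what the Robots Exclusion Protocol describes now that it has been standardised as RFC 9309: one or more consecutive User-agent lines, followed by the rules they share. Whether every crawler’s parser actually follows the spec is another matter, which is the caveat above. Here’s a simplified TypeScript sketch of how a spec-minded parser would read both versions. `parseRobots` and `isAllowed` are made-up names, and the sketch deliberately ignores wildcards, `$` anchors and longest-match precedence.

```typescript
// A simplified robots.txt reader, loosely modelled on RFC 9309 groups.
// Hypothetical helper names; "*" and "$" patterns in paths are ignored.

interface RobotsRule {
  allow: boolean;
  path: string;
}

interface RobotsGroup {
  agents: string[]; // lower-cased product tokens, e.g. "gptbot"
  rules: RobotsRule[];
}

function parseRobots(text: string): RobotsGroup[] {
  const groups: RobotsGroup[] = [];
  let current: RobotsGroup | null = null;
  let sawRule = false;

  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // drop comments
    const colon = line.indexOf(":");
    if (colon === -1) continue; // blank or malformed lines are skipped

    const key = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();

    if (key === "user-agent") {
      // A User-agent line after some rules starts a new group; one that
      // directly follows another User-agent line joins the same group.
      if (current === null || sawRule) {
        current = { agents: [], rules: [] };
        groups.push(current);
        sawRule = false;
      }
      current.agents.push(value.toLowerCase());
    } else if ((key === "allow" || key === "disallow") && current !== null) {
      current.rules.push({ allow: key === "allow", path: value });
      sawRule = true;
    }
  }
  return groups;
}

function isAllowed(groups: RobotsGroup[], agent: string, path: string): boolean {
  const token = agent.toLowerCase();
  const group =
    groups.find((g) => g.agents.includes(token)) ??
    groups.find((g) => g.agents.includes("*"));
  if (group === undefined) return true; // no group names this agent

  // Simplification: the first matching prefix wins (RFC 9309 picks the
  // longest match). An empty Disallow value blocks nothing, so skip it.
  for (const rule of group.rules) {
    if (rule.path !== "" && path.startsWith(rule.path)) return rule.allow;
  }
  return true;
}
```

Fed either the grouped list or the separate blocks, it gives the same answers. Googlebot is unaffected either way, because no group names it and there’s no `User-agent: *` group.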

There’s a bigger issue with robots.txt though. It too is a social contract. And as we’ve seen, when it comes to large language models, social contracts are being ripped up by the companies looking to feed their beasts.

As Jim says:

I realized why I hadn’t yet added any rules to my robots.txt: I have zero faith in it.

That realisation was prompted in part by Manuel Moreale’s experiment with blocking crawlers:

So, what’s the takeaway here? I guess that the vast majority of crawlers don’t give a shit about your robots.txt.

Time to up the ante. Neil’s post offers an option if you’re running Apache. Either in .htaccess or in a .conf file, you can block user agents using mod_rewrite:

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (CCBot|ChatGPT|GPTBot|Omgilibot|FacebookBot) [NC]
RewriteRule ^ - [F]

You’ll see that Google-Extended isn’t in that list. It isn’t a crawler. Rather, it’s the permissions model that Google have implemented for using your site’s content to train large language models: unless you opt out via robots.txt, it’s assumed that you’re totally fine with your content being used to feed their stochastic parrots.
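As for what that mod_rewrite rule actually catches: RewriteCond does an unanchored regular-expression search of the User-Agent header, and `[NC]` makes it case-insensitive. Any of those tokens appearing anywhere in the string triggers the 403 that `[F]` sends. Here’s a rough TypeScript approximation you could use to check which user-agent strings the pattern would refuse. `statusForUserAgent` is a made-up name, the sample strings are simplified illustrations rather than the bots’ exact headers, and JavaScript’s regex engine stands in for Apache’s PCRE. The two agree on a plain alternation like this.

```typescript
// Approximates: RewriteCond %{HTTP_USER_AGENT} (CCBot|...|FacebookBot) [NC]
// The "i" flag plays the part of [NC]; .test() searches anywhere in the
// string, like Apache's unanchored match. No "g" flag, so .test() keeps
// no lastIndex state between calls. No spaces inside the group.
const blockedAgentPattern = /(CCBot|ChatGPT|GPTBot|Omgilibot|FacebookBot)/i;

// 403 is what [F] sends; 200 stands in for "serve the page as usual".
function statusForUserAgent(userAgent: string): number {
  return blockedAgentPattern.test(userAgent) ? 403 : 200;
}

// Simplified, illustrative user-agent strings (not the bots' exact headers).
const samples: string[] = [
  "Mozilla/5.0 (compatible; GPTBot/1.0)",
  "Mozilla/5.0 (compatible; ChatGPT-User/1.0)",
  "ccbot/2.0",
  "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0",
];

for (const ua of samples) {
  console.log(statusForUserAgent(ua), ua);
}
```

The first three samples get a 403 and the Firefox string gets a 200. Because the match isn’t anchored, `ChatGPT` also catches `ChatGPT-User`.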

Permission

Back when the web was young, it wasn’t yet clear what the rules were. Like, could you really just link to something without asking permission?

Then came some legal rulings to establish that, yes, on the web you can just link to anything without checking if it’s okay first.

What about search engines and directories? Technically they’re rifling through all the stuff we publish and reposting snippets of it. Is that okay?

Again, through some legal precedents—but mostly common agreement—everyone decided that on balance it was fine. After all, those snippets they publish are helping your site get traffic.

In short order, search came to rule the web. And Google came to rule search.

The mutually beneficial arrangement persisted uneasily. Despite Google’s search results pages getting worse and worse in recent years, the company’s huge market share of search means you generally want to be in their good books.

Google’s business model relies on us publishing web pages so that they can put ads around the search results linking to that content, and we rely on Google to send people to our websites by responding smartly to search queries.

That has now changed. Instead of responding to search queries by linking to the web pages we’ve made, Google is instead generating dodgy summaries rife with hallucina… lies (a psychic hotline, basically).

Google still benefits from us publishing web pages. We no longer benefit from Google slurping up those web pages.

With AI, tech has broken the web’s social contract:

Google has steadily been manoeuvring their search engine results to more and more replace the pages in the results.

As Chris puts it:

Me, I just think it’s fuckin’ rude.

Google is a portal to the web. Google is an amazing tool for finding relevant websites to go to. That was useful when it was made, and it’s nothing but grown in usefulness. Google should be encouraging and fighting for the open web. But now they’re like, actually we’re just going to suck up your website, put it in a blender with all other websites, and spit out word smoothies for people instead of sending them to your website. Instead.

Ben proposes an update to robots.txt that would allow us to specify licensing information:

Robots.txt needs an update for the 2020s. Instead of just saying what content can be indexed, it should also grant rights.

Like crawl my site only to provide search results not train your LLM.

It’s a solid proposal. But Google has absolutely no incentive to implement it. They hold all the power.

Or do they?

There is still the nuclear option in robots.txt:

User-agent: Googlebot
Disallow: /

That’s what Vasilis is doing:

I have been looking for ways to not allow companies to use my stuff without asking, and so far I couldn’t find any. But since this policy change I realised that there is a simple one: block google’s bots from visiting your website.

The general consensus is that this is nuts. “If you don’t appear in Google’s results, you might as well not be on the web!” is the common cry.

I’m not so sure. At least when it comes to personal websites, search isn’t how people get to your site. They get to your site from RSS, newsletters, links shared on social media or on Slack.

And isn’t it an uncomfortable feeling to think that there’s a third party service that you absolutely must appease? It’s the same kind of justification used by people who are still on Twitter even though it’s now a right-wing transphobic cesspit. “If I’m not on Twitter, I might as well not be on the web!”

The situation with Google reminds me of what Robin said about Twitter:

The speed with which Twitter recedes in your mind will shock you. Like a demon from a folktale, the kind that only gains power when you invite it into your home, the platform melts like mist when that invitation is rescinded.

We can rescind our invitation to Google.

Mars distracts

A few years ago, I wrote about how much I enjoyed the book Aurora by Kim Stanley Robinson.

Not everyone liked that book. A lot of people were put off by its structure, in which the dream of interstellar colonisation meets the harsh truth of reality and the book follows where that leads. It pours cold water over the very idea of humanity becoming interplanetary.

But our own solar system is doable, right? I mean, Kim Stanley Robinson is the guy who wrote the Mars trilogy and 2312, both of which depict solar system colonisation in just a few centuries.

I wonder if the author might regret the way that some have taken his Mars trilogy as a sort of manual, Torment Nexus style. Kim Stanley Robinson is very much concerned with this planet in this time period, but others use his work to do the opposite.

But the backlash to Mars has begun.

Maciej wrote Why Not Mars:

The goal of this essay is to persuade you that we shouldn’t send human beings to Mars, at least not anytime soon. Landing on Mars with existing technology would be a destructive, wasteful stunt whose only legacy would be to ruin the greatest natural history experiment in the Solar System. It would no more open a new era of spaceflight than a Phoenician sailor crossing the Atlantic in 500 B.C. would have opened up the New World. And it wouldn’t even be that much fun.

Manu Saadia is writing a book about humanity in space, and he has a corresponding newsletter called Against Mars: Space Colonization and its Discontents:

What if space colonization was merely science-fiction, a narrative, or rather a meta-narrative, a myth, an ideology like any other? And therefore, how and why did it catch on? What is so special and so urgent about space colonization that countless scientists, engineers, government officials, billionaire oligarchs and indeed, entire nations, have committed work, ingenuity and treasure to make it a reality.

What if, and hear me out, space colonization was all bullshit?

I mean that quite literally. No hyperbole. Once you peer under the hood, or the nose, of the rocket ship, you encounter a seemingly inexhaustible supply of ghoulish garbage.

Two years ago, Shannon Stirone went into the details of why Mars Is a Hellhole:

The central thing about Mars is that it is not Earth, not even close. In fact, the only things our planet and Mars really have in common is that both are rocky planets with some water ice and both have robots (and Mars doesn’t even have that many).

Perhaps the most damning indictment of the case for Mars colonisation is that its most ardent advocate turns out to be an idiotic small-minded eugenicist who can’t even run a social media company, much less a crewed expedition to another planet.

But let’s be clear: we’re talking here about the proposition of sending humans to Mars—ugly bags of mostly water that probably wouldn’t survive. Robots and other uncrewed missions in our solar system …more of that, please!

Install prompt

There’s an interesting thread on Github about the tongue-twistingly named beforeinstallprompt JavaScript event.

Let me back up…

Progressive web apps. You know what they are, right? They’re websites that have taken their vitamins. Specifically, they’re responsive websites that:

  1. are served over HTTPS,
  2. have a web app manifest, and
  3. have a service worker handling the offline scenario.

The web app manifest—a JSON file of metadata—is particularly useful for describing how your site should behave if someone adds it to their home screen. You can specify what icon should be used. You can specify whether the site should launch in a browser or as a standalone app (practically indistinguishable from a native app). You can specify which URL on the site should be used as the starting point when the site is launched from the home screen.
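Here’s roughly what those settings look like, sketched as a TypeScript object that gets serialised into the manifest.json file. The member names (`icons`, `display` and `start_url`, alongside `name` and `short_name`) are real manifest members. Every value, including the icon path and the `?homescreen=1` query string, is made up for illustration.

```typescript
// A few real web app manifest members, typed loosely for illustration.
type ManifestDisplay = "fullscreen" | "standalone" | "minimal-ui" | "browser";

interface ManifestIcon {
  src: string;
  sizes: string; // e.g. "192x192"
  type: string;  // MIME type
}

interface WebAppManifest {
  name: string;
  short_name: string;       // used where space is tight, like under the icon
  icons: ManifestIcon[];    // which icon to use on the home screen
  start_url: string;        // where to start when launched from the home screen
  display: ManifestDisplay; // "browser" opens a tab; "standalone" looks like an app
}

// Made-up values for an imaginary site.
const siteManifest: WebAppManifest = {
  name: "Example Site",
  short_name: "Example",
  icons: [{ src: "/icons/icon-192.png", sizes: "192x192", type: "image/png" }],
  start_url: "/?homescreen=1",
  display: "standalone",
};

// The JSON text you'd serve as manifest.json.
const manifestJson: string = JSON.stringify(siteManifest, null, 2);
console.log(manifestJson);
```

The page then points to that file with a `<link rel="manifest">` element in its head.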

So progressive web apps work just fine when you visit them in a browser, but they really shine when you add them to your home screen. It seems like pretty much everyone is in agreement that adding a progressive web app to your home screen shouldn’t be an onerous task. But how does the browser let the user know that it might be a good idea to “install” the web site they’re looking at?

The Samsung Internet browser does ambient badging—a + symbol shows up to indicate that a website can be installed. This is a great approach!

I hope that Chrome on Android will also use ambient badging at some point. To start with though, Chrome notified users that a site was installable by popping up a notification at the bottom of the screen. I think these might be called “toasts”.

(Screenshots: Chrome on Android showing the “add to home screen” prompt for https://huffduffer.com/ and for https://html5forwebdesigners.com/. HTTPS + manifest.json + Service Worker = “Add to Home Screen” prompt.)

Needless to say, the toast notification wasn’t very effective. That’s because we web designers and developers have spent years teaching people to immediately dismiss those notifications without even reading them. Accept our cookies! Sign up to our newsletter! Install our native app! Just about anything that’s user-hostile gets put in a notification (either a toast or an overlay) and shoved straight in the user’s face before they’ve even had time to start reading the content they came for in the first place. Users will then either:

  1. turn around and leave, or
  2. use muscle memory to reach for that X in the corner of the notification.

A tiny fraction of users might actually click on the call to action, possibly by mistake.

Chrome didn’t abandon the toast notification for progressive web apps, but it did change when they would appear. Rather than the browser deciding when to show the prompt—usually when the user has just arrived on the site—a new JavaScript event called beforeinstallprompt can be used.

It’s a bit weird though. You have to “capture” the event that fires when the prompt would have normally been shown, subdue it, hold on to that event, and then re-release it when you think it should be shown (like when the user has completed a transaction, for example, and having your site on the home screen would genuinely be useful). That’s a lot of hoops. Here’s the code I use on The Session to only show the installation prompt to users who are logged in.
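That capture, subdue, hold and re-release dance can be sketched like this. It isn’t the code from The Session, just a minimal TypeScript illustration. `DeferredInstallPrompt` and `releaseIf` are made-up names. BeforeInstallPromptEvent is a non-standard Chromium event that isn’t in TypeScript’s DOM typings, so the sketch describes only the members it uses: `preventDefault()`, `prompt()` and `userChoice`.

```typescript
// Minimal shape of Chromium's non-standard BeforeInstallPromptEvent,
// listing only what this sketch uses.
interface InstallPromptEvent {
  preventDefault(): void;
  prompt(): Promise<void>;
  userChoice: Promise<{ outcome: "accepted" | "dismissed" }>;
}

class DeferredInstallPrompt {
  private held: InstallPromptEvent | null = null;

  // Capture and subdue: stop the browser showing its own prompt,
  // and hang on to the event for later.
  capture(event: InstallPromptEvent): void {
    event.preventDefault();
    this.held = event;
  }

  // Re-release: show the prompt only when our own condition is met.
  // Resolves to the user's choice, or "not-shown".
  async releaseIf(condition: boolean): Promise<string> {
    if (!condition || this.held === null) return "not-shown";
    const event = this.held;
    this.held = null; // Chrome only allows prompt() once per captured event
    await event.prompt();
    const choice = await event.userChoice;
    return choice.outcome;
  }
}

// Wiring in the page (not run here); userIsLoggedIn is hypothetical:
//   const installPrompt = new DeferredInstallPrompt();
//   window.addEventListener("beforeinstallprompt", (e) =>
//     installPrompt.capture(e as unknown as InstallPromptEvent));
//   // ...later, once you know who the user is:
//   installPrompt.releaseIf(userIsLoggedIn);
```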

The end result is that the user is still shown a toast notification, but at least this time it’s the site owner who has decided when it will be shown. The Chrome team call this notification “the mini-info bar”, and Pete acknowledges that it’s not ideal:

The mini-infobar is an interim experience for Chrome on Android as we work towards creating a consistent experience across all platforms that includes an install button into the omnibox.

I think “an install button in the omnibox” means ambient badging in the browser interface, which would be great!

Anyway, back to that thread on Github. Basically, neither Apple nor Mozilla are going to implement the beforeinstallprompt event (well, technically Mozilla have implemented it but they’re not going to ship it). That’s fair enough. It’s an interim solution that’s not ideal for all the reasons I’ve already covered.

But there’s a lot of pushback. Even if the details of beforeinstallprompt are troublesome, surely there should be some way for site owners to let users know that they can, or should, install a progressive web app? As a site owner, I have a lot of sympathy for that viewpoint. But I also understand the security and usability issues that can arise from bad actors abusing this mechanism.

Still, I have to hand it to Chrome: even if we put the beforeinstallprompt event to one side, the browser still has a mechanism for letting users know that a progressive web app can be installed—the mini info bar. It’s not a great mechanism, but it’s better than nothing. Nothing is precisely what Firefox and Safari currently offer (though Firefox is experimenting with something).

In the case of Safari, not only do they not provide a mechanism for letting the user know that a site can be installed, but since the last iOS update, they’ve buried the “add to home screen” option even deeper in the “sharing sheet” (the list of options that comes up when you press the incomprehensible rectangle-with-arrow-emerging-from-it icon). You now have to scroll below the fold just to find the “add to home screen” option.

So while I totally get the misgivings about beforeinstallprompt, I feel that a constructive alternative wouldn’t go amiss.

And that’s all I have to say about that.

Except… there’s another interesting angle to that Github thread. There’s talk of allowing sites that are launched from the home screen to have access to more features than a site inside a web browser. Usually permissions on the web are explicitly granted or denied on a case-by-case basis: geolocation; notifications; camera access, etc. I think this is the first time I’ve heard of one action—adding to the home screen—being used as a proxy for implicitly granting more access. Very interesting. Although that idea seems to be roundly rejected here:

A key argument for using installation in this manner is that some APIs are simply so powerful that the drive-by web should not be able to ask for them. However, this document takes the position that installation alone as a restriction is undesirable.

Then again:

I understand that Chromium or Google may hold such a position but Apple’s WebKit team may not necessarily agree with such a position.

Periodic background sync

Yesterday I wrote about how much I’d like to see silent push for the web:

I’d really like silent push for the web—the ability to update a cache with fresh content as soon as it’s published; that would be nifty! At the same time, I understand the concerns. It feels more powerful than other permission-based APIs like notifications.

Today, John Holt Ripley responded on Twitter:

hi there, just read your blog post about Silent Push for the web, and wondering if Periodic Background Sync would cover a few of those use cases?

Periodic background sync looks very interesting indeed!

It’s not the same as silent push. As the name suggests, this is about your service worker waking up periodically and potentially fetching (and caching) fresh content from the network. So the service worker is polling rather than receiving a push. But I’ll take it! It’s definitely close enough for the kind of use-cases I’ve been thinking about.
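Here’s a rough TypeScript sketch of both halves: registering from the page, and handling the `periodicsync` event in the service worker. The API only exists in Chromium, so the page side feature-detects it. `registerContentSync`, `handlePeriodicSync`, the `refresh-content` tag and the once-a-day interval are all assumptions for illustration. `minInterval` is only a lower bound; the browser decides how often the event really fires.

```typescript
// Minimal shapes for the Chromium-only Periodic Background Sync API,
// covering just the members used below.
interface PeriodicSyncManagerLike {
  register(tag: string, options?: { minInterval?: number }): Promise<void>;
}

interface SyncCapableRegistration {
  periodicSync?: PeriodicSyncManagerLike;
}

interface PeriodicSyncEventLike {
  tag: string;
  waitUntil(work: Promise<unknown>): void;
}

const CONTENT_SYNC_TAG = "refresh-content"; // hypothetical tag name
const ONE_DAY_MS = 24 * 60 * 60 * 1000;

// Page side: ask to be woken up no more than about once a day.
// Resolves to false where the API is missing (Firefox, Safari)
// or where registration is refused, e.g. for lack of permission.
async function registerContentSync(reg: SyncCapableRegistration): Promise<boolean> {
  if (reg.periodicSync === undefined) return false;
  try {
    await reg.periodicSync.register(CONTENT_SYNC_TAG, { minInterval: ONE_DAY_MS });
    return true;
  } catch {
    return false;
  }
}

// Service worker side: the body of a "periodicsync" listener.
// refresh is whatever fetches fresh content and caches it.
function handlePeriodicSync(event: PeriodicSyncEventLike, refresh: () => Promise<void>): boolean {
  if (event.tag !== CONTENT_SYNC_TAG) return false; // someone else's tag
  event.waitUntil(refresh()); // keep the worker alive until caching finishes
  return true;
}

// Wiring (not run here); refreshCache is a hypothetical helper of your own:
//   page:  navigator.serviceWorker.ready.then(registerContentSync);
//   sw.js: self.addEventListener("periodicsync", (e) => handlePeriodicSync(e, refreshCache));
```

The service worker polls on the browser’s schedule rather than reacting to a push, which is exactly the trade-off described above.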

Interestingly, periodic background sync also ties into the other part of what I was writing about: permissions. I mentioned that adding a site to the home screen could be interpreted as a signal to potentially allow more permissions (or at least allow prompts for more permissions).

Well, Chromium has a document outlining metrics for attempting to gauge site engagement. There’s some good thinking in there.

Silent push for the web

After Indie Web Camp in Berlin last year, I wrote about Seb’s nifty demo of push without notifications:

While I’m very unwilling to grant permission to be interrupted by intrusive notifications, I’d be more than willing to grant permission to allow a website to silently cache timely content in the background. It would be a more calm technology.

Phil Nash left a comment on the Medium copy of my post explaining that Seb’s demo of using the Push API without showing a notification wouldn’t work for long:

The browsers allow a certain number of mistakes(?) before they start to show a generic notification to say that your site sent a push notification without showing a notification. I believe that after ~10 or so notifications, and that’s different between browsers, they run out of patience.

He also provided me with the name to describe what I’m after:

You’re looking for “silent push” as are many others.

Silent push is something that is possible in native apps. It isn’t (yet?) available on the web, presumably because of security concerns.

It’s an API that would be ripe for abuse. I mean, just look at the mess we’ve made with APIs like notifications and geolocation. Sure, they require explicit user opt-in, but these opt-ins are seen so often that users are sick of seeing them. Silent push would be one more permission-based API to add to the stack of annoyances.
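The closest the web gets is a flag you pass when subscribing. PushManager.subscribe() takes a `userVisibleOnly` option, and Chrome refuses the subscription unless it’s `true`. In effect, you have to promise that every push message will end in a notification the user can see. Here’s a minimal TypeScript sketch. `subscribeForPush` is a made-up name, the interfaces cover only the members used, and the server key is a placeholder.

```typescript
// Minimal shapes for the parts of the Push API used here.
interface PushSubscriptionOptionsLike {
  userVisibleOnly: boolean;
  applicationServerKey: Uint8Array;
}

interface PushManagerLike {
  subscribe(options: PushSubscriptionOptionsLike): Promise<unknown>;
}

// Subscribes a service worker registration's pushManager to push messages.
// serverPublicKey is a placeholder for your push service's (VAPID) public key.
function subscribeForPush(pushManager: PushManagerLike, serverPublicKey: Uint8Array): Promise<unknown> {
  return pushManager.subscribe({
    // The crux: Chrome only accepts true here, i.e. a promise that every
    // push message will result in a visible notification. Asking for
    // false (silent push, in other words) gets the subscription rejected.
    userVisibleOnly: true,
    applicationServerKey: serverPublicKey,
  });
}

// Wiring (not run here):
//   navigator.serviceWorker.ready.then((reg) => subscribeForPush(reg.pushManager, key));
```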

Still, I’d really like silent push for the web—the ability to update a cache with fresh content as soon as it’s published; that would be nifty! At the same time, I understand the concerns. It feels more powerful than other permission-based APIs like notifications.

Maybe there could be another layer of permissions. What if adding a site to your home screen was the first step? If a site is running on HTTPS, has a service worker, has a web app manifest, and has been added to the home screen, maybe then and only then should it be allowed to prompt for permission to do silent push.

In other words, what if certain very powerful APIs were only available to progressive web apps that have successfully been added to the home screen?

Frankly, I’d be happy if the same permissions model applied to web notifications too, but I guess that ship has sailed.

Anyway, all this is pure conjecture on my part. As far as I know, silent push isn’t on the roadmap for any of the browser vendors right now. That’s fair enough. Although it does annoy me that native apps have this capability that web sites don’t.

It used to be that there was a long list of features that only native apps could do, but that list has grown shorter and shorter. The web’s tortoise is catching up to native’s hare.

Summer of Apollo

It’s July, 2019. You know what that means? The 50th anniversary of the Apollo 11 mission is this month.

I’ve already got serious moon fever, and if you’d like to join me, I have some recommendations…

Watch the Apollo 11 documentary in a cinema. The 70mm footage is stunning, the sound design is immersive, the music is superb, and there’s some neat data visualisation too. Watching a preview screening in the Duke of York’s last week was pure joy from start to finish.

Listen to 13 Minutes To The Moon, the terrific ongoing BBC podcast by Kevin Fong. It’s got all my favourite titans of NASA: Michael Collins, Margaret Hamilton, and Charlie Duke, amongst others. And it’s got music by Hans Zimmer.

Experience the website Apollo 11 In Real Time on the biggest monitor you can. It’s absolutely wonderful! From July 16th, you can experience the mission timeshifted by exactly 50 years, but if you don’t want to wait, you can dive in right now. It genuinely feels like being in Mission Control!

Push without notifications

On the first day of Indie Web Camp Berlin, I led a session on going offline with service workers. This covered all the usual use-cases: pre-caching; custom offline pages; saving pages for offline reading.

But on the second day, Sebastiaan spent a fair bit of time investigating a more complex use of service workers with the Push API.

The Push API is what makes push notifications possible on the web. There are a lot of moving parts—browser, server, service worker—and, frankly, it’s way over my head. But I’m familiar with the general gist of how it works. Here’s a typical flow:

  1. A website prompts the user for permission to send push notifications.
  2. The user grants permission.
  3. A whole lot of complicated stuff happens behind the scenes.
  4. Next time the website publishes something relevant, it fires a push message containing the details of the new URL.
  5. The user’s service worker receives the push message (even if the site isn’t open).
  6. The service worker creates a notification linking to the URL, interrupting the user, and generally adding to the weight of information overload.

Here’s what Sebastiaan wanted to investigate: what if that last step weren’t so intrusive? Here’s the alternate flow he wanted to test:

  1. A website prompts the user for permission to send push notifications.
  2. The user grants permission.
  3. A whole lot of complicated stuff happens behind the scenes.
  4. Next time the website publishes something relevant, it fires a push message containing the details of the new URL.
  5. The user’s service worker receives the push message (even if the site isn’t open).
  6. The service worker fetches the contents of the URL provided in the push message and caches the page. Silently.

It worked.
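Step six is where the two flows part ways, and in service worker terms the difference is small. Here’s a hedged TypeScript sketch of each version of the handler. `notifyAboutPush` and `cacheFromPush` are made-up names. The interfaces describe only the members used (`showNotification()` on the registration, `add()` on a Cache), and the JSON payload with a `url` field is an assumption about what the server sends. Bear in mind that browsers expect a push to produce a visible notification, so the silent version is exactly the kind of thing they eventually stop tolerating.

```typescript
// Minimal shapes for the service worker pieces used below.
interface PushPayload {
  url: string;    // assumed: the server sends the new page's URL...
  title?: string; // ...and optionally a title
}

interface PageCache {
  add(url: string): Promise<void>; // Cache.add() fetches and stores in one go
}

interface NotifyingRegistration {
  showNotification(title: string, options?: { body?: string; data?: unknown }): Promise<void>;
}

// Typical flow, step 6: turn the push message into an interruption.
function notifyAboutPush(payload: PushPayload, registration: NotifyingRegistration): Promise<void> {
  return registration.showNotification(payload.title ?? "New post", {
    body: payload.url,
    data: { url: payload.url }, // for a notificationclick handler to open
  });
}

// Alternate flow, step 6: fetch the new page and cache it. Silently.
function cacheFromPush(payload: PushPayload, pages: PageCache): Promise<void> {
  return pages.add(payload.url);
}

// Wiring inside the service worker (not run here):
//   self.addEventListener("push", (event) => {
//     const payload = event.data.json(); // e.g. {"url": "https://example.com/new-post"}
//     event.waitUntil(caches.open("pages").then((cache) => cacheFromPush(payload, cache)));
//   });
```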

I think this could be a real game-changer. I don’t know about you, but I’m very, very wary of granting websites the ability to send me push notifications. In fact, I don’t think I’ve ever given a website permission to interrupt me with push notifications.

You’ve seen the annoying permission dialogues, right?

In Firefox, it looks like this:

Will you allow name-of-website to send notifications?

[Not Now] [Allow Notifications]

In Chrome, it’s:

name-of-website wants to

Show notifications

[Block] [Allow]

But in actual fact, these dialogues are asking for permission to do two things:

  1. Receive messages pushed from the server.
  2. Display notifications based on those messages.

There’s no way to ask for permission just to do the first part. That’s a shame. While I’m very unwilling to grant permission to be interrupted by intrusive notifications, I’d be more than willing to grant permission to allow a website to silently cache timely content in the background. It would be a more calm technology.

Think of the use cases:

  • I grant push permission to a magazine. When the magazine publishes a new article, it’s cached on my device.
  • I grant push permission to a podcast. Whenever a new episode is published, it’s cached on my device.
  • I grant push permission to a blog. When there’s a new blog post, it’s cached on my device.

Then when I’m on a plane, or in the subway, or in any other situation without a network connection, I could still visit these websites and get content that’s fresh to me. It’s kind of like background sync in reverse.

There’s plenty of opportunity for abuse: the cache could get filled with content the user never asked for. But websites can already do that, and they don’t need to be granted any permissions to do so; just by being visited, a website can add multiple files to a cache.

So it seems that the reason for the permissions dialogue is all about displaying notifications …not so much about receiving push messages from the server.

I wish there were a way to implement this background-caching pattern without requiring the user to grant permission to a dialogue that contains the word “notification.”

I wonder if the act of adding a site to the home screen could implicitly grant permission to allow use of the Push API without notifications?

In the meantime, the proposal for periodic synchronisation (using background sync) could achieve similar results, but in a less elegant way; periodically polling for new content instead of receiving a push message when new content is published. Also, it requires permission. But at least in this case, the permission dialogue should be more specific, and wouldn’t include the word “notification” anywhere.

August in America, day eighteen

UX Week kicked off today. It’s a four-day event: one day of talks, followed by two days of workshops, followed by another day of talks. I’ll be spending all of the third day doing workshops back-to-back.

Bizarrely, even though it’s a four-day event, they only offer speakers three nights of accommodation. Seems odd to me: I would’ve thought they’d want us to stick around for the whole thing.

So, as I don’t get my hotel room until tomorrow, today I had to make my way from Tantek’s place in the Haight all the way over to the Mission Bay Conference Center—a fairly long MUNI ride. Alas, that meant I missed Steven Johnson’s opening talk. Curses!

Fortunately I did make it in time for Ian Bogost’s talk, which was excellent.

In the afternoon, I walked over to Four Barrel, the excellent coffee shop that was celebrating its fifth birthday. They had balloons, a photo booth, a petting zoo, games, and best of all, free coffee. Tom popped by and we had a lovely time chatting in the sun (and drinking free coffee).

Seeing as I was in the Mission anyway, it would’ve been crazy not to have a Mission burrito, so a trip to Papalote quickly followed. Best of all, Erin popped by. Then, as we were heading home via Dolores Park, we met up with Tess. Just like I hoped!

Twitter permissions

Twitter has come in for a lot of (justifiable) criticism for changes to its API that make it somewhat developer-hostile. But it has to be said that developers don’t always behave responsibly when they’re using the API.

The classic example of this is the granting of permissions. James summed it up nicely: it’s just plain rude to ask for write-access to my Twitter account before I’ve even started to use your service. I could understand it if the service needed to post to my timeline, but most of the time these services claim that they want me to sign up via Twitter so that I can find my friends who are also using the service — that doesn’t require write access. Quite often, these requests to authenticate are accompanied by reassurances like “we’ll never tweet without your permission” …in which case, why ask for write-access in the first place?

To be fair, it used to be a lot harder to separate out read and write permissions for Twitter authentication. But now it’s actually not that bad, although it’s still not as granular as it could be.

One of the services that used to require write-access to my Twitter account was Lanyrd. I gave it permission, but only because I knew the people behind the service (a decision-making process that doesn’t scale very well). I always felt uneasy that Lanyrd had write-access to my timeline. Eventually I decided that I couldn’t in good conscience allow the lovely Lanyrd people to be an exception just because I knew where they lived. Fortunately, they concurred with my unease. They changed their log-in system so that it only requires read-access. If and when they need write-access, that’s the point at which they ask for it:

We now ask for read-only permission the first time you sign in, and only ask to upgrade to write access later on when you do something that needs it; for example following someone on Twitter from our attendee directory.
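The decision behind that approach is simple enough to write down. This TypeScript sketch leaves out the OAuth mechanics entirely and only models which access level to ask for, and when. The action names and `accessToRequest` are hypothetical.

```typescript
type AccessLevel = "read" | "write";

// Hypothetical actions for a Lanyrd-style service.
type ServiceAction = "sign-in" | "find-friends" | "follow-on-twitter";

// What each action genuinely needs from the user's Twitter account.
// Only acting on the user's behalf (here, following someone) needs write.
const accessNeeded: Record<ServiceAction, AccessLevel> = {
  "sign-in": "read",
  "find-friends": "read",
  "follow-on-twitter": "write",
};

// Which access level to ask for right now, or null if what the user has
// already granted is enough. Nothing is requested ahead of need.
function accessToRequest(granted: AccessLevel | null, action: ServiceAction): AccessLevel | null {
  const needed = accessNeeded[action];
  if (granted === "write") return null; // already has everything
  if (granted === "read" && needed === "read") return null;
  return needed; // e.g. signing in asks for read; following asks for write
}
```

A user who only ever signs in and finds friends is never asked for write access at all.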

Far too many services ask for write-access up front, without providing a justification. When asked for an explanation, I’m sure most of them would say “well, that’s how everyone else does it”, and they would, alas, be correct.

What’s worse is that users grant write-access so freely. I was somewhat shocked by the number of tech-savvy friends who unwittingly spammed my timeline with automated tweets from a service called Twitter Counter. Their reactions ranged from sheepish to embarrassed to angry.

I urge you to go through your Twitter settings and prune any services that currently have write-access that don’t actually need it. You may be surprised by the sheer volume of apps that can post to Twitter on your behalf. Do you trust them all? Are you certain that they won’t be bought up by a different, less trustworthy company?

If a service asks me to sign up but insists on having write-access to my Twitter account, it feels like being asked out on a date by someone who insists I sign a pre-nuptial agreement. Not only is it somewhat premature, it shows a certain lack of respect.

Not every service behaves so ungallantly. Done Not Done, 1001 Beers, and Mapalong all use Twitter for log-in, but none of them require write-access up-front.

Branch and Medium are typical examples of bad actors in this regard. The core functionality of these sites has nothing to do with posting to Twitter, but both sites want write-access so that they can potentially post to Twitter on my behalf later on. I know that I won’t ever want either service to do that. I can either trust them, or not use the service at all. Signing up without granting write-access to my Twitter account isn’t an option.

I sent some feedback to Branch and part of their response was to say the problem was with the way Twitter lumps permissions together. That used to be true, but Lanyrd’s exemplary use of Twitter for log-in makes that argument somewhat hollow.

In the case of Branch, Medium, and many other services, Twitter authentication is the only way to sign up and start using the service. Using a username and password isn’t an option. On the face of it, requiring Twitter for authentication doesn’t sound all that different to requiring an email address for authentication. But demanding write-access to Twitter is the equivalent of demanding the ability to send emails from your email address.

The way that so many services unnecessarily ask for write-access to Twitter—and the way that so many users unquestioningly grant it—reminds me of the password anti-pattern all over again. Because this rude behaviour is so prevalent, it has now become the norm. If we want this situation to change, we need to demand more respect.

The next time that a service demands unwarranted write-access to your Twitter account, refuse to grant it. Then tell the people behind that service why you’re refusing to sign up.

And please take a moment to go through the services you’ve already authorised.