Journal tags: chat

7

Continuous partial ick

The output of generative tools based on large language models gives me the ick.

This isn’t a measured logical response. It’s more of an involuntary emotional reaction.

I could try to justify my reaction by saying I’m concerned about the exploitation involved in the training data, or the huge energy costs involved, or the disenfranchisement of people who create art. But those would be post-facto rationalisations.

I just find myself wrinkling my nose and mentally going “Ew!” whenever somebody posts the output of some prompt they gave to ChatGPT or Midjourney.

Again, I’m not saying this is rational. It’s more instinctual.

You could well say that this is my problem. You may be right. But I wonder what it is that’s so unheimlich about these outputs that triggers my response.

Just to clarify, I am talking about direct outputs, shared verbatim. If someone were to use one of these tools in the process of creating something I’d be none the wiser. I probably couldn’t even tell that a large language model was involved at some point. I’m fine with that. It’s when someone takes something directly from one of these tools and then shares it online, that’s what raises my bile.

I was at a conference a few months back where your badge featured a hallucinated picture of you. Now, this probably sounded like a fun idea. It probably is a fun idea. I can’t tell. All I know is that it made me feel a little queasy.

Perhaps it’s a question of taste. In which case, I’m being a snob. I’m literally turning my nose up at something I deem to be tacky.

But isn’t it tacky, though? It’s not something I can describe, but there’s just something about the vibe of these images—and words—that feels off. It’s sort of creepy, but it’s mostly just the mediocrity that sits so uneasily with me.

These tools do an amazing job of solving the quantity problem—how to produce an image or piece of text quickly. And by most measurements, you could say that they also solve the quality problem. These outputs are good enough to pass for “the real thing.” The outputs are, like, 90% to 95% there. And the gap is closing.

And yet. There’s something in that gap. Something that I feel in my gut. Something that makes me go “nope.”

Simon’s rule

I got a nice email from someone regarding my recent posts about performance on The Session. They said:

I hope this message finds you well. First and foremost, I want to express how impressed I am with the overall performance of https://thesession.org/. It’s a fantastic resource for music enthusiasts like me.

How nice! I responded, thanking them for the kind words.

They sent a follow-up clarification:

Awesome, anyway there was an issue in my message.

The line ‘It’s a fantastic resource for music enthusiasts like me.’ added by chatGPT and I didn’t notice.

I imagine this is what it feels like when you’re on a phone call with someone and towards the end of the call you hear a distinct flushing sound.

I wrote back and told them about Simon’s rule:

I will not publish anything that takes someone else longer to read than it took me to write.

That just feels so rude!

I think that’s a good rule.

Browser history

I woke up today to a very annoying new bug in Firefox. The browser shits the bed in an unpredictable fashion when rounding up single pixel line widths in SVG. That’s quite a problem on The Session where all the sheet music is rendered in SVG. Those thin lines in sheet music are kind of important.

Browser bugs like these are very frustrating. There’s nothing you can do from your side other than filing a bug. The locus of control is very much with the developers of the browser.

Still, the occasional regression in a browser is a price I’m willing to pay for a plurality of rendering engines. Call me old-fashioned but I still value the ecological impact of browser diversity.

That said, I understand the argument for converging on a single rendering engine. I don’t agree with it but I understand it. It’s like this…

Back in the bad old days of the original browser wars, the browser companies just made shit up. That made life a misery for web developers. The Web Standards Project knocked some heads together. Netscape and Microsoft would agree to support standards.

So that’s where the bar was set: browsers agreed to work to the same standards, but competed by having different rendering engines.

There’s an argument to be made for raising that bar: browsers agree to work to the same standards, and have the same shared rendering engine, but compete by innovating in all other areas—the browser chrome, personalisation, privacy, and so on.

Like I said, I understand the argument. I just don’t agree with it.

One reason for zeroing in a single rendering engine is that it’s just too damned hard to create or maintain an entirely different rendering engine now that web standards are incredibly powerful and complex. Only a very large company with very deep pockets can hope to be a rendering engine player. Google. Apple. Heck, even Microsoft threw in the towel and abandoned their rendering engine in favour of Blink and V8.

And yet. Andreas Kling recently wrote about the Ladybird browser. How we’re building a browser when it’s supposed to be impossible:

The ECMAScript, HTML, and CSS specifications today are (for the most part) stellar technical documents whose algorithms can be implemented with considerably less effort and guesswork than in the past.

I’ll be watching that project with interest. Not because I plan to use the brower. I’d just like to see some evidence against the complexity argument.

Meanwhile most other browser projects are building on the raised bar of a shared browser engine. Blisk, Brave, and Arc all use Chromium under the hood.

Arc is the most interesting one. Built by the wonderfully named Browser Company of New York, it’s attempting to inject some fresh thinking into everything outside of the rendering engine.

Experiments like Arc feel like they could have more in common with tools-for-thought software like Obsidian and Roam Research. Those tools build knowledge graphs of connected nodes. A kind of hypertext of ideas. But we’ve already got hypertext tools we use every day: web browsers. It’s just that they don’t do much with the accumulated knowledge of our web browsing. Our browsing history is a boring reverse chronological list instead of a cool-looking knowledge graph or timeline.

For inspiration we can go all the way back to Vannevar Bush’s genuinely seminal 1945 article, As We May Think. Bush imagined device, the Memex, was a direct inspiration on Douglas Engelbart, Ted Nelson, and Tim Berners-Lee.

The article describes a kind of hypertext machine that worked with microfilm. Thanks to Tim Berners-Lee’s World Wide Web, we now have a global digital hypertext system that we access every day through our browsers.

But the article also described the idea of “associative trails”:

Wholly new forms of encyclopedias will appear, ready made with a mesh of associative trails running through them, ready to be dropped into the memex and there amplified.

Our browsing histories are a kind of associative trail. They’re as unique as fingerprints. Even if everyone in the world started on the same URL, our browsing histories would quickly diverge.

Bush imagined that these associative trails could be shared:

The lawyer has at his touch the associated opinions and decisions of his whole experience, and of the experience of friends and authorities.

Heck, making a useful browsing history could be a real skill:

There is a new profession of trail blazers, those who find delight in the task of establishing useful trails through the enormous mass of the common record.

Taking something personal and making it public isn’t a new idea. It was what drove the wave of web 2.0 startups. Before Flickr, your photos were private. Before Delicous, your bookmarks were private. Before Last.fm, what music you were listening to was private.

I’m not saying that we should all make our browsing histories public. That would be a security nightmare. But I am saying there’s a lot of untapped potential in our browsing histories.

Let’s say we keep our browsing histories private, but make better use of them.

From what I’ve seen of large language model tools, the people getting most use of out of them are training them on a specific corpus. Like, “take this book and then answer my questions about the characters and plot” or “take this codebase and then answer my questions about the code.” If you treat these chatbots as calculators for words they can be useful for some tasks.

Large language model tools are getting smaller and more portable. It’s not hard to imagine one getting bundled into a web browser. It feeds on your browsing history. The bigger your browsing history, the more useful it can be.

Except, y’know, for the times when it just make shit up.

Vannevar Bush didn’t predict a Memex that would hallucinate bits of microfilm that didn’t exist.

Steam

Picture someone tediously going through a spreadsheet that someone else has filled in by hand and finding yet another error.

“I wish to God these calculations had been executed by steam!” they cry.

The year was 1821 and technically the spreadsheet was a book of logarithmic tables. The frustrated cry came from Charles Babbage, who channeled his frustration into a scheme to create the world’s first computer.

His difference engine didn’t work out. Neither did his analytical engine. He’d spend his later years taking his frustrations out on street musicians, which—as a former busker myself—earns him a hairy eyeball from me.

But we’ve all been there, right? Some tedious task that feels soul-destroying in its monotony. Surely this is exactly what machines should be doing?

I have a hunch that this is where machine learning and large language models might turn out to be most useful. Not in creating breathtaking works of creativity, but in menial tasks that nobody enjoys.

Someone was telling me earlier today about how they took a bunch of haphazard notes in a client meeting. When the meeting was done, they needed to organise those notes into a coherent summary. Boring! But ChatGPT handled it just fine.

I don’t think that use-case is going to appear on the cover of Wired magazine anytime soon but it might be a truer glimpse of the future than any of the breathless claims being eagerly bandied about in Silicon Valley.

You know the way we no longer remember phone numbers, because, well, why would we now that we have machines to remember them for us? I’d be quite happy if machines did that for the annoying little repetitive tasks that nobody enjoys.

I’ll give you an example based on my own experience.

Regular expressions are my kryptonite. I’m rubbish at them. Any time I have to figure one out, the knowledge seeps out of my brain before long. I think that’s because I kind of resent having to internalise that knowledge. It doesn’t feel like something a human should have to know. “I wish to God these regular expressions had been calculated by steam!”

Now I can get a chatbot with a large language model to write the regular expression for me. I still need to describe what I want, so I need to write the instructions clearly. But all the gobbledygook that I’m writing for a machine now gets written by a machine. That seems fair.

Mind you, I wouldn’t blindly trust the output. I’d take that regular expression and run it through a chatbot, maybe a different chatbot running on a different large language model. “Explain what this regular expression does,” would be my prompt. If my input into the first chatbot matches the output of the second, I’d have some confidence in using the regular expression.

A friend of mine told me about using a large language model to help write SQL statements. He described his database structure to the chatbot, and then described what he wanted to select.

Again, I wouldn’t use that output without checking it first. But again, I might use another chatbot to do that checking. “Explain what this SQL statement does.”

Playing chatbots off against each other like this is kinda how machine learning works under the hood: generative adverserial networks.

Of course, the task of having to validate the output of a chatbot by checking it with another chatbot could get quite tedious. “I wish to God these large language model outputs had been validated by steam!”

Sounds like a job for machines.

Disclosure

You know how when you’re on hold to any customer service line you hear a message that thanks you for calling and claims your call is important to them. The message always includes a disclaimer about calls possibly being recorded “for training purposes.”

Nobody expects that any training is ever actually going to happen—surely we would see some improvement if that kind of iterative feedback loop were actually in place. But we most certainly want to know that a call might be recorded. Recording a call without disclosure would be unethical and illegal.

Consider chatbots.

If you’re having a text-based (or maybe even voice-based) interaction with a customer service representative that doesn’t disclose its output is the result of large language models, that too would be unethical. But, at the present moment in time, it would be perfectly legal.

That needs to change.

I suspect the necessary legislation will pass in Europe first. We’ll see if the USA follows.

In a way, this goes back to my obsession with seamful design. With something as inherently varied as the output of large language models, it’s vital that people have some way of evaluating what they’re told. I believe we should be able to see as much of the plumbing as possible.

The bare minimum amount of transparency is revealing that a machine is in the loop.

This shouldn’t be a controversial take. But I guarantee we’ll see resistance from tech companies trying to sell their “AI” tools as seamless, indistinguishable drop-in replacements for human workers.

Hosting online events

Back in 2014 Vitaly asked me if I’d be the host for Smashing Conference in Freiburg. I jumped at the chance. I thought it would be an easy gig. All of the advantages of speaking at a conference without the troublesome need to actually give a talk.

As it turned out, it was quite a bit of work:

It wasn’t just a matter of introducing each speaker—there was also a little chat with each speaker after their talk, so I had to make sure I was paying close attention to each and every talk, thinking of potential questions and conversation points. After two days of that, I was a bit knackered.

Last month, I hosted an other event, but this time it was online: UX Fest. Doing the post-talk interviews was definitely a little weirder online. It’s not quite the same as literally sitting down with someone. But the online nature of the event did provide one big advantage…

To minimise technical hitches on the day, and to ensure that the talks were properly captioned, all the speakers recorded their talks ahead of time. That meant I had an opportunity to get a sneak peek at the talks and prepare questions accordingly.

UX Fest had a day of talks every Thursday in June. There were four talks per Thursday. I started prepping on the Monday.

First of all, I just watched all the talks and let them wash me over. At this point, I’d often think “I’m not sure if I can come up with any questions for this one!” but I’d let the talks sit there in my subsconscious for a while. This was also a time to let connections between talks bubble up.

Then on the Tuesday and Wednesday, I went through the talks more methodically, pausing the video every time I thought of a possible question. After a few rounds of this, I inevitably ended up with plenty of questions, some better than others. So I then re-ordered them in descending levels of quality. That way if I didn’t get to the questions at the bottom of the list, it was no great loss.

In theory, I might not get to any of my questions. That’s because attendees could also ask questions on the day via a chat window. I prioritised those questions over my own. Because it’s not about me.

On some days there was a good mix of audience questions and my own pre-prepared questions. On other days it was mostly my own questions.

Either way, it was important that I didn’t treat the interview like a laundry list of questions to get through. It was meant to be a conversation. So the answer to one question might touch on something that I had made a note of further down the list, in which case I’d run with that. Or the conversation might go in a really interesting direction completely unrelated to the questions or indeed the talk.

Above all, these segments needed to be engaging and entertaining in a personable way, more like a chat show than a post-game press conference. So even though I had done lots of prep for interviewing each speaker, I didn’t want to show my homework. I wanted each interview to feel like a natural flow.

To quote the old saw, this kind of spontaneity takes years of practice.

There was an added complication when two speakers shared an interview slot for a joint Q&A. Not only did I have to think of questions for each speaker, I also had to think of questions that would work for both speakers. And I had to keep track of how much time each person was speaking so that the chat wasn’t dominated by one person more than the other. This was very much like moderating a panel, something that I enjoy very much.

In the end, all of the prep paid off. The conversations flowed smoothly and I was happy with some of the more thought-provoking questions that I had researched ahead of time. The speakers seemed happy too.

Y’know, there are not many things I’m really good at. I’m a mediocre developer, and an even worse designer. I’m okay at writing. But I’m really good at public speaking. And I think I’m pretty darn good at this hosting lark too.

Podcasts

I’ve been on a few different podcasts recently.

The tenth episode of the Design Systems podcast is myself and Chris having a back-and-forth about design systems: Overcoming Entropy and Turning Chaos Into Order:

Chris and Jeremy Keith discuss imbuing teams with a shared sense of ownership of their design system, creating design systems able to address unforeseen scenarios, design ops as an essential part of an effective design system, and more.

Gerry has started a new podcast to accompany his new book, World Wide Waste. He invited me on for the first episode: ‘We’ve ruined the Web. Here’s how we fix it.’:

Welcome to World Wide Waste, a podcast about how digital is killing the planet, and what to do about it. In this session, I’m chatting with Jeremy Keith. Jeremy is a philosopher of the internet. Every time I see him speak, I’m struck by his calming presence, his brilliant mind and his deep humanity.

We talked about performance, energy consumption, and digital preservation. We agreed on a lot, but there were also points where we fundamentally disagreed. Good stuff!

If you like the sound of some Irishmen chatting on a podcast, then as well as listening to me and Gerry getting into it, you might also enjoy the episode of The Blarney Pilgrims podcast that I was on:

Jeremy Keith is the founder and keeper of thesession.org, probably the greatest irish music resource in the world. And this episode hopefully has something of the generous essence of that archive. We flow, from The North as a different planet to Galway as the centre of the ’90s slacker world. From the one-tune-a-week origin of thesession.org and managing an online community to the richness and value of constancy.

I’ve already written about how much this meant to me.

On the same topic—Irish music on the web—I made a brief appearance in the latest episode of Shannon Heaton’s Irish Music Stories, Irish Tunes in the Key of C-19:

How are traditional musicians and dancers continuing creative careers and group music events during the Covid-19 pandemic? How is social distancing affecting the jigs and reels? In this unexpected open of Season Four of Irish Music Stories, musicians from Ireland, England, Belgium, Sweden, and the U.S. address on and offline strategies… from a safe distance.