Lies, damn lies, and business cases for AI hype

This week, to great fanfare, a report on how AI could “transform the state” has been published by the Tony Blair Institute for Global Change.

The hype around this report has been amplified somewhat by the recent British general election, and by speculation over whether its ideas will get a better reception with Keir Starmer in Number 10 instead of Rishi Sunak.

I’ll leave that kind of palace intrigue to others, and instead have a look at the report. It’s also available as a PDF, but the PDF version apparently lacks some very exciting AI-generated graphics.

The Executive Summary gives the pitch for why so-called AI is important.

The latest iterations of artificial-intelligence systems – generative AI such as large language models (LLMs) – are matching humans for quality and beating them on speed and cost. Knowledge workers using the GPT-4 model from OpenAI completed 12 per cent more tasks, 25 per cent quicker, with a bigger boost in productivity for less-skilled workers. Businesses using AI tools are 40 per cent more efficient and have 70 per cent higher customer and employee satisfaction than businesses that do not.

Ooh, statistics! Facts! Studies! The “25% quicker” number is from the Noy and Zhang study. Contrary to the excited MIT press release which claims it is “open access”, it isn’t. If you want to read it, you may have to rely on the working paper which has known mistakes.

There are some interesting statistical issues with this paper pointed out in this Twitter thread, but I’m broadly of the view that it’s probably about right. If you look at the Supplementary Materials and Methods document that goes with the trial, you can see the sort of writing tasks that the study participants were given…

  • managers and HR professionals were told that their employer had built a VR/metaverse-style “virtual office space” where employees working from home could hang out, but that they weren’t using it because they quite like the solitude of not talking to weird cartoon avatars, and don’t want to wear annoying VR helmets. They were tasked with writing a company-wide email of around 400 words pleading with the employees to try to use the horrible metaverse nonsense more.

  • managers were told to write an email explaining how the company was shifting towards a flatter organisational structure… but without any concrete details as to what that would be

  • data analysts were asked to write “code notebooks” setting out the steps they would take in analysing data for a bank to reduce customer churn, and for push notification marketing for an ecommerce service. A code notebook, it should be noted, is not actually something like a Jupyter notebook, but something akin to a diary explaining the steps the analyst would take, what tools they might use (Excel vs. Tableau, Python vs. R), and a summary of the kind of analysis they might look at: clustering, pivot tables, segmenting, A/B testing etc.

  • marketers were asked to write press releases for self-driving e-bikes and for augmented-reality glasses

  • consultants were asked to read passages from a number of reports and write a short benefits-and-risks summary for a business

  • grant writers were asked to write cover letters for a grant application

You’ll note that these are exactly the sort of tasks anyone could tell you large language models are quite good at, because there is no link between the task and actual reality. In addition, I’m a little dubious about how much effort most people are going to put into an online study compared with the effort they put into their actual job, where the consequences of performing badly include loss of income, social embarrassment, and lack of professional advancement – all of which are rather more significant than missing out on a couple of extra dollars in one’s beer money pot.

Anyway, let’s move on to study number two. What did the Tony Blair Institute report say?

Businesses using AI tools are 40 per cent more efficient and have 70 per cent higher customer and employee satisfaction than businesses that do not.

It links to a blog post from Google Cloud, announcing a Harvard Business Review study sponsored by Google Cloud. That’d be the Google Cloud who attributed 28% of their revenue growth in Q1 2024 to AI. Totally unbiased research, in other words.

If you don’t want to give Google Cloud your personal details to read this report, here’s the link. Burner emails are useful.

So what’s the methodology used by the report? They did a survey of business executives, and compared the results of that survey with results from a set of executives defined as “leaders”. (What makes them leaders? Your guess is as good as mine.)

They asked them whether their companies had been doing data analytics AND/OR “AI/ML”, then asked them whether, over the last year, their organisation had improved in a number of categories, including the introduction of new products and services, operational efficiency, customer satisfaction, revenue/growth, customer loyalty/retention, profitability, market share, employee satisfaction, and the predictability of IT costs.1

They also asked them whether they’d increased their usage of data analytics and/or AI/ML over the two years preceding the survey, and whether the use of those technologies had increased in importance over the same two years. It’s worth noting that on these questions, AI/ML is bolted on almost as an afterthought: it gets one question, while the rest are about cloud services, data storage, analytics, use of APIs, open source and so on.

The results? The business leaders in the specially selected category of super-duper leaders say they are doing more data analytics and AI than the normies back in economy class. They think it’s important for their business. And more of them say their businesses are doing significantly or slightly better on all those various metrics than they were previously.

What does it not say? Well, that the interest or investment in data analytics and/or AI/ML technologies caused any improvement in those business metrics. Not that it could say that, because there is no verification of the results. Have the companies actually improved on those metrics? I mean, if they’re publicly traded you can probably check revenue, growth and profit, and you could use some independent survey data on customer satisfaction. Maybe you could even get some numbers from Glassdoor or wherever on employee retention, which might stand in as a proxy for employee satisfaction.

The extremely scant methodology section also notes that 23% of respondents are in the technology sector (followed by 11% in financial services, 10% in healthcare, 9% in government/non-profits, 9% in manufacturing, and then an unreported long tail).

People in the technology sector think investing in technology is important to the success of their business? I’m shocked, your honour, I genuinely had no idea.

“It’s a paid-for report put out by a corporation with a target audience of business executives, not a Cochrane Review. Of course it’s bullshit, who cares?”, you might ask.

Well, it’s somewhat important that the promises of brand-new magic computers get a little more scepticism than the usual management-consultant nonsense does. The executive summary goes on to argue that “[a]dopting AI in the public sector is a question of strategic prioritisation that supersedes everything else”,2 and that AI could bring public-sector productivity improvements worth £40 billion a year. By golly, that’s over two Brexits’ worth of benefits!

How would we achieve this goal?

Interoperable data, for one. The government needs to “secure upfront funding to rapidly link data across government that will make the implementation of AI at scale possible, maintaining privacy and anonymity”. Nobody’s tried that before, unless you count data.gov.uk and the massive push across government to collect more data, use data-science techniques and so on.

Also on the agenda: buying a boatload of GPUs for the government’s data centres. This seems to rather put the cart before the horse. Google and Meta and OpenAI need them because they’re training lots of machine-learning models, but is the British government suddenly going to need to do the same? And if it does, why can’t it just make a sensible procurement decision between training its models on AWS/GCP/Azure/whatever and buying its own hardware at the point when that becomes a live issue?
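To make that concrete, the rent-versus-buy choice is the kind of back-of-envelope sum a department could run once it actually knows its workload. Here is a minimal sketch; every figure in it is a placeholder assumption I have invented for illustration, not a real price from any cloud provider or hardware vendor.

```python
# Back-of-envelope sketch: renting cloud GPU-hours vs. buying hardware.
# All figures are invented placeholders for illustration only.

CLOUD_COST_PER_GPU_HOUR = 3.00        # assumed £ per GPU-hour, on demand
PURCHASE_COST_PER_GPU = 25_000.00     # assumed £ per accelerator, installed
HOSTING_COST_PER_GPU_YEAR = 4_000.00  # assumed £ per year for power, cooling, staff
HARDWARE_LIFETIME_YEARS = 4           # assumed useful life before a refresh


def cloud_cost(gpu_hours: float) -> float:
    """Total cost of renting the given number of GPU-hours."""
    return gpu_hours * CLOUD_COST_PER_GPU_HOUR


def owned_cost(n_gpus: int, years: float) -> float:
    """Total cost of buying and running n_gpus for the given period."""
    return n_gpus * (PURCHASE_COST_PER_GPU + HOSTING_COST_PER_GPU_YEAR * years)


# Example: a modest, bursty workload of 50,000 GPU-hours a year,
# versus buying 20 GPUs outright and running them for their lifetime.
yearly_gpu_hours = 50_000
print(f"Cloud, one year:  £{cloud_cost(yearly_gpu_hours):,.0f}")
print(f"Owned, 20 GPUs over {HARDWARE_LIFETIME_YEARS} years: "
      f"£{owned_cost(20, HARDWARE_LIFETIME_YEARS):,.0f}")
```

The answer flips entirely depending on the workload you plug in, which is rather the point: it is a decision you can only make sensibly once there is an actual workload to procure for, not before.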

The Civil Service could apparently be rejuvenated through hiring a bunch of AI experts, including a “graduate-entry route for AI experts through the Fast Stream”. The report envisions civil servants being “guided and supported in their day-to-day tasks by a Multidisciplinary AI Support Team (or MAST) platform”.

The benefits this would bring to citizens of the United Kingdom would apparently be immense. For instance, it could speed up responding to Freedom of Information Act requests.

Currently, responding to FOI requests requires a significant investment of time from officials to find and format information as well as make decisions about what can and cannot be shared, often inconsistently. Rather than deal with individual queries on an ad hoc basis, MAST allows departments to use open-data platforms for FOI requests, using the same mechanisms as in the previous examples.

Right, so I send in an FOI request. It magically attempts to do a fancy JOIN command across a bunch of CSVs that may or may not be up-to-date and sends the result back to me. If the data is already published, I can do that myself already. But if the data isn’t published, instead of the government department providing the data I asked for, I’ll get Clippy either making up data I didn’t ask for, or denying it on the basis of a clearly inapplicable FOI exemption. All the joys of WhatsApp customer-service chatbots, but applied to government transparency.
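For what it’s worth, that “fancy JOIN” is not exotic technology. If the underlying datasets are already published as open CSVs, anyone can do it in a few lines; here is a minimal sketch in Python, where the file names, column names and the question being asked are all hypothetical, made up purely for illustration.

```python
# A minimal sketch of the sort of join the FOI example implies, assuming the
# data is already published as open CSVs. File names, column names and the
# question being asked are all hypothetical, for illustration only.
import pandas as pd

spending = pd.read_csv("department_spending_2023.csv")  # hypothetical dataset
providers = pd.read_csv("approved_providers.csv")       # hypothetical dataset

# Join the two datasets on a shared supplier identifier
merged = spending.merge(providers, on="provider_id", how="left")

# The requester's actual question: total consultancy spend per provider
answer = (
    merged[merged["category"] == "consultancy"]
    .groupby("provider_name")["amount_gbp"]
    .sum()
    .sort_values(ascending=False)
)
print(answer.head(10))
```

Which is the point: where the data is already open, the join is the easy bit and no chatbot is needed; where it isn’t, the hard part is the disclosure decision and the state of the underlying records, not the query.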

It also imagines that “the MAST platform” can help with public procurement. Procurement was once an area of interest only for the wonkiest of policy wonks, but the last few years of controversy around VIP lanes for COVID PPE have certainly made it interesting again.

With AI analysis of large data sets on economic activity and past contracts, departments can reach out directly to organisations that meet different thresholds for risk, the vendor’s financial health and track record…

Vendors, in turn, streamline the process of putting together a bid with AI-generated responses and receive an immediate assessment of their fit prior to its submission, demystifying the procurement process for SMEs.

I have some questions. Quite boring ones, I am sure, but I fear they might be of some importance.

Imagine you run a small or medium-sized company participating in the procurement process, and the government encourages you to use their magic AI “help vendors fill in forms” system. It screws up and makes a material misstatement to the government. The government relies on that statement, but you can’t deliver and so you breach the contract. Who will be liable? The vendor who trusted the crappy computer system the government told them to use, or the government for nudging them into using the crappy system?

The “immediate assessment” the system gives—is that something a vendor can rely on? What if the system incorrectly tells a vendor that their bid is unlikely to succeed and they give up, when they would otherwise have had a very good shot at getting the contract? A bold new frontier in the loss-of-a-chance doctrine awaits!

Next: Regulation 18 of the Public Contracts Regulations 2015 states that “contracting authorities” (read, the government or public sector body) “shall treat economic operators” (suppliers) “equally and without discrimination and shall act in a transparent and proportionate manner”.

Is the magic AI procurement bot going to handle that? If and when it screws up, is the central government body that administers it going to cover the cost of the consequences, or will it come out of the departmental budget? Will the company that supplies the magic fix-everything technology take any responsibility? Or will we just say “it’s Agile, you’re holding it wrong” and move on?

“What if it goes wrong? How long until it goes to court?” are totally valid questions to ask, especially given the considerable sums that have just been handed out willy-nilly to party donors, chums and spivs one met down the pub to provide unusable PPE.

An important question that has to be considered in all attempts to use machine learning (and “AI”, for whatever that vague term is worth) in government is how it fits with the rule of law-type obligations that public bodies have to make decisions that are fair, unbiased, explainable, and compatible with human rights. Keep a beady eye out for the judge over your shoulder. How exactly one makes technology that sits well with these obligations is a matter on which a former Prime Minister could potentially impart some insight, and on which this report is remarkably quiet. The only real attempt to do so is framed around privacy. Privacy is important, but it’s only one of a number of policy concerns that really need a decent answer.

As with bold blockchain pronouncements and other tech hype, every experiment or prototype gets magically transformed by an army of consultants from “we’re kinda looking at it a bit” to “we’re trying it” to “we’re using it”, and then on to “it definitely works” and “it’s the greatest damn thing since the invention of the wheel”. One of the consequences of TED-style dilettantism and “naive wonderment”—of which there is a lot in the political and business leadership class of this country—is the perception that real problems can only be solved with the new and sexy and exciting.

Meanwhile, the practicalities remain in short supply. Where are all those AI experts lurking in every Whitehall department to build citizen-facing chatbots and data platforms going to come from? Tech hiring was hard enough even before the UK university sector started facing a funding crisis that borders on the existential. You can have an incredibly clever AI triage system in A&E,3 but if you don’t have the doctors, nurses and hospital beds to actually send patients to, you’re spending a lot of money reshuffling the order of queues in which hundreds of thousands of people are now waiting over 12 hours, rather than reducing the queue by actually treating them.
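To put the triage point in deliberately toy terms: if treatment capacity is fixed, re-ordering the queue changes who is seen first, not how many are left waiting. A crude sketch, with every number invented for illustration:

```python
# Toy illustration: a fixed-capacity A&E over one day. Whether patients are
# seen first-come-first-served or in order of an AI-predicted severity score,
# the size of the leftover queue is the same, because it depends only on
# arrivals versus treatment capacity. All figures are invented.
import random

random.seed(1)
arrivals = [random.random() for _ in range(600)]  # 600 patients, each with a "severity"
CAPACITY = 450                                    # staff and beds to treat 450 per day


def leftover(queue: list[float], capacity: int) -> int:
    """Number of patients still waiting after treating `capacity` of them in order."""
    return max(len(queue) - capacity, 0)


fifo_order = arrivals                            # first come, first served
triaged_order = sorted(arrivals, reverse=True)   # the cleverest possible triage

print("Still waiting (FIFO):   ", leftover(fifo_order, CAPACITY))
print("Still waiting (triaged):", leftover(triaged_order, CAPACITY))
# Both print 150. Triage changes who is seen first, which matters clinically,
# but the backlog only shrinks if capacity goes up.
```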

Before building the next magical mystery machine that will totally fix everything, ask yourself “yeah, but will it actually do that though?” If the case for it consists of crappy paid-for online surveys, prototypes/experiments that have been puffed up into dead-cert successes, and either silence or hand-waving on how one resolves the actual difficult practical problems, tread with considerable caution. And be aware that actual technologists are a hell of a lot more cautious in their claims and promises than the business and political leaders they work for.


  1. One cannot imagine why Google Cloud would be keen on talking about IT cost predictability.

  2. Than everything? It’s above climate change? Nuclear war? Calm down, they’re just computers.

  3. These “service users” are somewhat less likely to sue you than sophisticated commercial suppliers who’ve just had their bid for a government contract turned down by an overpuffed Alexa.