
What Does TOPS Mean and Does It Matter When I Buy a Laptop?

Get ready to learn the new metric used to describe a device's AI performance.

Lori Grunin
Abstract illustration of a computer processor, angled. Created by Lori Grunin using Adobe Firefly

New technology always brings lots of new jargon with it, and the so-called AI revolution has every computer and chip manufacturer tossing about a new term: TOPS. Understanding TOPS isn't critical to understanding how AI tools will work, whether they run completely on your phone or computer or use hybrid cloud/local processing. But it may prove as useful a metric to consider when shopping as torque is for a car, MB/s for computer storage or GHz for a CPU. None of these are perfect parallels, especially given how fast the technology is changing.

What does TOPS mean?

TOPS is a simple acronym for Tera Operations Per Second, or Trillion Operations Per Second; they're the same because "tera" is the metric prefix for trillion. And the operations? Here they're operations on 8-bit integers (INT8); in other words, the data type and math that an on-chip neural processor for AI acceleration uses.
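
To get a feel for the scale, here's a rough back-of-the-envelope sketch in Python. The 45 TOPS rating matches the Snapdragon X figure in the table below, but the 5-trillion-operation model is a made-up illustration, and real-world throughput always falls short of the quoted peak.

    # Back-of-the-envelope: what a TOPS rating implies (illustrative numbers).
    TERA = 1e12  # "tera" means trillion

    npu_tops = 45                 # chip's quoted peak: 45 trillion INT8 ops/sec
    ops_per_inference = 5 * TERA  # hypothetical model needing 5 trillion ops per run

    # Best-case time per run at the peak rate; real workloads run slower.
    seconds = ops_per_inference / (npu_tops * TERA)
    print(f"~{seconds * 1000:.0f} ms per inference at peak")  # ~111 ms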

Since TOPS is a measure of speed, you'll generally see the specification listed for the recent spate of neural processing units turning up in laptops and phones. NPUs are less important in desktops because they're intended for devices where power saving matters. NPUs are slower than the GPU (graphics processing unit), whether integrated on the chip or discrete, which can also do the integer math, but they use a lot less power.

NPUs are also intended for running models locally on your device rather than relying on the cloud: It's about cost. As one executive said recently about running basic AI in the cloud, "someone's got to pay for those servers." Local processing also allows marketers to allay your fears about privacy and security with "it only runs locally and doesn't get uploaded!"


When manufacturers quote TOPS, they're referring to peak theoretical throughput, so it will always be listed as "up to" some number. Like GHz for CPUs, it's there to suggest a certain level of performance you can expect from the chip. In a like-for-like scenario, the chip with the higher TOPS specification should complete AI tasks faster, and that could make a big difference in the utility and responsiveness of various AI tools, such as image recognition, text generation or any number of assistive AI technologies.

NPU performance

Chip (year)                       NPU TOPS (up to)
Intel Core Ultra (2023)           11.5
AMD Ryzen 8040 series (2023)      16
Apple M3 series (2023)            18
Apple M4 (2024)                   38
Snapdragon X series (2024)        45
Intel Lunar Lake (2024)           48
AMD Ryzen AI 300 series (2024)    50

Why TOPS suddenly matters

TOPS became a big deal when Microsoft and Qualcomm launched Microsoft's Copilot Plus platform for Windows, where 40 NPU TOPS is the dividing line -- a bar that disqualifies every older consumer laptop chip. That's because Copilot Plus PCs use a specific set of Windows programming interfaces to accelerate some basic AI-related features and on-device models, not just the ones in Windows but also those in software by other developers. The first systems offering this are built around Qualcomm's Snapdragon X Elite and X Plus chips, which feature a custom Hexagon NPU capable of up to 45 TOPS.

Less than a month after Qualcomm's launch, we've already heard about the new AMD Ryzen AI 300 series (coming in July) and Intel Lunar Lake CPUs (coming in Q3), both with rearchitected NPUs. They use a data type called block floating point, which effectively combines a compressed representation of 16-bit floating point -- a data type that allows for storage and manipulation of much larger and smaller numbers than INT8 -- with INT8.


That allows them to bump their NPU performance into the Copilot Plus zone. The Apple M4 upgraded to the latest version of the Arm core, which also uses block FP; hence its improvement over the M3. (See Nvidia's explanation of sparsity, which is about matrix math as performed by its Tensor cores. It's not the same as block FP, but it gives you an example of how compression can work.)
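
To illustrate the idea, here's a minimal NumPy sketch of generic block floating point: every value in a block shares one exponent, and only a small integer mantissa is stored per value. It's a simplification for clarity, not any vendor's actual format; real NPUs use hardware-specific block sizes, mantissa widths and rounding.

    import numpy as np

    def bfp_quantize(block, mantissa_bits=8):
        """Quantize floats to generic block floating point: one shared
        exponent for the whole block plus an 8-bit mantissa per value."""
        max_val = np.max(np.abs(block))
        if max_val == 0:
            return np.zeros(block.shape, dtype=np.int8), 0
        # Pick the shared exponent so the largest value fills the mantissa range.
        shared_exp = int(np.ceil(np.log2(max_val))) - (mantissa_bits - 1)
        scale = 2.0 ** shared_exp
        mantissas = np.clip(np.round(block / scale), -128, 127).astype(np.int8)
        return mantissas, shared_exp

    def bfp_dequantize(mantissas, shared_exp):
        return mantissas.astype(np.float32) * (2.0 ** shared_exp)

    block = np.array([0.52, -1.7, 0.003, 0.9], dtype=np.float32)
    m, e = bfp_quantize(block)
    print(m, e)                  # INT8 mantissas plus one shared exponent
    print(bfp_dequantize(m, e))  # close to the originals; tiny 0.003 is lost

The per-value math stays cheap integer work, which is why the approach lifts the TOPS number; but as the tiny 0.003 in the example shows, values much smaller than the block's largest get crushed toward zero.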

Beyond Copilot Plus

While NPU TOPS will be useful for seeing how quickly you might be able to run Microsoft's Copilot-driven experiences (like Recall), it's not the be-all and end-all of performance. Not all AI processes can use integer math comfortably, so the INT8 TOPS measurement may only reflect how fast a chip will handle the basics. That doesn't include the whizzy generative AI that can create detailed images and videos from text prompts.

Dedicated graphics processors can already hit much higher speeds: Nvidia's RTX 40-series GPUs offer hundreds of TOPS, even in their lower-tier mobile versions. The CPU can handle the math as well, albeit much more slowly than the NPU in some cases.


So TOPS is already giving way to platform TOPS as a more marketing-impressive metric. Platform TOPS is a measure of the aggregate performance of all the processors in the system: CPU, NPU and GPU(s). When you see it, remember that it can vary a lot, depending upon whether you're on power or battery. 
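
The arithmetic behind the label is just addition, as this sketch with invented per-chip figures shows; note that the headline total simply sums peak ratings that the processors rarely hit at the same time.

    # Hypothetical per-processor peak ratings (illustrative, not spec-sheet values).
    peak_tops = {"CPU": 5, "NPU": 45, "GPU": 300}

    platform_tops = sum(peak_tops.values())
    print(f"Platform TOPS (plugged in): {platform_tops}")  # 350

    # On battery, clocks drop; assume an illustrative 40% cut to CPU and GPU.
    on_battery = peak_tops["NPU"] + 0.6 * (peak_tops["CPU"] + peak_tops["GPU"])
    print(f"Platform TOPS (on battery, rough): {on_battery:.0f}")  # 228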

On the other hand, if a system has a high platform TOPS but doesn't have an NPU, that rules out support for any Copilot Plus experiences in Windows you may care about, at least for now.

Any GPU can work with multiple data types, and most of the serious generative tools are written with floating point math in mind. If a tool relies on floating point operations (FP16 and FP32), you'd instead be looking for a TFLOPS metric -- trillion (tera) floating point operations per second. Then there are the matrix extensions introduced into CPU chips, which perform the same types of operations as dedicated Tensor cores. Both of those are far more important for the high-power stuff, like training models rather than just running them. We expect the measure of a PC's AI performance to evolve yet again before we can get used to this one.