Diffbot Aims To Build The Intel Of Data For Artificial Intelligence

Image Credits: agsandrew / Shutterstock

With a new $10 million commitment led by Tencent, one of China’s largest Internet companies, Diffbot chief executive Mike Tung has come a long way from his days of eating beans and rice in the dark and solving the math problems that would form the core of his groundbreaking artificial intelligence software.

Diffbot, which raised its first seed money in 2012, has set itself the lofty goal of being the “Intel of data” for independent artificial intelligence application developers.

Companies like Google, Facebook, and Baidu, all of which are working on artificial intelligence, have the benefit of massive amounts of data at their fingertips. They and their data entry employees can use that data to categorize and define the web in a language that AI software can later feed into its algorithms.

Smaller companies that don’t have the benefit of that data can turn to Diffbot.

“We’ve been working on this technology for quite a few years. It was really last year that 90% to 95% accuracy was reached. And hitting profitability last year as one of the first AI startups to do so was a turning point,” says Tung.

The major expenses for Diffbot had been electricity and bandwidth, Tung says. Unlike other artificial intelligence deep learning projects that rely on humans to classify web pages, Diffbot uses only the proprietary algorithms that it created itself and has refined over the years, according to Tung.

“We want to build the world’s largest database of structured knowledge,” he says.

If artificial intelligence is to achieve the promise (and potential peril) inherent in the technology, it still needs to be taught.

Tung compares it to teaching a child. “The technology is scouring the web and is trying to simulate what a human being is doing when they’re on the page,” he says.

Research into artificial intelligence, and the ability to develop sentience in machines, sits at the intersection of a few very large trends in computing. It combines the development of new, and newly powerful, chipsets that can process complex data increasingly quickly; the development of new kinds of database software that can organize massive amounts of data more flexibly; and the development of nearly ubiquitous arrays of sensors and systems to collect that data.

The problem with the data that these would-be intelligences learn from and process is that it needs to be structured in a way the systems can recognize, and that’s exactly what Diffbot does.

“We’re taking the Internet and converting it into semantic knowledge,” says Tung. And, in a strategy that drives down the cost of compiling the trillions of facts that make up the taxonomy Diffbot is creating, the company’s secret weapon is its own AI software.

“Google has this knowledge graph using human curation and it’s the same with Watson. There’s a lot of human beings behind the scenes creating the rules the way the algorithm works,” says Tung. And humans cost money that Diffbot simply doesn’t need to spend.

Tung calls it the Manhattan project for AI — except computers are the researchers developing the bomb.

The Path Seldom Taken

Diffbot was always going to make money. Tung never wanted profitability to be an open question, nor did he want to depend on fundraising as a necessity, the founder and chief executive said.

To make money to support the development of the software, Tung pinched pennies and took on a second job after dropping out of Stanford’s graduate school, learning patent law and filing patents in the wee small hours of the morning to make rent money.

“For each patent I was able to get 20K,” says Tung. “I would be good to get rent for a few months.”

He lived on a diet of beans and rice and ramen, alternating working on the math at the core of the software with filing patent applications for money.

Once the initial product was baked, Diffbot had the singular honor of being the first company accelerated in the program that would become Stanford’s premier source for getting graduates to exit velocity with their businesses: StartX (where Tung is still a mentor).

With the initial seed money from StartX, Diffbot was able to continue its research and launch its first revenue-generating products.

“From day one we made it an on-demand service,” Tung recalls. “You pass us a URL and we will process that. For every hit to our server we earn .008 cents…”

In retrospect, it is the decision Tung is happiest about. “Our on-demand customers were paying us to structure the web,” he says.
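
The per-request pricing Tung describes reduces to simple arithmetic. What follows is a minimal sketch, assuming the quoted figure of .008 cents per request is taken literally; the function name and the request volumes are hypothetical and are not Diffbot figures.

```python
from decimal import Decimal

# Quoted in the article: .008 cents earned per request (taken literally here).
CENTS_PER_REQUEST = Decimal("0.008")


def revenue_dollars(requests: int) -> Decimal:
    """Return revenue in dollars for a given number of requests."""
    return (CENTS_PER_REQUEST * requests) / Decimal(100)


# Hypothetical request volumes, for illustration only.
for volume in (1_000, 1_000_000, 1_000_000_000):
    print(f"{volume:>13,} requests -> ${revenue_dollars(volume):,.2f}")
```

At that rate, revenue only becomes meaningful at very high request volumes, which fits Tung’s point that paying customers were funding the work of structuring the web.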

Many of those on-demand customers are still on board. AOL (the parent company and owner of TechCrunch), Yandex, eBay, Microsoft’s Bing search service, Cisco and Adobe all pay Diffbot for its taxonomical services — and Diffbot got to increase the scope of its learning.

A Thin, Premeditated Rig

While Diffbot couldn’t spider the web from day one, by 2015 its situation had changed. The company was profitable and confident in its ability to raise money, and its AI software was identifying data on the web with 90% to 95% reliability. It was time.

So the company started spidering the web to speed up its data collection. The goal, ultimately, is to get to trillions of discrete data points to provide a structured taxonomy for the entire internet (it’s a small goal).

Since the company began its spidering project last year, its taxonomy has grown to more than 1.2 billion objects and is adding 10 million objects per day.

By comparison, Google’s Knowledge Graph only recently passed 1 billion objects, the company notes.
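
To put those figures next to the stated goal of trillions of data points, here is a rough back-of-the-envelope sketch. It assumes the reported rate of 10 million objects per day stays constant, which the company does not claim; the function name and the one-trillion target are illustrative choices.

```python
CURRENT_OBJECTS = 1_200_000_000  # "more than 1.2 billion objects"
OBJECTS_PER_DAY = 10_000_000     # "adding 10 million objects per day"


def days_to_reach(target: int,
                  current: int = CURRENT_OBJECTS,
                  per_day: int = OBJECTS_PER_DAY) -> int:
    """Days needed to reach `target` objects at a constant daily rate."""
    remaining = max(0, target - current)
    # Integer ceiling division avoids floating-point rounding.
    return -(-remaining // per_day)


days = days_to_reach(1_000_000_000_000)  # one trillion, as an illustrative target
print(f"{days:,} days, or roughly {days / 365:.0f} years")
```

Under that constant-rate assumption, the first trillion alone would take centuries, so reaching the goal would depend on the crawl rate growing substantially.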

Show Me The Money

Lofty goals attract big investors, and Diffbot has attracted some of the biggest.

For its seed round, the company attracted a who’s who of Silicon Valley’s biggest names, including EarthLink founder Sky Dayton; Andy Bechtolsheim, co-founder of Sun Microsystems; Joi Ito, Director of MIT Media Lab; Brad Garlinghouse, CEO of YouSendIt (and formerly of TechCrunch parent company AOL); Maynard Webb, Chairman of the Board at LiveOps and formerly eBay COO; Elad Gil, VP of Corporate Strategy at Twitter; Jonathan Heiliger, former VP of Technical Operations at Facebook; Redbeacon co-founder Aaron Lee; and VitalSigns founder Montgomery Kersten.

The latest round brought in a strategic investor in Tencent, one of China’s largest Internet companies in one of the world’s largest markets, as well as Felicis Ventures, which is building a sizable portfolio of artificial intelligence companies.

A coterie of new angels and other institutions joined as well, all of them also bold-faced names in the Valley. Among the superstar names are Andy Bechtolsheim, the co-founder of Sun Microsystems and the first investor in Google; Amplify Ventures; Valor Capital; and Bill Lee, an early investor in SpaceX and Tesla.
