Google’s Gemini updates: How Project Astra is powering some of I/O’s big reveals

Google is improving its AI-powered chatbot Gemini so that it can better understand the world around it — and the people conversing with it.

At the Google I/O 2024 developer conference on Tuesday, the company previewed a new experience in Gemini called Gemini Live, which lets users have “in-depth” voice chats with Gemini on their smartphones. Users can interrupt Gemini while the chatbot’s speaking to ask clarifying questions, and it’ll adapt to their speech patterns in real time. And Gemini can see and respond to users’ surroundings, either via photos or video captured by their smartphones’ cameras.

“With Live, Gemini can better understand you,” Sissie Hsiao, GM for Gemini experiences at Google, said during a press briefing. “It’s custom-tuned to be intuitive and have a back-and-forth, actual conversation with [the underlying AI] model.”

Gemini Live is in some ways the evolution of Google Lens, Google's long-standing computer vision platform for analyzing images and videos, and Google Assistant, Google's AI-powered, speech-generating and speech-recognizing virtual assistant for phones, smart speakers and TVs.

At first glance, Live doesn’t seem like a drastic upgrade over existing tech. But Google claims it taps newer techniques from the generative AI field to deliver superior, less error-prone image analysis — and combines these techniques with an enhanced speech engine for more consistent, emotionally expressive and realistic multi-turn dialogue.

“It’s a real-time voice interface and [has] extremely powerful multimodal capabilities combined with long context,” Oriol Vinyals, principal scientist at DeepMind, Google’s AI research division, told TechCrunch in an interview. “You could imagine how that combination will feel very powerful.”

The technical innovations driving Live stem in part from Project Astra, a new initiative within DeepMind to create AI-powered apps and “agents” for real-time, multimodal understanding.

“We’ve always wanted to build a universal agent that will be useful in everyday life,” Demis Hassabis, CEO of DeepMind, said during the briefing. “Imagine agents that can see and hear what we do, better understand the context we’re in and respond quickly in conversation, making the pace and quality of interactions feel much more natural.”

Gemini Live — which won’t launch until later this year — can answer questions about things within view (or recently within view) of a smartphone’s camera, like which neighborhood a user might be in or the name of a part on a broken bicycle. Pointed at a portion of computer code, Live can explain what that code does. Or, asked about where a pair of glasses might be, Live can say where it last “saw” the glasses.

Live is also designed to serve as a virtual coach of sorts, helping users rehearse for events, brainstorm ideas and so on. Live can suggest which skills to highlight in an upcoming job or internship interview, for instance, or give public speaking advice.

“Gemini Live can provide information more succinctly and answer more conversationally than, for example, if you’re interacting in just text,” Hsiao said. “We think that an AI assistant should be able to solve complex problems … and also feel very natural and fluid when you engage with it.”

Gemini Live’s ability to “remember” is made possible by the architecture of the model underpinning it: Gemini 1.5 Pro (and, to a lesser extent, other “task-specific” generative models). Gemini 1.5 Pro, the current flagship in Google’s Gemini family of generative AI models, has a longer-than-average context window, meaning it can take in and reason over a lot of data (about an hour of video; RIP, smartphone batteries) before crafting a response.

“That’s hours of video that you could have interacting with the model, and it would remember all that has happened before,” Vinyals said.

Live is reminiscent of the generative AI behind Meta’s Ray-Ban glasses, which similarly can look at images captured by a camera and interpret them in near-real time. Judging from the pre-recorded demo reels Google showed during the briefing, it’s also quite similar — conspicuously so — to OpenAI’s recently revamped ChatGPT.

One key difference between the new ChatGPT and Gemini Live is that Gemini Live won’t be free. Once it launches, Live will be exclusive to Gemini Advanced, a more sophisticated version of Gemini that’s gated behind the Google One AI Premium Plan, priced at $20 per month.

Perhaps in a jab at Meta, one of Google’s demos showed a person wearing AR glasses equipped with a Gemini Live-like app. Google — doubtless keen to avoid another dud in the eyewear department — declined to say whether those glasses or any glasses powered by its generative AI would come to market in the near future.

Vinyals didn’t completely shut down the idea, though. “We’re still prototyping and, of course, showcasing [Astra and Gemini Live] to the world,” he said. “We’re seeing the reaction from folks that can try it, and that will inform where we go.”

Other Gemini updates

Beyond Live, Gemini is getting a range of upgrades to make it more useful day-to-day.

Gemini Advanced users in more than 150 countries, and in over 35 languages, can take advantage of Gemini 1.5 Pro’s larger context to have the chatbot analyze, summarize and answer questions about long documents (up to 1,500 pages). (While Live is arriving later in the year, Gemini Advanced users can interact with Gemini 1.5 Pro starting today.) Documents can now be imported from Google Drive or uploaded directly from a mobile device.

Later this year, the context window for Gemini Advanced users will grow even larger, to 2 million tokens, bringing with it support for uploading videos (up to two hours in length) to Gemini and for having Gemini analyze big codebases (more than 30,000 lines of code).

Google claims that the large context window will improve Gemini’s image understanding. For example, given a photo of a fish dish, Gemini will be able to suggest a comparable recipe. Or, given a math problem, Gemini will provide step-by-step instructions on how to solve it. 

And it’ll help Gemini plan trips.

In the coming months, Gemini Advanced will gain a new “planning experience” that creates custom travel itineraries from prompts. Taking into account things like flight times (from emails in a user’s Gmail inbox), meal preferences and information about local attractions (from Google Search and Maps data), as well as the distances between those attractions, Gemini will generate an itinerary that updates automatically to reflect any changes. 

In the more immediate future, Gemini Advanced users will be able to create Gems, custom chatbots powered by Google’s Gemini models. Along the lines of OpenAI’s GPTs, Gems can be generated from natural language descriptions — for example, “You’re my running coach. Give me a daily running plan” — and shared with others or kept private. No word on whether Google plans to launch a storefront for Gems like OpenAI’s GPT Store; hopefully we’ll learn more as I/O goes on.

Soon, Gems and Gemini proper will be able to tap an expanded set of integrations with Google services, including Google Calendar, Tasks, Keep and YouTube Music, to complete various labor-saving tasks.

“Let’s say you have a flier from your kid’s school, and there’s all these events that you want to add to your personal calendar,” Hsiao said. “You’ll be able to take a picture of this flier and ask the Gemini app to create these calendar entries directly onto your calendar. This is going to be a great time saver.”

Given generative AI’s tendency to get summaries wrong and generally go off the rails (plus Gemini’s not-so-glowing early reviews), take Google’s claims with a grain of salt. But if the improved Gemini and Gemini Advanced actually perform as Hsiao describes — and that’s a big if — they could be great time savers indeed. 