
OpenAI makes GPT-4 Turbo with Vision generally available through its API

[Image: Close-up of a human eye overlaid with green holographic circles. Credit: VentureBeat, made with Midjourney V6]



As enterprise developers and astute company leaders know, the application programming interface (API) is the nexus of modern software development: it sits atop tech platforms and allows third-party apps to connect and integrate with them. OpenAI just made a big improvement to the API for its powerful GPT-4 Turbo large language model (LLM).

The company announced today on its X accounts that its GPT-4 Turbo with Vision model is now “generally available” through its API. GPT-4’s vision capabilities were announced alongside audio uploads in September 2023, and GPT-4 Turbo followed at OpenAI’s developer conference in November, promising faster speeds, a larger input context window (up to 128,000 tokens, equivalent to about a 300-page book) and lower prices.

In addition, vision requests can now be made with JSON mode and function calling, which has the model generate a JSON snippet that developers can use to automate actions within their connected apps, such as “sending an email, posting something online, making a purchase, etc.” OpenAI cautions on its API page, however: “We strongly recommend building in user confirmation flows before taking actions that impact the world on behalf of users.”
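For developers, that combination looks roughly like the following sketch, which assumes the OpenAI Python SDK (v1.x) and the generally available gpt-4-turbo model; the log_receipt function, its parameters and the image URL are illustrative placeholders, not part of OpenAI’s API:

```python
# Minimal sketch: a vision request combined with function calling, assuming
# the OpenAI Python SDK v1.x. "log_receipt" and its schema are hypothetical
# app-side examples, not part of OpenAI's API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

tools = [{
    "type": "function",
    "function": {
        "name": "log_receipt",  # hypothetical function in the developer's app
        "description": "Record a purchase extracted from a receipt photo.",
        "parameters": {
            "type": "object",
            "properties": {
                "merchant": {"type": "string"},
                "total": {"type": "number"},
            },
            "required": ["merchant", "total"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract the merchant and total from this receipt."},
            {"type": "image_url", "image_url": {"url": "https://example.com/receipt.jpg"}},
        ],
    }],
    tools=tools,
)

# The model may reply with a tool call whose arguments are a JSON string.
# Per OpenAI's guidance, confirm with the user before acting on it.
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    print(tool_calls[0].function.name, tool_calls[0].function.arguments)
```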

According to an OpenAI spokesperson, the changes streamline developers’ workflows and make for more efficient apps: “previously, developers had to use separate models for text and images, but now, with just one API call, the model can analyze images and apply reasoning.”
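A plain vision request without tools is even simpler. This minimal sketch, again assuming the OpenAI Python SDK v1.x and a placeholder image URL, sends text and an image in a single call:

```python
# Minimal sketch: one API call that both reads an image and reasons about it
# in text, assuming the OpenAI Python SDK v1.x; the image URL is a placeholder.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is unusual about this chart, and what might explain it?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
        ],
    }],
)

print(response.choices[0].message.content)
```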

Already, OpenAI highlights several customers making use of GPT-4 Turbo with Vision. Hit startup Cognition relies on the model for its autonomous AI coding agent, Devin, which generates full code on a user’s behalf. Healthify, a health and fitness app, uses it to provide nutritional analysis and recommendations based on photos of users’ meals. And UK-based startup TLDraw uses it to power its virtual whiteboard and convert users’ drawings into functional websites.

Though GPT-4 Turbo has been overtaken in benchmark tests by newer models such as Anthropic’s Claude 3 Opus, Cohere’s Command R+ and Google’s Gemini Advanced, today’s move to bring GPT-4 Turbo with Vision to more enterprise customers and developers should help keep OpenAI’s models an appealing choice while the world awaits the release of its next LLM.