
OpenAI makes GPT-4 Turbo with Vision generally available through its API

[Image: Close-up of a human eye overlaid with green holographic circles. Credit: VentureBeat, made with Midjourney V6]



As enterprise developers and astute company leaders know, the application programming interface (API) is the nexus of modern software development: it sits atop tech platforms and allows third-party apps to connect and integrate with them. OpenAI just made a big improvement to the API for its powerful GPT-4 Turbo large language model (LLM).

The company announced today on its X accounts that its GPT-4 Turbo with Vision model is now “generally available” through its API. GPT-4’s vision capabilities were announced alongside audio uploads in September 2023, and GPT-4 Turbo followed at OpenAI’s developer conference in November, promising faster speeds, a larger input context window (up to 128,000 tokens, equivalent to about a 300-page book) and lower prices.

In addition, vision requests can now be made with JSON mode and function calling, which has the model generate a JSON snippet that developers can use to automate actions within their connected apps, such as “sending an email, posting something online, making a purchase, etc.” OpenAI cautions on its API page, however: “We strongly recommend building in user confirmation flows before taking actions that impact the world on behalf of users.”
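For developers, that combination looks roughly like the following sketch, which assumes the OpenAI Python SDK (v1.x) and the generally available gpt-4-turbo model; the log_receipt function, its parameters and the image URL are illustrative placeholders, not part of OpenAI’s API:

```python
# Minimal sketch: a vision request combined with function calling, assuming
# the OpenAI Python SDK v1.x. "log_receipt" and its schema are hypothetical
# app-side examples, not part of OpenAI's API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

tools = [{
    "type": "function",
    "function": {
        "name": "log_receipt",  # hypothetical function in the developer's app
        "description": "Record a purchase extracted from a receipt photo.",
        "parameters": {
            "type": "object",
            "properties": {
                "merchant": {"type": "string"},
                "total": {"type": "number"},
            },
            "required": ["merchant", "total"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract the merchant and total from this receipt."},
            {"type": "image_url", "image_url": {"url": "https://example.com/receipt.jpg"}},
        ],
    }],
    tools=tools,
)

# The model may reply with a tool call whose arguments are a JSON string.
# Per OpenAI's guidance, confirm with the user before acting on it.
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    print(tool_calls[0].function.name, tool_calls[0].function.arguments)
```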

According to an OpenAI spokesperson, the changes streamline developers’ workflows and make for more efficient apps: “previously, developers had to use separate models for text and images, but now, with just one API call, the model can analyze images and apply reasoning.”
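A plain vision request without tools is even simpler. This minimal sketch, again assuming the OpenAI Python SDK v1.x and a placeholder image URL, sends text and an image in a single call:

```python
# Minimal sketch: one API call that both reads an image and reasons about it
# in text, assuming the OpenAI Python SDK v1.x; the image URL is a placeholder.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is unusual about this chart, and what might explain it?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
        ],
    }],
)

print(response.choices[0].message.content)
```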

Already, OpenAI highlights several customers making use of GPT-4 Turbo with Vision. Hit startup Cognition relies on the model for its autonomous AI coding agent, Devin, which generates full code on a user’s behalf. Healthify, a health and fitness app, uses it to provide nutritional analysis and recommendations based on photos of users’ meals. And UK-based startup TLDraw uses it to power its virtual whiteboard and convert users’ drawings into functional websites.

Though GPT-4 Turbo has been overtaken in benchmark tests by newer models such as Anthropic’s Claude 3 Opus, Cohere’s Command R+ and Google’s Gemini Advanced, today’s move to bring GPT-4 Turbo with Vision to more enterprise customers and developers should help keep OpenAI’s models an appealing choice while the world awaits the release of its next LLM.