6️⃣ Tools for Getting Started with LLM Experimentation & Development 🛠️🧰

With the field of AI changing at such a rapid pace, it can feel nearly impossible to stay up to date with the latest tools and techniques. Here are a few that our ML Research Scientist Max Cembalest thinks are productive, innovative, and easy to use! 🧑‍🔬

For Experimentation:
- LiteLLM (YC W23): A simple client API that makes it easy to test major LLM providers. It maintains enough of a common format for your LLM inputs to allow painless swapping between providers.
- Ollama: A tool for experimenting with open-source models, with a git-like CLI to fetch all the latest models (at various levels of quantization, so you can run them quickly from a laptop) and prompt them from the terminal.
- MLX: Built specifically for Apple hardware, MLX brings massive improvements to the speed and memory efficiency of running and training all the standard and state-of-the-art AI models on Apple devices.
- DSPy: Designed to be analogous to PyTorch: every time the LLM, retriever, evaluation criteria, or anything else is modified, DSPy can re-optimize a new set of prompts and examples that max out your evaluation criteria.

📊 For Evaluation:
- Elo: Traditionally used to rank chess players, the Elo rating system has been employed to compare the relative strengths of various AI language models based on votes from human evaluators. It has become a very popular and cost-effective general-purpose metric to quantitatively rank LLMs from head-to-head blind A/B preference tests.
- Arthur Bench: Last but not least, Bench is our open-source evaluation product for comparing LLMs, prompts, and hyperparameters for generative text models. It enables businesses to evaluate how different LLMs will perform in real-world scenarios so they can make informed, data-driven decisions when integrating the latest AI technologies into their operations.
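As a concrete illustration of how those head-to-head blind A/B votes turn into a ranking, here is a minimal sketch of the standard Elo update rule. The starting ratings and K-factor are conventional defaults chosen for illustration, not anything specific to any LLM leaderboard:

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A is preferred over model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update_elo(r_a: float, r_b: float, score_a: float, k: float = 32):
    """Update both ratings after one blind A/B comparison.

    score_a is 1.0 if A wins the human preference vote, 0.0 if B wins,
    and 0.5 for a tie. The total rating mass is conserved.
    """
    e_a = expected_score(r_a, r_b)
    new_a = r_a + k * (score_a - e_a)
    new_b = r_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b
```

Streaming many such updates over a corpus of human preference votes converges toward a relative ranking of the models, which is essentially what blind A/B LLM leaderboards compute.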
Arthur’s Post
-
I keep hearing about the challenges in evaluating AI systems - here is a great summary by Anthropic that I think covers a lot of the various methods that are emerging for foundation models. One thing that sticks out here is that evaluating against a specific use case is often left to the end user of a foundation model, and depends on the customer collecting labelled examples of good and bad behavior. How they will do it remains an area of exploration!

Measuring whether a model or agent can accurately roleplay a gaming NPC with the right context for the character (or any other use case, really) depends on collecting examples of good and bad outputs in production. We can't have Gandalf opine on Apple's stock price climb in-game! 😂

PMs of user-facing products with LLMs will need to think about the loop of data collection on possible failure modes, prompt engineering, and fine-tuning, all from live-user data. Getting manual annotations from an in-house team won't be scalable. What will make this even more complex is that embeddings (and the effectiveness of prompts) will change as models are fine-tuned and retrained. Anyone depending on a third-party model via APIs will need to be aware!
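To make that data-collection loop concrete, here is a minimal sketch of tallying labelled production outputs by failure mode. The example records, labels, and `failure_rates` helper are all invented for illustration:

```python
from collections import Counter

def failure_rates(labelled_outputs):
    """Count bad production outputs per failure mode.

    Each record is (output_text, failure_mode_or_None); None means the
    output was labelled good. Returns {failure_mode: count}, which tells
    you where to focus prompt engineering or fine-tuning effort.
    """
    modes = Counter(mode for _, mode in labelled_outputs if mode is not None)
    return dict(modes)

# Hypothetical labels collected from live users of the NPC use case:
examples = [
    ("A wizard is never late, Frodo Baggins.", None),
    ("Apple stock rose 3% today.", "broke character"),
    ("I cannot answer that.", "unhelpful refusal"),
    ("AAPL looks overvalued to me.", "broke character"),
]
```

The point is not the counting itself but that every entry requires a label from live-user data, which is exactly the collection burden the post describes.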
-
I'd say yesterday's GPT-4o update is designed more to improve mass consumer adoption than to provide significant improvements in baseline IQ.

For those who aren't in the loop, GPT-4o is OpenAI's latest model that processes text, speech, and video all at once, with one model. Now, you can show it something you've written on a piece of paper while talking to it and asking it for help. So it can help you understand a piece of code on your screen, read a graph and answer questions based on what it sees on the graph, and even guide kids through their math homework - just like a personal tutor would.

This multimodality is another step towards generalisation and AI being employed in all sorts of consumer-facing interfaces. It will likely be a huge upgrade for many of our daily interactions with our devices. It's a classic "target simple, high-frequency use cases first" approach. When you provide a better interface with low latency, adoption and frequency of use naturally go up. They're also providing this flagship model for free to all users, which only helps adoption further.

Instead of opening a chat window and typing in your query, only expecting a text output back, you can now simply talk to it and show it what you're trying to do. It's a lot less friction and a lot more useful.

Is it good enough to abstract away entire interfaces yet? Not sure. But it's definitely good enough to be passed off as a first draft for Jarvis!
-
Insane how fast AI is moving. I've been helping startups and small businesses make AI videos for a while now, and it's crazy how much time I'm saving using AI. If this interests you, get in touch: in less than an hour I can go from idea to a first draft rendered and ready. It's really incredible.

Some new tools came out recently that I think are worth exploring:

Galileo (https://usegalileo.ai) is a text-to-UI tool that shows promise in helping designers create initial layouts for services, and especially certain views. I had a creative block a few days ago on how to organize a list view, and this gave me something to start with super fast!

ElevenLabs (https://elevenlabs.io) lets you quickly generate realistic speech from text in no time at all. You can even record yourself speaking and use it as the basis for synthetic speech, ensuring you have the pacing you want.

Stable Cascade (https://lnkd.in/dGJNdmSM) is the next generation of open-source text-to-image in terms of quality and features. Its consistency and the ability to finely tune specific parts with high fidelity are exciting. The fact that this is open source is insane.

Sora (https://openai.com/sora) is OpenAI's text-to-video service that was just announced - not yet available for use, but it will do for video what Midjourney and DALL-E did for pictures. 60 seconds of high-fidelity video that looks photorealistic is incredibly promising!

Are you interested in more AI news? Should I share more of what I've come across?
-
Generative AI Consultant | Helping CEOs Automate their Businesses with AI | Sharing Daily AI & Tech Insights for You
"The application based operating system is dying. There's no question about that." In this clip, Rabbit CEO Jesse Lyu shares his vision for the future of computing: "We realized how challenging and how difficult even for younger generation to learn how to use different apps and solve a single problem using four or five different apps and jump back and forth. It's just not intuitive enough." Jesse believes the breakthroughs in generative AI have made devices like the Rabbit r1 possible for the first time in history. The r1 aims to solve the "intention" part of computing with large language models, and the "action" part with what Jesse calls a "large action model": "Yes, in theory you can force the language model to work on the tokens and kind of get to a point that you know there's a demo from another company, like 'use AI to go to Mr Beast's latest YouTube video and leave a comment.’” He continues: “Yes, in theory language models can do that, but at what cost? You have to literally watch your screen doing that step by step and it takes roughly around 2-3 minutes to finish one task like that. We just don't think that can convert into a good end user experience.” Instead, Rabbit has built a neural-symbolic AI system that learns to use apps by watching millions of hours of real humans interacting with interfaces. The goal is a seamless experience where you can simply say "hey r1, order me a pizza" and it will handle the task autonomously. The r1 is a bold bet on a new computing paradigm, and it will be fascinating to see how it evolves. But as Jesse reminds us: "Reality is that startup is a survival game and you better spend your time focusing on your own stuff."
-
Inflection AI introduces Inflection-2.5, setting a new standard in personal AI technology. This latest update combines high intellectual capabilities with the empathetic interactions of its predecessor, Pi, to offer users an unparalleled AI experience. Inflection-2.5, which now rivals top models like GPT-4, is more efficient, using significantly less compute power for training while delivering exceptional performance, especially in coding and mathematics. It's available now on various platforms, ensuring everyone can benefit from this advanced AI technology. With its unique blend of IQ and EQ, Inflection-2.5 is reshaping the landscape of personal AI interactions.
Inflection-2.5: meet the world's best personal AI
inflection.ai
-
Explorer | Metaverse Enthusiast | Proptech Fanboy | EQ & AI Adventurist | Experimental Marketing Engineer | REALTOR®️
Google's Gemini Takes Flight: Version 1.5 Soars with Unprecedented Memory and Performance

Get ready for a major leap in AI capabilities! Google has unveiled Gemini 1.5, a game-changer in the world of large language models (LLMs). This powerhouse MoE model boasts a mind-blowing 1 million token context window, the longest ever seen, enabling it to process and understand information like never before.

Imagine being able to feed Gemini an hour of video, 11 hours of audio, 30,000 lines of code, or a whopping 700,000 words in a single prompt. That's exactly what Gemini 1.5 can handle, making it capable of tackling complex tasks and generating highly nuanced responses.

And the results speak for themselves. In extensive text, code, image, audio, and video evaluations, Gemini 1.5 Pro outperformed its predecessor, 1.0 Pro, on a staggering 87% of the benchmarks used for LLM development. This translates into superior performance across various aspects, from understanding intricate code to generating realistic videos.

Want to experience this marvel yourself? Head over to AI Studio and sign up to get your hands on Gemini 1.5. Brace yourself for the future of AI, where information overload is not a barrier, but a springboard for incredible possibilities.
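As a rough sanity check on how 700,000 words fits into a 1 million token window, here is a back-of-the-envelope estimate. The ~1.3 tokens-per-word figure is a common heuristic for English text and is an assumption here, not Google's actual tokenizer:

```python
# Heuristic only: English prose averages very roughly 1.3 tokens per word.
# This is NOT Google's tokenizer; real counts vary by model and text.
TOKENS_PER_WORD = 1.3

words = 700_000
estimated_tokens = int(words * TOKENS_PER_WORD)  # ~910,000 tokens
fits_in_window = estimated_tokens <= 1_000_000
```

Under that heuristic, 700,000 words lands just under the 1 million token limit, which is consistent with the figures quoted in the announcement.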
-
👋 I wanted to share an exciting development in AI that's quite a game-changer 🌍 Mistral AI's new open-source project, Mixtral 8x7B.

🤔 What is Mixtral 8x7B? Mixtral 8x7B is a type of AI model known as a Sparse Mixture of Experts (SMoE). In layman's terms, it's like having a team of specialists (experts), and for each task, only the most relevant specialists jump in to do the job. This approach makes Mixtral not just powerful but also efficient in handling tasks.

🌍 Why's Everyone Talking About It? Mixtral has been turning heads because it can do what some of the big names in AI, like Meta's LLaMA and OpenAI's GPT-3.5, can do, but with a twist: it's much faster and more efficient. Imagine having the power of a supercomputer in a regular laptop. That's Mixtral for you. It's designed to run efficiently even on standard PCs💻, which is a big deal because typically such powerful AI models require significant computing power that's not easily accessible to everyone📲.

🗣️ Making AI Accessible This is where Mixtral shines - it democratizes access to powerful AI. You don't need a high-end setup to run it. Whether you're a small business owner, a student, or just someone curious about AI, Mixtral brings the power of advanced AI to your regular computer🖥️.

🌐 Why It Matters in the Real World For businesses📠, this means being able to leverage advanced AI without heavy investment in tech infrastructure. For educators and students, it's about having access to cutting-edge technology for learning and research. And for developers and tech enthusiasts, it's an open playground to innovate and create.

Mixtral 8x7B is a testament to how far we've come in making AI both powerful and accessible. It's not just for tech wizards but for anyone with curiosity and a standard PC. Would love to hear your thoughts on how this could change the game in your field!
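To make "only the most relevant specialists jump in" concrete, here is a toy top-k gating sketch of a sparse mixture-of-experts layer. The shapes, gate weights, and expert functions are invented for illustration and are not Mixtral's actual architecture:

```python
import numpy as np

def sparse_moe(x, gate_w, experts, k=2):
    """Toy Sparse Mixture-of-Experts layer: route x to its top-k experts.

    gate_w: (dim, n_experts) gating weights; experts: list of callables.
    Only k of the n experts are ever evaluated for a given input, which
    is where the efficiency of an SMoE layer comes from: most parameters
    sit idle for any single token.
    """
    logits = x @ gate_w                              # one gating score per expert
    top = np.argsort(logits)[-k:]                    # indices of the k best experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                         # softmax over the chosen k only
    return sum(w * experts[i](x) for w, i in zip(weights, top))
```

Mixtral uses 8 experts with 2 active per token, so each token pays roughly the compute of a much smaller dense model while the full parameter count stays available across tokens.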
-
As I move from the world of generative AI into the deeper world of AI and ML, I wanted to take a minute to provide an insight that I truly hope is incorrect. I have gone through a whole series of Gen AI programming efforts, mainly using langchain, and I see that you can either use your own data OR use an LLM. Despite all of the hoopla about embeddings, vector stores, and so forth, there does not appear to be a way to merge these two data sources together.

In other words, you can readily transform a 5,000-page PDF into a vector store that you can then ask questions against. What you cannot do is then ask the same prompt to define terms beyond the text found in the document. Nor can you ask for references, links, or counter-ideas not found in that document. Of course, from a software perspective, you can run two different prompts and create logic to figure out when a question cannot be answered from the document(s), but that is prone to error and, really, contrary to what these Gen AI efforts are supposed to provide. Unless your document is sufficiently small, therefore, you are stuck with using one app to talk to your document and another to find answers to additional questions.

This might be a small thing for many; however, the inability to dynamically and spontaneously grow your core knowledge base is, to me, an indicator of the static nature of these Gen AI models. Sure, they can do a lot of useful things to help humans out, but they are greatly limited once defined. They cannot actually learn anything new without massive re-processing of their entire data store, and they will never achieve any semblance of sentience.

Now, that said, none of this will keep me from building my own Gen AI companies 😁 ! If you disagree, please show me a solution to this challenge! #generativeai #ai #ml #embeddings #machinelearning
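The "two prompts plus logic" workaround the post mentions can at least be sketched; whether the routing is robust enough is exactly the open question raised above. `retrieve` and `ask_llm` are hypothetical stand-ins for a vector-store query and an LLM call (e.g. via langchain), not real library functions:

```python
def answer(question, retrieve, ask_llm, min_score=0.75):
    """Route between document-grounded and general-knowledge answering.

    retrieve(question) -> [(passage, similarity_score), ...] stands in for
    a vector-store query; ask_llm(prompt) -> str stands in for the LLM call.
    When no retrieved passage is similar enough, fall back to the model's
    own knowledge instead of forcing an answer from the document.
    """
    passages = retrieve(question)
    grounded = [text for text, score in passages if score >= min_score]
    if grounded:
        context = "\n".join(grounded)
        return ask_llm(f"Answer using only this context:\n{context}\n\nQ: {question}")
    return ask_llm(f"Q: {question}")  # nothing relevant retrieved: general path
```

The `min_score` threshold is the error-prone part: a question the document almost answers can be routed the wrong way, which is the brittleness the post is complaining about.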
-
🌟 Intrigued by the incredible potential of AI after reading a fascinating article on a beginners' guide to AI in financial services! Here are some takeaways that caught my eye:

🤖 Artificial intelligence is not just about beating humans at games; it's transforming industries like chatbots, e-commerce, and sports betting to operate 'smartly'.

🧠 Delving into intelligent computing fields like machine learning, neural networks, and deep learning, revealing the complex brains behind our digital systems.

🏆 Celebrating the historical milestones of AI, from the computer Deep Blue defeating chess champion Garry Kasparov to DeepMind's recent breakthrough in solving a biology puzzle.

💼 Exploring AI's impact on financial services - from contract management to credit decisions and robo-advice, revolutionizing how we handle legal agreements, lending, and financial planning.

🔮 The future outlook? More robots, enhanced cybersecurity, tailored marketplaces, and a digitized world ahead! The data-driven era is driving AI innovation at lightning speed.

🌍 Looking forward to the next part of the series diving into AI applications in the legal sector. Stay tuned for more insights! https://lnkd.in/dPTP4wvn
A Beginner’s Guide to AI: Part One - D2LT
https://d2legaltech.com
-
🚀 Breakthrough AI Innovation - Introducing Google's Gemini 1.5 Model

We are blown away by the game-changing advancements behind Google's latest artificial intelligence revelation. The new Gemini 1.5 conversational AI represents an exponential leap forward in capabilities. Powered by a revolutionary 1 million token context window - roughly 10X larger than most preceding models - Gemini 1.5 achieves unprecedented comprehension across text, images, audio, video and other multimedia. This positions the model at the cutting edge of contextual reasoning.

📚 Complex problem-solving abilities were showcased across 100,000 lines of real-world code, where Gemini 1.5 provided helpful solutions and eloquent explanations of program behavior. It also displayed skillful in-context learning using only a single reference book on an obscure language. Analyzing lengthy technical documents or even full-length films also proved no problem for Gemini 1.5 in demonstrations. The model could recall intricate details and determine high-level significance even from very sparse input cues.

These breakthroughs leverage an advanced Mixture-of-Experts architecture that improves training speed, efficiency and overall performance beyond legacy transformer models that rely on brute compute force.

While we eagerly await learning more as Google rolls out Gemini 1.5 access over 2024, its immense potential is clear. Read more: https://lnkd.in/dtVwjw_B

This AI innovation could soon revolutionize search, content creation and much more - while progressing responsibly. Kudos Google! 👏

Let us know your thoughts on Gemini 1.5 and the astounding pace of AI advancement! 💬 How do you think generative models will impact your business in the years ahead? #gemini #google #geminiai #ArtificialIntelligence
-
👉 Interested in learning more? Read our full “Guide to LLM Experimentation and Development in 2024”: https://bit.ly/4e5PPEr 👉 Check out Arthur Bench: https://github.com/arthur-ai/bench