Meta Releases Llama 3.1
In May 2020, OpenAI released GPT-3, a large language model that used approximately 10^23 operations during training.
About 23 months later, Meta released OPT-175B. The model was, roughly speaking, a replica of GPT-3.
Now, about 16 months after OpenAI’s March 2023 release of GPT-4, Meta has again succeeded in catching up to OpenAI with the release of Llama 3.1, a “herd” of three Llama models with 8 billion (8B), 70B, and 405B parameters.
Meta expended immense resources to build Llama 3.1-405B, which performs competitively with the best AI systems in the world.
16,384 NVIDIA H100 chips executed over 10^25 operations to train the model. Sixteen chips cost about $400,000 (roughly $25,000 each), so this full arsenal could easily be worth $400 million.
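A quick back-of-envelope check of that hardware figure (a sketch: the per-chip price is simply derived from the 16-chips-for-$400,000 figure, not an official NVIDIA list price):

```python
# Rough hardware cost estimate for the training cluster.
chips = 16_384
price_per_chip = 400_000 / 16  # assumed ~$25,000 per H100, from the figure above

cluster_cost = chips * price_per_chip
print(f"Estimated cluster cost: ${cluster_cost / 1e6:.0f} million")
# → Estimated cluster cost: $410 million
```

The result lands just above the ~$400 million figure quoted in the text.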
Well over 200 employees worked on the project as "core contributors," plus hundreds of additional employees in less involved roles. Top AI engineers can command multimillion-dollar pay packages, so Meta may have paid over $100 million in labor costs.
Over 15 trillion tokens served as text data for training the model. Meta offers limited information about the “variety of sources” behind this dataset. Nonetheless, data access deals can cost over $100 million per year.
Lastly, varying hardware demands occasionally caused “instant fluctuations of power consumption across the data center on the order of tens of megawatts, stretching the limits of the power grid.” For reference, 10 megawatts would supply over 80,000 megawatt-hours in a year, enough to power thousands of US households.
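The power comparison above can be sanity-checked with simple arithmetic (the ~10.5 MWh per household is an assumed approximate US average annual consumption, not a figure from the source):

```python
# Energy delivered by a sustained 10 MW draw over one year.
power_mw = 10
hours_per_year = 24 * 365  # 8,760 hours

annual_mwh = power_mw * hours_per_year
print(f"{annual_mwh:,} MWh per year")  # → 87,600 MWh per year, i.e. over 80,000

# Assumed average US household use: ~10.5 MWh per year.
households = annual_mwh / 10.5
print(f"≈ {households:,.0f} US households")
```

87,600 MWh comfortably exceeds the "over 80,000 megawatt-hours" in the text, and dividing by household consumption gives the "thousands of US households" scale.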
In total, Meta may have spent half a billion dollars on this project’s resources. And they are poised to spend more; CEO Mark Zuckerberg told Bloomberg that “we’re basically already starting to work on Llama 4.”
The Llama 3.1 models—including versions without safety guardrails—are available for anyone on the internet to download. Facebook and Instagram users can access the 405B version for free through a chatbot interface at meta.ai.
Meta tested Llama 3.1 for risks related to cyber, chemical, and biological weapons. It appeared to be safe.
However, the testing failed to account for ways that malicious actors might modify, tune, and specialize the model to cause harm. This raises national security risks: actors in China, Russia, and Iran are already using closed AI models, which are harder to modify, to assist in influence operations and cyberattacks.
This serious oversight in Meta’s safety testing demonstrates the need for stronger safety practices at billion-dollar AI companies.
#AI #AIPolicy #Llama3
Pictured: The start of Meta's technical report on Llama 3.