The huge cost of training LLMs | Lex Fridman Podcast
Lex Fridman · 4m
ai · machine learning · computational economics · technology infrastructure · scaling laws · large language models
Summary
The podcast discusses the economic challenges of training and deploying large language models (LLMs). The speaker highlights that while pre-training has become increasingly expensive, the larger financial burden lies in serving these models to hundreds of millions of users. Training a model like DeepSeek reportedly costs around $5 million, with cluster rental and engineering costs of roughly $2 million, yet the recurring cost of serving such models can reach billions of dollars in compute infrastructure. The discussion notes that scaling laws have held consistently across 13 orders of magnitude of compute, suggesting continued potential for model improvement. The speaker is optimistic about future developments, anticipating gigawatt-scale compute clusters coming online by 2026. These infrastructure investments, planned in 2022-2023, reflect the tech industry's ongoing commitment to AI development. The conversation also explores the nuanced relationship between model size, benchmark performance, and practical intelligence, suggesting that incremental improvements may add up to substantially more capable AI systems.
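For context, the scaling laws mentioned here are usually stated as an empirical power law relating pre-training loss to training compute. A minimal sketch of that form, with constants that are fitted per model family rather than taken from the episode:

L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C}

where L is the pre-training loss, C is training compute, and C_c and \alpha_C are empirically fitted constants; the claim in the episode is that fits of this form have stayed accurate as C has grown by roughly 13 orders of magnitude.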
Key points
- Training large language models costs millions, but serving them to users costs billions (see the cost sketch after this list)
- Scaling laws have consistently held across 13 orders of magnitude of compute
- New gigawatt-scale compute clusters are expected to come online by 2026
- Model improvements may lead to significantly more expensive AI subscriptions
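As a rough illustration of the first point, here is a minimal back-of-envelope sketch in Python. Only the ~$5 million training figure comes from the episode; the user count, query volume, and per-query inference cost are assumptions chosen purely for illustration.

```python
# Back-of-envelope comparison of one-time training cost vs. recurring serving cost.
# Only the training figure is from the episode; everything else is an assumption.

training_cost_usd = 5_000_000        # reported DeepSeek-style pre-training cost
users = 200_000_000                  # assumed: "hundreds of millions of users"
queries_per_user_per_day = 5         # assumed average usage
cost_per_query_usd = 0.003           # assumed inference cost per query (GPU time, energy)

daily_serving_cost = users * queries_per_user_per_day * cost_per_query_usd
annual_serving_cost = daily_serving_cost * 365

print(f"One-time training cost: ${training_cost_usd:,.0f}")
print(f"Annual serving cost:    ${annual_serving_cost:,.0f}")
print(f"Serving vs. training:   {annual_serving_cost / training_cost_usd:,.0f}x")
```

With these assumed numbers the annual serving bill comes to roughly $1.1 billion, about two orders of magnitude above the training cost, which is the shape of the argument made in the episode.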
Notable quotes
"The cost of training them is really low relative to the cost of serving them to hundreds of millions of users"
"It's held for 13 orders of magnitude of computers something like why would it ever end?"
"We will see a $2,000 subscription this year. We've seen $200 subscriptions."