A team of AI researchers at the University of California, Berkeley, led by Ph.D. student Jiayi Pan, has replicated the core technology of DeepSeek R1-Zero for a mere $30, demonstrating that the techniques behind advanced reasoning models can be reproduced at very low cost. Jiayi Pan shared on Nitter that their group recreated DeepSeek R1-Zero within the framework of the Countdown game. The compact, 3-billion-parameter language model acquired self-verification and search capabilities through reinforcement learning.
Beginning with a base language model, a prompt, and a ground-truth reward, Pan and his team ran reinforcement learning on the Countdown game. The game is modeled on the British game show of the same name, in which contestants try to reach a target number from a set of given numbers using basic arithmetic operations.
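To make the idea of a ground-truth reward concrete, here is a minimal Python sketch of how a Countdown-style answer could be verified and scored. The function names and the binary 1.0/0.0 scoring rule are illustrative assumptions, not the team's actual reward code.

```python
import ast
import operator
import re

# Allowed binary operators for a Countdown-style task
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def evaluate(expr: str) -> float:
    """Safely evaluate an arithmetic expression containing only numbers and + - * / ( )."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("disallowed syntax")
    return walk(ast.parse(expr, mode="eval"))

def countdown_reward(expr: str, numbers: list[int], target: int) -> float:
    """Score 1.0 if expr reaches the target using each given number at most once, else 0.0."""
    try:
        used = [int(tok) for tok in re.findall(r"\d+", expr)]
        if any(used.count(n) > numbers.count(n) for n in set(used)):
            return 0.0  # uses a number more often than it was provided
        return 1.0 if abs(evaluate(expr) - target) < 1e-6 else 0.0
    except (ValueError, SyntaxError, ZeroDivisionError):
        return 0.0  # malformed or invalid expressions earn no reward

# Example: reach 24 from the numbers {2, 3, 4, 6}
print(countdown_reward("6 * 4 * (3 - 2)", [2, 3, 4, 6], 24))   # 1.0
print(countdown_reward("6 * 4 * 3 - 2", [2, 3, 4, 6], 24))     # 0.0 (evaluates to 70)
```

A rule-based check like this is what makes the reward "ground truth": correctness can be computed exactly, without a learned reward model.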
The researchers observed that initially, the model produced basic, incorrect outputs but gradually developed strategies such as revision and searching to pinpoint the correct answers. For instance, the model would propose a solution, verify its accuracy, and then refine it through several attempts until the right answer was achieved.
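That revise-and-retry behavior can be pictured as a propose-verify-revise loop. The sketch below is purely illustrative: it reuses countdown_reward from the previous snippet, and generate_candidate is a hypothetical stand-in for sampling an answer from the language model.

```python
def solve_with_retries(numbers, target, generate_candidate, max_attempts=8):
    """Keep proposing expressions until one verifies against the ground-truth reward."""
    for attempt in range(max_attempts):
        expr = generate_candidate(numbers, target, attempt)   # hypothetical model call
        if countdown_reward(expr, numbers, target) == 1.0:
            return expr   # verified correct, stop searching
    return None           # give up after max_attempts revisions
```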
Beyond the Countdown scenario, Pan also tested the model on multiplication tasks, where it adopted a different approach. It broke the problem down using the distributive property of multiplication, much as someone might when multiplying large numbers in their head, and then solved it step by step.
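As a concrete illustration of that decomposition (the numbers below are our own example, not taken from the model's transcripts), splitting one factor by decimal place value turns a large multiplication into a sum of simpler partial products:

```python
def distribute_multiply(a: int, b: int) -> int:
    """Multiply a * b by splitting b into place values, e.g. 37 * 145 = 37*100 + 37*40 + 37*5."""
    total = 0
    for position, digit in enumerate(reversed(str(b))):
        total += a * int(digit) * 10 ** position   # one partial product per digit of b
    return total

print(distribute_multiply(37, 145))   # 5365, identical to 37 * 145
```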
The Berkeley team experimented with various sizes of the model. Starting with one of only 500 million parameters, the model would simply guess a potential solution and then stop, regardless of whether it was correct. With 1.5 billion parameters, the model began learning a variety of techniques that raised its score. At 3 to 7 billion parameters, the models reached the correct answers more quickly and consistently.
Remarkably, the team says all of this was achieved for only about $30. That stands in stark contrast to OpenAI's o1 API, which costs $15 per million input tokens, more than 27 times DeepSeek-R1's $0.55 per million input tokens. Pan emphasizes that the project aims to make scalable reinforcement learning research more accessible and affordable.
However, machine learning expert Nathan Lambert disputes the real expenses behind DeepSeek, arguing that the reported $5 million cost of training its 671-billion-parameter LLM does not account for other significant expenses such as staff, infrastructure, and electricity. Lambert estimates that DeepSeek AI's annual operating costs fall somewhere between $500 million and more than $1 billion. Nonetheless, this remains a significant achievement, particularly when compared with the roughly $10 billion that other leading American AI developers spend annually.