This is wild - UC Berkeley shows that a tiny 1.5B model beats o1-preview on math by RL! They applied simple RL to Deepseek-R1-Distilled-Qwen-1.5B on 40K math problems, trained at 8K context, then scaled to 16K & 24K. 3,800 A100 hours ($4,500) to beat o1-preview in math! Best