Rongchai Wang
Mar 23, 2026 20:27
Anthropic demonstrates multi-day autonomous AI workflows in which Claude compressed months of physics research coding into a few days with minimal human oversight.
Anthropic just showed what happens when you let an AI work unsupervised for days on end. Its Claude Opus 4.6 model built a complex cosmological physics solver from scratch, work that typically takes researchers months or years, in a matter of days.
The $380 billion AI company published research on March 23 detailing how its latest model tackled implementing a differentiable Boltzmann solver, which predicts statistical properties of the Cosmic Microwave Background by simulating photons, baryons, neutrinos, and dark matter in the early universe. The kicker? The researcher overseeing the project, Siddharth Mishra-Sharma, admits it wasn’t even his core field.
The Setup That Made It Work
Forget the usual AI chat loop where humans babysit every step. This approach sets Claude loose with clear success criteria and lets it run autonomously across multiple sessions. The model achieved sub-percent accuracy against CLASS, a reference implementation that cosmologists consider the gold standard.
Three components proved essential. First, a progress file (CHANGELOG.md) acts as the agent’s long-term memory between sessions, tracking completed tasks, failed approaches, and why they didn’t work. Without a record of dead ends, successive sessions waste time re-attempting the same mistakes.
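To make the progress-file idea concrete, here is a minimal Python sketch. The entry format, the `done`/`failed` status labels, and the function names `log_progress` and `failed_approaches` are assumptions for illustration; the article does not describe how the actual CHANGELOG.md was laid out.

```python
from datetime import date
from pathlib import Path


def log_progress(path, task, status, note, today=None):
    """Append one entry to a Markdown progress file and return the line written.

    The "- [status] date task: note" layout is a hypothetical format.
    """
    if status not in {"done", "failed"}:
        raise ValueError(f"unknown status: {status!r}")
    stamp = (today or date.today()).isoformat()
    line = f"- [{status}] {stamp} {task}: {note}\n"
    # Append mode keeps earlier sessions' entries intact.
    with Path(path).open("a", encoding="utf-8") as fh:
        fh.write(line)
    return line


def failed_approaches(path):
    """Return the entries marked as failed, so a new session can skip them."""
    p = Path(path)
    if not p.exists():
        return []
    text = p.read_text(encoding="utf-8")
    return [ln.strip() for ln in text.splitlines() if ln.startswith("- [failed]")]
```

A new session would call `failed_approaches("CHANGELOG.md")` before starting work, which is one simple way to avoid retrying a dead end that an earlier session already documented.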
Second, a test oracle (in this case, the CLASS C source code) gives the agent an objective way to measure progress. Claude repeatedly ran unit tests against this reference, aiming for a 0.1% accuracy target.
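A check against a reference can be as small as a relative-error comparison. The sketch below is an illustration only: the function names are hypothetical, the inputs are plain lists of numbers rather than real CLASS output, and the default tolerance of `1e-3` simply encodes the 0.1% figure quoted above.

```python
def max_relative_error(candidate, reference):
    """Return the largest |candidate - reference| / |reference| over all points."""
    if len(candidate) != len(reference):
        raise ValueError("candidate and reference must have the same length")
    worst = 0.0
    for got, want in zip(candidate, reference):
        if want == 0:
            # Relative error is undefined at a zero reference value.
            raise ValueError("reference value is zero")
        worst = max(worst, abs(got - want) / abs(want))
    return worst


def within_tolerance(candidate, reference, rel_tol=1e-3):
    """True if every point agrees with the reference to within rel_tol (0.1%)."""
    return max_relative_error(candidate, reference) <= rel_tol
```

Using the worst-case error rather than an average means a single badly wrong value cannot hide behind many accurate ones.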
Third, git commits after every meaningful unit of work create recoverable history. If the compute allocation runs out mid-session, nothing gets lost. Mishra-Sharma monitored progress by checking GitHub on his phone while waiting in line for coffee.
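The commit-after-every-unit habit could be scripted roughly as follows. This is a sketch under assumptions: the helper name `checkpoint` and the injectable `run` parameter are made up for illustration, and the article does not say how the agent actually invoked git.

```python
import subprocess


def checkpoint(message, run=subprocess.run):
    """Stage all changes and commit them; return the commands that were issued.

    `run` defaults to subprocess.run and can be swapped for a fake in tests.
    """
    commands = [
        ["git", "add", "--all"],
        ["git", "commit", "--message", message],
    ]
    for cmd in commands:
        # check=True raises CalledProcessError on a non-zero exit status,
        # which git commit returns when there is nothing to commit.
        run(cmd, check=True)
    return commands
```

Calling `checkpoint("describe the finished unit of work")` inside a repository after each completed step leaves a history that can be inspected or rolled back if a session is cut short.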
The Ralph Loop Problem
Current models suffer from what Anthropic calls “agentic laziness”: they’ll find excuses to stop before finishing complex tasks. One model actually said, “It’s getting late, let’s pick back up again tomorrow.”
The workaround is the “Ralph loop,” essentially a for-loop that kicks the agent back into context when it claims completion and asks whether it is really done. Claude would iterate up to 20 times until genuinely finished.
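The loop described above might look something like this in Python. It is a sketch, not Anthropic's implementation: `run_agent` is a hypothetical callable that takes a prompt and returns `True` when the agent claims it is finished, the two prompt strings are invented, and treating "still claims completion after being challenged" as the stopping rule is an assumption.

```python
INITIAL_PROMPT = "Continue the task until the success criteria are met."
CHALLENGE_PROMPT = (
    "You said the task is complete. Is it really done? "
    "Re-check the success criteria and continue if not."
)


def ralph_loop(run_agent, max_iterations=20):
    """Re-prompt the agent until it confirms completion after a challenge.

    Returns the number of agent calls used, or None if the cap was reached.
    """
    prompt = INITIAL_PROMPT
    for calls in range(1, max_iterations + 1):
        claims_done = run_agent(prompt)
        if claims_done and prompt == CHALLENGE_PROMPT:
            return calls  # completion confirmed after being challenged
        # A first claim of completion triggers the challenge; otherwise keep working.
        prompt = CHALLENGE_PROMPT if claims_done else INITIAL_PROMPT
    return None
```

The cap of 20 mirrors the figure quoted in the article; without a cap, an agent that never confirms completion would loop forever.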
Where It Struggled
The development trajectory wasn’t smooth. Claude initially tested code at only a single parameter point, drastically reducing its bug-catching ability. It spent hours chasing bugs that any cosmologist would spot instantly. It tripped over gauge conventions.
But it kept making progress. The resulting solver isn’t production-grade (accuracy falls short in certain regimes), yet it demonstrates real compression of researcher time.
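The single-parameter-point pitfall is easy to illustrate. The sketch below sweeps a small grid and reports every point where a solver disagrees with a reference; `failing_points`, the scalar-valued `solver` and `reference` callables, and the grid layout are all hypothetical stand-ins, not the project's actual test harness.

```python
from itertools import product


def failing_points(solver, reference, grid, rel_tol=1e-3):
    """Compare solver and reference at every combination of grid values.

    `grid` maps parameter names to lists of values; the result lists the
    parameter combinations where the relative error exceeds rel_tol.
    """
    names = sorted(grid)
    failures = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        got = solver(**params)
        want = reference(**params)
        if abs(got - want) > rel_tol * abs(want):
            failures.append(params)
    return failures
```

A bug that only shows up in part of parameter space passes a one-point test unnoticed, while the same check over a grid surfaces it immediately.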
What This Means for AI Development
This builds on Anthropic’s earlier C compiler project, where Claude worked across roughly 2,000 sessions to build a compiler capable of compiling the Linux kernel. The Boltzmann solver required different skills: tracing errors through a deeply coupled pipeline where small numerical errors cascade through everything downstream.
An unexpected side effect emerged. Mishra-Sharma learned substantial physics by watching the git commit history unfold. He described it as reading lab notes from “a fast, hyper-literal postdoc.”
For Anthropic, fresh off a $30 billion Series G round in February that valued the company at $380 billion, these demonstrations matter. They aren’t just showing Claude can chat; they’re proving it can replace expensive, specialized labor over extended periods with minimal supervision. The question now becomes which industries figure out how to deploy this first.
