AI Agents May Soon Handle Tasks That Take Humans Weeks
- Patrick Law
- Mar 25
- 2 min read
A new study proposes a simple but powerful way to track AI progress: measure the length of task an AI agent can complete on its own — with 50% reliability.
Over the last six years, the results have shown a clear trend: 🕒 the length of tasks AI can handle has been doubling every 7 months.
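To make the doubling trend concrete, here is a minimal sketch of the exponential extrapolation it implies. The starting horizon (about one hour for today's best models) and the 7-month doubling period come from the post; the function names are illustrative, not from the paper.

```python
import math

DOUBLING_MONTHS = 7        # doubling period reported in the study
CURRENT_HORIZON_MIN = 60   # ~1 hour for today's best models (per the post)

def horizon_after(months, h0=CURRENT_HORIZON_MIN, d=DOUBLING_MONTHS):
    """Projected 50%-reliability task length (in minutes) after `months`."""
    return h0 * 2 ** (months / d)

def months_until(target_min, h0=CURRENT_HORIZON_MIN, d=DOUBLING_MONTHS):
    """Months until the projected horizon reaches `target_min` minutes."""
    return d * math.log2(target_min / h0)

# A 40-hour work week is 2,400 minutes:
print(round(months_until(40 * 60), 1))  # → 37.3 (months, i.e. ~3 years)
```

Under these assumptions, a one-hour horizon reaches a full work week in roughly three years — which is where the "week-long projects in just a few years" projection comes from.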
What That Means in Practice
Today’s best models — like Claude 3.7 Sonnet — can reliably complete tasks that take humans about one hour.
For tasks under 4 minutes, AI success is nearly 100%.
For tasks over 4 hours, success drops below 10%.
This gap highlights a key challenge: AI is great at solving single-step problems, but still struggles to stay on track across longer, multi-step tasks.
What’s Coming Next?
If this doubling trend continues, AI agents could be reliably handling week-long or even month-long projects in just a few years.
Researchers say this could lead to a major shift in real-world applications — from automation in software development to handling complex workflows with little human oversight.
Even if the measurements are off by a factor of 10, the trend still predicts a leap in capability within just a few years.
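The factor-of-10 robustness claim follows from the arithmetic of exponentials: on a doubling trend, a constant multiplicative error only shifts the forecast by a fixed number of months, it does not change the trajectory. A quick check, using the 7-month doubling period from the post:

```python
import math

# A 10x measurement error corresponds to log2(10) ≈ 3.3 doublings.
# At 7 months per doubling, the forecast shifts by a fixed offset:
shift_months = math.log2(10) * 7
print(round(shift_months, 1))  # → 23.3 (months, i.e. about 2 years)
```

So even a 10x mismeasurement of today's horizon moves the predicted milestones by only about two years.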
Why This Matters
Rather than relying on benchmarks or test scores, this approach offers a real-world lens for evaluating AI usefulness — by asking: how long of a task can it actually finish without help?
With that framing, we get a much clearer picture of what today’s AI can and can’t do — and how fast that’s changing.
Read the full paper on arXiv: https://arxiv.org/abs/2503.14499 (DOI: 10.48550/arXiv.2503.14499)
Original blog post by METR: https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks
Check out the video: https://youtube.com/shorts/Su8TFgIVjRU