AI Agents May Soon Handle Tasks That Take Humans Weeks
- Patrick Law
- Mar 25
- 2 min read
A new study proposes a simple but powerful way to track AI progress: measure the length of task an AI agent can complete on its own — with 50% reliability.
Over the last six years, the results have shown a clear trend: 🕒 the length of tasks AI can handle has been doubling every 7 months.
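To make the doubling trend concrete, here is a minimal sketch of the exponential extrapolation it implies. The starting horizon (about one hour for today's best models) and the 7-month doubling period come from the post; the function names are illustrative, not from the paper.

```python
import math

DOUBLING_MONTHS = 7        # doubling period reported in the study
CURRENT_HORIZON_MIN = 60   # ~1 hour for today's best models (per the post)

def horizon_after(months, h0=CURRENT_HORIZON_MIN, d=DOUBLING_MONTHS):
    """Projected 50%-reliability task length (in minutes) after `months`."""
    return h0 * 2 ** (months / d)

def months_until(target_min, h0=CURRENT_HORIZON_MIN, d=DOUBLING_MONTHS):
    """Months until the projected horizon reaches `target_min` minutes."""
    return d * math.log2(target_min / h0)

# A 40-hour work week is 2,400 minutes:
print(round(months_until(40 * 60), 1))  # → 37.3 (months, i.e. ~3 years)
```

Under these assumptions, a one-hour horizon reaches a full work week in roughly three years — which is where the "week-long projects in just a few years" projection comes from.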
What That Means in Practice
Today’s best models — like Claude 3.7 Sonnet — can reliably complete tasks that take humans about one hour.
For tasks under 4 minutes, AI success is nearly 100%.
For tasks over 4 hours, success drops below 10%.
This gap highlights a key challenge: AI is great at solving single-step problems, but still struggles to stay on track across longer, multi-step tasks.
What’s Coming Next?
If this doubling trend continues, AI agents could be reliably handling week-long or even month-long projects in just a few years.
Researchers say this could lead to a major shift in real-world applications — from automation in software development to handling complex workflows with little human oversight.
Even if the measurements are off by a factor of 10, the trend still predicts a leap in capability within just a few years.
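The factor-of-10 robustness claim follows from the arithmetic of exponentials: on a doubling trend, a constant multiplicative error only shifts the forecast by a fixed number of months, it does not change the trajectory. A quick check, using the 7-month doubling period from the post:

```python
import math

# A 10x measurement error corresponds to log2(10) ≈ 3.3 doublings.
# At 7 months per doubling, the forecast shifts by a fixed offset:
shift_months = math.log2(10) * 7
print(round(shift_months, 1))  # → 23.3 (months, i.e. about 2 years)
```

So even a 10x mismeasurement of today's horizon moves the predicted milestones by only about two years.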
Why This Matters
Rather than relying on benchmarks or test scores, this approach offers a real-world lens for evaluating AI usefulness — by asking: how long of a task can it actually finish without help?
With that framing, we get a much clearer picture of what today’s AI can and can’t do — and how fast that’s changing.
Read the full paper on arXiv: https://arxiv.org/abs/2503.14499 (DOI: 10.48550/arXiv.2503.14499)
Original blog post by METR: https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks
Check out the video: https://youtube.com/shorts/Su8TFgIVjRU