
AI Agents May Soon Handle Tasks That Take Humans Weeks

  • Writer: Patrick Law
  • Mar 25
  • 2 min read

A new study proposes a simple but powerful way to track AI progress: measure the length of task an AI agent can complete on its own with 50% reliability.

Over the last six years, the results have shown a clear trend: the length of tasks AI can handle has been doubling every 7 months.



What That Means in Practice

  • Today’s best models — like Claude 3.7 Sonnet — can reliably complete tasks that take humans about one hour.

  • For tasks under 4 minutes, AI success is nearly 100%.

  • For tasks over 4 hours, success drops below 10%.


This gap highlights a key challenge: AI is great at solving single-step problems, but still struggles to stay on track across longer, multi-step tasks.
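The pattern in the bullets above (near-certain success under 4 minutes, roughly 50% around an hour, under 10% beyond 4 hours) is what a logistic curve over log task length looks like. A toy sketch of that idea in Python; the slope value and the function name are illustrative assumptions, not parameters from the study:

```python
import math

def success_probability(task_minutes, horizon_minutes=60, slope=1.2):
    """Toy logistic model: P(success) falls as task length grows.

    horizon_minutes is the task length at which success is 50%;
    slope controls how fast success drops per doubling of length.
    Both values here are illustrative assumptions.
    """
    x = math.log2(task_minutes / horizon_minutes)
    return 1 / (1 + math.exp(slope * x))

# Qualitatively matches the article's numbers:
# short tasks near-certain, hour-long tasks ~50%, 4-hour tasks unlikely.
for minutes in (4, 60, 240):
    print(f"{minutes:>4} min: {success_probability(minutes):.0%}")
```

The key design choice is putting task length on a log scale: each doubling of length costs the same amount of reliability, which is why the gap between a 4-minute task and a 4-hour task is so large.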


What’s Coming Next?

If this doubling trend continues, AI agents could be reliably handling week-long or even month-long projects in just a few years.

Researchers say this could lead to a major shift in real-world applications — from automation in software development to handling complex workflows with little human oversight.

Even if the measurements are off by a factor of 10, the trend still predicts a leap in capability within just a few years.
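The extrapolation behind these claims is plain compound doubling, so the arithmetic is easy to check. A quick sketch; the one-hour starting horizon and 7-month doubling period come from the article, while the work-week and work-month conversions are rough assumptions:

```python
import math

DOUBLING_MONTHS = 7  # trend reported in the study

def months_until(target_minutes, current_minutes=60):
    """Months until the task horizon reaches target_minutes,
    assuming it keeps doubling every DOUBLING_MONTHS."""
    return DOUBLING_MONTHS * math.log2(target_minutes / current_minutes)

work_week = 40 * 60    # one human work week in minutes (assumption)
work_month = 170 * 60  # ~one human work month in minutes (assumption)

print(f"Week-long tasks:  ~{months_until(work_week):.0f} months")
print(f"Month-long tasks: ~{months_until(work_month):.0f} months")

# A 10x measurement error shifts the timeline by only
# 7 * log2(10), i.e. about 23 months:
print(f"10x error shift:  ~{DOUBLING_MONTHS * math.log2(10):.0f} months")
```

This also shows why the factor-of-10 caveat changes so little: under exponential growth, a tenfold error in the measured horizon costs roughly two years on the timeline, not a decade.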


Why This Matters

Rather than relying on benchmarks or test scores, this approach offers a real-world lens for evaluating AI usefulness, by asking: how long a task can it actually finish without help?

With that framing, we get a much clearer picture of what today’s AI can and can’t do — and how fast that’s changing.

