AI Can Now Do an Hour of Your Work — Here’s What That Means

Patrick Law
Mar 20

Introduction

Imagine handing off a task that normally takes you an hour — and AI gets it done in minutes. Not a dream. It’s happening now.

A new way of measuring AI progress is making headlines. Instead of asking, “Can AI solve this problem?” researchers are now asking, “How long a task can AI handle like a human?” And the results are surprising.

The Problem

Most AI benchmarks today look at accuracy or scores on tests. But those don’t reflect what we really care about: saving time on real work.

Many tools do great on small, simple tasks.
But what about writing full reports, reviewing code, or handling audits?
Engineers, developers, and analysts still spend hours doing these tasks manually.

That’s the gap — we’ve had no good way to measure how much time AI can actually save.

The Solution: Task Time Horizon

Researchers at METR (Model Evaluation and Threat Research) came up with a new metric called the Task Time Horizon.

Here’s what it does:

It measures the length of task, counted in the time it takes a human, that an AI can complete at a set success rate, usually 50%.
It’s like asking: “What’s the longest thing this AI can do well enough?”
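
To make the idea concrete, here is a minimal Python sketch of one way to estimate a 50% time horizon: fit a logistic curve of success against the log of human task time, then read off where the curve crosses 50%. The RESULTS list is made-up illustration data, not METR's, and the function names are hypothetical. The real evaluation pipeline may differ in its details.

```python
import math

# Hypothetical results: (minutes the task takes a human, did the AI succeed?)
RESULTS = [
    (1, True), (2, True), (4, True), (4, True), (8, True), (8, False),
    (15, True), (15, True), (30, True), (30, False), (60, True), (60, False),
    (120, False), (120, True), (240, False), (240, False), (480, False),
]

def fit_logistic(xs, ys, lr=0.1, steps=20000):
    """Fit p = sigmoid(a + b*x) with plain gradient descent on log loss."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += p - y
            grad_b += (p - y) * x
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def time_horizon(results, success_rate=0.5):
    """Human task length (minutes) at which predicted success equals success_rate."""
    xs = [math.log2(minutes) for minutes, _ in results]
    ys = [1.0 if ok else 0.0 for _, ok in results]
    a, b = fit_logistic(xs, ys)
    # Solve sigmoid(a + b*x) = success_rate for x, then undo the log2.
    logit = math.log(success_rate / (1.0 - success_rate))
    return 2 ** ((logit - a) / b)

print(f"50% time horizon: {time_horizon(RESULTS):.0f} minutes")
```

Asking for a higher success rate, for example time_horizon(RESULTS, 0.8), gives a shorter horizon, which is why the chosen threshold matters when comparing numbers.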

Here’s what they found:

GPT-2 from 2019 could barely handle tasks over 1 minute.
Claude 3.7 Sonnet, released in early 2025, succeeds about half the time on tasks that take humans about 59 minutes (Nature, 2025).

That’s a huge leap in just a few years.

How It Works (Step by Step)

1. Pick real-world tasks: Researchers chose 170 tasks from coding, cybersecurity, reasoning, and machine learning, not just simple Q&A.
2. Time expert humans: Professionals completed the tasks, and their times were recorded to set the baseline.
3. Test AI on the same tasks: Each AI model was given the same tasks. The human task length at which the model succeeded about 50% of the time became its "task time horizon."
4. Track growth over time: Since 2019, AI models have doubled their task time horizon about every 7 months. In 2024, the pace sped up to every 3 months.
5. Forecast the future: At this rate, AI could complete month-long projects, at that same 50% success rate, by 2029.
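
The forecast is simple doubling arithmetic, and a short Python sketch shows where a date like 2029 comes from. It starts from the 59-minute figure above and assumes a "month-long project" means about 167 working hours. That number is an assumption for illustration, as are the function names.

```python
import math

START_HORIZON_MIN = 59      # Claude 3.7 figure quoted in the article
WORK_MONTH_MIN = 167 * 60   # assumption: about 167 working hours in a month

def horizon_after(months, doubling_months, start=START_HORIZON_MIN):
    """Projected time horizon (minutes) after `months` of steady doubling."""
    return start * 2 ** (months / doubling_months)

def months_until(target_min, doubling_months, start=START_HORIZON_MIN):
    """Months of steady doubling needed to grow from `start` to `target_min`."""
    return doubling_months * math.log2(target_min / start)

for pace in (7, 3):
    months = months_until(WORK_MONTH_MIN, pace)
    print(f"doubling every {pace} months: about {months:.0f} months to a month-long task")
```

Under these assumptions, the 7-month pace takes roughly 52 months from early 2025, which lands in 2029. If the 3-month pace held instead, the same point would arrive in under two years.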

Conclusion

The new Task Time Horizon metric gives us a clearer way to understand what AI can actually do — not just how smart it is, but how much time it can save.

This is a major shift, especially for people in engineering, coding, and technical work.