
METR Evals - LLM agents vs skilled humans on diverse task completion: When agents can do a task, they do so at ~1/30th of the cost of the median hourly wage of a US bachelor’s degree... Claude 3.5 Sonnet agent fixed bugs in an ORM library at a cost of <$2, Human baseline took >2 hours.

by /u/sachos345 in /r/singularity

Upvotes: 159

