/r/singularity
METR Evals - LLM agents vs skilled humans on diverse task completion: When agents can do a task, they do so at ~1/30th of the cost of the median hourly wage of a US bachelor’s degree... A Claude 3.5 Sonnet agent fixed bugs in an ORM library at a cost of <$2; the human baseline took >2 hours.