Safetywashing? A new paper suggests alignment progress is mostly driven by capabilities progress, which can give a false sense of how well alignment is going
by /u/Maxie445 in /r/singularity
Upvotes: 39