| Title | Score | Author |
| --- | --- | --- |
| John Schulman (PPO, OA co-founder, post-training/RLHF) leaves OpenAI for Anthropic | 72 | gwern |
| Sharing my JAX-based RL Algorithms Repository - Including BBF and TD7 Implementations | 23 | New_East832 |
| Why does the agent not learn to get to the cube position? | 16 | CoolestSlave |
| Since Offline RL is environment-independent, why are many paper implementations still based on gym? | 16 | Desperate_List4312 |
| Why does EfficientZero V2 work? | 12 | Automatic-Web8429 |
| "Pareto" in layman's terms? | 8 | WilhelmRedemption |
| Very Slow Environment - Should I pivot to Offline RL? | 8 | NoNeighborhood9302 |
| A New Survey -- Generative Models for Offline Policy Learning | 8 | Ashamed-Put-2344 |
| RLHF in LLMs: Variable action space? | 8 | No_Individual_7831 |
| Switching academic career path from ML to RL | 7 | SenecaEnjoyer69 |