StoryNote
/r/reinforcementlearning
John Schulman (PPO, OA co-founder, post-training/RLHF) leaves OpenAI for Anthropic
72 upvotes
•
gwern
Sharing my JAX-based RL Algorithms Repository - Including BBF and TD7 Implementations
23 upvotes
•
New_East832
Why does the agent do not learn to get to the cube position ?
16 upvotes
•
CoolestSlave
Since Offline RL is environment-independent, why are many paper implementations still based on gym?
16 upvotes
•
Desperate_List4312
Why does Efficient Zero V2 work?
12 upvotes
•
Automatic-Web8429
"Pareto" in layman's terms?
8 upvotes
•
WilhelmRedemption
Very Slow Environment - Should I pivot to Offline RL?
8 upvotes
•
NoNeighborhood9302
A New Survey -- Generative Models for Offline Policy Learning
8 upvotes
•
Ashamed-Put-2344
RLHF in LLMs: Variable action space?
8 upvotes
•
No_Individual_7831
Switching academic careerpath from ML to RL
7 upvotes
•
SenecaEnjoyer69