/r/reinforcementlearning

Comparing online learning/model-based with offline learning/model-free algorithms

3 upvotes • BeezyPineapple

[R] preference learning: RLHF, best-of-n sampling (BoN), or direct preference optimization (DPO)?

2 upvotes • gwern

Training a DDPG to act as a finely tuned controller for a 3DOF aircraft

2 upvotes • leo95nf

RL model inside of RL model

2 upvotes • Practical-Resort7278

Save model at peak ep_rew_mean?

2 upvotes • RamenKomplex

Why does Decision Transformer work in the offline RL sequential decision-making domain?

2 upvotes • Desperate_List4312

How to make money | Tips revealed for earning 800,000 won in a single day

1 upvote • BattleEast495

Everything Was Fine Until This Change

1 upvote • OpenToAdvices96

How I earn 1,000 USDT a day with an Ethereum arbitrage bot: a risk-free ETH arbitrage tutorial for beginners - automated MEV arbitrage

1 upvote • BattleEast495

Recommended crypto futures exchanges | Tips on how to get $100 when signing up

1 upvote • BattleEast495
