Direct Preference Optimization (DPO) for LLM Alignment coded in Python & PyTorch from scratch
by /u/seraschka in /r/Python
Upvotes: 12
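The linked post implements DPO from scratch in PyTorch. As context for what that entails, here is a minimal sketch of the DPO loss on per-example summed log-probabilities, written in plain Python for illustration; the function name and arguments are hypothetical and not taken from the linked implementation.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # Hypothetical illustration of the standard DPO objective:
    # -log sigmoid(beta * ((log pi(y_w) - log pi_ref(y_w))
    #                     - (log pi(y_l) - log pi_ref(y_l))))
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# With no preference shift relative to the reference model, the loss is log 2;
# it drops below log 2 once the policy favors the chosen response more
# strongly than the reference model does.
print(dpo_loss(-1.0, -2.0, -1.5, -1.8))
```

In a real PyTorch implementation, these log-probabilities would be gathered from the token logits of the policy and a frozen reference model, and the loss would be averaged over a batch.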