Feature Request : ORPO #894

Open
nivibilla opened this issue Apr 28, 2024 · 3 comments


nivibilla commented Apr 28, 2024

Hi,

First of all, thank you for this library! Very clean, and I appreciate that all I need is PyTorch!

I wanted to open an issue for the integration of ORPO. Not needing to do SFT before the RLHF step is huge, since it saves a lot of compute when training on preference data. Hoping it can be integrated into torchtune (with LoRA support if possible)! A sketch of the loss is below.
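For reference, ORPO is a single monolithic objective: the usual SFT cross-entropy on the chosen response plus a penalty on the odds ratio between the chosen and rejected responses. A minimal PyTorch sketch following the paper (function and argument names here are mine, not torchtune's):

```python
import torch
import torch.nn.functional as F

def orpo_loss(chosen_logps, rejected_logps, sft_nll, lam=0.1):
    """ORPO objective (Hong et al., 2024): SFT loss plus an odds-ratio term.

    chosen_logps / rejected_logps: per-example, length-normalized mean
    log-probs of the chosen / rejected responses under the policy model.
    sft_nll: the usual cross-entropy (NLL) loss on the chosen response.
    lam: weight of the odds-ratio term (the paper's lambda, e.g. 0.1).
    """
    # log odds(y|x) = log p - log(1 - p); log1p(-exp(lp)) == log(1 - exp(lp))
    log_odds = (chosen_logps - rejected_logps) - (
        torch.log1p(-torch.exp(chosen_logps))
        - torch.log1p(-torch.exp(rejected_logps))
    )
    # -log sigmoid(log odds ratio), averaged over the batch
    or_loss = -F.logsigmoid(log_odds).mean()
    return sft_nll + lam * or_loss
```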

There is an existing integration in TRL; a rough usage sketch follows.
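Something like this works with TRL today (`ORPOConfig` and `ORPOTrainer` are real TRL classes as of ~v0.8.2; the model name and dataset below are placeholders, and exact argument names may differ by version):

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# ORPOTrainer expects preference rows with "prompt", "chosen", "rejected";
# "my-org/preference-data" is a hypothetical dataset name.
dataset = load_dataset("my-org/preference-data", split="train")

args = ORPOConfig(output_dir="orpo-out", beta=0.1)  # beta is the paper's lambda
trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```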

Thanks!

@kartikayk (Contributor)

@nivibilla thanks for opening this issue!

ORPO would indeed be a really nice addition to the library. It hasn't been at the top of our list, if I'm being honest, but maybe we should reconsider. Is this something you'd be open to adding? DPO and PPO (WIP) were both added by our awesome community members, and if you'd be interested in adding ORPO, I'm happy to help brainstorm and review the design, code, etc.


nivibilla commented Apr 30, 2024

Hi @kartikayk

I'm not that experienced in writing custom training loops; I'm mainly a Hugging Face user, haha. I'd be no better than Llama 3 70B attempting it 🤣

@kartikayk (Contributor)

I'm partly serious here, but why not train a Code Llama 70B using torchtune and then see if this gets you the right recipe? :)
