Feature Request : ORPO #894

Open
nivibilla opened this issue Apr 28, 2024 · 3 comments


nivibilla commented Apr 28, 2024

Hi,

First of all, thank you for this library! Very clean, and I appreciate that all I need is PyTorch!

I wanted to open an issue for the integration of ORPO. Not needing to do SFT before the RLHF step is huge, since it saves a lot of compute when training on preference data. Hoping it can be integrated into torchtune (with LoRA support if possible)! A sketch of the loss is below.
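For reference, ORPO is a single monolithic objective: the usual SFT cross-entropy on the chosen response plus a penalty on the odds ratio between the chosen and rejected responses. A minimal PyTorch sketch following the paper (function and argument names here are mine, not torchtune's):

```python
import torch
import torch.nn.functional as F

def orpo_loss(chosen_logps, rejected_logps, sft_nll, lam=0.1):
    """ORPO objective (Hong et al., 2024): SFT loss plus an odds-ratio term.

    chosen_logps / rejected_logps: per-example, length-normalized mean
    log-probs of the chosen / rejected responses under the policy model.
    sft_nll: the usual cross-entropy (NLL) loss on the chosen response.
    lam: weight of the odds-ratio term (the paper's lambda, e.g. 0.1).
    """
    # log odds(y|x) = log p - log(1 - p); log1p(-exp(lp)) == log(1 - exp(lp))
    log_odds = (chosen_logps - rejected_logps) - (
        torch.log1p(-torch.exp(chosen_logps))
        - torch.log1p(-torch.exp(rejected_logps))
    )
    # -log sigmoid(log odds ratio), averaged over the batch
    or_loss = -F.logsigmoid(log_odds).mean()
    return sft_nll + lam * or_loss
```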

There is an existing integration in TRL; a rough usage sketch follows.
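Something like this works with TRL today (`ORPOConfig` and `ORPOTrainer` are real TRL classes as of ~v0.8.2; the model name and dataset below are placeholders, and exact argument names may differ by version):

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# ORPOTrainer expects preference rows with "prompt", "chosen", "rejected";
# "my-org/preference-data" is a hypothetical dataset name.
dataset = load_dataset("my-org/preference-data", split="train")

args = ORPOConfig(output_dir="orpo-out", beta=0.1)  # beta is the paper's lambda
trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```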

Thanks!

@kartikayk (Contributor)

@nivibilla thanks for opening this issue!

ORPO would indeed be a really nice addition to the library. It hasn't been at the top of our list, if I'm being honest, but maybe we should reconsider. Is this something you'd be open to adding? DPO and PPO (WIP) were both added by our awesome community members, and if you'd be interested in adding ORPO, I'm happy to help brainstorm and review the design, code, etc.


nivibilla commented Apr 30, 2024

Hi @kartikayk

I'm not that experienced in writing custom training loops; I'm mainly a Hugging Face user, haha. I'd be no better than Llama 3 70B attempting it 🤣

@kartikayk (Contributor)

I'm partly serious here, but why not train a Code Llama 70B using torchtune and then see if this gets you the right recipe? :)
