Support StreamingDataLoader passed to map #13

Open
ethanwharris opened this issue Feb 23, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@ethanwharris
Member

🚀 Feature

It would be great to be able to pass a StreamingDataLoader to map. When experimenting with CLIP embeddings, I found that I needed StreamingDataLoader to fully utilise the GPU, but it doesn't play nicely with map because we don't set the environment variables and other state it needs to work in a distributed setting.

Motivation

This would let us run distributed embedding of much larger datasets such as LAION.

Pitch

Allow providing a StreamingDataLoader to the map function, and then set correct envs etc. so that we still visit each sample just once.
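
To make the "visit each sample just once" requirement concrete, here is a minimal sketch in plain Python. It does not use the library's actual API: `shard_indices`, `shard_for_dataloader_worker`, and all parameter names are hypothetical, and the round-robin split is just one assumed way of dividing the work. The idea is that every (map worker, dataloader worker) pair gets a unique global rank, so the shards are disjoint and together cover the whole dataset.

```python
from typing import List


def shard_indices(num_samples: int, world_size: int, rank: int) -> List[int]:
    """Return the sample indices assigned to one rank (hypothetical helper).

    Indices are dealt round-robin, so the shards are disjoint and their
    union covers every sample exactly once.
    """
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} out of range for world_size {world_size}")
    return list(range(rank, num_samples, world_size))


def shard_for_dataloader_worker(
    num_samples: int,
    num_map_workers: int,
    map_worker_rank: int,
    num_loader_workers: int,
    loader_worker_id: int,
) -> List[int]:
    """Split across map workers and the dataloader workers inside each one.

    Each (map worker, dataloader worker) pair is flattened into a single
    global rank, which is then used for the round-robin split.
    """
    global_world = num_map_workers * num_loader_workers
    global_rank = map_worker_rank * num_loader_workers + loader_worker_id
    return shard_indices(num_samples, global_world, global_rank)


if __name__ == "__main__":
    # 2 map workers, each running a dataloader with 3 workers, over 10 samples.
    seen = []
    for m in range(2):
        for w in range(3):
            seen.extend(shard_for_dataloader_worker(10, 2, m, 3, w))
    # Every index appears exactly once across all workers.
    assert sorted(seen) == list(range(10))
```

In a real implementation the ranks would presumably come from whatever environment variables map sets for its workers, which is the part this issue asks for.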

@ethanwharris ethanwharris added enhancement New feature or request help wanted Extra attention is needed labels Feb 23, 2024

Hi! Thanks for your contribution! Great first issue!

@Borda Borda removed the help wanted Extra attention is needed label Apr 18, 2024