Question: is there a plan to support streaming from GCS? #101

Open
dnnspark opened this issue Apr 13, 2024 · 6 comments
Labels
enhancement New feature or request

Comments

@dnnspark

🚀 Feature

Motivation

Pitch

Alternatives

Additional context

@dnnspark added the "enhancement" (New feature or request) and "help wanted" (Extra attention is needed) labels Apr 13, 2024

Hi! Thanks for your contribution! Great first issue!

@tchaton
Collaborator

tchaton commented Apr 14, 2024

Hey @dnnspark,

Yes, this is quite simple to add. It simply requires adding a downloader for GCS.
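
For readers following along, here is a minimal sketch of what "adding a downloader" could look like, assuming downloaders are looked up by URL scheme. Every name here (`Downloader`, `GCSDownloader`, `get_downloader`, the `fetch` callable) is hypothetical and is not this library's actual API. The `fetch` argument stands in for a real GCS client call so the sketch runs without any cloud dependency.

```python
from typing import Callable, Dict, Tuple
from urllib.parse import urlparse


class Downloader:
    """Hypothetical base class: fetch one remote object to a local path."""

    def download_file(self, remote_url: str, local_path: str) -> None:
        raise NotImplementedError


class GCSDownloader(Downloader):
    """Hypothetical GCS downloader.

    `fetch` is an assumed stand-in for a real storage client call:
    it takes (bucket, blob_name) and returns the object's bytes.
    """

    def __init__(self, fetch: Callable[[str, str], bytes]) -> None:
        self._fetch = fetch

    @staticmethod
    def split_url(remote_url: str) -> Tuple[str, str]:
        # "gs://bucket/path/to/blob" -> ("bucket", "path/to/blob")
        parsed = urlparse(remote_url)
        if parsed.scheme != "gs":
            raise ValueError(f"expected a gs:// URL, got {remote_url!r}")
        return parsed.netloc, parsed.path.lstrip("/")

    def download_file(self, remote_url: str, local_path: str) -> None:
        bucket, blob_name = self.split_url(remote_url)
        with open(local_path, "wb") as f:
            f.write(self._fetch(bucket, blob_name))


def get_downloader(remote_url: str, registry: Dict[str, Downloader]) -> Downloader:
    """Pick a downloader from the registry based on the URL scheme."""
    scheme = urlparse(remote_url).scheme
    if scheme not in registry:
        raise ValueError(f"no downloader registered for scheme {scheme!r}")
    return registry[scheme]
```

With this shape, supporting a new storage backend only means registering one more class under its scheme; the code that consumes chunks does not change.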

@dnnspark
Author

Thanks @tchaton, do you have an idea when it's going to land (even a very rough estimate)?

@tchaton
Collaborator

tchaton commented Apr 15, 2024

Hey @dnnspark,

If you are willing to give it a try, I can look into it this week.

@dnnspark
Author

Sorry for the late reply, @tchaton.

I'm willing to try! But it's not blocking at the moment, so I will stay tuned for GCS support (it would be very helpful if you could ping this thread once it's ready).

One thing I noticed is that the optimize() function assumes the data is stored on local disk (at least in the example). In my case, the raw data is in GCS (because it's too large). Is there a way to transform data that is stored in the cloud and save the transformed data back to the cloud without having to download the entire dataset?

@tchaton
Collaborator

tchaton commented Apr 17, 2024

Yes, that's why this library was built :) But I would need to add GCS support for it first ;) I will try to prioritize it.
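
The cloud-to-cloud workflow asked about above can be sketched as processing one object at a time, so that only a single object is ever on local disk. This is an illustration under stated assumptions, not how `optimize()` is implemented: `transform_remote` and its `download`, `transform`, and `upload` parameters are hypothetical, and the injected callables stand in for a real storage client.

```python
import os
import tempfile
from typing import Callable, Iterable


def transform_remote(
    urls: Iterable[str],
    download: Callable[[str, str], None],  # assumed: (remote_url, local_path)
    transform: Callable[[bytes], bytes],   # per-object transformation
    upload: Callable[[str, str], None],    # assumed: (local_path, remote_url)
    output_prefix: str,
) -> int:
    """Download, transform, and upload objects one by one.

    Each object lives in its own temporary directory, which is deleted
    before the next object is fetched. Returns the number of objects processed.
    """
    count = 0
    for url in urls:
        name = url.rsplit("/", 1)[-1]
        with tempfile.TemporaryDirectory() as tmp:
            src = os.path.join(tmp, name)
            dst = os.path.join(tmp, name + ".out")
            download(url, src)
            with open(src, "rb") as f:
                data = transform(f.read())
            with open(dst, "wb") as f:
                f.write(data)
            upload(dst, f"{output_prefix}/{name}")
        count += 1
    return count
```

Peak local disk usage is bounded by the largest single object plus its transformed output, rather than by the size of the whole dataset.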

@Borda removed the "help wanted" (Extra attention is needed) label Apr 18, 2024