Support writing parquet to stdout #1687

Closed
pacman82 opened this issue May 11, 2022 · 9 comments
Labels: enhancement (Any new improvement worthy of a entry in the changelog), parquet (Changes to the parquet crate)

Comments

@pacman82

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

I would like to write parquet files to "true" streams, e.g. stdout. This is in the context of the downstream odbc2parquet tool, for which I would like to provide this option. It would allow my users to stream the parquet output directly into a key-value store or other sink using just pipes in their shell.

Describe the solution you'd like
I would like to see the Seek + TryClone requirement for initializing a SerializedFileWriter dropped. From what I've seen, at least the Seek requirement is used to determine the length of the metadata written into the stream, or to track the stream position in general. I feel Seek is too strong a requirement just to keep track of a position or the number of bytes written.
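
To illustrate the point about position tracking, here is a minimal sketch using only the standard library. It is not the parquet crate's code: `CountingWriter` and `write_metadata` are hypothetical names invented for this example, and the "metadata" bytes are a placeholder rather than a real parquet footer. The sketch only shows that a wrapper around any `std::io::Write` can report a position and a metadata length without `Seek`.

```rust
use std::io::{self, Write};

/// Hypothetical wrapper (illustrative name, not a parquet API): forwards
/// writes to the inner sink and counts the bytes it accepted.
struct CountingWriter<W: Write> {
    inner: W,
    bytes_written: u64,
}

impl<W: Write> CountingWriter<W> {
    fn new(inner: W) -> Self {
        Self { inner, bytes_written: 0 }
    }

    /// Logical stream position, derived purely from counting.
    fn position(&self) -> u64 {
        self.bytes_written
    }
}

impl<W: Write> Write for CountingWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // Count only the bytes the inner writer actually accepted.
        let n = self.inner.write(buf)?;
        self.bytes_written += n as u64;
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

/// Writes a metadata blob and returns its length, computed as the
/// difference between the positions before and after the write.
fn write_metadata<W: Write>(out: &mut CountingWriter<W>, metadata: &[u8]) -> io::Result<u64> {
    let start = out.position();
    out.write_all(metadata)?;
    Ok(out.position() - start)
}

fn main() -> io::Result<()> {
    let stdout = io::stdout();
    let mut out = CountingWriter::new(stdout.lock());

    // Placeholder payload; this does NOT produce a valid parquet file.
    out.write_all(b"PAR1")?;
    let len = write_metadata(&mut out, b"placeholder-metadata")?;
    out.flush()?;

    // Diagnostics go to stderr so they do not mix with the piped data.
    eprintln!("metadata length: {}, total bytes: {}", len, out.position());
    Ok(())
}
```

Because the wrapper itself implements `Write`, it can sit in front of stdout, a socket, or a file alike, which is what makes piping possible.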

Describe alternatives you've considered
I have not considered any alternatives. Happy to hear about them, though.

@pacman82 pacman82 added the enhancement Any new improvement worthy of a entry in the changelog label May 11, 2022
@alamb
Contributor

alamb commented May 12, 2022

This may be related to some work @tustvold has planned for the parquet reader such as #1605

I haven't heard him discuss anything about the writer yet though

@tustvold
Contributor

As described in #937 it should be relatively straightforward to drop the seek requirement. A PR would be most welcome, otherwise I can try to take a stab when I have time

@pacman82
Author

Great. Sorry for missing the existing issue. I looked, but not thoroughly enough, it seems. I also found that it would be rather straightforward to drop it. This might become my first contribution, if I am not kept busy with issues on the downstream artefacts. My intention with this issue was exactly to verify that such a PR would indeed be welcome.

Thanks for the quick response!

@tustvold
Contributor

I'm currently working on this as part of fixing #1717

@tustvold tustvold self-assigned this May 21, 2022
@pacman82
Author

@tustvold Great! This will unblock new features in the downstream crate odbc2parquet. 🙇

@alamb
Contributor

alamb commented May 23, 2022

Specifically, #1719 allows SerializedFileWriter to write to anything that implements std::io::Write as is common in the rust ecosystem 🎉
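
As a rough illustration of what a `std::io::Write` bound buys the caller, the sketch below uses only the standard library and deliberately avoids the parquet crate. `write_records` is a hypothetical stand-in for a writer API with that bound, not a function from #1719; it writes newline-terminated text, not parquet data.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for an API bounded only by `Write`: it accepts
/// any sink and hands it back once all records are written.
fn write_records<W: Write>(mut sink: W, records: &[&str]) -> io::Result<W> {
    for record in records {
        sink.write_all(record.as_bytes())?;
        sink.write_all(b"\n")?;
    }
    sink.flush()?;
    Ok(sink)
}

fn main() -> io::Result<()> {
    let records = ["first", "second"];

    // Same function, in-memory sink: handy for tests.
    let buffer = write_records(Vec::new(), &records)?;
    eprintln!("buffered {} bytes", buffer.len());

    // Same function, stdout sink: output can be piped in a shell.
    let stdout = io::stdout();
    write_records(stdout.lock(), &records)?;
    Ok(())
}
```

The caller picks the sink, so the same code path serves files, in-memory buffers, and pipes.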

@pacman82
Author

This is great. I'm not sure what the etiquette here is. Am I supposed to close this issue, or do the maintainers do so? For me the change in signature is enough to verify that it solves my use-case.

@tustvold
Contributor

tustvold commented May 23, 2022

It will be closed automatically when #1719 is merged, which will hopefully be in the next few days. Going to leave it open for a bit to give other reviewers a chance to look over it

@tustvold tustvold closed this as completed Jun 2, 2022
@alamb alamb added the parquet Changes to the parquet crate label Jun 9, 2022
@alamb alamb changed the title Support writing parquet to stdout Support writing parquet to stdout Jun 9, 2022
@alamb
Contributor

alamb commented Jun 9, 2022

planned for release in 16.0.0 (eta early next week)
