Compressed(gzip) csv files support. #4039
Greetings.
Apologies for chiming in. I have dealt with those problems in 3 different ways:
1) When only technical information about the data and types is needed (without the requirement of being human readable), then in my opinion Parquet is the best choice, especially when moving tables between DBs without a direct DB link or network connection. For that purpose I wrote https://github.com/manticore-projects/JDBCParquetWriter, which can transfer any JDBC ResultSet into a Parquet file (with compression). Lots of DBs read Parquet files directly these days.
2) If human readability is an issue, then SQLSheet can help. I took over maintenance and amended the MetaData and Stream capabilities: https://github.com/manticore-projects/sqlsheet
Especially regarding data type information, it allows maintaining extra header rows to store that information (for both reading and writing). The resulting XLSX files are ZIP compressed already, and since styles are used they don't carry too much overhead. It's a good trade-off in my opinion.
3) For everything else, simple, stupid CSV export is fine in my opinion, and yes, compression can be helpful. Be careful of ZIP bombs and the like, though; they are a real issue. Invest in Apache Commons Compress or a similar well-aged solution in order to avoid reinventing the wheel.
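To make the ZIP-bomb concern concrete: a gzip bomb is a tiny compressed payload that inflates to an enormous output, so any decompressing reader should cap the number of bytes it is willing to produce. A minimal sketch using only `java.util.zip` (Commons Compress offers similar building blocks; the class and method names here are invented for illustration):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class SafeGunzip {

    /**
     * Decompresses a gzip stream, but aborts as soon as more than
     * maxBytes of output have been produced, limiting the damage a
     * gzip bomb can do to memory and disk.
     */
    public static byte[] gunzipLimited(InputStream compressed, long maxBytes) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(compressed)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            long total = 0;
            int n;
            while ((n = gz.read(buf)) != -1) {
                total += n;
                if (total > maxBytes) {
                    throw new IOException("decompressed size exceeds limit of " + maxBytes + " bytes");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        // A highly repetitive payload compresses extremely well --
        // exactly the property a gzip bomb abuses.
        byte[] raw = new byte[1_000_000]; // a million zero bytes
        ByteArrayOutputStream packed = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(packed)) {
            gz.write(raw);
        }
        System.out.println("compressed size: " + packed.size());

        // A generous limit succeeds; a tight one aborts early.
        byte[] restored = gunzipLimited(new ByteArrayInputStream(packed.toByteArray()), 2_000_000);
        System.out.println("restored size: " + restored.length);
        try {
            gunzipLimited(new ByteArrayInputStream(packed.toByteArray()), 10_000);
        } catch (IOException expected) {
            System.out.println("tight limit rejected the stream");
        }
    }
}
```

The same cap idea applies regardless of the codec chosen, which is one argument for funnelling every supported compression format through one guarded reader.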
Feel free to count me in; I have a big interest in such things, as we regularly move data between databases.
All the best
Andreas
On Sat, 2024-04-06 at 23:09 -0700, Evgenij Ryazanov wrote:
Both CSVREAD and CSVWRITE have significant design flaws. Modern versions of H2 don't have any universal data type any more, and there is no way to specify data types to parse values from CSV properly. There are some problems with really large files, the parameters are obscure, and so on. From my personal point of view, we should deprecate both of these functions and introduce a better-designed replacement for them, maybe something similar to the COPY command from PostgreSQL. We can try to add additional filters in the new API, but changes to these legacy functions are premature at this point. If you need a quick fix, you can write some user-defined functions for your application.
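To illustrate the suggested quick fix: the core of a user-defined function that reads gzip-compressed CSV can be written with the plain JDK and registered in H2 via its CREATE ALIAS mechanism. A hedged sketch (the class name GzipCsv and the naive comma split are assumptions; real CSV needs proper quoting support, e.g. via org.h2.tools.Csv, which is not shown here):

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipCsv {

    /**
     * Reads a gzip-compressed CSV stream into rows of String fields.
     * Naive comma split, no quoting/escaping support -- illustration only.
     */
    public static List<String[]> read(InputStream compressed) throws IOException {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(compressed), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                rows.add(line.split(",", -1));
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Round-trip a small in-memory CSV through gzip.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write("ID,NAME\n1,Hello\n2,World\n".getBytes(StandardCharsets.UTF_8));
        }
        List<String[]> rows = read(new ByteArrayInputStream(bos.toByteArray()));
        System.out.println(rows.size() + " rows, header: " + String.join("|", rows.get(0)));
    }
}
```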
GZIP actually provides a relatively low compression ratio for files in this format, so if we introduce it, the next feature request will be about BZIP2, PPMd, or some other compression method. Maybe the possibility to pass data to a third-party program, and to read data from a third-party program, would be a better option, but it should be carefully reviewed from a security perspective.
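For context on the compression-ratio point, it is easy to measure what gzip actually buys on CSV-shaped data with the JDK alone. An illustrative sketch (the synthetic sample data and the class name are invented; real ratios depend heavily on the data):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class CsvGzipRatio {

    /** Returns the gzip-compressed size in bytes of the given data. */
    public static int gzipSize(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.size();
    }

    public static void main(String[] args) throws IOException {
        // Synthetic CSV: numeric IDs and short strings, loosely
        // resembling a typical table export.
        StringBuilder csv = new StringBuilder("ID,NAME,VALUE\n");
        for (int i = 0; i < 10_000; i++) {
            csv.append(i).append(",row-").append(i).append(',').append(i * 37 % 1000).append('\n');
        }
        byte[] raw = csv.toString().getBytes(StandardCharsets.UTF_8);
        int packed = gzipSize(raw);
        System.out.printf("raw=%d bytes, gzip=%d bytes, ratio=%.1fx%n",
                raw.length, packed, (double) raw.length / packed);
    }
}
```

Running the same measurement with other codecs (e.g. via Commons Compress) would quantify how much better BZIP2 or similar methods do on the same data.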
Hello.
The code should be easy, given that all the chores are encapsulated within the CSVREAD/CSVWRITE functions. I will try to write that code and create a PR if there are no objections to that functionality. Just let me know if there are any.
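A possible shape for the core of such a change: compression support for CSVREAD/CSVWRITE can reduce to wrapping the underlying stream, for instance keyed off a ".gz" file-name suffix. A sketch under those assumptions (the suffix-based detection and the class name are mine, not H2's actual design; H2 could equally use an explicit option in the function's parameter string):

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressedOpen {

    /**
     * Opens a file for reading, transparently decompressing it when the
     * name ends with ".gz". Everything downstream (the CSV parser) keeps
     * working on a plain InputStream and never sees the compression.
     */
    public static InputStream open(String fileName) throws IOException {
        InputStream in = new BufferedInputStream(new FileInputStream(fileName));
        return fileName.endsWith(".gz") ? new GZIPInputStream(in) : in;
    }

    public static void main(String[] args) throws IOException {
        // Write a tiny gzipped CSV file, then read it back transparently.
        File f = File.createTempFile("demo", ".csv.gz");
        f.deleteOnExit();
        try (GZIPOutputStream gz = new GZIPOutputStream(new FileOutputStream(f))) {
            gz.write("ID,NAME\n1,Hello\n".getBytes(StandardCharsets.UTF_8));
        }
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(open(f.getPath()), StandardCharsets.UTF_8))) {
            System.out.println(r.readLine()); // the CSV header line
        }
    }
}
```

The symmetric write side would wrap the OutputStream in a GZIPOutputStream the same way.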