
bytea fields are wrongly handled #61

Closed
lservini opened this issue May 23, 2022 · 2 comments

@lservini

From case 8581,

However, we discovered a bug which prevents us from using it. The bug is that incorrect values are inserted into bytea columns. We have a "hash" column of type bytea in our transactions table, and I created a CSV export from the Postgres DB. For instance, in Postgres the hash is `\x160f2b057728502c17b0f7d5883a94b2dcc5c8be013cf3b81c0fe6ddd6b77050`, but when it is inserted into a hypertable using the timescaledb-parallel-copy utility, what actually gets inserted is hex(hash). So after the import with the utility, the hypertable contains `\x5c7831363066326230353737323835303263313762306637643538383361393462326463633563386265303133636633623831633066653664646436623737303530`, which is hex('\x160f2b057728502c17b0f7d5883a94b2dcc5c8be013cf3b81c0fe6ddd6b77050').
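
To make the corruption concrete: the stored value is byte-for-byte the hex encoding of the original literal's text. A minimal Go check (not part of the original report; the variable name is illustrative):

```go
package main

import "fmt"

func main() {
	// The bytea literal from the CSV export, as plain text.
	orig := `\x160f2b057728502c17b0f7d5883a94b2dcc5c8be013cf3b81c0fe6ddd6b77050`

	// Hex-encoding that text reproduces the corrupted value seen in
	// the hypertable: \x5c7831363066326230... (i.e. hex of the text).
	fmt.Printf("\\x%x\n", []byte(orig))
}
```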
lservini added the bug label May 23, 2022
jchampio commented Jun 1, 2022

Looks like lib/pq's COPY support is limited to text input, which it always escapes. There is a raw CopyData API that was added by Cockroach, but it has no tests, so I don't really know how public clients would actually use it. lib/pq is in maintenance mode and doesn't appear to be very healthy. 😬
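
For reference, this is the failure mode in miniature: prepare a CSV-format COPY through lib/pq, and the driver still applies its TEXT-format escaping to every line. A sketch, not the utility's actual code; the connection string and the transactions(hash bytea) table are illustrative:

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq"
)

func main() {
	db, err := sql.Open("postgres", "postgres://localhost/test?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}

	txn, err := db.Begin()
	if err != nil {
		log.Fatal(err)
	}

	// lib/pq enters copy-in mode for any prepared statement starting
	// with COPY, but it always escapes outgoing data as if the COPY
	// were in the default TEXT format...
	stmt, err := txn.Prepare(`COPY transactions (hash) FROM STDIN WITH CSV`)
	if err != nil {
		log.Fatal(err)
	}

	// ...so the backslash in this line is doubled on the wire. The
	// server's CSV parser does not undo backslash escaping, and the
	// field reaches the bytea input function with a literal backslash
	// prefix, producing the corruption reported above.
	if _, err := stmt.Exec(`\x160f2b057728502c17b0f7d5883a94b2dcc5c8be013cf3b81c0fe6ddd6b77050`); err != nil {
		log.Fatal(err)
	}
	if _, err := stmt.Exec(); err != nil { // an empty Exec flushes the copy
		log.Fatal(err)
	}
	if err := stmt.Close(); err != nil {
		log.Fatal(err)
	}
	if err := txn.Commit(); err != nil {
		log.Fatal(err)
	}
}
```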

jchampio added a commit to jchampio/timescaledb-parallel-copy that referenced this issue Jun 9, 2022
lib/pq has been in maintenance mode for a while, and issue timescale#61 appears
to have run into one of its idiosyncrasies: its COPY implementation
assumes that you're using a query generated via pq.CopyIn(), which uses
the default TEXT format, so it runs all of the incoming data through an
additional escaping layer.

Our code uses CSV by default (and there appears to be no way to use TEXT
format, since we're using the old COPY syntax), which means that
incoming CSV containing its own escapes will be double-escaped and
corrupted. This is most visible with bytea columns, but the tests
currently document additional problems with tab and backslash
characters, and there are probably other problematic cases too.

To fix, switch from lib/pq over to jackc/pgx, and reimplement
db.CopyFromLines() using the PgConn.CopyFrom() API. We were already
depending on a part of this library before, so the new dependency isn't
as big of a change as it would have been otherwise, but the switch isn't
free. The compiled binary gains roughly 1.5 MB in size -- likely due to
jackc's extensive type conversion system, which is unfortunate because
we're not using it. Further optimization could probably be done, at the
expense of having most of the DB logic go through the low-level APIs
rather than database/sql.

We make use of the new sql.Conn.Raw() method to easily drop down to the
lowest API level, so bump our minimum Go version to 1.13. (1.12 has been
EOL for about three years now.) This escaping fix is a breaking change
for anyone who may have already worked around this problem, so bump the
utility's version to 0.4.0.
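
A minimal sketch of the approach this commit describes, assuming pgx v4; the connection string and the transactions(hash bytea) table are placeholders, not the project's actual code. sql.Conn.Raw() exposes the driver-level connection, and pgconn's CopyFrom() streams the reader's bytes to the server verbatim, so the CSV's own escaping survives:

```go
package main

import (
	"context"
	"database/sql"
	"log"
	"strings"

	"github.com/jackc/pgx/v4/stdlib"
)

func main() {
	ctx := context.Background()

	// Importing the stdlib package registers the "pgx" driver.
	db, err := sql.Open("pgx", "postgres://localhost/test")
	if err != nil {
		log.Fatal(err)
	}

	conn, err := db.Conn(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	csv := "\\x160f2b057728502c17b0f7d5883a94b2dcc5c8be013cf3b81c0fe6ddd6b77050\n"

	// Raw() hands us the driver-level connection, from which we can
	// reach pgconn. CopyFrom sends the reader's contents to the server
	// without any additional escaping layer.
	err = conn.Raw(func(driverConn interface{}) error {
		pgConn := driverConn.(*stdlib.Conn).Conn().PgConn()
		_, err := pgConn.CopyFrom(ctx, strings.NewReader(csv),
			"COPY transactions (hash) FROM STDIN WITH CSV")
		return err
	})
	if err != nil {
		log.Fatal(err)
	}
}
```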
@jchampio

Should be fixed by #63; please reopen and tag me if you find otherwise.
