
bytea fields are wrongly handled #61

Closed
lservini opened this issue May 23, 2022 · 2 comments

@lservini

From case 8581,

However, we discovered a bug which prevents us from using it. The bug is that incorrect values are inserted into bytea columns. We have a "hash" column of type bytea in our transactions table, and I created a CSV export from the Postgres DB. For instance, in Postgres the hash is `\x160f2b057728502c17b0f7d5883a94b2dcc5c8be013cf3b81c0fe6ddd6b77050`, but when it is inserted into a hypertable using the timescaledb-parallel-copy utility, what actually gets inserted is hex(hash). So after the import with the utility, the hypertable contains `\x5c7831363066326230353737323835303263313762306637643538383361393462326463633563386265303133636633623831633066653664646436623737303530`, which is hex('\x160f2b057728502c17b0f7d5883a94b2dcc5c8be013cf3b81c0fe6ddd6b77050').
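
To make the corruption concrete: the stored value is byte-for-byte the hex encoding of the original literal's text. A minimal Go check (not part of the original report; the variable name is illustrative):

```go
package main

import "fmt"

func main() {
	// The bytea literal from the CSV export, as plain text.
	orig := `\x160f2b057728502c17b0f7d5883a94b2dcc5c8be013cf3b81c0fe6ddd6b77050`

	// Hex-encoding that text reproduces the corrupted value seen in
	// the hypertable: \x5c7831363066326230... (i.e. hex of the text).
	fmt.Printf("\\x%x\n", []byte(orig))
}
```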
lservini added the bug label May 23, 2022
jchampio commented Jun 1, 2022

Looks like lib/pq's COPY support is limited to text input, which it always escapes. There is a raw CopyData API that was added by Cockroach, but it has no tests, so I don't really know how public clients would actually use it. lib/pq is in maintenance mode and doesn't appear to be very healthy. 😬
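
For reference, this is the failure mode in miniature: prepare a CSV-format COPY through lib/pq, and the driver still applies its TEXT-format escaping to every line. A sketch, not the utility's actual code; the connection string and the transactions(hash bytea) table are illustrative:

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq"
)

func main() {
	db, err := sql.Open("postgres", "postgres://localhost/test?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}

	txn, err := db.Begin()
	if err != nil {
		log.Fatal(err)
	}

	// lib/pq enters copy-in mode for any prepared statement starting
	// with COPY, but it always escapes outgoing data as if the COPY
	// were in the default TEXT format...
	stmt, err := txn.Prepare(`COPY transactions (hash) FROM STDIN WITH CSV`)
	if err != nil {
		log.Fatal(err)
	}

	// ...so the backslash in this line is doubled on the wire. The
	// server's CSV parser does not undo backslash escaping, and the
	// field reaches the bytea input function with a literal backslash
	// prefix, producing the corruption reported above.
	if _, err := stmt.Exec(`\x160f2b057728502c17b0f7d5883a94b2dcc5c8be013cf3b81c0fe6ddd6b77050`); err != nil {
		log.Fatal(err)
	}
	if _, err := stmt.Exec(); err != nil { // an empty Exec flushes the copy
		log.Fatal(err)
	}
	if err := stmt.Close(); err != nil {
		log.Fatal(err)
	}
	if err := txn.Commit(); err != nil {
		log.Fatal(err)
	}
}
```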

jchampio added a commit to jchampio/timescaledb-parallel-copy that referenced this issue Jun 9, 2022
lib/pq has been in maintenance mode for a while, and issue timescale#61 appears
to have run into one of its idiosyncrasies: its COPY implementation
assumes that you're using a query generated via pq.CopyIn(), which uses
the default TEXT format, so it runs all of the incoming data through an
additional escaping layer.

Our code uses CSV by default (and there appears to be no way to use TEXT
format, since we're using the old COPY syntax), which means that
incoming CSV containing its own escapes will be double-escaped and
corrupted. This is most visible with bytea columns, but the tests
currently document additional problems with tab and backslash
characters, and there are probably other problematic cases too.

To fix, switch from lib/pq over to jackc/pgx, and reimplement
db.CopyFromLines() using the PgConn.CopyFrom() API. We were already
depending on a part of this library before, so the new dependency isn't
as big of a change as it would have been otherwise, but the switch isn't
free. The compiled binary gains roughly 1.5 MB in size -- likely due to
jackc's extensive type conversion system, which is unfortunate because
we're not using it. Further optimization could probably be done, at the
expense of having most of the DB logic go through the low-level APIs
rather than database/sql.

We make use of the new sql.Conn.Raw() method to easily drop down to the
lowest API level, so bump our minimum Go version to 1.13. (1.12 has been
EOL for about three years now.) This escaping fix is a breaking change
for anyone who may have already worked around this problem, so bump the
utility's version to 0.4.0.
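
A minimal sketch of the approach this commit describes, assuming pgx v4; the connection string and the transactions(hash bytea) table are placeholders, not the project's actual code. sql.Conn.Raw() exposes the driver-level connection, and pgconn's CopyFrom() streams the reader's bytes to the server verbatim, so the CSV's own escaping survives:

```go
package main

import (
	"context"
	"database/sql"
	"log"
	"strings"

	"github.com/jackc/pgx/v4/stdlib"
)

func main() {
	ctx := context.Background()

	// Importing the stdlib package registers the "pgx" driver.
	db, err := sql.Open("pgx", "postgres://localhost/test")
	if err != nil {
		log.Fatal(err)
	}

	conn, err := db.Conn(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	csv := "\\x160f2b057728502c17b0f7d5883a94b2dcc5c8be013cf3b81c0fe6ddd6b77050\n"

	// Raw() hands us the driver-level connection, from which we can
	// reach pgconn. CopyFrom sends the reader's contents to the server
	// without any additional escaping layer.
	err = conn.Raw(func(driverConn interface{}) error {
		pgConn := driverConn.(*stdlib.Conn).Conn().PgConn()
		_, err := pgConn.CopyFrom(ctx, strings.NewReader(csv),
			"COPY transactions (hash) FROM STDIN WITH CSV")
		return err
	})
	if err != nil {
		log.Fatal(err)
	}
}
```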
@jchampio

Should be fixed by #63; please reopen and tag me if you find otherwise.
