sqlite3_column_text may return non-utf-8, rusqlite does not allow to handle it #548

Closed
hpk42 opened this issue Jul 20, 2019 · 4 comments

@hpk42

hpk42 commented Jul 20, 2019

We hit a real-life case where the lines at https://github.com/jgallagher/rusqlite/blob/master/src/statement.rs#L625 cause a panic, because the value returned by sqlite3_column_text can indeed contain non-UTF-8 bytes. FWIW, the SQLite docs at https://www.sqlite.org/version3.html say this:

SQLite is not particular about the text it receives and is more than happy to process text strings that are not normalized or even well-formed UTF-8 or UTF-16. Thus, programmers who want to store ISO8859 data can do so using the UTF-8 interfaces. As long as no attempts are made to use a UTF-16 collating sequence or SQL function, the byte sequence of the text will not be modified in any way.

In our case we are reading from a database that was created and populated by a non-Rust program, so we can't change the fact that a text column contains non-UTF-8 data. We'd like to handle the invalid case ourselves -- could the API be amended? Any recommendations on how we could work around the issue?

@dignifiedquire

According to https://www.sqlite.org/capi3ref.html#sqlite3_column_blob sqlite3_column_text does return utf-8 strings

@dignifiedquire

but it might be better to change the code slightly according to the above documentation

The safest policy is to invoke these routines in one of the following ways:

sqlite3_column_text() followed by sqlite3_column_bytes()
sqlite3_column_blob() followed by sqlite3_column_bytes()
sqlite3_column_text16() followed by sqlite3_column_bytes16()
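
To illustrate why pairing the pointer with an explicit byte count matters, here is a minimal std-only sketch. It does not call SQLite; `text_with_len` and `text_until_nul` are hypothetical helpers, and the buffer used with them is a hand-made stand-in for a column value, not real SQLite output.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical helper: view `len` bytes starting at `ptr` as a slice.
/// This is the shape of data a text pointer plus a byte count would give.
///
/// # Safety
/// `ptr` must point to at least `len` readable bytes that outlive `'a`.
unsafe fn text_with_len<'a>(ptr: *const u8, len: usize) -> &'a [u8] {
    if ptr.is_null() || len == 0 {
        return &[];
    }
    unsafe { std::slice::from_raw_parts(ptr, len) }
}

/// Hypothetical helper: read bytes up to (not including) the first NUL,
/// which is what happens when no explicit length is available.
///
/// # Safety
/// `ptr` must point to a NUL-terminated buffer that outlives `'a`.
unsafe fn text_until_nul<'a>(ptr: *const u8) -> &'a [u8] {
    unsafe { CStr::from_ptr(ptr as *const c_char) }.to_bytes()
}
```

With a buffer such as `b"ab\0cd\0"` and a reported length of 5, the first helper yields all five bytes, while the second stops after `ab`. Neither helper makes any assumption about the bytes being valid UTF-8.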

@thomcc
Member

thomcc commented Jul 22, 2019

This might be a case where using something like https://github.com/BurntSushi/bstr rather than str is appropriate. (Once its API is stable enough for that purpose, at least -- at the moment it's recommended against exposing publicly.)

In the short term &[u8] might be the right call.
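
As a rough sketch of what a caller could do once it has the raw `&[u8]`, using only the standard library: the function names below are hypothetical and are not part of rusqlite, and the Latin-1 fallback is just one possible policy for ISO 8859-1 data.

```rust
use std::borrow::Cow;
use std::str::Utf8Error;

/// Strict decoding: returns an error instead of panicking on invalid UTF-8.
fn text_strict(raw: &[u8]) -> Result<&str, Utf8Error> {
    std::str::from_utf8(raw)
}

/// Lossy decoding: each invalid sequence is replaced with U+FFFD.
fn text_lossy(raw: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(raw)
}

/// Assumed fallback for ISO 8859-1 (Latin-1) data: every byte maps to the
/// Unicode code point with the same numeric value.
fn text_latin1(raw: &[u8]) -> String {
    raw.iter().map(|&b| b as char).collect()
}
```

For the Latin-1 bytes `b"caf\xE9"`, `text_strict` returns an error whose `valid_up_to()` is 3, `text_lossy` yields `caf` followed by U+FFFD, and `text_latin1` yields `café`. Which policy is right depends on what the program that wrote the database actually stored.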

@gwenn
Collaborator

gwenn commented Jul 27, 2019

Should be fixed by release 0.20.
