Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support SQLite's dynamic types #2728

Open
cmoog opened this issue Mar 2, 2024 · 2 comments
Open

support SQLite's dynamic types #2728

cmoog opened this issue Mar 2, 2024 · 2 comments
Labels
feat 🎇 New feature or request

Comments

@cmoog
Copy link
Contributor

cmoog commented Mar 2, 2024

SQLite essentially has dynamic typing. More on that here: https://www.sqlite.org/datatype3.html. Perhaps there is a solution where GlareDB can handle these cases more gracefully? At present, any value that differs from the column type affinity throws an error.

$ sqlite3 test.db "create table test (a integer); insert into test (a) values ('a string')"
$ glaredb --query "select * from read_sqlite('./test.db', 'test')"
Error: External error: Execution error: Cannot convert Text("a string") to Int64

Caused by:
    Execution error: Cannot convert Text("a string") to Int64
@cmoog cmoog added the bug 🐛 Something isn't working label Mar 2, 2024
@tychoish
Copy link
Collaborator

tychoish commented Mar 4, 2024

Thanks for the report!

We were definitely aware of this during development, and opted for the most strict solution possible to make is easier to relax semantics in the future if we needed to, and also to figure out what the right solution would be here.

Unfortunately, for DataFusion-reasons, we have to know the schema of a table before we begin to operate on it, and so the operations are to:

  • trust the underlying storage's types.
  • read some (or all) of the data, determine the schema, and then read the data again into that table view.
  • allow the user to specify types/columns in GlareDB (and cast the data as possible to that).

For relational and strongly typed systems (Parquet, Lance, Postgres, MySQL, etc.) and including SQLite, we take the first option. For CSV, JSON, MongoDB and BSON, we take the second option (usually not the full data, but a sample, or the first n rows). The third option is something I/we are working on in #2333. and I think has merit but is a more medium term solution.

I think the behavior we currently have is a good default, though I'm definitely open to providing alternate type-casting semantics (and am open to suggestions both about the default, and the kinds of logic that we use to type cast.) Things that we could do:

  • respect the types in the "first" row rather than the schema
  • have a type hierarchy (e.g. bool -> numbers -> strings -> binary) and downcast to the lowest common denominator (we'd need to sample or read a prefix.)
  • ignore/null values that don't match the type.

There are pros and cons to each, but I think having the strict-mode be the default, and relaxing the constraints optionally has some benefits.

While I have you here, I'd also love to know what the use case/situation you have where you have SQLite tables with different types for the columns and values? I've probably been working in strongly/statically typed languages for long enough that I underestimated the prefelance of dynamic type use in SQLite...

@cmoog
Copy link
Contributor Author

cmoog commented Mar 7, 2024

Unfortunately, for DataFusion-reasons, we have to know the schema of a table before we begin to operate on it
...I think having the strict-mode be the default, and relaxing the constraints optionally has some benefits.

Ah ok, I see. Makes sense.

@tychoish tychoish added feat 🎇 New feature or request and removed bug 🐛 Something isn't working labels Mar 7, 2024
@tychoish tychoish changed the title SQLite type handling support SQLite's dynamic types Mar 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feat 🎇 New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants