Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Support VALUES query #1013

Open
jonmmease opened this issue Jan 27, 2023 · 2 comments · May be fixed by #1038
Open

[ENH] Support VALUES query #1013

jonmmease opened this issue Jan 27, 2023 · 2 comments · May be fixed by #1038
Labels
enhancement New feature or request python Affects Python API

Comments

@jonmmease
Copy link

Is your feature request related to a problem? Please describe.
Mostly for testing purposes, it would be great if dask-sql would support creating tables of inline data using the VALUES keyword.

Describe the solution you'd like
In many SQL dialects (including DataFusion, Postgres, and DuckDB), it's possible to construct tables from literal values using the VALUES keyword. See https://www.postgresql.org/docs/current/queries-values.html.

For example:

SELECT * FROM (VALUES (1, 2), (1, 3)) as tbl(column1, column2)

In the DataFusion CLI, this evaluates to

+---------+---------+
| column1 | column2 |
+---------+---------+
| 1       | 2       |
| 1       | 3       |
+---------+---------+

This isn't currently supported in dask-sql. For example:

from dask_sql import Context
c = Context()
result = c.sql(r"""
SELECT * FROM (VALUES (1, 2), (1, 3)) as tbl(column1, column2)
""")
...
NotImplementedError: No relational conversion for node type Values available (yet).

Describe alternatives you've considered
None

Additional context
I'm in the early stages of adding SQL support to VegaFusion, and I'd like to test SQL dialect generation using self-contained queries that include small inline datasets.

@jonmmease jonmmease added enhancement New feature or request needs triage Awaiting triage by a dask-sql maintainer labels Jan 27, 2023
@ayushdg
Copy link
Collaborator

ayushdg commented Jan 31, 2023

Thanks for raising this issue. As mentioned the datafusion planner supports these kind of nodes and it shouldn't be too hard to add an implementation on the dask-sql side.
Is your current plan to use these queries with cpu backed dask dataframes or gpu backed dask-cudf dataframes as well?

The primary reason I'm asking is because it's easier to default to creating cpu backed dask dataframes by default, since we don't have a good api today to allow users specifying gpu tables for inline cases like these.

@ayushdg ayushdg added python Affects Python API and removed needs triage Awaiting triage by a dask-sql maintainer labels Jan 31, 2023
@jonmmease
Copy link
Author

For my purposes the CPU backend would be preferable, and since these would necessarily be small datasets, my guess is that this is probably appropriate in general.

@ayushdg ayushdg linked a pull request Feb 9, 2023 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request python Affects Python API
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants