User guide is incorrect regarding using CLI to register CSV files using schema inference #3001

Closed
andygrove opened this issue Aug 1, 2022 · 4 comments
Labels: bug, documentation, good first issue

@andygrove
Member

Describe the bug

The user guide page at https://arrow.apache.org/datafusion/cli/index.html states that "It is necessary to provide schema information for CSV files since DataFusion does not automatically infer the schema when using SQL to query CSV files." but this is not true, as demonstrated below:

DataFusion CLI v10.0.0
❯ create external table a stored as csv with header row location '/tmp/a.csv';
0 rows in set. Query took 0.017 seconds.
❯ select * from a;
+---+---+---+---+
| a | b | c | d |
+---+---+---+---+
| 1 | 2 | 3 | 4 |
+---+---+---+---+
1 row in set. Query took 0.011 seconds.
❯ describe a;
+-------------+-----------+-------------+
| column_name | data_type | is_nullable |
+-------------+-----------+-------------+
| a           | Int64     | NO          |
| b           | Int64     | NO          |
| c           | Int64     | NO          |
| d           | Int64     | NO          |
+-------------+-----------+-------------+
4 rows in set. Query took 0.017 seconds.

To Reproduce
See above.
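
For anyone reproducing this, the session above needs a CSV file at /tmp/a.csv. The report does not show the file itself; the sketch below assumes a header row of a,b,c,d and a single data row of 1,2,3,4, which is what the query output implies.

```shell
# Assumed contents of /tmp/a.csv, reconstructed from the query output above.
printf 'a,b,c,d\n1,2,3,4\n' > /tmp/a.csv

# Show the file to confirm the header row and the single data row.
cat /tmp/a.csv

# Next, start datafusion-cli and run the statements from the session above.
```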

Expected behavior
We should update the user guide to state that specifying a schema is optional.
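
To illustrate what "optional" could mean in the guide, the sketch below writes a small SQL script containing both forms: the schema-less statement from the report, and a variant with an explicit column list. The script path /tmp/csv_schema.sql, the table names a_inferred and a_explicit, and the BIGINT column types are illustrative assumptions (BIGINT is chosen to match the Int64 types reported by describe).

```shell
# Hypothetical script path and table names, for illustration only.
cat > /tmp/csv_schema.sql <<'EOF'
-- Schema inferred from the file, as in the report
create external table a_inferred stored as csv with header row location '/tmp/a.csv';

-- Schema supplied explicitly (column types are an assumption)
create external table a_explicit (a bigint, b bigint, c bigint, d bigint)
stored as csv with header row location '/tmp/a.csv';
EOF
```

The statements can be pasted into datafusion-cli, and running describe on each table should show the resulting column types.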

Additional context
None

andygrove added the bug, documentation, and good first issue labels on Aug 1, 2022
@kmitchener
Contributor

I have a PR coming that will fix this and a bunch of other issues with the docs

kmitchener added a commit to kmitchener/arrow-datafusion that referenced this issue Aug 1, 2022
fix some links and docs
fix docker build for datafusion-cli and update docs
improve left nav link naming for clarity
consolidated the documentation for the CLI into one page per issue apache#1352
fix the CSV schema inference in datafusion-cli docs per apache#3001
@retikulum
Contributor

Hi. I think this issue has been fixed (I guess in #3171), but I don't understand why it is still open. See https://arrow.apache.org/datafusion/user-guide/sql/ddl.html

@retikulum
Contributor

retikulum commented Oct 28, 2022

Did you have a chance to take a look? @andygrove

@Dandandan
Contributor

Thanks @retikulum for noticing
