Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

create external table inconsistencies #2768

Open
universalmind303 opened this issue Mar 8, 2024 · 1 comment
Open

create external table inconsistencies #2768

universalmind303 opened this issue Mar 8, 2024 · 1 comment
Labels
syntax Issues or features related to our SQL syntax ux 💎 Convenience label for grouping UX issues

Comments

@universalmind303
Copy link
Contributor

Description

there are some inconsistencies when using table options, especially in combination with create external table.

For many of them, they use the same object_store options, which allows for some rather inconsistent syntax

create external table test from bson
options (
  location = './external_schema.bson',
  file_type = 'bson'
)

create external table multi_report from excel 
options (
  location='./testdata/xlsx/multiple_sheets.xlsx', 
  file_type = 'xlsx', 
  sheet_name = 'other', 
  has_header = false
);

Additionally, the syntax can often be reversed for object stores (which is confusing)

create external table delta_azure_opts
from delta
options (
  location='azure://glaredb-test/ext-table*'
  account_name= 'AZURE_ACCOUNT',
  access_key='AZURE_ACCESS_KEY'
);

create external table ext_table_1 
from azure
credentials azure_creds 
options (
  location='azure://glaredb-test/ext-table*'
);

I don't think that both syntaxes should be supported, and i think we should drop support for one of them

Suggestion

  • Drop support for create external table t from <file_type> options (location = <object_store>)
  • make create external table t from <object_store> options (file_type = <file_type>) more explicit about only allowing object stores and file types. [1]

for example from delta|lance|iceberg options (...) should not use the object store options, but it's own dedicated options, as they are their own storage formats, and don't integrate into object store nicely like file formats do. It also makes the distinction between the two a lot clearer.

Additionally, create external table t from <file_type> should probably never be valid. If you want to do that, you should probably be using the table function, or the inference from './path/to/file.<file_type>' instead.


For brevity, for the remainder of the text, from ... will refer to create external table t from .....

(create external table t from t -> from t)

in the above examples, I suggest they be rewritten to use from local instead of from <file_type | table_type>.

  • from bson options (location=location...) -> from local options (location=location, file_type = 'bson')
  • from azure credentials azure_creds options (location='azure://'); -> from delta options(location='azure://') credentials azure_creds

So just to recap, all File types (bson, json, csv, parquet, avro, ...) should use the object store as the from qualifier

from <local | s3 | azure | gcs> options (object_store_opts)

  • from local options (file_type = 'csv', ...object_store_opts);
  • from s3 options (file_type = 'bson', ...object_store_opts);
  • from azure options (file_type = 'parquet', ...object_store_opts);

all storage formats should use the storage format as the from qualifier (delta, lance, iceberg)

from <delta | lance | iceberg | excel | ...> options (T_opts)

  • from delta options (location = 's3://', ...other delta specific options)
  • from excel options (location = 's3://', ...other excel specific options)

[1] I define a file format as a format that is used to represent a single table usually in a single file. (there is no notion of a table, and they are generally quite portable (csv, json, parquet, ...)).

A table format (AKA storage format) is any format that supports multiple tables (delta, lance, excel, iceberg).

@universalmind303 universalmind303 added the bug 🐛 Something isn't working label Mar 8, 2024
@tychoish
Copy link
Collaborator

I pretty strongly believe that we should hide FROM <storage location/service> from users (because that's all information that we can (and do, in many cases) derive from the path, and using FROM exclusively for "format" rather than "location" simplifies this. The inverse (location and not format) would mean that we would either need to deduce format, or add additional syntax.

@tychoish tychoish added syntax Issues or features related to our SQL syntax ux 💎 Convenience label for grouping UX issues and removed bug 🐛 Something isn't working labels Apr 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
syntax Issues or features related to our SQL syntax ux 💎 Convenience label for grouping UX issues
Projects
None yet
Development

No branches or pull requests

2 participants