feat(databricks): add initial databricks syntax #698

Open: 1 commit to merge into base: master
sjrusso8 commented Jan 2, 2023

@andialbrecht & @mrmasterplan I have updated my initial PR with the lexer changes. See below!

This PR adds frequently used Databricks SQL and Delta table syntax. Databricks SQL has many special operations for working with Delta tables, which means many new keywords.

Here is an example of standard Databricks SQL operations on a newly created Delta table.

CREATE TABLE IF NOT EXISTS default.event 
(
    id INT, 
    name STRING, 
    description VARCHAR(30)
)
USING delta
LOCATION '/mnt/data/location'
PARTITIONED BY (id)
COMMENT 'this is a comment'
TBLPROPERTIES (
    'foo'='bar',
    delta.autoOptimize.optimizeWrite = true, 
    delta.autoOptimize.autoCompact = true
);

OPTIMIZE event 
WHERE date >= current_timestamp() - INTERVAL 1 day 
ZORDER BY (id);

VACUUM event;

CREATE BLOOMFILTER INDEX ON TABLE event 
FOR COLUMNS(description OPTIONS (fpp=0.1, numItems=50000000));

CREATE TABLE default.event_clone SHALLOW CLONE default.event;

DESCRIBE HISTORY event;

DESCRIBE TABLE EXTENDED event;

SHOW DETAIL event;

MSCK REPAIR TABLE event SYNC METADATA;

REFRESH TABLE event;

Parsing those statements should then pick out the additional keywords, as in the output below.

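import sqlparse

# `sql` holds the Databricks SQL statements shown above as a single string.
statements = sqlparse.parse(sql)

for statement in statements:
    # Keep only the top-level tokens that the lexer classified as keywords.
    result = [v.value for v in sqlparse.sql.IdentifierList(statement.tokens).get_identifiers() if v.is_keyword]

    print(result)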

>>> ['CREATE', 'TABLE', 'IF', 'NOT', 'EXISTS', 'USING', 'LOCATION', 'PARTITIONED BY', 'COMMENT', 'TBLPROPERTIES']
>>> ['OPTIMIZE', 'ZORDER BY']
>>> ['VACUUM']
>>> ['CREATE', 'BLOOMFILTER INDEX', 'ON', 'TABLE', 'FOR']
>>> ['CREATE', 'TABLE', 'SHALLOW CLONE']
>>> ['DESCRIBE', 'HISTORY']
>>> ['DESCRIBE', 'TABLE', 'EXTENDED']
>>> ['SHOW', 'DETAIL']
>>> ['MSCK REPAIR', 'TABLE', 'SYNC', 'METADATA']
>>> ['REFRESH', 'TABLE']
