Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TSQL linting fix: L019+L034 breaks the code #1734

Closed
jpers36 opened this issue Oct 22, 2021 · 7 comments · Fixed by #1803
Closed

TSQL linting fix: L019+L034 breaks the code #1734

jpers36 opened this issue Oct 22, 2021 · 7 comments · Fixed by #1803
Assignees
Labels
bug Something isn't working fix corrupts sql For identifying issues where running the fix command actually breaks the SQL - bad! rule bug A rule is not working as intended, either missing errors or incorrectly highlighting non-errors t-sql Issues related to the T-SQL/TSQL/Transact SQL dialect

Comments

@jpers36
Copy link
Contributor

jpers36 commented Oct 22, 2021

The following code breaks when fixed:

    SELECT
      [ID]
    , CASE
        WHEN [ID] = 1 THEN NULL
        ELSE 1
        END AS [CaseOutput]
    , [DataDate]
    FROM #temp1

Expected Behaviour

Workable code.

Observed Behaviour

SELECT
    [ID],
    [DataDate]
    CASE
        WHEN [ID] = 1 THEN NULL
        ELSE 1
    END AS [CaseOutput],
FROM #temp1

Steps to Reproduce

Run fix on the above

Dialect

TSQL

Version

Include the output of sqlfluff --version along with your Python version

most recent main
Python 3.9.7

Configuration

Include your SQLFluff configuration here
@jpers36 jpers36 added the bug Something isn't working label Oct 22, 2021
@tunetheweb tunetheweb added fix corrupts sql For identifying issues where running the fix command actually breaks the SQL - bad! rule bug A rule is not working as intended, either missing errors or incorrectly highlighting non-errors t-sql Issues related to the T-SQL/TSQL/Transact SQL dialect labels Oct 22, 2021
@barrywhart
Copy link
Member

Note: The output SQL is broken because it's missing the comma after [DataDate].

@jpers36
Copy link
Contributor Author

jpers36 commented Nov 1, 2021

And there's a comma after CaseOutput, which breaks the SQL for TSQL at least.

@barrywhart
Copy link
Member

I did some preliminary investigation. I think L034 is running after L019's edits have already been applied, so when it attempts to reorder the columns, it's inadvertently moving the updated commas as well. You see see this in the L034 "edit" actions below. If I run L034 by itself (with L019 disabled), commas are not included in the edits.

INFO       0 -> 'SELECT\n  [ID]\n, CASE\n    WHEN [ID] = 1 THEN NULL\n    ELSE 1\n    END AS [CaseOutput]\n, [DataDate]\nFROM #temp1\n'
DEBUG      [L019] !! Violation Found: 'Found leading comma. Expected only trailing.'
DEBUG      [L019] !! Fix Proposed: <LintFix: delete @[L:  3, P:  1] delete:','>
DEBUG      [L019] !! Fix Proposed: <LintFix: delete @[L:  3, P:  2] delete:' '>
DEBUG      [L019] !! Fix Proposed: <LintFix: edit @[L:  2, P:  3] edt:'[ID]'->'[ID],'>
DEBUG      Developer Note: Edit segment found with preset position marker. These should be unset and calculated later.
DEBUG      Developer Note: Edit segment found with preset position marker. These should be unset and calculated later.
DEBUG      [L019] !! Violation Found: 'Found leading comma. Expected only trailing.'
DEBUG      [L019] !! Fix Proposed: <LintFix: delete @[L:  7, P:  1] delete:','>
DEBUG      [L019] !! Fix Proposed: <LintFix: delete @[L:  7, P:  2] delete:' '>
DEBUG      [L019] !! Fix Proposed: <LintFix: edit @[L:  6, P: 12] edt:'[CaseOutput]'->'[CaseOutput],'>
INFO       Applying Fixes [L019]: [<LintFix: delete @[L:  3, P:  1] delete:','>, <LintFix: delete @[L:  3, P:  2] delete:' '>, <LintFix: edit @[L:  2, P:  3] edt:'[ID]'->'[ID],'>, <LintFix: delete @[L:  7, P:  1] delete:','>, <LintFix: delete @[L:  7, P:  2] delete:' '>, <LintFix: edit @[L:  6, P: 12] edt:'[CaseOutput]'->'[CaseOutput],'>]
DEBUG      Matched fix against segment: <LintFix: delete @[L:  3, P:  1] delete:','> -> <SymbolSegment: ([L:  3, P:  1]) ','>
DEBUG      Matched fix against segment: <LintFix: delete @[L:  3, P:  2] delete:' '> -> <WhitespaceSegment: ([L:  3, P:  2]) ' '>
DEBUG      Matched fix against segment: <LintFix: delete @[L:  7, P:  1] delete:','> -> <SymbolSegment: ([L:  7, P:  1]) ','>
DEBUG      Matched fix against segment: <LintFix: delete @[L:  7, P:  2] delete:' '> -> <WhitespaceSegment: ([L:  7, P:  2]) ' '>
DEBUG      Matched fix against segment: <LintFix: edit @[L:  2, P:  3] edt:'[ID]'->'[ID],'> -> <CodeSegment: ([L:  2, P:  3]) '[ID]'>
DEBUG      Matched fix against segment: <LintFix: edit @[L:  6, P: 12] edt:'[CaseOutput]'->'[CaseOutput],'> -> <CodeSegment: ([L:  6, P: 12]) '[CaseOutput]'>
DEBUG      Developer Note: Edit segment found with preset position marker. These should be unset and calculated later.
DEBUG      Developer Note: Edit segment found with preset position marker. These should be unset and calculated later.
DEBUG      [L034] !! Violation Found: 'Use wildcards then simple targets before calculations and aggregates in select statements.'
DEBUG      [L034] !! Fix Proposed: <LintFix: edit @[L:  3, P:  3] edt:'CASE\n    WHEN [ID] = 1 THEN NULL\n    ELSE 1\n    END AS [CaseOutput],'->'[DataDate]'>
DEBUG      [L034] !! Fix Proposed: <LintFix: edit @[L:  7, P:  3] edt:'[DataDate]'->'CASE\n    WHEN [ID] = 1 THEN NULL\n    ELSE 1\n    END AS [CaseOutput],'>
INFO       Applying Fixes [L034]: [<LintFix: edit @[L:  3, P:  3] edt:'CASE\n    WHEN [ID] = 1 THEN NULL\n    ELSE 1\n    END AS [CaseOutput],'->'[DataDate]'>, <LintFix: edit @[L:  7, P:  3] edt:'[DataDate]'->'CASE\n    WHEN [ID] = 1 THEN NULL\n    ELSE 1\n    END AS [CaseOutput],'>]
DEBUG      Matched fix against segment: <LintFix: edit @[L:  3, P:  3] edt:'CASE\n    WHEN [ID] = 1 THEN NULL\n    ELSE 1\n    END AS [CaseOutput],'->'[DataDate]'> -> <SelectClauseElementSegment: ([L:  3, P:  3])>
DEBUG      Matched fix against segment: <LintFix: edit @[L:  7, P:  3] edt:'[DataDate]'->'CASE\n    WHEN [ID] = 1 THEN NULL\n    ELSE 1\n    END AS [CaseOutput],'> -> <SelectClauseElementSegment: ([L:  7, P:  3])>
INFO       Fix loop complete. Stability achieved after 1/10 loops.

To fix this, we first need to determine where the bug is (and this could be a bit of a philosophical question):

  • A bug in L034 (for not handling the new comma placement correctly)
  • A bug in L019 (e.g. is it doing something wrong when moving the commas?)
  • A bug in the core lint/fix engine (e.g. should it prevent multiple rules editing the same area of code in the same "linting round? If so, the same fix could still be generated and applied in a future round, separate from the round that made the L019 fixes)

@barrywhart
Copy link
Member

I reproduced this bug in the ANSI dialect, so it's not specific to TSQL:

SELECT
  ID
, CASE
    WHEN ID = 1 THEN NULL
    ELSE 1
    END AS CaseOutput
, DataDate
FROM temp1

becomes

SELECT
  ID,
DataDate
CASE
    WHEN ID = 1 THEN NULL
    ELSE 1
    END AS CaseOutput,
FROM temp1

@barrywhart
Copy link
Member

It also happens if I replace the CASE statement with a COALESCE() function call:

SELECT
  ID
, COALESCE(a, 1) AS CoalesceOutput
, DataDate
FROM temp1

becomes

SELECT
  ID,
DataDate
COALESCE(a, 1) AS CoalesceOutput,
FROM temp1

Not surprising, but I wanted to verify it happened with a syntactically simpler expression (function call vs CASE).

@barrywhart
Copy link
Member

barrywhart commented Nov 1, 2021

The bug is in L019. That rule appears to be literally editing the raw content of the SelectClauseElementSegment to add a comma. Correct behavior would be to insert a separate comma segment (an instance of SymbolSegment). Since L019 makes the comma "part of" the column, L034 moves the comma along with the column text. 🤢

For example, here's the ID column as L034 sees it (L019 has already run). Note the comma.

(Pdb) select_target_elements[0].segments[0].raw
'ID,'

If I run L034 with L019 disabled, the select_target_elements don't contain commas:

(Pdb) select_target_elements[0].segments[0].raw
'ID'
(Pdb) pp [s.raw for s in select_target_elements]
['ID', 'COALESCE(a, 1) AS CoalesceOutput', 'DataDate']

Instead, the commas are "siblings" of the SelectClauseElementSegments, represented as separate SymbolSegment objects:

(Pdb) pp context.segment.segments
(<KeywordSegment: ([L:  1, P:  1]) 'SELECT'>,
 <Indent: ([L:  1, P:  7]) ''>,
 <NewlineSegment: ([L:  1, P:  7]) '\n'>,
 <WhitespaceSegment: ([L:  2, P:  1]) '  '>,
 <SelectClauseElementSegment: ([L:  2, P:  3])>,
 <NewlineSegment: ([L:  2, P:  5]) '\n'>,
 <SymbolSegment: ([L:  3, P:  1]) ','>,
 <WhitespaceSegment: ([L:  3, P:  2]) ' '>,
 <SelectClauseElementSegment: ([L:  3, P:  3])>,
 <NewlineSegment: ([L:  3, P: 35]) '\n'>,
 <SymbolSegment: ([L:  4, P:  1]) ','>,
 <WhitespaceSegment: ([L:  4, P:  2]) ' '>,
 <SelectClauseElementSegment: ([L:  4, P:  3])>)

@barrywhart barrywhart self-assigned this Nov 1, 2021
@barrywhart
Copy link
Member

Correction: L019 is creating a new SelectClauseElementSegment with two child segments:

(Pdb) !r.raw
'ID,'
(Pdb) !r.segments
(<CodeSegment: ([L:  2, P:  3]) 'ID'>, <SymbolSegment: ([L:  2, P:  3]) ','>)

This happens in BaseSegment.apply_fixes(). It seems that any "edit" fix replaces the specified segment with a new segment of the same type and the specified segments as children. But the comma should not be a child, it should be a sibling. Perhaps using create rather than edit will work?

This is similar to other subtle bugs we've seen previously: a rule makes a change that looks correct if you simply render the raw segment strings, but they "corrupt" the parse tree, leaving it in a state that is not possible or correct with respect to the dialect. Ideally, SQLFluff would detect this and either flag it as an internal error (rule bug) or find a way to "make it right", i.e. find another way of applying the edit that respects the dialect. See #1304 for more detail. @alanmcruickshank: FYI.

For now, I'll try and fix this by changing L019, as described above. We may end up needing to make this deeper enhancement at some point -- it's not yet clear to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working fix corrupts sql For identifying issues where running the fix command actually breaks the SQL - bad! rule bug A rule is not working as intended, either missing errors or incorrectly highlighting non-errors t-sql Issues related to the T-SQL/TSQL/Transact SQL dialect
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants