fix succeeds even though lint fails with PRS error #2345

juhoautio · 2022-01-18T22:28:41Z

Expected Behaviour

I assume fix is basically lint + fix (probably many do if they fix in a pre-commit, for example). If that assumption is reasonable, then fix -f should fail whenever lint would fail.

Observed Behaviour

It's possible that fix -f succeeds when fix doesn't find anything to do, even though there are errors that would cause lint to fail.

Steps to Reproduce

Edit: since #2438 the INTEGER is detected. Replace it with any non-existing pseudo type to reproduce this issue again.

lint_placeholders.sql:

SELECT my_col
FROM my_schema.my_table
WHERE processdate = CAST(
        event_date AS INTEGER
    )
    AND eventtype = &{event_type}

$ sqlfluff lint lint_placeholders.sql
== [lint_placeholders.sql] FAIL
L:   4 | P:  70 |  PRS | Line 4, Position 70: Found unparsable section: ' AS
| INTEGER'
All Finished 📜 🎉!

$ echo $?
65

That is as expected, because INTEGER is not listed as a keyword in Hive dialect at the moment.

$ sqlfluff fix -f lint_placeholders.sql
==== finding fixable violations ====
==== no fixable linting violations found ====
All Finished 📜 🎉!

$ echo $?
0

That is unexpected. I was expecting this to also fail and complain about the PRS error.

Dialect

Hive, but this issue doesn't seem to be dialect specific per se.

Version

sqlfluff, version 0.9.1

Python 3.7.5

(not using dbt)

Configuration

.sqlfluff:

[sqlfluff]
dialect = hive
templater = placeholder

[sqlfluff:templater:placeholder]
param_style = ampersand
event_type='my_event'

The text was updated successfully, but these errors were encountered:

juhoautio · 2022-02-05T13:45:14Z

The PRs #2424 and #2546 are somehow about specific error types. But is it now so that whenever lint would exit with non-zero exit code, also fix -f does?

tunetheweb · 2022-02-05T15:26:38Z

Apart from when fix actually fixes it (so lint would exit non-zero, but fix would exit 0) yes I would expect this to be the case.

But there are instances where this is not the case (as you've noted in #2134) and where fix takes a few iterations..

If using this as part of a pre-commit, I would recommend running fix first, and then lint. That should capture those instances.

Similarly, in this repo, our pre-commit runs black first (to fix all fixable things it can), and then flake8 (to identify extra issues black did not, or could not fix).

juhoautio · 2022-02-08T21:56:38Z

@tunetheweb To me the analogy of black + flake8 doesn't really work. Those tools may have quite different purposes, especially depending on the flake8 configuration.

I would rather like to see as analogous black --check <=> sqlfluff lint and black<=> sqlfluff fix -f.

It is counterintuitive that the same tool (sqlfluff) exits with a successful status code after fix -f but then fails when lint is run. I think it's a natural expectation that if fix -f exits as success, all possible violations have been fixed or there was nothing to fix in the first place.

It also sounds a bit inefficient to run two commands instead of one? It seems better if the tool itself handles the final validation, in the best way it sees fit, instead of eg. parsing everything again etc. as required by an extra round of lint on an entire project.

Is it worth creating a new issue about this in general: "fix succeeds even though lint fails"? Because this current issue kind of turned out to be treated as specific to PRS errors. A possible problem is that I'm not sure how to provide a test case that reproduces that situation since the PRS case was fixed, although based on what you wrote it sounds clear that it is still possible in some other situations.

tunetheweb · 2022-02-08T22:10:31Z

As I said in general I would expect it to be the case that fix -f returning zero would mean all items are fixed and so lint would return zero too.

I agree it is counter intuitive in the (hopefully small number now) of cases where this does not happen and it's usually due to incorrect fixes (e.g. #1304 where fix effectively puts the fix in the wrong section of the parse tree, and the subsequent iteration of fix therefore is not running on the correct parse tree and so misses errors, until the whole SQL is re-parsed from a completely fresh run).

I see you've raised #2584 which is probably a good idea to concentrate on that particular (non-parsing problem) example.

Ideally, we should correct this, but I'm not sure how easy that will be. So offered the suggestion to run fix and then lint to be sure until then.

juhoautio · 2022-02-08T22:23:20Z

Right, so the suggestion to run both fix & lint was offer more like as a temporary workaround.

I have said it before, but maybe it's still worth repeating: black works so that after it has finished its job it still runs one more extra round to check and make sure that there would be no further changes. If there would be still changes (which is always unexpected) -> return non-zero exit code and print a message asking to file a bug. I was hoping for sqlfluff to also have that level of guarantee.

Does it make sense that users have to run fix -f multiple times or run lint explicitly after it? Or should sqlfluff do that on behalf of the user whenever fix -f is run? Again, I'm happy to create a new issue about it if you see any hope for implementing that.

There is maybe one important aspect though: sql is complex, and that means that sqlfluff takes some time to execute. Maybe it is a good tradeoff that users are just running fix -f and assume that there's no need to run it again. Running another round of lint would be just waste of time, most of the time. In the rare case that that there's a bug and fix -f doesn't settle, users would catch that for example in their CI circle and be able to report the issue to sqlfluff any way. Although it's not as easy to realize that it is an sqlfluff bug, compared to getting an explicit error message like "fix didn't settle, please file a bug for sqlfluff..". Thoughts?

tunetheweb · 2022-02-08T22:36:10Z

Right, so the suggestion to run both fix & lint was offer more like as a temporary workaround.

It was offered as the only way to guarantee what you seek to guarantee.

I have said it before, but maybe it's still worth repeating: black works so that after it has finished its job it still runs one more extra round to check and make sure that there would be no further changes. If there would be still changes (which is always unexpected) -> return non-zero exit code and print a message asking to file a bug. I was hoping for sqlfluff to also have that level of guarantee.

SQLFluff runs up to 10 iterations to attempt to settle. It has been suggested in #1386 that we should alert when hitting that limit and I would very much be in favour of that.

In the example you've raised in #2584 upping the limit to 20 (and even 30) does not fix it in one run. So it's not a matter of "doing on more run", but the fact that, for some reason, fix is not able to complete this in one run (probably, because as I stated previously sometimes fix doesn't/can't apply fixes correctly.

Does it make sense that users have to run fix -f multiple times or run lint explicitly after it? Or should sqlfluff do that on behalf of the user whenever fix -f is run? Again, I'm happy to create a new issue about it if you see any hope for implementing that.

No it doesn't make sense, and I'd rather it didn't. As an open source project we're happy to accept a pull request over an issue to resolve this behaviour. What you seek is not unreasonable but unfortunately I don't have the answer you want. But if you know how to fix this to act like you would like it to, then we would be happy to accept that fix. I do see hope for implementing it, but cannot guarantee when (or even if) that will be.

There is maybe one important aspect though: sql is complex, and that means that sqlfluff takes some time to execute. Maybe it is a good tradeoff that users are just running fix -f and assume that there's no need to run it again. Running another round of lint would be just waste of time, most of the time. In the rare case that that there's a bug and fix -f doesn't settle, users would catch that for example in their CI circle and be able to report the issue to sqlfluff any way. Although it's not as easy to realize that it is an sqlfluff bug, compared to getting an explicit error message like "fix didn't settle, please file a bug for sqlfluff..". Thoughts?

I honestly think CI should lint, but not fix to catch this. Yes it would be more painful than not making it to CI but, as I have stated above, it's (hopefully!) rare enough and we haven't been able to identify and fix why this happens.

Pre-commits and the like should fix (and potentially lint as well as a double check). Unless you are doing a massive commit, or you have very large SQL files, that should be quick.

Ideally we could report when it's not settled but, as I have explained above, fix doesn't know it has not settled in the examples you give. It's a bug. It should be fixed. I don't know if/when it will be. Happy to take the fix if you have it.

barrywhart · 2022-02-08T22:41:09Z

Note that there are issues SQLFluff can detect but not fix.

juhoautio · 2022-02-08T22:44:08Z

In the example you've raised in #2584 upping the limit to 20 (and even 30) does not fix it in one run. So it's not a matter of "doing on more run", but the fact that, for some reason, fix is not able to complete this in one run (probably, because as I stated previously sometimes fix doesn't/can't apply fixes correctly.

True, another iteration in fix -f would not catch it. But internally sqlfluff fix -f could also run sqlfluff lint in the end, so that an error could be raised if anything remains.

tunetheweb · 2022-02-08T22:49:30Z

Note that there are issues SQLFluff can detect but not fix.

It does look like fix exits with non-zero error for linting rules that are not sqlfluff fix compatible.

For example:

SELECT a +
    b
FROM foo

Both lint and fix exit with non-zero codes (though different ones!).

% sqlfluff lint test.sql  
== [test.sql] FAIL                                                                                                                                                                 
L:   1 | P:  10 | L007 | Operators near newlines should be after, not before the
                       | newline
All Finished 📜 🎉!
% echo $?               
65
% sqlfluff fix -f test.sql
==== finding fixable violations ====
==== no fixable linting violations found ====                                                                                                                                      
All Finished 📜 🎉!
  [1 unfixable linting violations found]
% echo $?
1

tunetheweb · 2022-02-08T22:51:14Z

True, another iteration in fix -f would not catch it. But internally sqlfluff fix -f could also run sqlfluff lint in the end, so that an error could be raised if anything remains.

It could. Personally I think re-parsing between each fix iteration as suggest in #1304 (comment) would likely resolve most of the issues, but happy to do above too. But I had a quick look at doing that and couldn't get it working.

barrywhart · 2022-02-08T22:53:15Z

This discussion seems to have become very philosophical -- is there a particular case we have in mind where SQLFluff should do something different? If so, can we create a new issue about that. If not, I propose we table the discussion, at least here. The issue is already closed. 🙂

juhoautio added the bug Something isn't working label Jan 18, 2022

juhoautio mentioned this issue Jan 18, 2022

Hive does not recognise INTEGER type #2296

Closed

tunetheweb mentioned this issue Jan 20, 2022

begin/end not indented correctly #2360

Closed

This was referenced Jan 22, 2022

"sqlfluff fix" should report any parse errors found #2423

Merged

Improve how "sqlfluff fix" handles parse errors #2424

Closed

tunetheweb added the core Issues relating to core design or architecture of SQLFluff label Jan 26, 2022

barrywhart self-assigned this Feb 1, 2022

barrywhart mentioned this issue Feb 3, 2022

Improve how "sqlfluff fix" handles templating and parse errors #2546

Merged

6 tasks

barrywhart closed this as completed in #2546 Feb 4, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix succeeds even though lint fails with PRS error #2345

fix succeeds even though lint fails with PRS error #2345

juhoautio commented Jan 18, 2022 •

edited

juhoautio commented Feb 5, 2022

tunetheweb commented Feb 5, 2022

juhoautio commented Feb 8, 2022

tunetheweb commented Feb 8, 2022 •

edited

juhoautio commented Feb 8, 2022

tunetheweb commented Feb 8, 2022

barrywhart commented Feb 8, 2022

juhoautio commented Feb 8, 2022

tunetheweb commented Feb 8, 2022

tunetheweb commented Feb 8, 2022

barrywhart commented Feb 8, 2022

fix succeeds even though lint fails with PRS error #2345

fix succeeds even though lint fails with PRS error #2345

Comments

juhoautio commented Jan 18, 2022 • edited

Expected Behaviour

Observed Behaviour

Steps to Reproduce

Dialect

Version

Configuration

juhoautio commented Feb 5, 2022

tunetheweb commented Feb 5, 2022

juhoautio commented Feb 8, 2022

tunetheweb commented Feb 8, 2022 • edited

juhoautio commented Feb 8, 2022

tunetheweb commented Feb 8, 2022

barrywhart commented Feb 8, 2022

juhoautio commented Feb 8, 2022

tunetheweb commented Feb 8, 2022

tunetheweb commented Feb 8, 2022

barrywhart commented Feb 8, 2022

juhoautio commented Jan 18, 2022 •

edited

tunetheweb commented Feb 8, 2022 •

edited