Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ScriptUtils parsing failed when using field name with 'end' keyword #1547

Closed
skyline75489 opened this issue Jun 13, 2019 · 8 comments · Fixed by #1627
Closed

ScriptUtils parsing failed when using field name with 'end' keyword #1547

skyline75489 opened this issue Jun 13, 2019 · 8 comments · Fixed by #1627

Comments

@skyline75489
Copy link

I'm using runInitScript and it seems to falsely parse the sql:

create database if not exists ttt;

use ttt;

create table aaa
(
    id                  bigint auto_increment   primary key,
    end_time            datetime     null       COMMENT 'end_time',
    data_status         varchar(16)  not null
) comment 'aaa';

create table bbb
(
    id                  bigint auto_increment   primary key
) comment 'bbb';

I've stepped into the code and found that splitSqlScript in ScriptUtils thinks there are 3 statements: create database, use and combined 2 create table. The last statement, of course, throws sql syntax error.

@skyline75489
Copy link
Author

I've digged a little deeper and figured that the key is naming. There's field named end_time. This field name is recognized as END in PROCEDURE, which ruins the parsing process.

@skyline75489 skyline75489 changed the title ScriptUtils parsing failed when using comment ScriptUtils parsing failed when using field name with 'end' keyword Jun 13, 2019
@skyline75489
Copy link
Author

If we modify the splittable.sql used in ScriptUtilsTest, adding this name:

CREATE TABLE bar (
  end_time VARCHAR(255)
);

The test is broken.

Seems to me the ScriptUtils is only roughly tested. I wonder why isn't a more robust solution is used, such as a SQL syntax parser.

@kiview
Copy link
Member

kiview commented Jun 24, 2019

Hey @skyline75489, thanks a lot for digging into this. There are indeed different quirks like this in ScriptUtils.

Can you please point use to a more robust solution we could use if you know one that supports different vendor specific SQL?

The ScriptUtils class we use is taken from the Spring-JDBC project and is used thereby implicitly in thousands of projects. I would therefore refrain from calling it roughly tested, but of course we would be happy if you could point us to a better solution.

@skyline75489
Copy link
Author

I agree ScriptUtils is widely used. But I doubt if it is as robust as dedicated parsers used in druid or something similar.

However, for testcontainer maybe ScriptUtils is enough, though. We are not trying to build some connection pool or fully-function JDBC library, after all. We just need to run some sql scripts. This commit
1d03cba#diff-5023355fa84dc1f1c805f42aafe61a8e introduced compound statements support and also this parsing bug. I think we can figure out a way to fix this without replacing ScriptUtils entirely.

@ppfeiler
Copy link
Contributor

ppfeiler commented Jun 25, 2019

I analysed the problem and the following two ifs are the problem:

if (!inLiteral && !inComment && containsSubstringAtOffset(lowerCaseScriptContent, "BEGIN", i)) {
	compoundStatementDepth++;
}
if (!inLiteral && !inComment && containsSubstringAtOffset(lowerCaseScriptContent, "END", i) && compoundStatementDepth > 0) {
	compoundStatementDepth--;
}
final boolean inCompoundStatement = compoundStatementDepth != 0;

Inside the method containsSubstringAtOffset it just checks the substring with startsWith. So also fields where the name contains the word "begin" have problems.

I will create a pull request with the fix when it is ready. I have to find a better way to check if its the beginning or end of a procedure.

@rnorth
Copy link
Member

rnorth commented Jul 19, 2019

However, for testcontainer maybe ScriptUtils is enough, though. We are not trying to build some connection pool or fully-function JDBC library, after all. We just need to run some sql scripts. This commit
1d03cba#diff-5023355fa84dc1f1c805f42aafe61a8e introduced compound statements support and also this parsing bug. I think we can figure out a way to fix this without replacing ScriptUtils entirely.

I agree. This is, I think, the way I'd prefer to go - I spent some time looking for a splitter/parser that we could use, but keep coming back to:

  • we just need to split the script
  • we don't actually care about dialect or even malformed SQL; it's OK to let the database do the syntax validation

All the parsers I've seen would push us towards a parser/configuration-per-dialect, which I fear would be just be a lot of work given the number of DBs we have to support. Additionally, any cases where the parser has bugs or is overly strict would cause an unnecessary rejection. We just want to split the script, not validate it.

I think that #1627 fixes this particular issue, and seems to be solid at splitting very messy or malformed scripts correctly.

@rnorth
Copy link
Member

rnorth commented Jul 24, 2019

Fix was released in 1.12.0.

@rnorth rnorth added this to the 1.12.0 milestone Jul 24, 2019
@skyline75489
Copy link
Author

Glad to see it shipped 🥂 You guys are awesome!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants