
StackOverFlow when trying to write to Server #2400

Open
rishureetesh opened this issue Apr 24, 2024 · 7 comments

@rishureetesh

Driver version

mssql-jdbc driver version 11.2.0.jre8

SQL Server version

Provide the output of executing SELECT @@VERSION on your target SQL Server.

Client Operating System

Windows

JAVA/JVM version

1.8

Problem description

When I send a CSV file with all the data to the MSSQL JDBC driver by calling writeToServer(), the driver internally splits each row on the passed delimiter. While splitting the row data, it runs into a StackOverflowError.

This is the particular line that splits the row data, from SQLServerBulkCSVFileRecord.getRowData():
String[] data = this.escapeDelimiters && this.currentLine.contains("\"") ? escapeQuotesRFC4180(this.currentLine.split(this.delimiter + "(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)", -1)) : this.currentLine.split(this.delimiter, -1);
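
A minimal standalone sketch (illustrative, not the driver's code) of how this split can overflow the stack: each doubled-quote pair in a field makes the lookahead's loop recurse one level deeper inside java.util.regex, so rows with many embedded quotes can exhaust a default-sized thread stack. The repeat count below is illustrative.

// Build a long quoted field containing many doubled quotes, as in the sample row.
StringBuilder field = new StringBuilder("\"");
for (int i = 0; i < 20000; i++) {
    field.append("!@#$%^&*()_-+=[]{}\"\"");
}
field.append("\"");
String line = "1," + field + "," + field;
// The same lookahead the driver uses to split on delimiters outside quotes:
String[] cols = line.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)", -1);
System.out.println(cols.length); // may throw java.lang.StackOverflowError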

JDBC trace logs

Provide the JDBC driver trace logs. Instructions can be found here: https://docs.microsoft.com/sql/connect/jdbc/tracing-driver-operation

LEVEL_1 trace - java.lang.StackOverflowError
    at java.util.regex.Pattern$Curly.match(Pattern.java:4240)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4672)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4799)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4731)
    at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3812)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4286)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
    at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3812)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4293)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4672)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4799)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4731)
    at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3812)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4286)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
    at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3812)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4293)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4672)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4799)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4731)
    at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3812)

The log is repetitive, so it has been cut down to a few lines.

Sample single row of data from the CSV

"!@#$%^&*()_-+=[]{}:;""'<>,.?/|!@#$%^&*()_-+=[]{}"`
The same data has been repeated to produce lengths of 1000, 4000, 8000, 16000, and 32000 for the various columns.

@barryw-mssql
Contributor

Hi Reetesh,

Thanks for bringing this to our attention.

It looks like the parsing logic enters an endless recursion while parsing the input CSV file. I am working on replicating the problem but could use your assistance. Could you please provide:

  1. Schema for the DB table where the data is inserted
  2. SQL BULK INSERT statement used to insert the CSV data
  3. Sample CSV file causing the problem.

I am requesting the SQL statement because it will specify the various terminators used in the CSV file data. These terminators may be a factor in the parsing/recursion problem. I am requesting a sample CSV file because I am concerned that I may not be able to replicate the problem using the small sample you have provided in the problem statement above. The table schema and the terminators will allow us to determine how the CSV data should be inserted into the table.

Thx,
Barry

@rishureetesh
Author

rishureetesh commented Apr 29, 2024

DB Table Schema

CREATE TABLE Special_Char_Issue (
pk int NOT NULL,
col_char_1 varchar(4) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
col_char_50 varchar(200) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
col_char_200 varchar(800) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
col_char_1000 varchar(4000) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
col_char_4000 varchar(MAX) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
col_char_8000 varchar(MAX) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
col_char_16000 varchar(MAX) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
col_varchar_1 varchar(4) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
col_varchar_50 varchar(200) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
col_varchar_200 varchar(800) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
col_varchar_1000 varchar(4000) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
col_varchar_4000 varchar(MAX) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
col_varchar_8000 varchar(MAX) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
col_varchar_16000 varchar(MAX) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
CONSTRAINT PK_Special_Char_Issue_pgs005_src_allchar_init_pkey PRIMARY KEY (pk)
);

SQLServerBulkCopy bulkCopy = new SQLServerBulkCopy(destinationConnection);
SQLServerBulkCSVFileRecord fileRecord = new SQLServerBulkCSVFileRecord("C:\\WorkBench\\SQLServerStackPOC\\src\\main\\resources\\outputFile.csv", false);

fileRecord.addColumnMetadata(1, null, Types.INTEGER, 10, 0);
fileRecord.addColumnMetadata(2, null, Types.VARCHAR, 4, 0);
fileRecord.addColumnMetadata(3, null, Types.VARCHAR, 200, 0);
fileRecord.addColumnMetadata(4, null, Types.VARCHAR, 800, 0);
fileRecord.addColumnMetadata(5, null, Types.VARCHAR, 4000, 0);
fileRecord.addColumnMetadata(6, null, Types.VARCHAR, -1, 0);
fileRecord.addColumnMetadata(7, null, Types.VARCHAR, -1, 0);
fileRecord.addColumnMetadata(8, null, Types.VARCHAR, -1, 0);
fileRecord.addColumnMetadata(9, null, Types.VARCHAR, 4, 0);
fileRecord.addColumnMetadata(10, null, Types.VARCHAR, 200, 0);
fileRecord.addColumnMetadata(11, null, Types.VARCHAR, 800, 0);
fileRecord.addColumnMetadata(12, null, Types.VARCHAR, 4000, 0);
fileRecord.addColumnMetadata(13, null, Types.VARCHAR, -1, 0);
fileRecord.addColumnMetadata(14, null, Types.VARCHAR, -1, 0);
fileRecord.addColumnMetadata(15, null, Types.VARCHAR, -1, 0);

fileRecord.setEscapeColumnDelimitersCSV(true);
SQLServerBulkCopyOptions copyOptions = new SQLServerBulkCopyOptions();
copyOptions.setKeepIdentity(true);
copyOptions.setBatchSize(100000);
copyOptions.setBulkCopyTimeout(0);
bulkCopy.setBulkCopyOptions(copyOptions);
bulkCopy.setDestinationTableName(tableName);

bulkCopy.addColumnMapping(1, 1);
bulkCopy.addColumnMapping(2, 2);
bulkCopy.addColumnMapping(3, 3);
bulkCopy.addColumnMapping(4, 4);
bulkCopy.addColumnMapping(5, 5);
bulkCopy.addColumnMapping(6, 6);
bulkCopy.addColumnMapping(7, 7);
bulkCopy.addColumnMapping(8, 8);
bulkCopy.addColumnMapping(9, 9);
bulkCopy.addColumnMapping(10, 10);
bulkCopy.addColumnMapping(11, 11);
bulkCopy.addColumnMapping(12, 12);
bulkCopy.addColumnMapping(13, 13);
bulkCopy.addColumnMapping(14, 14);
bulkCopy.addColumnMapping(15, 15);

bulkCopy.writeToServer(fileRecord);

@rishureetesh
Author

outputFile.csv
CSV file to be used for data insertion into the DB

@barryw-mssql
Contributor

Thank you Reetesh, the provided information is very helpful.

Based upon the provided code and CSV file, I think the issue is with the data in the CSV. I noticed that some rows have 15 columns (as defined by the comma delimiter), while others appear to have many more than 15 columns (e.g. rows 3, 6, 7, and 9). The rows with many more columns appear to have the comma delimiter embedded in the data.

I believe these comma delimiters embedded in the data are the cause of the recursion problem resulting in a stack overflow. The MS SQL Server documentation (https://learn.microsoft.com/en-us/sql/connect/jdbc/using-bulk-copy-with-the-jdbc-driver?view=sql-server-ver16&redirectedfrom=MSDN) states:

"The delimiter specified for the CSV file shouldn't appear anywhere in the data and should be escaped properly if it is a restricted character in Java regular expressions"

Looking at the CSV file, it is clear that the comma delimiters on lines 1, 2, 4, 5, and 8 are not escaped, and therefore the comma delimiters on lines 3, 6, 7, and 9 are also not escaped.

You have two options to solve this issue:

  1. Escape the comma delimiter for every line in your CSV file.
  2. Change the delimiter to a character that does not appear in the data. You can specify the non-standard delimiter in the overloaded SQLServerBulkCSVFileRecord constructor, whose signature and a usage sketch follow:

SQLServerBulkCSVFileRecord(java.lang.String fileToParse, java.lang.String encoding, java.lang.String delimiter, boolean firstLineIsColumnNames)
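
For example, a minimal sketch (the file name here is hypothetical, and the '|' delimiter is escaped because the driver feeds it into a regex-based split):

SQLServerBulkCSVFileRecord fileRecord = new SQLServerBulkCSVFileRecord(
        "C:\\data\\outputFile_pipe.csv", // hypothetical pipe-delimited copy of the data
        "UTF-8",                         // file encoding
        "\\|",                           // '|' escaped, since it is a restricted regex character
        false);                          // the first line is data, not column names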

Please let me know if this helps resolve your issue.

Thx,
Barry

@rishureetesh
Author

rishureetesh commented Apr 30, 2024

Thanks Barry!

I encountered something strange while using the same file: when I increase the Java stack size, it runs smoothly without any stack overflow error.

For rows 3, 6, 7, and 9, the data is enclosed in double quotes because it comes from another database in the real-time scenario. To be more precise, the data comes from a Postgres DB and is being written to MSSQL.
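
For reference, a minimal sketch of that stack-size workaround, reusing the bulkCopy and fileRecord objects from the snippet above (the stack size and thread name are illustrative):

// Option A: raise the stack size for every thread at JVM startup, e.g. java -Xss16m ...
// Option B: give only the bulk-copy work a larger stack via the
// four-argument Thread constructor (the size is a hint to the JVM):
Thread worker = new Thread(null, () -> {
    try {
        bulkCopy.writeToServer(fileRecord);
    } catch (SQLException e) {
        e.printStackTrace();
    }
}, "bulk-copy-worker", 32L * 1024 * 1024); // 32 MB stack, illustrative
worker.start();
worker.join(); // caller handles/declares InterruptedException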

@barryw-mssql
Contributor

Using your provided information, I have been able to replicate the problem locally. It appears that the source of the problem is that specific rows in your data set use double quotes in the data, which causes very deep recursion while parsing the row into columns. Adding space to the Java stack allowed that recursion to complete. While this is a boundary condition and not likely to be encountered by most, it is still an error, and we are looking at ways to eliminate the recursion in the regex.

Thx,
Barry

@rishureetesh
Author

Thanks Barry,

I appreciate your efforts!

I would love to hear more about it. Please let me know when a fix is made.

Best Regards,
Reetesh
