
StackOverFlow when trying to write to Server #2400

Open
rishureetesh opened this issue Apr 24, 2024 · 7 comments

@rishureetesh

Driver version

mssql-jdbc driver version 11.2.0.jre8

SQL Server version

Provide the output of executing SELECT @@VERSION on your target SQL Server.

Client Operating System

Windows

JAVA/JVM version

1.8

Problem description

When I send a CSV file with all the data to the MSSQL JDBC driver by calling writeToServer(), the driver internally splits each row on the passed delimiter. While splitting the row data, it runs into a StackOverflowError.

This is the particular line that splits the row data, from SQLServerBulkCSVFileRecord.getRowData():
String[] data = this.escapeDelimiters && this.currentLine.contains("\"") ? escapeQuotesRFC4180(this.currentLine.split(this.delimiter + "(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)", -1)) : this.currentLine.split(this.delimiter, -1);
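
A minimal standalone sketch (illustrative, not the driver's code) of how this split can overflow the stack: each doubled-quote pair in a field makes the lookahead's loop recurse one level deeper inside java.util.regex, so rows with many embedded quotes can exhaust a default-sized thread stack. The repeat count below is illustrative.

// Build a long quoted field containing many doubled quotes, as in the sample row.
StringBuilder field = new StringBuilder("\"");
for (int i = 0; i < 20000; i++) {
    field.append("!@#$%^&*()_-+=[]{}\"\"");
}
field.append("\"");
String line = "1," + field + "," + field;
// The same lookahead the driver uses to split on delimiters outside quotes:
String[] cols = line.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)", -1);
System.out.println(cols.length); // may throw java.lang.StackOverflowError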

JDBC trace logs

Provide the JDBC driver trace logs. Instructions can be found here: https://docs.microsoft.com/sql/connect/jdbc/tracing-driver-operation

LEVEL_1 trace - java.lang.StackOverflowError
    at java.util.regex.Pattern$Curly.match(Pattern.java:4240)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4672)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4799)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4731)
    at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3812)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4286)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
    at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3812)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4293)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4672)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4799)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4731)
    at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3812)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4286)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
    at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3812)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4293)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4672)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4799)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4731)
    at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3812)

The log is repetitive, so it has been cut down to a few lines.

Sample single row of data from the CSV

"!@#$%^&*()_-+=[]{}:;""'<>,.?/|!@#$%^&*()_-+=[]{}"`
The same data has been repeated to produce lengths of 1000, 4000, 8000, 16000, and 32000 for the various columns.

@barryw-mssql
Contributor

Hi Reetesh,

Thanks for bringing this to our attention.

It looks like the parsing logic enters an endless recursion while parsing the input CSV file. I am working on replicating the problem but could use your assistance. Could you please provide:

  1. Schema for the DB table where the data is inserted
  2. SQL BULK INSERT statement used to insert the CSV data
  3. Sample CSV file causing the problem.

I am requesting the SQL statement because it will specify the various terminators used in the CSV file data. These terminators may be a factor in the parsing/recursion problem. I am requesting a sample CSV file because I am concerned that I may not be able to replicate the problem using the small sample you have provided in the problem statement above. The table schema and the terminators will allow us to determine how the CSV data should be inserted into the table.

Thx,
Barry

@rishureetesh
Author

rishureetesh commented Apr 29, 2024

DB Table Schema

CREATE TABLE Special_Char_Issue (
pk int NOT NULL,
col_char_1 varchar(4) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
col_char_50 varchar(200) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
col_char_200 varchar(800) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
col_char_1000 varchar(4000) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
col_char_4000 varchar(MAX) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
col_char_8000 varchar(MAX) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
col_char_16000 varchar(MAX) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
col_varchar_1 varchar(4) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
col_varchar_50 varchar(200) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
col_varchar_200 varchar(800) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
col_varchar_1000 varchar(4000) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
col_varchar_4000 varchar(MAX) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
col_varchar_8000 varchar(MAX) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
col_varchar_16000 varchar(MAX) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
CONSTRAINT PK_Special_Char_Issue_pgs005_src_allchar_init_pkey PRIMARY KEY (pk)
);

SQLServerBulkCopy bulkCopy = new SQLServerBulkCopy(destinationConnection);
SQLServerBulkCSVFileRecord fileRecord = new SQLServerBulkCSVFileRecord("C:\\WorkBench\\SQLServerStackPOC\\src\\main\\resources\\outputFile.csv", false);

fileRecord.addColumnMetadata(1, null, Types.INTEGER, 10, 0);
fileRecord.addColumnMetadata(2, null, Types.VARCHAR, 4, 0);
fileRecord.addColumnMetadata(3, null, Types.VARCHAR, 200, 0);
fileRecord.addColumnMetadata(4, null, Types.VARCHAR, 800, 0);
fileRecord.addColumnMetadata(5, null, Types.VARCHAR, 4000, 0);
fileRecord.addColumnMetadata(6, null, Types.VARCHAR, -1, 0);
fileRecord.addColumnMetadata(7, null, Types.VARCHAR, -1, 0);
fileRecord.addColumnMetadata(8, null, Types.VARCHAR, -1, 0);
fileRecord.addColumnMetadata(9, null, Types.VARCHAR, 4, 0);
fileRecord.addColumnMetadata(10, null, Types.VARCHAR, 200, 0);
fileRecord.addColumnMetadata(11, null, Types.VARCHAR, 800, 0);
fileRecord.addColumnMetadata(12, null, Types.VARCHAR, 4000, 0);
fileRecord.addColumnMetadata(13, null, Types.VARCHAR, -1, 0);
fileRecord.addColumnMetadata(14, null, Types.VARCHAR, -1, 0);
fileRecord.addColumnMetadata(15, null, Types.VARCHAR, -1, 0);

fileRecord.setEscapeColumnDelimitersCSV(true);
SQLServerBulkCopyOptions copyOptions = new SQLServerBulkCopyOptions();
copyOptions.setKeepIdentity(true);
copyOptions.setBatchSize(100000);
copyOptions.setBulkCopyTimeout(0);
bulkCopy.setBulkCopyOptions(copyOptions);
bulkCopy.setDestinationTableName(tableName);

bulkCopy.addColumnMapping(1, 1);
bulkCopy.addColumnMapping(2, 2);
bulkCopy.addColumnMapping(3, 3);
bulkCopy.addColumnMapping(4, 4);
bulkCopy.addColumnMapping(5, 5);
bulkCopy.addColumnMapping(6, 6);
bulkCopy.addColumnMapping(7, 7);
bulkCopy.addColumnMapping(8, 8);
bulkCopy.addColumnMapping(9, 9);
bulkCopy.addColumnMapping(10, 10);
bulkCopy.addColumnMapping(11, 11);
bulkCopy.addColumnMapping(12, 12);
bulkCopy.addColumnMapping(13, 13);
bulkCopy.addColumnMapping(14, 14);
bulkCopy.addColumnMapping(15, 15);

bulkCopy.writeToServer(fileRecord);

@rishureetesh
Author

outputFile.csv
CSV file to be used for data insertion into the DB

@barryw-mssql
Contributor

Thank you Reetesh, the provided information is very helpful.

Based upon the provided code and CSV file, I think the issue is with the data in the CSV. I noticed that some rows have 15 columns (as defined by the comma delimiter), while others appear to have many more than 15 columns (e.g. rows 3, 6, 7, and 9). The rows with many more columns appear to have the comma delimiter embedded in the data.

I believe these comma delimiters embedded in the data are the cause of the recursion problem resulting in a stack overflow. The MS SQL Server documentation (https://learn.microsoft.com/en-us/sql/connect/jdbc/using-bulk-copy-with-the-jdbc-driver?view=sql-server-ver16&redirectedfrom=MSDN) states:

"The delimiter specified for the CSV file shouldn't appear anywhere in the data and should be escaped properly if it is a restricted character in Java regular expressions"

Looking at the CSV file, it is clear that the comma delimiters on lines 1, 2, 4, 5, and 8 are not escaped, and therefore the comma delimiters on lines 3, 6, 7, and 9 are also not escaped.

You have two options to solve this issue:

  1. Escape the comma delimiter for every line in your CSV file.
  2. Change the delimiter to a character that does not appear in the data. You can specify the non-standard delimiter in the overloaded SQLServerBulkCSVFileRecord constructor, whose signature and a usage sketch follow:

SQLServerBulkCSVFileRecord(java.lang.String fileToParse, java.lang.String encoding, java.lang.String delimiter, boolean firstLineIsColumnNames)
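
For example, a minimal sketch (the file name here is hypothetical, and the '|' delimiter is escaped because the driver feeds it into a regex-based split):

SQLServerBulkCSVFileRecord fileRecord = new SQLServerBulkCSVFileRecord(
        "C:\\data\\outputFile_pipe.csv", // hypothetical pipe-delimited copy of the data
        "UTF-8",                         // file encoding
        "\\|",                           // '|' escaped, since it is a restricted regex character
        false);                          // the first line is data, not column names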

Please let me know if this helps resolve your issue.

Thx,
Barry

@rishureetesh
Author

rishureetesh commented Apr 30, 2024

Thanks Barry!

I encountered something strange while using the same file: when I increase the Java stack size, it runs smoothly without any stack overflow error.

For rows 3, 6, 7, and 9, the data is enclosed in double quotes because it comes from another database in the real-time scenario. To be more precise, the data comes from a Postgres DB and is being written to MSSQL.
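
For reference, a minimal sketch of that stack-size workaround, reusing the bulkCopy and fileRecord objects from the snippet above (the stack size and thread name are illustrative):

// Option A: raise the stack size for every thread at JVM startup, e.g. java -Xss16m ...
// Option B: give only the bulk-copy work a larger stack via the
// four-argument Thread constructor (the size is a hint to the JVM):
Thread worker = new Thread(null, () -> {
    try {
        bulkCopy.writeToServer(fileRecord);
    } catch (SQLException e) {
        e.printStackTrace();
    }
}, "bulk-copy-worker", 32L * 1024 * 1024); // 32 MB stack, illustrative
worker.start();
worker.join(); // caller handles/declares InterruptedException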

@barryw-mssql
Contributor

Using your provided information, I have been able to replicate the problem locally. It appears that the source of the problem is that specific rows in your data set use double quotes in the data, which causes very deep recursion while parsing the row into columns. Adding space to the Java stack allowed that recursion to complete. While this is a boundary condition and not likely to be encountered by most, it is still an error, and we are looking at ways to eliminate the recursion in the regex.

Thx,
Barry

@rishureetesh
Author

Thanks Barry,

I appreciate your efforts!

I would love to hear more about it. Please let me know when a fix is made.

Best Regards,
Reetesh
