StackOverflowError when trying to write to Server #2400
Comments
Hi Reetesh, thanks for bringing this to our attention. It looks like the parsing logic is entering an endless recursion while parsing the input CSV file. I am working on replicating the problem but could use your assistance. Could you please provide:
I am requesting the SQL statement because it will specify the various terminators used in the CSV file data. These terminators may be a factor in the parsing/recursion problem. I am requesting a sample CSV file because I am concerned that I may not be able to replicate the problem using the small sample you provided in the problem statement above. The table schema and the terminators will allow us to determine how the CSV data should be inserted into the table. Thx,
#DB Table Schema

```sql
CREATE TABLE Special_Char_Issue (
```

```java
SQLServerBulkCopy bulkCopy = new SQLServerBulkCopy(destinationConnection);
fileRecord.addColumnMetadata(1, null, Types.INTEGER, 10, 0);
fileRecord.setEscapeColumnDelimitersCSV(true);
bulkCopy.addColumnMapping(1, 1);
```
outputFile.csv
Thank you Reetesh, the provided information is very helpful. Based upon the provided code and CSV file, I think the issue is with the data in the CSV. I noticed that some rows have 15 columns (i.e. as defined by the comma delimiter), while others appear to have many more than 15 columns (e.g. rows 3, 6, 7, and 9). The rows with many more columns appear to have the comma delimiter embedded in the data. I believe these comma delimiters embedded in the data are the cause of the recursion problem resulting in a stack overflow.

The MS SQL Server documentation (https://learn.microsoft.com/en-us/sql/connect/jdbc/using-bulk-copy-with-the-jdbc-driver?view=sql-server-ver16&redirectedfrom=MSDN) states: "The delimiter specified for the CSV file shouldn't appear anywhere in the data and should be escaped properly if it is a restricted character in Java regular expressions"

Looking at the CSV file, it is clear that the comma delimiters on lines 1, 2, 4, 5, and 8 are not escaped; therefore the comma delimiters on lines 3, 6, 7, and 9 are also not escaped. You have 2 options to solve this issue:
`SQLServerBulkCSVFileRecord(java.lang.String fileToParse, java.lang.String encoding, java.lang.String delimiter, boolean firstLineIsColumnNames)`

Please let me know if this helps resolve your issue. Thx,
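Barry's point about embedded delimiters can be seen with plain `String.split` from the JDK: a naive split on the comma miscounts columns when a quoted field contains the delimiter, while an RFC 4180-style lookahead regex (the same shape of pattern the driver uses in `getRowData`) leaves the quoted comma intact. A minimal, stdlib-only sketch — the class name and sample data are illustrative, not from the issue:

```java
public class SplitDemo {
    // A comma is treated as a delimiter only if it is followed by an even
    // number of double quotes, i.e. it sits outside any quoted field.
    static final String QUOTE_AWARE = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    public static void main(String[] args) {
        String line = "1,\"Doe, John\",NY";
        String[] naive = line.split(",", -1);          // the embedded comma splits the name
        String[] aware = line.split(QUOTE_AWARE, -1);  // the quoted comma is preserved
        System.out.println(naive.length + " " + aware.length); // prints "4 3"
    }
}
```

The quote-aware lookahead is what makes recursion depth sensitive to the number of quotes in the row: for every candidate comma, the regex engine must scan the rest of the line pairing up quotes.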
Thanks Barry! I encountered something strange while using the same file: when I increase the Java stack size, it runs smoothly without giving any stack overflow error. For rows 3, 6, 7, and 9, the data is enclosed in double quotes, as it is coming from another database in a real-time scenario. To be more precise, the data is coming from a Postgres DB and being written to MSSQL.
Using your provided information, I have been able to replicate the problem locally. It appears that the source of the problem is that specific rows in your data set contain double quotes in the data, which causes a very deep recursive call while parsing the row into columns. Your adding space to the Java stack allowed that recursion to complete. While this is a boundary condition and not likely to be encountered by most users, it is still an error, and we are looking at ways to eliminate the recursion in the regex. Thx,
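Barry confirms that extra stack space lets the deep recursion complete. Besides the JVM-wide `-Xss` flag Reetesh used, Java also lets you request a larger stack for a single thread via the `Thread(ThreadGroup, Runnable, String, long stackSize)` constructor. A hypothetical, stdlib-only sketch of the idea (the recursion stands in for the regex engine's deep call chain; `stackSize` is a hint the JVM may adjust, per the Javadoc):

```java
public class StackSizeDemo {
    // Deliberately deep recursion standing in for Pattern's recursive matching.
    static int depth(int n) {
        return n == 0 ? 0 : 1 + depth(n - 1);
    }

    public static void main(String[] args) throws InterruptedException {
        final int[] result = new int[1];
        // Request a 64 MB stack so half a million frames fit comfortably.
        Thread big = new Thread(null, () -> result[0] = depth(500_000),
                "big-stack", 64L * 1024 * 1024);
        big.start();
        big.join();
        System.out.println(result[0]); // expected 500000
    }
}
```

The same recursion run on a default-sized thread stack may well end in `StackOverflowError`, which mirrors the workaround described above.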
Thanks Barry, I appreciate your efforts! I would love to hear more about it. Please do let me know if any fix is made. Best Regards,
Driver version
mssql-jdbc driver version 11.2.0.jre8
SQL Server version
Provide the output of executing
SELECT @@VERSION
on your target SQL Server.
Client Operating System
Windows
JAVA/JVM version
1.8
Problem description
When I try to send the CSV file with all the data to the MSSQL JDBC driver by calling writeToServer(), the driver internally splits each row based on the passed delimiter. While splitting the row data, it runs into a StackOverflowError.
This is the particular line that tries to split the data, in SQLServerBulkCSVFileRecord.getRowData():
```java
String[] data = this.escapeDelimiters && this.currentLine.contains("\"")
        ? escapeQuotesRFC4180(this.currentLine.split(this.delimiter + "(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)", -1))
        : this.currentLine.split(this.delimiter, -1);
```
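`escapeQuotesRFC4180` is internal to the driver, so the following is only a hypothetical sketch of what RFC 4180 unescaping does to each quoted column after the split: drop the enclosing quotes and collapse each doubled quote (`""`) in the body to a single quote. The class and method names here are illustrative, not the driver's actual code:

```java
public class Rfc4180 {
    // Hypothetical stand-in for the driver's internal escapeQuotesRFC4180 step,
    // applied to a single already-split column value.
    static String unquote(String field) {
        if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
            // Strip the enclosing quotes, then turn each "" into ".
            return field.substring(1, field.length() - 1).replace("\"\"", "\"");
        }
        return field; // unquoted fields pass through unchanged
    }

    public static void main(String[] args) {
        System.out.println(unquote("\"Doe, \"\"JD\"\" John\"")); // Doe, "JD" John
    }
}
```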
JDBC trace logs
Provide the JDBC driver trace logs. Instructions can be found here: https://docs.microsoft.com/sql/connect/jdbc/tracing-driver-operation
LEVEL_1 trace:

```
java.lang.StackOverflowError
	at java.util.regex.Pattern$Curly.match(Pattern.java:4240)
	at java.util.regex.Pattern$GroupHead.match(Pattern.java:4672)
	at java.util.regex.Pattern$Loop.match(Pattern.java:4799)
	at java.util.regex.Pattern$GroupTail.match(Pattern.java:4731)
	at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3812)
	at java.util.regex.Pattern$Curly.match0(Pattern.java:4286)
	at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
	at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3812)
	at java.util.regex.Pattern$Curly.match0(Pattern.java:4293)
	at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
	at java.util.regex.Pattern$GroupHead.match(Pattern.java:4672)
	at java.util.regex.Pattern$Loop.match(Pattern.java:4799)
	at java.util.regex.Pattern$GroupTail.match(Pattern.java:4731)
	at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3812)
	at java.util.regex.Pattern$Curly.match0(Pattern.java:4286)
	at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
	at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3812)
	at java.util.regex.Pattern$Curly.match0(Pattern.java:4293)
	at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
	at java.util.regex.Pattern$GroupHead.match(Pattern.java:4672)
	at java.util.regex.Pattern$Loop.match(Pattern.java:4799)
	at java.util.regex.Pattern$GroupTail.match(Pattern.java:4731)
	at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3812)
```

The log is repetitive, so it has been cut down to a few lines.
Sample single row data from the CSV
"
!@#$%^&*()_-+=[]{}:;""'<>,.?/|!@#$%^&*()_-+=[]{}"`Same data has been copied multiple times to make its length of 1000, 4000, 8000, 16000 and 32000 for multiple columns