Option to Disable Auto-Escaping During Parsing #1039

Open
VDumitrak opened this issue Feb 9, 2024 · 3 comments

@VDumitrak

Summary:
Many users of PapaParse may encounter CSV files that contain pre-escaped characters. In the current implementation, PapaParse automatically adds escaping to these characters, which results in double-escaped characters in the output. This behavior can be problematic for CSVs that are expected to contain escape characters as part of the data.

Issue:
When parsing a CSV with pre-escaped quotes (either with a backslash or double quotes), PapaParse's parser automatically escapes these characters, leading to an unexpected doubling of escape characters in the output.

For example, an input CSV line like:
"Test \"Test string\" Test","Definitely \"real\" cash"
gets parsed to:
["Test \\\"Test string\\\" Test", "Definitely \\\"real\\\" cash"]
instead of the expected:
["Test \"Test string\" Test", "Definitely \"real\" cash"]

Similarly, a value enclosed in triple quotes to signify an internal quote like:
"""Test \"Test string\" Test"""
results in:
["\"Test \\\"Test string\\\" Test\""]
which should ideally remain:
["""Test \"Test string\" Test"""]
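Before changing the parser, it may be worth checking whether the extra backslashes are really in the parsed value or only in how it is printed. The sketch below assumes the parser returned the field with its single backslashes intact (a hypothetical value, not captured PapaParse output) and shows that serializing it with `JSON.stringify` produces the tripled form from the first example.

```javascript
// Hypothetical parsed field: each inner quote is preceded by ONE literal backslash.
const field = String.raw`Test \"Test string\" Test`;

// Count the literal backslashes held in memory.
const backslashes = (field.match(/\\/g) || []).length;

// JSON.stringify escapes every backslash and every quote for display.
const serialized = JSON.stringify([field]);

console.log(backslashes); // 2
console.log(serialized);  // ["Test \\\"Test string\\\" Test"]
```

If the backslash count of the in-memory string matches the input, the doubling comes from the serialization step rather than from parsing.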

Feature Request:
It would be beneficial to have an option to disable auto-escaping entirely when parsing CSV files. This would allow users to work with CSV data that already includes the necessary escaping and expects it to be preserved as-is.
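A minimal sketch of what such a "preserve as-is" mode could look like. `splitLineRaw` is a hypothetical helper, not a PapaParse API, and it assumes that a backslash protects the character after it. Quotes are tracked only to decide whether a delimiter ends a field; every character is copied through unchanged.

```javascript
// Hypothetical helper, not part of PapaParse: splits one CSV line on
// delimiters that sit outside quotes, keeping each field verbatim.
function splitLineRaw(line, quote = '"', delimiter = ',') {
  const fields = [];
  let current = '';
  let inQuotes = false;

  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (ch === '\\' && i + 1 < line.length) {
      // Assumed convention: a backslash protects the next character,
      // so copy both and do not let that character toggle the quote state.
      current += ch + line[i + 1];
      i++;
    } else if (ch === quote) {
      inQuotes = !inQuotes; // track state, but keep the quote in the output
      current += ch;
    } else if (ch === delimiter && !inQuotes) {
      fields.push(current);
      current = '';
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

const raw = splitLineRaw(String.raw`"""Test \"Test string\" Test""",x`);
console.log(raw[0]); // """Test \"Test string\" Test"""
console.log(raw[1]); // x
```

Because nothing is unescaped, the caller receives the surrounding quotes as well and has to strip them if they are not wanted.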

@pokoli
Collaborator

pokoli commented Feb 9, 2024

I'm wondering whether, instead of adding an option, it would be better to detect if the quotes are already escaped and always parse the right string. What do you think?

@VDumitrak
Author

I'm wondering whether, instead of adding an option, it would be better to detect if the quotes are already escaped and always parse the right string. What do you think?

That sounds like a great approach! If PapaParse could intelligently detect pre-escaped quotes and parse them correctly without additional configuration, it would seamlessly handle various CSV formats and make the parsing process much more intuitive.

@pokoli
Collaborator

pokoli commented Feb 9, 2024

Maybe we just need to check whether the next character is the same character, which would mean that it is already escaped.
But this is just a quick thought, and maybe it's more complex.
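
A rough sketch of that lookahead idea, to make it concrete. `splitLine` is a hypothetical standalone helper and not PapaParse's actual parser code; it assumes a single-line input and treats a quote immediately followed by another quote, inside a quoted field, as one literal quote.

```javascript
// Hypothetical helper, not PapaParse internals: splits one CSV line into fields.
function splitLine(line, quote = '"', delimiter = ',') {
  const fields = [];
  let current = '';
  let inQuotes = false;

  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === quote && line[i + 1] === quote) {
        // Lookahead: the next character is the same quote, so this pair
        // is an escaped quote. Keep one copy and skip the second.
        current += quote;
        i++;
      } else if (ch === quote) {
        inQuotes = false; // closing quote
      } else {
        current += ch;
      }
    } else if (ch === quote) {
      inQuotes = true; // opening quote
    } else if (ch === delimiter) {
      fields.push(current);
      current = '';
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

console.log(splitLine('"Test ""Test string"" Test","Definitely ""real"" cash"'));
// [ 'Test "Test string" Test', 'Definitely "real" cash' ]
```

This covers only the doubled-quote case. Backslash-escaped quotes would need the check to compare against a separate escape character rather than the quote itself.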
