Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix spaces around CSV quoted strings #15727

Merged

Conversation

thabetx
Copy link
Contributor

@thabetx thabetx commented May 11, 2024

Description

This PR adds an option to CSV parsing to detect quotes even if they are surrounded by whitespaces.

Current behavior when options.keepquotes == false:

  • "A" -> A
  • "A" -> "A" (The spaces around the 'A' are not removed and the quotes are kept)

New behavior after enabling the new option:

  • "A" -> A
  • "A" -> A

The new option is false by default to avoid breaking any code that relied on the old behavior.

Closes #13892.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Copy link

copy-pr-bot bot commented May 11, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added libcudf Affects libcudf (C++/CUDA) code. cuDF (Python) Affects Python cuDF API. labels May 11, 2024
@thabetx thabetx changed the title Fix spaces around quoted strings Fix spaces around CSV quoted strings May 11, 2024
Copy link
Contributor

@shrshi shrshi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for working on this - it looks pretty great already! A few small comments -

cpp/include/cudf/io/csv.hpp Outdated Show resolved Hide resolved
cpp/src/io/csv/csv_gpu.cu Outdated Show resolved Hide resolved
cpp/tests/io/csv_test.cpp Outdated Show resolved Hide resolved
@shrshi shrshi added non-breaking Non-breaking change feature request New feature or request labels May 17, 2024
@shrshi
Copy link
Contributor

shrshi commented May 17, 2024

/ok to test

@shrshi
Copy link
Contributor

shrshi commented May 17, 2024

It looks like the style check is failing. I'd suggest running pre-commit - which can be installed with conda or pip - to format the code and pushing changes again. Please refer to code formatting section of the guide.

@thabetx
Copy link
Contributor Author

thabetx commented May 18, 2024

@shrshi You can take another look now. Thanks for the review.

@vyasr vyasr requested a review from shrshi May 20, 2024 22:40
@shrshi
Copy link
Contributor

shrshi commented May 20, 2024

/ok to test

@shrshi
Copy link
Contributor

shrshi commented May 21, 2024

/ok to test

@shrshi shrshi marked this pull request as ready for review May 21, 2024 22:19
@shrshi shrshi requested review from a team as code owners May 21, 2024 22:19
@shrshi shrshi requested review from wence- and bdice May 21, 2024 22:19
Copy link
Contributor

@shrshi shrshi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thank you for applying the suggestions!

Copy link
Contributor

@wence- wence- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two (non-blocking) queries, approving python changes.

cpp/src/io/utilities/parsing_utils.cuh Show resolved Hide resolved
@vyasr
Copy link
Contributor

vyasr commented May 22, 2024

/merge

@rapids-bot rapids-bot bot merged commit a5f6fa3 into rapidsai:branch-24.06 May 22, 2024
70 checks passed
@vyasr
Copy link
Contributor

vyasr commented May 22, 2024

Thanks for the contribution @thabetx!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cuDF (Python) Affects Python cuDF API. feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[FEA] CSV option to strip trailing white space after a quoted field.
5 participants