Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow CSV.open with StringIO argument #302

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

MarcPer
Copy link

@MarcPer MarcPer commented May 4, 2024

Fix #300

lib/csv.rb Outdated Show resolved Hide resolved
@kou kou changed the title Allow CSV.foreach with StringIO argument Allow CSV.open with StringIO argument May 4, 2024
@MarcPer MarcPer requested a review from kou May 24, 2024 08:12
lib/csv.rb Outdated
Comment on lines 1604 to 1613
pos = filename_or_io.pos
f = StringIO.new(filename_or_io.read)
filename_or_io.seek(pos)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use StringIO#string here?

Suggested change
pos = filename_or_io.pos
f = StringIO.new(filename_or_io.read)
filename_or_io.seek(pos)
f = StringIO.new(filename_or_io.string)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd then run into an issue similar to #98, which I confirmed with another test:

 def test_foreach_stringio_with_bom
   s = StringIO.new
   s << "\ufeff"  # BOM
   s << @data
   s.rewind
   s.read(3)
   rows = CSV.foreach(s, col_sep: "\t", row_sep: "\r\n").to_a
   assert_equal(@rows, rows)
end

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the most relevant counter-argument is, CSV.new also uses read, so using read here is also a matter of consistency.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Could you also add a test for the BOM case?

How about using StringIO#set_encoding_by_bom for it?

        f = StringIO.new(filename_or_io.string)
        unless f.set_encoding_by_bom
          f.set_encoding(filename_or_io.external_encoding)
        end

CSV.new uses already opened IO. So it must not ignore the current position. So we must use read.
But CSV.open opens a new IO. So it must use all contents instead of contents after the current position.
Could you check the current behavior of CSV.open with sought real IO?

Copy link
Author

@MarcPer MarcPer May 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But CSV.open opens a new IO. So it must use all contents instead of contents after the current position.

You're right, CSV.open reads the whole file content. I changed open again to use string instead of read.

Could you also add a test for the BOM case?

Sure thing. I added it, and together with using the **file_opts in the new StringIO, it's handled correctly.

How about using StringIO#set_encoding_by_bom for it?

I couldn't see any effect of doing it. It seems like including the file_opts is enough. Or is there a case I'm missing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't see any effect of doing it.

Ah, b706d91 is a trick for it.

test/csv/interface/test_read.rb Outdated Show resolved Hide resolved
test/csv/interface/test_read.rb Outdated Show resolved Hide resolved
test/csv/interface/test_read.rb Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Read StringIO with foreach
2 participants