SeqIO parse side effect remove #4280

Open
wants to merge 19 commits into
base: master

BabaYaga1221
Contributor

@BabaYaga1221 BabaYaga1221 commented Apr 10, 2023

  • I hereby agree to dual licence this and any previous contributions under both
    the Biopython License Agreement AND the BSD 3-Clause License.

  • I have read the CONTRIBUTING.rst file, have run pre-commit
    locally, and understand that continuous integration checks will be used to
    confirm the Biopython unit tests and style checks pass with these changes.

  • I have added my name to the alphabetical contributors listings in the files
    NEWS.rst and CONTRIB.rst as part of this pull request, am listed
    already, or do not wish to be listed. (This acknowledgement is optional.)

Closes #...
Add __enter__ and __exit__ in Bio/SeqIO/Interfaces.py.
@mdehoon @peterjc
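
For readers following along, here is a minimal sketch of the general idea, not the code in this pull request: an iterator whose __exit__ closes the stream only if the iterator opened it. The class name RecordIterator, the owns_stream attribute, and the one-record-per-line parsing are all hypothetical; the real Bio/SeqIO/Interfaces.py differs.

```python
import io
import os
import tempfile


class RecordIterator:
    """Hypothetical line-based iterator; not Biopython's SequenceIterator."""

    def __init__(self, source):
        if isinstance(source, (str, os.PathLike)):
            # We opened the file, so we are responsible for closing it.
            self.stream = open(source)
            self.owns_stream = True
        else:
            # The caller passed an open handle; leave closing to the caller.
            self.stream = source
            self.owns_stream = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if self.owns_stream:
            self.stream.close()
        return False  # do not suppress exceptions

    def __iter__(self):
        return self

    def __next__(self):
        line = self.stream.readline()
        if not line:
            raise StopIteration
        return line.rstrip("\n")


# Caller-owned handle: still usable after the with block.
handle = io.StringIO("ACGT\nTTGA\n")
with RecordIterator(handle) as records:
    from_handle = list(records)
handle.seek(0)  # would raise ValueError if the iterator had closed it

# Path: the iterator opened the file, so it closes it on exit.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")
    with open(path, "w") as out:
        out.write("ACGT\nTTGA\n")
    with RecordIterator(path) as records:
        from_path = list(records)
        stream = records.stream
    path_stream_closed = stream.closed
```

The design choice illustrated is ownership: the side that opens a file is the side that closes it.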

@mdehoon
Contributor

mdehoon commented Apr 13, 2023

@BabaYaga1221 Thank you. Some of the tests are failing, can you have a look?

@BabaYaga1221
Contributor Author

Okay sir, I will try to handle them, but can you provide assistance about where and why the tests fail? @mdehoon

@peterjc
Member

peterjc commented Apr 13, 2023

Test failures on Linux from the CI:

======================================================================
ERROR: test_write_tsa_data_division (test_GenBank.OutputTests)
Make sure we don't kill the TSA data_file_division for TSA files.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/biopython/biopython/Tests/test_GenBank.py", line 8328, in test_write_tsa_data_division
    infile.seek(0)
ValueError: I/O operation on closed file.

======================================================================
ERROR: test_conversion (test_SeqIO_Insdc.ConvertTestsInsdc)
Test format conversion by SeqIO.write/SeqIO.parse and SeqIO.convert.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/biopython/biopython/Tests/test_SeqIO_Insdc.py", line 117, in test_conversion
    self.check_conversion(filename, in_format, out_format)
  File "/home/runner/work/biopython/biopython/Tests/test_SeqIO.py", line 134, in check_conversion
    self.assertEqual(handle.getvalue(), handle2.getvalue(), msg=msg)
ValueError: I/O operation on closed file

======================================================================
ERROR: test_overlapping_clip (test_SeqIO_QualityIO.TestSFF)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/biopython/biopython/Tests/test_SeqIO_QualityIO.py", line 971, in test_overlapping_clip
    h.seek(0)
ValueError: I/O operation on closed file.

======================================================================
ERROR: test_conversion (test_SeqIO_QualityIO.TestsConverter)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/biopython/biopython/Tests/test_SeqIO_QualityIO.py", line 1075, in test_conversion
    self.check_conversion(filename, in_format, out_format)
  File "/home/runner/work/biopython/biopython/Tests/test_SeqIO_QualityIO.py", line 1039, in check_conversion
    self.assertEqual(handle.getvalue(), handle2.getvalue(), msg=msg)
ValueError: I/O operation on closed file

======================================================================
ERROR: test_no_index (test_SffIO.TestErrors)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/biopython/biopython/Tests/test_SffIO.py", line 181, in test_no_index
    handle.seek(0)
ValueError: I/O operation on closed file.

----------------------------------------------------------------------

It appears to be closing the file earlier than those tests expect. You should be able to run those test files alone to reproduce this, e.g. test_SffIO.py
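
The failure mode in these tracebacks can be reproduced without Biopython. The following is a minimal sketch in which a hypothetical parse_and_close function stands in for a parser that closes a handle it did not open; it is not the code under review.

```python
import io


def parse_and_close(handle):
    """Hypothetical parser that wrongly closes a caller-supplied handle."""
    try:
        for line in handle:
            yield line.strip()
    finally:
        handle.close()  # the bug: the caller still expects an open handle


handle = io.StringIO("record1\nrecord2\n")
records = list(parse_and_close(handle))

try:
    handle.seek(0)  # what the tests do after parsing
    error_message = None
except ValueError as err:
    error_message = str(err)
```

Here error_message holds the same "I/O operation on closed file" text seen in the CI output.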

@BabaYaga1221
Contributor Author

Could you please review the error? It seems that the root cause of the issue is related to the dependencies. My assumption is that the problem lies within the pip library, but I would appreciate your confirmation on this. @peterjc @mdehoon

@mdehoon
Contributor

mdehoon commented Apr 16, 2023

@peterjc

I realized that our design of SequenceIterator is a bit strange. The parse function in SequenceIterator returns an iterator, but SequenceIterator itself is also an iterator. Keep in mind that yield, behind the scenes, creates a Python class. Then we end up with a double iterator.

For comparison, AlignmentIterator in Bio.Align uses only a single iterator.
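
To illustrate the point about the double iterator, here is a simplified sketch. The classes DoubleIterator and SingleIterator are hypothetical and only mimic the structure described in this comment; they are not the Biopython classes.

```python
import inspect
import io


class DoubleIterator:
    """Iterator that delegates to a generator created by parse()."""

    def __init__(self, stream):
        self.stream = stream
        self.records = self.parse(stream)  # second iterator (a generator)

    def parse(self, stream):
        for line in stream:
            yield line.strip()

    def __iter__(self):
        return self

    def __next__(self):
        return next(self.records)  # every call passes through two layers


class SingleIterator:
    """Iterator that produces records directly in __next__."""

    def __init__(self, stream):
        self.stream = stream

    def __iter__(self):
        return self

    def __next__(self):
        line = self.stream.readline()
        if not line:
            raise StopIteration
        return line.strip()


double = list(DoubleIterator(io.StringIO("a\nb\n")))
single = list(SingleIterator(io.StringIO("a\nb\n")))
inner_is_generator = inspect.isgenerator(DoubleIterator(io.StringIO("")).records)
```

Both classes yield the same records; the difference is that the first keeps a separate generator object alive alongside the outer iterator.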

@peterjc
Member

peterjc commented Apr 16, 2023

Not sure what's going on with the CI dependencies, but yes this is unrelated:

Run python -m pip install --upgrade --upgrade-strategy eager -r ci-dependencies.txt
Requirement already satisfied: setuptools in /opt/hostedtoolcache/Python/3.10.11/x64/lib/python3.10/site-packages (from -r ci-dependencies.txt (line 5)) (67.6.1)
Requirement already satisfied: wheel in /opt/hostedtoolcache/Python/3.10.11/x64/lib/python3.10/site-packages (from -r ci-dependencies.txt (line 6)) (0.40.0)
Collecting black==19.10b0 (from -r ci-dependencies.txt (line 10))
  Downloading black-19.10b0-py36-none-any.whl (97 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 97.5/97.5 kB 3.3 MB/s eta 0:00:00
Collecting coverage (from -r ci-dependencies.txt (line 13))
  Downloading coverage-7.2.3-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (227 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 228.0/228.0 kB 21.5 MB/s eta 0:00:00
ERROR: Could not find a version that satisfies the requirement codecov (from versions: none)
ERROR: No matching distribution found for codecov

@peterjc
Member

peterjc commented Apr 16, 2023

Change 2f188af might help, that file was requesting quite an old version of black. Separately we should avoid that inconsistency in future if possible...

@peterjc
Member

peterjc commented Apr 16, 2023

See #4281 for the codecov problem (it was deprecated and has been removed from PyPI).

@peterjc
Member

peterjc commented May 5, 2023

Please revert your change for #820, this is not a self-test but deliberate functionality when running the file directly. See #820 (comment)

@BabaYaga1221
Contributor Author

I apologize for my confusion earlier. I understand now that I accidentally committed code to the same branch that I was using to solve a separate issue related to SeqIO. I am reverting the changes.

@BabaYaga1221
Contributor Author

@peterjc Sir, can you help me understand what I need to change to resolve this issue?

@BabaYaga1221
Contributor Author

@peterjc As suggested, I implemented the changes and the total failure count on AppVeyor decreased from 57 to 16, but some tests still fail.

@peterjc
Member

peterjc commented May 9, 2023

Lots of errors of the form "Bio.StreamModeError: None files must be opened in text mode." Looking at the code, you have partly changed the local variables mode and fmt to self.mode and self.fmt (why?) but not finished this. I suggest reverting back to just the local variables instead.
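
One plausible reading of that message, offered as an assumption and not confirmed in this thread: if the format name reaches the error message as None, the text begins with "None files". A minimal sketch with a hypothetical check_text_mode helper and a stand-in exception class (the real Bio.StreamModeError is not imported here):

```python
import io


class StreamModeError(ValueError):
    """Stand-in for Bio.StreamModeError, defined here to stay self-contained."""


def check_text_mode(stream, fmt=None):
    """Hypothetical check: raise if the stream was not opened in text mode."""
    if stream.read(0) != "":  # binary streams return b"" here
        raise StreamModeError(f"{fmt} files must be opened in text mode.")


binary_stream = io.BytesIO(b"\x00\x01")

try:
    check_text_mode(binary_stream)  # fmt was never passed, so it is None
except StreamModeError as err:
    message_without_fmt = str(err)

try:
    check_text_mode(binary_stream, fmt="FASTA")
except StreamModeError as err:
    message_with_fmt = str(err)
```

A half-finished rename between local names and instance attributes is one way a value such as fmt could end up unset at the point where the message is built.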

@peterjc
Member

peterjc commented May 9, 2023

This is looking much more promising... CircleCI has failed while trying to install pycairo, so unrelated to your changes.

@BabaYaga1221
Contributor Author

@peterjc I appreciate your help, Sir, but I've been struggling with this code for 15 days. Could you please review it and let me know if any changes are suggested?

@peterjc
Member

peterjc commented May 9, 2023

@BabaYaga1221 the only stylistic change I have now is to remove rather than just comment out the old lines # self.should_close_stream = ...

@mdehoon over to you to review the overall changes (the history is messy, I'd squash-and-merge this if the changes are OK), since you are much more familiar with writing context managers than I am.

@peterjc peterjc requested a review from mdehoon May 9, 2023 12:09
@BabaYaga1221
Contributor Author

@mdehoon @peterjc it has been about 2 days since the review was requested, Sir. Could you please review the code, so my work can be merged into the codebase?

@peterjc
Member

peterjc commented May 10, 2023

@BabaYaga1221 Please wait a week before asking again.

            raise

    def __next__(self):
        """Return the next entry."""
        try:
            return next(self.records)
        except Exception:
            if self.should_close_stream:
            if self.stream is not self.source:
Contributor

We do not need to close the stream here.

            raise

    def __next__(self):
        """Return the next entry."""
        try:
            return next(self.records)
        except Exception:
            if self.should_close_stream:
                self.stream.close()
            raise
Contributor

What is the purpose of the try: except: block?

Contributor Author

I think that if the record is empty, it might raise an error, and that is why I added the try: except: block.

Member

Yes, end of file and no more records does seem likely to trigger this try/except.
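
The point about end of file can be checked directly: StopIteration is a subclass of Exception, so an except Exception clause around next() also runs when the records are simply exhausted. A minimal sketch, with a hypothetical ClosingIterator class that is not the code under review:

```python
import io


class ClosingIterator:
    """Hypothetical iterator that closes its stream on any exception."""

    def __init__(self, stream):
        self.stream = stream
        self.records = (line.strip() for line in stream)

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return next(self.records)
        except Exception:
            # Also reached on StopIteration, i.e. at normal end of input.
            self.stream.close()
            raise


stream = io.StringIO("a\nb\n")
items = list(ClosingIterator(stream))
closed_after_exhaustion = stream.closed
stop_is_exception = issubclass(StopIteration, Exception)
```

In this sketch the stream ends up closed after ordinary iteration, even though no parsing error occurred.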

        try:
            self.records = self.parse(self.stream)
        except Exception:
            if self.should_close_stream:
                self.stream.close()
            self.stream.close()
Contributor

You do not need to close the stream here.

Contributor Author

I updated the code as per your suggestion.

@peterjc
Member

peterjc commented May 12, 2023

Did you run the tests locally? They're failing now:

FAIL: test_embl_0_line (test_EMBL_unittest.EMBLTests)
Test SQ line with 0 length sequence.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/runner/work/biopython/biopython/Tests/test_EMBL_unittest.py", line 43, in test_embl_0_line
    self.assertEqual(
AssertionError: 1 != 0 : Unexpected parser warnings: unclosed file <_io.TextIOWrapper name='EMBL/embl_with_0_line.embl' mode='rt' encoding='UTF-8'>

======================================================================
FAIL: test_qualifier_escaping_read (test_GenBank.GenBankTests)
Check qualifier escaping is preserved when parsing.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/runner/work/biopython/biopython/Tests/test_GenBank.py", line 7792, in test_qualifier_escaping_read
    self.assertEqual(len(caught), 4)
AssertionError: 5 != 4

======================================================================
FAIL: test_features_spanning_origin (test_GenBank.TestFeatureParser)
Test that features that span the origin on circular DNA are included correctly for different ways of specifying the topology.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/runner/work/biopython/biopython/Tests/test_GenBank.py", line 7389, in test_features_spanning_origin
    self.assertEqual(len(caught), 1)
AssertionError: 2 != 1

----------------------------------------------------------------------
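
The "unclosed file" text in the first failure is the message of a ResourceWarning, which CPython emits when a file object is destroyed without having been closed. A minimal sketch of how such a warning can be observed, assuming CPython and a temporary file created just for the demonstration (the helper count_resource_warnings is hypothetical):

```python
import gc
import os
import tempfile
import warnings


def count_resource_warnings(close_explicitly):
    """Open a temporary file, drop the handle, and count ResourceWarnings."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.txt")
        with open(path, "w") as out:
            out.write("ID   example\n")
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            handle = open(path)
            handle.readline()
            if close_explicitly:
                handle.close()
            del handle    # drop the last reference
            gc.collect()  # encourage finalization on other interpreters
        return sum(issubclass(w.category, ResourceWarning) for w in caught)


leaked = count_resource_warnings(close_explicitly=False)
clean = count_resource_warnings(close_explicitly=True)
```

A test that counts caught warnings, as the failing tests appear to do, would see one extra entry whenever a parser leaves a file it opened unclosed.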

@BabaYaga1221
Contributor Author

Actually, I didn't run them locally. And:

@peterjc

I realized that our design of SequenceIterator is a bit strange... The parse function in SequenceIterator returns an iterator, but SequenceIterator itself is also an iterator. Keep in mind that yield, behind the scenes, creates a Python class. Then we end up with a double iterator.

For comparison, AlignmentIterator in Bio.Align uses only a single iterator.

These errors might be due to the strange design of SequenceIterator, which is a double iterator. Correct me if I am wrong.

@BabaYaga1221 BabaYaga1221 requested a review from mdehoon May 12, 2023 18:15
@peterjc
Member

peterjc commented May 12, 2023

Well, I deliberately didn't add a beginners tag to #4252. The changes looked reasonable (prior to the last commit, which broke the tests), but may still be doing something strange. If you haven't already tried it, my advice is to try adding logging to all the key methods (at least __init__, __enter__, __exit__, __next__) to examine what is happening with some simple use cases.
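
The logging suggestion could look like the following. This is a sketch under assumptions: TracedIterator is a hypothetical stand-in, not SequenceIterator, and the logger name is arbitrary.

```python
import io
import logging

logger = logging.getLogger("seqio_debug")  # arbitrary logger name


class TracedIterator:
    """Hypothetical iterator that logs each key method call."""

    def __init__(self, stream):
        logger.debug("__init__: stream=%r", stream)
        self.stream = stream

    def __enter__(self):
        logger.debug("__enter__")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        logger.debug("__exit__: exc_type=%r closed=%s", exc_type, self.stream.closed)
        return False

    def __iter__(self):
        return self

    def __next__(self):
        line = self.stream.readline()
        if not line:
            logger.debug("__next__: no more records")
            raise StopIteration
        logger.debug("__next__: %r", line)
        return line.strip()


# Collect log messages in memory so the call order can be inspected.
log_buffer = io.StringIO()
handler = logging.StreamHandler(log_buffer)
handler.setFormatter(logging.Formatter("%(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)

with TracedIterator(io.StringIO("a\nb\n")) as records:
    items = list(records)

called = [line.split(":")[0] for line in log_buffer.getvalue().splitlines()]
```

Reading the recorded call order for a simple input shows when the stream is touched relative to __exit__, which is the kind of evidence needed to see where a file is closed too early or not at all.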
