
Support header-only LAS files--don't lose the last header section before a missing ~A section #326

Merged (3 commits), May 8, 2020


@ejschoen (Contributor) commented May 2, 2020

In practice, we run into LAS files that contain header sections only--no array data section. We'd still like to extract useful information from these files (cf. Postel's Law).

For these files, with or without the ignore_data option to lasio.read, the last header section in the file is missing. This can lead to errors in which las.curves is empty even though the file has a curve section, and to array index exceptions from las.well.STRT, for example.

This is because raw sections are only saved when the line-by-line reader in reader.py's read_file_contents runs into a ~ line, not when the line-by-line enumeration ends. That is not a problem when there is an ~A section, since it is handled specially and is the last section of the file. But when there is no ~A section, a final bit of code is required to save the section that was being scanned up to EOF.
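The failure mode can be illustrated with a simplified splitter. This is a minimal sketch, assuming a reader that collects lines until the next `~` title line; `split_sections` is a hypothetical name, not lasio's actual `read_file_contents`, and it ignores the special handling lasio applies to the ~A section. The flush just before `return` plays the role of the extra code this pull request adds.

```python
from typing import Dict, List, Optional


def split_sections(lines: List[str]) -> Dict[str, List[str]]:
    """Group LAS lines into raw sections keyed by their '~' title line.

    Hypothetical simplification for illustration only; not lasio's code.
    """
    sections: Dict[str, List[str]] = {}
    title: Optional[str] = None
    body: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line.startswith("~"):
            # A new section starts: save the one we were scanning.
            if title is not None:
                sections[title] = body
            title, body = line, []
        else:
            body.append(line)
    # Flush the section still being scanned at EOF. Without this step a
    # header-only file (no ~A section) silently loses its last section.
    if title is not None:
        sections[title] = body
    return sections


header_only = """~Version
VERS. 2.0 : CWLS LOG ASCII STANDARD
~Well
STRT.M 100.0 : START DEPTH
~Curve
DEPT.M : DEPTH
GR.GAPI : GAMMA RAY
"""
print(list(split_sections(header_only.splitlines())))
```

Without the final flush, the `~Curve` section in this header-only example would never be stored, which matches the empty `las.curves` symptom described above.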

This pull request adds the necessary bit of code, and adds a test to tests/test_read.py to verify that curves are correctly returned when there is no ~A section.

@kinverarity1 (Owner) commented

Thank you so much for this, it is excellent!

I will try to figure out the two test failures (which I think probably relate more to the twisted mess of LASFile.read) and merge once I can.

@dcslagel (Collaborator) commented May 7, 2020

Hi @kinverarity1,

I spent some time looking at one of the two test failures. In short, @ejschoen's code is probably handling the files properly, and the test should be updated to reflect this.

The test I dug into is for a UTF-16-LE encoded file without a BOM (tests/example/encodings_utf16le.las). Due in part to the missing BOM, chardet incorrectly decides that the file is encoded as windows-1252. The parsing then throws a KeyError, which the test catches. With this change, the parsing simply reports that it didn't find the metadata and data sections.
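One way a reader could avoid this misdetection is to check for a BOM and, failing that, for the pattern of zero bytes that mostly-ASCII text produces in UTF-16, before falling back to chardet. This is a minimal sketch under that assumption; `guess_utf16` and its 90%/10% thresholds are hypothetical, not part of lasio or chardet, and not the approach taken in the chardet pull request mentioned below.

```python
import codecs
from typing import Optional


def guess_utf16(data: bytes, sample: int = 4096) -> Optional[str]:
    """Return a UTF-16 codec name if the bytes look like UTF-16, else None.

    Hypothetical heuristic for illustration; not lasio's or chardet's logic.
    """
    # With a BOM, Python's "utf-16" codec detects the byte order and
    # strips the BOM while decoding.
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"
    head = data[:sample]
    pairs = len(head) // 2
    if pairs == 0:
        return None
    even_zeros = head[0::2].count(0)
    odd_zeros = head[1::2].count(0)
    # Mostly-ASCII text (like a LAS header) in UTF-16-LE has a zero byte
    # after each character (odd offsets); UTF-16-BE has it before (even).
    if odd_zeros > 0.9 * pairs and even_zeros < 0.1 * pairs:
        return "utf-16-le"
    if even_zeros > 0.9 * pairs and odd_zeros < 0.1 * pairs:
        return "utf-16-be"
    return None  # no UTF-16 evidence; fall back to another detector


raw = "~Version\r\nVERS. 2.0 : CWLS\r\n".encode("utf-16-le")  # no BOM
encoding = guess_utf16(raw)
print(encoding, raw.decode(encoding).splitlines()[0])
```

A heuristic like this only works because LAS headers are overwhelmingly ASCII; for arbitrary text, a statistical detector such as chardet is the more general tool.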

On a side note, once this pull request is merged into chardet, the UTF-16-LE file without a BOM might be correctly identified, and lasio would be able to read it.
chardet/chardet#109 : UTF detection when missing Byte Order Mark.

DC

@kinverarity1 added a commit that referenced this pull request on May 8, 2020
@kinverarity1 merged commit b292b3a into kinverarity1:master on May 8, 2020
@kinverarity1 (Owner) commented

Thanks @ejschoen and @dcslagel. Much appreciated.
