Support header-only LAS files--don't lose the last header section before a missing ~A section #326

ejschoen · 2020-05-02T13:59:06Z

In practice, we run into LAS files that contain header sections only--no array data section. We'd still like to extract useful information from these files (cf. Postel's Law).

For these files, with or without the ignore_data option to lasio.read, the last header section from the file is missing. This can lead to errors in which las.curves is empty, even though there is a curve section, and we get array index exceptions from las.well.STRT, for example.

This is because raw sections are only saved when the line-by-line reader in reader.py's read_file_contents runs into a ~ line, and not at the end of the line-by-line enumeration. That is not a problem when there's an ~A section, since it is handled specially and is the last section of the file. But when there is no ~A section, a final bit of code is required to save the section that was being scanned when EOF was reached.
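
To illustrate the failure mode, here is a minimal sketch of a line-by-line section splitter. It is not lasio's actual reader: `split_sections`, the `flush_at_eof` flag, and the `HEADER_ONLY` sample are hypothetical names made up for this example. It only shows why a section that is saved solely when the next ~ line appears gets lost when the file ends without an ~A section.

```python
# Hypothetical sample of a header-only LAS file (no ~A section).
HEADER_ONLY = """\
~Version
VERS. 2.0 : CWLS LOG ASCII STANDARD
~Well
STRT.M 1670.0 : START DEPTH
~Curve
DEPT.M : DEPTH
DT.US/M : SONIC TRANSIT TIME
"""


def split_sections(lines, flush_at_eof=True):
    """Group lines by the ~ section title that precedes them.

    Illustrative only; this is not the code in lasio's reader.py.
    """
    sections = {}
    title = None
    body = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("~"):
            # A new section starts, so save the one scanned so far.
            if title is not None:
                sections[title] = body
            title, body = line, []
        elif title is not None:
            # Lines before the first ~ title are ignored.
            body.append(line)
    # Without this step the last section is dropped at EOF.
    if flush_at_eof and title is not None:
        sections[title] = body
    return sections


fixed = split_sections(HEADER_ONLY.splitlines())
buggy = split_sections(HEADER_ONLY.splitlines(), flush_at_eof=False)
```

With `flush_at_eof=False` the `~Curve` section never reaches the result, which matches the symptom of an empty las.curves described above.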

This pull request adds the necessary bit of code, and adds a test to tests/test_read.py to verify that curves are correctly returned when there is no ~A section.

kinverarity1 · 2020-05-03T07:36:05Z

Thank you so much for this, it is excellent!

I will try to figure out the two test failures (which I think probably relate more to the twisted mess of LASFile.read) and merge once I can.

dcslagel · 2020-05-07T22:15:49Z

Hi @kinverarity1,

I spent some time looking at one of the two test failures. The short of it is that @ejschoen's code is probably handling the files properly and the test should be updated to reflect this.

The test I dug into is for a UTF-16-LE encoded file without a BOM (tests/example/encodings_utf16le.las). Due in part to the missing BOM, chardet incorrectly decides that the file is encoded with windows-1252. Before this change, the parsing then threw a KeyError, which the test catches. With this change, the parsing will simply report that it didn't find the metadata and data sections.
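
As a rough illustration of why the wrong encoding guess hides the sections, the standard-library sketch below decodes BOM-less UTF-16-LE bytes as windows-1252. It does not call chardet or lasio; the `~Version` title is just a sample string, and the variable names are made up for this example.

```python
import codecs

# Encode a sample section title as UTF-16-LE. This codec does not prepend a BOM.
raw = "~Version".encode("utf-16-le")
has_bom = raw.startswith(codecs.BOM_UTF16_LE)

# Decoding with the wrongly guessed encoding keeps every second byte as NUL.
misread = raw.decode("windows-1252")

# Decoding with the real encoding recovers the title.
correct = raw.decode("utf-16-le")
```

The misread text still begins with `~`, but the NUL characters interleaved with the letters mean it no longer matches any expected section title.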

On a side note, once the following pull request is merged into chardet, the UTF-16-LE file without a BOM might be correctly identified, and lasio would then be able to read it.
chardet/chardet#109 : UTF detection when missing Byte Order Mark.

DC

kinverarity1 · 2020-05-08T07:31:50Z

Thanks @ejschoen and @dcslagel. Much appreciated.

Eric Schoen and others added 3 commits May 2, 2020 08:41

Support header-only LAS files--don't lose the last header section bef…

7959509

…ore a missing ~A section

Rename header_only.py to header_only.las

a34f2f6

Merge pull request #1 from dcslagel/header-only-las-file

03c8821

Rename header_only.py to header_only.las

kinverarity1 added a commit that referenced this pull request May 8, 2020

Skip failing tests (see #326)

c64bed1

kinverarity1 merged commit b292b3a into kinverarity1:master May 8, 2020
