Parquet Treats Embedded Arrow Schema as Authoritative #1663

Closed
@tustvold

Describe the bug

As pointed out by @jorisvandenbossche on https://issues.apache.org/jira/browse/ARROW-16184, the embedded arrow schema should not be treated as authoritative for a parquet file. Instead, it should be used in a purely advisory capacity, for example to aid in inferring list offsets.

In particular

[It should be used as] a description of the original Arrow schema, and not for a description of what is in the Parquet file / for the Parquet schema.

This behavior appears to have been introduced by @carols10cents in apache/arrow#8354 and apache/arrow#8330 as a mechanism to round-trip types correctly. We should ensure that the solution preserves this round-tripping where applicable.

To Reproduce

See #1459

Expected behavior

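The embedded arrow schema should not be relied upon to be authoritative; the parquet schema should determine what is in the file, with the embedded arrow schema used only as a hint.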
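
To illustrate the expected behavior, here is a minimal sketch of treating the embedded schema as a hint rather than as the source of truth. It is not the arrow-rs implementation: `PhysicalType`, `LogicalType`, `infer_from_parquet`, `is_compatible`, and `resolve_type` are hypothetical names invented for this example, and the type model is heavily simplified (it ignores parquet logical type annotations, nesting, and nullability). The idea is that the type derived from the parquet schema always wins unless the hint is something the physical data can actually be read as.

```rust
// Hypothetical, simplified type model for illustration only.
// These are NOT the types used by the arrow or parquet crates.

/// Physical storage type recorded in the parquet schema (authoritative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PhysicalType {
    Int32,
    Int64,
    ByteArray,
}

/// Type exposed to the reader (stand-in for an arrow data type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LogicalType {
    Int32,
    Int64,
    Date32,
    Utf8,
    LargeUtf8,
    Binary,
}

/// Default mapping derived purely from the parquet schema.
pub fn infer_from_parquet(physical: PhysicalType) -> LogicalType {
    match physical {
        PhysicalType::Int32 => LogicalType::Int32,
        PhysicalType::Int64 => LogicalType::Int64,
        PhysicalType::ByteArray => LogicalType::Binary,
    }
}

/// Returns true if the hinted type can be produced from the physical type.
fn is_compatible(physical: PhysicalType, hint: LogicalType) -> bool {
    matches!(
        (physical, hint),
        (PhysicalType::Int32, LogicalType::Int32 | LogicalType::Date32)
            | (PhysicalType::Int64, LogicalType::Int64)
            | (
                PhysicalType::ByteArray,
                LogicalType::Utf8 | LogicalType::LargeUtf8 | LogicalType::Binary
            )
    )
}

/// Resolves the type to read a column as. The embedded hint is advisory:
/// it is honored only when compatible with what the file actually stores,
/// otherwise the type inferred from the parquet schema is used.
pub fn resolve_type(physical: PhysicalType, hint: Option<LogicalType>) -> LogicalType {
    match hint {
        Some(h) if is_compatible(physical, h) => h,
        _ => infer_from_parquet(physical),
    }
}
```

Under these assumptions, a `LargeUtf8` hint on a `ByteArray` column is honored, which preserves round-tripping, while a `Utf8` hint on an `Int64` column is ignored and the column is read as `Int64`.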

Additional context

I am working on this as part of #1655.

Labels

bug, parquet (Changes to the parquet crate)


Participants

@alamb, @tustvold
