Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MapArrayReader Does Not Understand Nesting #1699

Closed
tustvold opened this issue May 13, 2022 · 2 comments · Fixed by #2500
Closed

MapArrayReader Does Not Understand Nesting #1699

tustvold opened this issue May 13, 2022 · 2 comments · Fixed by #2500
Labels

Comments

@tustvold
Copy link
Contributor

Describe the bug

The logic within MapArrayReader will not work if the MapArray is itself nested, in particular:

  • It does not expose the def_level / rep_levels of its children
  • It assumes that a repetition level of 0 corresponds to a new map entry

To Reproduce

A nullable StructArray with a child MapArray will likely result in an error, similarly a ListArray with a child MapArray will likely error

Expected behavior

This should be properly supported

@frolovdev
Copy link
Contributor

frolovdev commented Jun 17, 2022

@tustvold maybe I get your idea wrong but I can't reproduce it in any ways, trying with such incoming data

message table {
            repeated group my_group_map {
                REQUIRED BYTE_ARRAY kek;
                optional group map (MAP) {
                    repeated group key_value {
                        REQUIRED BYTE_ARRAY key;
                        OPTIONAL group nested_map (MAP) {
                            repeated group key_value {
                                REQUIRED BYTE_ARRAY key;
                                REQUIRED BYTE_ARRAY value;
                            }
                        }
                    }
                }
            }
            
        }

@tustvold
Copy link
Contributor Author

tustvold commented Jun 18, 2022

Hi @frolovdev. I'm afraid I'm away from a computer for the next few days, but taking a cursory look, lines like

        let entry_len = rep_levels.iter().filter(|level| **level == 0).count();

Are not able to correctly handle the case of a MapArrayReader inside another list, as it starts a new run when the repetition level is 0, as opposed to the repetition level of the map array (which will only be 0 if it is at the root).

If you compare the logic with the logic in ListArrayReader, that may help, as a MapArray is just a special case of a ListArray and the levels handling logic should be identical.

One way to fix this, might be to remove MapArrayReader and just add a teeny bit of additional logic to ListArrayReader to handle MapArray. We do something similar for the level computation on the writer side.

tustvold added a commit to tustvold/arrow-rs that referenced this issue Aug 18, 2022
tustvold added a commit to tustvold/arrow-rs that referenced this issue Aug 18, 2022
tustvold added a commit that referenced this issue Aug 18, 2022
* Fix MapArrayReader (#2484) (#1699) (#1561)

* Fix comments

* Review feedback
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants