This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

ExternalFormat("min value of a page is required") #1146

Closed
thesmartwon opened this issue Jul 7, 2022 · 6 comments
Labels: bug (Something isn't working), no-changelog (Issues whose changes are covered by a PR and thus should not be shown in the changelog)

Comments

@thesmartwon

thesmartwon commented Jul 7, 2022

Not sure if this is an arrow2 or parquet2 problem (or both), but when I write data parsed from a PSV (pipe-separated values) file whose fields contain comma-separated lists, like this:

```rust
// `arr` is the column's mutable array and `r` is the raw text of one PSV field
let arr = arr
    .as_mut_any()
    .downcast_mut::<MutableListArray<i32, MutablePrimitiveArray<u8>>>()
    .unwrap();
let val = match r {
    // An empty field becomes a null list
    "" => None,
    v => {
        let vals = v
            .split(',')
            .map(|v| Some(v.parse::<u8>().unwrap()));
        Some(vals)
    }
};
arr.try_push(val).expect("pushed");
```

I get this panic when calling `end()` on an `arrow2::io::parquet::write::FileWriter<File>`:

ExternalFormat("min value of a page is required")

I don't get the panic if I instead try:

```rust
let arr = values[i]
    .as_mut_any()
    .downcast_mut::<MutableListArray<i32, MutablePrimitiveArray<u8>>>()
    .unwrap();
// Every field becomes a list; empty pieces become null items
let val = r.split(',').map(|v| match v {
    "" => None,
    v => Some(v.parse::<u8>().unwrap()),
});
arr.try_push(Some(val)).expect("pushed");
```

so I suspect it's a problem with the `if null_count as usize == spec.num_values` check in parquet2.

Edit: These files also can't be read :(

ExternalFormat("Invalid Parquet file. Corrupt footer")
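
For what it's worth, the two snippets above are not equivalent, which may explain why only the first one fails. The std-only sketch below mirrors their parsing logic with hypothetical `parse_field_first` / `parse_field_second` helpers that collect into `Vec`s instead of pushing into arrow2 arrays. Because `"".split(',')` yields one empty piece rather than none, the second form turns an empty field into a one-element list holding a null, whereas the first turns it into a null list that contributes nothing to the list's child values array.

```rust
/// Mirrors the first snippet (hypothetical helper): an empty field becomes a null list.
fn parse_field_first(r: &str) -> Option<Vec<Option<u8>>> {
    match r {
        "" => None,
        v => Some(v.split(',').map(|v| Some(v.parse::<u8>().unwrap())).collect()),
    }
}

/// Mirrors the second snippet (hypothetical helper): every field becomes a list,
/// and empty pieces become null items.
fn parse_field_second(r: &str) -> Option<Vec<Option<u8>>> {
    Some(
        r.split(',')
            .map(|v| match v {
                "" => None,
                v => Some(v.parse::<u8>().unwrap()),
            })
            .collect(),
    )
}

fn main() {
    // Splitting an empty string yields one empty piece, not zero pieces.
    assert_eq!("".split(',').count(), 1);

    // Empty field: null list vs. a list containing one null item.
    assert_eq!(parse_field_first(""), None);
    assert_eq!(parse_field_second(""), Some(vec![None]));

    // Non-empty fields parse the same way in both forms.
    assert_eq!(parse_field_first("1,2"), Some(vec![Some(1), Some(2)]));
    assert_eq!(parse_field_second("1,2"), Some(vec![Some(1), Some(2)]));
}
```

So if every field in a column is empty, only the first form leaves the child values array with no entries at all.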

@jorgecarleitao added the `bug` label on Jul 9, 2022
@jorgecarleitao
Owner

Do you have the list that you tried to write? I am trying to reproduce this but have not been able to.

E.g., after adding

```rust
#[test]
fn list_utf8_nullable() -> Result<()> {
    let data = vec![
        Some(vec![Some("a".to_string())]),
        None,
        Some(vec![None, Some("b".to_string())]),
        Some(vec![]),
        Some(vec![Some("c".to_string())]),
        None,
    ];
    let mut array =
        MutableListArray::<i32, _>::new_with_field(MutableUtf8Array::<i32>::new(), "item", true);
    array.try_extend(data).unwrap();
    list_array_generic(true, array.into())
}

#[test]
fn list_int_nullable() -> Result<()> {
    let data = vec![
        Some(vec![Some(1)]),
        None,
        Some(vec![None, Some(2)]),
        Some(vec![]),
        Some(vec![Some(3)]),
        None,
    ];
    let mut array = MutableListArray::<i32, _>::new_with_field(
        MutablePrimitiveArray::<i32>::new(),
        "item",
        true,
    );
    array.try_extend(data).unwrap();
    list_array_generic(true, array.into())
}
```

to https://github.com/jorgecarleitao/arrow2/blob/main/tests/it/io/parquet/mod.rs#L1396, the tests still pass.

Was this using the latest main or the latest release (0.12.0)?

@jorgecarleitao added the `investigation` label (Issues or PRs that are investigations. PRs may or may not be merged.) and removed the `bug` label on Jul 13, 2022
@tjwilson90

I'm also seeing this problem. I believe it only occurs when every entry in the `ListArray` is empty. In that case, the `ListArray`'s values array is empty and thus has no `min_value`.

This occurs with both the latest main and version 0.12.0.
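
A minimal std-only model of this explanation, assuming a list column can be represented as `Vec<Option<Vec<Option<u8>>>>`: flattening it gives the child values array, and the minimum over its non-null items is `None` when every list is empty or null. `strict_min` is a hypothetical stand-in for a statistics step that treats a missing minimum as an error; it is not the parquet2 code.

```rust
/// Flatten a list column into its child values, the way a list array keeps
/// all list elements in one contiguous values array (null lists add nothing).
fn child_values(column: &[Option<Vec<Option<u8>>>]) -> Vec<Option<u8>> {
    column.iter().flatten().flatten().copied().collect()
}

/// Minimum over the non-null child values; `None` when there are none.
fn min_value(values: &[Option<u8>]) -> Option<u8> {
    values.iter().flatten().copied().min()
}

/// Hypothetical statistics step that, like the failing code path,
/// refuses to continue without a minimum.
fn strict_min(values: &[Option<u8>]) -> Result<u8, String> {
    min_value(values).ok_or_else(|| "min value of a page is required".to_string())
}

fn main() {
    // Every entry is empty or null: the child values array is empty, so no min.
    let all_empty: Vec<Option<Vec<Option<u8>>>> = vec![Some(vec![]), None, Some(vec![])];
    assert!(child_values(&all_empty).is_empty());
    assert!(strict_min(&child_values(&all_empty)).is_err());

    // A single non-empty entry is enough to produce a minimum.
    let mixed: Vec<Option<Vec<Option<u8>>>> =
        vec![Some(vec![]), Some(vec![Some(7), None, Some(3)]), None];
    assert_eq!(strict_min(&child_values(&mixed)), Ok(3));
}
```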

@jorgecarleitao added the `bug` label and removed the `investigation` label on Jul 19, 2022
@letitcrash

letitcrash commented Aug 22, 2022

In my case I had an array filled with empty strings (`vec!["", "", ""]`); after setting those values explicitly to `None` instead, it worked.
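
A sketch of that workaround, assuming the values are available as string slices before they reach the array builder (`empty_as_null` is a hypothetical helper, not part of arrow2):

```rust
/// Map empty strings to `None` so they are written as nulls rather than as
/// empty values (hypothetical helper for illustration).
fn empty_as_null<'a>(values: &[&'a str]) -> Vec<Option<&'a str>> {
    values
        .iter()
        .map(|&s| if s.is_empty() { None } else { Some(s) })
        .collect()
}

fn main() {
    // The all-empty case from the comment above becomes all nulls.
    let raw = vec!["", "", ""];
    assert_eq!(empty_as_null(&raw), vec![None, None, None]);

    // Non-empty strings are kept as-is.
    let mixed = vec!["a", "", "b"];
    assert_eq!(empty_as_null(&mixed), vec![Some("a"), None, Some("b")]);
}
```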

@jorgecarleitao
Owner

Being fixed in jorgecarleitao/parquet2#193.

@nlhepler

Hi @jorgecarleitao, it looks like you landed this fix in parquet2 but haven't yet published a parquet2 release that includes it (or an arrow2 release that depends on that version and therefore also includes the fix). Gently nudging, as a new release with this fix would be greatly appreciated!

@sundy-li
Collaborator

It's in progress in #1304.

@jorgecarleitao added the `no-changelog` label on Feb 10, 2023

6 participants