Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a way to get the dictionary index and values array reference #672

Closed
alamb opened this issue Aug 7, 2021 · 2 comments
Closed

Add a way to get the dictionary index and values array reference #672

alamb opened this issue Aug 7, 2021 · 2 comments
Labels
enhancement Any new improvement worthy of a entry in the changelog

Comments

@alamb
Copy link
Contributor

alamb commented Aug 7, 2021

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
To work with the value of a dictionary array at a particular index, the following pattern shows up many times in the arrow-rs and datafusion codebase:

    fn do_something <K: ArrowDictionaryKeyType>(
        array: &ArrayRef,
        index: usize,
    ) -> Result<Self> {
        let dict_array = array.as_any().downcast_ref::<DictionaryArray<K>>().unwrap();

        // look up the index in the values dictionary
        let keys_col = dict_array.keys();
        let values_index = keys_col.value(index).to_usize().ok_or_else(|| {
            DataFusionError::Internal(format!(
                "Can not convert index to usize in dictionary of type creating group by value {:?}",
                keys_col.data_type()
            ))
        })?;
        
        // do actual work with dict_array.values and values_index
        // ...
    }
 }

Repeating this code "find the index" code is tedious

Describe the solution you'd like

Add a function such as the following on to DictionaryArray (would love suggestions about better names):

impl DictionaryArray<K: ArrowDictionaryKeyType> {
  
  // return the index into the dictionary values for array@index as well
  // as the dictionary values
  #[inline]
  fn dict_value(
    self: 
    index: usize,
  ) -> Result<(&ArrayRef, Option<usize>)>
  {
    // look up the index in the values dictionary
    let keys_col = self.keys();
    if !keyd_col.is_valid(index) {
      return Ok((self.values(), None));
    }
    let values_index = keys_col.value(index).to_usize().ok_or_else(|| {
        DataFusionError::Internal(format!(
            "Can not convert index to usize in dictionary of type creating group by value {:?}",
            keys_col.data_type()
        ))
    })?;

    Ok((self.values(), Some(values_index)))
}

Describe alternatives you've considered

Additional context

@alamb alamb added the enhancement Any new improvement worthy of a entry in the changelog label Aug 7, 2021
@alamb
Copy link
Contributor Author

alamb commented Aug 9, 2021

Note the bug in the above code -- it does not handle NULLs correctly!

@tustvold
Copy link
Contributor

Closed by a combination of #2297 which added a strongly typed dictionary array accessor, and #2793 which removed the need for checked dictionary key conversions (they are already validated)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Any new improvement worthy of a entry in the changelog
Projects
None yet
Development

No branches or pull requests

2 participants