Easily retrieve mapping from OrdinalEncoder #28891
Comments
It's possible to construct the category maps via list and dict comprehensions:

```python
category_maps = [
    {cat_name: cat_idx for cat_idx, cat_name in enumerate(cat_list)}
    for cat_list in oe.categories_
]
```

or as a dict of dicts to leverage feature names when available:

```python
category_maps = {
    feature_name: {cat_name: cat_idx for cat_idx, cat_name in enumerate(cat_list)}
    for feature_name, cat_list in zip(oe.feature_names_in_, oe.categories_)
}
```

Maybe we could expose a public property (or two, in the presence of feature names) on the […]. Precomputing those reverse mappings is also possible but redundant. Another alternative would be to use methods (instead of properties) to generate those reverse mappings on demand, maybe with an optional argument to do it only for a specific set of features.
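As a runnable sketch of the comprehension approach, the on-demand method alternative could look like the helper below. `get_category_maps` and its `features` argument are hypothetical names for illustration, not part of scikit-learn. The helper falls back to integer column indices when the encoder was fit without feature names, and it assumes infrequent-category grouping is disabled, because it relies on each category's code being its position in `categories_`.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder


def get_category_maps(encoder, features=None):
    """Hypothetical helper: return {feature: {category: code}} on demand.

    Assumes infrequent-category grouping is disabled (``min_frequency`` and
    ``max_categories`` left at None), so a category's code is simply its
    position in ``encoder.categories_[i]``.
    """
    # Use feature names when the encoder saw them, else integer column indices.
    names = getattr(encoder, "feature_names_in_", None)
    if names is None:
        names = range(len(encoder.categories_))
    wanted = None if features is None else set(features)
    return {
        name: {cat: idx for idx, cat in enumerate(cats)}
        for name, cats in zip(names, encoder.categories_)
        # Optionally build maps only for the requested subset of features.
        if wanted is None or name in wanted
    }


X = np.array([["red", "S"], ["blue", "M"], ["red", "L"]], dtype=object)
oe = OrdinalEncoder().fit(X)
print(get_category_maps(oe))
print(get_category_maps(oe, features=[1]))
```

Nothing is stored on the encoder here: the maps are rebuilt from `categories_` on each call, which matches the "methods instead of properties" idea.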
I am not sure how providing encoding maps would help answer that question: what would be the "key" in the dict for infrequent or unseen categories? However, we could expose dedicated fitted attributes to make the encoding of missing and infrequent/unseen categories explicit for each feature. EDIT: such an attribute already exists for infrequent categories: `OrdinalEncoder.infrequent_categories_`.
Thank you for your comments! Let me elaborate on my motivation for each question:
2.1. What would be the "key" for unseen categories? I think it is either some predefined placeholder such as "_unseen_category_hope_not_in_user_data", which is ugly, or possibly a new attribute like `.unseen_encoded_value`.
2.2. `OrdinalEncoder.infrequent_categories_`: while it is very useful for checking which categories turned out to be infrequent, it still doesn't answer the question of which value they are encoded into.
Describe the workflow you want to enable
It would be nice to be able to easily retrieve the mapping in the form of a dictionary.
Currently, the `.categories_` attribute only holds the list of seen categories, without the mapping to their encoded values.
This becomes especially important with the options for handling missing or infrequent values, which lead to questions such as "What value do infrequent categories map to?" and so on.
Describe your proposed solution
Add a `.categories_map_` attribute to `OrdinalEncoder`.
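The proposed `.categories_map_` attribute could be prototyped outside scikit-learn with a subclass. `MappedOrdinalEncoder` is a hypothetical name, not an existing class, and the sketch assumes NumPy object-dtype input without feature names and the default `set_output` configuration (NumPy output). It fills the attribute after `fit` by transforming the seen categories, so the stored codes follow whatever infrequent or missing-value handling the encoder applies.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder


class MappedOrdinalEncoder(OrdinalEncoder):
    """Hypothetical subclass sketching the proposed ``categories_map_``."""

    def fit(self, X, y=None):
        super().fit(X, y)
        n_rows = max(len(cats) for cats in self.categories_)
        # Build a probe array whose column j lists categories_[j], padded by
        # repeating its first category so that all columns have equal length.
        probe = np.empty((n_rows, len(self.categories_)), dtype=object)
        for j, cats in enumerate(self.categories_):
            probe[:, j] = cats[0]
            probe[: len(cats), j] = cats
        codes = self.transform(probe)
        # Reading codes back from transform() keeps the map consistent with
        # infrequent grouping and encoded_missing_value.
        self.categories_map_ = [
            dict(zip(cats, codes[: len(cats), j]))
            for j, cats in enumerate(self.categories_)
        ]
        return self


X = np.array([["red", "S"], ["blue", "M"], ["red", "L"]], dtype=object)
enc = MappedOrdinalEncoder().fit(X)
print(enc.categories_map_)
```

Storing the map duplicates information already derivable from the fitted encoder, which is the redundancy concern raised in the comments; the trade-off is a single, explicit place to look up each category's code.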
Describe alternatives you've considered, if relevant
No response
Additional context
No response