kdubovikov/blazing-encoders

🔥 blazing-encoders

Blazing-fast categorical feature encoding.

This is a Python library written in Rust that encodes categorical variables using a smoothed target mean. It implements the algorithm described in "A Preprocessing Scheme for High-Cardinality Categorical Attributes in Classification and Prediction Problems".
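To make the idea concrete, here is a minimal pure-Python sketch of smoothed mean target encoding for a single categorical column. It is not the library's Rust implementation. The function name `smoothed_target_encode` is hypothetical, and the sketch assumes that `smoothing` and `min_samples_leaf` play the roles of the slope and midpoint of the sigmoid weight from the paper, which may differ from what the library actually does.

```python
import math
from collections import defaultdict


def smoothed_target_encode(categories, targets, smoothing=1.0, min_samples_leaf=1):
    """Return a {category: encoded value} mapping (illustrative sketch only)."""
    # Global target mean, used as the prior for every category.
    prior = sum(targets) / len(targets)

    # Accumulate the per-category target sum and sample count.
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1

    encoding = {}
    for category, n in counts.items():
        category_mean = sums[category] / n
        # Sigmoid weight: approaches 1 as the category gets more samples
        # (assumed parameterization, see the note above).
        weight = 1.0 / (1.0 + math.exp(-(n - min_samples_leaf) / smoothing))
        # Blend the category mean with the prior.
        encoding[category] = weight * category_mean + (1.0 - weight) * prior
    return encoding


categories = ["a", "a", "b"]
targets = [1.0, 0.0, 1.0]
encoding = smoothed_target_encode(categories, targets)
print(encoding)
```

Rare categories are pulled toward the global mean, while frequent categories stay close to their own mean. This is what keeps the encoding stable for high-cardinality features.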

Installation

pip install blazing_encoders

Usage

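from blazing_encoders import TargetEncoder_f64
import numpy as np

if __name__ == '__main__':
    # Fix the seed so the example is reproducible.
    np.random.seed(42)
    # A 5x5 matrix of integer category codes in [0, 10), cast to float.
    data = np.random.randint(0, 10, (5, 5)).astype('float')
    # Five random target values in [0, 1).
    target = np.random.rand(5)

    # Fit the encoder, then replace the category codes with their encodings.
    encoder = TargetEncoder_f64.fit(data, target, smoothing=1.0, min_samples_leaf=1)
    encoded_data = encoder.transform(data)
    print(encoded_data)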

Two classes are available, TargetEncoder_f64 and TargetEncoder_f32, so you can trade numerical precision for memory usage in the target encoding process.
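The trade-off between the two precisions can be illustrated with the standard library alone. The sketch below does not use blazing_encoders at all. It only shows that 32-bit floats take half the storage of 64-bit floats and round values more coarsely, which is presumably the trade-off the two classes expose.

```python
from array import array

values = [0.1, 0.2, 0.3]

# 'd' stores C doubles (64-bit), 'f' stores C floats (32-bit).
as_f64 = array('d', values)
as_f32 = array('f', values)

# Bytes used per element and in total.
bytes_f64 = as_f64.itemsize * len(as_f64)
bytes_f32 = as_f32.itemsize * len(as_f32)
print(bytes_f64, bytes_f32)

# Rounding error introduced by storing 0.1 as a 32-bit float.
error_f32 = abs(as_f32[0] - 0.1)
print(error_f32)
```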

Under the hood, the library shares as much memory as possible, so overhead should be minimal. It also parallelizes the target encoding computation, so the overall process completes faster.

Documentation and examples

You can find out how to use the library here. Also, you can read more at this blog post.
