How to add pre-processing to PMML #16199

Open
wendycwong opened this issue May 16, 2024 · 5 comments

wendycwong commented May 16, 2024

A customer wants to add simple pre-processing to an XGBoost MOJO. The catch is:

  1. The customer's MOJO was built with an earlier H2O-3 version;
  2. The customer has already converted the MOJO to PMML.

What I know so far is that we can add the pre-processing to the current model and gate it behind a flag, so it stays disabled by default.

This is doable. See the comments below.

wendycwong self-assigned this May 16, 2024
wendycwong changed the title from "Investigate if generic model can be saved as mojo again." to "Investigate old mojo being loaded with new h2o-3 version and add new mojo features this way." May 25, 2024
@wendycwong

According to @narasimhard, the customer has already written a library to translate H2O-3 MOJOs to PMML:

They currently run the conversion in a Java environment. Here is a reference: https://github.com/jpmml/jpmml-h2o?tab=readme-ov-file#the-java-side-of-operations

java -jar pmml-h2o-example/target/pmml-h2o-example-executable-1.2-SNAPSHOT.jar --mojo-input mojo.zip --pmml-output mojo.pmml

It uses the JAR pmml-h2o-example-executable-1.2-SNAPSHOT.jar.

Using IntelliJ, I was able to generate PMML from an H2O-3 MOJO with their org.jpmml.h2o.example.Main.java.

@wendycwong

My idea is to add more arguments to Main.java: if a specific argument such as --fill-missing-values is present, we generate the PMML file with the pre-processing enabled.
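A minimal sketch of how such a flag could be read, using only the Java standard library. Everything here is an assumption made for illustration: the class name FillMissingArgs, the "field=value,field=value" spec format, and the hand-rolled parsing loop. The real Main.java may parse its arguments differently, in which case the new option would be declared through that mechanism instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper; not part of jpmml-h2o. */
final class FillMissingArgs {

    /**
     * Looks for "--fill-missing-values" followed by a spec such as
     * "age=0,income=-1" (assumed format) and returns field -> replacement.
     * An empty map means the flag was absent, i.e. pre-processing stays off.
     */
    static Map<String, String> parse(String[] args) {
        Map<String, String> replacements = new LinkedHashMap<>();
        for (int i = 0; i < args.length; i++) {
            if (!"--fill-missing-values".equals(args[i])) {
                continue; // leave all other options to the existing parser
            }
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("--fill-missing-values needs a value");
            }
            for (String pair : args[i + 1].split(",")) {
                int eq = pair.indexOf('=');
                if (eq <= 0 || eq == pair.length() - 1) {
                    throw new IllegalArgumentException("Expected field=value, got: " + pair);
                }
                replacements.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
            i++; // skip the spec that was just consumed
        }
        return replacements;
    }

    public static void main(String[] args) {
        System.out.println(parse(args));
    }
}
```

Returning an empty map when the flag is missing keeps the feature disabled by default, which matches the original plan for the flag.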

From my reading on PMML, it is very easy to add missing value replacement. You need to add it to the mining schema.

Screenshot 2024-06-03 at 7 26 17 AM
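To make the mining schema idea concrete, here is a small standard-library sketch that prints the kind of MiningField entries I would expect to see in the generated PMML. The attribute names missingValueReplacement and missingValueTreatment and the value asValue come from the PMML MiningSchema definition; the class MiningSchemaSketch, its methods, and the field names age and income are made up for illustration. In the actual converter this XML would be produced by the JPMML model classes, not by string building.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration; prints PMML text instead of using JPMML classes. */
final class MiningSchemaSketch {

    /** Minimal escaping for double-quoted XML attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    /** Builds one MiningField; a null replacement means no missing-value handling. */
    static String miningField(String name, String replacement) {
        StringBuilder sb = new StringBuilder("<MiningField name=\"")
                .append(escape(name)).append('"');
        if (replacement != null) {
            sb.append(" missingValueReplacement=\"").append(escape(replacement)).append('"')
              .append(" missingValueTreatment=\"asValue\"");
        }
        return sb.append("/>").toString();
    }

    /** Wraps the fields in a MiningSchema element, preserving map order. */
    static String miningSchema(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<MiningSchema>\n");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("  ").append(miningField(e.getKey(), e.getValue())).append('\n');
        }
        return sb.append("</MiningSchema>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("age", "0");      // missing age is replaced with 0
        fields.put("income", null);  // income is left untouched
        System.out.println(miningSchema(fields));
    }
}
```

As far as I understand the specification, missingValueReplacement is what a scoring engine actually applies, while missingValueTreatment only records how the replacement value was chosen.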

@wendycwong

You can also look at the overview of variable scoping in PMML:
Screenshot 2024-06-03 at 7 27 16 AM

wendycwong changed the title from "Investigate old mojo being loaded with new h2o-3 version and add new mojo features this way." to "How to add pre-processing to PMML" Jun 3, 2024

wendycwong commented Jun 7, 2024

This really can be done!!!

New info:

In GLMMojoModelBaseConverter.java, lines 86-111, the converter does apply missing value treatment.

ImputerUtil.java is the place to add missing value treatment. It is called on line 103.

Inside ImputerUtil.java, line 40 is where the replacement value for missing values is set. This adds content to the decorator field, and it looks like this is all we need to do.

  1. Add a new argument to Main.java that, when enabled, collects the specific values to replace missing values with;
  2. We may need to add a field (boolean specialMissingValueProcessing) to XGBoostMojoModelConverter to indicate whether special missing value processing is needed;
  3. Inside XGBoostMojoModelConverter.java, add missing value treatments that replace with the special values instead of the mean/mode when specialMissingValueProcessing is set;
  4. In practice, this means adding the missing value treatment to toMojoModelSchema inside XGBoostMojoModelConverter.java, as GLM does. However, you need to use MissingValueTreatmentMethod.AS_VALUE, since the replacement is a special value and not the mean or mode.
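The steps above can be sketched roughly as follows. This is not the jpmml-h2o code: Treatment and Feature are stand-ins I made up so the example compiles without the JPMML libraries, and XGBoostConverterSketch only mimics the shape of the planned change. Only the names specialMissingValueProcessing, toMojoModelSchema, and AS_VALUE come from the plan above; the real change would attach a decorator through ImputerUtil with MissingValueTreatmentMethod.AS_VALUE.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the converter change; not the jpmml-h2o API. */
final class XGBoostConverterSketch {

    /** Stand-in for the PMML treatment enum; only the constants used here. */
    enum Treatment { AS_MEAN, AS_MODE, AS_VALUE }

    /** Stand-in for a schema feature with an optional missing-value decorator. */
    static final class Feature {
        final String name;
        Treatment treatment;   // null means no missing-value handling
        String replacement;    // value substituted when the input is missing

        Feature(String name) { this.name = name; }
    }

    private final boolean specialMissingValueProcessing;
    private final Map<String, String> replacements;

    XGBoostConverterSketch(boolean specialMissingValueProcessing, Map<String, String> replacements) {
        this.specialMissingValueProcessing = specialMissingValueProcessing;
        this.replacements = replacements;
    }

    /** Returns the schema unchanged unless the flag is set (disabled by default). */
    List<Feature> toMojoModelSchema(List<Feature> schema) {
        if (!specialMissingValueProcessing) {
            return schema;
        }
        List<Feature> result = new ArrayList<>();
        for (Feature feature : schema) {
            Feature copy = new Feature(feature.name);
            copy.treatment = feature.treatment;
            copy.replacement = feature.replacement;
            String value = replacements.get(feature.name);
            if (value != null) {
                // User-supplied constant, so AS_VALUE rather than mean or mode.
                copy.treatment = Treatment.AS_VALUE;
                copy.replacement = value;
            }
            result.add(copy);
        }
        return result;
    }
}
```

Features without a user-supplied value are copied through untouched, so the output only differs from the current behavior for the fields the customer explicitly lists.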

@wendycwong

Line 85 of converter.java is where the missing value treatment is added to the encoder as a decorator.

schema = toMojoModelSchema(schema); // goto XGBoost....
