Metrics dashboard api #6
base: master
Conversation
Signed-off-by: Miro Dudik <mdudik@gmail.com>
I've been thinking a bit about this over the weekend, and there are two slightly different sets of changes which are required.

The first is on the level of the entire dashboard JSON. I think that we should merge the

The second is the trickier one, and involves the contents of those dictionaries in the
```python
{
    "prediction_type": "binary_classification" or "probabilistic_binary_classification" or "regression",
```
Any reason why we're omitting multiclass classification?
Because we don't have any support for it yet, but we can definitely add other prediction types in the future.
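As a sketch of how the restriction could be enforced, a loader might validate the `prediction_type` field against the currently supported values; adding multiclass later would only mean extending the set. The function and constant names here are hypothetical, not part of the proposed schema:

```python
# Hypothetical validation sketch; the supported values follow the
# schema snippet above.
SUPPORTED_PREDICTION_TYPES = {
    "binary_classification",
    "probabilistic_binary_classification",
    "regression",
}

def validate_prediction_type(dashboard_dict):
    """Raise if the dashboard JSON declares an unsupported prediction type."""
    prediction_type = dashboard_dict.get("prediction_type")
    if prediction_type not in SUPPORTED_PREDICTION_TYPES:
        raise ValueError(
            f"Unsupported prediction_type: {prediction_type!r}; "
            f"expected one of {sorted(SUPPORTED_PREDICTION_TYPES)}"
        )
    return prediction_type
```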
```python
    "name": "y_true",
    "values": [0, 1, 1, 1, 0],
},
"sample_weight": {
```
This is user-provided, not the one set within ExponentiatedGradient, right?
If so, perhaps it's worth documenting this with a short comment
Will do. This is just an example of an array that we may want to pass to the metrics, since many metrics accept this kind of argument.
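To illustrate the point about metrics taking such an array, here is a minimal, self-contained weighted metric in the style of scikit-learn's optional `sample_weight` parameter; the user-provided `"sample_weight"` values from the dashboard dict would be passed straight through:

```python
def weighted_accuracy(y_true, y_pred, sample_weight=None):
    """Toy metric showing how a per-sample weight array is consumed.

    This mirrors the convention used by many metrics: sample_weight
    is optional and defaults to uniform weights.
    """
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    total = sum(sample_weight)
    correct = sum(w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p)
    return correct / total
```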
```python
"sensitive_feature gender" : {  # an example feature
    "name": "gender",
    "values": [0, 1, 0, 0, 2],
    "value_names": ["female", "male", "non-binary"],
```
Should there be a 'type' field in here, so things like 'prediction' and 'sensitive_feature' don't have to go into the key?
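Separately from the key-naming question, the optional `value_names` field in the snippet above is enough to recover the categorical labels from the integer-encoded `values`. A small sketch of that decoding (the helper name is hypothetical):

```python
def decode_values(feature):
    """Map integer-encoded 'values' back to categorical names using
    the optional 'value_names' field; pass through if it is absent."""
    names = feature.get("value_names")
    if names is None:
        return list(feature["values"])
    return [names[v] for v in feature["values"]]
```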
```python
},
"cache" : [
    {
        "function": string,  # python function name; we could either limit to fairlearn.metrics
```
Use fully qualified names for sure.
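With fully qualified names, resolving the cached `"function"` string back to a callable is a one-liner over the standard library. A sketch, not tied to any particular metrics module:

```python
import importlib

def resolve_metric(qualified_name):
    """Resolve a fully qualified function name, e.g. 'fairlearn.metrics.<fn>',
    to the callable it denotes."""
    module_name, _, attr = qualified_name.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr)
```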
```python
"return_value": {
    "overall": 0.11,
    "by_group": {
        "keys": [0, 1, 2],
```
Are the 'keys' necessary, if we required all categoricals to be integer-encoded?
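One argument for keeping explicit keys is that the groups present in the data may be non-contiguous (e.g. only groups 0 and 2 appear), so positional indexing alone would be ambiguous. Assuming the `by_group` entry carries a parallel per-group values array alongside `keys` (the field name `values` here is an assumption, since the snippet is truncated):

```python
def by_group_as_dict(by_group):
    """Pair the cached 'keys' with their per-group metric values.

    Explicit keys stay meaningful when group ids are non-contiguous,
    which positional indexing could not express.
    """
    return dict(zip(by_group["keys"], by_group["values"]))
```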
```python
"<array_key>" : {  # the keys can be arbitrary strings; not sure we need to force any convention, but see examples below
    "name": string,  # the name of a feature would be the feature name, of a prediction vector would be the model name
    "values": number[],
    "value_names": string[],  # an optional field to encode categorical data
```
Presumably we also specify that extra keys (e.g. inserted by AzureML) are to be preserved.
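For concreteness, building one such `"<array_key>"` entry from raw data could look like the following; the helper name is hypothetical and `value_names` stays optional exactly as in the schema sketch:

```python
def make_array_entry(name, values, value_names=None):
    """Build one '<array_key>' entry per the schema sketch above.

    'value_names' is only emitted for categorical data, so numeric
    arrays stay minimal.
    """
    entry = {"name": name, "values": list(values)}
    if value_names is not None:
        entry["value_names"] = list(value_names)
    return entry
```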
This is something I've started cogitating on again, in the context of the AzureML (now MLFlow) integration. We do want to enable composability, but also avoid saving lots of copies of

Then again, we also need an API which allows users to 'mess around in a notebook' without having to set up a bunch of prerequisites. A small example of this is how the dashboard doesn't require model and sensitive feature names, but will generate them itself if invoked without them. In contrast