Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Labels are wrongly associated with scores #43

Closed
marcverhagen opened this issue Dec 16, 2023 · 4 comments
Closed

Labels are wrongly associated with scores #43

marcverhagen opened this issue Dec 16, 2023 · 4 comments

Comments

@marcverhagen
Copy link
Contributor

marcverhagen commented Dec 16, 2023

Because

Somewhere along the line we started assigning the wrong label, probably because the way we dealt with configurations has changed several times.

The config file modeling/config/trainer.yml has:

bins:
  pre:
    slate:
      - "S"
    chyron:
      - "I"
      - "N"
      - "Y"
    credit:
      - "C"

And the classifier config in /modeling/models/20231214-193543.convnext_tiny.kfold_config.yml has

labels:
- slate
- chyron
- credit

But the method train.get_final_label_names() returns

['chyron', 'credit', 'slate', 'NEG']

and since that method is eventually used to set labels on an instance of Prediction we get a mismatch.

@marcverhagen marcverhagen self-assigned this Dec 16, 2023
@marcverhagen
Copy link
Contributor Author

This was fixed in 71b57a2 by using the labels field from the model config file. An alternative would be to change

bins:
  pre:
    slate:
      - "S"
    chyron:
      - "I"
      - "N"
      - "Y"
    credit:
      - "C"

into

bins:
  pre:
    - slate:
      - "S"
    - chyron:
      - "I"
      - "N"
      - "Y"
    - credit:
      - "C"

and update train.get_final_label_names().

@keighrim
Copy link
Member

This is probably because of this; yaml/pyyaml#110.

Trainer config:

bins:
pre:
slate:
- "S"
chyron:
- "I"
- "N"
- "Y"
credit:
- "C"

After yaml.dump-ed (exported config), keys are sorted:

bins:
pre:
chyron:
- I
- N
- Y
credit:
- C
slate:
- S

Considering added label mapping complexity when we add postbin handling in a very near future, I don't think classifier relying on the labels values from the model_config is a good idea, simply because that value is the "final labels" after pre+post binning. To that end, I think we should remove the labels key all together from export process to eliminate a big source of confusion.

configs_copy = copy.deepcopy(configs)
configs_copy['labels'] = get_final_label_names(configs)

To fix the original wrong association issue, just adding sort_keys=False argument to yaml.dump call should be enough.

@marcverhagen
Copy link
Contributor Author

Agreed on both counts. I hadn't looked closely at the new export code you had added. And indeed having both bins and labels in the model configuration is redundant. I am doing some cleanup of the code now and will include those fixes.

@marcverhagen
Copy link
Contributor Author

Fixed in 0ab78e0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

No branches or pull requests

2 participants