
Added the ndcg metric [WIP] #2632

Status: Draft. Wants to merge 38 commits into base: master.

Conversation

@kamalojasv181 (Contributor) commented Jul 23, 2022

Related #2631

Description: This is the implementation [WIP] for the NDCG metric.

Check list:

  • Create a basic implementation.
  • Handle rating ties.
  • Write and parameterize tests.
  • Add comments about its use.
  • Add exponential implementation.
  • Add docs.
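
For reference, NDCG divides the discounted cumulative gain (DCG) of the ranking induced by the predictions by the DCG of the ideal ranking. Below is a minimal sketch of that core computation in torch, assuming raw relevance gains and a log2 discount; the function names and signatures are illustrative, not the PR's final API.

import torch

def dcg(y_pred, y_true, k=None):
    # Sort the true relevances by decreasing predicted score, then apply log2 discounts.
    order = torch.argsort(y_pred, dim=-1, descending=True)
    gains = torch.gather(y_true, -1, order)
    if k is not None:
        gains = gains[..., :k]
    discounts = torch.log2(torch.arange(gains.shape[-1], dtype=torch.float) + 2.0)
    return (gains / discounts).sum(dim=-1)

def ndcg(y_pred, y_true, k=None):
    # Normalize by the DCG of the ideal (perfectly sorted) ranking.
    # Note: samples with all-zero relevance would need special handling to avoid 0/0.
    ideal = dcg(y_true, y_true, k)
    actual = dcg(y_pred, y_true, k)
    return actual / ideal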

@github-actions github-actions bot added the module: metrics Metrics module label Jul 23, 2022
@kamalojasv181 kamalojasv181 marked this pull request as draft July 23, 2022 16:45
@vfdev-5 (Collaborator) left a comment
@kamalojasv181 thanks for the PR. I left a few comments on a few points, but I haven't yet explored the code that computes NDCG.

Can you please write a few tests against scikit-learn as a reference?
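
For illustration, such a reference test could compare against sklearn.metrics.ndcg_score roughly as sketched below; the NDCG class name, its module path, the k parameter, and the (y_pred, y_true) update format are assumptions at this stage of the PR.

import pytest
import torch
from sklearn.metrics import ndcg_score

from ignite.metrics.ndcg import NDCG  # assumed class name and module path at this stage


@pytest.mark.parametrize("k", [None, 3])
def test_ndcg_against_sklearn(k):
    # Random relevances and predictions for a small batch of queries.
    y_pred = torch.rand(8, 10)
    y_true = torch.rand(8, 10)

    metric = NDCG(k=k)
    metric.update((y_pred, y_true))
    result = metric.compute()

    # scikit-learn expects (y_true, y_score) order and returns the mean NDCG.
    reference = ndcg_score(y_true.numpy(), y_pred.numpy(), k=k)
    assert result == pytest.approx(reference)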

[8 inline review comments on ignite/metrics/ndcg.py, resolved]
@vfdev-5 (Collaborator) left a comment

Thanks a lot for the update @kamalojasv181!
I left a few other comments on the implementation and the API.

Let's also start working on docs and tests.

[4 inline review comments on ignite/metrics/ndcg.py, resolved]
@kamalojasv181 (Contributor, Author) commented:
Thanks for all the feedback. I will revert with a pull request addressing everything!

@vfdev-5 (Collaborator) commented Jul 24, 2022

> Thanks for all the feedback. I will revert with a pull request addressing everything!

You can just continue working with this pull request, no need to revert anything.

@vfdev-5 (Collaborator) left a comment

Thanks for the updates @kamalojasv181!
I have a few other code update suggestions.

[8 inline review comments on ignite/metrics/recsys/ndcg.py and tests/ignite/metrics/test_ndcg.py, resolved]
@kamalojasv181 (Contributor, Author) commented:
@vfdev-5 there are a bunch of things I did in this commit:

  1. Put the data on the GPU for the tests.
  2. Replaced torch.add with +.
  3. Changed the naming and ordering to match ignite conventions, i.e. (y_pred, y_true, ...).
  4. Added a function to handle ties and wrote corresponding tests (a part of it is un-vectorized; TODO: think of an implementation to vectorize it).
  5. Added a check that log_base is > 0 and not equal to 1 (see the sketch at the end of this comment).
  6. Added tests for log_base and the exponential variant.
  7. Separated _ndcg_sample_score and _dcg_sample_score from the NDCG class.
     NOTE: the log_base tests could have been done together with the other tests, but I put them separately so that if sklearn changes their code, we know about it (as those tests would start failing).

If there is anything else, let me know before I finally add some comments against the class and the documentation.
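
For reference, a hedged sketch of the log_base check and exponential-gain option described in points 5 and 6 above; the names are illustrative and not necessarily the PR's exact code.

import torch

def _check_log_base(log_base):
    # The logarithmic discount log(rank + 1) / log(log_base) is only well defined
    # for a positive base different from 1.
    if log_base <= 0 or log_base == 1:
        raise ValueError(f"Argument log_base should be positive and not equal to one, but got {log_base}")

def _exponential_gain(y_true):
    # Optional "exponential" gain form: 2**relevance - 1 instead of the raw relevance.
    return torch.pow(2.0, y_true) - 1.0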

[4 inline review comments on ignite/metrics/recsys/ndcg.py and tests/ignite/metrics/test_ndcg.py, resolved]
Comment on lines +52 to +54
discounted_gains = torch.tensor(
    [_tie_averaged_dcg(y_p, y_t, discount_cumsum, device) for y_p, y_t in zip(y_pred, y_true)], device=device
)
@vfdev-5 (Collaborator) commented:
So, there is no way to make it vectorized, i.e. without a for-loop?

@kamalojasv181 (Contributor, Author) replied:

I haven't checked yet. For now I have added this implementation; it's a TODO.
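
For context, the per-sample helper called inside that loop follows scikit-learn's tie-averaged DCG; a sketch of such a helper is below. The argument order and device handling are taken from the snippet above, but the body is an assumption, not necessarily the PR's code.

import torch

def _tie_averaged_dcg(y_pred, y_true, discount_cumsum, device):
    # Group items sharing the same predicted score ("ties"), average the true gains
    # within each tie group, and weight each group by the discounts it spans.
    _, inv, counts = torch.unique(-y_pred, return_inverse=True, return_counts=True)
    ranked = torch.zeros(len(counts), dtype=torch.float, device=device)
    ranked.index_put_([inv], y_true.to(ranked.dtype), accumulate=True)
    ranked /= counts
    groups = torch.cumsum(counts, dim=-1) - 1
    discount_sums = torch.empty(len(counts), device=device)
    discount_sums[0] = discount_cumsum[groups[0]]
    discount_sums[1:] = torch.diff(discount_cumsum[groups])
    return torch.sum(ranked * discount_sums)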

@vfdev-5 (Collaborator) commented Aug 29, 2022

@kamalojasv181 thanks for the updates, I left a few nits.
The next step is to write a few tests for the distributed configuration. @puhuk please provide a few guidelines here on how to write them. Thanks!

@puhuk (Contributor) commented Aug 30, 2022

@vfdev-5 @kamalojasv181

Below is a checklist for tests in a DDP configuration:

  • Generate the data with a different random seed per process
    • torch.manual_seed(12 + rank + i)
  • The data generated per rank should have size n_iters * batch_size, so that update() consumes it one batch_size slice at a time
    • y_true = torch.randint(0, n_classes, size=(n_iters * batch_size,))
  • Feed the data batch by batch in update()
    • y_pred[i * batch_size : (i + 1) * batch_size]
  • Check that the data generated on each process is gathered for computing (after running the Engine)
    •    engine.run(data=data, max_epochs=n_epochs)
         y_true = idist.all_gather(y_true)
         y_preds = idist.all_gather(y_preds)
  • Check that the calculated metric matches the reference
    • pytest.approx(calculated_value) == reference_value

The whole process should look like the following (or you can refer to tests/ignite/metrics/test_accuracy.py):

import pytest
import sklearn.metrics
import torch

import ignite.distributed as idist
from ignite.engine import Engine
from ignite.metrics import Accuracy


def _test_distrib_integration_list_of_tensors_or_numbers(device):

    rank = idist.get_rank()
    n_iters, batch_size, n_classes, n_epochs = 80, 16, 10, 1
    metric_device = device

    # data generation: different seed per process
    torch.manual_seed(12 + rank)
    y_true = torch.randint(0, n_classes, size=(n_iters * batch_size,)).to(device)
    y_preds = torch.rand(n_iters * batch_size, n_classes).to(device)

    # feed one batch per iteration in update()
    def update(engine, i):
        return (
            y_preds[i * batch_size : (i + 1) * batch_size, :],
            y_true[i * batch_size : (i + 1) * batch_size],
        )

    # initialize the Engine and attach the metric
    engine = Engine(update)
    acc = Accuracy(device=metric_device)
    acc.attach(engine, "acc")

    data = list(range(n_iters))
    engine.run(data=data, max_epochs=n_epochs)

    # gather the data from all processes
    y_true = idist.all_gather(y_true)
    y_preds = idist.all_gather(y_preds)

    res = engine.state.metrics["acc"]

    # calculate the reference value with scikit-learn and compare
    true_res = sklearn.metrics.accuracy_score(y_true.cpu().numpy(), torch.argmax(y_preds, dim=1).cpu().numpy())
    assert pytest.approx(res) == true_res

Comment on lines 163 to 166
return (
    [v for v in y_preds[i * batch_size : (i + 1) * batch_size, ...]],
    [v for v in y_true[i * batch_size : (i + 1) * batch_size]],
)
@vfdev-5 (Collaborator) commented Aug 31, 2022
@kamalojasv181 Why do you return a tuple of two lists instead of a tuple of two tensors?

@kamalojasv181 (Contributor, Author) replied:

I have taken inspiration from the code provided by @puhuk. Here, each element of the list is a batch. We feed our engine one batch at a time, so using a list is also OK. To maintain uniformity across the code, I have kept it this way.

@vfdev-5 (Collaborator) replied:

Oh, I see, he provided the wrong link. Yes, in accuracy we also have a test on lists of tensors and numbers, but that is atypical. Here is a typical example:

def _test_distrib_integration_multilabel(device):

    rank = idist.get_rank()

    def _test(n_epochs, metric_device):
        metric_device = torch.device(metric_device)
        n_iters = 80
        batch_size = 16
        n_classes = 10

        torch.manual_seed(12 + rank)
        y_true = torch.randint(0, 2, size=(n_iters * batch_size, n_classes, 8, 10)).to(device)
        y_preds = torch.randint(0, 2, size=(n_iters * batch_size, n_classes, 8, 10)).to(device)

        def update(engine, i):
            return (
                y_preds[i * batch_size : (i + 1) * batch_size, ...],
                y_true[i * batch_size : (i + 1) * batch_size, ...],
            )

        engine = Engine(update)

        acc = Accuracy(is_multilabel=True, device=metric_device)
        acc.attach(engine, "acc")

        data = list(range(n_iters))
        engine.run(data=data, max_epochs=n_epochs)

        y_true = idist.all_gather(y_true)
        y_preds = idist.all_gather(y_preds)

        assert (
            acc._num_correct.device == metric_device
        ), f"{type(acc._num_correct.device)}:{acc._num_correct.device} vs {type(metric_device)}:{metric_device}"

        assert "acc" in engine.state.metrics
        res = engine.state.metrics["acc"]
        if isinstance(res, torch.Tensor):
            res = res.cpu().numpy()

        true_res = accuracy_score(to_numpy_multilabel(y_true), to_numpy_multilabel(y_preds))

        assert pytest.approx(res) == true_res

    metric_devices = ["cpu"]
    if device.type != "xla":
        metric_devices.append(idist.device())
    for metric_device in metric_devices:
        for _ in range(2):
            _test(n_epochs=1, metric_device=metric_device)
            _test(n_epochs=2, metric_device=metric_device)

@vfdev-5 (Collaborator) commented Aug 31, 2022

@sadra-barikbin can you check why tests/ignite/metrics/test_precision.py::test_binary_input[None] is failing systematically?

E       assert array([1.]) == approx([1.0 ± 1.0e-06, 0.0 ± 1.0e-12])
E         Impossible to compare arrays with different shapes.
E         Shapes: (2,) and (1,)

tests/ignite/metrics/test_precision.py:130: AssertionError

@puhuk (Contributor) commented Dec 7, 2022

@kamalojasv181 Hi, do you need any help to finalize this PR? Please feel free to let me and @vfdev-5 know :)

@ili0820 commented Aug 27, 2023

Any updates on this? If it is not finished yet, I'd love to contribute. @vfdev-5

@vfdev-5 (Collaborator) commented Aug 27, 2023

Yes, this PR is not finished, unfortunately. @ili0820 if you can help with getting it landed, that would be great!

@ili0820 commented Aug 30, 2023

After reviewing all the comments, as far as I understand, the testing part is the last and only problem that has not been resolved yet; this seems resolved and this is the problem.
Is this correct? @vfdev-5

@vfdev-5 (Collaborator) commented Aug 30, 2023

> After reviewing all the comments, as far as I understand, the testing part is the last and only problem that has not been resolved yet; this seems resolved and this is the problem. Is this correct? @vfdev-5

@ili0820 yes, a few things remain here:

In case you are not familiar with DDP, please check: https://pytorch-ignite.ai/tutorials/advanced/01-collective-communication/. As for testing best practices, we would now like to use the distributed fixture, as here:

def test_distrib_integration(distributed, metric_device):

If you have any questions, you can reach out to us on Discord in the #start-contributing channel.
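
For orientation, a minimal sketch of what such a fixture-based test might look like; only the distributed fixture and the quoted signature come from this thread, everything else is an assumption.

import pytest
import torch

import ignite.distributed as idist


@pytest.mark.parametrize("metric_device", ["cpu"])
def test_distrib_integration(distributed, metric_device):
    # The distributed pytest fixture sets up the process group; inside the test
    # we only query the current rank and device through idist.
    rank = idist.get_rank()
    device = idist.device()
    torch.manual_seed(12 + rank)
    # ... generate per-rank data, attach the NDCG metric to an Engine, run it,
    # all_gather the inputs, and compare against a scikit-learn reference.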

@ili0820 mentioned this pull request on Aug 31, 2023
Labels: module: metrics (Metrics module)
Participants: 4