.unique() method of Rotation does not return correct indexing for inversion #272

Merged: 10 commits merged into pyxem:master on Feb 11, 2022

harripj
Collaborator

@harripj commented Feb 7, 2022

Rotation.unique appears to have a bug where the returned inverse map does not actually reconstruct the original data, which it should, as defined in the docs and the NumPy docs. This is due to some internal sorting, after which the returned unique rotations are no longer indexed correctly by the inverse map.

A quick solution is to remove the sorting and to add a test to ensure that the reconstructed data is the same as the original data. If there is some reason for the sorting, perhaps another solution can be found.
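The property such a test would check is the one NumPy guarantees for `np.unique`: indexing the unique values with the inverse map returns the original data. A minimal NumPy-only sketch of that check follows. It uses a plain `(N, 4)` float array as a stand-in for rotation data, which is an assumption for illustration (orix's own test works on `Rotation` objects), and `reconstructs` is a hypothetical helper name.

```python
import numpy as np


def reconstructs(data, unique, inverse):
    """Hypothetical helper: True if unique[inverse] rebuilds data row by row."""
    return np.allclose(unique[inverse], data)


rng = np.random.default_rng(0)
# 20 rows drawn (with repeats) from 5 distinct 4-vectors, so duplicates occur
base = rng.normal(size=(5, 4))
data = base[rng.integers(0, 5, size=20)]

u, inv = np.unique(data, axis=0, return_inverse=True)
inv = inv.ravel()  # keep the inverse 1-D regardless of NumPy version
print(reconstructs(data, u, inv))  # True
```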

EDIT
The index sorting turns out to be important, at least for the tests, so it is best to keep this functionality (not sure how I missed this before making the PR). In any case, the PR has been updated to keep the original index sorting but correct the inverse map.
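In plain NumPy terms, keeping the index sorting while correcting the inverse map looks roughly like the sketch below. `unique_first_occurrence` is a hypothetical helper, not orix's implementation; it mirrors the `idx_argsort`/`inv_map` steps shown in the profiler listings in this thread, on a small integer array so the result is easy to check by hand.

```python
import numpy as np


def unique_first_occurrence(data):
    """Hypothetical helper: unique rows ordered by first occurrence, plus a valid inverse."""
    # np.unique sorts the unique rows lexicographically; idx holds the position
    # of each unique row's first occurrence in ``data``.
    _, idx, inv = np.unique(data, axis=0, return_index=True, return_inverse=True)
    inv = inv.ravel()  # keep the inverse 1-D regardless of NumPy version
    # Reorder the uniques so they appear in order of first occurrence.
    order = np.argsort(idx)
    unique = data[idx[order]]
    # ``inv`` still points into the lexicographically sorted uniques, so map
    # each old (sorted) position to its new (first-occurrence) position.
    remap = np.empty_like(order)
    remap[order] = np.arange(order.size)
    return unique, remap[inv]


data = np.array([[3, 0], [1, 0], [3, 0], [2, 0], [1, 0]])
u, inv = unique_first_occurrence(data)
print(u.tolist())                    # [[3, 0], [1, 0], [2, 0]]
print(inv.tolist())                  # [0, 1, 0, 2, 1]
print(np.array_equal(u[inv], data))  # True
```

Without the remapping step, `inv` would be `[2, 0, 2, 1, 0]` (positions in NumPy's lexicographically sorted uniques), and `u[inv]` would return `[[2, 0], [3, 0], [2, 0], [1, 0], [3, 0]]` instead of the input: the same kind of mismatch the minimal example exposes.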

Progress of the PR

Minimal example of the bug fix or new feature

## before
>>> import numpy as np
>>> from orix.quaternion import Rotation
>>> r = Rotation.random((20,))
>>> u, inverse = r.unique(return_inverse=True)
>>> m = u[inverse] * ~r
>>> np.allclose(m.angle.data, 0)  # should be True if u[inverse] reconstructs r
False

For reviewers

  • The PR title is short, concise, and will make sense 1 year later.
  • [n/a] New functions are imported in corresponding __init__.py.
  • New features, API changes, and deprecations are mentioned in the
    unreleased section in CHANGELOG.rst.

@harripj added the bug (Something isn't working) label on Feb 7, 2022
@harripj
Collaborator Author

harripj commented Feb 7, 2022

The failing code style check seems to be the result of a new Black release (https://github.com/psf/black/releases), which now removes the spaces around the power operator ** (see psf/black#2726). I can replicate the new behaviour on my machine by updating to Black 22.1.0.

@hakonanes
Member

This looks like a good catch of a nasty bug.

I'm wondering why code using this method relies on the unique elements being sorted. I've certainly utilized this "feature", but not intentionally... It might be better to comply with NumPy without the fix, and then add a Rotation.sort() method, or something. I'll have a look.

Do you know if your fix increases memory use of Rotation.unique()? I remember the method being memory intensive before.

Thanks for looking up the reason for the code style failing, I've made an issue which I intend to fix shortly. We can continue on this PR without considering that check for now.

@harripj
Collaborator Author

harripj commented Feb 7, 2022

> Thanks for looking up the reason for the code style failing, I've made an issue which I intend to fix shortly. We can continue on this PR without considering that check for now.

I started to update this here, but have reverted the changes so you can continue with it in #273, which will make keeping track of the changes easier.

> I'm wondering why code using this method relies on the unique elements being sorted. I've certainly utilized this "feature", but not intentionally... It might be better to comply with NumPy without the fix, and then add a Rotation.sort() method, or something. I'll have a look.

> Do you know if your fix increases memory use of Rotation.unique()? I remember the method being memory intensive before.

I'm not sure why they need to be sorted, but the tests seem to rely on it, so I think it's easiest not to change this for now. In the fix there is an extra intermediate int array, the same size as Rotation, which is necessary to construct the inverse map. I just profiled the original code and the fix on a Rotation instance with shape (2000, 1500). There is a 23MB memory increase from this intermediate array and the same again from argsort, but it is small compared to the memory required for unique for example.
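The ~23 MB figure is consistent with simple arithmetic: a Rotation of shape (2000, 1500) holds 3,000,000 elements, and one integer index array over them costs 8 bytes per element, assuming 64-bit integers (NumPy's index type on typical 64-bit platforms). A quick check:

```python
# Size of one extra integer index array over a (2000, 1500) Rotation,
# assuming 8-byte (int64) indices as on typical 64-bit platforms.
import numpy as np

n = 2000 * 1500                         # 3,000,000 elements
itemsize = np.dtype(np.int64).itemsize  # 8 bytes
nbytes = n * itemsize

print(nbytes)                    # 24000000
print(round(nbytes / 2**20, 1))  # 22.9, i.e. the 22.9 MiB increments in the profiles
```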

Original code:

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     7    129.9 MiB    129.9 MiB           1   @profile
     8                                         def test():
     9
    10    246.7 MiB    116.9 MiB           1       rotation = Rotation.random((2000, 1500)).flatten()
    11                                             if True:
    12    498.5 MiB    251.8 MiB           1           abcd = rotation._differentiators()
    13                                             else:
    14                                                 abcd = np.stack(
    15                                                     [
    16                                                         rotation.a.data,
    17                                                         rotation.b.data,
    18                                                         rotation.c.data,
    19                                                         rotation.d.data,
    20                                                         rotation.improper,
    21                                                     ],
    22                                                     axis=-1,
    23                                                 ).round(6)
    24    796.1 MiB    297.5 MiB           1       _, idx, inv = np.unique(abcd, axis=0, return_index=True, return_inverse=True)
    25    818.9 MiB     22.9 MiB           1       idx_argsort = np.argsort(idx)
    26    841.8 MiB     22.9 MiB           1       idx_sort = idx[idx_argsort]
    27                                             # build inverse index map
    28                                             # inv_map = np.empty_like(idx_argsort)
    29                                             # inv_map[idx_argsort] = np.arange(idx_argsort.size)
    30                                             # inv = inv_map[inv]
    31    956.3 MiB    114.4 MiB           1       dat = rotation[idx_sort]
    32    956.3 MiB      0.0 MiB           1       dat.improper = rotation.improper[idx_sort]

Fix:

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     7    129.6 MiB    129.6 MiB           1   @profile
     8                                         def test():
     9
    10    246.1 MiB    116.5 MiB           1       rotation = Rotation.random((2000, 1500)).flatten()
    11                                             if True:
    12    497.9 MiB    251.8 MiB           1           abcd = rotation._differentiators()
    13                                             else:
    14                                                 abcd = np.stack(
    15                                                     [
    16                                                         rotation.a.data,
    17                                                         rotation.b.data,
    18                                                         rotation.c.data,
    19                                                         rotation.d.data,
    20                                                         rotation.improper,
    21                                                     ],
    22                                                     axis=-1,
    23                                                 ).round(6)
    24    796.8 MiB    298.9 MiB           1       _, idx, inv = np.unique(abcd, axis=0, return_index=True, return_inverse=True)
    25    819.7 MiB     22.9 MiB           1       idx_argsort = np.argsort(idx)
    26    842.6 MiB     22.9 MiB           1       idx_sort = idx[idx_argsort]
    27                                             # build inverse index map
    28    842.6 MiB      0.0 MiB           1       inv_map = np.empty_like(idx_argsort)
    29    865.5 MiB     22.9 MiB           1       inv_map[idx_argsort] = np.arange(idx_argsort.size)
    30    865.5 MiB      0.0 MiB           1       inv = inv_map[inv]
    31    979.9 MiB    114.4 MiB           1       dat = rotation[idx_sort]
    32    979.9 MiB      0.0 MiB           1       dat.improper = rotation.improper[idx_sort]
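The two `inv_map` lines in the fix invert the permutation `idx_argsort`: scattering `np.arange` into the positions a permutation names yields its inverse in O(n), without a sort. A small self-check of that identity follows; the comparison with `np.argsort(perm)`, which gives the same result via an O(n log n) sort, is added here for illustration and is not part of the PR.

```python
import numpy as np

rng = np.random.default_rng(42)
perm = rng.permutation(10)  # stand-in for idx_argsort

# O(n) inverse via scatter, as in the fix
inv_map = np.empty_like(perm)
inv_map[perm] = np.arange(perm.size)

# The same inverse via a second argsort (O(n log n), one more temporary)
assert np.array_equal(inv_map, np.argsort(perm))

# Composing a permutation with its inverse gives the identity
print(np.array_equal(inv_map[perm], np.arange(perm.size)))  # True
```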

@pc494 changed the title from "Unique inverse bug in Rotation" to ".unique() method of Rotation does not return correct indexing for inversion" on Feb 7, 2022
@harripj requested a review from pc494 on February 7, 2022 13:33
@pc494
Member

pc494 commented Feb 7, 2022

I spent some time on the unique code a while back and I recall deciding that the sorting did serve a purpose. I don't recall what that purpose was, but I think it's worth leaving it like that if we can.

@harripj
Collaborator Author

harripj commented Feb 7, 2022

Thanks @pc494, sounds like a good idea to leave it as is and just correct the inverse map.

@hakonanes
Member

> I started to update this here, but have reverted the changes so you can continue with it in #273, which will make keeping track of the changes easier.

Please feel free to make the separate PR with the Black updates, since you've already done it in this branch :)

> I'm not sure why they need to be sorted, but the tests seem to rely on it, so I think it's easiest not to change this for now.

I agree, sounds good.

> In the fix there is an extra intermediate int array, the same size as Rotation, which is necessary to construct the inverse map. I just profiled the original code and the fix on a Rotation instance with shape (2000, 1500). There is a 23MB memory increase from this intermediate array and the same again from argsort, but it is small compared to the memory required for unique for example.

Looks acceptable. I'm curious, with which software did you profile your code?

Member

@hakonanes left a comment

I'm happy since the tests pass and all the results in the user guide notebooks look like expected. Thanks for fixing this, @harripj.

@pc494
Member

pc494 commented Feb 7, 2022

I'm planning to review this too, but it might take a day or two.

@harripj
Collaborator Author

harripj commented Feb 7, 2022

> Please feel free to make the separate PR with the Black updates, since you've already done it in this branch :)

Will do!

> I'm curious, with which software did you profile your code?

I used memory_profiler, it provides a clear output which I think is easy to digest!
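For readers who want to reproduce this kind of line-by-line table: memory_profiler (a third-party package, installed with `pip install memory_profiler`) exposes a `profile` decorator, and running a script whose functions are decorated with it prints a report when each decorated function returns. A minimal sketch follows; `first_occurrence_uniques` is a hypothetical workload loosely modelled on the steps profiled in this thread, and the try/except fallback is our addition so the script still runs where the package is absent.

```python
import numpy as np

try:
    # Third-party package: pip install memory_profiler
    from memory_profiler import profile
except ImportError:
    # Fallback (our addition): a no-op decorator, so the sketch runs without
    # memory_profiler installed; no report is printed in that case.
    def profile(func):
        return func


@profile
def first_occurrence_uniques(n=300_000):
    """Hypothetical workload loosely modelled on Rotation.unique."""
    rng = np.random.default_rng(0)
    data = rng.integers(0, 10, size=(n, 4))  # small value range, many duplicate rows
    _, idx = np.unique(data, axis=0, return_index=True)
    return data[np.sort(idx)]  # unique rows in order of first occurrence


if __name__ == "__main__":
    # With memory_profiler installed, `python this_script.py` prints a
    # line-by-line memory table like the ones in this thread.
    first_occurrence_uniques()
```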

@hakonanes added this to the v0.8.1 milestone on Feb 7, 2022
@hakonanes mentioned this pull request on Feb 7, 2022
@hakonanes
Member

I've merged #274, so bringing master into this branch should make the code style check pass.

> I used memory_profiler, it provides a clear output which I think is easy to digest!

Thanks, will start to use this myself!

@harripj
Collaborator Author

harripj commented Feb 8, 2022

I will just add a line to the changelog now.

@hakonanes
Member

Good, I should have requested that during review, my bad.

Member

@pc494 left a comment

Happy with this, merging

@pc494 merged commit f1c1096 into pyxem:master on Feb 11, 2022
@harripj deleted the unique_inverse_bug branch on February 18, 2022 19:17
Labels: bug (Something isn't working)

3 participants