Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add haversine cdist #30

Open
MuellerSeb opened this issue Apr 21, 2020 · 7 comments
Open

Add haversine cdist #30

MuellerSeb opened this issue Apr 21, 2020 · 7 comments
Labels

Comments

@MuellerSeb
Copy link

MuellerSeb commented Apr 21, 2020

Hey there,

nice package! I was wondering, if you could implement a routine to compute a pairwise distance matrix like scipy.spatial.distance.cdist does.

Cheers,
Sebastian

@MuellerSeb
Copy link
Author

@jdeniau
Copy link
Member

jdeniau commented Apr 21, 2020

Hi,

Thanks !

If your issue is with performance, #26 added the haversine_vector function to optimize performance.

You can see how to use it in the documentation

If that's not performance-related, can you explain why you need this ?

@MuellerSeb
Copy link
Author

cdist is constructing a n times m matrix for two input arrays with n and m points respectively. The vector version does not do this.

@jdeniau
Copy link
Member

jdeniau commented Apr 22, 2020

OK so let me rephrase to make sure I understand your needs :

You have a list of geopoints, and you need to calculate the distance between every points :

pseudo-code :

LYON = (45.7597, 4.8422)
PARIS = (48.8567, 2.3508)
NEW_YORK = (40.7033962, -74.2351462)

calculate_each_distances([LYON, PARIS, NEW_YORK])

and the output should be something like that :

[
  [    0,  392, 6163],
  [  392,    0, 5853],
  [ 6163, 5853,    0],
]

Is that it ?

@MuellerSeb
Copy link
Author

Almost. This is the pdist functionality.

One step further is, to provide two lists of points for rows and cols in the resulting matrix.

@Noezor
Copy link
Contributor

Noezor commented Feb 20, 2022

This already exists, see

unit = Unit.KILOMETERS
expected = [
[EXPECTED_LYON_PARIS[unit], EXPECTED_LONDON_PARIS[unit]],
[EXPECTED_LYON_NEW_YORK[unit], EXPECTED_LONDON_NEW_YORK[unit]]
]
assert_allclose( # See https://numpy.org/doc/stable/reference/generated/numpy.testing.assert_allclose.html#numpy.testing.assert_allclose
haversine_vector([LYON, LONDON], [PARIS, NEW_YORK], unit, comb=True),
expected
)

However, I believe the comb parameter is not clear from the docstring (one needs to read the implementation).
@jdeniau do you mind if I rework the docstring a bit?

@jdeniau
Copy link
Member

jdeniau commented Feb 27, 2022

@Noezor help yourself! Every improvement in the documentation is helpful 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants