404 error when attempting to load Hipparcos data #454

Closed
resistor opened this issue Sep 24, 2020 · 16 comments

@resistor

When I do load.open(hipparcos.URL) on a fresh install of skyfield, I'm getting this error:

OSError: cannot download http://cdsarc.u-strasbg.fr/ftp/cats/I/239/hip_main.dat.gz because HTTP Error 404: Not Found

@brandon-rhodes
Member

@resistor — I can confirm that I get the error as well, using an independent tool that's not Python-related:

$ curl --head http://cdsarc.u-strasbg.fr/ftp/cats/I/239/hip_main.dat.gz
HTTP/1.1 404 Not Found
Date: Thu, 24 Sep 2020 03:09:44 GMT
Server: Apache/2.4.29 (Ubuntu)
Content-Type: text/html; charset=iso-8859-1

I have emailed the link that the FTP site advertises for assistance, and will let you know if I hear anything back! In the meantime, feel free to search for other sources for the data file, and let me know if you find alternative working URLs for it. Thanks!
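The same check can be scripted when several candidate URLs need testing. This is a minimal sketch using only the Python standard library; `head_status` is a hypothetical helper name (not part of Skyfield), and the `opener` parameter exists only so the function can be exercised without network access. It applies to HTTP(S) URLs only, not to `ftp://` ones.

```python
import urllib.error
import urllib.request


def head_status(url, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status code of a HEAD request to `url`."""
    request = urllib.request.Request(url, method='HEAD')
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as e:
        # urllib raises on 4xx/5xx responses; the status code is on the exception.
        return e.code
```

Calling `head_status('http://cdsarc.u-strasbg.fr/ftp/cats/I/239/hip_main.dat.gz')` would be the equivalent of the `curl --head` command above; what it returns depends on the state of the server at the time.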

@gilleslandais

Hi -
This FTP URL is indeed not persistent, because tables in VizieR may or may not be stored gzipped (and that can change over time).

It is better to query the VizieR table by URL (for tables of acceptable volume):
http://vizier.u-strasbg.fr/viz-bin/asu-txt?-source=I/239/hip_main&-out.max=unlimited
or
https://cdsarc.unistra.fr/viz-bin/nph-Cat/text?I/239/hip_main.dat (independent of gzipped storage)

However, in the long term, the more secure and persistent ways to reach VizieR tables are the VizieR DOI (if one exists, e.g. 10.26093/cds/vizier.51470007) or the landing page https://vizier.unistra.fr/viz-bin/cat/I/239

@brandon-rhodes
Member

@gilleslandais — Do you have any information about why the file was unzipped?

I am looking at your two links. Am I correct that they seem to present the data in a different format than the one documented for the Hipparcos catalog at the following link? Traditionally the file is |-delimited (pipe-delimited). I ask because Skyfield needs to support both users who already have hip_main.dat.gz downloaded, as well as those who will now need to be emergency-switched to a new URL, and if the two URLs give different formats for the data, then not only must I spend time to write a new parser, but Skyfield would need to dynamically determine whether the file was in an old or new format. Thanks for any further information you can provide!

https://heasarc.gsfc.nasa.gov/W3Browse/star-catalog/hipparcos.html

@brandon-rhodes
Member

brandon-rhodes commented Sep 24, 2020

@resistor — I may have found an alternative URL that we could use for the moment, to get the library working again. Could you try this instead?

ftp://dbc.nao.ac.jp/DBC/NASAADC/catalogs/1/1239/hip_main.dat.gz

@brandon-rhodes
Member

(And for my own records when I return to this issue in the future: the hip_main.dat in this directory is at least in the traditional format, though uncompressed.)

ftp://cdsarc.u-strasbg.fr/pub/cats/I/239/

@brandon-rhodes
Member

(It also seems to still exist at Harvard, though it looks like it might be a simple mirror of Vizier, in which case the file will disappear as soon as the next mirror update is complete:)

http://vizier.cfa.harvard.edu/ftp/cats/i/239/hip_main.dat.gz

@gilleslandais

gilleslandais commented Sep 24, 2020

There is no particular reason why it was unzipped, but it can happen (sorry for this response).
I discourage using FTP access if you need a persistent URL.

Just some clarifications on the tables and their format in VizieR:

The CDS provides the "original table" and the enriched VizieR table.

  • the "original tables" (available by FTP or HTTP) are stored in ASCII-CDS format. This format comes from FORTRAN: the tables require the byte-by-byte section of the catalogue's ReadMe (the same format is used for MRT tables in the AAS journals)
  • the VizieR tables contain additional information (for example, added columns that link to Simbad). The VizieR service provides the tables in different formats: TSV, ASCII-CDS, FITS, VOTable...

The ReadMe and its byte-by-byte section describe the "original data" ONLY.

The FORTRAN format is a blank-aligned format; the | characters are not required. If you use astropy, you can read the original table using the reader documented at https://docs.astropy.org/en/stable/api/astropy.io.ascii.Cds.html
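To illustrate why the | characters are optional in a byte-by-byte format, here is a minimal sketch in plain Python. The column layout and the sample rows are made up for illustration; they are NOT the real Hipparcos ReadMe positions, which would have to be taken from the catalogue's own byte-by-byte section.

```python
# Hypothetical byte-by-byte layout (1-based, inclusive byte positions),
# NOT the real Hipparcos ReadMe: name -> (first byte, last byte, converter).
LAYOUT = {
    'id': (1, 6, int),
    'ra': (8, 15, float),
    'dec': (17, 24, float),
    'vmag': (26, 30, float),
}


def parse_line(line, layout=LAYOUT):
    """Slice one fixed-width record; blank fields become None."""
    record = {}
    for name, (first, last, convert) in layout.items():
        text = line[first - 1:last].strip()
        record[name] = convert(text) if text else None
    return record


# Made-up sample rows: one blank-aligned, one with '|' in the gap columns.
rows = [
    '     1  1.23456 -5.43210  9.10',
    '     2| 2.50000|+0.12500|     ',
]
for row in rows:
    print(parse_line(row))
```

Because fields are located by byte position, the characters between them (blank or |) are never read, so both rows parse the same way.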

@brandon-rhodes
Member

the "original tables" (available by FTP or HTTP) are stored in ASCII-CDS format. This format comes from FORTRAN: the tables require the byte-by-byte section of the ReadMe catalogue (the same format is used for MRT table in the AAS journals)

Ah, thank you, @gilleslandais, for that clarification. So the file:

http://cdsarc.u-strasbg.fr/ftp/cats/I/239/hip_main.dat

— is indeed in the original format and could be used by Skyfield’s parser without modification. But, it would disappear and become a broken URL in the future if someone were to gzip the file again like it used to be.

Whereas, the table-query URLs that provide the data in an alternative enhanced format should always work, but Skyfield would not at this point be able to parse their output.

I may move to the more modern table-download format someday, but as I leave for vacation Saturday, I should not be attempting any large changes to Skyfield this week. I will probably try switching to the dbc.nao.ac.jp FTP site as a quick fix that I can release today, then think about a longer-term solution when I am back at the keyboard after vacation.

Thanks for replying quickly in the middle of your workday, @gilleslandais! It is amazing how quickly collaboration can happen between continents thanks to technology. It used to be that if an American had a question about a star catalog, they had to wait for the mail to be carried across the ocean on a wooden boat!

brandon-rhodes added a commit that referenced this issue Sep 24, 2020
This old experimental `NamedStar` API from 2015, which was never
documented, is now broken because the Hipparcos catalog is not (for the
moment) being downloaded in compressed form (see #454).  Rather than
delay today’s release to fix `NamedStar`, let’s remove it.

This is a dicey and uncharacteristic decision for me.  I usually pride
myself on not breaking anything that appears in a file like `api.py` and
that someone might have started using through their own research of the
code.  But in this case, with the function marked as deprecated for
several years, I am going to chance it.

Thanks to the original author, though, as the experiment led eventually
to the modern approach of loading stars using a Pandas Dataframe!

The actual dictionary of named stars is retained, per promises in #304.
@brandon-rhodes
Member

I have just released Skyfield 1.28 which (fingers crossed) makes a successful switch to the new uncompressed URL. I will keep this issue open for the next few weeks, though, in the hope that I have time to learn about the other, better-supported table formats in which VizieR can return data tables.

In the meantime, I'll move this to being a feature request, as this issue will now track the new code involved and no longer this pre-1.28 breakage of the URL.

@brandon-rhodes brandon-rhodes changed the title 404 error when attempting to load Hipparcos data Switch from Hipparcos raw file to table download Sep 25, 2020
@Bernmeister

Bernmeister commented Sep 25, 2020

@brandon-rhodes Crazy idea but why not put the onus of sourcing data files (hip_main.dat.gz and whatever planets.bsp you feel is most recent/correct) on to the end user (API caller)?

Make it clear in the documentation where to get these files (including multiple/alternative sites). You could even extend this to the timing component (cannot remember exactly what Skyfield downloads to maintain time as such but you referred to ∆T in a recent issue).

In short, it's really nice for Skyfield to download all this for me, but if Skyfield can be viewed as a library/engine, then the data should come from the caller. I do this anyway substituting de438.bsp for de421.bsp (and I even trim down de438.bsp to a smaller file called planets.bsp).

I know this is a can of worms, but this issue would only affect a new caller (customer) of the library. The rest of us would already have downloaded a local copy of hip_main.dat.gz. All you'd need to have done is change the documentation to alert users the URL had changed to no longer have the .gz (and of course be able to handle a non .gz input).
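One way to handle both a .gz and a non-.gz input is to look at the file's first two bytes rather than its name, since gzip files start with the magic bytes 1f 8b. This is a minimal sketch using only the standard library; `open_maybe_gzipped` is a hypothetical helper, not a Skyfield function.

```python
import gzip

GZIP_MAGIC = b'\x1f\x8b'  # first two bytes of every gzip stream


def open_maybe_gzipped(path):
    """Open `path` for binary reading, decompressing if it is gzipped."""
    with open(path, 'rb') as f:
        is_gzipped = f.read(2) == GZIP_MAGIC
    return gzip.open(path, 'rb') if is_gzipped else open(path, 'rb')
```

With this approach a caller could keep an old hip_main.dat.gz or a newly downloaded hip_main.dat and read either through the same code path. Whether a given Skyfield version accepts the resulting file object directly is not verified here.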

@brandon-rhodes
Member

@Bernmeister — It is, indeed, a can of worms! I am going to be away-from-keyboard for the next week, but when I get back I'll think about the spectrum of choices between "Skyfield downloads everything" and "Skyfield downloads nothing". I've been adjusting that as I go along, and it will be helpful to review where things now stand.

In the meantime, have a good week, and I'll try to remember to comment here when I'm back!

@swshadle

swshadle commented Oct 18, 2020

Hey, everyone,

I ran into the same error but found a downloaded copy of the .gz file from when I'd previously run load.open successfully. If you can find a copy of the .gz file anywhere (online, or in your own files if you've previously had success), put it into a subdirectory called data and try this. If they ever put the .gz file back up, the first try branch will work.

from skyfield.api import load
from skyfield.data import hipparcos

df = None
try:
    # Try the usual URL first; this works again if the .gz file is restored.
    with load.open(hipparcos.URL) as f:
        df = hipparcos.load_dataframe(f)
except IOError as e:
    print(e)
    print('looking for a local copy stored in "data" subdirectory')
    try:
        with load.open('./data/hip_main.dat.gz') as f:
            df = hipparcos.load_dataframe(f)
    except IOError as e:
        print(e)

# Report success only if one of the two attempts produced a dataframe.
if df is not None:
    print('dataframe loaded successfully')
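The same fallback idea extends to any number of sources, for example the mirrors mentioned earlier in this thread. The following is only a sketch: `first_loadable` is a hypothetical helper, and the loader is passed in as a callable so the function itself does not depend on Skyfield.

```python
def first_loadable(sources, loader):
    """Return (source, result) for the first source that `loader` can load.

    `loader` is any callable that takes one source (URL or path) and
    raises OSError when that source is unavailable.
    """
    errors = []
    for source in sources:
        try:
            return source, loader(source)
        except OSError as e:
            errors.append(f'{source}: {e}')
    raise OSError('no source could be loaded:\n' + '\n'.join(errors))

# Possible use with Skyfield (assumes skyfield is installed):
#
#     def load_hipparcos(source):
#         with load.open(source) as f:
#             return hipparcos.load_dataframe(f)
#
#     source, df = first_loadable(
#         [hipparcos.URL, './data/hip_main.dat.gz'], load_hipparcos)
```

Collecting the individual error messages means that, if every source fails, the final exception shows why each one was rejected.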

@me-at-git-hub

Is this helpful? I do not have the physical published cd. I found this today browsing around the site. An alternate or new permanent location? This might possibly save a lot of work.

http://cdsarc.u-strasbg.fr/ftp/I/239/version_cd/cats/

Is the prize in that directory?

[DIR] Parent Directory
[   ] hip_dm.idx.gz           14-May-1997 17:16   86K
[   ] hip_dm_c.dat.gz         14-May-1997 17:16  2.4M
[   ] hip_dm_c.idx.gz         14-May-1997 17:16   86K
[   ] hip_dm_g.dat.gz         14-May-1997 17:16  147K
[   ] hip_dm_o.dat.gz         14-May-1997 17:16   28K
[   ] hip_dm_v.dat.gz         14-May-1997 17:16   19K
[   ] hip_dm_x.dat.gz         14-May-1997 17:16   13K
[   ] hip_ep.dat.gz           26-May-1997 10:20  120M
[   ] hip_ep.idx.gz           27-May-2019 15:26  371K
[   ] hip_ep_c.dat.gz         23-Apr-1997 16:23  325K
[   ] hip_ep_e.dat.gz         23-Apr-1997 16:23  192M
[   ] hip_i.dat.gz            02-May-1997 11:49  157M
[   ] hip_i.idx.gz            02-May-1997 11:49  345K
[   ] hip_j.dat.gz            02-May-1997 14:48  169M
[   ] hip_j.idx.gz            02-May-1997 14:48  155K
[   ] hip_main.dat.gz         14-May-1997 17:16   15M
[   ] hip_main.idx.gz         14-May-1997 17:16  246K
[   ] hip_rgc.dat.gz          02-May-1997 11:49   72K
[   ] hip_va.idx.gz           14-May-1997 17:16   40K
[   ] hip_va_1.dat.gz         14-May-1997 17:16  106K
[   ] hip_va_2.dat.gz         14-May-1997 17:16  148K
[   ] solar.idx.gz            14-May-1997 17:16   761
[   ] solar_ha.dat.gz         14-May-1997 17:16  115K
[   ] solar_hp.dat.gz         14-May-1997 17:16   53K
[   ] solar_t.dat.gz          14-May-1997 17:16   11K
[   ] tyc_ep.dat.gz           23-Apr-1997 13:49  175M
[   ] tyc_ep.idx.gz           23-Apr-1997 13:49  245K
[   ] tyc_main.dat.gz         14-May-1997 17:16  104M
[   ] tyc_main.idx.gz         14-May-1997 17:16   30K

@brandon-rhodes
Member

Interesting! It depends on whether the catalog was revised after being distributed on CD in 1997. I won't have time to investigate that soon (I'll be away from the keyboard much of this week), but it would be interesting to compare the modern catalog file with the 1997 version that you link to above.
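A quick first comparison, once both files are downloaded, could hash their decompressed contents. This is a sketch under the assumption that byte-for-byte identity is the question of interest; the helper names are hypothetical, and a mismatch would only say that the files differ, not where.

```python
import gzip
import hashlib


def content_digest(path):
    """SHA-256 of a file's contents, after gunzipping if it is gzipped."""
    with open(path, 'rb') as f:
        gzipped = f.read(2) == b'\x1f\x8b'  # gzip magic bytes
    opener = gzip.open if gzipped else open
    digest = hashlib.sha256()
    with opener(path, 'rb') as f:
        # Read in 1 MiB chunks so large catalog files are not held in memory.
        for chunk in iter(lambda: f.read(1 << 20), b''):
            digest.update(chunk)
    return digest.hexdigest()


def same_catalog(path_a, path_b):
    """True if the two files have identical decompressed contents."""
    return content_digest(path_a) == content_digest(path_b)
```

Hashing the decompressed bytes lets a .dat.gz from the 1997 directory be compared directly against an uncompressed hip_main.dat.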

@me-at-git-hub

me-at-git-hub commented Nov 30, 2020

As a rookie, I would not yet recognize the difference. However, also interesting to note, under the FTP tab at this url:

http://cdsarc.u-strasbg.fr/viz-bin/Cat?I/239

The listing there (which has only the .dat and not the .dat.gz) is actually dated, well, take a look...

hip_main.dat 25-Jun-1997 01:25 -r--r--r-- 51M - text - txt.gz - fits - fits.gz - html

Is that also not the updated one? (Rhetorical. Enjoy your week away from a keyboard!)

@brandon-rhodes brandon-rhodes changed the title Switch from Hipparcos raw file to table download 404 error when attempting to load Hipparcos data Jan 30, 2021
@brandon-rhodes
Member

I am resetting the title of this issue back to its original value “404 error when attempting to load Hipparcos data” since the original issue was the change in the database’s URL. Now that everyone can get to Hipparcos again, I am (a) not aware of any users blocked waiting for Skyfield to learn how to parse Vizier table downloads, and (b) not sure that we would want to make that a Skyfield-only feature. It might make more sense for Python to have a third-party library that knew how to read the Vizier headers and NumPy-parse the lines of data that followed. For example, imagine something like:

https://astroquery.readthedocs.io/en/latest/vizier/vizier.html

— but able to return a plain Pandas dataframe instead of requiring a full AstroPy install. Then all kinds of tools, Skyfield included, could benefit from Vizier data, without each having to internally implement the same parsing logic.

I will be happy to consider counter-arguments against either of those premises, but at the moment they seem strong enough to me that I’m going to close this issue for now, since the original issue is long resolved.

If anyone who hadn't spoken up yet is indeed blocked on the lack of a general mechanism in Skyfield for Vizier table downloads, simply reply here with further information about which tables you need, and we can discuss whether the work is in scope for Skyfield contributors. Thanks!
