A bug for MassDownloader() #2794

Open
core-man opened this issue Mar 1, 2021 · 2 comments · May be fixed by #3188
@core-man
Contributor

core-man commented Mar 1, 2021

Code

I used the following code to download seismic waveforms for station GE.CSS from the GFZ data center.

from obspy.core import UTCDateTime
from obspy.clients.fdsn.mass_downloader import CircularDomain, Restrictions, MassDownloader

latitude, longitude = 33.901, 35.519
origin = UTCDateTime("2020-08-04T15:08:18.000Z")
starttime, endtime = origin - 60, origin + 6 * 60

domain = CircularDomain(latitude=latitude, longitude=longitude, minradius=0.0, maxradius=5.0)
restrictions = Restrictions(
    starttime=starttime,
    endtime=endtime,
    network="GE",
    station="CSS",
    reject_channels_with_gaps=False,
    minimum_length=0.0,
    sanitize=False,
    minimum_interstation_distance_in_m=0.0,
    channel_priorities=('HH[ZNE12]', 'HL[ZNE12]', 'BH[ZNE12]', 'BL[ZNE12]'),
)

mdl = MassDownloader(providers=["GFZ"])
mdl.download(domain, restrictions, mseed_storage="mseed", stationxml_storage="stations")

No data is downloaded:

[2021-03-01 20:16:40,602] - obspy.clients.fdsn.mass_downloader - INFO: Initializing FDSN client(s) for GFZ.
[2021-03-01 20:16:40,608] - obspy.clients.fdsn.mass_downloader - INFO: Successfully initialized 1 client(s): GFZ.
[2021-03-01 20:16:40,611] - obspy.clients.fdsn.mass_downloader - INFO: Total acquired or preexisting stations: 0
[2021-03-01 20:16:40,612] - obspy.clients.fdsn.mass_downloader - INFO: Client 'GFZ' - Requesting unreliable availability.
[2021-03-01 20:16:41,071] - obspy.clients.fdsn.mass_downloader - INFO: Client 'GFZ' - Successfully requested availability (0.46 seconds)
[2021-03-01 20:16:41,079] - obspy.clients.fdsn.mass_downloader - INFO: Client 'GFZ' - Found 1 stations (3 channels).
[2021-03-01 20:16:41,080] - obspy.clients.fdsn.mass_downloader - INFO: Client 'GFZ' - Will attempt to download data from 1 stations.
[2021-03-01 20:16:41,082] - obspy.clients.fdsn.mass_downloader - INFO: Client 'GFZ' - Status for 3 time intervals/channels before downloading: NEEDS_DOWNLOADING
[2021-03-01 20:16:41,635] - obspy.clients.fdsn.mass_downloader - INFO: Client 'GFZ' - No data available for request.
[2021-03-01 20:16:41,637] - obspy.clients.fdsn.mass_downloader - INFO: Client 'GFZ' - Launching basic QC checks...
[2021-03-01 20:16:41,639] - obspy.clients.fdsn.mass_downloader - INFO: Client 'GFZ' - Downloaded 0.0 MB [0.00 KB/sec] of data, 0.0 MB of which were discarded afterwards.
[2021-03-01 20:16:41,640] - obspy.clients.fdsn.mass_downloader - INFO: Client 'GFZ' - Status for 3 time intervals/channels after downloading: DOWNLOAD_FAILED
[2021-03-01 20:16:41,642] - obspy.clients.fdsn.mass_downloader - INFO: Client 'GFZ' - No station information to download.
[2021-03-01 20:16:41,643] - obspy.clients.fdsn.mass_downloader - INFO: Client 'GFZ' - No data could be downloaded.
[2021-03-01 20:16:41,644] - obspy.clients.fdsn.mass_downloader - INFO: ============================== Final report
[2021-03-01 20:16:41,644] - obspy.clients.fdsn.mass_downloader - INFO: 0 MiniSEED files [0.0 MB] already existed.
[2021-03-01 20:16:41,645] - obspy.clients.fdsn.mass_downloader - INFO: 0 StationXML files [0.0 MB] already existed.
[2021-03-01 20:16:41,646] - obspy.clients.fdsn.mass_downloader - INFO: Client 'GFZ' - Acquired 0 MiniSEED files [0.0 MB].
[2021-03-01 20:16:41,647] - obspy.clients.fdsn.mass_downloader - INFO: Client 'GFZ' - Acquired 0 StationXML files [0.0 MB].
[2021-03-01 20:16:41,647] - obspy.clients.fdsn.mass_downloader - INFO: Downloaded 0.0 MB in total.

If I comment out channel_priorities and instead set channel to "HH*,HL*,BH*,BL*,SH*" in restrictions:

restrictions = Restrictions(
    starttime=starttime,
    endtime=endtime,
    network="GE",
    station="CSS",
    channel="HH*,HL*,BH*,BL*,SH*",
    reject_channels_with_gaps=False,
    minimum_length=0.0,
    sanitize=False,
    minimum_interstation_distance_in_m=0.0,
)

Data can be downloaded:

[2021-03-01 20:30:04,329] - obspy.clients.fdsn.mass_downloader - INFO: Initializing FDSN client(s) for GFZ.
[2021-03-01 20:30:04,336] - obspy.clients.fdsn.mass_downloader - INFO: Successfully initialized 1 client(s): GFZ.
[2021-03-01 20:30:04,337] - obspy.clients.fdsn.mass_downloader - INFO: Total acquired or preexisting stations: 0
[2021-03-01 20:30:04,338] - obspy.clients.fdsn.mass_downloader - INFO: Client 'GFZ' - Requesting unreliable availability.
[2021-03-01 20:30:04,824] - obspy.clients.fdsn.mass_downloader - INFO: Client 'GFZ' - Successfully requested availability (0.49 seconds)
[2021-03-01 20:30:04,831] - obspy.clients.fdsn.mass_downloader - INFO: Client 'GFZ' - Found 1 stations (15 channels).
[2021-03-01 20:30:04,832] - obspy.clients.fdsn.mass_downloader - INFO: Client 'GFZ' - Will attempt to download data from 1 stations.
[2021-03-01 20:30:04,835] - obspy.clients.fdsn.mass_downloader - INFO: Client 'GFZ' - Status for 15 time intervals/channels before downloading: NEEDS_DOWNLOADING
[2021-03-01 20:30:06,018] - obspy.clients.fdsn.mass_downloader - INFO: Client 'GFZ' - Successfully downloaded 9 channels (of 15)
[2021-03-01 20:30:06,020] - obspy.clients.fdsn.mass_downloader - INFO: Client 'GFZ' - Launching basic QC checks...
[2021-03-01 20:30:06,037] - obspy.clients.fdsn.mass_downloader - INFO: Client 'GFZ' - Downloaded 0.1 MB [107.00 KB/sec] of data, 0.0 MB of which were discarded afterwards.
[2021-03-01 20:30:06,038] - obspy.clients.fdsn.mass_downloader - INFO: Client 'GFZ' - Status for 6 time intervals/channels after downloading: DOWNLOAD_FAILED
[2021-03-01 20:30:06,039] - obspy.clients.fdsn.mass_downloader - INFO: Client 'GFZ' - Status for 9 time intervals/channels after downloading: DOWNLOADED
[2021-03-01 20:30:06,661] - obspy.clients.fdsn.mass_downloader - INFO: Client 'GFZ' - Successfully downloaded 'stations/GE.CSS.xml'.
[2021-03-01 20:30:06,670] - obspy.clients.fdsn.mass_downloader - INFO: Client 'GFZ' - Downloaded 1 station files [0.1 MB] in 0.6 seconds [157.69 KB/sec].
[2021-03-01 20:30:06,671] - obspy.clients.fdsn.mass_downloader - INFO: ============================== Final report
[2021-03-01 20:30:06,672] - obspy.clients.fdsn.mass_downloader - INFO: 0 MiniSEED files [0.0 MB] already existed.
[2021-03-01 20:30:06,673] - obspy.clients.fdsn.mass_downloader - INFO: 0 StationXML files [0.0 MB] already existed.
[2021-03-01 20:30:06,674] - obspy.clients.fdsn.mass_downloader - INFO: Client 'GFZ' - Acquired 9 MiniSEED files [0.1 MB].
[2021-03-01 20:30:06,675] - obspy.clients.fdsn.mass_downloader - INFO: Client 'GFZ' - Acquired 1 StationXML files [0.1 MB].
[2021-03-01 20:30:06,675] - obspy.clients.fdsn.mass_downloader - INFO: Downloaded 0.2 MB in total.
$ ls mseed/
GE.CSS..BHE__20200804T150718Z__20200804T151418Z.mseed  GE.CSS..BLZ__20200804T150718Z__20200804T151418Z.mseed
GE.CSS..BHN__20200804T150718Z__20200804T151418Z.mseed  GE.CSS..SHE__20200804T150718Z__20200804T151418Z.mseed
GE.CSS..BHZ__20200804T150718Z__20200804T151418Z.mseed  GE.CSS..SHN__20200804T150718Z__20200804T151418Z.mseed
GE.CSS..BLE__20200804T150718Z__20200804T151418Z.mseed  GE.CSS..SHZ__20200804T150718Z__20200804T151418Z.mseed
GE.CSS..BLN__20200804T150718Z__20200804T151418Z.mseed

$ ls stations/
GE.CSS.xml

Description of the bug
I checked this station at http://ds.iris.edu/mda/GE/CSS/. GE.CSS has several instruments, e.g., HL?, HH?, BL?, BH?, SH?.

Based on the second test, we know GE.CSS has data for BH?, BL?, and SH?. But if we specify channel priorities as in the first test, we may miss that data. For example, if channel_priorities=('HH[ZNE12]', 'HL[ZNE12]', 'BH[ZNE12]', 'BL[ZNE12]'), the client first checks whether the station has an HH? instrument. GE.CSS does have this instrument, but it has no data for the requested time window. As a result, the data at BH? is never requested.

If I instead put 'BH[ZNE12]' or 'BL[ZNE12]' first in channel_priorities, the data at 'BH[ZNE12]' or 'BL[ZNE12]' is downloaded.
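The behaviour described above can be modelled in a few lines of plain Python. This is only a simplified sketch of "first matching pattern wins", not ObsPy's actual code: the function name `pick_by_metadata` and the `advertised` channel list are assumptions made for illustration, and pattern matching is done with the standard-library `fnmatch`.

```python
from fnmatch import fnmatch


def pick_by_metadata(advertised, priorities):
    """Return the channels matching the first priority pattern that matches anything."""
    for pattern in priorities:
        matched = [ch for ch in advertised if fnmatch(ch, pattern)]
        if matched:
            # First hit wins; later patterns are never consulted,
            # regardless of whether the matched channels have data.
            return matched
    return []


# Assumption: a simplified list of channels the station metadata advertises.
advertised = [band + comp for band in ("HH", "HL", "BH", "BL") for comp in "ZNE"]

original_order = ("HH[ZNE12]", "HL[ZNE12]", "BH[ZNE12]", "BL[ZNE12]")
bh_first = ("BH[ZNE12]", "BL[ZNE12]", "HH[ZNE12]", "HL[ZNE12]")

print(pick_by_metadata(advertised, original_order))  # ['HHZ', 'HHN', 'HHE']
print(pick_by_metadata(advertised, bh_first))        # ['BHZ', 'BHN', 'BHE']
```

With the original order only the three HH channels are selected, which matches the "Found 1 stations (3 channels)" line in the first log. Reordering the priorities changes which instrument is picked, but it remains a per-station workaround.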

Possible reason
I guess the bug is caused by how ObsPy determines whether an instrument has seismic data. When channel_priorities is set to ('HH[ZNE12]', 'HL[ZNE12]', 'BH[ZNE12]', 'BL[ZNE12]'), the client finds that GE.CSS has the instrument HH[ZNE12] and skips the remaining channels. Unfortunately, this instrument has no data, so we miss the seismic data at 'BH[ZNE12]' or 'BL[ZNE12]'.

If the client moved on to the next channel whenever the previous one has no data, the bug would be resolved. In other words, we should check waveform data availability instead of instrument availability, because an instrument may exist but have no data for the requested time. However, checking waveform data availability may take more time than checking instrument availability.
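A minimal sketch of the proposed fall-through, again in plain Python rather than ObsPy's real code. The names `pick_with_fallback` and `has_data` are hypothetical; `has_data` stands in for whatever waveform-availability check the client would perform, and here it simply looks the channel up in a set while recording each probe.

```python
from fnmatch import fnmatch


def pick_with_fallback(advertised, priorities, has_data):
    """Return channels for the first priority pattern that has actual data."""
    for pattern in priorities:
        matched = [ch for ch in advertised if fnmatch(ch, pattern) and has_data(ch)]
        if matched:
            return matched
        # Nothing usable for this pattern: fall through to the next one.
    return []


# Assumption: metadata advertises all four instruments ...
advertised = [band + comp for band in ("HH", "HL", "BH", "BL") for comp in "ZNE"]
# ... but only BH? and BL? hold waveforms for the requested time window.
with_data = {band + comp for band in ("BH", "BL") for comp in "ZNE"}

probed = []  # records every availability check, to make the extra cost visible


def has_data(channel):
    probed.append(channel)
    return channel in with_data


priorities = ("HH[ZNE12]", "HL[ZNE12]", "BH[ZNE12]", "BL[ZNE12]")
selected = pick_with_fallback(advertised, priorities, has_data)
print(selected)     # ['BHZ', 'BHN', 'BHE']
print(len(probed))  # 9
```

The nine probes (three each for HH, HL, and BH) illustrate the cost mentioned above: every higher-priority instrument without data adds availability checks before a usable one is found.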

Other notes
I think the same issue could apply to location_priorities.

Version information

  • ObsPy version, Python version and Platform (Windows, OSX, Linux ...): 1.2.2
  • How did you install ObsPy and Python (pip, anaconda, from source ...): anaconda
@megies
Member

megies commented Oct 14, 2022

Actually looks like a bug..? not sure

@megies megies added this to the 1.4.0 milestone Oct 14, 2022
@megies megies added .clients.fdsn bug-unconfirmed reported bug that still needs to be confirmed labels Oct 19, 2022
megies added a commit that referenced this issue Oct 19, 2022
…issing data

currently (at least when only "weak" availability data is available),
any channels that are in principle available according to metadata
overrule all other channels that come later in the channel_priority
listing, e.g. if the first item in channel_priority successfully
matches a channel, all other channels are ignored, even if the former
selected channel actually yields "No data available at server" while
some other channels actually do have data but are later in the
channel_priority (see #2794)

Currently the only way to fix this is to first try and download *all*
channels' data that match any of the given channel_priority wildcards,
and then at the very end it is evaluated if some higher priority data
was downloaded and lower priority data get deleted again.

This certainly is not ideal, since it might blow up the amount of data
downloaded and subsequently discarded, but it is likely the lesser evil
compared to losing whole stations from the final download result
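
The "download everything, then prune" strategy described in this commit message can be sketched as follows. This is not the code from the referenced commit; it is a simplified illustration in plain Python, and the function name `prune_to_best_priority` and the example channel list are assumptions.

```python
from fnmatch import fnmatch


def prune_to_best_priority(downloaded, priorities):
    """Split downloaded channels into (kept, discarded).

    Keeps only the channels matching the highest-priority pattern for
    which any data was actually downloaded; everything else is discarded.
    """
    for pattern in priorities:
        kept = [ch for ch in downloaded if fnmatch(ch, pattern)]
        if kept:
            discarded = [ch for ch in downloaded if ch not in kept]
            return kept, discarded
    return [], list(downloaded)


# Assumption: all channels matching any priority pattern were requested,
# and only BH? and BL? came back with data.
downloaded = ["BHZ", "BHN", "BHE", "BLZ", "BLN", "BLE"]
priorities = ("HH[ZNE12]", "HL[ZNE12]", "BH[ZNE12]", "BL[ZNE12]")

kept, discarded = prune_to_best_priority(downloaded, priorities)
print(kept)       # ['BHZ', 'BHN', 'BHE']
print(discarded)  # ['BLZ', 'BLN', 'BLE']
```

The discarded list makes the trade-off concrete: the lower-priority BL? data is downloaded only to be deleted afterwards.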
@megies megies linked a pull request Oct 19, 2022 that will close this issue
11 tasks
@megies
Member

megies commented Nov 17, 2022

See #3188 (comment), I'll postpone this for now as the current fix is less than ideal and to properly tackle this would mean a major refactoring of the code

@megies megies modified the milestones: 1.4.0, Future release Nov 17, 2022
@megies megies added bug confirmed bug and removed bug-unconfirmed reported bug that still needs to be confirmed labels Nov 17, 2022