
Departure time is outside of the time range covered by currently loaded GTFS data sets #364

Open
keyingtang opened this issue Oct 4, 2023 · 22 comments
Labels
bug Something isn't working data Something related to input data sets (not necessarily an r5py bug)


@keyingtang

Hello, I have a question in terms of the detailed GTFS data requirements.

Describe the bug
I want to use the function TravelTimeMatrixComputer, and for the departure time I entered a time that I'm sure is covered by the loaded GTFS data sets. But I got a warning saying .../r5py/r5/regional_task.py:228: RuntimeWarning: Departure time 2023-10-07 12:30:00 is outside of the time range covered by currently loaded GTFS data sets. And all the output travel time values are NaN.

The GTFS data set I provided only has calendar_dates.txt, no calendar.txt. Could this be the reason?
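For reference, the GTFS specification treats calendar.txt as conditionally required: it may be omitted if all dates of service are defined in calendar_dates.txt, so a feed with only calendar_dates.txt is valid in itself. To see which calendar files a zip actually contains, and whether they sit at the archive root where GTFS expects them, a minimal standard-library sketch could look like this (describe_gtfs_zip is a hypothetical helper, not part of r5py):

```python
import zipfile


def describe_gtfs_zip(zip_path):
    """Report which calendar files sit at the root of a GTFS zip.

    Hypothetical helper for illustration: it only inspects the archive's
    file list and does not validate the feed's contents.
    """
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
    # Entries without a "/" are at the archive root, where GTFS expects them.
    root = {name for name in names if "/" not in name}
    # Files inside sub-folders (e.g. a wrapping folder or __MACOSX metadata).
    nested = sorted(name for name in names if "/" in name and not name.endswith("/"))
    return {
        "has_calendar": "calendar.txt" in root,
        "has_calendar_dates": "calendar_dates.txt" in root,
        "nested_entries": nested,
    }
```

A non-empty nested_entries list, or both has_ flags being False, would suggest the files are not where a GTFS reader looks for them.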

Environment:

  • OS: MacOS
  • Python package source (PyPi, conda, ...): Conda
  • Versions of Python, Java Development Kit, Python modules: Python 3.11, Java 11.0.20
@christophfink
Member

The warning indicates that a GTFS data set, according to its own metadata, does not cover the requested date and time. Since the computation does not seem to find any connections, this also appears to be the case. Can you double-check that there are routes in the GTFS data set for the date and time you request?

@wklumpen could you comment on this? I’m not 100% sure of the requirements of a GTFS file.

@wklumpen
Contributor

wklumpen commented Oct 10, 2023

Heya!

We have a known issue (#301) where our warning is not set up to check calendar_dates.txt explicitly for coverage (which is why it's currently a warning, not an error).

I assume, though, that R5 does take calendar_dates.txt into account, as otherwise that would be a pretty big hole in the software.

Some more things to check:

  • The OSM file covers the area you're measuring
  • The points you're using for origin/destination are not in a weird projection (standard WGS 84 is ideal)
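One cheap sanity check for the projection point is whether the coordinates actually fall within longitude/latitude ranges. This is only a heuristic (a projected CRS with small coordinate values could still pass), looks_like_wgs84 is a hypothetical helper, and the CRS declared on the input data remains the authoritative source:

```python
def looks_like_wgs84(points):
    """Heuristic: True if every (x, y) pair lies within lon/lat bounds.

    Assumes points are (longitude, latitude) pairs, i.e. x first.
    Projected coordinates in metres usually fall far outside these bounds.
    """
    return all(-180.0 <= x <= 180.0 and -90.0 <= y <= 90.0 for x, y in points)


print(looks_like_wgs84([(4.90, 52.37)]))     # True: Amsterdam in lon/lat
print(looks_like_wgs84([(155000, 463000)]))  # False: a point in RD New metres
```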

If you are able to attach the GTFS file I can double check it to make sure that's not the issue.

@keyingtang
Author

Hello both!

Thank you a lot for your answers! According to your advice, I have done the below checks, but I still got the same warning.

  • Because of a UnicodeDecodeError while loading the GTFS data with gtfs-lite, I used gtfs_kit instead to check the GTFS data. I have double-checked that there is transit service running on the departure date and time I requested.
  • I have also updated my OSM file to country level, and all my origins and destinations are within the country, so I'm sure it covers my study areas.
  • I have checked the CRS of my origins/destinations points and both of them are EPSG:4326. I assume it's not a weird projection.
  • Lastly, I am unable to attach the GTFS file as it exceeds the size limit. You can easily download it via http://gtfs.ovapi.nl/nl/, in which you can find the GTFS file called gtfs-nl.zip.

Do you have any further advice on how I can solve this problem? Thanks in advance!

@wklumpen
Contributor

wklumpen commented Oct 10, 2023

Just had a look at the GTFS file: a quick CTRL+F for 20231007 does not find it in calendar_dates.txt.

calendar_dates.txt

@wklumpen
Contributor

Also as a side note can you open an issue for the UnicodeDecode error on the GTFS-Lite repo?

@keyingtang
Author

keyingtang commented Oct 13, 2023

Hi, yeah that's true, because the newest gtfs-nl.zip file starts from 20231010, which you can see from feed_info.txt. And the departure date and time I requested is within the coverage indicated by feed_info.txt.
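For comparison, a check against feed_info.txt only tests whether a date falls inside the window the feed declares for itself. A minimal sketch, assuming the feed_info.txt row has already been read into a dict (within_feed_window is a hypothetical helper); both fields are optional in GTFS, so an empty value is treated as unbounded:

```python
import datetime


def within_feed_window(feed_info_row, date: datetime.date) -> bool:
    """True if date lies within feed_start_date..feed_end_date (inclusive).

    GTFS dates are YYYYMMDD strings, which compare correctly as text.
    Missing or empty fields are treated as an open-ended window.
    """
    ymd = date.strftime("%Y%m%d")
    start = (feed_info_row.get("feed_start_date") or "").strip()
    end = (feed_info_row.get("feed_end_date") or "").strip()
    if start and ymd < start:
        return False
    if end and ymd > end:
        return False
    return True
```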

Btw, I have just opened the UnicodeDecode error issue on GTFS-Lite, thanks ;)

feed_info.txt

@wklumpen
Copy link
Contributor

Can you post the calendar.txt and calendar_dates.txt files from the service you are loading into R5py?

The feed_info.txt is only a recommended file and the data may not actually follow what it suggests. From the feed_info description on the feed_start_date:

The dataset provides complete and reliable schedule information for service in the period from the beginning of the feed_start_date day to the end of the feed_end_date day. Both days may be left empty if unavailable. The feed_end_date date must not precede the feed_start_date date if both are given. It is recommended that dataset providers give schedule data outside this period to advise of likely future service, but dataset consumers should treat it mindful of its non-authoritative status. If feed_start_date or feed_end_date extend beyond the active calendar dates defined in calendar.txt and calendar_dates.txt, the dataset is making an explicit assertion that there is no service for dates within the feed_start_date or feed_end_date range but not included in the active calendar dates.

The only way to be sure service is running on that date is to check calendar.txt and calendar_dates.txt. In the file you provided, I saw that service was explicitly not available on that date.
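What checking calendar.txt and calendar_dates.txt means in practice can be sketched with the standard library. Following the GTFS reference, a service runs on a date if calendar.txt covers the date and flags its weekday, unless calendar_dates.txt removes it (exception_type 2); calendar_dates.txt can also add services on specific dates (exception_type 1). This is a minimal sketch, not how R5 or r5py implement it; the helper names are hypothetical, and read_gtfs_table assumes the .txt files sit at the root of the zip and are UTF-8 encoded.

```python
import csv
import datetime
import io
import zipfile

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def read_gtfs_table(zip_path, name):
    """Read one GTFS table (e.g. "calendar.txt") from the zip root as dicts.

    Returns an empty list if the file is absent, since a feed only needs
    one of calendar.txt and calendar_dates.txt.
    """
    with zipfile.ZipFile(zip_path) as zf:
        if name not in zf.namelist():
            return []
        with zf.open(name) as raw:
            # utf-8-sig also strips a byte-order mark if one is present.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return list(csv.DictReader(text))


def active_service_ids(calendar_rows, calendar_dates_rows, date: datetime.date):
    """Return the service_ids that run on `date` according to GTFS rules."""
    ymd = date.strftime("%Y%m%d")        # GTFS dates are YYYYMMDD strings
    weekday = WEEKDAYS[date.weekday()]   # date.weekday(): Monday == 0
    active = set()
    # Regular services: date within [start_date, end_date] and weekday flag set.
    for row in calendar_rows:
        if row["start_date"] <= ymd <= row["end_date"] and row[weekday] == "1":
            active.add(row["service_id"])
    # Exceptions override the regular calendar on that specific date.
    for row in calendar_dates_rows:
        if row["date"] != ymd:
            continue
        if row["exception_type"] == "1":    # service added on this date
            active.add(row["service_id"])
        elif row["exception_type"] == "2":  # service removed on this date
            active.discard(row["service_id"])
    return active
```

For example, active_service_ids(read_gtfs_table(path, "calendar.txt"), read_gtfs_table(path, "calendar_dates.txt"), datetime.date(2023, 10, 6)) returns an empty set if no service runs on that day.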

@christophfink christophfink added data Something related to input data sets (not necessarily an r5py bug) bug Something isn't working labels Oct 17, 2023
@keyingtang
Author

Hello, thanks for your reply. As I said in the initial issue description, the GTFS dataset I want to use only has calendar_dates.txt. And here it is: calendar_dates.txt

The departure date and time I was trying to request is 2023-10-06 08:30, and that date is listed in this file. I have used the gtfs-lite library to check and make sure that there are services running every hour on that day. But I always get the RuntimeWarning when trying to use the TravelTimeMatrixComputer function of r5py.

@wklumpen
Copy link
Contributor

The RuntimeWarning is a known bug in the date checking: I have encountered the warning but still generated results. It can be ignored if you're confident the data covers the date you're checking.
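If you are confident the feed covers the requested date, the warning can be silenced for a single call with Python's warnings module rather than globally. A sketch, assuming the message text stays as quoted in this issue (the helper name is hypothetical, and the pattern would need updating if r5py rewords the warning); wrap whichever r5py call emits it:

```python
import warnings

# Regex matched against the start of the warning text, as quoted in this issue.
GTFS_DATE_WARNING = r"Departure time .* is outside of the time range"


def run_ignoring_gtfs_date_warning(func, *args, **kwargs):
    """Call func(*args, **kwargs) with only this RuntimeWarning suppressed.

    catch_warnings() restores the previous filters on exit, so other
    warnings, and later calls, are unaffected.
    """
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore", message=GTFS_DATE_WARNING, category=RuntimeWarning
        )
        return func(*args, **kwargs)
```

Scoping the filter this way keeps every other RuntimeWarning visible, which matters here because the travel-time results still need to be checked for NaN values.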

What's more concerning is the null results. There are a few possibilities:

  • A problem with the GTFS dates (still)
  • A problem with OSM underlying the data (projection issues? not sure)
  • A problem with how R5 handles calendar dates

The last one would be an upstream R5 issue (and is kinda concerning!). One thing you could do is manually create a calendar.txt file that includes all the service IDs for that date and covers the day of the week you need. If that runs and produces results, then we know the problem is likely upstream in R5, unless @christophfink does any GTFS processing before load.
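The manual calendar.txt workaround could be sketched like this: one row per service ID, active only on the weekday of the requested date and only on that single date. The service IDs would come from the calendar_dates.txt rows for that date with exception_type 1. This is a diagnostic sketch with a hypothetical helper name; the resulting text would still need to be saved as calendar.txt at the root of a copy of the GTFS zip.

```python
import csv
import datetime
import io

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def calendar_txt_for_date(service_ids, date: datetime.date) -> str:
    """Build calendar.txt content enabling each service only on `date`.

    Every service is flagged for the weekday of `date`, with
    start_date == end_date == that date (YYYYMMDD).
    """
    ymd = date.strftime("%Y%m%d")
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["service_id", *WEEKDAYS, "start_date", "end_date"],
        lineterminator="\n",
    )
    writer.writeheader()
    for service_id in sorted(service_ids):
        row = {"service_id": service_id, "start_date": ymd, "end_date": ymd}
        for index, day in enumerate(WEEKDAYS):
            row[day] = "1" if index == date.weekday() else "0"
        writer.writerow(row)
    return buffer.getvalue()
```

For 2023-10-06, a Friday, each row has a 1 only in the friday column.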

@christophfink
Copy link
Member

unless @christophfink does any GTFS processing before load

No, we don’t process the GTFS files before passing them to R5

That said, I still think this is a data issue. @keyingtang , could you share the actual GTFS data set you are using? If you want to share it confidentially, please send it to christoph.fink@helsinki.fi

@sruinaard
Copy link

Hi everybody,

I wanted to join in on the conversation as I have been having the same issue for Sweden. In the regional and national GTFS files, we do have a calendar.txt file and I compared it to the Helsinki and Sao Paulo demo. What stands out to me most is that for Sweden, we do not have 1's for the weekdays, they are all 0. I do not know whether this causes the problem for me? I included a screenshot of the three files.

I tried changing some 0's to 1's, but that gave some errors, as I think I probably need to take a more systematic approach to improve my GTFS file.

This is the warning I get: RuntimeWarning: Departure time 2023-10-16 00:00:00 is outside of the time range covered by currently loaded GTFS data sets.

I also get this runtime warning when trying other dates in the GTFS file, or when changing the departure time (window). I'll send you the GTFS file I have been using, as it is too big to upload here. Thank you in advance for taking a look at it!

[Screenshot, 2023-10-24: the calendar.txt files for Sweden, Helsinki and Sao Paulo]

@wklumpen
Copy link
Contributor

wklumpen commented Oct 25, 2023

Okay so for both @keyingtang and @sruinaard I was able to confirm with GTFS-lite that there are trips which run on the days specified.

For @sruinaard: are you getting results when you run it, or an empty matrix? If you are getting results, you can (I think) safely ignore the warning, as it appears there is service during the period specified.

More diagnosis is needed on the other input data to understand why there might be blank matrices. In my experience, this has usually come down to projection issues or corrupted input files.

@wklumpen
Copy link
Contributor

What stands out to me most is that for Sweden, we do not have 1's for the weekdays, they are all 0. I do not know whether this causes the problem for me? I included a screenshot of the three files.

It looks like the Sweden files have service_ids listed in calendar.txt that aren't used, while calendar_dates.txt lists individual services, so it's likely the calendar_dates.txt file that's being relied on.

I also noted that the files provided by both @keyingtang and @sruinaard are nested (folder-in-folder style, with a __MACOSX folder as well). Can you try running the same analysis but zipping the files directly? I would assume R5 would handle this or throw an error while loading, but I'm trying to eliminate as many possible sources of error as I can.
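Re-zipping the feed flat could look like the following sketch. It assumes src_dir is the extracted folder that directly contains the GTFS .txt files; only those files are added, at the archive root, and macOS "._" metadata files are skipped. rezip_gtfs_flat is a hypothetical helper name.

```python
import pathlib
import zipfile


def rezip_gtfs_flat(src_dir, dest_zip):
    """Write the GTFS .txt files in src_dir to dest_zip at the archive root.

    Only files directly inside src_dir are used (no recursion), so wrapping
    folders and __MACOSX sub-folders are left out; AppleDouble "._" files
    are skipped too. Returns the archive names that were written.
    """
    src = pathlib.Path(src_dir)
    written = []
    with zipfile.ZipFile(dest_zip, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.glob("*.txt")):
            if path.name.startswith("._") or not path.is_file():
                continue
            zf.write(path, arcname=path.name)  # no folder prefix in the archive
            written.append(path.name)
    return written
```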

@wklumpen
Copy link
Contributor

Another possible source of the issue: see this R5 issue.

Could the base version of R5 we use be causing this issue @christophfink?

@christophfink
Copy link
Member

We definitely use a version after the linked issue's fix was merged. That does not mean, however, that we're not affected by something similar

I'll take a closer look at our date checking code, maybe there's a convenient way to test whether services exist on the particular day requested, rather than whether the requested date is within the covered period of the GTFS data set

Not sure whether we open a(nother) hole in our logic: if there is a GTFS schedule for a service that runs, say, Mon-Sat and is valid for an extended period of time, should requesting a route on a Sunday trigger the warning?

@sruinaard
Copy link

Thanks for the discussion, I will try running it without the nested structure and let you know how it goes.

@wklumpen I'm getting results: the exact same results as walking (descriptives and maps), while I do expect some impact of bus lines. Therefore, I will check next week what exactly is included in the GTFS data for my area and time period, to see more precisely where I'd expect different results.

I tried to improve my GTFS files with the R scripts I sent over, and then I no longer got the warning, but my transit and walking results still seemed exactly the same (descriptives).

I will also run everything for another area, which is more densely populated and therefore has more public transport, to see if still transit is the same as walking. Keep you posted!

@wklumpen
Copy link
Contributor

Not sure whether we open a(nother) hole in our logic: if there is a GTFS schedule for a service that runs, say, Mon-Sat and is valid for an extended period of time, should requesting a route on a Sunday trigger the warning?

I think we leave that kind of testing to other packages and remove the date check warning entirely. That said, the GTFS-Lite 'date_trips()' function does take the calendar and calendar date files into account when pulling all trips running on a given date.

We could use that, but it requires loading the zip, which slows the process down for large files.

I think that if we've checked these particular feeds, we should be able to safely say this isn't a GTFS issue, but of course one is never sure.

@keyingtang
Copy link
Author

Thanks for the discussion!

I also noted that the files provided by both @keyingtang and @sruinaard are nested (folder-in-folder style, with a __MACOSX folder as well). Can you try running the same analysis but zipping the files directly? I would assume R5 would handle this or throw an error while loading, but I'm trying to eliminate as many possible sources of error as I can.

Regarding this, I have tried just zipping the files directly and running the same analysis, and I got the same error and 'NaN' values in travel_time.

More diagnosis is needed on the other input data to understand why there might be blank matrices. In my experience, this has usually come down to projection issues or corrupted input files.

About this, I checked the projection of my input origins and destinations datasets, CRS of them is EPSG:28992 (Amersfoort / RD New), which seems to be an uncommon CRS. Could this be a possible reason?

@wklumpen
Copy link
Contributor

Hmm. We do try to reproject to EPSG:4326, I think (@christophfink), but it's worth reprojecting the data yourself first and then testing again.

@christophfink
Copy link
Member

Yes, I can confirm that r5py transparently reprojects input data and then transforms the results back into the input CRS

@sruinaard
Copy link

Hi all,

We found the reason why the travel time matrices were the same: as we're interested in a rural area, and the public transport service level is low, the median value of travel times almost always returned a result that was the same as walking. We will proceed with using the 5th percentile and a departure time window of 1 hour to capture hourly services. The GTFS files are working for us. Thank you for your time checking things for us!

@christophfink
Copy link
Member

@sruinaard thanks for the feedback. Indeed, the way R5 summarises trips over the departure time window is not immediately intuitive. At our lab, we reverted to using the first percentile in a 1h time window for most of our analyses, as we assume, especially in more rural study cases, that people can adapt their everyday mobility demands within these margins (which of course is not necessarily true, but we feel is a valid assumption for certain research).
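A toy model illustrates why the median can coincide with walking in low-frequency areas while a low percentile does not. For every departure minute in a one-hour window it takes the faster of walking and catching an hourly bus, then reads off nearest-rank percentiles. All numbers are made up, the model ignores access/egress walking and transfers, and R5's actual computation (including its percentile definition) is more involved; in r5py the real settings are the departure_time_window and percentiles options that TravelTimeMatrixComputer passes on to the routing request.

```python
def travel_times_over_window(walk_min, first_bus_min, headway_min, ride_min, window_min):
    """Fastest travel time (minutes) for each departure minute in the window.

    Toy model: the traveller either walks the whole way (walk_min) or waits
    at a stop at the origin for the next bus, which leaves at first_bus_min
    and every headway_min after that, and then rides for ride_min.
    """
    times = []
    for departure in range(window_min):
        wait = (first_bus_min - departure) % headway_min
        times.append(min(walk_min, wait + ride_min))
    return times


def nearest_rank_percentile(values, p):
    """Nearest-rank percentile for an integer p between 0 and 100."""
    ordered = sorted(values)
    rank = max(1, -(-p * len(ordered) // 100))  # ceil(p * n / 100), at least 1
    return ordered[rank - 1]


# Hypothetical rural case: 30 min on foot, hourly bus at minute 45, 10 min ride.
times = travel_times_over_window(30, 45, 60, 10, 60)
for p in (50, 5, 1):
    print(p, nearest_rank_percentile(times, p))
# 50 30   (the median equals the walking time)
# 5 12
# 1 10
```

The bus only beats walking for 20 of the 60 departure minutes, so the median stays at the 30-minute walk, while the 5th and 1st percentiles pick up the bus connection.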
