Date coverage in TransitLayer does not check calendar_dates.txt comprehensively. #301

Open
wklumpen opened this issue Jul 14, 2023 · 2 comments
Labels
data Something related to input data sets (not necessarily an r5py bug) enhancement New feature or request reminder something to do one day (but not today) validation Specific input validation feature requests

Comments

@wklumpen
Contributor

wklumpen commented Jul 14, 2023

Related to #266

As currently written, start_date only takes the earliest start_date from the services' calendar entries. It does not check calendar_dates.txt, so the date coverage check is not comprehensive.

Note:

def start_date(self):
    """The earliest date the loaded GTFS data covers."""
    try:
        start_date = min(
            [
                parse_int_date(service.calendar.start_date)
                for service in self._transit_layer.services
            ]
        )
    except (AttributeError, ValueError) as exception:
        raise ValueError("No GTFS data set loaded") from exception
    return start_date

It is also possible to have service coverage entirely specified in calendar_dates.txt. According to the spec it's possible to use either calendar.txt or calendar_dates.txt (or both) to specify valid date coverage.

We should handle this by taking the earlier of the two minimum dates (from calendar and calendar_dates) as the start date, and the later of the two maximum dates as the end date, using whichever of the two datasets are present.
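As a rough illustration of the min/max combination proposed here, the following is a minimal pure-Python sketch. It does not use the r5py or R5 API: the function names `parse_gtfs_date` and `date_coverage` are hypothetical, and the rows are assumed to be plain dicts keyed by the GTFS column names, as `csv.DictReader` would yield them from calendar.txt and calendar_dates.txt. Only calendar_dates rows with exception_type 1 (service added) are counted, since removed dates do not extend coverage.

```python
import datetime


def parse_gtfs_date(value):
    """Parse a GTFS date string (YYYYMMDD) into a datetime.date."""
    return datetime.datetime.strptime(value.strip(), "%Y%m%d").date()


def date_coverage(calendar_rows, calendar_dates_rows):
    """Return (earliest, latest) service date across both GTFS files.

    Hypothetical helper: both arguments are iterables of dicts keyed by
    GTFS column names; either one may be empty.
    """
    dates = []

    # calendar.txt: every service contributes its start and end date
    for row in calendar_rows:
        dates.append(parse_gtfs_date(row["start_date"]))
        dates.append(parse_gtfs_date(row["end_date"]))

    # calendar_dates.txt: only exception_type 1 (service added) counts
    for row in calendar_dates_rows:
        if row["exception_type"].strip() == "1":
            dates.append(parse_gtfs_date(row["date"]))

    if not dates:
        raise ValueError("No service dates found in calendar or calendar_dates")

    return min(dates), max(dates)
```

With a feed that only has calendar_dates.txt, `calendar_rows` would simply be an empty list and the coverage would come from the added dates alone. Note that this sketch does not shrink the range when a removed date (exception_type 2) falls on the first or last day of a calendar entry.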

@wklumpen wklumpen added this to the Release v0.0.5 milestone Jul 14, 2023
@wklumpen wklumpen self-assigned this Jul 14, 2023
@wklumpen
Contributor Author

wklumpen commented Aug 2, 2023

Okay so this is much more complicated due to some upstream wonkiness in the R5 package.

Note that while the Calendar object in R5 uses integers to represent dates, the date column of the CalendarDate object is, for some reason, represented by a LocalDate. That requires some form of parsing, and I am not sure I can access it easily from Python, as it requires a date formatter. The hash map as a whole also doesn't seem to let me access the exception type, which is needed to make sure that a date is being added to the service, not removed from it.

I'm going to be bold here and propose that we remove date-level coverage validation at initialization on our end entirely and rely on two other things:

  1. R5 potentially erroring out or returning an empty matrix if there simply aren't any valid transit trips
  2. We provide an example/walkthrough of validating a GTFS feed using both the Mobility Data Validator and the GTFS-lite package. These provide pure Python ways of checking inputs and are much more comprehensive anyway, as they account for all of the added/removed service. They also account for the fact that service dates can extend past midnight, which currently isn't checked accurately.

Alternatively we could invoke the GTFS-lite package (no extra requirements needed) and run the date check that way instead of wrestling with it in the setters.
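To show what accounting for added/removed service involves, here is a minimal standard-library sketch of a per-date check. It is not the GTFS-lite or Mobility Data Validator API: the function `active_service_ids` and the constant `WEEKDAYS` are hypothetical names, and the rows are assumed to be dicts keyed by the GTFS column names from calendar.txt and calendar_dates.txt. It follows the GTFS convention that exception_type 1 adds a service on a date and exception_type 2 removes it. It does not handle trips that run past midnight.

```python
import datetime

# GTFS weekday column names, indexed by date.weekday() (Monday == 0)
WEEKDAYS = (
    "monday", "tuesday", "wednesday", "thursday",
    "friday", "saturday", "sunday",
)


def active_service_ids(calendar_rows, calendar_dates_rows, day):
    """Return the set of service_ids running on `day` (a datetime.date).

    Hypothetical helper; rows are dicts keyed by GTFS column names.
    """
    key = day.strftime("%Y%m%d")  # YYYYMMDD strings compare chronologically
    weekday = WEEKDAYS[day.weekday()]
    active = set()

    # Regular pattern from calendar.txt
    for row in calendar_rows:
        in_range = row["start_date"] <= key <= row["end_date"]
        if in_range and row[weekday] == "1":
            active.add(row["service_id"])

    # Exceptions from calendar_dates.txt: 1 adds service, 2 removes it
    for row in calendar_dates_rows:
        if row["date"] != key:
            continue
        if row["exception_type"] == "1":
            active.add(row["service_id"])
        elif row["exception_type"] == "2":
            active.discard(row["service_id"])

    return active
```

An empty result for the requested departure date would be the signal that the feed has no service on that day, regardless of whether coverage comes from calendar.txt, calendar_dates.txt, or both.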

I'm also attaching an example of a GTFS package that is valid but contains only calendar dates (I shrunk the Helsinki feed down to one route)
test_gtfs_calendar_dates_only.zip

@christophfink
Member

Let’s push this to a future release, so we have time to write up an extensive set of instructions on how to check GTFS data sets. Currently it’s a very short paragraph here:

https://r5py--238.org.readthedocs.build/en/238/user-guide/user-manual/data-requirements.html#public-transport-schedules-in-gtfs-format

@christophfink christophfink added enhancement New feature or request reminder something to do one day (but not today) validation Specific input validation feature requests data Something related to input data sets (not necessarily an r5py bug) labels Aug 6, 2023