Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add optional datetime range filter when parsing large calendar files #474

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

8633brown
Copy link

closes #473. managed to figure out the Component\VEvent::isInTimeRange method and figured it was the right method for this task.

@codecov
Copy link

codecov bot commented Oct 14, 2019

Codecov Report

Merging #474 into master will increase coverage by 0.07%.
The diff coverage is 28.57%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master     #474      +/-   ##
============================================
+ Coverage     98.68%   98.76%   +0.07%     
- Complexity     1796     1820      +24     
============================================
  Files            65       65              
  Lines          4272     5184     +912     
============================================
+ Hits           4216     5120     +904     
- Misses           56       64       +8
Impacted Files Coverage Δ Complexity Δ
lib/Parser/MimeDir.php 97.56% <28.57%> (-2%) 91 <1> (+5)
lib/Property/Boolean.php 53.84% <0%> (-4.49%) 7% <0%> (ø)
lib/Parser/XML.php 96.96% <0%> (-0.57%) 51% <0%> (ø)
lib/Component.php 98.34% <0%> (-0.36%) 100% <0%> (ø)
lib/Cli.php 97.9% <0%> (-0.14%) 121% <0%> (ø)
lib/Property/Binary.php 100% <0%> (ø) 8% <0%> (ø) ⬇️
lib/Writer.php 100% <0%> (ø) 4% <0%> (ø) ⬇️
lib/Property/ICalendar/Recur.php 100% <0%> (ø) 68% <0%> (ø) ⬇️
lib/StringUtil.php 100% <0%> (ø) 4% <0%> (ø) ⬇️
lib/Document.php 100% <0%> (ø) 24% <0%> (ø) ⬇️
... and 38 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ebe9748...9637f9c. Read the comment docs.

lib/Parser/MimeDir.php Outdated Show resolved Hide resolved
@evert
Copy link
Member

evert commented Oct 14, 2019

Doing this 'right' might be a more difficult than you think, especially when things like recurrence and exceptions come into play. You might need to do 2 passes of the file. The first to create an index of related events, and a second to do the actual filtering.

I'm a little on the fence here. I also feel that the use-case (reducing 30MB memory to 5MB) is not incredibly significant if this is mostly used for one-off purposes. It's definitely a significant number in terms of memory usage, but is this something you're dealing with very often?

@8633brown
Copy link
Author

Basically what I'm working on is a personalised teacher student booking site. I dont particularly want to go down the route of self hosting a caldav server but that might be the. Ext best option. The plan was to allow teachers (which theres only 2 of at the moment) to pass a public ical link to the server to be parsed to find the freebusy times for when students can create booking. Most of the calendars I'm dealing with are hosted on the apple servers so I'm struggling to find a sensible way to query these with caldav requests. As mentioned one of the calendars has over 2000 events and once parsed with reader::read and everything is put into memory it around 30MB. Once reduced down to the events I'm interested in (I was wrong about 5MB that was also including another calendar) it's closer to 1.5MB.
So this action is possibly going to have to happen every time a lesson is modified and every Xhours to keep it upto date (when not considering checking the file for changes or checking Etag headers and the likes.)
I'm very open to any new suggestions thrown my way.

@evert
Copy link
Member

evert commented Oct 14, 2019

I guess where I'm sort of coming from is that even at the 30MB point, you can still run about >100 in parallel in theory. If it takes 5 seconds to parse one of these larger ones, 1200 this system should be able to handle 100K of these calendar updates per hour. I'm just wondering: is that really your bottleneck?

@8633brown
Copy link
Author

You're right its probably not a bottleneck at all. I just thought a method like this could've been a quick win. I was assuming the functionality would more or less be the same as the FreeBusyGenerator::calculateBusy as this is pretty much identical to the VEvent::isInTimeRange method. I also seen the min and max dateTimes in the settings which i originally thought would do what i wanted. I guess i'm just struggling to see the difference in parsing the VEvent in the FreeBusyGenerator when i could just parse them out when i read the calendar the first time.
Apologies if i'm just being naive thinking this is easier than it is.

@evert
Copy link
Member

evert commented Oct 15, 2019

@8633brown there is a difference with the FreeBusyIterator, because this object actually right now needs the full calendar in memory to do its thing. It specifically uses EventIterator, which handles all recurrence-related stuff.

Specifically one issue here is recurring events and exceptions. To calculate all instances of a recurrence, you need to need to have all VEVENT objects that have a matching UID. UID's are not unique and when multiple events share a UID it means they belong together. There's no guarantee that events with the same UID appear in sequence.

@8633brown
Copy link
Author

Ahh interesting. So when calling the event iterator within the isInTimeRange function on this line, if I was to call this on a recurring event without all other occurrences with the same uid it could give me the wrong result? As isInTimeRange only passes a single VEvent to the event iterator?
If that's correct I think I understand where the double pass comes in now.

Would it be an option to not filter recurring events. And only filter events which dont have an RRule?
Or as you said do a double pass and just put all VEvents with an RRule and the same UID into an array to do the second filter?
I guess I'm wondering if @evert you're still on the fence or if there is a chance this would be merged at all?

@evert
Copy link
Member

evert commented Oct 15, 2019

I don't consider myself a maintainer anymore, so ultimately it's up to others. However, I would lean towards 'no' because I feel it's a bit of an edge case, and somewhat imperfect due to the 'recurrence exception' issue.

I can see a need for this at some point, but maybe there's other ways to solve it. The VCardSplitter class for example stemmed from a similar issue with large VCard exports. This was a bit more grounded in real issues, because addressbook exports can be really massive due to pictures being stored in vcards. Possible in iCalendar too, but more rare.

For what it's worth, CalDAV does have a similar problem and also has a popular 'filter by timerange' feature. This is solved in sabre/dav by first creating indexes on events and finding out the 'first time an event occurs' and the 'last time an event occurs', and after that unprecise filter is applied, events are filtered again with a slower precise filter that expands every occurrence.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parsing large calendar files.
3 participants