Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CONTENT_FALLBACK_DIRS option along with cherrypy/devserver support. #6865

Merged
merged 11 commits into from May 26, 2020

Conversation

jamalex
Copy link
Member

@jamalex jamalex commented May 11, 2020

Summary

For Flatpak-based preloading of Endless content, as well as future SD card-based static content distribution for Android, this PR adds a CONTENT_FALLBACK_DIRS option that allows for additional search paths to be specified beyond the standard, primary CONTENT_DIR.

Content is only downloaded to the primary content location, but a file is found in one of the fallback directories instead, that path will be used by the server, and the file will be considered available.

CONTENT_FALLBACK_DIRS is a semicolon-separated list of paths. Those paths should each contain subfolders called "storage" and "databases", in the same form as the <KOLIBRI_HOME>/content directory (or other CONTENT_DIR path).

Reviewer guidance

Run Kolibri with the environment variable CONTENT_FALLBACK_DIRS set to a path (or semi-colon-delimited list of paths) containing Kolibri content. Note that you then need to run kolibri manage scanforcontent to pick up any new content.

Note: will not currently work properly with kolibri-server or behind any of our standard nginx proxy configs, as those have the /static prefix hardcoded to a single lookup path.

TODO

  • Add automated tests for primary expected behavior
  • Test with nginx and see if it's possible to provide easy nginx compatibility guidance
  • Test various interactions with content scanning and availability updating (e.g. update get_channel_ids_for_content_database_dir and references to it)

Contributor Checklist

PR process:

  • PR has the correct target branch and milestone
  • PR has 'needs review' or 'work-in-progress' label
  • If PR is ready for review, a reviewer has been added. (Don't use 'Assignees')
  • If this is an important user-facing change, PR or related issue has a 'changelog' label
  • If this includes an internal dependency change, a link to the diff is provided

Testing:

  • Contributor has fully tested the PR manually
  • If there are any front-end changes, before/after screenshots are included
  • Critical user journeys are covered by Gherkin stories
  • Critical and brittle code paths are covered by unit tests

Reviewer Checklist

  • Automated test coverage is satisfactory
  • PR is fully functional
  • PR has been tested for accessibility regressions
  • External dependency files were updated if necessary (yarn and pip)
  • Documentation is updated
  • Contributor is in AUTHORS.md

@jamalex jamalex added the work-in-progress Not ready for review label May 11, 2020
@jamalex jamalex added this to the 0.13.3 milestone May 11, 2020
Copy link
Member

@rtibbles rtibbles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validation can be improved - one flag about potential impact on performance.

kolibri/utils/options.py Show resolved Hide resolved
kolibri/utils/options.py Outdated Show resolved Hide resolved
conf.OPTIONS["Deployment"]["URL_PATH_PREFIX"]
).lstrip("/")
content_dirs = [paths.get_content_dir_path()] + paths.get_content_fallback_paths()
dispatcher = MultiStaticDispatcher(content_dirs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case where the fallback paths are not used, how much overhead does the multistatic dispatcher add compared to how cherrypy normally handles static files?

One way to be conservative about this would be to only use the MultiStaticDispatcher in the case where there are fallback content paths.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's still using the same cherrypy static handlers. The additional code that's executed per file request is:
https://github.com/learningequality/kolibri/pull/6865/files/78ae3149620cc3e98743ca87968e6557529b95c0#diff-601cde52caf17fa9d77c64ca9e73e927R239-R248

In the case of no fallback content paths, this loop will only be executed once, but it does do an extra os.path.exists call in there, which wouldn't be needed in the "single handler" case. Actually, the simplest way to "handle" that might be to just short-circuit the find_handler method if there's only one candidate handler anyway.

@rtibbles
Copy link
Member

Should we target this to the app-support branch, btw?

@kollivier
Copy link
Contributor

kollivier commented May 19, 2020

I took this for a spin locally and the fallback paths are working for me with a fallback path on an external hard drive. The main issue was that even with smaller channels, scanforcontent took a couple minutes to run on a fairly fast machine.

@jamalex For cases where we know in advance what content will be in the fallback dir(s), can we extend the preseeded Kolibri trick to run scanforcontent so it can scan in the channels and load them into the Kolibri db beforehand? (I'm assuming the scanforcontent time hit is mostly due to a need to import things from the channel db into a Kolibri installation db?)

@indirectlylit
Copy link
Contributor

Should we target this to the app-support branch, btw?

The "click to log in" and password-free changes for Android will be going in app-support. However I believe the Endless timeline might be a bit accelerated compared to that branch.

If app-support is currently stable enough for Endless to use and we are waiting on the Android changes, I would say yes: this change should go there and Endless should release builds from that branch.

Otherwise we can merge here and have the MrPau team do an integration test pass.

In either case, we'll need Gherkin stories for these changes in order for them to test it eventually.

@indirectlylit indirectlylit added the TODO: needs gherkin update Add to our manual integration tests label May 19, 2020
@kollivier
Copy link
Contributor

I appear to have gotten this working, along with porting the preseeded Kolibri stuff to Mac so I can test Kolibri while keeping all the big channels on my external drive. :) Have to say that just loading right into a Kolibri with plenty of content after initial setup is a nice experience.

@jredrejo
Copy link
Member

For kolibri-server, the CONTENT_FALLBACK_DIRS can be removed/inserted in every restart of the server with a modification here. https://github.com/learningequality/kolibri-server/blob/master/kolibri_server_setup.py#L159
I'll work on it after this is approved

@indirectlylit indirectlylit changed the base branch from release-v0.13.x to app-support May 20, 2020 22:56
@indirectlylit indirectlylit modified the milestones: 0.13.3, flatpack-mvp May 20, 2020
@jamalex
Copy link
Member Author

jamalex commented May 21, 2020

@kollivier:

For cases where we know in advance what content will be in the fallback dir(s), can we extend the preseeded Kolibri trick to run scanforcontent so it can scan in the channels and load them into the Kolibri db beforehand?

We could, in principle. Except I can't imagine too many scenarios where we would know specifically what content would be there at install time. Except for full OS images, and in that case the simplest would be to start Kolibri, stop Kolibri, and then deprovision (with data left in place).

(I'm assuming the scanforcontent time hit is mostly due to a need to import things from the channel db into a Kolibri installation db?)

It's hard to say. That's likely to be a decent portion of it (and if the external drive is non-SSD and/or over slower USB, then that could be a bottleneck for the DB reads). The other part would be annotation, if you have a lot of new content available in the fallback dirs. But for that, it shouldn't be much slower than if the content were on your primary disk/content path. Could you time the stages based on log outputs when you run scanforcontent?

@jamalex
Copy link
Member Author

jamalex commented May 21, 2020

In either case, we'll need Gherkin stories for these changes in order for them to test it eventually.

Opened a tech debt issue here:
#6904

@codecov
Copy link

codecov bot commented May 26, 2020

Codecov Report

Merging #6865 into app-support will decrease coverage by 0.03%.
The diff coverage is 46.15%.

Impacted Files Coverage Δ
...core/content/management/commands/scanforcontent.py 0.00% <0.00%> (ø)
kolibri/utils/server.py 36.33% <12.50%> (-1.88%) ⬇️
kolibri/core/views.py 82.85% <40.00%> (-4.52%) ⬇️
kolibri/core/content/utils/paths.py 95.49% <92.15%> (-3.24%) ⬇️
kolibri/core/content/upgrade.py 76.66% <100.00%> (ø)
kolibri/core/content/utils/channels.py 89.04% <100.00%> (+0.98%) ⬆️
kolibri/core/urls.py 100.00% <100.00%> (ø)
kolibri/core/content/utils/check_schema_db.py 100.00% <0.00%> (+8.69%) ⬆️

@jamalex jamalex removed the work-in-progress Not ready for review label May 26, 2020
@rtibbles rtibbles dismissed their stale review May 26, 2020 21:12

Comments addressed.

@jamalex jamalex merged commit aa4e9d5 into learningequality:app-support May 26, 2020
@jamalex jamalex deleted the content_fallback_paths branch May 28, 2020 00:35
@indirectlylit indirectlylit modified the milestones: flatpak-mvp, 0.14.0 Jul 3, 2020
@indirectlylit indirectlylit added the changelog Important user-facing changes label Jul 3, 2020
@radinamatic radinamatic removed the TODO: needs gherkin update Add to our manual integration tests label Dec 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog Important user-facing changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants