CONTENT_FALLBACK_DIRS option along with cherrypy/devserver support. #6865

jamalex · 2020-05-11T05:00:13Z

Summary

For Flatpak-based preloading of Endless content, as well as future SD card-based static content distribution for Android, this PR adds a CONTENT_FALLBACK_DIRS option that allows for additional search paths to be specified beyond the standard, primary CONTENT_DIR.

Content is only downloaded to the primary content location, but if a file is found in one of the fallback directories instead, that path will be used by the server, and the file will be considered available.
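
As a rough illustration of the lookup behavior described above, here is a minimal sketch. The helper names `resolve_content_file` and `download_destination` are hypothetical and are not functions from this PR. The sketch assumes the primary directory is checked before the fallbacks, which matches the ordering of `content_dirs` in the server.py diff.

```python
import os


def resolve_content_file(relative_path, primary_dir, fallback_dirs):
    """Return the first existing location of relative_path, or None.

    Hypothetical helper for illustration only: the primary directory is
    checked first, then each fallback directory in the order given.
    """
    for base in [primary_dir] + list(fallback_dirs):
        candidate = os.path.join(base, relative_path)
        if os.path.exists(candidate):
            return candidate
    # Not present anywhere, so the file would be treated as unavailable.
    return None


def download_destination(relative_path, primary_dir):
    """Downloads always target the primary directory, never a fallback."""
    return os.path.join(primary_dir, relative_path)
```

With this shape, a file that exists only in a fallback directory resolves to the fallback path, while any new download for the same relative path would still be written under the primary directory.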

CONTENT_FALLBACK_DIRS is a semicolon-separated list of paths. Those paths should each contain subfolders called "storage" and "databases", in the same form as the <KOLIBRI_HOME>/content directory (or other CONTENT_DIR path).
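
To make the expected format concrete, the following sketch splits a semicolon-separated value and checks each path for the "storage" and "databases" subfolders. Both function names are hypothetical, and the actual validation added to kolibri/utils/options.py may differ (for example in how it expands or normalizes paths).

```python
import os


def parse_fallback_dirs(raw_value):
    """Split a semicolon-separated string into a list of non-empty paths."""
    return [part.strip() for part in raw_value.split(";") if part.strip()]


def has_content_layout(path):
    """Return True if path contains both 'storage' and 'databases' subfolders."""
    return all(
        os.path.isdir(os.path.join(path, subfolder))
        for subfolder in ("storage", "databases")
    )
```

A caller could then keep only the usable entries with something like `[p for p in parse_fallback_dirs(value) if has_content_layout(p)]`.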

Reviewer guidance

Run Kolibri with the environment variable CONTENT_FALLBACK_DIRS set to a path (or semicolon-separated list of paths) containing Kolibri content. Note that you then need to run kolibri manage scanforcontent to pick up any new content.
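
For scripted testing, the steps above could be prepared roughly as follows. This is only a sketch: `build_scan_invocation` is a hypothetical helper, and the variable name is left as a parameter because the description here uses CONTENT_FALLBACK_DIRS while one of the later commit titles refers to KOLIBRI_CONTENT_FALLBACK_DIRS.

```python
import os


def build_scan_invocation(fallback_dirs, var_name="CONTENT_FALLBACK_DIRS", base_env=None):
    """Return (command, env) for running scanforcontent with fallback dirs set.

    Hypothetical helper: it only builds the arguments and does not run anything.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Multiple paths are joined with semicolons, as described in the summary.
    env[var_name] = ";".join(fallback_dirs)
    command = ["kolibri", "manage", "scanforcontent"]
    return command, env
```

The returned pair can be passed to `subprocess.run(command, env=env, check=True)` on a machine where the `kolibri` executable is on the PATH.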

Note: will not currently work properly with kolibri-server or behind any of our standard nginx proxy configs, as those have the /static prefix hardcoded to a single lookup path.

TODO

Add automated tests for primary expected behavior
Test with nginx and see if it's possible to provide easy nginx compatibility guidance
Test various interactions with content scanning and availability updating (e.g. update get_channel_ids_for_content_database_dir and references to it)

Contributor Checklist

PR process:

PR has the correct target branch and milestone
PR has 'needs review' or 'work-in-progress' label
If PR is ready for review, a reviewer has been added. (Don't use 'Assignees')
If this is an important user-facing change, PR or related issue has a 'changelog' label
If this includes an internal dependency change, a link to the diff is provided

Testing:

Contributor has fully tested the PR manually
If there are any front-end changes, before/after screenshots are included
Critical user journeys are covered by Gherkin stories
Critical and brittle code paths are covered by unit tests

Reviewer Checklist

Automated test coverage is satisfactory
PR is fully functional
PR has been tested for accessibility regressions
External dependency files were updated if necessary (yarn and pip)
Documentation is updated
Contributor is in AUTHORS.md

rtibbles

Validation can be improved - one flag about potential impact on performance.

kolibri/utils/options.py

rtibbles · 2020-05-11T16:16:27Z

kolibri/utils/server.py

+        conf.OPTIONS["Deployment"]["URL_PATH_PREFIX"]
+    ).lstrip("/")
+    content_dirs = [paths.get_content_dir_path()] + paths.get_content_fallback_paths()
+    dispatcher = MultiStaticDispatcher(content_dirs)


In the case where the fallback paths are not used, how much overhead does the multistatic dispatcher add compared to how cherrypy normally handles static files?

One way to be conservative about this would be to only use the MultiStaticDispatcher in the case where there are fallback content paths.

jamalex · 2020-05-12

It's still using the same cherrypy static handlers. The additional code that's executed per file request is:
https://github.com/learningequality/kolibri/pull/6865/files/78ae3149620cc3e98743ca87968e6557529b95c0#diff-601cde52caf17fa9d77c64ca9e73e927R239-R248

In the case of no fallback content paths, this loop will only be executed once, but it does do an extra os.path.exists call in there, which wouldn't be needed in the "single handler" case. Actually, the simplest way to "handle" that might be to just short-circuit the find_handler method if there's only one candidate handler anyway.
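
The short-circuit idea can be sketched without CherryPy as below. `MultiDirLookup` and `find_dir` are hypothetical stand-ins for the PR's MultiStaticDispatcher and find_handler, and they return a directory rather than a static handler. Returning the primary directory when nothing matches is an assumption made for this sketch, not something confirmed by the diff.

```python
import os


class MultiDirLookup:
    """Chooses which content directory should serve a requested file.

    Hypothetical illustration only; the real dispatcher wraps CherryPy
    static handlers instead of returning directory paths.
    """

    def __init__(self, content_dirs):
        if not content_dirs:
            raise ValueError("at least one content directory is required")
        self.content_dirs = list(content_dirs)

    def find_dir(self, relative_path):
        # Short-circuit: with a single candidate there is nothing to choose
        # between, so the os.path.exists call is skipped entirely.
        if len(self.content_dirs) == 1:
            return self.content_dirs[0]
        for base in self.content_dirs:
            if os.path.exists(os.path.join(base, relative_path)):
                return base
        # Assumption: fall through to the primary directory so that its
        # normal "not found" handling applies.
        return self.content_dirs[0]
```

In the common case of no fallback paths, this makes the per-request cost identical to the single-handler setup, since no extra filesystem check is performed.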

rtibbles · 2020-05-13T16:23:00Z

Should we target this to the app-support branch, btw?

kollivier · 2020-05-19T00:28:13Z

I took this for a spin locally and the fallback paths are working for me with a fallback path on an external hard drive. The main issue was that even with smaller channels, scanforcontent took a couple minutes to run on a fairly fast machine.

@jamalex For cases where we know in advance what content will be in the fallback dir(s), can we extend the preseeded Kolibri trick to run scanforcontent so it can scan in the channels and load them into the Kolibri db beforehand? (I'm assuming the scanforcontent time hit is mostly due to a need to import things from the channel db into a Kolibri installation db?)

indirectlylit · 2020-05-19T17:55:27Z

Should we target this to the app-support branch, btw?

The "click to log in" and password-free changes for Android will be going in app-support. However I believe the Endless timeline might be a bit accelerated compared to that branch.

If app-support is currently stable enough for Endless to use and we are waiting on the Android changes, I would say yes: this change should go there and Endless should release builds from that branch.

Otherwise we can merge here and have the MrPau team do an integration test pass.

In either case, we'll need Gherkin stories for these changes in order for them to test it eventually.

kollivier · 2020-05-19T19:31:04Z

I appear to have gotten this working, along with porting the preseeded Kolibri stuff to Mac so I can test Kolibri while keeping all the big channels on my external drive. :) Have to say that just loading right into a Kolibri with plenty of content after initial setup is a nice experience.

jredrejo · 2020-05-20T15:06:17Z

For kolibri-server, the CONTENT_FALLBACK_DIRS can be removed/inserted in every restart of the server with a modification here. https://github.com/learningequality/kolibri-server/blob/master/kolibri_server_setup.py#L159
I'll work on it after this is approved

jamalex · 2020-05-21T00:11:00Z

@kollivier:

For cases where we know in advance what content will be in the fallback dir(s), can we extend the preseeded Kolibri trick to run scanforcontent so it can scan in the channels and load them into the Kolibri db beforehand?

We could, in principle, though I can't imagine too many scenarios where we would know specifically what content would be there at install time. The exception is full OS images, and in that case the simplest approach would be to start Kolibri, stop Kolibri, and then deprovision (with data left in place).

(I'm assuming the scanforcontent time hit is mostly due to a need to import things from the channel db into a Kolibri installation db?)

It's hard to say. That's likely to be a decent portion of it (and if the external drive is non-SSD and/or over slower USB, then that could be a bottleneck for the DB reads). The other part would be annotation, if you have a lot of new content available in the fallback dirs. But for that, it shouldn't be much slower than if the content were on your primary disk/content path. Could you time the stages based on log outputs when you run scanforcontent?

jamalex · 2020-05-21T00:15:14Z

In either case, we'll need Gherkin stories for these changes in order for them to test it eventually.

Opened a tech debt issue here:
#6904

codecov · 2020-05-26T20:00:32Z

Codecov Report

Merging #6865 into app-support will decrease coverage by 0.03%.
The diff coverage is 46.15%.

| Impacted Files | Coverage Δ | |
|---|---|---|
| ...core/content/management/commands/scanforcontent.py | `0.00% <0.00%> (ø)` | |
| kolibri/utils/server.py | `36.33% <12.50%> (-1.88%)` | ⬇️ |
| kolibri/core/views.py | `82.85% <40.00%> (-4.52%)` | ⬇️ |
| kolibri/core/content/utils/paths.py | `95.49% <92.15%> (-3.24%)` | ⬇️ |
| kolibri/core/content/upgrade.py | `76.66% <100.00%> (ø)` | |
| kolibri/core/content/utils/channels.py | `89.04% <100.00%> (+0.98%)` | ⬆️ |
| kolibri/core/urls.py | `100.00% <100.00%> (ø)` | |
| kolibri/core/content/utils/check_schema_db.py | `100.00% <0.00%> (+8.69%)` | ⬆️ |

App support

Comments addressed.

CONTENT_FALLBACK_DIRS option along with cherrypy/devserver support.

78ae314

jamalex added the work-in-progress Not ready for review label May 11, 2020

jamalex added this to the 0.13.3 milestone May 11, 2020

rtibbles previously requested changes May 11, 2020

jamalex added 2 commits May 11, 2020 21:09

Get scanforcontent/upgrade working with fallback content dirs

9e0df31

Add scanforcontent args to filter by channel and modify import behavior

a60aae5

indirectlylit added the TODO: needs gherkin update Add to our manual integration tests label May 19, 2020

indirectlylit changed the base branch from release-v0.13.x to app-support May 20, 2020 22:56

indirectlylit modified the milestones: 0.13.3, flatpak-mvp May 20, 2020

jamalex added 2 commits May 20, 2020 16:58

Validate the path list for KOLIBRI_CONTENT_FALLBACK_DIRS

dd39641

Add a short-circuit if there's only one possible handler to dispatch

6a90799

jamalex mentioned this pull request May 21, 2020

Automated tests and gherkin stories still needed for #6865 #6904

Closed

jamalex added 4 commits May 20, 2020 17:19

Merge branch 'app-support' of github.com:learningequality/kolibri int…

9ec24ad

…o content_fallback_paths

Fix path expansion logic for multi-path lists

e9e7b02

Fix get_content_fallback_paths function to handle list, not str

6422761

run black on updated code

dabc4a1

jamalex force-pushed the content_fallback_paths branch from 22a465e to dabc4a1 Compare May 22, 2020 19:52

Bump zeroconf to 0.19.6 for flatpak-compatibility

9a3dfc0

Merge pull request #4 from learningequality/app-support

5d05c61

App support

jamalex removed the work-in-progress Not ready for review label May 26, 2020

jamalex merged commit aa4e9d5 into learningequality:app-support May 26, 2020

jamalex deleted the content_fallback_paths branch May 28, 2020 00:35

indirectlylit modified the milestones: flatpak-mvp, 0.14.0 Jul 3, 2020

indirectlylit added the changelog Important user-facing changes label Jul 3, 2020

indirectlylit mentioned this pull request Jul 3, 2020

Links to Content on Linux Fail in 0.13.0, 0.13.2, 0.13.3 #7003

Closed

radinamatic removed the TODO: needs gherkin update Add to our manual integration tests label Dec 22, 2021
