Poll faculty for spring changes + improvements #30

ml8 · 2021-10-26T19:16:20Z

@anesepark and @walidabualafia -- can you chat with @msuperdock @betswms @CatieWelsh @pkirlin about any improvements they might imagine for the environment and libraries for the spring semester? It looks like @msuperdock is the only one teaching 141 in the spring.

Some friction points are noted in issues already (e.g., trusting notebooks, making GitHub pulling less error-prone, etc.). Can you see what else has been a point of friction for students and faculty? Send out an email?

msuperdock · 2021-10-27T03:41:44Z

Unless something changes, I will be the only one teaching 141 in the spring.

A few points of friction that come to mind quickly, for me:

Server load times are slower than I'd like, and seemed to get slower in early October, but seem a bit better now. Obviously this is a big & vague problem, but I wonder whether there might be a simple way to track server load times. It would be nice to know whether load times really are fluctuating for everyone, or be alerted whenever there's a really slow load time.
Do we really need individual servers per student? This is a naive question, and I'm assuming individual servers is just what JupyterHub supports, but what would happen if we had one server with lots of user accounts?
I'd like to be able to more easily find a particular student's notebook from the admin page. Currently the only identifier is a student's Rhodes email; it would be nice to be able to see a name also.
It often happens that a student (or me) gets some kind of error, e.g. "Forbidden," in the middle of working on a notebook. The only fix I'm aware of is to open rhodes-notebook.org from a different tab. Can we make this happen less often?

OK, now here are some dreams of how this whole system could work better. I don't expect these to be easily implementable, but I figure I'll share them anyway:

What if we could get live results of students' completion of a set of in-class exercises? (based on automated tests)
What if we could quickly pull up a student's solution to an in-class problem to look at it as a class?
What if a teacher could quickly look through different students' in-progress solutions to a single in-class problem?
What if a teacher could quickly push code not already in the notebook to the whole class?
What if students could work collaboratively in a notebook, Google-docs style?

msuperdock · 2021-10-27T03:47:23Z

Another point of friction--to post a notebook on my JupyterHub to the class, I have to download it and commit it to our course repository. I'd like this to be simpler.

It would be nice if I could maintain a folder (e.g. superdock-dev) on my JupyterHub, then run a command and have superdock-dev pushed to everyone's notebooks as superdock.

ml8 · 2021-10-27T06:07:12Z

This is great feedback! Lots of things that @anesepark and @walidabualafia can look through here. I'll offer some initial thoughts.

Server load times are slower than I'd like, and seemed to get slower in early October, but seem a bit better now. Obviously this is a big & vague problem, but I wonder whether there might be a simple way to track server load times. It would be nice to know whether load times really are fluctuating for everyone, or be alerted whenever there's a really slow load time.

There are a few things on the critical path to being dropped into the server:

Provisioning resources: we hold slots for ~3 (I believe) new arrivals, but when a class starts, we always need to provision resources. If we over-provision by ~50% of a node's capacity, that would reduce this time. This is the easiest/quickest win.
Pulling docker image, mounting disk, and booting: this is the minimum critical path and can't be reduced without modifying our deployment model and deviating from JupyterHub supported configs. Fortunately it is probably the quickest (<30s).
Running post-start hooks: here, we pip install the library and do the sync with the course repo. When there is new material, this could take a few seconds. The install could be taken off the critical path by putting the course libraries in the Docker image, but that prevents hot fixes. Since the library is no longer being actively developed, this sounds like a win. Syncing the repo is a fixed cost if we go with the synced repo approach.

Getting telemetry should be fairly easy. These are pod events and are logged, so we could at the very least run over the logs to determine mean and 90th-percentile boot times.
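As a rough sketch of that telemetry, assuming the pod events have already been parsed out of the logs into (user, scheduled time, ready time) tuples: the tuple layout, the sample timestamps, and the function names below are made up for illustration, and the real log format would need its own parser.

```python
import math
from datetime import datetime
from statistics import mean

# Hypothetical, already-parsed pod events: (user, scheduled time, ready time).
SAMPLE_EVENTS = [
    ("student-a", "2021-10-26T10:00:00", "2021-10-26T10:00:20"),
    ("student-b", "2021-10-26T10:00:05", "2021-10-26T10:00:40"),
    ("student-c", "2021-10-26T10:00:10", "2021-10-26T10:00:35"),
    ("student-d", "2021-10-26T10:00:15", "2021-10-26T10:01:45"),
    ("student-e", "2021-10-26T10:00:20", "2021-10-26T10:00:50"),
]


def boot_seconds(events):
    """Return the boot duration in seconds for each (user, scheduled, ready) event."""
    durations = []
    for _user, scheduled, ready in events:
        start = datetime.fromisoformat(scheduled)
        end = datetime.fromisoformat(ready)
        durations.append((end - start).total_seconds())
    return durations


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of values at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


durations = boot_seconds(SAMPLE_EVENTS)
print("mean boot time (s):", mean(durations))
print("90th percentile boot time (s):", percentile(durations, 90))
```

The same durations list could also drive a simple alert, e.g. flagging any boot that takes longer than some threshold we choose.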

Do we really need individual servers per student? This is a naive question, and I'm assuming individual servers is just what JupyterHub supports, but what would happen if we had one server with lots of user accounts?

If you mean physical server: this scales to about 1-2 classes max and requires a big box (e.g., 192GB of RAM, 32 cores, etc.). Running in the cloud, this is a HUGE spend, since memory dominates vCPU cost. I've proposed on-prem, but then we run into...

But more importantly, individual servers are essential for isolation. There are some notes about this in the docs, but capping CPU, memory, and disk without containerizing is a pain. For example, the OOM killer will kill all processes, not just the culprit server. If we run containers, that's the solution we already have.

I'd like to be able to more easily find a particular student's notebook from the admin page. Currently the only identifier is a student's Rhodes email; it would be nice to be able to see a name also.

We choose the OAuth response field for username. Issue here is ensuring IDs are unique--email provides that. We could explore concatenating different fields.

It often happens that a student (or me) gets some kind of error, e.g. "Forbidden," in the middle of working on a notebook. The only fix I'm aware of is to open rhodes-notebook.org from a different tab. Can we make this happen less often?

This happens when the user is active and not idle? I assume this is a OneLogin integration issue (maybe a timeout? spurious invalidation?). We should investigate. These errors are in the server logs, so we can root-cause them. Please ping the Slack when this next happens and @anesepark and @walidabualafia can investigate.

What if we could get live results of students' completion of a set of in-class exercises? (based on automated tests)

Hacky solution: enhance our wrapper to Okpy and either submit to okpy OR just ping an endpoint (e.g., our own or the sheets api or...).
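A minimal sketch of the "ping an endpoint" variant, using only the standard library. Everything specific here is an assumption: the payload fields, the function names, and the endpoint URL are hypothetical, and this is not how our Okpy wrapper currently works.

```python
import json
import urllib.request


def build_result_payload(student, exercise, passed, total):
    """Summarize one student's automated-test results for one exercise."""
    return {
        "student": student,
        "exercise": exercise,
        "passed": passed,
        "total": total,
        "complete": passed == total,
    }


def post_result(endpoint_url, payload, timeout=5):
    """POST the payload as JSON to a (hypothetical) collection endpoint."""
    data = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        endpoint_url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


payload = build_result_payload("student@example.edu", "lab05-q2", passed=3, total=4)
print(payload)
# post_result("https://example.invalid/results", payload)  # hypothetical endpoint
```

The wrapper would call something like this after running the tests, and the instructor view would just aggregate whatever the endpoint has collected.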

What if we could quickly pull up a student's solution to an in-class problem to look at it as a class?

Again, a hack: This would be a cool add-on to the library: upload a cell's contents to a pastebin (or our own endpoint). A more robust solution breaks the JupyterHub security model, I believe.

What if a teacher could quickly look through different students' in-progress solutions to a single in-class problem?

This would be covered by the above idea. Or something similar. We could also explore a plug-in to use the admin API to authenticate against a bunch of user servers.

What if a teacher could quickly push code not already in the notebook to the whole class?

If we are using nbgitpuller+GitHub to distribute files, this can be done with a periodic poll mechanism, or by moving to the link-based sync in the docs.

We could also add something to the library or the UI (if we want to start writing JupyterHub plugins) to save the current notebook and run nbgitpuller.
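For reference, a sketch of building the link for the link-based sync. The query layout (`repo`, `branch`, `urlpath` under `hub/user-redirect/git-pull`) follows the pattern I understand nbgitpuller to document; the `https` scheme for our hub, the repository URL, and the function name are assumptions for illustration.

```python
from urllib.parse import urlencode


def nbgitpuller_link(hub_url, repo_url, branch, open_path):
    """Build a link that pulls repo_url into the user's server and then opens open_path."""
    query = urlencode({"repo": repo_url, "branch": branch, "urlpath": open_path})
    return f"{hub_url.rstrip('/')}/hub/user-redirect/git-pull?{query}"


link = nbgitpuller_link(
    "https://rhodes-notebook.org",                    # hub address, scheme assumed
    "https://github.com/example-org/example-course",  # hypothetical course repo
    "main",
    "tree/example-course/",
)
print(link)
```

A teacher could post a link like this in class; clicking it syncs the latest material before opening the folder.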

What if students could work collaboratively in a notebook, Google-docs style?

That would be cool. Fwiw Colab (doing exactly your use case for paper graphs right now!) and other IPython front ends (I believe) support this, but not JupyterHub.

If we want to consider moving to Colab or something like it, I would not be opposed, assuming we could get feature parity, especially for admin access.

ml8 · 2021-10-27T06:12:26Z

Another point of friction--to post a notebook on my JupyterHub to the class, I have to download it and commit it to our course repository. I'd like this to be simpler.

It would be nice if I could maintain a folder (e.g. superdock-dev) on my JupyterHub, then run a command and have superdock-dev pushed to everyone's notebooks as superdock.

So, disk isolation prevents this. Non-isolated disk was a huge issue, but it does make this sort of thing much simpler.

I could imagine a back-channel, but that's kind of a hack.

The link-based pull is a nice middle ground. Or other ideas above.

But fwiw I did all of my course development on the server. There's a CLI with git (just choose new shell vs. notebook). I didn't install ssh and use 2fa for GitHub (which only allows ssh), so had to generate a token, but that was the most annoying part. There's a small note in the docs about that.

ml8 · 2021-10-27T06:14:03Z

Again this is awesome feedback and we can start to address this or at least gather some telemetry. @anesepark and @walidabualafia can you ping the Slack and I can point you to some things?

ml8 · 2021-10-27T06:25:10Z

Another thought: we could probably do a global shared read-only file system. Kubernetes supports it. That might facilitate some reduction in friction.

Ultimately we use nbgitpuller because it does smart diff resolution on a per-file basis and guarantees that it won't clobber student work while still getting the latest version. We could point it at a shared volume instead of GitHub.

When we started with this project, we (mostly me and Catie) used script-based distribution, but that had issues that nbgitpuller addresses. Also, isolating disk is a pretty important requirement.

ml8 · 2021-10-27T07:01:28Z

Just random thoughts:

We're an interesting use case of this deployment model, since we require some degree of "productionization" but also want to retain some agility. Other users seem to be big enough scale that rapid iteration is not a requirement (e.g., Berkeley classes w/ Nxxx students/semester like data8).
Since Matt is the only one teaching in Spring, it might(?) be worth exploring a different model that meets just his needs (like I did when I started). I'm willing to help get that off the ground, but I think it might end up with the same problems we repeatedly ran into when we scaled to bigger than just my courses.
Along the same lines, if this is getting in the way more than helping, we can always let it idle (no pun intended 😉).
The pastebin idea seems neat. A possibly-deanonymized pastebin for these sorts of things sounds cool. There's a ton of clones out there that could be quickly deployed.

CatieWelsh · 2021-10-27T14:20:33Z

While the updated cs1.graphics library is great in that it allows for clicking on the canvas and students don't have to call anything extra to have their image show up, one of the issues we're having this semester is that the notebooks no longer save the images when they're submitted to okpy. So, when we need to grade projects that include graphics, our graders have to re-run all the projects on the JupyterHub server to see the images that the programs create. Before the changes were made to the graphics library, these projects were very easy to grade since you only had to look at the notebook to see the drawing. This currently affects Project 3 (Potato Heads), Project 6 (Connect the Dots), and Project 8 (PPM Images). Is there any way to modify the cs1.graphics library to make the images persist when a notebook is submitted?

ml8 · 2021-10-28T04:12:37Z

While the updated cs1.graphics library is great in that it allows for clicking on the canvas and students don't have to call anything extra to have their image show up, one of the issues we're having this semester is that the notebooks no longer save the images when they're submitted to okpy. So, when we need to grade projects that include graphics, our graders have to re-run all the projects on the JupyterHub server to see the images that the programs create. Before the changes were made to the graphics library, these projects were very easy to grade since you only had to look at the notebook to see the drawing. This currently affects Project 3 (Potato Heads), Project 6 (Connect the Dots), and Project 8 (PPM Images). Is there any way to modify the cs1.graphics library to make the images persist when a notebook is submitted?

I think this is probably do-able -- either as part of the submit or as part of the library.

Fwiw for the programs that output images, I asked my grader to download the zip of the submissions, unzip it on their server, and then run this from a notebook:

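import os

# Collect every .py file under the current directory (the unzipped submissions).
pys = []
for path, dirs, files in os.walk('.'):
    for file in files:
        if file.endswith(".py"):
            pys.append(os.path.join(path, file))

# Run each submission in turn. %run is an IPython magic, so this only works in a notebook cell.
for py in sorted(pys):
    print(py)  # the path identifies the student
    %run {py}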

That gives them the user + the output image.

ml8 · 2021-10-28T04:48:04Z

Re: saving images...

I think the best thing we can do is wait for this feature to make it into the underlying library (tracked in this issue), which recently saw some progress with this PR.

I can imagine other paths, like using the widget's observe to register a callback like in the docs.

Ultimately, there's not going to be a really clean way to get the previous behavior back. The main reason is that we were previously having the cell output be a png. Animation/image updates worked in a hacky way, by updating the output of the cell. The underlying library that we are using now facilitates interaction, as well as making animation much smoother by being an HTML5 canvas. So, the cell output is a widget object that knows how to display itself, rather than a png.

If we want image files to be part of submissions, I think the best way to do this is to explicitly save them and submit them. I think reverting to the previous implementation would be a regression in other dimensions.

We already have save_image_as_filename in the API surface. Maybe we can just add that as a cell in the notebook? We can even then just load the image and display it during submit (as the cell output, so it is persisted in the checkpoint), so that the grader can just look at the notebook? This could be an API call that wraps both (e.g., checkpoint_image() or submit_image() or something).

Thoughts?

ml8 · 2021-10-28T06:45:37Z

Rhodes-CS-Department/comp141-libraries#10 is a POC of what I meant; it obviously needs a better UX.

msuperdock · 2021-12-16T03:12:06Z

If I were to choose two issues to resolve for the upcoming spring semester, it would be these:

(1) Addressing the "Forbidden" error I mentioned above. @matthewlang, sorry for never pinging you about this, but I'm pretty sure I could reproduce it without trouble. It happens even while active, and it seems to consistently happen a certain amount of time after initial authentication. (For example, it typically happens once during my 11a class, after first opening JupyterHub shortly before my 10a class.) It affects students as well as me. My standard fix is to open rhodes-notebook.org in another tab, which will open immediately without requiring a OneLogin authentication; then I immediately close this tab, return to my original tab, and everything's fine. If you'd like, I could try to reproduce it and give time of initial login & time of error.

(2) This is one I haven't mentioned above--usually among 50 submissions for a lab or project, I have about two that are just empty on okpy--the student's notebook says "submitted successfully," but no files are actually sent to okpy. If the student clicks the URL, they catch the issue and submit again. But my less conscientious students don't bother clicking the URL, and I've even had instances where a student repeatedly tries to submit and can't get anything to actually submit. Any insight into why this happens? The ideal fix would be to make this not happen; a perhaps more attainable fix is to give the student a specific warning if the files don't actually go through.

ml8 · 2021-12-16T23:24:29Z

(1) Addressing the "Forbidden" error I mentioned above. @matthewlang, sorry for never pinging you about this, but I'm pretty sure I could reproduce it without trouble. It happens even while active, and it seems to consistently happen a certain amount of time after initial authentication. (For example, it typically happens once during my 11a class, after first opening JupyterHub shortly before my 10a class.) It affects students as well as me. My standard fix is to open rhodes-notebook.org in another tab, which will open immediately without requiring a OneLogin authentication; then I immediately close this tab, return to my original tab, and everything's fine. If you'd like, I could try to reproduce it and give time of initial login & time of error.

I figured this out, I think. I mean I should verify the following is correct by looking at the logs, but that seems like work... ;)

Here's what I think is happening:

JupyterHub has two OAuth flows -- one to authenticate the user against an external OAuth for access to the hub (this is OneLogin in our case) and the other is an internal OAuth, where the hub is an OAuth provider for user notebook servers. When you do the first flow, the hub will generate an OAuth token that you pass to the server in a cookie, and the server validates the token against the hub.

The default timeout for this used to be an hour but was changed to be configurable and default to one day in this commit. A subsequent commit updated this to the cookie max age, which is 14 days.
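To see why a one-hour lifetime lines up with the report, here is a small worked example. It is a simplified model (a token is rejected once its lifetime has elapsed, and nothing refreshes it on a long-lived page), and the date and clock times are illustrative stand-ins for "opened shortly before the 10a class".

```python
from datetime import datetime, timedelta


def token_expiry(issued_at, lifetime):
    """Time at which a token issued at issued_at stops being valid."""
    return issued_at + lifetime


def is_forbidden(issued_at, lifetime, request_time):
    """True if a request at request_time would present an expired token."""
    return request_time >= token_expiry(issued_at, lifetime)


login = datetime(2021, 12, 15, 9, 50)        # illustrative: shortly before a 10a class
during_10a = datetime(2021, 12, 15, 10, 30)
during_11a = datetime(2021, 12, 15, 11, 15)

one_hour = timedelta(hours=1)
fourteen_days = timedelta(days=14)

print(is_forbidden(login, one_hour, during_10a))       # False: still within the hour
print(is_forbidden(login, one_hour, during_11a))       # True: the "Forbidden" in the 11a class
print(is_forbidden(login, fourteen_days, during_11a))  # False with the newer default
```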

We are currently using version 0.11.1 of the helm chart, which ships version 1.3.0 of JupyterHub. If you look at the release tags for those commits, they are in everything after 1.4.0. Updating the helm chart to at least 1.0.0 will pick up those changes. A 14-day timeout seems reasonable, and I don't think we need to configure it further.

Action item here is to update our helm chart version and pick up all of the post 1.3.0 changes. @anesepark @walidabualafia

(2) This is one I haven't mentioned above--usually among 50 submissions for a lab or project, I have about two that are just empty on okpy--the student's notebook says "submitted successfully," but no files are actually sent to okpy. If the student clicks the URL, they catch the issue and submit again. But my less conscientious students don't bother clicking the URL, and I've even had instances where a student repeatedly tries to submit and can't get anything to actually submit. Any insight into why this happens? The ideal fix would be to make this not happen; a perhaps more attainable fix is to give the student a specific warning if the files don't actually go through.

Yea, this is an ongoing issue with okpy. I haven't debugged why this happens. It looks like the okpy maintainers haven't either, looking at their issues.
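Until the root cause is known, one cheap guard would be a local check before submitting. This is only a sketch: it covers one possible cause (files that are missing or empty on disk at submit time), it does not verify what okpy actually received, and the function name is hypothetical.

```python
from pathlib import Path


def missing_or_empty(paths):
    """Return (path, reason) pairs for files that are missing or have zero bytes."""
    problems = []
    for path in map(Path, paths):
        if not path.is_file():
            problems.append((str(path), "missing"))
        elif path.stat().st_size == 0:
            problems.append((str(path), "empty"))
    return problems


def warn_before_submit(paths):
    """Print a specific warning for each problem file; return True if all files look fine."""
    problems = missing_or_empty(paths)
    for path, reason in problems:
        print(f"WARNING: {path} is {reason}; your submission may not include it.")
    return not problems
```

The submit wrapper could call `warn_before_submit` on the files it is about to send and refuse to report success if it returns False.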

ml8 · 2021-12-16T23:39:10Z

As to why you were getting a 403, it looks like that would happen if the page is long-lived. Refreshing the page or going to the root url (and then being redirected to the server) generates a new token.

ml8 · 2021-12-30T21:16:01Z

#34 updates the JupyterHub version. Leaving this open for the time being in order to capture suggestions.

ml8 mentioned this issue Oct 26, 2021

Update helm chart + docker image #31

Closed

ml8 added a commit that referenced this issue Oct 30, 2021

for #30 - move 141 lib install out of critical path

bd0be3d

ml8 added a commit that referenced this issue Oct 30, 2021

for #30 - overprovision by a node

4bcf23f

ml8 added a commit that referenced this issue Jul 17, 2022

for #30 - move 141 lib install out of critical path

2924515

ml8 added a commit that referenced this issue Jul 17, 2022

for #30 - overprovision by a node

bd2db2f
