there should be tests for ReceiveBuffer's big-O behavior #16

njsmith · 2016-10-12T06:51:51Z

It's security-critical that ReceiveBuffer's methods have linear worst-case complexity, and the code we use to accomplish this is non-trivial. There should be tests that we have successfully avoided the O(n^2) trap.

Automated tests for speed are difficult, but I think it's not so bad in this case, because quadratic behavior is so dramatic. So like we could do a worst case test (e.g. repeatedly read 1 byte from an N byte buffer) with N = k and N = 100*k. If we got it right then the latter will be 100x slower; if we got it wrong it'll be 10,000x slower. So we can check that it's less than 1,000x slower, and that gives us a large safety margin on both sides.

Edit: _obsolete_line_fold in _readers.py is another place where it'd be very easy to be accidentally quadratic.

The text was updated successfully, but these errors were encountered:

eugene-eeo · 2016-10-16T03:48:36Z

@njsmith not too familiar with the codebase but this seems reasonable. You might want to explore the JIT disabling flags for PyPy though (if you're testing against it) in order to prevent any well intentioned surprises. Also, any chance of just writing your own bytearray ~~module~~ implementation in pure Python, then testing deletion from the front? I imagine it would be much simpler.

update:

$ pypy --jit off

should do the trick.

njsmith · 2016-10-16T05:19:02Z

We do test against PyPy, yes. I'm not quite sure how the JIT would cause a problem here, and I'd be nervous about testing a configuration we don't run on, since this would be a safety check, but it makes sense to watch out for it.

ReceiveBuffer is already a pretty thin layer on top of bytearray. (I say "non-trivial" in the original post, and it is, but I just mean "complicated enough to need a test". It's still only like 30 lines or something :-). A little bit of that is to get O(1) reads from the front; more of it is to get good big-O behavior for doing repeated searches for the same terminator (e.g. line endings) in a buffer that keeps growing as more data arrives.

eugene-eeo · 2016-10-16T06:26:45Z

@njsmith I was assuming that in order to do timing tests you'd run the function repeatedly and compare the total running times. But if you're suggesting to only run the function once the JIT would probably not have a large impact if it didn't already compile the code due to repeated calls from the previous tests.

My worry is that in the middle of running the code repeatedly, say 1000 times for N=k, when the tests start for N=100*k the JIT kicks in and optimises the method, which may produce much faster code and thereby hiding the big-O behaviour.

njsmith · 2016-10-16T06:55:13Z

Ahh, I see, right, that would be a big concern! I was imagining that we'd pick k such that the shorter test takes long enough that it's timing is not too hard to measure (likes tens of milliseconds say) so that we can get reasonable timings by running it ~3-5 times. There still is the concern that because the test itself contains a loop the JIT might kick in at some point (there's a for i in range(k): ...) even with a small number of runs. I can imagine a few possible counter-measures... pre-warm the test by running it "enough" times before taking timings, interleave the runs of the small and large tests so that if the JIT kicks in at some point it affects them both ~equally. Or just always run the big large test first, so if the JIT does mess things up it will cause a visible failure. Or maybe it won't even come up as a problem :-). Probably we need to try it before spending too much time worrying.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

there should be tests for ReceiveBuffer's big-O behavior #16

there should be tests for ReceiveBuffer's big-O behavior #16

njsmith commented Oct 12, 2016 •

edited

eugene-eeo commented Oct 16, 2016 •

edited

njsmith commented Oct 16, 2016

eugene-eeo commented Oct 16, 2016

njsmith commented Oct 16, 2016

there should be tests for ReceiveBuffer's big-O behavior #16

there should be tests for ReceiveBuffer's big-O behavior #16

Comments

njsmith commented Oct 12, 2016 • edited

eugene-eeo commented Oct 16, 2016 • edited

njsmith commented Oct 16, 2016

eugene-eeo commented Oct 16, 2016

njsmith commented Oct 16, 2016

njsmith commented Oct 12, 2016 •

edited

eugene-eeo commented Oct 16, 2016 •

edited