Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

there should be tests for ReceiveBuffer's big-O behavior #16

Open
njsmith opened this issue Oct 12, 2016 · 4 comments
Open

there should be tests for ReceiveBuffer's big-O behavior #16

njsmith opened this issue Oct 12, 2016 · 4 comments

Comments

@njsmith
Copy link
Member

njsmith commented Oct 12, 2016

It's security-critical that ReceiveBuffer's methods have linear worst-case complexity, and the code we use to accomplish this is non-trivial. There should be tests that we have successfully avoided the O(n^2) trap.

Automated tests for speed are difficult, but I think it's not so bad in this case, because quadratic behavior is so dramatic. So like we could do a worst case test (e.g. repeatedly read 1 byte from an N byte buffer) with N = k and N = 100*k. If we got it right then the latter will be 100x slower; if we got it wrong it'll be 10,000x slower. So we can check that it's less than 1,000x slower, and that gives us a large safety margin on both sides.

Edit: _obsolete_line_fold in _readers.py is another place where it'd be very easy to be accidentally quadratic.

@eugene-eeo
Copy link

eugene-eeo commented Oct 16, 2016

@njsmith not too familiar with the codebase but this seems reasonable. You might want to explore the JIT disabling flags for PyPy though (if you're testing against it) in order to prevent any well intentioned surprises. Also, any chance of just writing your own bytearray module implementation in pure Python, then testing deletion from the front? I imagine it would be much simpler.

update:

$ pypy --jit off

should do the trick.

@njsmith
Copy link
Member Author

njsmith commented Oct 16, 2016

We do test against PyPy, yes. I'm not quite sure how the JIT would cause a problem here, and I'd be nervous about testing a configuration we don't run on, since this would be a safety check, but it makes sense to watch out for it.

ReceiveBuffer is already a pretty thin layer on top of bytearray. (I say "non-trivial" in the original post, and it is, but I just mean "complicated enough to need a test". It's still only like 30 lines or something :-). A little bit of that is to get O(1) reads from the front; more of it is to get good big-O behavior for doing repeated searches for the same terminator (e.g. line endings) in a buffer that keeps growing as more data arrives.

@eugene-eeo
Copy link

@njsmith I was assuming that in order to do timing tests you'd run the function repeatedly and compare the total running times. But if you're suggesting to only run the function once the JIT would probably not have a large impact if it didn't already compile the code due to repeated calls from the previous tests.

My worry is that in the middle of running the code repeatedly, say 1000 times for N=k, when the tests start for N=100*k the JIT kicks in and optimises the method, which may produce much faster code and thereby hiding the big-O behaviour.

@njsmith
Copy link
Member Author

njsmith commented Oct 16, 2016

Ahh, I see, right, that would be a big concern! I was imagining that we'd pick k such that the shorter test takes long enough that it's timing is not too hard to measure (likes tens of milliseconds say) so that we can get reasonable timings by running it ~3-5 times. There still is the concern that because the test itself contains a loop the JIT might kick in at some point (there's a for i in range(k): ...) even with a small number of runs. I can imagine a few possible counter-measures... pre-warm the test by running it "enough" times before taking timings, interleave the runs of the small and large tests so that if the JIT kicks in at some point it affects them both ~equally. Or just always run the big large test first, so if the JIT does mess things up it will cause a visible failure. Or maybe it won't even come up as a problem :-). Probably we need to try it before spending too much time worrying.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants