Question: efficiency when only a few keys are used #29

sanmai-NL · 2017-12-19T20:22:18Z

I suppose the full mapping will be included in the binary when only a few keys are used?

abonander · 2017-12-19T20:37:47Z

Can you elaborate? Because if you can statically prove that you'll only need certain keys, then this library will be kind of redundant won't it? You can just hardcode those in.

sanmai-NL · 2017-12-19T20:52:26Z

Suppose this crate would support 1 000 different filename extensions as keys. Of those, I need 50. My thinking is ... Manually defining 50 media types isn’t very maintainable. Doing that also wouldn’t allow comparing for equality with proper RFC 6838 media type strings, since their grammar is more complex than just a type and subtype.

Currently mime has evolved to one item per Mime value. Problem is of course, the limited number of media types defined in there and the question how many should even be added (whole IANA registry?). How does your current solution based on PHF maps compare in terms of efficiency to a match on a bunch of extensions that evaluates to mime Mime const items?

Update: manually declaring mappings of extensions to Mime values also isn’t very maintainable. So I’m speaking about the keys as well. I hope to reuse a standardized mapping like yours. I see, however, that this crate still parses the media type strings that the keys were mapped to at runtime. Doesn’t that defeat the point (or let’s say limit the usefulness) of the static mapping?

abonander · 2017-12-19T21:18:43Z

Are we talking storage or search time constraints? It kind of seems like you're asking about both at once.

PHF maps are guaranteed to be O(1) in the worst-case: they don't deal with collisions, the hash function is tuned so that each key maps directly to the expected index slot, and if it doesn't match then the key isn't in the map.

I don't know when match cases go from jump tables to binary searches in Rust/LLVM but that's entirely implementation-defined, so it could be O(1) with so many mappings and suddenly switch to O(n log n) (or some super-constant hybrid approach) because of heuristics in the optimizer. LLVM could statically eliminate unused cases but this library is primarily intended for arbitrary user input where removing parts of the mapping is counterproductive.

We could structure the mappings differently so there's a separate map for every top-level type, so two lookups per search but searching fewer elements overall. It'd be relatively trivial at that point to disable certain top-level types based on the intended application, but all that would reduce is binary size (since the O-factor on the top-level search is negligible).

sanmai-NL · 2017-12-19T21:30:02Z

I’m discussing both space and time yeah. I suspect though that that the media type parsing time cost dwarfs the things you mention.

abonander · 2017-12-19T23:11:26Z

We could structure the mappings differently so there's a separate map for every top-level type, so two lookups per search but searching fewer elements overall.

I forgot that we're actually already doing this.

abonander · 2017-12-20T20:12:31Z

Yeah, when Rust someday gets better const-eval we could potentially run the Mime constructors at compile-time. With cache misses it's hard to say whether the search or the mime parsing will be slower. I guess I should profile it but I haven't heard any complaints thus far. I have benchmarks implemented but they're rather naive and I haven't run them in forever anyway.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Question: efficiency when only a few keys are used #29

Question: efficiency when only a few keys are used #29

sanmai-NL commented Dec 19, 2017

abonander commented Dec 19, 2017

sanmai-NL commented Dec 19, 2017 •

edited

abonander commented Dec 19, 2017

sanmai-NL commented Dec 19, 2017

abonander commented Dec 19, 2017

abonander commented Dec 20, 2017

Question: efficiency when only a few keys are used #29

Question: efficiency when only a few keys are used #29

Comments

sanmai-NL commented Dec 19, 2017

abonander commented Dec 19, 2017

sanmai-NL commented Dec 19, 2017 • edited

abonander commented Dec 19, 2017

sanmai-NL commented Dec 19, 2017

abonander commented Dec 19, 2017

abonander commented Dec 20, 2017

sanmai-NL commented Dec 19, 2017 •

edited