This repository has been archived by the owner on Jan 28, 2023. It is now read-only.

Performance improvements #111

Open
PavelCibulka opened this issue Apr 9, 2015 · 1 comment

@PavelCibulka

I've examined why user agent parsing is slow. Here are some tips:

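In AbstractUserAgentStringParser.examineAsBrowser(), robots are found with a linear scan and an exact string comparison:

    for (final Robot robot : data.getRobots()) {
        if (robot.getUserAgentString().equals(builder.getUserAgentString())) { /* ... */ }
    }

Since this is a plain equals() check with no regexp involved, it could be done with a single lookup in a HashMap<String, Robot> keyed by the robot's user agent string.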
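A minimal sketch of that lookup, assuming the index is built once when the data is loaded. Robot here is a hypothetical stand-in that models only the getUserAgentString() accessor from the quoted loop, and RobotIndex is a hypothetical name, not part of the library:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the library's Robot class: only the
// getUserAgentString() accessor used by the quoted loop is modeled.
final class Robot {
    private final String name;
    private final String userAgentString;

    Robot(String name, String userAgentString) {
        this.name = name;
        this.userAgentString = userAgentString;
    }

    String getName() { return name; }
    String getUserAgentString() { return userAgentString; }
}

// Hypothetical index built once at load time, replacing the per-request
// linear scan with an average O(1) hash lookup.
final class RobotIndex {
    private final Map<String, Robot> byUserAgent = new HashMap<>();

    RobotIndex(List<Robot> robots) {
        for (Robot robot : robots) {
            // putIfAbsent keeps the first robot for a duplicated UA string,
            // assuming the original loop stops at its first equals() match.
            byUserAgent.putIfAbsent(robot.getUserAgentString(), robot);
        }
    }

    // Returns the matching robot, or null if the string is not a known robot.
    Robot find(String userAgentString) {
        return byUserAgent.get(userAgentString);
    }
}
```

Building the map costs one pass over the robots; after that, each parsed string needs one hash lookup instead of a full scan. Because the body of the original loop is not shown, the "first match wins" behavior is an assumption.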

Lazy OS detection: the operating system is not always needed, so it could be detected only when it is first requested.
Lazy device detection: same here, the device is not always needed either.
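One way to sketch lazy detection, assuming the parse result can hold deferred detectors. Lazy, LazyReadableUserAgent and the detector suppliers are hypothetical names; the library's real result type may be structured differently. Each detector runs at most once, on first access:

```java
import java.util.function.Supplier;

// Hypothetical memoizing holder: the supplier runs at most once, on first get().
final class Lazy<T> implements Supplier<T> {
    private Supplier<T> supplier;
    private T value;
    private boolean computed;

    Lazy(Supplier<T> supplier) { this.supplier = supplier; }

    @Override
    public synchronized T get() {
        if (!computed) {
            value = supplier.get();
            computed = true;
            supplier = null; // release the detector once the value is cached
        }
        return value;
    }
}

// Hypothetical parse result: the browser is computed eagerly, while OS and
// device detection are deferred until a caller actually asks for them.
final class LazyReadableUserAgent {
    private final String browser;
    private final Lazy<String> operatingSystem;
    private final Lazy<String> device;

    LazyReadableUserAgent(String browser, Supplier<String> osDetector, Supplier<String> deviceDetector) {
        this.browser = browser;
        this.operatingSystem = new Lazy<>(osDetector);
        this.device = new Lazy<>(deviceDetector);
    }

    String getBrowser() { return browser; }
    String getOperatingSystem() { return operatingSystem.get(); }
    String getDevice() { return device.get(); }
}
```

The synchronized get() keeps the memoization safe if a result object is shared between threads; a parser whose results never leave one thread could drop it.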

The regular expression loop as a whole: testing every pattern in turn is probably good for development and maintenance, but not so great for performance. Here is an idea:
We could define an enum of cheap tests on the user agent string, give each browser pattern an EnumSet of the tests it depends on, and check that set before evaluating the regex. Example:
EnumTest1: the user agent starts with the string "Mozilla".
If this returns false, don't test any regexp that starts with /^Mozilla.

EnumTest2: the user agent starts with the string "M".
If this returns true, don't test any regexp that starts with /^ but not with /^M.

There are 631 <browser_reg> entries: 150 start with /^Mozilla, and 246 start with /^ but not with /^M. These two checks can be implemented without any change to uasdata.
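A minimal sketch of the two checks, assuming regex bodies whose delimiters and flags are already stripped, no multiline flag, and case-insensitive matching (uasdata patterns such as the /si example carry the i flag). Precondition and PrefixPrefilter are hypothetical names. Each pattern gets the EnumSet of preconditions it requires, the user agent gets the EnumSet it satisfies, and the regex is skipped unless the second set contains the first. Patterns with alternation, or whose first character after ^ is not a plain literal or is quantified, are left unclassified so they are always tested:

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical preconditions derived from EnumTest1 and EnumTest2.
enum Precondition {
    STARTS_WITH_MOZILLA,   // required by patterns beginning with ^mozilla
    DOES_NOT_START_WITH_M  // required by patterns beginning with ^ and a literal other than m
}

final class PrefixPrefilter {
    private PrefixPrefilter() {}

    private static boolean isQuantifier(char c) {
        return c == '?' || c == '*' || c == '{';
    }

    // Classifies a regex body conservatively: anything unusual yields an
    // empty set, which means "always run this regex".
    static Set<Precondition> requiredBy(String regexBody) {
        EnumSet<Precondition> required = EnumSet.noneOf(Precondition.class);
        if (!regexBody.startsWith("^") || regexBody.indexOf('|') >= 0 || regexBody.length() < 2) {
            return required;
        }
        char first = regexBody.charAt(1);
        if (!Character.isLetterOrDigit(first)
                || (regexBody.length() > 2 && isQuantifier(regexBody.charAt(2)))) {
            return required;
        }
        if (regexBody.regionMatches(true, 0, "^mozilla", 0, 8)
                && !(regexBody.length() > 8 && isQuantifier(regexBody.charAt(8)))) {
            required.add(Precondition.STARTS_WITH_MOZILLA);
        } else if (Character.toLowerCase(first) != 'm') {
            required.add(Precondition.DOES_NOT_START_WITH_M);
        }
        return required;
    }

    // Evaluates both cheap tests once per user agent string.
    static Set<Precondition> satisfiedBy(String userAgent) {
        EnumSet<Precondition> satisfied = EnumSet.noneOf(Precondition.class);
        if (userAgent.regionMatches(true, 0, "mozilla", 0, 7)) {
            satisfied.add(Precondition.STARTS_WITH_MOZILLA);
        }
        if (userAgent.isEmpty() || Character.toLowerCase(userAgent.charAt(0)) != 'm') {
            satisfied.add(Precondition.DOES_NOT_START_WITH_M);
        }
        return satisfied;
    }

    // A regex is worth running only if every precondition it requires holds.
    static boolean mayMatch(Set<Precondition> requiredByPattern, Set<Precondition> satisfiedByUa) {
        return satisfiedByUa.containsAll(requiredByPattern);
    }
}
```

The pattern sets can be computed once at load time and the user agent tests once per string, so the per-pattern cost drops to one containsAll call. With the counts above, a user agent starting with "Mozilla" can skip up to the 246 patterns anchored to a non-M prefix (about 39% of 631), and one not starting with "M" can skip the 150 /^Mozilla patterns (about 24%).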

There could also be a list of words that the UA string has to contain. Split the UA string into words once, put them into a HashSet, and check these rules before running the regexp. This would be fast, since hash lookups are cheap compared with regex matching. Example:
/mozilla.*AppleWebKit.*NetFrontLifeBrowser/([0-9.]+)/si
requiredWords: mozilla, AppleWebKit, NetFrontLifeBrowser
test: if (uaWords.containsAll(requiredWords))
This would probably need a new uasdata field for the required words.
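A minimal sketch of the word check, assuming words are runs of letters and digits compared in lower case (the example pattern is case-insensitive). RequiredWordsPrefilter is a hypothetical name, and the required words are passed in directly rather than read from the proposed uasdata field:

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical word prefilter for the "requiredWords" idea.
final class RequiredWordsPrefilter {
    private final Set<String> requiredWords = new HashSet<>();

    RequiredWordsPrefilter(String... requiredWords) {
        for (String word : requiredWords) {
            // Lower-cased because the example pattern uses the i flag.
            this.requiredWords.add(word.toLowerCase(Locale.ROOT));
        }
    }

    // Splits on every character that is not a letter or digit, so
    // "NetFrontLifeBrowser/2.2" yields the tokens "netfrontlifebrowser" and "2".
    static Set<String> words(String userAgent) {
        Set<String> words = new HashSet<>();
        for (String token : userAgent.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) { // a leading separator produces one empty token
                words.add(token);
            }
        }
        return words;
    }

    // Cheap check before the regex: a missing required word rules the regex out.
    boolean mayMatch(Set<String> uaWords) {
        return uaWords.containsAll(requiredWords);
    }
}
```

The check is only sound if each required word appears as a whole token in every string the regex can match; a regex fragment such as mozilla would also match inside a longer token, which is one reason to curate the words in uasdata rather than derive them automatically. The word set should be built once per UA string and reused for every pattern.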

Regards, Pavel

@arouel
Owner

arouel commented Apr 11, 2015

@PavelCibulka sounds good. Would you do a Pull Request that prototypes your proposed changes?
