JA4+ #204

Open
dhw opened this issue Nov 3, 2023 · 2 comments

Comments

@dhw

dhw commented Nov 3, 2023

Has anyone tested the fingerprints for this yet?

@jjsaunier
Contributor

jjsaunier commented Nov 3, 2023

Yes. It's interesting and not interesting at the same time: it's under patent and misses important data points of modern TLS.

  • Extension settings are fingerprintable, but as in JA3 they are ignored, except for SNI (see the SNI point below) and ALPN.
  • GREASE must be part of the fingerprint, including its position and number of occurrences (with the value replaced by a fixed placeholder). As it stands, GREASE and non-GREASE TLS clients produce the same hash, which is inaccurate.
  • TLS 1.2 and 1.3 introduced session resumption (tickets, PSK), which is completely ignored, as in JA3. That's not really a criticism, because it is not easy to fingerprint and leaving it out may be a deliberate choice, but still. (Implementing it correctly would require fuzzy hash support; see the pseudo-fuzzy point below.)
  • SNI is part of the fingerprint. That means each site produces its own hash, even when the TLS fingerprint is otherwise the same. The target domain is unrelated to the TLS stack, so the fingerprint no longer really reflects the client and mismatches.
  • I don't understand why the hash is split into pieces. I guess it's a pseudo-fuzzy hash meant for queries like LIKE %_xxx_xxx; just use a real fuzzy hash like ssdeep.
  • The methodology and logic for building the fingerprint are needlessly complex, IMO.
  • There is no dataset comparing JA3 and JA4 to measure client-identification accuracy (already compromised by SNI) and false positives. Without any measurement to compare against, it isn't demonstrably better and could be worse.

So I don't think it serves the same purpose as JA3 despite being called JA4. I think the patent and licensing have already killed it.
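
To illustrate the GREASE point in the list above, here is a minimal Python sketch. It is not the JA3 or JA4 algorithm; the function names and the cipher-suite lists are made up for illustration. The only fact relied on is that RFC 8701 reserves the values 0x0A0A, 0x1A1A, ..., 0xFAFA for GREASE. It compares stripping GREASE values with masking them behind a fixed placeholder:

```python
# GREASE code points reserved by RFC 8701: 0x0A0A, 0x1A1A, ..., 0xFAFA.
GREASE_VALUES = frozenset(
    (n << 12) | 0x0A00 | (n << 4) | 0x0A for n in range(16)
)


def strip_grease(values):
    """Drop GREASE values entirely, so their presence leaves no trace."""
    return [v for v in values if v not in GREASE_VALUES]


def mask_grease(values, placeholder="GREASE"):
    """Keep the position and count of GREASE values, hiding the random value."""
    return [placeholder if v in GREASE_VALUES else f"{v:04x}" for v in values]


def fingerprint_input(values, keep_grease):
    """Build the string that would be fed to a hash (hypothetical format)."""
    if keep_grease:
        items = mask_grease(values)
    else:
        items = [f"{v:04x}" for v in strip_grease(values)]
    return ",".join(items)


# Hypothetical cipher-suite lists: identical suites, but one client
# prepends a GREASE value and the other does not.
with_grease = [0x2A2A, 0x1301, 0x1302, 0x1303]
without_grease = [0x1301, 0x1302, 0x1303]

# Stripping makes the two clients indistinguishable.
print(fingerprint_input(with_grease, keep_grease=False))
print(fingerprint_input(without_grease, keep_grease=False))

# Masking keeps them apart while staying stable across random GREASE values.
print(fingerprint_input(with_grease, keep_grease=True))
print(fingerprint_input(without_grease, keep_grease=True))
```

With the placeholder approach, a client that sends a different GREASE value on each connection still maps to one stable string, while a client that sends no GREASE at all maps to a different one.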

@dhw
Author

dhw commented Nov 3, 2023

I thought the split hash in JA4 was meant to make it easier to search logs for specific parts of a fingerprint instead of a whole MD5 hash, letting you narrow down a client more easily and, I suppose, sell you access to a database.

A good example is this:

For example; GreyNoise is an internet listener that identifies internet scanners and is implementing JA4+ into their product. They have an actor who scans the internet with a constantly changing single TLS cipher. This generates a massive amount of completely different JA3 fingerprints but with JA4, only the b part of the JA4 fingerprint changes, parts a and c remain the same. As such, GreyNoise can track the actor by looking at the JA4_ac fingerprint (joining a+c, dropping b).
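
The a+c grouping described in that quote can be sketched in a few lines of Python. This is only an illustration: the fingerprint values are made up, and joining the two remaining parts with an underscore is an assumption about how a JA4_ac key might be written.

```python
from collections import defaultdict


def ja4_ac(fingerprint):
    """Return parts a and c of an a_b_c fingerprint, dropping part b."""
    a, _b, c = fingerprint.split("_")
    return f"{a}_{c}"  # the "_" separator is an assumption


# Made-up fingerprints: the first two differ only in part b, as they
# would for a scanner that changes its single cipher on every run.
observed = [
    "t13d0103h2_aaaaaaaaaaaa_cccccccccccc",
    "t13d0103h2_bbbbbbbbbbbb_cccccccccccc",
    "t13d1516h2_dddddddddddd_eeeeeeeeeeee",
]

# Group the full fingerprints under their a+c key.
groups = defaultdict(list)
for fp in observed:
    groups[ja4_ac(fp)].append(fp)

for key, members in groups.items():
    print(key, len(members))
```

The first two entries land under the same key even though their full fingerprints differ, which is the tracking behavior the quote describes.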

Though this doesn't apply in this situation, since the goal here is to not be unique. The only other things I have run into are header order and correct formatting.

The databases I have come across don't contain the User-Agent and have just an MD5 hash; it would be nice to have the string that makes up the hash alongside it. Anyway, thanks for letting me know. This will work just fine.
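
As a rough sketch of what storing the string next to the hash could look like: JA3 is the MD5 of a comma-separated string of five fields (TLS version, ciphers, extensions, curves, point formats), each list joined with dashes. The record layout, the key names, and the sample values below are hypothetical.

```python
import hashlib


def ja3_record(version, ciphers, extensions, curves, point_formats):
    """Return the raw JA3 string together with its MD5 digest."""
    lists = (ciphers, extensions, curves, point_formats)
    fields = [str(version)] + ["-".join(str(v) for v in xs) for xs in lists]
    ja3_string = ",".join(fields)
    digest = hashlib.md5(ja3_string.encode("ascii")).hexdigest()
    # Keeping both lets a reader see what the hash was built from.
    return {"ja3": ja3_string, "ja3_md5": digest}


# Hypothetical ClientHello values, for illustration only.
record = ja3_record(769, [47, 53], [0, 10, 11], [23, 24], [0])
print(record["ja3"])
print(record["ja3_md5"])
```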
