New wordlists #106

Open
piegamesde opened this issue Jan 15, 2021 · 1 comment

piegamesde commented Jan 15, 2021

Ever since I started using Magic Wormhole, I have not been fond of the word list it uses. Quick summary:

  • It's called the PGP word list
  • It contains two sets of 256 words, which are used alternately (one byte per word)
  • The words are chosen to be phonetically distinct for oral transmission.

Here are the problems I have with it:

  • Not prefix-unique
  • The phonetic distinction and the even/odd alternation only help if you already know the word list by heart, since the words may still sound similar to words that are not on the list. The average user does not know the list, so I'd argue that this property does not hold in practice.
  • Very small list, giving only 8 bits of entropy per word.
  • Has not been translated yet
  • Uses uppercase letters

I think if we give up on that "phonetically distinct" property (as it does not really work out IMO), we can start discussing more secure (and localized) alternatives.

  • The easiest step is to merge the even and odd word lists into one list of 512 words, giving one additional bit of entropy per word. For a two-word code, this would quarter the chance of an attacker randomly guessing it.
  • Diceware uses a wordlist with 7,776 words, giving almost 13 bits of entropy. However, I haven't found any convincing translations yet.
  • Linked from the python issue (below), I've found the BIP 39 word list (bitcoin enhancement proposal). It has several nice properties:
    • The first three or four characters uniquely identify a word, good for auto-completion
    • 11 bits of entropy per word (the list has 2,048 words)
    • Translated into multiple languages. There's a list of criteria for each of the languages how the words were picked
  • Don't use words for codes
    • Only 4 hex characters are needed for 16 bits of entropy. That's still easy to remember: "C6A3"
    • Using the full alphanumeric alphabet (a-z plus 0-9) gives 36⁴ possibilities, or more than 20 bits of entropy (5 hex digits would give exactly 20 bits, FWIW)
    • The obvious problem is how to communicate casing (even if we say it doesn't matter)
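To compare the options above on equal footing, here is a minimal sketch that computes the entropy per symbol as log2 of the list size. The list sizes are taken from this issue, except for BIP 39, whose published list has 2,048 words; the names `SCHEMES`, `bits_per_symbol` and `code_bits` are made up for this illustration and assume every symbol is picked uniformly at random.

```python
import math

# Number of equally likely symbols per position for each scheme.
SCHEMES = {
    "PGP word list (per word)": 256,
    "PGP even+odd merged (per word)": 512,
    "Diceware (per word)": 7776,
    "BIP 39 (per word)": 2048,
    "hex (per character)": 16,
    "base 36 (per character)": 36,
}


def bits_per_symbol(alphabet_size: int) -> float:
    """Entropy of one uniformly chosen symbol, in bits."""
    return math.log2(alphabet_size)


def code_bits(alphabet_size: int, length: int) -> float:
    """Entropy of a code made of `length` independent symbols."""
    return length * bits_per_symbol(alphabet_size)


if __name__ == "__main__":
    for name, size in SCHEMES.items():
        print(f"{name}: {bits_per_symbol(size):.2f} bits")
    print(f"4 hex characters: {code_bits(16, 4):.2f} bits")
    print(f"4 base 36 characters: {code_bits(36, 4):.2f} bits")
```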

If we have multiple word lists or code generation schemes, we then need a way for the user to choose one. CLI flags alone won't cut it, as specifying them every single time is tedious.

Python issue: magic-wormhole/magic-wormhole#301

piegamesde commented Nov 19, 2021

Update:
The BIP word list translations sadly kind of died off. In any case, translating word lists is really hard and probably not worth it. I instead found https://www.eff.org/deeplinks/2016/07/new-wordlists-random-passphrases, which includes a really great one: 10.3 bits/word (1,296 words) and three-letter prefix uniqueness (the only one from the diceware category that has it). If we stick to words, this would be it.
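Prefix uniqueness is easy to verify for any candidate list. The following is a rough sketch, not code from any Magic Wormhole implementation; the function names are hypothetical and the sample words are invented, not taken from the EFF list.

```python
from collections import Counter


def is_prefix_unique(words, prefix_len):
    """Return True if no two words share the same first `prefix_len` characters."""
    prefixes = Counter(word[:prefix_len] for word in words)
    return all(count == 1 for count in prefixes.values())


def shortest_unique_prefix(words):
    """Smallest prefix length that tells all words apart, or None if there is none."""
    if len(set(words)) != len(words):
        return None  # duplicate words can never be distinguished
    longest = max(map(len, words), default=0)
    for prefix_len in range(1, longest + 1):
        if is_prefix_unique(words, prefix_len):
            return prefix_len
    return None


if __name__ == "__main__":
    sample = ["apple", "apricot", "banana"]  # invented sample words
    print(is_prefix_unique(sample, 2))
    print(shortest_unique_prefix(sample))
```

A list that passes `is_prefix_unique(words, 3)` lets auto-completion resolve a word after three typed characters.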

On the other hand, the non-word codes still sound quite appealing to me. Hex is out because it is strictly inferior (fewer bits per character), but base 36 (aka alphanumeric) looks promising:

d442
06yw
vsyt
i6uw
cgrh
irei
a2oc
gn8f
65he
5gve
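Codes like the ones above could be generated along these lines. This is a minimal sketch assuming lowercase letters plus digits as the alphabet and Python's `secrets` module as the random source; `generate_code` and `normalize_code` are hypothetical names, and the normalization step is one possible answer to the casing problem mentioned earlier, not an agreed design.

```python
import secrets
import string

# 26 lowercase letters + 10 digits = 36 symbols
ALPHABET = string.ascii_lowercase + string.digits


def generate_code(length: int = 4) -> str:
    """Draw `length` symbols uniformly at random from a secure RNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def normalize_code(user_input: str) -> str:
    """Make input case-insensitive so that 'D442' and 'd442' are the same code."""
    return user_input.strip().lower()


if __name__ == "__main__":
    for _ in range(5):
        print(generate_code())
```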

Generally, I'd like to aim for at least 20 bits of entropy by default. This significantly reduces the probability of an attacker guessing the password, while allowing us to reuse a code up to 16 times (e.g. for send-many) without losing security compared to before.
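The factor of 16 follows from 2^20 / 2^16 = 16. Here is a small sketch of that arithmetic, assuming that "before" means a two-word code with 16 bits of entropy (8 bits per word, as described above) and that an attacker gets one distinct guess per use of the code; `guess_probability` is a name made up for this example.

```python
def guess_probability(entropy_bits: int, attempts: int = 1) -> float:
    """Chance that at least one of `attempts` distinct guesses hits a uniformly random code."""
    return min(1.0, attempts / 2 ** entropy_bits)


if __name__ == "__main__":
    before = guess_probability(16, attempts=1)   # assumed current default, used once
    after = guess_probability(20, attempts=16)   # 20-bit code, reused 16 times
    print(before, after, before == after)
```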

Also, I'd like to point out that emojis are a thing: https://spec.matrix.org/latest/client-server-api/#sas-method-emoji. This must obviously be optional (especially as we cannot easily build a plain text fallback), but every emoji from that list gives us 6 bits of entropy, so we'd only need 3 (18 bits) or 4 (24 bits) for a password.
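How many symbols a scheme needs for a given target can be estimated as ceil(target_bits / log2(alphabet_size)). The sketch below assumes 64 emoji (which is what 6 bits per emoji implies) and uniformly random selection; `symbols_needed` is a hypothetical helper.

```python
import math


def symbols_needed(target_bits: float, alphabet_size: int) -> int:
    """Minimum number of uniformly chosen symbols that reach `target_bits` of entropy."""
    return math.ceil(target_bits / math.log2(alphabet_size))


if __name__ == "__main__":
    print(symbols_needed(16, 64))  # emoji, matching the assumed old 16-bit default
    print(symbols_needed(20, 64))  # emoji, for the proposed 20-bit target
    print(symbols_needed(20, 36))  # base 36 characters
```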
