Only create fake divs for <input> fields that have text-containing types #1117

Open
melink14 opened this issue Jul 5, 2022 · 3 comments


@melink14
Owner

melink14 commented Jul 5, 2022

More than that, though, I think we should limit the types of inputs we try to extract text from. It was originally added when <input> was just text, but now it can be things like range, date, and number, which aren't useful sources of Japanese text. We should probably add an allowlist of input types that rikaikun tries to process, since it's a waste of cycles if nothing else!

Let's leave this bug for properly cleaning up the added

and I'll open another one for being more choosy in trying to process <input>s

Originally posted by @melink14 in #1114 (comment)

Some decisions:

  • Should we use an allowlist or a denylist? An allowlist is safest in the face of new types, but a denylist would let new types work automatically if they also contain text.
  • Review a list of types and divide them into relevant and irrelevant with respect to rikaikun.
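To make the allowlist/denylist trade-off concrete, here is a minimal TypeScript sketch of both policies. The set contents and function names are hypothetical placeholders, not rikaikun's actual code.

```typescript
// Hypothetical sketch of the two policies; the type lists are placeholders.

// Allowlist: process only the types we know can carry readable text.
const ALLOWED_TYPES: ReadonlySet<string> = new Set(['text', 'search']);

// Denylist: process everything except the types we know carry no useful text.
const DENIED_TYPES: ReadonlySet<string> = new Set(['checkbox', 'color', 'range']);

function allowedByAllowlist(type: string): boolean {
  return ALLOWED_TYPES.has(type.toLowerCase());
}

function allowedByDenylist(type: string): boolean {
  return !DENIED_TYPES.has(type.toLowerCase());
}

// A type neither list mentions, standing in for one added to HTML later.
const newType = 'some-future-type';
console.log(allowedByAllowlist(newType)); // false: skipped until someone adds it
console.log(allowedByDenylist(newType)); // true: processed automatically
```

The allowlist fails closed (an unrecognized type is ignored) while the denylist fails open (an unrecognized type is processed), which is exactly the choice described in the first bullet.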
@tora-pan
Contributor

List of input types for reference (from w3schools):

button, checkbox, color, date, datetime-local, email, file, 
hidden, image, month, number, password, radio, range, 
reset, search, submit, tel, text, time, url, week

Keep:

button (value could be Japanese text)
text (value could be Japanese text)
search (the docs say it behaves like a text field, so I'd assume we keep it)

Safe to ignore:

checkbox color date datetime-local email hidden image month
number password radio range reset submit tel time url week
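One way to sanity-check a keep/ignore split like this is to confirm that every type lands in exactly one group. A minimal TypeScript sketch that transcribes the two lists above; the helper names are hypothetical.

```typescript
// Input types from the reference list above.
const ALL_TYPES: readonly string[] = [
  'button', 'checkbox', 'color', 'date', 'datetime-local', 'email', 'file',
  'hidden', 'image', 'month', 'number', 'password', 'radio', 'range',
  'reset', 'search', 'submit', 'tel', 'text', 'time', 'url', 'week',
];

// The proposed split, copied from the "Keep" and "Safe to ignore" lists.
const KEEP: ReadonlySet<string> = new Set(['button', 'text', 'search']);
const IGNORE: ReadonlySet<string> = new Set([
  'checkbox', 'color', 'date', 'datetime-local', 'email', 'hidden', 'image', 'month',
  'number', 'password', 'radio', 'range', 'reset', 'submit', 'tel', 'time', 'url', 'week',
]);

// Types that appear in none of the groups.
function unclassified(all: readonly string[], ...groups: ReadonlySet<string>[]): string[] {
  return all.filter((type) => !groups.some((group) => group.has(type)));
}

// Types that appear in more than one group.
function doublyClassified(all: readonly string[], ...groups: ReadonlySet<string>[]): string[] {
  return all.filter((type) => groups.filter((group) => group.has(type)).length > 1);
}

console.log(unclassified(ALL_TYPES, KEEP, IGNORE)); // [ 'file' ]
console.log(doublyClassified(ALL_TYPES, KEEP, IGNORE)); // []
```

Run against these lists, the check flags file, which appears in neither group.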

Thoughts?

@melink14
Owner Author

(I'd recommend https://developer.mozilla.org/en-US/docs/Web/HTML/Element/input over w3schools, but that's mostly just preference.)

Others to think about:
submit: I think this is the same as button, in that it can have arbitrary text?
date: this can actually display Japanese text, so it might be useful to keep, except that the labels probably aren't exposed. I guess it's not enough for there to be text; we also need access to that text!
file: if the file name is in Japanese, this works; the only problem is that it's hard to render the value accurately, since the value isn't exactly what's shown.
email: if the email address contains Japanese characters, this one should still work.
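For file, part of the mismatch comes from the HTML spec: a file input's value is the string C:\fakepath\ followed by the first selected file's name, on every platform. A minimal sketch of recovering the bare name; the FileInputLike interface and fileInputText helper are hypothetical, and in a browser a real <input type="file"> element would be passed instead.

```typescript
// Hypothetical structural stand-in for HTMLInputElement so the sketch runs
// outside a browser; a real <input type="file"> element satisfies it.
interface FileInputLike {
  readonly value: string;
  readonly files: ArrayLike<{ readonly name: string }> | null;
}

// Prefix the HTML spec puts in front of the first file's name in `value`.
const FAKE_PATH_PREFIX = 'C:\\fakepath\\';

// Returns the bare name of the first selected file, or '' if none is selected.
function fileInputText(input: FileInputLike): string {
  // Prefer the File object's own name when the list is available.
  if (input.files !== null && input.files.length > 0) {
    return input.files[0].name;
  }
  // Otherwise strip the fake path prefix from `value`.
  return input.value.startsWith(FAKE_PATH_PREFIX)
    ? input.value.slice(FAKE_PATH_PREFIX.length)
    : input.value;
}

console.log(fileInputText({ value: 'C:\\fakepath\\日本語.txt', files: null })); // 日本語.txt
```

This only yields the first file's name; what the browser actually renders next to the button (for example, when several files are selected) is browser-dependent, so it may still not match what the user sees.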

I think it's okay to pick the ones that can be directly used and ignore the rest. In that case, maybe just add submit and email to your list.
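With that approach, the check reduces to a set lookup on the input's type. A minimal TypeScript sketch, assuming the five types discussed so far; the TEXT_INPUT_TYPES name and shouldProcessInput helper are hypothetical, not rikaikun's actual code.

```typescript
// Hypothetical allowlist for this discussion: the original keep list plus
// submit and email. Not rikaikun's actual configuration.
const TEXT_INPUT_TYPES: ReadonlySet<string> = new Set([
  'button',
  'email',
  'search',
  'submit',
  'text',
]);

// Structural stand-in for HTMLInputElement so the sketch runs outside a browser.
interface InputLike {
  readonly type: string;
}

// In a browser, HTMLInputElement.type is already lowercase and reads "text"
// for a missing or unrecognized type attribute; lowercasing here only guards
// against other sources of the string.
function shouldProcessInput(input: InputLike): boolean {
  return TEXT_INPUT_TYPES.has(input.type.toLowerCase());
}

console.log(shouldProcessInput({ type: 'submit' })); // true
console.log(shouldProcessInput({ type: 'range' })); // false
```

Reading the type property rather than the raw attribute means an <input> whose type attribute is missing or misspelled reports "text" and is still processed, which matches how the browser renders it.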

@tora-pan
Contributor

tora-pan commented Apr 17, 2023

Whoa! I didn't know non-ASCII emails were a thing!

Also, good catch on the submit.
<input type="submit" value="日本語で検索" />
I forgot about the value.
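For button-like inputs (button, submit, reset), the visible label comes from the value attribute, so reading value gives the text to look up. When a submit or reset input has no value attribute, the browser shows its own built-in label, and value returns an empty string, so that label can't be read. A minimal sketch; the ButtonInputLike interface and buttonInputLabel helper are hypothetical.

```typescript
// Hypothetical structural stand-in for a button-like HTMLInputElement.
interface ButtonInputLike {
  readonly value: string;
}

// Returns the label text, or null when `value` is empty: either the label is
// genuinely empty, or the browser is showing a built-in default label that
// isn't exposed through `value`.
function buttonInputLabel(input: ButtonInputLike): string | null {
  return input.value !== '' ? input.value : null;
}

console.log(buttonInputLabel({ value: '日本語で検索' })); // 日本語で検索
console.log(buttonInputLabel({ value: '' })); // null
```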

I did a bit of searching and found some info that mentions it, but I've never seen anything like that. Crazy.

Snippet from RFC 6530:

Full use of electronic mail throughout the world requires that
(subject to other constraints) people be able to use close variations
on their own names (written correctly in their own languages and
scripts) as mailbox names in email addresses. This document
introduces a series of specifications that define mechanisms and
protocol extensions needed to fully support internationalized email
addresses. These changes include an SMTP extension and extension of
email header syntax to accommodate UTF-8 data. The document set also
includes discussion of key assumptions and issues in deploying fully
internationalized email. This document is a replacement for RFC
4952; it reflects additional issues identified since that document
was published.
