My thought process when designing that API was to make it explicit to the caller that they should aim to provide the character set, and that if it wasn't set, jsoup would have to guess. The goal is to make it clear that a possibly incorrect default is going to be used.
One of jsoup's goals is to minimize dependencies and the required jar size, so I don't plan to include the Tika scan/guess.
I'm not clear on how we could both assume UTF-8 but also use the default charset of the JVM (if that were not UTF-8).
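To illustrate why the two defaults cannot both apply, here is a minimal sketch using only `java.nio` (it is not jsoup code, and the class and method names are made up for the example). The same bytes decode to different text depending on the charset, so a parser has to commit to one choice when none is given. ISO-8859-1 stands in for a non-UTF-8 JVM default here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetDefaults {
    // Hypothetical helper: decodes raw bytes with the given charset,
    // standing in for "reading a file with that charset".
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "café" written with a Unicode escape so the source file encoding does not matter.
        byte[] bytes = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        String asUtf8 = decode(bytes, StandardCharsets.UTF_8);
        String asLatin1 = decode(bytes, StandardCharsets.ISO_8859_1);

        System.out.println(asUtf8.length());   // prints 4
        System.out.println(asLatin1.length()); // prints 5: the two-byte "é" became two characters
        System.out.println(Charset.defaultCharset()); // whatever this JVM happens to use
    }
}
```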
Is it possible to provide a `Jsoup.parse(file)` method which does not have the charset parameter? It will make the code a tiny little bit more pleasant.

The method can use either of these approaches:

- `UTF-8`
- `http-equiv` meta tag, if present

The first two are what is documented for the parse method when it is passed `null`.

I should say that the Kotlin standard library has a `File::readText` extension function for the Java `File` class that treats the file as `UTF-8` if no charset is provided by the user.
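As a rough illustration of the fallback order described above (explicit charset, then a charset found in a meta tag, then UTF-8), here is a small sketch. It is an assumption-laden stand-in, not how jsoup implements it: `CharsetSniffer`, `resolve`, the regular expression, and the 1024-byte scan window are all hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharsetSniffer {
    // Hypothetical pattern: finds charset=... inside a <meta> tag.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*charset\\s*=\\s*[\"']?([\\w-]+)", Pattern.CASE_INSENSITIVE);

    // An explicit charset wins; null triggers meta-tag detection, then UTF-8.
    static Charset resolve(byte[] html, String charsetName) {
        if (charsetName != null) {
            return Charset.forName(charsetName);
        }
        // ISO-8859-1 maps each byte to one char, so ASCII markup survives this decoding.
        String head = new String(html, 0, Math.min(html.length, 1024),
                StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(head);
        if (m.find()) {
            try {
                return Charset.forName(m.group(1));
            } catch (IllegalArgumentException e) {
                // Unknown or malformed charset name: fall through to the default.
            }
        }
        return StandardCharsets.UTF_8;
    }

    // The kind of convenience overload the request asks for, in miniature.
    static Charset resolve(byte[] html) {
        return resolve(html, null);
    }

    public static void main(String[] args) {
        byte[] page = "<meta charset=\"ISO-8859-1\"><p>hi</p>"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(resolve(page));
        System.out.println(resolve("<p>hi</p>".getBytes(StandardCharsets.US_ASCII)));
    }
}
```

The one-argument overload only hides the `null`; the guess still happens, which is the trade-off raised in the reply about making the default explicit.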