Commit

Added Jsoup.parse(File) method
Fixes #1693
jhy committed Dec 28, 2021
1 parent 0bcb923 commit 3a6e7fa
Showing 3 changed files with 27 additions and 1 deletion.
3 changes: 3 additions & 0 deletions CHANGES
@@ -25,6 +25,9 @@ jsoup changelog
as to preserve applicable settings, such as the Pretty Print settings.
<https://github.com/jhy/jsoup/issues/763>

* Improvement: added a convenience method Jsoup.parse(File).
<https://github.com/jhy/jsoup/issues/1693>

* Bugfix: boolean attribute names should be case-insensitive, but were not when the parser was configured to preserve
case.
<https://github.com/jhy/jsoup/issues/1656>
18 changes: 17 additions & 1 deletion src/main/java/org/jsoup/Jsoup.java
@@ -142,12 +142,28 @@ public static Document parse(File file, @Nullable String charsetName, String bas
@return sane HTML
@throws IOException if the file could not be found, or read, or if the charsetName is invalid.
@see #parse(File, String, String)
@see #parse(File, String, String) parse(file, charset, baseUri)
*/
public static Document parse(File file, @Nullable String charsetName) throws IOException {
    return DataUtil.load(file, charsetName, file.getAbsolutePath());
}

/**
Parse the contents of a file as HTML. The location of the file is used as the base URI to qualify relative URLs.
The charset used to read the file will be determined by the byte-order-mark (BOM), or a {@code <meta charset>} tag,
or if neither is present, will be {@code UTF-8}.
<p>This is the equivalent of calling {@link #parse(File, String) parse(file, null)}</p>
@param file the file to load HTML from. Supports gzipped files (ending in .z or .gz).
@return sane HTML
@throws IOException if the file could not be found or read.
@see #parse(File, String, String) parse(file, charset, baseUri)
*/
public static Document parse(File file) throws IOException {
    return DataUtil.load(file, null, file.getAbsolutePath());
}

/**
Parse the contents of a file as HTML.
7 changes: 7 additions & 0 deletions src/test/java/org/jsoup/integration/ParseTest.java
@@ -229,6 +229,13 @@ public void testXwikiExpanded() throws IOException {
assertEquals(wantHtml, doc.select("[data-id=userdirectory]").outerHtml());
}

@Test public void testFileParseNoCharsetMethod() throws IOException {
    File in = getFile("/htmltests/xwiki-1324.html.gz");
    Document doc = Jsoup.parse(in);
    assertEquals("XWiki Jetty HSQLDB 12.1-SNAPSHOT", doc.select("#xwikiplatformversion").text());
}


public static File getFile(String resourceName) {
    try {
        URL resource = ParseTest.class.getResource(resourceName);
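
The Javadoc for the new Jsoup.parse(File) overload says the charset is taken from the byte-order-mark (BOM), then from a {@code <meta charset>} tag, and otherwise defaults to UTF-8. The following is a minimal, standard-library-only sketch of that fallback order. It is not jsoup's implementation: the class name CharsetSketch, the detect helper, the 1024-byte scan window, and the simplified regex are all assumptions made for illustration, and it only recognizes UTF-8 and UTF-16 BOMs.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; jsoup's real logic lives in DataUtil and is more thorough.
final class CharsetSketch {
    // Simplified pattern (assumption): matches <meta charset="..."> but not the http-equiv form.
    private static final Pattern META_CHARSET = Pattern.compile(
        "<meta\\s+charset\\s*=\\s*[\"']?([A-Za-z0-9][\\w.:+-]*)",
        Pattern.CASE_INSENSITIVE);

    static Charset detect(byte[] data) {
        // Step 1: a byte-order-mark, if present, wins.
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return StandardCharsets.UTF_8;
        if (startsWith(data, 0xFE, 0xFF)) return StandardCharsets.UTF_16BE;
        if (startsWith(data, 0xFF, 0xFE)) return StandardCharsets.UTF_16LE;

        // Step 2: look for a <meta charset> tag near the start of the input.
        // ISO-8859-1 maps every byte to one char, so decoding cannot fail here.
        int window = Math.min(data.length, 1024); // scan window is an arbitrary choice
        String head = new String(data, 0, window, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(head);
        if (m.find() && Charset.isSupported(m.group(1))) {
            return Charset.forName(m.group(1));
        }

        // Step 3: neither is present, so fall back to UTF-8.
        return StandardCharsets.UTF_8;
    }

    // True if data begins with the given unsigned byte values.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] html = "<meta charset=\"ISO-8859-1\"><p>hi</p>"
            .getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(html));
    }
}
```

With the overload added in this commit, callers do not need any of this: Jsoup.parse(file) passes a null charsetName to DataUtil.load, which is what triggers detection inside jsoup.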
