fix: Only use HTML rules if mimeType matches #338

Merged
merged 25 commits into from Oct 8, 2022
25 commits
a405d39
feat: Add `DOMImplementation.createHTMLDocument`
karfau Oct 22, 2021
b5d1061
refactor: Create HTML document from `DOMHandler`
karfau Oct 24, 2021
a6b72cf
feat: Use minimal `Object.assign` ponyfill
karfau Oct 24, 2021
fb752f0
fix: Prevent `DOMParserOptions` `locator` and `xmlns` from being mutated
karfau Oct 24, 2021
97c9f8d
refactor: Copy DOMParserOptions `locator` and `xmlns` to instance
karfau Oct 24, 2021
175aa0a
refactor: Copy DOMParserOptions `normalizeLineEndings` to instance
karfau Oct 24, 2021
6579add
refactor: Copy DOMParserOptions `errorHandler` to instance
karfau Oct 24, 2021
581a9ef
fix: Replace `DOMParserOptions` `domBuilder` by `domHandler`
karfau Oct 24, 2021
dea863f
refactor: Drop `DOMParser.options` property
karfau Oct 24, 2021
f305328
feat: Correctly handle all case modifications
karfau Oct 26, 2021
980734f
test: Remove redundant test case after rebase
karfau Feb 15, 2022
380851d
style: Improve wording and drop some whitespace
karfau Feb 16, 2022
aae7654
fix(sax): Only apply HTML rules if mimeType is present
karfau Feb 16, 2022
0b032bb
fix: Add 'use strict' to lib/entities.js
karfau Feb 17, 2022
8626514
style: Format code
karfau Feb 21, 2022
fef1d79
fix: Add 'use strict' to all files
karfau Feb 21, 2022
52acd24
docs: Tweak doc comments
karfau Feb 21, 2022
5fd4e1b
feat(conventions): List HTML boolean attributes and void elements
karfau Feb 28, 2022
ceff927
fix(sax): Handle raw text elements in HTML
karfau Mar 6, 2022
dc62bf5
fix(conventions): Restore ES5 compatibility
karfau Mar 6, 2022
bbe7790
style: Format test code
karfau Mar 6, 2022
ae2c7da
refactor: Exclude escapable from isHTMLRawTextElement
karfau Mar 6, 2022
9b46871
fix(dom): Serialize according to document type
karfau Mar 6, 2022
48f49be
test(examples): Check for undefined before using document
karfau Mar 6, 2022
1b88b30
Merge remote-tracking branch 'upstream/master' into 203-html-document
karfau Mar 6, 2022
2 changes: 2 additions & 0 deletions examples/typescript-node-es6/src/index.ts
@@ -7,6 +7,8 @@ const source = `<xml xmlns="a">

 const doc = new DOMParser().parseFromString(source, 'text/xml')

+if (!doc) throw 'expected Document but was undefined'
+
 const serialized = new XMLSerializer().serializeToString(doc)

 if (source !== serialized) {
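The added guard follows from `parseFromString` now being typed as returning `Document | undefined`. A minimal sketch of the same check as a reusable helper is shown here; `requireDocument` and `fakeDocument` are hypothetical names that are not part of xmldom, and the helper throws an `Error` rather than a string so that a stack trace is available.

```javascript
// Hypothetical helper, not part of xmldom: rejects a missing document.
function requireDocument(doc) {
  if (doc === undefined || doc === null) {
    throw new Error('expected Document but was undefined')
  }
  return doc
}

// Stand-in value so the sketch runs without the library.
const fakeDocument = { nodeType: 9 }
console.log(requireDocument(fakeDocument) === fakeDocument) // true

try {
  requireDocument(undefined)
} catch (e) {
  console.log(e.message) // expected Document but was undefined
}
```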
82 changes: 42 additions & 40 deletions index.d.ts
@@ -1,43 +1,45 @@
 /// <reference lib="dom" />

-declare module "@xmldom/xmldom" {
-  var DOMParser: DOMParserStatic;
-  var XMLSerializer: XMLSerializerStatic;
-  var DOMImplementation: DOMImplementationStatic;
-
-  interface DOMImplementationStatic {
-    new(): DOMImplementation;
-  }
-
-  interface DOMParserStatic {
-    new (): DOMParser;
-    new (options: Options): DOMParser;
-  }
-
-  interface XMLSerializerStatic {
-    new (): XMLSerializer;
-  }
-
-  interface DOMParser {
-    parseFromString(xmlsource: string, mimeType?: string): Document;
-  }
-
-  interface XMLSerializer {
-    serializeToString(node: Node): string;
-  }
-
-  interface Options {
-    locator?: any;
-    errorHandler?: ErrorHandlerFunction | ErrorHandlerObject | undefined;
-  }
-
-  interface ErrorHandlerFunction {
-    (level: string, msg: any): any;
-  }
-
-  interface ErrorHandlerObject {
-    warning?: ((msg: any) => any) | undefined;
-    error?: ((msg: any) => any) | undefined;
-    fatalError?: ((msg: any) => any) | undefined;
-  }
+declare module '@xmldom/xmldom' {
+  var DOMParser: DOMParserStatic
+  var XMLSerializer: XMLSerializerStatic
+  var DOMImplementation: DOMImplementationStatic
+
+  interface DOMImplementationStatic {
+    new (): DOMImplementation
+  }
+
+  interface DOMParserStatic {
+    new (): DOMParser
+    new (options: DOMParserOptions): DOMParser
+  }
+
+  interface XMLSerializerStatic {
+    new (): XMLSerializer
+  }
+
+  interface DOMParser {
+    parseFromString(source: string, mimeType?: string): Document | undefined
+  }
+
+  interface XMLSerializer {
+    serializeToString(node: Node): string
+  }
+
+  interface DOMParserOptions {
+    errorHandler?: ErrorHandlerFunction | ErrorHandlerObject
+    locator?: boolean
+    normalizeLineEndings?: (source: string) => string
+    xmlns?: Record<string, string | null | undefined>
+  }
+
+  interface ErrorHandlerFunction {
+    (level: 'warn' | 'error' | 'fatalError', msg: string): void
+  }
+
+  interface ErrorHandlerObject {
+    warning?: (msg: string) => void
+    error?: (msg: string) => void
+    fatalError?: (msg: string) => void
+  }
 }
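The narrowed `ErrorHandlerObject` typing (string messages, `void` return) can be illustrated without the library. In the sketch below, `createCollectingHandler` is a hypothetical helper, and the calls to the handler are simulated; a real parser would invoke them while parsing. Per the typings above, the function form receives the level `'warn'` while the object form uses the key `warning`.

```javascript
// Hypothetical helper: builds an object matching the `ErrorHandlerObject`
// shape and records every message under its level.
function createCollectingHandler() {
  const messages = { warning: [], error: [], fatalError: [] }
  return {
    messages,
    handler: {
      warning: (msg) => { messages.warning.push(msg) },
      error: (msg) => { messages.error.push(msg) },
      fatalError: (msg) => { messages.fatalError.push(msg) },
    },
  }
}

const { handler, messages } = createCollectingHandler()
// Simulated calls with made-up messages.
handler.warning('unclosed tag')
handler.fatalError('invalid document')
console.log(messages.warning.length, messages.error.length, messages.fatalError.length) // 1 0 1
```

According to the `DOMParserStatic` and `DOMParserOptions` declarations, an object of this shape would be passed as `new DOMParser({ errorHandler: handler })`.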
192 changes: 183 additions & 9 deletions lib/conventions.js
@@ -9,7 +9,7 @@
  *
  * @template T
  * @param {T} object the object to freeze
- * @param {Pick<ObjectConstructor, 'freeze'> = Object} oc `Object` by default,
+ * @param {Pick<ObjectConstructor, 'freeze'>} [oc=Object] `Object` by default,
  * allows to inject custom object constructor for tests
  * @returns {Readonly<T>}
  *
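The corrected `@param` tag documents `oc` as an optional parameter that defaults to `Object`. The function body is not part of this hunk, so the following is only a sketch of how such an injectable constructor could behave; the fallback of returning the object unchanged when no `freeze` method exists is an assumption, as is the `stub` object.

```javascript
// Sketch only: the body is assumed, the diff shows just the doc comment.
function freeze(object, oc) {
  if (oc === undefined) {
    oc = Object
  }
  return oc && typeof oc.freeze === 'function' ? oc.freeze(object) : object
}

const frozen = freeze({ a: 1 })
console.log(Object.isFrozen(frozen)) // true

// Injecting a stub records the call without freezing anything,
// which is the kind of test hook the doc comment describes.
const calls = []
const stub = { freeze: (o) => { calls.push(o); return o } }
const untouched = freeze({ b: 2 }, stub)
console.log(calls.length, Object.isFrozen(untouched)) // 1 false
```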
@@ -47,6 +47,155 @@ function assign(target, source) {
   return target
 }

+/**
+ * A number of attributes are boolean attributes.
+ * The presence of a boolean attribute on an element represents the `true` value,
+ * and the absence of the attribute represents the `false` value.
+ *
+ * If the attribute is present, its value must either be the empty string
+ * or a value that is an ASCII case-insensitive match for the attribute's canonical name,
+ * with no leading or trailing whitespace.
+ *
+ * Note: The values `"true"` and `"false"` are not allowed on boolean attributes.
+ * To represent a `false` value, the attribute has to be omitted altogether.
+ *
+ * @see https://html.spec.whatwg.org/#boolean-attributes
+ * @see https://html.spec.whatwg.org/#attributes-3
+ */
+var HTML_BOOLEAN_ATTRIBUTES = freeze({
+  allowfullscreen: true,
+  async: true,
+  autofocus: true,
+  autoplay: true,
+  checked: true,
+  controls: true,
+  default: true,
+  defer: true,
+  disabled: true,
+  formnovalidate: true,
+  hidden: true,
+  ismap: true,
+  itemscope: true,
+  loop: true,
+  multiple: true,
+  muted: true,
+  nomodule: true,
+  novalidate: true,
+  open: true,
+  playsinline: true,
+  readonly: true,
+  required: true,
+  reversed: true,
+  selected: true,
+})
+
+/**
+ * Check if `name` is matching one of the HTML boolean attribute names.
+ * This method doesn't check if such attributes are allowed in the context of the current document/parsing.
+ *
+ * @param {string} name
+ * @return {boolean}
+ * @see HTML_BOOLEAN_ATTRIBUTES
+ * @see https://html.spec.whatwg.org/#boolean-attributes
+ * @see https://html.spec.whatwg.org/#attributes-3
+ */
+function isHTMLBooleanAttribute(name) {
+  return HTML_BOOLEAN_ATTRIBUTES.hasOwnProperty(name.toLowerCase())
+}
+
+/**
+ * Void elements only have a start tag; end tags must not be specified for void elements.
+ * These elements should be written as self closing like this: `<area />`.
+ * This should not be confused with optional tags that HTML allows to omit the end tag for
+ * (like `li`, `tr` and others), which can have content after them,
+ * so they can not be written as self closing.
+ * xmldom does not have any logic for optional end tags cases and will report them as a warning.
+ * Content that would go into the unopened element will instead be added as a sibling text node.
+ *
+ * @type {Readonly<{area: boolean, col: boolean, img: boolean, wbr: boolean, link: boolean, hr: boolean, source: boolean, br: boolean, input: boolean, param: boolean, meta: boolean, embed: boolean, track: boolean, base: boolean}>}
+ * @see https://html.spec.whatwg.org/#void-elements
+ * @see https://html.spec.whatwg.org/#optional-tags
+ */
+var HTML_VOID_ELEMENTS = freeze({
+  area: true,
+  base: true,
+  br: true,
+  col: true,
+  embed: true,
+  hr: true,
+  img: true,
+  input: true,
+  link: true,
+  meta: true,
+  param: true,
+  source: true,
+  track: true,
+  wbr: true,
+})
+
+/**
+ * Check if `tagName` is matching one of the HTML void element names.
+ * This method doesn't check if such tags are allowed
+ * in the context of the current document/parsing.
+ *
+ * @param {string} tagName
+ * @return {boolean}
+ * @see HTML_VOID_ELEMENTS
+ * @see https://html.spec.whatwg.org/#void-elements
+ */
+function isHTMLVoidElement(tagName) {
+  return HTML_VOID_ELEMENTS.hasOwnProperty(tagName.toLowerCase())
+}
+
+/**
+ * Tag names that are raw text elements according to HTML spec.
+ * The value denotes whether they are escapable or not.
+ *
+ * @see isHTMLEscapableRawTextElement
+ * @see isHTMLRawTextElement
+ * @see https://html.spec.whatwg.org/#raw-text-elements
+ * @see https://html.spec.whatwg.org/#escapable-raw-text-elements
+ */
+var HTML_RAW_TEXT_ELEMENTS = freeze({
+  script: false,
+  style: false,
+  textarea: true,
+  title: true,
+})
+
+/**
+ * Check if `tagName` is matching one of the HTML raw text element names.
+ * This method doesn't check if such tags are allowed
+ * in the context of the current document/parsing.
+ *
+ * @param {string} tagName
+ * @return {boolean}
+ * @see isHTMLEscapableRawTextElement
+ * @see HTML_RAW_TEXT_ELEMENTS
+ * @see https://html.spec.whatwg.org/#raw-text-elements
+ * @see https://html.spec.whatwg.org/#escapable-raw-text-elements
+ */
+function isHTMLRawTextElement(tagName) {
+  var key = tagName.toLowerCase();
+  return HTML_RAW_TEXT_ELEMENTS.hasOwnProperty(key) && !HTML_RAW_TEXT_ELEMENTS[key];
+}
+/**
+ * Check if `tagName` is matching one of the HTML escapable raw text element names.
+ * This method doesn't check if such tags are allowed
+ * in the context of the current document/parsing.
+ *
+ * @param {string} tagName
+ * @return {boolean}
+ * @see isHTMLRawTextElement
+ * @see HTML_RAW_TEXT_ELEMENTS
+ * @see https://html.spec.whatwg.org/#raw-text-elements
+ * @see https://html.spec.whatwg.org/#escapable-raw-text-elements
+ */
+function isHTMLEscapableRawTextElement(tagName) {
+  var key = tagName.toLowerCase();
+  return HTML_RAW_TEXT_ELEMENTS.hasOwnProperty(key) && HTML_RAW_TEXT_ELEMENTS[key];
+}
+
 /**
  * All mime types that are allowed as input to `DOMParser.parseFromString`
  *
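The raw text table in the hunk above stores `false` for raw text elements and `true` for escapable ones, and both predicates lowercase the tag name before the lookup. A self-contained sketch of that behaviour follows; the table is copied from the diff, while the shortened names and the `hasOwn` helper are choices made for this sketch and are not part of the change.

```javascript
// Copied from the diff: `false` marks raw text, `true` marks escapable raw text.
var RAW_TEXT = Object.freeze({ script: false, style: false, textarea: true, title: true })

// Hypothetical helper: an own-property check keeps inherited keys such as
// `constructor` from being treated as tag names.
function hasOwn(object, key) {
  return Object.prototype.hasOwnProperty.call(object, key)
}

function isRawText(tagName) {
  var key = tagName.toLowerCase()
  return hasOwn(RAW_TEXT, key) && !RAW_TEXT[key]
}

function isEscapableRawText(tagName) {
  var key = tagName.toLowerCase()
  return hasOwn(RAW_TEXT, key) && RAW_TEXT[key]
}

console.log(isRawText('SCRIPT'), isEscapableRawText('SCRIPT')) // true false
console.log(isRawText('textarea'), isEscapableRawText('textarea')) // false true
console.log(isRawText('div'), isEscapableRawText('div')) // false false
console.log(isRawText('constructor')) // false
```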
@@ -72,14 +221,32 @@ var MIME_TYPE = freeze({
   * @param {string} [value]
   * @returns {boolean}
   *
-  * @see https://www.iana.org/assignments/media-types/text/html IANA MimeType registration
-  * @see https://en.wikipedia.org/wiki/HTML Wikipedia
-  * @see https://developer.mozilla.org/en-US/docs/Web/API/DOMParser/parseFromString MDN
-  * @see https://html.spec.whatwg.org/multipage/dynamic-markup-insertion.html#dom-domparser-parsefromstring */
+  * @see [IANA MimeType registration](https://www.iana.org/assignments/media-types/text/html)
+  * @see [Wikipedia](https://en.wikipedia.org/wiki/HTML)
+  * @see [`DOMParser.parseFromString` @ MDN](https://developer.mozilla.org/en-US/docs/Web/API/DOMParser/parseFromString)
+  * @see [`DOMParser.parseFromString` @ HTML Specification](https://html.spec.whatwg.org/multipage/dynamic-markup-insertion.html#dom-domparser-parsefromstring)
+  */
  isHTML: function (value) {
    return value === MIME_TYPE.HTML
  },

+  /**
+  * For both the `text/html` and the `application/xhtml+xml` namespace
+  * the spec defines that the HTML namespace is provided as the default in some cases.
+  *
+  * @param {string} mimeType
+  * @returns {boolean}
+  *
+  * @see https://dom.spec.whatwg.org/#dom-document-createelement
+  * @see https://dom.spec.whatwg.org/#dom-domimplementation-createdocument
+  * @see https://dom.spec.whatwg.org/#dom-domimplementation-createhtmldocument
+  */
+  hasDefaultHTMLNamespace: function (mimeType) {
+    return (
+      MIME_TYPE.isHTML(mimeType) || mimeType === MIME_TYPE.XML_XHTML_APPLICATION
+    )
+  },
+
  /**
   * `application/xml`, the standard mime type for XML documents.
   *
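The new `hasDefaultHTMLNamespace` check accepts exactly two mime types. A standalone sketch follows; the `MIME` object is a trimmed stand-in for `MIME_TYPE`, and its string values are taken from the doc comments in this diff rather than from the constant definitions, which are not shown here.

```javascript
// Trimmed stand-in for MIME_TYPE; values assumed from the doc comments.
var MIME = Object.freeze({
  HTML: 'text/html',
  XML_APPLICATION: 'application/xml',
  XML_XHTML_APPLICATION: 'application/xhtml+xml',
})

function isHTML(value) {
  return value === MIME.HTML
}

function hasDefaultHTMLNamespace(mimeType) {
  return isHTML(mimeType) || mimeType === MIME.XML_XHTML_APPLICATION
}

console.log(hasDefaultHTMLNamespace('text/html')) // true
console.log(hasDefaultHTMLNamespace('application/xhtml+xml')) // true
console.log(hasDefaultHTMLNamespace('application/xml')) // false
console.log(hasDefaultHTMLNamespace(undefined)) // false
```

Note that the comparison is strict, so a differently cased or parameterized value such as `'text/html; charset=utf-8'` would not match in this sketch.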
@@ -164,7 +331,14 @@ var NAMESPACE = freeze({
   XMLNS: 'http://www.w3.org/2000/xmlns/',
 })

-exports.assign = assign;
-exports.freeze = freeze;
-exports.MIME_TYPE = MIME_TYPE;
-exports.NAMESPACE = NAMESPACE;
+exports.assign = assign
+exports.freeze = freeze
+exports.HTML_BOOLEAN_ATTRIBUTES = HTML_BOOLEAN_ATTRIBUTES
+exports.HTML_RAW_TEXT_ELEMENTS = HTML_RAW_TEXT_ELEMENTS
+exports.HTML_VOID_ELEMENTS = HTML_VOID_ELEMENTS
+exports.isHTMLBooleanAttribute = isHTMLBooleanAttribute
+exports.isHTMLRawTextElement = isHTMLRawTextElement
+exports.isHTMLEscapableRawTextElement = isHTMLEscapableRawTextElement
+exports.isHTMLVoidElement = isHTMLVoidElement
+exports.MIME_TYPE = MIME_TYPE
+exports.NAMESPACE = NAMESPACE
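The doc comment on `HTML_BOOLEAN_ATTRIBUTES` states the value rule for boolean attributes: the value must be the empty string or an ASCII case-insensitive match for the attribute name, with no surrounding whitespace. The exported `isHTMLBooleanAttribute` only checks the name, so the sketch below shows how the value rule could be checked on top of it. `isValidBooleanAttributeValue` and `asciiLowerCase` are hypothetical helpers that are not part of this change, and the table is a trimmed copy of the one in the diff.

```javascript
// Trimmed copy of the table from the diff; the full list is longer.
var BOOLEAN_ATTRIBUTES = Object.freeze({ checked: true, disabled: true, hidden: true })

// Lowercases A-Z only, so non-ASCII look-alikes never match.
function asciiLowerCase(value) {
  return value.replace(/[A-Z]/g, function (c) {
    return c.toLowerCase()
  })
}

// Hypothetical validator for the rule quoted in the doc comment.
function isValidBooleanAttributeValue(name, value) {
  var canonical = asciiLowerCase(name)
  if (!Object.prototype.hasOwnProperty.call(BOOLEAN_ATTRIBUTES, canonical)) {
    return false
  }
  return value === '' || asciiLowerCase(value) === canonical
}

console.log(isValidBooleanAttributeValue('disabled', '')) // true
console.log(isValidBooleanAttributeValue('DISABLED', 'Disabled')) // true
console.log(isValidBooleanAttributeValue('disabled', 'true')) // false
console.log(isValidBooleanAttributeValue('disabled', ' disabled')) // false
```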