Skip to content

raywhite/pico-dom

Repository files navigation

pico-dom

CircleCI

Create, transform and serialize lightweight HTML abstract syntax trees in a functional and composable style. You know... for the web!

About

pico-dom is a tool for parsing, transformation and serialization of HTML. It is built on top of the excellent and forgiving parse5 module, and is based on a simple premise; any markup can be represented as a simple data structure (a Abstract Syntax Tree - not unlike the DOM implemented in browsers) - Just like a hash, array, or any other collection, we can transform that tree using functional utilities like map and reduce, and compose sets of those utilities into streams that perform complex transformations.

pico-dom has several use cases:

  • markup sanitization
  • markup transformation
  • markup templating
  • web scraping

API

parse([doc, ] markup)

  • doc (boolean) - whether or not to parse as a document (default: false)

  • markup (string) - the markup to be parsed into an AST.

This is a wrapper around the parse and parseFragment methods from parse5. It will return an AST in the in the htmlparser2 format.

Note: Where doc is not provided, the markup will be parsed as a document fragment, if you'd like for the markup to be parsed as a valid document (the way a browser would interpret it) then you should set doc to true.

stringify(tree)

  • tree (object) - the AST to be serialized.

This wraps the serialize method from parse5 - it'll serialize any AST in the htmlparser2 format.

adapter

pico-dom's API re-exposes and extends several methods exposed by parse5 htmlparser2 tree adapter. The exported adapter is extended with the methods isRootNode, createTextNode, cloneNode and createNode (which is JSX compatible), see docs for these below.

adoptAttributes(recipient, attributes)

  • recipient (object) - the element to copy attributes into.
  • attributes (array) - the attributes to copy.

appendChild(parentNode, newNode)

  • parentNode (object) - the parent node.
  • newNode (object) - the new node to append.

cloneNode(node)

  • node (object) - the node to clone.

createCommentNode(data)

  • data (string) - the text content of the comment node.

createDocument()

Creates a spec compliant document that new nodes can be insterted into / appended to.

createDocumentFragment()

Creates a document fragment that new nodes can be insterted into / appended to.

createElement(tagName, namespaceURI, attributes)

  • tagName (string) - the elements tag name.
  • namespaceURI (string) - the elements namespace.
  • attributes (array) - the elements attributes.

createNode(tagName, attrs, childNodes)

  • tagName (string|function) - the tag name, or function to be called with attrs as props.
  • attrs (object) - the attribute (key:value) pairs or props.
  • childNodes (...object|array|string) - the child nodes of this node.

This is a higher level, JSX compatible function that should be used for the creation of elements or reusable functional components. It is effectively a templating helper function. In order to use this function with JSX, you need to be using transpilation and specify the custom pragma (compiler directive) adapter.createNode. See the babel documentation for setting this up here.

Note: this function supports the use of two extra tagNames; Passing 'fragment' as will create a document fragment root node and passing 'document' will create a document root node - this means that you won't have to manually append nodes to a document or fragment node in order to produce a serializable abstract syntax tree.

createTextNode(data)

  • data (string) - the text nodes content.

detachNode(node)

  • node (object) - the node to detach from it's parent.

getAttrList(element)

  • element (object) - the element to get the attributes list of.

getChildNodes(node)

  • node (object) - the element to get the child nodes of.

getCommentNodeContent(commentNode)

  • commentNode (object) - the comment node to get the content of.

getDocumentMode(document)

  • document (object) - the document to get the mode of.

getDocumentTypeNodeName(doctypeNode)

  • doctypeNode (object) - the document type node to get the node name of.

getDocumentTypeNodePublicId(doctypeNode)

  • doctypeNode (object) - the document type node to get the public id of.

getDocumentTypeNodeSystemId(doctypeNode)

  • doctypeNode (object) - the document type node to get the system id of.

getFirstChild(node)

  • node (object) - the node to get the first child of.

getNamespaceURI(element)

  • element (object) - the element to get the namespace RUI of.

getParentNode(node)

  • node (object) - the node to get the parent of.

getTagName(element)

  • node (object) - the element node to get the tag name of.

getTemplateContent(templateElement)

  • templateElement (object) - the template element node to get the content of.

getTextNodeContent(textNode)

  • textNode (object) - the text node to get the content of.

insertBefore(parentNode, newNode, referenceNode)

  • parentNode (object) - the parent node.
  • newNode (object) - the new node to insert.
  • referenceNode (object) - the reference node.

insertText(parentNode, text)

  • parentNode (object) - the parent node.
  • text (string) - the text content.

insertTextBefore(parentNode, text, referenceNode)

  • parentNode (object) - the parent node.
  • text (string) - the text content.
  • referenceNode (object) - the reference node.

isCommentNode(node)

  • node (object) - the node to test.

isDocumentTypeNode(node)

  • node (object) - the node to test.

isElementNode(node)

  • node (object) - the node to test.

isRootNode(node)

  • node (object) - the node to test.

isTextNode(node)

  • node (object) - the node to test.

setDocumentMode(document, mode)

  • document (object) - the document to set the mode of.
  • mode (string) - the document mode.

setDocumentType(document, name, publicId, systemId)

  • document (object) - the document to set the mode of.
  • name (string) - the document mode.
  • publicId (string) - the document public id.
  • systemId (string) - the document system id.

setTemplateContent(templateElement, contentElement)

  • templateElement (object) - the template element node to set the content of.
  • contentElement (object) - the fragment containing the content.

map(fn, node)

  • fn (function) - the callback that will recurse the AST.
  • node (object) - the root node of the AST to map.

The map method functions in a manner analogous to Array.prototype.map - The callback function fn will be passed the current node; fn(node). The order of iteration through the tree is from the most deeply nested child of node to the least deeply nested, and finally node itself. The callback should return the node(s) to replace node with in the new (mapped) tree - you may return a single node, an array of nodes or null (in which case no children will be appended to the node passed to the next invokation). Note that each node is cloned internally to avoid mutation, which means that it is safe to simply return node.

The order in which arguments are passed to map is designed to allow for partial application using Function.prototype.bind.

reduce(fn, i, node)

  • fn (function) - the callback that will recurse the AST.
  • i (number|string|function) - the initial value for the reduction.
  • node (object) - the root node of the AST to reduce.

The reduce method functions in a manner analogous to Array.prototype.reduce - The callback function fn will be passed the initial value i or the return value of the previous invokation, as well as the the current node; fn(i, node). The order of iteration through the tree is from the most deeply nested child of node to the least deeply nested, and finally node itself. Note that each node is cloned internally to avoid mutation.

The order in which arguments are passed to reduce is designed to allow for partial application using Function.prototype.bind.

compose(...fns)

  • fns (...function) - the functions to be composed.

sequence(...fns)

  • fns (...function) - the functions to be sequenced.

License

MIT © Ray White, 2017 •