GitHub - saintamh/klon: Klon is a collection of utilities for building ElementTrees.

Klon is a collection of Python utilities for manipulating ElementTrees. It's a thin-ish, transparent wrapper around the lxml.etree module.

klon.build_etree

Source code: klon/build.py

A utility for building element trees using list and string literals.

>>> from klon import build_etree
>>> etree = build_etree(
...     'html',
...     [
...         'head',
...         ['title', 'Test Document'],
...     ],
...     [
...         'body',
...         ['h1#title', 'This is a test'],
...         ['a', {'href': '/page'}, ['img', {'src': 'image.jpg'}]],
...         [
...             'p.text',
...             'This is a text',
...             ['br'],
...             'This is a tail',
...         ],
...     ],
... )

Nested lists are translated to nested elements:

the first element in the list must be a string, and becomes the tag name
optionally, the second element can be a dict, specifying tag attributes
any other elements become the tag's children

As a convenience, the id and class attributes can be set directly from the tag name string, using CSS-like syntax: tag#id and tag.class.

klon.tostring

Source code: klon/utils.py

A thin wrapper around lxml.etree.tostring.

>>> from klon import tostring
>>> print(tostring(etree, pretty_print=True))
<html>
  <head>
    <title>Test Document</title>
  </head>
  <body>
    <h1 id="title">This is a test</h1>
    <a href="/page">
      <img src="image.jpg"/>
    </a>
    <p class="text">This is a text<br/>This is a tail</p>
  </body>
</html>

The main difference with the underlying LXML function is that encoding=str by default, i.e. it produces strings by default, rather than bytes.

klon.extract_text

Source code: klon/text.py

Extracts all text from the given node and its descendants.

By default, all contiguous whitespace is normalized to a single ASCII space, and so the output will always be a single line of text. However if multiline=True is specified, paragraph-breaking tags are preserved, in the same way that a web browser would. Other whitespace is still normalized, but the output now contains both ASCII spaces and ASCII newlines.

>>> from klon import extract_text

>>> body = etree.find('body')  # using the same example etree defined above
>>> print(tostring(body, pretty_print=True))
<body>
  <h1 id="title">This is a test</h1>
  <a href="/page">
    <img src="image.jpg"/>
  </a>
  <p class="text">This is a text<br/>This is a tail</p>
</body>

>>> extract_text(body)
'This is a test This is a text This is a tail'

>>> extract_text(body, multiline=True)
'This is a test\n\nThis is a text\nThis is a tail'

Note that the <p> tag translates to a double newline, while the <br> tag translates to a single \n, mimicking how a browser renders them.

klon.detach

Source code: klon/utils.py

Takes one node as argument, and removes it from its tree. Takes care to preserve the node's tail text by reattaching it to the correct position in the tree.

>>> from klon import detach

>>> print(tostring(body, pretty_print=True))
<body>
  <h1 id="title">This is a test</h1>
  <a href="/page">
    <img src="image.jpg"/>
  </a>
  <p class="text">This is a text<br/>This is a tail</p>
</body>

>>> br = detach(body.xpath('.//br')[0])

>>> print(tostring(body, pretty_print=True))
<body>
  <h1 id="title">This is a test</h1>
  <a href="/page">
    <img src="image.jpg"/>
  </a>
  <p class="text">This is a textThis is a tail</p>
</body>

Note that This is a tail, which was the tail of the <br> node, has been preserved, in this case by appending it to the text of its parent node.

klon.make_all_urls_absolute

Source code: klon/html.py

Takes a URL and a document etree, and modifies the etree in place to convert all relative URLs to absolute ones, using the given URL as a base. All standard tag attributes that specify a URL (e.g. <a href="...">, <img src="...">, <form action="..."> etc) are converted.

>>> from klon import make_all_urls_absolute

>>> print(tostring(body.find('a')))
<a href="/page"><img src="image.jpg"/></a>

>>> make_all_urls_absolute('https://site.com/path/', etree)

>>> print(tostring(body.find('a')))
<a href="https://site.com/page"><img src="https://site.com/path/image.jpg"/></a>

klon.parse_form

Source code: klon/forms.py

Takes an ElementTree whose root is a <form> node, and returns a requests.Request that corresponds to the request that would be sent by a browser if the form was submitted.

>>> from klon import parse_form

>>> form = build_etree(
...     'form',
...     {'method': 'POST', 'action': '/publish'},
...     [
...         'div',
...         ['input', {'name': 'title', 'value': 'Some title'}],
...     ],
...     [
...         'select',
...         {'name': 'kind'},
...         ['option', {'value': 'comment'}, 'Comment'],
...         ['option', {'value': 'question', 'selected': 'yes'}, 'Question'],
...     ],
... )

>>> request = parse_form(form, base_url='https://web.site/')
>>> request.url
'https://web.site/publish'
>>> request.method
'POST'
>>> request.data
{'title': 'Some title', 'kind': 'question'}

Name		Name	Last commit message	Last commit date
Latest commit History 76 Commits
klon		klon
test		test
.gitignore		.gitignore
.mypy.ini		.mypy.ini
.pre-commit-config.yaml		.pre-commit-config.yaml
.pylintrc		.pylintrc
.travis.yml		.travis.yml
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
README.md		README.md
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

klon

klon

test

test

.gitignore

.gitignore

.mypy.ini

.mypy.ini

.pre-commit-config.yaml

.pre-commit-config.yaml

.pylintrc

.pylintrc

.travis.yml

.travis.yml

LICENSE

LICENSE

MANIFEST.in

MANIFEST.in

README.md

README.md

setup.py

setup.py

Repository files navigation

klon.build_etree

klon.tostring

klon.extract_text

klon.detach

klon.make_all_urls_absolute

klon.parse_form

About

Releases

Packages

Contributors 2

Languages

License

saintamh/klon

Folders and files

Latest commit

History

Repository files navigation

klon.build_etree

klon.tostring

klon.extract_text

klon.detach

klon.make_all_urls_absolute

klon.parse_form

About

Topics

Resources

License

Stars

Watchers

Forks

Languages