html-cleaner
============

HTML cleaner and sanitizer for Python projects, also usable as a standalone app.

Requirements

  • Python >= 2.5
  • BeautifulSoup

Use html-cleaner
================

clear_html_code

html_cleaner.clear.clear_html_code(text)

Removes tags that are not allowed from HTML code. The structure of allowed tags is defined in needs.cfg. The clear.py module is generated by html_cleaner/generator.py, which uses needs.cfg as its configuration file.

Simple usage:

from html_cleaner.clear import clear_html_code

clear_html_code("""
    <a href="/" title="test" alt="test">link</a>
    <javascript>alert(0);</javascript>
""")

generator

./generator.py

Generates the clear.py source file according to the rules specified in needs.cfg. A simpler example configuration can be found in example.cfg.

Configuration file

The configuration file contains the hierarchical rules that make up the cleaner's white-list. For examples, see example.cfg and needs.cfg (the latter is the one we use).
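
The syntax of needs.cfg is not reproduced here, so the following is only a sketch of what "hierarchical" white-list rules can look like once loaded into memory. The `RULES` structure and the `lookup` and `is_allowed` helpers are hypothetical and are not part of html-cleaner. The idea is that a tag is allowed only in the context of its parent tags.

```python
# Hypothetical nested whitelist: each tag lists its allowed attributes
# and the child tags allowed inside it. Not the real needs.cfg format.
RULES = {
    "ul": {"attrs": [], "children": {
        "li": {"attrs": [], "children": {
            "a": {"attrs": ["href", "title"], "children": {}},
        }},
    }},
    "p": {"attrs": [], "children": {
        "a": {"attrs": ["href", "title"], "children": {}},
    }},
}


def lookup(path, rules=RULES):
    """Return the rule for a tag path such as ["ul", "li", "a"], or None."""
    node = None
    level = rules
    for tag in path:
        node = level.get(tag)
        if node is None:
            return None
        level = node["children"]
    return node


def is_allowed(path, attr=None):
    """True if the tag path is whitelisted and, when given, so is the attribute."""
    node = lookup(path)
    if node is None:
        return False
    return attr is None or attr in node["attrs"]
```

For instance, `is_allowed(["ul", "li", "a"], "href")` holds under these rules, while `is_allowed(["ul", "a"])` does not, because `a` is not listed as a direct child of `ul`.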

Contributing

Development of html-cleaner happens on GitHub: https://github.com/ProstoKsi/html-cleaner/

License

Copyright (C) 2009-2013 Illia Polosukhin, Vladyslav Frolov. This program is licensed under the MIT License (see LICENSE).
