Clinical Records Anonymisation and Text Extraction (CRATE)

Purpose

Create and use de-identified databases for research.

- Anonymises relational databases.
- Extracts and de-identifies text from associated binary files.
- Performs some specific preprocessing tasks, e.g.:
  - preprocesses some specific databases (e.g. Servelec RiO EMR);
  - drafts a "data dictionary" for anonymisation, with special knowledge of some databases (e.g. TPP SystmOne);
  - fetches some word lists, e.g. forenames/surnames/eponyms.
- Provides tools to link databases, including via Bayesian personal identity matching, in identifiable or de-identified fashion.
- Provides a natural language processing (NLP) pipeline, including built-in NLP, support for external tools, and client/server support for the Natural Language Processing Request Protocol (NLPRP).
- Provides a web app for:
  - querying the anonymised database;
  - providing a de-identification API;
  - managing a consent-to-contact process.
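
To give a feel for what de-identifying free text involves, here is a minimal sketch in plain Python. It is not CRATE's API or algorithm: the function names `build_scrubber` and `scrub`, the `[XXX]` placeholder, and the sample note are all hypothetical, and the approach (whole-word, case-insensitive replacement of known identifiers) is only an assumption about the simplest form such scrubbing could take.

```python
import re
from typing import Iterable


def build_scrubber(identifiers: Iterable[str]) -> re.Pattern:
    """Compile one case-insensitive pattern matching any identifier as a whole word.

    Hypothetical helper for illustration; not part of CRATE.
    """
    # Deduplicate, drop blanks, and try longer terms first so that a longer
    # identifier is preferred over a shorter one that it contains.
    terms = sorted(
        {t.strip() for t in identifiers if t.strip()}, key=len, reverse=True
    )
    if not terms:
        raise ValueError("no identifiers supplied")
    alternation = "|".join(re.escape(t) for t in terms)
    # Lookarounds (rather than \b) so identifiers that start or end with
    # non-word characters are still matched only as whole tokens.
    return re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE)


def scrub(text: str, pattern: re.Pattern, replacement: str = "[XXX]") -> str:
    """Replace every match of the pattern with a fixed placeholder."""
    return pattern.sub(replacement, text)


# Fictional example data.
note = "Mr John Smith (DOB 01/02/1970) was seen today. Smithson attended too."
pattern = build_scrubber(["John", "Smith", "01/02/1970"])
scrubbed = scrub(note, pattern)
print(scrubbed)
```

In this sketch "Smithson" is left alone because matching is restricted to whole words; a real tool has to make many further decisions (misspellings, date formats, third-party names, and so on) that are deliberately out of scope here.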

Copyright (C) 2015, University of Cambridge, Department of Psychiatry. Created by Rudolf Cardinal (rnc1001@cam.ac.uk).
Licensed under the GNU GPL v3+: see LICENSE file.
Some third-party libraries have slightly different licences; see the documentation.

.github
bug_reports
built_packages
crate_anon
debugging
docker/dockerfiles
docs
github_action_scripts
installer
stubs
tools
working
.gitignore
.pre-commit-config.yaml
.readthedocs.yaml
LICENSE
MAKE_PYTHON_PACKAGE.sh
MANIFEST.in
README.rst
changelog.Debian
requirements-ubuntu.txt
setup.cfg
setup.py