GitHub - cerebrosoft/entity-extractor: Lightweight Java-based entity extraction engine

Lightweight entity extraction engine

Extract entities based on a lexicon and/or library of patterns.

Easy-to-use API
High-performance lexicon-based extraction, up to several hundred times faster than regex
Utilizes Lucene analyzers to improve match results
Java 8+
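
To give a feel for why lexicon lookup can outrun regex matching, here is a minimal sketch of hash-based phrase lookup. It is not the library's implementation: LexiconSketch, tokenize, and find are hypothetical names, and the lowercase-and-split step is only a crude stand-in for what an analyzer might do. Each candidate phrase costs one hash lookup, regardless of how many entries the lexicon holds.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative only: a hash-based lexicon matcher, not the library's actual algorithm.
class LexiconSketch {
    private final Set<String> phrases = new HashSet<>();
    private int maxTokens = 1; // length, in tokens, of the longest lexicon entry

    LexiconSketch(List<String> entries) {
        for (String entry : entries) {
            String[] tokens = tokenize(entry);
            phrases.add(String.join(" ", tokens));
            maxTokens = Math.max(maxTokens, tokens.length);
        }
    }

    // Lowercase and split on whitespace (assumption: a simple stand-in for an analyzer).
    static String[] tokenize(String s) {
        return s.trim().toLowerCase(Locale.ROOT).split("\\s+");
    }

    // Greedy longest match: at each token position, try the longest phrase first.
    List<String> find(String text) {
        String[] tokens = tokenize(text);
        List<String> hits = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            int matched = 0;
            for (int n = Math.min(maxTokens, tokens.length - i); n >= 1; n--) {
                String candidate = String.join(" ", Arrays.copyOfRange(tokens, i, i + n));
                if (phrases.contains(candidate)) {
                    hits.add(candidate);
                    matched = n;
                    break;
                }
            }
            i += Math.max(matched, 1); // skip past a match, or advance one token
        }
        return hits;
    }

    public static void main(String[] args) {
        LexiconSketch lexicon = new LexiconSketch(Arrays.asList("John Smith", "Alice"));
        System.out.println(lexicon.find("Yesterday JOHN   smith met Alice"));
    }
}
```

The sketch ignores punctuation and character offsets, which a real extractor would need to handle.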

Usage

// standard library imports used below (imports for the entity-extractor classes are not shown)
import java.util.ArrayList;
import java.util.List;

// sample input; any String will do
String text = "John Smith can be reached at john.smith@example.com";

// create a lexicon
List<Entity> items = new ArrayList<>();
items.add(new DefaultEntity.Builder("John Smith").type("Person").build());
EntityBook entityBook = new EntityBook(items);

// create patterns
List<PatternDef> patterns = new ArrayList<>();
patterns.add(new PatternDef("[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+", "Email", false));
PatternBook patternBook = new PatternBook(patterns);

// perform extraction
AggregateExtractor extractor = new AggregateExtractor();
ExtractionManifest manifest = extractor.extract(text, entityBook, patternBook);

for (ExtractionRegion region : manifest.getRegions()) {
    // do something with entities inside each region
}
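
The pattern half of the usage above can be pictured with plain java.util.regex. The sketch below reuses the email expression from the snippet and reports each match with its type and character offsets. PatternSketch, Hit, and findAll are hypothetical names, and nothing here reflects how PatternBook or AggregateExtractor work internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: typed regex matches with offsets, using the JDK alone.
class PatternSketch {
    // Same email expression as in the usage snippet.
    static final Pattern EMAIL =
        Pattern.compile("[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+");

    // Hypothetical result holder: entity type, [start, end) offsets, and matched text.
    static final class Hit {
        final String type;
        final int start;
        final int end;
        final String value;

        Hit(String type, int start, int end, String value) {
            this.type = type;
            this.start = start;
            this.end = end;
            this.value = value;
        }
    }

    // Scan the text once and collect every non-overlapping match of the pattern.
    static List<Hit> findAll(Pattern pattern, String type, String text) {
        List<Hit> hits = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            hits.add(new Hit(type, m.start(), m.end(), m.group()));
        }
        return hits;
    }

    public static void main(String[] args) {
        for (Hit h : findAll(EMAIL, "Email", "Contact jane@example.com today")) {
            System.out.println(h.type + " [" + h.start + ", " + h.end + ") " + h.value);
        }
    }
}
```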

Examples

The ner-example project contains two example programs, SigBlockExample and WonderlandExample, that illustrate extracting entities from text and using the results to mark up those entities in the document.
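
As a rough picture of the markup step, the sketch below wraps each extracted span in a tag named after its entity type. It is not taken from the example programs: MarkupSketch, Span, and markUp are hypothetical, the spans are assumed to be character offsets into the original text, and no escaping of the text is attempted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative only: wrap entity spans in simple tags.
class MarkupSketch {
    // Hypothetical span: [start, end) character offsets plus an entity type.
    static final class Span {
        final int start;
        final int end;
        final String type;

        Span(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    static String markUp(String text, List<Span> spans) {
        List<Span> sorted = new ArrayList<>(spans);
        sorted.sort(Comparator.comparingInt((Span s) -> s.start));
        StringBuilder out = new StringBuilder();
        int cursor = 0; // first character not yet copied to the output
        for (Span s : sorted) {
            if (s.start < cursor) {
                continue; // skip spans that overlap one already emitted
            }
            out.append(text, cursor, s.start)
               .append('<').append(s.type).append('>')
               .append(text, s.start, s.end)
               .append("</").append(s.type).append('>');
            cursor = s.end;
        }
        out.append(text, cursor, text.length()); // copy the remaining tail
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "John Smith wrote to jane@example.com";
        List<Span> spans = Arrays.asList(new Span(20, 36, "Email"), new Span(0, 10, "Person"));
        System.out.println(markUp(text, spans));
    }
}
```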
