paulrinckens/timexy

⏳ Timexy

A spaCy custom component that extracts and normalizes dates and other temporal expressions.

Features

  • 💥 Extract dates and durations in several languages (see Supported Languages below)
  • 💥 Normalize dates to timestamps, or normalize dates and durations to the TimeML TIMEX3 standard

Supported Languages

  • 🇩🇪 German
  • 🇬🇧 English
  • 🇫🇷 French

Installation

pip install timexy

Usage

After installation, add the timexy component to any of your spaCy pipelines to extract and normalize dates and other temporal expressions:

import spacy
from timexy import Timexy

nlp = spacy.load("en_core_web_sm")

# Optional: pass a config to override the default values
config = {
    "kb_id_type": "timex3",  # possible values: 'timex3' (default), 'timestamp'
    "label": "timexy",       # default: 'timexy'
    "overwrite": False       # default: False
}
nlp.add_pipe("timexy", config=config, before="ner")

doc = nlp("Today is the 10.10.2010. I was in Paris for six years.")
# Print each entity's text, label, and normalized value
for e in doc.ents:
    print(f"{e.text}\t{e.label_}\t{e.kb_id_}")
# Output:
# 10.10.2010    timexy    TIMEX3 type="DATE" value="2010-10-10T00:00:00"
# six years     timexy    TIMEX3 type="DURATION" value="P6Y"
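
The kb_id_ strings printed above carry the normalized type and value as key="value" pairs. If you need those fields separately, a small helper like the following may be enough. This is only a sketch: parse_timex3 is a hypothetical name and not part of timexy, and it assumes the kb_id_ format matches the example output shown above.

```python
import re

# Hypothetical helper, NOT part of timexy. It assumes kb_id_ strings look like
# the example output above: TIMEX3 type="DATE" value="2010-10-10T00:00:00"
_ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')


def parse_timex3(kb_id: str) -> dict:
    """Collect every key="value" pair found in the string into a dict."""
    return dict(_ATTRIBUTE.findall(kb_id))


date_fields = parse_timex3('TIMEX3 type="DATE" value="2010-10-10T00:00:00"')
duration_fields = parse_timex3('TIMEX3 type="DURATION" value="P6Y"')
print(date_fields["type"], date_fields["value"])
print(duration_fields["type"], duration_fields["value"])
```

In a pipeline you would call it on each entity, for example parse_timex3(e.kb_id_) inside the loop over doc.ents.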

Normalization of temporal expressions

Timexy can normalize temporal expressions to one of two formats:

  • the TimeML TIMEX3 standard
  • a timestamp

The target format is selected with the kb_id_type config parameter:

config = {
    "kb_id_type": "timex3",  # possible values: 'timex3' (default), 'timestamp'
    "label": "timexy",       # default: 'timexy'
    "overwrite": False       # default: False
}
nlp.add_pipe("timexy", config=config, before="ner")

NOTE: Only concrete dates can be represented as timestamps. All other temporal expressions, such as durations, are therefore always normalized to timex3, regardless of the kb_id_type setting.
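
To illustrate why only concrete dates map to a timestamp, the sketch below converts the two normalized values from the usage example with the Python standard library. It is not how timexy computes its output: date_value_to_epoch is a hypothetical helper, and treating values without a timezone as UTC is an assumption made for the example.

```python
from datetime import datetime, timezone
from typing import Optional


def date_value_to_epoch(value: str) -> Optional[float]:
    """Return POSIX seconds for an ISO 8601 date-time, or None for a duration.

    Hypothetical helper for illustration only, not part of timexy.
    """
    # ISO 8601 durations start with "P" (e.g. "P6Y") and describe a length of
    # time with no fixed start, so there is no single instant to convert.
    if value.startswith("P"):
        return None
    parsed = datetime.fromisoformat(value)
    # Assumption: a value without timezone information is interpreted as UTC.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.timestamp()


print(date_value_to_epoch("2010-10-10T00:00:00"))  # a concrete instant
print(date_value_to_epoch("P6Y"))                  # None: durations have no instant
```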

Contributing

Please refer to the contributing guidelines in the repository.