Active Learning for Entity Alignment

This repository contains the source code for the paper

Active Learning for Entity Alignment
Max Berrendorf*, Evgeniy Faerman*, and Volker Tresp
https://arxiv.org/abs/2001.08943

Installation

Set up and activate a virtual environment:

python3.8 -m venv ./venv
source ./venv/bin/activate

Install requirements (in this virtual environment):

pip install -U pip
pip install -U -r requirements.txt

Preparation

To track results on an MLflow server, start it first by running

mlflow server

Note: When storing results for many configurations, we recommend setting up a database backend following the instructions in the MLflow documentation. For the following examples, we assume that the server is running at

TRACKING_URI=http://localhost:5000
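As a rough illustration of the database-backend note above, the following sketch wraps the server start in a small shell function. The SQLite file name mlflow.db, the artifact directory ./mlruns, and the helper name start_mlflow are assumptions made for this example and are not part of the repository; any other backend supported by MLflow could be used instead.

```shell
# Assumed values; adjust host and port to your setup.
MLFLOW_HOST="localhost"
MLFLOW_PORT="5000"
# Assumption: a local SQLite file serves as the backend store (hypothetical file name).
BACKEND_STORE_URI="sqlite:///mlflow.db"

# URI expected by the experiment commands in this README.
export TRACKING_URI="http://${MLFLOW_HOST}:${MLFLOW_PORT}"

# Hypothetical helper: start the tracking server in the foreground.
start_mlflow() {
  mlflow server \
    --backend-store-uri "${BACKEND_STORE_URI}" \
    --default-artifact-root ./mlruns \
    --host "${MLFLOW_HOST}" \
    --port "${MLFLOW_PORT}"
}
```

After sourcing these lines, calling start_mlflow in a separate terminal keeps the server running while the experiments are launched elsewhere.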

Experiments

For all experiments, the results are logged to the running MLflow instance. You can inspect the results during training by opening the TRACKING_URI in a browser. Moreover, all experiments are synchronized via the MLflow instance, so you can start multiple instances of each command on different worker machines to parallelize the experiment.
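The parallelization described above could be sketched as follows for the case of several workers on a single machine. This is only an illustration: the helper name start_workers, the default of two workers, and the log file names are assumptions, and it presumes the machine has enough GPU capacity for concurrent runs. On separate worker machines, you would simply run the plain command once per machine.

```shell
# Fall back to the default server address if TRACKING_URI is not set.
TRACKING_URI="${TRACKING_URI:-http://localhost:5000}"

# Hypothetical helper: start N background workers for one phase
# (random, hpo, or best) and wait until all of them have finished.
start_workers() {
  phase="$1"
  n="${2:-2}"   # assumed default number of workers
  i=1
  while [ "$i" -le "$n" ]; do
    # Each worker writes its output to its own log file (hypothetical names).
    PYTHONPATH=./src python3 executables/evaluate_active_learning_heuristic.py \
      --phase="$phase" --tracking_uri="$TRACKING_URI" \
      > "worker_${phase}_${i}.log" 2>&1 &
    i=$((i + 1))
  done
  wait
}
```

For example, start_workers hpo 4 would launch four hyperparameter search workers that all report to the same tracking server.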

Random Baseline

To run the random baseline, use

PYTHONPATH=./src python3 executables/evaluate_active_learning_heuristic.py --phase=random --tracking_uri=${TRACKING_URI}

Hyperparameter Search

To run the hyperparameter search, use

PYTHONPATH=./src python3 executables/evaluate_active_learning_heuristic.py --phase=hpo --tracking_uri=${TRACKING_URI}

Note: The hyperparameter search takes a significant amount of time (multiple days) and requires access to GPU(s). You can abort the script at any time and inspect the current results via the MLflow web interface.

Best Configurations

To rerun the best configurations we found in our hyperparameter search, use

PYTHONPATH=./src python3 executables/evaluate_active_learning_heuristic.py --phase=best --tracking_uri=${TRACKING_URI}

Evaluation

To reproduce the tables and numbers of the paper, use

PYTHONPATH=./src python3 executables/collate_results.py --tracking_uri=${TRACKING_URI}

To avoid re-downloading data from a remote MLflow instance, the metrics and parameters are buffered. To force a re-download, e.g., because you conducted additional runs, use --force.
