Polyglot

Polyglot is an extensible, configurable experiment-automation framework for researching the performance of Recurrent Neural Network (RNN) language models.

Built for our Bachelor of Science in Software Engineering final project: improving the performance of RNN-based LSTM language models using concurrent machine-learning techniques with TensorFlow for Python.

Tasks

In addition to the basic task of predicting the next word of a sentence, these are the additional tasks we used to push the language model to improve its performance:

  1. Part of Speech (POS) - Predicting what part of speech the next word should be (verb, noun, adjective, etc.)
  2. Generated Classifier - Given the original dataset and a dataset generated by the language model itself, the classifier should detect which sentences are generated.
  3. Same as task 2, but using an unlearning softmax function as the loss function.
  4. Write your own.
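Task 2 hinges on building a labeled dataset that mixes original sentences with model-generated ones. A minimal sketch of that mixing step, using hypothetical toy sentence lists (in the real pipeline these would come from the corpus and from sampling the trained language model):

```python
import random

def build_classifier_dataset(real_sentences, generated_sentences, seed=42):
    """Label real sentences 0 and generated sentences 1, then shuffle.

    The classifier's job is then to recover these labels, i.e. to
    detect which sentences were produced by the language model itself.
    """
    examples = [(s, 0) for s in real_sentences] + \
               [(s, 1) for s in generated_sentences]
    random.Random(seed).shuffle(examples)
    return examples

# Hypothetical toy data for illustration only.
real = ["the cat sat on the mat", "she read a book"]
fake = ["mat the on sat cat a", "book she a read the"]
dataset = build_classifier_dataset(real, fake)
```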

Techniques

  1. Multitask Learning - training the same language model with different tasks concurrently.
  2. Transfer Learning - training the same language model with different tasks sequentially.
  3. Write your own.
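The difference between the two techniques is mainly the training schedule over a shared model. A schematic sketch in plain Python (the task names are hypothetical, and the "steps" stand in for the TensorFlow training ops the framework actually runs):

```python
def multitask_schedule(tasks, steps):
    """Multitask learning: interleave all tasks within every training
    step, so the shared model receives gradients from each task
    concurrently."""
    schedule = []
    for step in range(steps):
        for task in tasks:
            schedule.append((step, task))
    return schedule

def transfer_schedule(tasks, steps_per_task):
    """Transfer learning: train on each task to completion before
    moving to the next, reusing the weights learned so far."""
    schedule = []
    for task in tasks:
        for step in range(steps_per_task):
            schedule.append((step, task))
    return schedule

tasks = ["next_word", "pos_tagging"]   # hypothetical task names
multitask = multitask_schedule(tasks, 2)  # alternates tasks every step
transfer = transfer_schedule(tasks, 2)    # finishes one task, then the next
```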

Installation

Using Docker

  1. Install Docker https://docs.docker.com/install/

Manual

  1. Install TensorFlow version: 1.12 - https://www.tensorflow.org/install
  2. Install Python dependencies:
pip install -r requirements.txt

Open a Python console and run:

import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

How to Run

Using Docker

  1. Build the image (notice the dot at the end):
docker build -t lstm_fast .
  2. Run the image:
docker run lstm_fast

Manual

Make sure your current working directory is the project's root folder, then run:

python main.py

Limitations

We assume the following:

  1. Your vocab file fits in memory (the train, test, and validation datasets can be of unlimited size).
  2. The number of batches for each of {train, test, validation} must be specified in hyperparameters.json. This is because tf.data.Dataset loads your data in mini-batches without taking into account how much data there is.
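Since the batch counts must be filled in by hand, they can be computed once per split before editing hyperparameters.json. A small helper, assuming the common drop-remainder convention (adjust if your pipeline keeps the final partial batch):

```python
def num_batches(num_examples, batch_size, drop_remainder=True):
    """Number of mini-batches a tf.data.Dataset-style pipeline yields
    for one split of the given size."""
    if drop_remainder:
        return num_examples // batch_size       # partial batch dropped
    return -(-num_examples // batch_size)       # ceiling division

# e.g. a 10,000-sentence training split with batch size 64
full_only = num_batches(10_000, 64)         # 156 full batches
with_partial = num_batches(10_000, 64, False)  # 157 incl. partial batch
```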

Configuring the JSON Schema

Using PyCharm

  1. Open experiment_config.json in PyCharm

  2. Click on the "No JSON Schema" button

  3. Choose "New Schema Mapping"

  4. Give the mapping any name you wish

  5. Choose the schema file - schema.json

  6. Choose Schema version - "JSON Schema Version 7"

  7. Your schema mapping is now configured

Using The Web

  1. Go to https://www.jsonschemavalidator.net/

  2. Copy the contents of schema.json into the left side of the web page

  3. Copy the contents of experiments_config.json into the right side of the web page

  4. If your JSON is valid, you should see a green success message

  5. If your JSON is invalid, you should see a red message indicating the error
