Overcoming Theoretical Limitations of Soft Attention

Authors: Giacomo Camposampiero, Bernhard Hilmarsson, Franz Nowak, and Clemente Pasti

Date: 31st July 2022

Introduction

In response to the limitations established by Hahn (2020), Chiang and Cholak (2022) showed that there exist transformers that can recognize PARITY with perfect accuracy.

In this work, we verified their results experimentally, finding that the exact solutions do indeed work as described, but that PARITY could not be learned. We then extended their results by deriving custom transformer weights that, at least in theory, can recognize instances of the regular language ONE and the context-free language PALINDROME for arbitrary input sizes n.

In practice, however, the solution for PALINDROME does not generalise to longer sequences due to floating point precision errors.
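For reference, membership in PARITY and PALINDROME can be checked with simple Python functions. This is a minimal sketch following the standard definitions used by Hahn (2020) and Chiang and Cholak (2022) (PARITY: bit strings with an odd number of 1s; PALINDROME: strings equal to their reverse); see the paper for the definition of ONE.

def is_parity(w: str) -> bool:
    # PARITY: bit strings over {0, 1} containing an odd number of 1s.
    return w.count("1") % 2 == 1

def is_palindrome(w: str) -> bool:
    # PALINDROME: strings that read the same forwards and backwards.
    return w == w[::-1]

assert not is_parity("0110")
assert is_palindrome("01110")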

This work was developed as a course project for the course Advanced Formal Language Theory (Spring 2022) at ETH Zürich.

This README provides an overview of the code developed for our project on limitations of soft attention.

Getting started with the code

To install all the required dependencies, use

pip install -r requirements.txt

Replicating the experiments

To replicate our results for the learned implementations, use the following commands. Replicating the learned experiments might take a few days, depending on the machine on which the code is run.

chmod 700 experiments/learned_experiments.sh
./experiments/learned_experiments.sh

To replicate our results for the exact implementations, use

chmod 700 experiments/exact_experiments.sh
./experiments/exact_experiments.sh

To replicate the graphs included in our paper, use the plot.py script provided inside the directory of each experiment category. For example, to replicate the graphs for the FIRST learning experiment, use

python experiments/learn/plot.py

Experiment and result logging

Results of experiments are stored in the data directory.

  • data/models.csv contains information on the run and the model.
  • data/results.csv contains training information (validation and training loss/accuracy) for each epoch of every run.

Every training run (or testing run, for the exact implementations) is logged with a unique run identification number. The results and models tables can be joined on the attribute runid, as in the sketch below.
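A minimal sketch of such a join, assuming pandas is installed; apart from runid, the column names are whatever the CSV files actually contain.

import pandas as pd

models = pd.read_csv("data/models.csv")    # one row per run/model
results = pd.read_csv("data/results.csv")  # one row per (run, epoch)

# Annotate each epoch's metrics with the metadata of the run that produced it.
runs = results.merge(models, on="runid", how="left")
print(runs.head())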

Example of floating point precision error

Running experiments/floating_point_error.py shows an example of the floating point precision errors that affect the results of PALINDROME (with perfect precision, the result would be 0).
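The sketch below is not that script, just an illustration of the kind of rounding error involved: in float32, the uniform attention weights produced by a softmax over n equal scores do not sum to exactly 1, so a quantity that is analytically 0 comes out nonzero.

import numpy as np

for n in (10, 1_000, 100_000):
    scores = np.zeros(n, dtype=np.float32)
    weights = np.exp(scores) / np.exp(scores).sum()  # uniform softmax in float32
    # Analytically the weights sum to 1; numerically there is a residual.
    print(n, float(weights.astype(np.float64).sum()) - 1.0)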

Feature list

  • FIRST Learned
  • FIRST Exact
  • PARITY Learned
  • PARITY Exact
  • ONE Learned
  • ONE Exact
  • PALINDROME Learned
  • PALINDROME Exact
