This directory provides the scripts and recipe to run BERT and GroupBERT models for NLP pre-training and fine-tuning tasks (SQuAD and GLUE) on Graphcore IPUs.
This README is structured to show the datasets required to train the model, how to quickly start training BERT and GroupBERT on Graphcore IPUs, details about how to profile the application, and how to use popdist
to train at scale on Graphcore IPU-PODs.
- Benchmarking
- Datasets
- File structure
- Quick start guide
- Pre-training of BERT Large on Wikipedia
- Fine-tuining of BERT Large on SQuAD
- Fine-tuning of BERT Large on GLUE
- Information about the application
- Profiling your applications
- Multi-Host training using PopDist
- Pre-training and fine-tuning of GroupBERT Base
September 2021:
- Added inference program using the IPU embedded application runtime.
- Update LAMB to be disabled on bias parameters.
October 2021:
- Use off chip replicated optimizer state sharding feature in pretraining.
To reproduce the published Mk2 throughput benchmarks, please follow the setup instructions in this README, and then follow the instructions in README_Benchmarks.md
The Wikipedia dataset contains approximately 2.5 billion wordpiece tokens. This is only an approximate size since the Wikipedia dump file is updated all the time.
If full pre-training is required (with the two phases with different sequence lengths) then data will need to be generated separately for the two phases:
- once with --sequence-length 128 --mask-tokens 20 --duplication-factor 5
- once with --sequence-length 384 --mask-tokens 56 --duplication-factor 5
See the bert_data/README.md
file for more details on how to generate this data.
File | Description |
---|---|
run_pretraining.py |
Main training loop of pre-training task |
ipu_utils.py |
IPU specific utilities |
ipu_optimizer.py |
IPU optimizer |
log.py |
Module containing functions for logging results |
bert_data/ |
Code for using different datasets. - data_loader.py : Dataloader and preprocessing.- create_pretraining_data.py : Script to generate tfrecord files to be loaded from text data.- pretraining.py : Utility for loading the pre-training data.- squad.py : Utility for loading the SQuAD fine-tuning data.- glue.py : Utility for loading the GLUE fine-tuning data. - squad_results.py : Utility for processing SQuAD results. - tokenization.py : Utility for processing tokenization of the data. - wiki_processing.py : Process the Wikipedia data. |
modeling.py |
A Pipeline Model description for the pre-training task on the IPU. |
lr_schedules/ |
Different LR schedules - polynomial_decay.py : A linearily decreasing warmup with option for warmup.- natural_exponential.py : A natural exponential learning rate schedule with optional warmup.- custom.py : A customized learning rate schedule with given lr_schedule_by_step . |
run_squad.py |
Main training and inference loop for SQuAD (1.1 and 2.0) fine-tuning. |
run_classifier.py |
Main training and inference loop for GLUE fine-tuning. |
loss_scaling_schedule.py |
Sets a loss scaling schedule based on config arguments. |
scripts/ |
Directory containing a number of utility scripts: - create_wikipedia_dataset.sh : Generate the wikipedia tf_record datafiles for pre-training- fine_tune_squad.sh : Fine tune BERT Base or Large from the latest Phase 2 checkpoint on SQuAD 1.1 and 2.0.- fine_tune_glue.sh : Fine tune BERT Base or Large from the latest Phase 2 checkpoint on GLUE tasks.- fine_tune_GroupBERT_glue.sh : Fine tune GroupBERT Base or Large from the latest Phase 2 checkpoint on GLUE tasks.- pretrain_distributed.sh : Run pre-training of BERT Large on the Graphcore IPU-POD64.- pretrain.sh : The main pre-training script. Pre-train BERT (large or base) on Graphcore IPUs. This script will run Phase 1, and use the results to train Phase 2 on the Wikipedia dataset.- pretrain_distributed.sh : Script to train BERT (base / large) end-to-end. |
1) Download the Poplar SDK
Download and install the Poplar SDK following the Getting Started guide for your IPU system. Source the enable.sh
script for poplar.
2) Configure Python virtual environment
Create a virtual environment and install the appropriate Graphcore TensorFlow 1.15 wheel from inside the SDK directory:
virtualenv --python python3.6 .bert_venv
source .bert_venv/bin/activate
pip install -r requirements.txt
pip install <path to the tensorflow-1 wheel from the Poplar SDK>
As an example we will create data from a small sample: bert_data/sample.txt
, however the steps are the same for a large corpus of text. As described above, see bert_data/README.md
for instructions on how to generate pre-training data for the Wikipedia dataset.
1) Download the vocab file
You can download a vocab from the pre-trained model checkpoints at https://github.com/google-research/bert. For this example we are using Bert-Base, uncased
.
2) Create the data
Create a directory to keep the data.
mkdir data
Download and unzip the files
cd data
wget <path_to_bert_uncased>
unzip bert_base_uncased.zip
#check that vocab.txt is actually there!
cd ../
bert_data/create_pretraining_data.py
has a few options that can be viewed by running with -h/--help
.
Data for the sample text is created by running:
python3 bert_data/create_pretraining_data.py \
--input-file Datasets/sample.txt \
--output-file Datasets/sample.tfrecord \
--vocab-file data/vocab.txt \
--do-lower-case \
--sequence-length 128 \
--mask-tokens 20 \
--duplication-factor 5
--input-file/--output-file
can take multiple arguments if you want to split your dataset between files.
When creating data for your own dataset, make sure the text has been preprocessed as specified at https://github.com/google-research/bert. This means with one sentence per line and documents delimited by empty lines.
The option --remask
can be used to move the masked elements at the beginning of the sequence. This will improve the inference and training performance.
Now that the data are ready we can start training our BERT tiny on the IPU! Run this config:
python3 run_pretraining.py --config configs/pretrain_tiny_128_lamb.json --train-file ./bert_data/sample.tfrecord
The configs/pretrain_tiny_128_lamb.json
file is a small model that can be used for simple experiments.
This config file has a first part that specifies the model, BERT tiny, the second part is more about the optimisation where we specify the learning rate, the learning rate scheduler, the batch size and the optimiser (in this case we are using LAMB but other options can be used like momentum, ADAM, and SGD).
This project supports Weights & Biases, a platform to keep track of machine learning experiments. A client for Weights&Biases will be installed by default and can be used during training by passing the --wandb
flag.
The user will need to manually log in (see the quickstart guide here) and configure the project name with --wandb-name
.)
For more information please see https://www.wandb.com/.
Once logged into wandb logging can be activated by running:
python3 run_pretraining.py --config configs/pretrain_tiny_128_lamb.json --train-file ./Dataset/sample.tfrecord --wandb
You can also name you wandb run with the flag --wandb-name <YOUR RUN NAME HERE>
.
The steps to follow to run the BERT Large model are exactly the same as before.
As first you need to create the Wikipedia pre-training data using the script in the bert_data
directory, see the README there for the details.
After this you can run BERT Large on 16 IPUs at global batch size 65k using LAMB with:
python3 run_pretraining.py --config configs/pretrain_large_128_phase1.json
Remember to adapt the config file inserting the path top your 128 dataset or add the flag --train-file
as we did before.
At the end of the training, the script will save a checkpoint of the final model, this checkpoint will be used as the starting point of Phase 2 training. The expected losses for Phase 1 will be approximately (rounded to three decimal places):
MLM | NSP |
---|---|
1.377 | 0.023 |
The typical loss curves for Phase 1 pre-training will look like:
Phase 2 pre-training uses a global batch size of 16k, and can be run with the following command:
python3 run_pretraining.py --config configs/pretrain_large_384_phase2.json --init-checkpoint /path/to/final/ckpt/of/phase1
Remember to insert in the config files the path to the sequence length 384 dataset and be sure that the number of masked tokens in the json matches the number used in the creation of the dataset.
In phase 2 pretraining the final phase 1 checkpoint is expected to be passed in using --init-checkpoint
option. At the end of Phase 2 pre-training, the losses are expected to be approximately (rounded to three decimal places):
MLM | NSP |
---|---|
1.264 | 0.019 |
The typical loss curves for Phase 2 pre-training will look like:
Note that the configuration flag --static-mask
must be set if the datasets was generated with the remasking option. A dataset that does not require the --static-mask
flag may end up using more memory than a dataset that does, like the ones presented in the config files folder. Reducing the --available-memory-proportion
parameter may be required in this case.
For simplicity in the scripts/
directory there is a script that manages BERT pre-training from a single command. This script can be used to pre-train BERT on a Graphcore IPU system with 16 IPUs.
To run with the default configuration for BERT Large
as given in configs/pretrain_large_128_phase1.json
and configs/pretrain_large_128_phase2.json
simply run:
./scripts/pretrain.sh large
This will launch a BERT-Large pre-training over 16 IPUs, consisting of 4x replication of a 4 IPU pipelined model.
You will need to check the paths (PHASE1_CONFIG_PATH
,
PHASE1_TRAIN_FILE
,
PHASE2_TRAIN_FILE
,
PHASE1_CHECKPOINT
) given in the pretrain.sh
script match your local paths where you have saved the data / the config you wish to run.
To run pre-training for BERT-Base simply run:
./scripts/pretrain.sh base
Provided are the scripts to fine tune BERT on the Stanford Question Answering Dataset (SQuAD), a popular question answering benchmark dataset. There are two versions of SQuAD, SQuAD 1.1 and SQuAD 2.0. Compared to SQuAD 1.1, SQuAD2.0 combines the 100,000 questions in SQuAD1.1 with over 50,000 unanswerable questions to look similar to answerable ones.
To run on SQuAD, you will first need to download the dataset. The necessary training and evaluation files for both SQuAD 1.1 and 2.0 can be found from the following links:
Place these files in a directory squad/
(for SQuAD 1.1) or squad2/
(for SQuAD 2.0) parallel to the Wikipedia data, the expected path can be found in the configs/squad_large.json
file.
To run BERT fine-tuning on SQuAD requires the same set up as for pre-training, follow the steps in Prepare Environment to activate the SDK and install the required packages.
The fine-tuning for a Large model on SQuAD 1.1 can be run with the following command:
python3 run_squad.py --config configs/squad_large.json --do-training --init-checkpoint /path/to/phase2/large/checkpoint.ckpt
for SQuAD 2.0:
python3 run_squad.py --config configs/squad_large_V2.json --do-training --init-checkpoint /path/to/phase2/large/checkpoint.ckpt
where the checkpoint given must be a pretrained Phase 2 Large model. This will output a fine-tuned checkpoint that can be used for prediction in the following step.
(The same command can be run with the configs/squad_base.json
configuration if you provide a Phase 2 BERT Base model as an initial checkpoint.)
The prediction can then be run with the following command (SQuAD 1.1):
python3 run_squad.py --config configs/squad_large.json --do-predict --init-checkpoint /path/to/squad/large/checkpoint.ckpt
for SQuAD 2.0:
python3 run_squad.py --config configs/squad_large_V2.json --do-predict --init-checkpoint /path/to/squad/large/checkpoint.ckpt
This will output a set of predictions to the location specified in the SQUAD config file. These predictions can be evaluated as (SQuAD 1.1):
python3 /path/to/evaluate-v1.1.py /path/to/dev-v1.1.json /path/to/predictions.json
for SQuAD 2.0:
python3 /path/to/evaluate-v2.0.py /path/to/dev-v2.0.json /path/to/predictions.json
This will output the Exact Match (EM) and F1 scores of the final fine-tuned BERT model.
For simplicity the run_squad.py
script can be run straight through.
This means by running the following (for SQuAD 1.1):
./run_squad.py --config configs/squad_large.json --do-training --do-predict --do-evaluation --init-checkpoint /path/to/phase2/model.ckpt
and for SQuAD 2.0:
./run_squad.py --config configs/squad_large_V2.json --do-training --do-predict --do-evaluation --init-checkpoint /path/to/phase2/model.ckpt
fine-tuning, prediction, and evaluation can be run with a single command.
As with pre-training, run_squad.py
takes an option to log results to Weights and Biases, this functionality can be turned on by adding the command line options --wandb --wandb-name <DESIRED NAME>
.
For simplicity in the scripts/
directory there is a script that manages BERT fine-tuning from a single command. This script can be used to fine-tune BERT on a Graphcore IPU system with 4 IPUs.
To run with the default configuration for BERT Large
on SQuAD 1.1 as given in configs/squad_large.json
simply run:
./scripts/fine_tune_squad.sh large v1
This will launch a BERT Large fine-tuning over 4 IPUs, on completion the predictions will be made and the official evaluation performed on the SQuAD results. The final EM (exact match) and F1 scores will be displayed. The mean and variance of EM and F1 scores with 5 different seeds are presented here:
Accuracy metric | Seed 1 | Seed 2 | Seed 3 | Seed 4 | Seed 5 | Mean | Standard deviation |
---|---|---|---|---|---|---|---|
Exact match % | 84.12 | 84.30 | 83.94 | 84.03 | 84.39 | 84.15 | 0.19 |
F1 % | 90.84 | 91.06 | 90.70 | 90.72 | 91.16 | 90.90 | 0.21 |
To run the same fine-tunning, prediction, and evaluation for BERT Base
simply run:
./scripts/fine_tune_squad.sh base v1
For SQuAD 2.0, simply run:
./scripts/fine_tune_squad.sh large v2
for BERT large and
./scripts/fine_tune_squad.sh base v2
for BERT base.
Finally a script is provided to train BERT (base or large) end-to-end. This script performs pre-training Phase 1, pre-training Phase2, fine-tuning on SQuAD 1.1, predictions on SQuAD 1.1, and evaluates the results to obtain EM and F1 scores. This script run is the same manner as the previous scripts, ensure the data and environment is set up correctly, then run:
./scripts/pretrain_and_finetune_BERT.sh large
The General Language Understanding Evaluation (GLUE) benchmark is a collection of resources for training, evaluating, and analyzing natural language understanding systems. There are in total 11 different tasks, in which 10 of them (CoLA, MRPC, MNLI, MNLI-mm, AX, QNLI, QQP, RTE, SST-2, WNLI) are classification tasks and 1 (STS-B) of them is a regression task.
You will first need to download the dataset. The necessary training, development and test data can be downloaded by running:
python download_glue_data.py --data_dir glue_data --tasks all
To run BERT fine-tuning on GLUE requires the same set up as for pre-training, follow the steps in Prepare Environment to activate the SDK and install the required packages.
The fine-tuning for a Large model on a certain GLUE classification task can be run with the following command:
python3 run_classifier.py --config configs/glue_large.json --task-name your_glue_task_name --data-dir glue_data/your_glue_datadir_name --do-training --init-checkpoint /path/to/phase2/large/checkpoint.ckpt
where the checkpoint given must be a pretrained Phase 2 Large model. The task name must be one of the 10 glue classification task names in lower case: cola, mrpc, mnli, mnli-mm, ax, qnli, qqp, rte, sst2 or wnli. The corresponding data directory names are CoLA, MRPC, MNLI, MNLI, AX, QNLI, QQP, RTE, SST-2 or WNLI.
For the regression task, run:
python3 run_classifier.py --config configs/glue_large.json --task-name stsb --data-dir glue_data/STS-B --do-training --init-checkpoint /path/to/phase2/large/checkpoint.ckpt
This will output a fine-tuned checkpoint that can be used for evaluation in the following step.
(The same command can be run with the configs/glue_base.json
configuration if you provide a Phase 2 BERT Base model as an initial checkpoint.)
The evaluation on the development set accuracy can then be run with the following command:
python3 run_classifier.py --config configs/glue_large.json --task-name your_glue_task_name --data-dir glue_data/your_glue_datadir_name --do-eval --init-checkpoint /path/to/glue/finetuned/checkpoint.ckpt
for classification tasks and
python3 run_classifier.py --config configs/glue_large.json --task-name stsb --data-dir glue_data/STS-B --do-eval --init-checkpoint /path/to/glue/finetuned/large/checkpoint.ckpt
for the regression task.
Finally, the prediction on the test set:
python3 run_classifier.py --config configs/glue_large.json --task-name your_glue_task_name --data-dir glue_data/your_glue_datadir_name --do-predict --init-checkpoint /path/to/glue/finetuned/checkpoint.ckpt
for classification tasks and
python3 run_classifier.py --config configs/glue_large.json --task-name stsb --data-dir glue_data/STS-B --do-predict --init-checkpoint /path/to/glue/finetuned/checkpoint.ckpt
for the regression task. This will output the predicted lables which can be used for GLUE leaderboard submission. Note that for the tasks MNLI-mm and AX, fine-tuning is done on MNLI train data. Evaluation on development set and prediction on test set are done on MNLI-mm datasets. For AX, there is only prediction for test set.
For simplicity the run_classifier.py
script can be run straight through.
This means by running the following:
python3 run_classifier.py --config configs/glue_large.json --task-name your_glue_task_name --data-dir glue_data/your_glue_datadir_name --do-training --do-eval --do-predict --init-checkpoint /path/to/phase2/large/checkpoint.ckpt
and
python3 run_classifier.py --config configs/glue_large.json --task-name stsb --data-dir glue_data/STS-B --do-training --do-eval --do-predict --init-checkpoint /path/to/phase2/large/checkpoint.ckpt
fine-tuning, evaluation, and prediction can be run with a single command.
As with pre-training and fine-tuning on SQuAD, run_classifier.py
takes an option to log results to Weights and Biases, this functionality can be turned on by adding the command line options --wandb --wandb-name <DESIRED NAME>
.
For simplicity in the scripts/
directory there is a script that manages BERT fine-tuning of GLUE from a single command. This script can be used to fine-tune BERT on a Graphcore IPU system with 4 IPUs.
To run with the default configuration for BERT Large
on GLUE as given in configs/glue_large.json
simply run:
./scripts/fine_tune_glue.sh large your_glue_task_name
This will launch a BERT Large fine-tuning over 4 IPUs, on completion the evluations will be made and predictions will be written out.
To run the same fine-tunning, prediction, and evaluation for BERT Base
simply run:
./scripts/fine_tune_glue.sh base your_glue_task_name
The config files provided demonstrate just a sample of what you can do with this application.
Aspects that can be changed include the optimiser, learning rate schedule and model size/shape.
Use the -h/--help
option to see the different options that can be used.
The command line options will override the settings contained within a config file.
For example,
python3 run_pretraining --config configs/pretrain_tiny_128_lamb.json --sequence-length 384
will run a job with seq len 384, overriding the 128 present in the config.
The PopVision Graph Analyser and System Analyser are the two main tools to inspect the behaviour of your application on the IPU, they can be downloaded from Download. Here, we are going focus on the Graph Analyser and how to use it to profile the execution and memory utilisation of BERT in the IPU. Clearly the same procedure can be applied to any other application.
The first thing we are going to see is how we can inspect the hardware utilisation. This will give us insights on how much space left is there on the device, this is important information that we can use to increase the micro batch size or use a more complex optimiser.
Profiling data can be generated by the compiler by setting the following options:
POPLAR_ENGINE_OPTIONS='{"autoReport.outputExecutionProfile":"false", "debug.allowOutOfMemory": "true", "debug.outputAllSymbols": "true", "autoReport.all": "true", "autoReport.directory":"./memory_report"}'
the field "autoReport.directory":"./memory_report"
in this example is pointing to the directory where the memory profile will be found.
When profiling an application it can be useful to inspect the log output from the different layers of the software stack in combination with the information displayed in the PopVision Graph Analyser. Due to the verbosity of the log level, it is good practice to redirect the output into a file, here is an example:
POPLAR_LOG_LEVEL=INFO POPLIBS_LOG_LEVEL=INFO TF_CPP_VMODULE='poplar_compiler=1' POPLAR_ENGINE_OPTIONS='{"autoReport.outputExecutionProfile":"false", "debug.allowOutOfMemory": "true", "debug.outputAllSymbols": "true", "autoReport.all": "true", "autoReport.directory":"./memory_report"}' python3 run_pretraining.py --config configs/pretrain_tiny_128_lamb.json --compile-only --generated-data > output.log 2>&1 &
In the previous command we set the log level for POPLAR, POPLIBS to INFO and we also set TensorFlow log to be 'poplar_compiler=1'.
In this example we are taking advantage of two specific flags of run_pretraining.py
that can improve the workflow: --generated-data
and --compile-only
.
The first uses random data generated on the host instead of real data.
The second is more interesting and enables the compilation of the application, getting the memory profile, without attaching to any IPUs.
This makes possible for example to run a lot of experiments with different hyperparameters in order to understand which one is going to use the hardware better.
The IPUs can then be used on jobs that require physical hardware such as convergence experiments or to obtain execution profiles, which is the theme of the next section.
The method presented in the previous section will allow you inspect the memory profile, here we show how to use the PopVision Graph Analyser to inspect the execution of your application. As before, this information can be obtained with specific POPLAR_ENGINE_OPTIONS
:
POPLAR_ENGINE_OPTIONS='{"autoReport.outputExecutionProfile":"true", "debug.allowOutOfMemory": "true", "debug.outputAllSymbols": "true", "autoReport.all": "true", "autoReport.directory":"./execution_report"}'
This option is very similar to the previous example to obtain a memory profile, the only difference is "autoReport.outputExecutionProfile":"true"
.
In this case, it is necessary to attach to the IPU since the code has to be run, so no --compile-only
flag can be used.
In order to have a easily readable execution trace, it is good practice to modify the execution of your application. We would like for example to execute just a single step, and we would like that batches-per-step is set to 1. For models being executed in a pipeline configuration, we are going also to set the pipeline to the minimal value possible. The previous suggestions are due to the fact that every step is the same so it is much easier to inspect just one of them, and the second is that an extremely deep pipeline will be difficult to navigate and inspect, and it is anyway the repetition of stages that are the same. We can then set the pipeline to a minimal value. Let's make an example with a large application, like the phase1 training with LAMB:
POPLAR_ENGINE_OPTIONS='{"autoReport.outputExecutionProfile":"true", "debug.allowOutOfMemory": "true", "debug.outputAllSymbols": "true", "autoReport.all": "true", "autoReport.directory":"./execution_report"}' python run_pretraining.py --config configs/pretrain_large_128_phase1.json --steps 1 --batches-per-step 1 --gradient-accumulation-count 10
In this way the execution profile will be dropped together with the memory profile and we can inspect how many cycles the hardware is spending on each operation and this can give us insight on possible optimisations and improvements.
PopDist is the tool shipped with the SDK that allows to run multiple SDK instances at the same time on the same or different hosts. The config files ending with '_POD64' and '_POD128' have been specifically designed to be trained using PopDist.
Before launching this we need to set up the V-IPU cluster and eventually a V-IPU partition, the procedure for which can be found in the V-IPU user-guide.
We need then to set up the dataset and the SDKs, it is important that these components are found on each host in the same global path. The same is valid for the virtual environment and for the run_pretraining.py
script.
Further details on how to set up poprun
, and the arguemnts used, can be found in the docs.
The relevant set up is provided in the script given in scripts/pretrain_distributed.sh
. This will be detailed in the following section.
We provide a utility script to run Phase 1 and Phase 2 pre-training on an IPU-POD64 machine. This script manages the config and checkpoints required for both phases of pre-training. This can be executed as:
./scripts/pretrain_distributed.sh <CONFIG> <VIPU_HOST> <HOSTS>
CONFIG: One of 'POD64' or 'POD128'. VIPU_HOST: IP address of VIPU host. HOSTS: Space separated list of IP addresses where the instances will be run. For POD64, this must be a single host. For POD128, this must be a host in each of the combined POD64s.
Inside this script the ssh keys will be copied across the hosts, as will the files in the BERT directory, as well as SDKs. Ensure the your directory structure aligns with that used in the script. The default MPI options are set in the script, however you will need to check that the environment variables below are set correctly for your machine:
PHASE1_CONFIG_PATH
: The path to the Phase 1 config file - this should not need changing
PHASE1_TRAIN_FILE
: The path to the Phase 1 (sequence length 128) wikipedia data
VIPU_PARTITION_NAME
: The name of the partition of the POD
PHASE2_CONFIG_PATH
: The path to the Phase 2 config file - this should not need changing
PHASE2_TRAIN_FILE
: The path to the Phase 2 (sequence length 384) wikipedia data
In distributed training there are different types of batch size. For clarity we define the following terms for this codebase:
micro_batch_size
- the number of samples calculated in one full forward/backward pass of the algorithm
replica_bath_size
- the number of samples that contribute to a weight update from a single replica
global_batch_size
- the number of samples that contribute to a weight update across all replicas
replica_bath_size = gradient_accumulation_iterations * micro_batch_size
global_batch_size = num_replicas * gradient_accumulation_iterations * micro_batch_size
GroupBERT is a BERT based model with a convolutional module to complement the self-attention module, decoupling the learning of local and global interactions and furthermore. It relies on grouped transformations to reduce the computational cost of dense feed-forward layers and convolutions, while preserving the expressivity of the model.
More information about GroupBERT can be found via this link: https://arxiv.org/abs/2106.05822
The GroupBERT pretraining and fine-tuning can be run by using the config files under configs/groupbert The steps to follow are exactly the same as before.
For simplicity in the scripts/
directory there is a script that manages GroupBERT fine-tuning on GLUE from a single command. This script can be used to fine-tune GroupBERT on a Graphcore IPU system with 4 IPUs.
To run with the default configuration for GroupBERT Base
on GLUE as given in configs/groupbert/glue_base.json
simply run:
./scripts/fine_tune_GroupBERT_glue.sh base your_glue_task_name
This will launch a GroupBERT Base fine-tuning over 4 IPUs, on completion the predictions will be made and the accuracy on development set will be shown.
To run the same fine-tunning, prediction, and evaluation for GroupBERT Large
simply run:
./scripts/fine_tune_GroupBERT_glue.sh large your_glue_task_name