A New Approach to Building a Skills Taxonomy

The full technical report and blog article for this project can be found here and here.

Introduction

There is no official, fully open skills taxonomy in the UK. Such a taxonomy is needed to enable a consistent conceptualisation of workforce skills, together with consistent terminology around skills for educators, careers advisers, policy makers and employers. The lack of a consistent language has multiple consequences, such as confusion over the skills required for particular roles or the training needs of employees. At the same time, the effects of COVID-19 and Brexit have triggered rapid changes in skill demands as well as new skill shortages. This shifting landscape has only increased the need for an open and up-to-date skills taxonomy for the UK, which could provide better-quality and more timely information to inform policy.

Therefore, in partnership with the Economic Statistics Centre of Excellence (ESCoE), we are releasing an updated skills taxonomy that is more open, more up-to-date and methodologically refined.

This repo contains the source code for this project.

[Figure: overview of the methodology, coloured by the three main steps of the pipeline.]

The taxonomy file

The taxonomy file is given here. To view this JSON file in a readable format, download it and open it in Firefox. Alternatively, use an online tool such as JSON formatter.
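If you would rather inspect the downloaded file from Python, the following is a minimal sketch. It makes no assumptions about the taxonomy's actual schema: it only walks whatever nested dictionaries and lists the JSON contains and reports keys and value types down to a chosen depth. The function names `load_taxonomy` and `summarise_json` are illustrative and are not part of this repository.

```python
import json
from pathlib import Path


def load_taxonomy(path):
    """Parse a JSON file from disk and return the resulting Python object."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def summarise_json(node, max_depth=2, depth=0, indent="  "):
    """Return a list of lines describing nested JSON structure down to max_depth.

    Dictionaries are listed key by key with the type of each value.
    For a non-empty list, only the first item is described.
    """
    lines = []
    if isinstance(node, dict):
        for key, value in node.items():
            lines.append(f"{indent * depth}{key}: {type(value).__name__}")
            if depth + 1 < max_depth:
                lines.extend(summarise_json(value, max_depth, depth + 1, indent))
    elif isinstance(node, list) and node:
        lines.append(f"{indent * depth}[list of {len(node)} items, first shown]")
        if depth + 1 < max_depth:
            lines.extend(summarise_json(node[0], max_depth, depth + 1, indent))
    return lines
```

For example, `print("\n".join(summarise_json(load_taxonomy("taxonomy.json"), max_depth=3)))` prints an outline of the top three levels, where `taxonomy.json` stands for wherever you saved the downloaded file.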

Pipeline steps

More details of the steps included in this project, and running instructions, can be found in their respective READMEs:

tk_data_analysis - Getting a sample of the TextKernel job adverts (scripts only; the data is no longer available).
sentence_classifier - Training a classifier to predict skill sentences.
skills_extraction - Extracting skills from skill sentences.
skills_taxonomy - Building the skills taxonomy from extracted skills.

Analysis

This repository also contains various pieces of analysis of the taxonomy. These are discussed in the main analysis README file.

Examples of the hierarchy

Running the code

This repository has been made public in the interest of openness, and in the hope that some of the scripts and functions it contains may be useful to others. However, the TextKernel dataset of job adverts is no longer available (either to Nesta staff or to the general public). Because of this, the pipeline can no longer be run from start to finish.

Conda environment

When you run scripts from this repo for the first time, you need to create the conda environment by running make conda-create. After that, you can activate it each time using conda activate skills-taxonomy-v2. If you update the requirements, run make conda-update.

As a one-off, if needed, you will also have to run:

conda install pytorch torchvision torchaudio -c pytorch
conda install -c conda-forge spacy==3.0.0
python -m spacy download en_core_web_sm
conda install cdlib=0.2.3

and

conda install -c anaconda py-xgboost

or, if you aren't using anaconda:

conda install -c conda-forge py-xgboost
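To confirm that the one-off installs above took effect, a quick check such as the following may help. It is a sketch rather than part of the repository: the `check_packages` helper is hypothetical, and the import names (`torch`, `xgboost`, `en_core_web_sm` and so on) are the module names these packages are normally imported under, assumed here rather than taken from the project's code. It only tests whether each module can be found in the active environment and does not verify versions.

```python
from importlib.util import find_spec

# Import names assumed to correspond to the conda/spaCy installs listed above.
PACKAGES = (
    "torch",
    "torchvision",
    "torchaudio",
    "spacy",
    "en_core_web_sm",
    "cdlib",
    "xgboost",
)


def check_packages(names=PACKAGES):
    """Return a dict mapping each module name to True if it can be located."""
    found = {}
    for name in names:
        try:
            found[name] = find_spec(name) is not None
        except (ImportError, ValueError):
            # Treat any lookup failure as "not available".
            found[name] = False
    return found


# Print a short report for the active environment.
for name, ok in check_packages().items():
    print(f"{name}: {'found' if ok else 'MISSING'}")
```

Run it with the skills-taxonomy-v2 environment activated; any line marked MISSING points to an install command that still needs to be run.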

Contributor guidelines

Technical and working style guidelines

Project based on Nesta's data science project template (Read the docs here).

