Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions

CanItEdit is a benchmark for evaluating LLMs on instructional code editing, the task of updating a program given a natural language instruction. The benchmark contains 105 hand-crafted Python programs with before and after code blocks, two types of natural language instructions (descriptive and lazy), and a hidden test suite.
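To make the task concrete, here is a minimal sketch of what one instructional code editing problem could look like and how a candidate edit might be checked against a hidden test suite. The `EditProblem` class, its field names, the `passes_tests` helper, and the toy `area` program are all hypothetical and chosen for illustration; they are not the benchmark's actual schema or evaluation harness.

```python
from dataclasses import dataclass


@dataclass
class EditProblem:
    """One benchmark-style item. All field names here are hypothetical."""
    before: str       # program prior to the edit
    instruction: str  # natural language edit request
    after: str        # reference solution, not shown to the model
    tests: str        # hidden test suite, written as plain assertions


def passes_tests(candidate: str, tests: str) -> bool:
    """Run the candidate program, then the tests, in one fresh namespace."""
    namespace: dict = {}
    try:
        exec(candidate, namespace)  # define the candidate's functions
        exec(tests, namespace)      # assertions raise on failure
    except Exception:
        return False
    return True


# Toy problem (hypothetical, not taken from the benchmark).
problem = EditProblem(
    before="def area(w, h):\n    return w * h\n",
    instruction="Return 0 when either dimension is negative.",
    after=(
        "def area(w, h):\n"
        "    if w < 0 or h < 0:\n"
        "        return 0\n"
        "    return w * h\n"
    ),
    tests="assert area(2, 3) == 6\nassert area(-1, 3) == 0\n",
)

print(passes_tests(problem.before, problem.tests))  # False: unedited program
print(passes_tests(problem.after, problem.tests))   # True: reference edit
```

In this sketch a model would receive `before` and `instruction` and be scored on whether its output passes `tests`. Calling `exec` directly is acceptable only for trusted toy inputs; model-generated code should run in a sandbox.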

See our paper (https://arxiv.org/abs/2312.12450) for more details.

This repository provides code for evaluating models on the benchmark, as well as code to reproduce EditPackFT and EditCoder, a dataset and an LLM built for instructional code editing.

The CanItEdit benchmark dataset, EditCoder model, and EditPackFT dataset can be found on HuggingFace:

CanItEdit: https://huggingface.co/datasets/nuprl/CanItEdit
EditCoder: https://huggingface.co/nuprl/EditCoder-6.7b-v1
EditPackFT: https://huggingface.co/datasets/nuprl/EditPackFT

Cloning the repository

It is very important to clone this repository and initialize all submodules recursively. This can be done with the following command:

git clone --recurse-submodules https://github.com/nuprl/CanItEdit
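If the repository was cloned without `--recurse-submodules`, a quick check can reveal which submodules are still missing. The sketch below relies on the fact that `git submodule status` prefixes uninitialized submodules with `-`. The function names `uninitialized_submodules` and `check_repo` are hypothetical helpers, not part of this repository.

```python
import subprocess


def uninitialized_submodules(status_output: str) -> list[str]:
    """Return paths of submodules that `git submodule status` marks with '-'."""
    missing = []
    for line in status_output.splitlines():
        if line.startswith("-"):
            # Uninitialized entries look like "-<sha1> <path>".
            parts = line[1:].split()
            if len(parts) >= 2:
                missing.append(parts[1])
    return missing


def check_repo(repo_dir: str = ".") -> list[str]:
    """Run `git submodule status --recursive` in repo_dir and parse its output."""
    result = subprocess.run(
        ["git", "submodule", "status", "--recursive"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return uninitialized_submodules(result.stdout)
```

Calling `check_repo()` from the repository root returns an empty list when every submodule is initialized. If it returns any paths, running `git submodule update --init --recursive` fetches them.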

Structure

./benchmark contains the CanItEdit benchmark dataset and code for generating and evaluating completions
./editcoder contains code to train an EditCoder model
./editpackft contains code to reproduce the EditPackFT dataset
./requirements.txt contains the requirements for running the code in this repository

Citation

If you use this code or the CanItEdit benchmark, please cite our paper:

@inproceedings{cassano2023edit,
      title={Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions}, 
      author={Federico Cassano and Luisa Li and Akul Sethi and Noah Shinn and Abby Brennan-Jones and Anton Lozhkov and Carolyn Jane Anderson and Arjun Guha},
      booktitle={The First International Workshop on Large Language Model for Code},
      year={2024},
      url={https://arxiv.org/abs/2312.12450}
}
