DavidRitzwoller/pubmed_clinical_trials


Census of PubMed/MEDLINE Clinical Trial Publications

Overview

This repository contains the data produced by Durvasula, Eyuboglu, and Ritzwoller (2024). The data provide a census of publications indexed in PubMed/MEDLINE that report the results of prospective, interventional clinical trials evaluating the effects of investigational or approved drugs in settings with exclusively human subjects.

Data Description

The repository includes a single file, sample.csv.

This file has five fields:

  • pmid: The unique identifier assigned to each publication by PubMed/MEDLINE.
  • prob: The predicted probability output by our fine-tuned ensemble language model. Larger values indicate that the publication is more likely to report the results of a clinical trial.
  • liberal, moderate, conservative: Binary variables indicating whether each publication belongs to the associated sample. Each sample consists of the publications whose prob is above a sample-specific threshold. We refer to these samples as the liberal sample, the moderate sample, and the conservative sample. The thresholds associated with these samples are 0.0124, 0.221, and 0.494, respectively.
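The three indicators are a simple rule on prob, so they can be recomputed and cross-checked. The sketch below assumes that "above" means strictly greater than the threshold and that the flag columns are stored as 0/1 or TRUE/FALSE; the README specifies neither. The helper names `sample_memberships`, `parse_flag`, and `check_flags` are hypothetical, and the demo rows are made up rather than taken from sample.csv.

```python
import csv
import io

# Thresholds on `prob` for each sample, as stated in this README.
THRESHOLDS = {"liberal": 0.0124, "moderate": 0.221, "conservative": 0.494}


def sample_memberships(prob: float) -> dict[str, int]:
    """Return 0/1 membership in each sample for a predicted probability.

    Assumption: "above" means strictly greater than the threshold.
    """
    return {name: int(prob > cutoff) for name, cutoff in THRESHOLDS.items()}


def parse_flag(value: str) -> int:
    """Convert a stored flag to 0/1 (assumes 0/1 or TRUE/FALSE encoding)."""
    return int(value.strip().lower() in {"1", "true"})


def check_flags(rows):
    """Yield the pmid of every row whose stored flags disagree with prob."""
    for row in rows:
        expected = sample_memberships(float(row["prob"]))
        stored = {name: parse_flag(row[name]) for name in THRESHOLDS}
        if stored != expected:
            yield row["pmid"]


# Made-up rows in the same five-column layout as sample.csv.
demo = io.StringIO(
    "pmid,prob,liberal,moderate,conservative\n"
    "1,0.90,1,1,1\n"
    "2,0.30,1,1,0\n"
    "3,0.001,0,0,0\n"
)
print(list(check_flags(csv.DictReader(demo))))  # → []
```

An empty list means every row's flags are consistent with its prob under the strict-inequality assumption.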

The file has 1,810,510 entries. The entries correspond to the set of publications indexed by PubMed/MEDLINE that were published between 2010 and 2022 and that satisfy the sample restrictions enumerated in Appendix A.2 of Durvasula, Eyuboglu, and Ritzwoller (2024).

Usage

The conservative sample has the lowest false positive rate of the three, and so will include the smallest number of publications that do not report the results of a clinical trial. The liberal sample has the highest true positive rate, and so will omit the smallest number of publications that do report the results of a clinical trial.
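Because the thresholds increase from liberal to conservative, every publication in the conservative sample is also in the moderate sample, and every publication in the moderate sample is also in the liberal sample. The minimal sketch below selects the pmids in one sample using only the Python standard library. The function name `select_pmids` is hypothetical, the demo rows are invented, and it assumes the flag columns hold 0/1 or TRUE/FALSE values.

```python
import csv
import io

SAMPLES = ("liberal", "moderate", "conservative")


def select_pmids(handle, sample: str) -> set[str]:
    """Return the pmids flagged as members of `sample`.

    Assumes flag columns are encoded as 0/1 or TRUE/FALSE.
    """
    if sample not in SAMPLES:
        raise ValueError(f"unknown sample: {sample!r}")
    return {
        row["pmid"]
        for row in csv.DictReader(handle)
        if row[sample].strip().lower() in {"1", "true"}
    }


# Invented rows in the same layout as sample.csv.
DEMO_CSV = (
    "pmid,prob,liberal,moderate,conservative\n"
    "101,0.90,1,1,1\n"
    "102,0.30,1,1,0\n"
    "103,0.05,1,0,0\n"
    "104,0.001,0,0,0\n"
)

# A fresh StringIO per call, since DictReader consumes the handle.
samples = {name: select_pmids(io.StringIO(DEMO_CSV), name) for name in SAMPLES}

# Higher thresholds give smaller samples, so each sample nests in the next.
assert samples["conservative"] <= samples["moderate"] <= samples["liberal"]
print(sorted(samples["moderate"]))  # → ['101', '102']
```

To use it on the actual file, open it with `open("sample.csv", newline="")` and pass the handle to `select_pmids`, choosing the conservative sample when false positives are costlier and the liberal sample when missed trials are costlier.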

Acknowledgements

We gratefully acknowledge financial support from the National Science Foundation, the Knight Hennessy Scholars Program, the Stanford Law School John M. Olin Program in Law and Economics, the National Bureau of Economic Research Innovation Information Initiative Summer Fellows Program, and the OpenAI Researcher Access Program.
