Detecting issues in an image classification dataset (Caltech-256) with Datalab

This example uses cleanlab's Datalab class to audit an image dataset. Datalab is applied to the outputs of a Swin Transformer model trained for classification on this dataset.

There are two notebooks:

  • train_image_classifier.ipynb - Trains a Swin Transformer classifier model on a subset of Caltech-256

    • Install dependencies with:

      pip install -r requirements-train.txt --extra-index-url https://download.pytorch.org/whl/cu116
      
  • datalab.ipynb - Audits the dataset using Datalab applied to outputs from the trained model.

    • Install dependencies with:

      pip install -r requirements-datalab.txt
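As a rough illustration of the auditing step, the sketch below shows one way to pass a trained classifier's outputs to Datalab. It is not taken from datalab.ipynb: the helper names `validate_model_outputs` and `audit_with_datalab` are hypothetical, and it assumes you already have integer class labels, out-of-sample predicted probabilities, and feature embeddings from the model, and that cleanlab is installed with its Datalab dependencies.

```python
def validate_model_outputs(labels, pred_probs, features):
    """Sanity-check model outputs before auditing (hypothetical helper).

    labels:     sequence of integer class indices, one per example
    pred_probs: per-example class probabilities, shape (n, num_classes)
    features:   per-example embeddings, shape (n, embedding_dim)
    Returns (n, num_classes) or raises ValueError.
    """
    n = len(labels)
    if n == 0:
        raise ValueError("dataset is empty")
    if len(pred_probs) != n or len(features) != n:
        raise ValueError("labels, pred_probs and features must have the same length")
    num_classes = len(pred_probs[0])
    for row in pred_probs:
        # Each row should be a probability distribution over the classes.
        if len(row) != num_classes or abs(float(sum(row)) - 1.0) > 1e-3:
            raise ValueError("each row of pred_probs must have one entry per class and sum to 1")
    if any(not 0 <= int(y) < num_classes for y in labels):
        raise ValueError("labels must be integers in [0, num_classes)")
    return n, num_classes


def audit_with_datalab(labels, pred_probs, features):
    """Run Datalab on model outputs and return the per-example issues table."""
    # Imported lazily so the validation helper works without these packages.
    import numpy as np
    from cleanlab import Datalab

    validate_model_outputs(labels, pred_probs, features)

    # Assumption: a plain dict with a single label column is enough here;
    # the notebook may wrap the data differently.
    lab = Datalab(data={"label": [int(y) for y in labels]}, label_name="label")
    lab.find_issues(pred_probs=np.asarray(pred_probs), features=np.asarray(features))
    lab.report()  # prints a summary of the detected issue types
    return lab.get_issues()
```

Datalab's label-issue check works best with predicted probabilities that are out-of-sample (for example from cross-validation or a held-out split), so probabilities computed on the model's own training examples are likely to understate label problems.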
      

You can also audit your dataset for the same issues detected by Datalab (and more) without writing code, training your own machine learning model, or setting up your own interface to the data and results. Cleanlab Studio does all of this for you automatically. For image, text, and tabular datasets, most users obtain better results with Cleanlab Studio than by implementing their own solution (and achieve these results 100x faster).

Cleanlab Studio results for ImageNet dataset