OMG-Seg: Is One Model Good Enough For All Segmentation?

CVPR, 2024
Xiangtai Li · Haobo Yuan · Wei Li · Henghui Ding · Size Wu · Wenwei Zhang ·
Yining Li · Kai Chen · Chen Change Loy

S-Lab, MMlab@NTU, Shanghai AI Laboratory

Xiangtai is the project leader and corresponding author.

arXiv PDF · Project Page · HuggingFace Model



Short Introduction

In this work, we address various segmentation tasks, each traditionally tackled by distinct or partially unified models. We propose OMG-Seg, One Model that is Good enough to efficiently and effectively handle all the segmentation tasks, including image semantic, instance, and panoptic segmentation, as well as their video counterparts, open-vocabulary settings, prompt-driven interactive segmentation like SAM, and video object segmentation. To our knowledge, this is the first model to handle all of these tasks in one model and achieve good enough performance.

We show that OMG-Seg, a transformer-based encoder-decoder architecture with task-specific queries and outputs, can support over ten distinct segmentation tasks while significantly reducing computational and parameter overhead across various tasks and datasets. We rigorously evaluate the inter-task influences and correlations during co-training. Both the code and models are publicly available.
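
For intuition, here is a minimal PyTorch sketch of this shared encoder-decoder design with task-specific queries. It is an illustrative outline, not the official implementation: the class name OMGSegSketch, the conv stem standing in for the frozen CLIP backbone, and all sizes are assumptions; the real model lives under the seg/ directory of this repository.

import torch
import torch.nn as nn

# Illustrative sketch only: one shared backbone, one shared transformer decoder,
# and task-specific queries (learned object queries plus optional point-prompt
# queries for SAM-like interactive segmentation). Outputs are per-query masks
# and class embeddings that can be matched against CLIP text embeddings.
class OMGSegSketch(nn.Module):
    def __init__(self, embed_dim=256, num_obj_queries=300, clip_text_dim=512):
        super().__init__()
        # Stand-in for the frozen CLIP image backbone (assumption).
        self.backbone = nn.Conv2d(3, embed_dim, kernel_size=16, stride=16)
        layer = nn.TransformerDecoderLayer(d_model=embed_dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=6)
        # Task-specific queries.
        self.object_queries = nn.Embedding(num_obj_queries, embed_dim)
        self.prompt_proj = nn.Linear(2, embed_dim)  # encodes a click prompt (x, y)
        # Shared heads: mask embedding and CLIP-space class embedding.
        self.mask_head = nn.Linear(embed_dim, embed_dim)
        self.cls_head = nn.Linear(embed_dim, clip_text_dim)

    def forward(self, images, point_prompts=None):
        feats = self.backbone(images)                      # (B, C, H/16, W/16)
        b = feats.shape[0]
        memory = feats.flatten(2).transpose(1, 2)          # (B, HW, C)
        queries = self.object_queries.weight.unsqueeze(0).expand(b, -1, -1)
        if point_prompts is not None:                      # interactive (SAM-like) mode
            queries = torch.cat([queries, self.prompt_proj(point_prompts)], dim=1)
        q = self.decoder(queries, memory)                  # (B, Q, C)
        masks = torch.einsum("bqc,bchw->bqhw", self.mask_head(q), feats)
        cls_emb = self.cls_head(q)  # compared with CLIP text embeddings for open-vocab labels
        return masks, cls_emb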

A short introduction to OMG-Seg and related work, presented at VALSE, can be found here (in Chinese).

News !!

  • 🔥2024-04-06: Released the model trained with only one machine, along with demo scripts.
  • 🔥2024-03-18: Training code of OMG-Seg is released!! Stronger performance using Objects365 instance-segmentation pre-training!!
  • 🔥2024-02-26: OMG-Seg is accepted to CVPR 2024!!
  • 2024-01-19: Models and test code are released!!

Features of OMG-Seg

$\color{#2F6EBA}{Universal\ Image,\ Video,\ Open-Vocabulary\ Segmentation\ Model}$

  • A new unified solution for over ten different segmentation tasks: panoptic (PS), instance (IS), video semantic (VSS), video instance (VIS), and video panoptic (VPS) segmentation, plus open-vocabulary and interactive segmentation.
  • A novel unified view for solving multiple segmentation tasks in one model with far fewer parameters.

$\color{#2F6EBA}{Good\ Enough\ Performance}$

  • OMG-Seg achieves good enough performance on multiple datasets with one shared architecture (only 70M trainable parameters).

$\color{#2F6EBA}{The\ First\ OpenSourced\ Universal\ Segmentation\ Codebase}$

  • Our codebase supports joint image/video/multi-dataset co-training.
  • The first open-sourced codebase of its kind, including training, inference, and demo scripts.

$\color{#2F6EBA}{Easy\ To\ Follow\ For\ Academic\ Labs}$

  • OMG-Seg can be reproduced with a single machine of 32GB V100 or 40GB A100 GPUs, making it accessible to academic labs.

To-Do Plans

  • Release strong models. (to be done)
  • Release training code. (done)
  • Release CKPTs. (done)
  • Support HuggingFace. (done)

Experiment Setup

Dataset

See DATASET.md

Install

Our codebase is built with MMDetection 3.0 tools.

See INSTALL.md

Quick Start

Experiment Preparation

  1. First, set up the dataset and environment. Make sure you use the pinned, matching package versions.

  2. Download the pre-trained CLIP backbone. The scripts will download the pre-trained CLIP models automatically.

  3. Generate CLIP text embeddings for each dataset, and for the joint merged dataset used in co-training. See the embedding generation (a minimal sketch of this step is shown after this list).

  4. Run the train/test scripts below to train and evaluate the model.
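
As a rough illustration of step 3, the sketch below generates per-class CLIP text embeddings with the open_clip package. The model name, prompt template, class list, and output file name are placeholder assumptions; the repository's own embedding-generation script is the authoritative reference.

import torch
import open_clip

# Load a CLIP model and its tokenizer (placeholder model/tag; OMG-Seg uses
# ConvNeXt-based CLIP backbones, so substitute the model from your config).
model, _, _ = open_clip.create_model_and_transforms("ViT-B-32", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

class_names = ["person", "car", "dog"]            # replace with the dataset's category names
prompts = [f"a photo of a {name}" for name in class_names]

with torch.no_grad():
    tokens = tokenizer(prompts)                   # (num_classes, context_length)
    text_emb = model.encode_text(tokens)          # (num_classes, embed_dim)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)  # L2-normalize

torch.save(text_emb, "text_embeddings_coco.pth")  # illustrative output path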

Train

See the configs under seg/configs/m2ov_train.

./tools/dist.sh train seg/configs/m2ov_train/omg_convl_vlm_fix_12e_ov_coco_vid_yt19_vip_city_cocopansam.py  8 --checkpoint pre_trained_model_path

Note that you can also train directly from the pre-trained CLIP weights (without an extra pre-trained checkpoint) by running the following command.

./tools/dist.sh train seg/configs/m2ov_train/omg_convl_vlm_fix_12e_ov_coco_vid_yt19_vip_city_cocopansam.py  8 

We adopt Slurm to train our model with 32 A100 GPUs.

PARTITION=YOUR_PARTITION JOB_NAME=YOUR_JOB_NAME GPUS=32 GPUS_PER_NODE=8 ./tools/slurm.sh train seg/configs/m2ov_train/omg_convl_vlm_fix_12e_ov_coco_vid_yt19_vip_city_cocopansam.py 

Demo Scripts

Run the visualization script on COCO:

./tools/dist.sh test seg/configs/m2ov_val/eval_m2_convl_300q_ov_coco.py 1 --checkpoint model_path --show-dir vis

Run the visualization script on VIPSeg:

./tools/dist.sh test seg/configs/m2ov_val/eval_m2_convl_300q_ov_vipseg.py 1 --checkpoint model_path --show-dir vis

The color maps are dumped in the sub-folder vis in work_dir.

Test

See the configs under seg/configs/m2ov_val. Make sure you have set up the classification embeddings for testing.

Test the Cityscapes dataset (we observe about 0.3% noise for Cityscapes panoptic segmentation):

./tools/dist.sh test seg/configs/m2ov_val/eval_m2_convl_300q_ov_cityscapes.py 4 --checkpoint model_path

Test the COCO dataset (we observe about 0.5% noise for COCO panoptic segmentation):

./tools/dist.sh test seg/configs/m2ov_val/eval_m2_convl_300q_ov_coco.py 4 --checkpoint model_path

Test the open-vocabulary ADE dataset (we observe about 0.8% noise for ADE panoptic segmentation):

./tools/dist.sh test seg/configs/m2ov_val/eval_m2_convl_300q_ov_ade.py 4 --checkpoint model_path

Test Interactive COCO segmentation:

./tools/dist.sh test seg/configs/m2ov_val/eval_m2_convl_ov_coco_pan_point.py 4 --checkpoint model_path

Test the YouTube-VIS 2019 dataset:

./tools/dist.sh test seg/configs/m2ov_val/eval_m2_convl_300q_ov_y19.py 4 --checkpoint model_path

Test the VIPSeg dataset:

./tools/dist.sh test seg/configs/m2ov_val/eval_m2_convl_300q_ov_vipseg.py 1 --checkpoint model_path

Trained Model

ConvNeXt-Large backbone: model

ConvNeXt-XXLarge backbone: model

The Objects365 pre-trained models can be found here.

Re-trained with one machine using our codebase, ConvNeXt-Large backbone: model, log.

Citation

If you think the OMG-Seg codebase and models are useful for your research, please consider citing us:

@inproceedings{OMGSeg,
  author       = {Xiangtai Li and
                  Haobo Yuan and
                  Wei Li and
                  Henghui Ding and
                  Size Wu and
                  Wenwei Zhang and
                  Yining Li and
                  Kai Chen and
                  Chen Change Loy},
  title        = {OMG-Seg: Is One Model Good Enough For All Segmentation?},
  booktitle    = {CVPR},
  year         = {2024}
}

License

S-Lab LICENSE.
