Enhanced and Repackaged GIT Clustering 🌐🔍

📦 Discover the Package on TestPyPI: git_cluster Package

🔍 Dive Deeper in Our GitHub Repository: Git-Clustering GitHub Repo

About 📖

This repository provides an enhanced version of the GIT (Graph of Intensity Topology) clustering algorithm. It adds extra methods, is repackaged for ease of use, and includes benchmarks that demonstrate its performance. 🚀

Features ✨

  • Broad Applicability: Tested across a variety of datasets 🌍 (see the benchmarks in notebooks/).
  • User-friendly Packaging: Simplified integration into your projects. 📦

Usage 🛠️

To get started, explore the notebooks/Quick_Start_with_GIT.ipynb notebook for a step-by-step guide on applying this algorithm to your data.

Testing in Google Colab 🧪

To validate the installation and functionality of the GIT Clustering package, either follow the manual steps below or click the Open in Colab button to open a Colab notebook where everything is already set up.

Run in Colab

Manual Installation and Execution

Follow these steps to manually install the GIT Clustering package and test its functionality:

  1. Install the GIT Clustering package from TestPyPI and upgrade gdown for dataset downloading:

    !pip install -i "https://test.pypi.org/simple/" git_cluster
    !pip install -U gdown
  2. Download the datasets and unzip them:

    !gdown 1yNwCStP3Sdf2lfvNe9h0WIZw2OQ3O2UP && unzip datasets.zip
  3. Execute a sample clustering process:

    from git_cluster import GIT
    from utils import alignPredictedWithTrueLabels, autoPlot
    from dataloaders import Toy_DataLoader
    
    # Load the Circles Dataset
    X_circles, Y_circles_true = Toy_DataLoader(name='circles', 
                                              path="/content/datasets/toy_datasets").load()
    
    # Create an instance of the GIT clustering
    git = GIT(k=12, target_ratio=[1, 1])
    
    # Fit the GIT model to the dataset and predict cluster labels.
    Y_circles_pred = git.fit_predict(X_circles)
    
    # Plot the dataset and highlight the clusters with different colors.
    autoPlot(X_circles, Y_circles_pred)
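
The example above loads Y_circles_true but never compares it with the predictions. One way to make that comparison is the adjusted Rand index, which does not depend on how the cluster ids are numbered. The sketch below is not part of git_cluster: adjusted_rand_index is a hypothetical helper written with the standard library only, and the toy label lists stand in for Y_circles_true and Y_circles_pred.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index of two labelings; 1.0 means identical partitions."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)

    # Contingency counts: points sharing each (true, predicted) label pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of point pairs grouped together under each view.
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0
    expected = sum_true * sum_pred / total_pairs
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings use a single cluster.
        return 1.0
    return (sum_pairs - expected) / (max_index - expected)


# Toy labels standing in for Y_circles_true and Y_circles_pred.
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [1, 1, 1, 0, 0, 0]  # same partition, cluster ids swapped
print(adjusted_rand_index(y_true, y_pred))  # prints 1.0
```

In the Colab session, adjusted_rand_index(Y_circles_true, Y_circles_pred) should work the same way, assuming both are one-dimensional label sequences of equal length. If scikit-learn is available, sklearn.metrics.adjusted_rand_score computes the same quantity.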

Acknowledgments 🎉

  • We extend our thanks to the original authors of the GIT algorithm for their foundational work in Clustering Based on Graph of Intensity Topology:
    • Gao, Zhangyang and Lin, Haitao and Tan, Cheng and Wu, Lirong and Li, Stan and others.

Citing This Work 📝

If you use the GIT Clustering algorithm in your research or project, please consider citing the original work:

@article{gao2021git,
  title={Git: Clustering Based on Graph of Intensity Topology},
  author={Gao, Zhangyang and Lin, Haitao and Tan, Cheng and Wu, Lirong and Li, Stan and others},
  journal={arXiv preprint arXiv:2110.01274},
  year={2021}
}

Connect with me 🌐