hoanganhngo610/river-clustering-demo-2022

Demo of River and its associated clustering module (2022 edition)

This repository serves as a demo for River and its associated clustering module (2022 edition). It accompanies the tutorials given at the following conferences:

  • 26th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), May 16–19, 2022, Chengdu, China.

  • 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), August 14–18, 2022, Washington DC Convention Center, Washington, D.C., USA.
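River's clustering module processes a data stream one observation at a time. Each clusterer is updated with `learn_one(x)` and queried with `predict_one(x)`, where `x` is a dict that maps feature names to values. The block below is a minimal, dependency-free sketch of that pattern using sequential (MacQueen-style) k-means. The class `SequentialKMeans` and its attributes are hypothetical and exist only for illustration. It is not River's `cluster.KMeans`, which has its own update rule and parameters.

```python
import math


class SequentialKMeans:
    """Hypothetical sketch of sequential (MacQueen-style) online k-means.

    It mirrors River's one-observation-at-a-time `learn_one` / `predict_one`
    pattern, but it is NOT River's `cluster.KMeans`.
    """

    def __init__(self, n_clusters=2):
        self.n_clusters = n_clusters
        self.centers = []  # one dict per cluster: feature name -> coordinate
        self.counts = []   # number of observations absorbed by each center

    @staticmethod
    def _distance(a, b):
        # Euclidean distance over the union of features (missing feature -> 0).
        keys = set(a) | set(b)
        return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))

    def predict_one(self, x):
        # Index of the nearest center, or None before any center exists.
        if not self.centers:
            return None
        return min(range(len(self.centers)),
                   key=lambda i: self._distance(x, self.centers[i]))

    def learn_one(self, x):
        # The first `n_clusters` observations seed the centers.
        if len(self.centers) < self.n_clusters:
            self.centers.append({k: float(v) for k, v in x.items()})
            self.counts.append(1)
            return
        # Otherwise move the nearest center toward x by a step of 1/n, so each
        # center stays the running mean of the observations assigned to it.
        i = self.predict_one(x)
        self.counts[i] += 1
        eta = 1.0 / self.counts[i]
        center = self.centers[i]
        for k in set(center) | set(x):
            c = center.get(k, 0.0)
            center[k] = c + eta * (x.get(k, 0.0) - c)


# Usage: feed a small stream one dict at a time, then assign new points.
points = [
    {"x": 1, "y": 2}, {"x": -4, "y": 2},
    {"x": 1, "y": 4}, {"x": -4, "y": 4},
    {"x": 1, "y": 0}, {"x": -4, "y": 0},
]
model = SequentialKMeans(n_clusters=2)
for p in points:
    model.learn_one(p)

print(model.predict_one({"x": -3, "y": 0}))  # 1
print(model.predict_one({"x": 4, "y": 3}))   # 0
```

The model never stores past observations. It keeps only one center and one counter per cluster, which is why memory stays constant as the stream grows. With River itself, the same loop would call `learn_one` on an estimator from `river.cluster`. Consult River's documentation for each clusterer's actual parameters.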

If you find this demo, or the tutorial as a whole, useful for your research and would like to cite it, please use the following reference:

@inproceedings{10.1145/3534678.3542600,
  author = {Montiel, Jacob and Ngo, Hoang-Anh and Le-Nguyen, Minh-Huong and Bifet, Albert},
  title = {Online Clustering: Algorithms, Evaluation, Metrics, Applications and Benchmarking},
  year = {2022},
  isbn = {9781450393850},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3534678.3542600},
  doi = {10.1145/3534678.3542600},
  abstract = {Online clustering algorithms play a critical role in data science, especially with the advantages regarding time, memory usage and complexity, while maintaining a high performance compared to traditional clustering methods. This tutorial serves, first, as a survey on online machine learning and, in particular, data stream clustering methods. During this tutorial, state-of-the-art algorithms and the associated core research threads will be presented by identifying different categories based on distance, density grids and hidden statistical models. Clustering validity indices, an important part of the clustering process which are usually neglected or replaced with classification metrics, resulting in misleading interpretation of final results, will also be deeply investigated. Then, this introduction will be put into the context with River, a go-to Python library merged between Creme and scikit-multiflow. It is also the first open-source project to include an online clustering module that can facilitate reproducibility and allow direct further improvements. From this, we propose methods of clustering configuration, applications and settings for benchmarking, using real-world problems and datasets.},
  booktitle = {Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
  pages = {4808--4809},
  numpages = {2},
  keywords = {online clustering, stream learning, data streams, decision support, stream clustering, benchmarking},
  location = {Washington DC, USA},
  series = {KDD '22}
  }
