A list of topics for Google Summer of Code (GSoC) 2011

agramfort edited this page Mar 29, 2011 · 19 revisions

Online learning

Mentor : O. Grisel

Goal : Devise an intuitive yet efficient API dedicated to the incremental fitting of some scikit-learn estimators (on an infinite stream of samples for instance).

See this thread on the mailing list for a discussion of such an API. Design decisions will be made by implementing or adapting three concrete models:

  • text feature extraction
  • online clustering with sequential k-means
  • generalized linear model fitting with Stochastic Gradient Descent (both for regression and classification)
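To make the idea of incremental fitting concrete, here is a minimal sketch of the second model, sequential k-means, consuming a stream in mini-batches. Everything in it is an assumption made for illustration: the class name `SequentialKMeans`, the method name `partial_fit`, and the helper `sample_stream` are hypothetical and are not an API agreed on in the mailing-list thread. It uses only the Python standard library so that it runs as is.

```python
import random


class SequentialKMeans:
    """Hypothetical estimator illustrating an incremental-fitting API."""

    def __init__(self, n_clusters):
        self.n_clusters = n_clusters
        self.centers_ = []  # one list of floats per cluster
        self.counts_ = []   # number of samples absorbed by each center

    def _nearest(self, x):
        # Index of the center closest to x (squared Euclidean distance).
        dists = [sum((c - a) ** 2 for c, a in zip(center, x))
                 for center in self.centers_]
        return dists.index(min(dists))

    def partial_fit(self, X):
        """Update the centers with one mini-batch X (a list of samples)."""
        for x in X:
            if len(self.centers_) < self.n_clusters:
                # The first n_clusters samples seen become the initial centers.
                self.centers_.append(list(x))
                self.counts_.append(1)
                continue
            k = self._nearest(x)
            self.counts_[k] += 1
            step = 1.0 / self.counts_[k]
            # Running-mean update: move the winning center toward x.
            self.centers_[k] = [c + step * (a - c)
                                for c, a in zip(self.centers_[k], x)]
        return self

    def predict(self, X):
        return [self._nearest(x) for x in X]


def sample_stream(n_batches, batch_size, seed=0):
    """Hypothetical data source: two Gaussian blobs, alternating samples."""
    rng = random.Random(seed)
    blobs = [(0.0, 0.0), (10.0, 10.0)]
    for _ in range(n_batches):
        batch = []
        for i in range(batch_size):
            cx, cy = blobs[i % 2]
            batch.append([cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)])
        yield batch


model = SequentialKMeans(n_clusters=2)
for batch in sample_stream(n_batches=50, batch_size=20):
    # Only the current mini-batch is held in memory at any time.
    model.partial_fit(batch)

labels = model.predict([[0.2, -0.1], [9.8, 10.3]])
```

The point of the sketch is the calling convention rather than the algorithm: the estimator keeps its state between calls, so the caller can feed it batches from a stream that never ends, whereas a single batch-style `fit` call would need the whole dataset up front.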

Dictionary Learning a.k.a. Sparse Coding

Mentor : Gael Varoquaux, Alex Gramfort

The objective is to bring to the scikit some recent and very popular methods known as Dictionary Learning or Sparse Coding. These methods involve heavy numerical computing and have many applications, from general signal and image processing to very applied topics such as biomedical imaging. The project will start from existing code snippets (see below) and will require making some design decisions to keep the API as simple yet powerful as the rest of the scikit.

Some useful resources with a compatible license:

Boosting

Mentor : Satra

Manifold learning

Mentor : Fabian Pedregosa

Random forest

Mentor : Satra

There is already a preliminary implementation in my fork. I would combine this with boosting/bagging.

Locality Sensitive Hashing

Mentor : Mathieu Blondel?

There is an LSH implementation in pybrain (pybrain/supervised/knn/lsh)

Command line interface

Mentor : ?

Interaction with mldata.org

Mentor : ?
