Google Summer of Code (GSoC) 2015
This is the page for coordination of the GSoC for scikit-learn.
Scikit-learn is a machine learning module in Python. See http://scikit-learn.org for more details.
Scikit-learn is taking part in GSoC through the Python Software Foundation: http://wiki.python.org/moin/SummerOfCode
Difficulty: Scikit-learn is a technical project. Contributing via GSoC requires solid expertise in Python coding as well as in numerical and machine learning algorithms.
Important: Read: Expectations for prospective students
Application template: https://wiki.python.org/moin/SummerOfCode/ApplicationTemplate2015 Please follow this template.
Also important: A letter from Gaël to former applicants. His suggestions are just as relevant this year.
Hi folks,
The deadline for applications is nearing. I'd like to stress that scikit-learn will only be accepting high-quality applications: it is a challenging, though rewarding, project to work with. To maximize the quality of your application, here are a few pieces of advice:
1. First, discuss a pre-proposal on the mailing list. Make sure that both the scikit-learn team and you are enthusiastic about the idea. Try to have one or two possible mentors who hold a dialog with you.
2. Satisfy the PSF requirements (http://wiki.python.org/moin/SummerOfCode/Expectations). Briefly:
   - Demonstrate to your prospective mentor(s) that you are able to complete the project you've proposed.
   - Blog for your GSoC project.
   - Contribute at least one patch to the project.

   I'd add that the patch should be somewhat substantial, not just fixing typos. To contribute a patch, please have a look at the [contribution guide](http://scikit-learn.org/dev/developers/index.html#contributing-code) and the Easy issues in the tracker.
3. In parallel with 2, start an online document (a Google Doc, for instance) to elaborate your final proposal. If you manage to convince mentors, you can get feedback on it.
As a final note, I want to stress that GSoC projects are ambitious: we are talking about a few months of full-time work. Thus the proposed ideas are challenging, and students are expected to draw up a battle plan, with difficult variants and less difficult variants. A GSoC project is a major set of contributions, not a single pull request.
Good luck, I am looking forward to seeing the proposals. You'll see, the scikit is a big, friendly and enthusiastic community.
Gaël
Disclaimer: This list of topics is currently being updated from last year's, and some information (like the names of possible mentors) is not definitive. Please e-mail the list with any questions.
Possible mentor: Olivier Grisel, Vlad Niculae, Peter Prettenhofer (backup)
Possible candidate:
Goal: Online or minibatch SGD (or a similar solver) on a squared l2 reconstruction loss plus a low-rank penalty (nuclear norm) on a scipy.sparse matrix: the implicit components of the sparse input representation would be interpreted by the algorithm as missing values rather than zero values.
Application: Build a scalable recommender system example, e.g. on the movielens dataset.
TODO: find references in the literature. See the Matrix Factorization Jungle.
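To make the goal above more concrete, here is a minimal sketch of plain SGD over only the explicitly stored entries of a scipy.sparse matrix. It is an illustration under assumptions, not a proposed design: the function name `sgd_matrix_factorization` and all of its parameters are hypothetical, and the nuclear norm is replaced by the usual factored surrogate, a squared Frobenius penalty on the two factors.

```python
import numpy as np
import scipy.sparse as sp


def sgd_matrix_factorization(X, rank=2, alpha=0.01, lr=0.02,
                             n_epochs=300, seed=0):
    """Factor X ~ U @ V.T using only the stored entries of X.

    Hypothetical helper, for illustration only. Entries that are not
    stored in the sparse matrix are treated as missing, not as zeros.
    Each observed entry contributes
        0.5 * (x_ij - u_i . v_j) ** 2
        + 0.5 * alpha * (||u_i|| ** 2 + ||v_j|| ** 2),
    a factored surrogate (assumption) for a nuclear norm penalty.
    """
    X = sp.coo_matrix(X)
    rng = np.random.RandomState(seed)
    n_rows, n_cols = X.shape
    U = 0.1 * rng.randn(n_rows, rank)
    V = 0.1 * rng.randn(n_cols, rank)
    for _ in range(n_epochs):
        # Visit the observed entries in a fresh random order each epoch.
        for k in rng.permutation(X.nnz):
            i, j, x_ij = X.row[k], X.col[k], X.data[k]
            u_i = U[i].copy()
            err = x_ij - u_i @ V[j]
            # Gradient step on the loss of this single observed entry.
            U[i] += lr * (err * V[j] - alpha * u_i)
            V[j] += lr * (err * u_i - alpha * V[j])
    return U, V
```

A missing rating can then be predicted as `U[i] @ V[j]`. A real contribution would also need minibatching, a stopping criterion, and a scikit-learn style estimator interface.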
Possible mentors: Andreas Mueller, Gael Varoquaux, Vlad Niculae
Possible candidate:
- Refurbish the current GMM code to bring it up to scikit-learn's standards
- Reimplement VBGMM and DPGMM
- Implement a core-set strategy for GMM
http://las.ethz.ch/files/feldman11scalable-long.pdf http://videolectures.net/nips2011_faulkner_coresets/
Issue to get started: https://github.com/scikit-learn/scikit-learn/issues/4202
Possible mentor: Paolo Losi, Alex Gramfort, (others?)
Possible candidate:
Goal: Add the additive (or additive_model) directory and implement a few additive models.
- Help finish the PR by jcrudy on including pyearth in scikit-learn
- Add GAM (Generalized Additive Model)
- Add SpAM (Sparse Additive Model)
- Add GAMLSS (GAM for Location, Scale and Shape)
- Add LISO (LASSO ISOtone for High Dimensional Additive Isotonic Regression)
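For readers new to additive models, the sketch below fits y as an intercept plus a sum of one-dimensional functions, one per feature, using backfitting. It is only an illustration under assumptions: the names `fit_additive_model` and `predict_additive_model` are hypothetical, and each component is a low-degree polynomial standing in for the spline or kernel smoothers that a real GAM implementation would use.

```python
import numpy as np


def fit_additive_model(X, y, degree=3, n_iter=20):
    """Fit y ~ intercept + sum_j f_j(X[:, j]) by backfitting.

    Hypothetical helper, for illustration only. Each f_j is a
    polynomial of the given degree (an assumption made for brevity).
    """
    n_samples, n_features = X.shape
    intercept = y.mean()
    coefs = [np.zeros(degree + 1) for _ in range(n_features)]
    fitted = np.zeros((n_samples, n_features))
    for _ in range(n_iter):
        for j in range(n_features):
            # Partial residual: what the other components leave unexplained.
            partial = y - intercept - fitted.sum(axis=1) + fitted[:, j]
            coefs[j] = np.polyfit(X[:, j], partial, degree)
            fitted[:, j] = np.polyval(coefs[j], X[:, j])
            # Center the component so the intercept stays identifiable.
            shift = fitted[:, j].mean()
            fitted[:, j] -= shift
            coefs[j][-1] -= shift
    return intercept, coefs


def predict_additive_model(X, intercept, coefs):
    """Evaluate the fitted additive model on new data."""
    parts = [np.polyval(c, X[:, j]) for j, c in enumerate(coefs)]
    return intercept + np.sum(parts, axis=0)
```

The models listed above differ mainly in the smoother used for each component and in the penalty or distributional assumptions added on top of this basic structure.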
References:
- GAM (Generalized Additive Model)
- MARS (Multivariate Adaptive Regression Splines)
- SpAM
- GAMLSS (GAM for Location, Scale and Shape)
- LISO (LASSO ISOtone for High Dimensional Additive Isotonic Regression)
- High Dimensional Additive Modelling - Lucas et al.
- SpAM implemented by Juemin Yang, in R, as a GSoC 2011 project
Possible mentor:
Possible candidate: Barmaley-exe
Goal: add some metric learning algorithms (such as NCA, ITML, LMNN) that can be used with KNNs and as transformers. Brian Kulis has a survey and a tutorial on metric learning that seem to be a good place to start.
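As a rough picture of what "used with KNNs and as transformers" could mean, the sketch below learns a linear map in `fit` and applies it in `transform`, so that plain Euclidean nearest neighbors in the transformed space correspond to a learned Mahalanobis metric. Everything here is an assumption made for illustration: the class `WithinClassWhitening` and the helper `nearest_neighbor_predict` are hypothetical, and the metric is simple within-class covariance whitening, which is a stand-in and not NCA, ITML, or LMNN.

```python
import numpy as np


class WithinClassWhitening:
    """Hypothetical metric-learning transformer (illustration only).

    Learns a linear map L such that the Euclidean distance between
    L @ a and L @ b is the Mahalanobis distance under the inverse
    within-class covariance. This is a stand-in for NCA, ITML or LMNN.
    """

    def __init__(self, reg=1e-6):
        self.reg = reg

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        # Remove each class mean, then pool the centered samples.
        centered = np.vstack([X[y == c] - X[y == c].mean(axis=0)
                              for c in np.unique(y)])
        cov = centered.T @ centered / len(X)
        cov += self.reg * np.eye(X.shape[1])
        # cov = E diag(w) E.T, so L = diag(w ** -0.5) E.T
        # satisfies L.T @ L = inv(cov).
        eigvals, eigvecs = np.linalg.eigh(cov)
        self.components_ = (eigvecs / np.sqrt(eigvals)).T
        return self

    def transform(self, X):
        return np.asarray(X, dtype=float) @ self.components_.T


def nearest_neighbor_predict(X_train, y_train, X_test):
    """1-nearest-neighbor prediction with Euclidean distance."""
    diff = X_test[:, None, :] - X_train[None, :, :]
    return y_train[(diff ** 2).sum(axis=2).argmin(axis=1)]
```

Calling `fit(X_train, y_train)` and then `transform` on both training and test data before the nearest-neighbor search is the pattern an actual metric learner would follow; only the way the map is learned would change.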
Here are people that have said that they might be available for mentoring:
Gaël Varoquaux, Vlad Niculae, Olivier Grisel, Andreas Mueller, Alexandre Gramfort, Arnaud Joly, Michael Eickenberg.