Consistency brigade

amueller edited this page May 29, 2012 · 7 revisions

Parameter naming

Things that are not consistent and should be fixed

  • SVC's parameter C should be lowercase (it's not a matrix)
  • The class labels are stored inconsistently: as the attribute classes in SGD and as labels_ in SVMs. They should be stored as classes_, as is usually the case.
  • chunk_size parameters should be renamed to batch_size in all MiniBatch* models.
  • Single letter parameter names:
    • p in affinity propagation clustering
  • Which is better: n_train, train_fraction or train_size (in the cross-validation module)?

API

Models taking a symmetric kernel, affinity or distance matrix

Some models (SVC, KernelPCA, SpectralClustering...) can accept a precomputed kernel, affinity or distance matrix with shape (n_samples, n_samples) as the main data argument, in place of the traditional design matrix with shape (n_samples, n_features).

This has been discussed in #803 and the current plan is to introduce an is_pairwise property or attribute.

GridSearchCV, cross_val_score and other tools should also be updated to take this into account.
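
One likely reason these tools need to know about pairwise inputs is that a train/test split of an (n_samples, n_samples) matrix has to select both rows and columns, whereas a design matrix is split by rows only. The following is a minimal sketch in plain Python, not the scikit-learn implementation: the helper split_data and its is_pairwise argument are hypothetical names chosen for illustration.

```python
def split_data(X, train, test, is_pairwise=False):
    """Return (X_train, X_test) for the index lists `train` and `test`.

    `split_data` and `is_pairwise` are hypothetical names for this sketch.
    """
    if is_pairwise:
        # X has shape (n_samples, n_samples). Columns are restricted to the
        # training samples in both parts, so every entry still compares a
        # sample against a training sample.
        X_train = [[X[i][j] for j in train] for i in train]
        X_test = [[X[i][j] for j in train] for i in test]
    else:
        # X has shape (n_samples, n_features): select rows only.
        X_train = [X[i] for i in train]
        X_test = [X[i] for i in test]
    return X_train, X_test


# Toy linear kernel matrix for four one-dimensional samples.
samples = [1.0, 2.0, 3.0, 4.0]
K = [[a * b for b in samples] for a in samples]

K_train, K_test = split_data(K, train=[0, 1, 2], test=[3], is_pairwise=True)
```

Here K_train has shape (3, 3) and K_test has shape (1, 3). With is_pairwise=False the same call would return rows of length 4, which a model fitted on a 3 x 3 kernel could not use.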