Setting up tests to benchmark current and future code

1 data sets

binary:

name	size	N (train/test)	p	#nz (train)	used in	format	mldata
Leukemia	1.9M	72	3571	dense	¹	RData	yes
Newsgroup	9.4M	11,314	777,811	0.05%	¹	RData	no
Internet-Ad	49K	2359	1430	1.2%	¹	RData	no
a9a	2.3M/1.1M	32,561 / 16,281	123	451,592 (11%)	²	libsvm	yes
real-sim	33.6M	72,309	20,958	3,709,083 (0.2%)	²	libsvm	yes
rcv1	13.1M/432M	20,242 / 677,399	47,236	49,556,258 (0.15%)	²	libsvm	no
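The N, p and #nz columns above can be recomputed from any libsvm-format file with scikit-learn's `load_svmlight_file`, which returns the features as a `scipy.sparse` CSR matrix. This is a minimal sketch: the helper `dataset_stats` is hypothetical, and the three-line file is made up. It only stands in for a real file such as a9a.

```python
import os
import tempfile

from sklearn.datasets import load_svmlight_file


def dataset_stats(path):
    """Return (N, p, #nz, density) for a libsvm/svmlight-format file."""
    # zero_based=False: libsvm files use 1-based feature indices
    X, y = load_svmlight_file(path, zero_based=False)  # X is a CSR matrix
    n_samples, n_features = X.shape
    nnz = X.nnz  # number of explicitly stored (non-zero) entries
    density = nnz / float(n_samples * n_features)
    return n_samples, n_features, nnz, density


# Made-up toy file: one sample per line, "<label> <index>:<value> ..."
content = b"+1 1:0.5 3:1.0\n-1 2:2.0\n+1 1:1.0 2:1.0 4:3.0\n"
with tempfile.NamedTemporaryFile(suffix=".libsvm", delete=False) as f:
    f.write(content)
    path = f.name

stats = dataset_stats(path)
os.remove(path)
print(stats)
```

The density printed this way is the percentage shown in brackets in the #nz column (as a fraction rather than a percentage).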

multiclass:

name	size	#class	N (train/test)	p	#nz	used in	format	mldata
news20	3.6M/0.9M	20	15,935 / 3,993	1,355,191	9,097,916 (0.03%)	²	libsvm	yes
Cancer	22M	14	144	16,063	dense	¹	RData	no

regression:

name	size	N	p	#nz	used in	format
Prostate Cancer Data		97	9	dense	³	RData

¹ Friedman, J., T. Hastie, and R. Tibshirani. “Regularization Paths for Generalized Linear Models via Coordinate Descent.” Journal of Statistical Software 33, no. 1 (2010): 1–22.
p.20 download data

² Yuan, G.X., K.W. Chang, C.J. Hsieh, and C.J. Lin. “A Comparison of Optimization Methods and Software for Large-scale L1-regularized Linear Classification.” The Journal of Machine Learning Research 11 (2010): 3183–3234.
p.3214 download data

³ Tibshirani, R. “Regression Shrinkage and Selection via the Lasso.” Journal of the Royal Statistical Society. Series B (Methodological) 58, no. 1 (1996): 267–288.

Questions:

Which data sets should be used?
Which format should be used to store them?
Which other regression data sets could be added?

agramfort: As much as possible, I would use mldata and the mldata loader we ship with sklearn (fetch_mldata).

2 problems to benchmark

l2 loss*
log loss*
multi-logit*

each with an l1 penalty and with a combined l1 & l2 penalty
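For reference, the three problems can be written in the parameterization of Friedman et al. (¹), where α = 1 gives the pure l1 (lasso) penalty and 0 < α < 1 the combined l1 & l2 (elastic-net) penalty. Here y_i ∈ {0, 1} for the log loss and y_{ik} is the 0/1 indicator of class k for the multi-logit. Other packages scale the loss or the penalty differently (liblinear, for instance, puts a constant C on the loss instead of λ on the penalty), so penalty values presumably have to be converted before timings are compared.

```latex
P_\alpha(\beta) = \frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 + \alpha\,\lVert\beta\rVert_1

% l2 loss
\min_{\beta_0,\beta}\ \frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i - \beta_0 - x_i^\top\beta\bigr)^2 + \lambda P_\alpha(\beta)

% log loss, y_i \in \{0, 1\}
\min_{\beta_0,\beta}\ -\frac{1}{N}\sum_{i=1}^{N}\Bigl[y_i(\beta_0 + x_i^\top\beta) - \log\bigl(1 + e^{\beta_0 + x_i^\top\beta}\bigr)\Bigr] + \lambda P_\alpha(\beta)

% multi-logit, K classes
\min_{\{\beta_{0k},\beta_k\}}\ -\frac{1}{N}\sum_{i=1}^{N}\Bigl[\sum_{k=1}^{K} y_{ik}(\beta_{0k} + x_i^\top\beta_k) - \log\sum_{l=1}^{K} e^{\beta_{0l} + x_i^\top\beta_l}\Bigr] + \lambda\sum_{k=1}^{K} P_\alpha(\beta_k)
```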

Questions:

Which settings should be used in the benchmarks (penalty value, etc.)?

agramfort: I would start with two extreme cases (high lambda and low lambda) for each of the regimes n_samples >> n_features and n_samples << n_features.
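One way to make "high" and "low" lambda comparable across data sets is to express them as fractions of lambda_max, the smallest penalty for which the l1-penalized l2-loss solution is all zeros. For the objective (1 / (2N)) ||y - Xw||² + lambda ||w||₁ with an unpenalized intercept (the scikit-learn Lasso form, and the glmnet form with α = 1), lambda_max = max_j |x_jᵀ y| / N on centered data. A minimal NumPy sketch follows; the helper names and the fractions 0.5 and 0.01 are illustrative assumptions, not agreed settings.

```python
import numpy as np


def lasso_alpha_max(X, y):
    """Smallest penalty for which the Lasso solution is all zeros.

    Assumes the objective (1 / (2 * n_samples)) * ||y - X w||^2 + alpha * ||w||_1
    with an unpenalized intercept, hence the centering of X and y.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return np.max(np.abs(Xc.T @ yc)) / X.shape[0]


def extreme_alphas(X, y, high=0.5, low=0.01):
    """Return (high, low) penalties as fractions of alpha_max.

    The default fractions are illustrative choices only.
    """
    a_max = lasso_alpha_max(X, y)
    return high * a_max, low * a_max


rng = np.random.RandomState(0)
# Synthetic stand-ins for the two regimes: n_samples >> n_features and n_samples << n_features
for n_samples, n_features in [(1000, 20), (20, 1000)]:
    X = rng.randn(n_samples, n_features)
    y = X[:, 0] + 0.1 * rng.randn(n_samples)
    print(n_samples, n_features, extreme_alphas(X, y))
```

The log loss and the multi-logit have their own lambda_max formulas, so this sketch only covers the l2-loss case.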

2.1 external reference implementations

glmnet
- glmnet-python (which version?)
- R glmnet package + rpy2 (latest version)
liblinear
- liblinear + python interface (latest version)

Questions:

How should execution be timed to achieve a fair comparison?
Which glmnet interface should be used?

agramfort: I would start with rpy2, even though you pay the price of a data copy, which is not entirely fair.
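A minimal sketch of one possible timing protocol, assuming the goal is to keep the rpy2 data copy out of the measured time: the conversion runs in an untimed `prepare` step, only the fit itself is timed with `time.perf_counter`, and the best of several repeats is kept to reduce noise from other processes. `time_fit` is a hypothetical helper; `sorted` on a random list stands in for a real solver call.

```python
import random
import time


def time_fit(fit, prepare=None, n_repeats=5):
    """Best-of-n wall-clock time of fit(data), excluding data preparation.

    `prepare` (optional) runs before each repeat and is NOT timed, e.g. copying
    a NumPy array into an R matrix through rpy2. Its return value is passed to
    `fit`; without `prepare`, `fit` receives None.
    """
    timings = []
    for _ in range(n_repeats):
        data = prepare() if prepare is not None else None
        start = time.perf_counter()
        fit(data)
        timings.append(time.perf_counter() - start)
    return min(timings)


def prepare():
    # Stand-in for the conversion/copy step that should not be timed
    return [random.random() for _ in range(10000)]


best = time_fit(sorted, prepare=prepare, n_repeats=3)
print("best of 3: %.4f s" % best)
```

If the copy is considered part of the real-world cost, it can be timed separately with the same helper and reported next to the fit time rather than mixed into it.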

3 code performance monitoring

Tracking the speed of the scikit-learn implementations during development:
record the execution times of the scikit-learn implementations of the problems in (2) on the data sets in (1) over time.

vbench
https://github.com/vene/scikit-learn-speed
http://code.google.com/p/unladen-swallow/
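Tools such as vbench handle this at scale. The core idea, a timing log keyed by revision, problem and data set, can be sketched with the standard library alone. The helpers `record_timing` and `history`, the CSV layout, and the revision names and timings in the usage example are all made up for illustration.

```python
import csv
import os
import tempfile
import time

FIELDS = ["timestamp", "revision", "problem", "dataset", "seconds"]


def record_timing(log_path, revision, problem, dataset, seconds):
    """Append one timing row to a CSV log, writing the header on first use."""
    new_file = not os.path.exists(log_path)
    with open(log_path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow({
            "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
            "revision": revision,
            "problem": problem,
            "dataset": dataset,
            "seconds": "%.6f" % seconds,
        })


def history(log_path, problem, dataset):
    """Return [(revision, seconds), ...] for one (problem, dataset) pair, in log order."""
    with open(log_path, newline="") as f:
        return [(row["revision"], float(row["seconds"]))
                for row in csv.DictReader(f)
                if row["problem"] == problem and row["dataset"] == dataset]


# Usage with made-up revisions and timings
log = os.path.join(tempfile.mkdtemp(), "timings.csv")
record_timing(log, "rev-a", "lasso", "leukemia", 1.25)
record_timing(log, "rev-b", "lasso", "leukemia", 1.10)
record_timing(log, "rev-b", "logistic-l1", "a9a", 3.0)
print(history(log, "lasso", "leukemia"))
```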

Questions:

Is there already some example code available for scikit-learn?
