Commit

Merge pull request #4 from dmlc/master
update
yanqingmen committed Aug 26, 2015
2 parents 969ea57 + c4fa2f6 commit 34f0b31
Showing 77 changed files with 2,591 additions and 849 deletions.
7 changes: 3 additions & 4 deletions .gitignore
@@ -48,10 +48,9 @@ Debug
*.cpage.col
*.cpage
*.Rproj
xgboost
xgboost.mpi
xgboost.mock
train*
./xgboost
./xgboost.mpi
./xgboost.mock
rabit
#.Rbuildignore
R-package.Rproj
17 changes: 13 additions & 4 deletions .travis.yml
@@ -1,16 +1,26 @@
sudo: true

# Enabling test on Linux and OS X
os:
- linux
- osx

# Use Build Matrix to do lint and build separately
env:
matrix:
- TASK=lint LINT_LANG=cpp
- TASK=lint LINT_LANG=python
- TASK=R-package CXX=g++
- TASK=python-package CXX=g++
- TASK=python-package3 CXX=g++
- TASK=java-package CXX=g++
- TASK=build CXX=g++
- TASK=build-with-dmlc CXX=g++

os:
- linux
- osx

# dependent apt packages
addons:
apt:
@@ -20,19 +30,18 @@ addons:
- wget
- libcurl4-openssl-dev
- unzip
- python-numpy
- python-scipy
- python-nose

before_install:
- scripts/travis_osx_install.sh
- git clone https://github.com/dmlc/dmlc-core
- export TRAVIS=dmlc-core/scripts/travis/
- export PYTHONPATH=${PYTHONPATH}:${PWD}/wrapper
- export PYTHONPATH=${PYTHONPATH}:${PWD}/python-package
- source ${TRAVIS}/travis_setup_env.sh

install:
- pip install cpplint pylint --user `whoami`


script: scripts/travis_script.sh
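
Travis CI expands the `os` list and the `env.matrix` entries into one job per combination. A minimal sketch of that expansion, assuming all eight `TASK` entries shown in this diff end up in the final file (this view does not mark which lines were added or removed); the variable names are illustrative only:

```python
from itertools import product

# Operating systems listed under `os:` in .travis.yml.
oses = ["linux", "osx"]

# Entries listed under `env: matrix:` (assumed to all be present in the final file).
env_entries = [
    "TASK=lint LINT_LANG=cpp",
    "TASK=lint LINT_LANG=python",
    "TASK=R-package CXX=g++",
    "TASK=python-package CXX=g++",
    "TASK=python-package3 CXX=g++",
    "TASK=java-package CXX=g++",
    "TASK=build CXX=g++",
    "TASK=build-with-dmlc CXX=g++",
]

# One job per (os, env) pair.
jobs = [{"os": os_name, "env": env} for os_name, env in product(oses, env_entries)]

for job in jobs:
    print(job["os"], "|", job["env"])
print("total jobs:", len(jobs))
```

Under that assumption the matrix contains 16 jobs, so each lint and build task runs once on Linux and once on OS X.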


21 changes: 16 additions & 5 deletions CHANGES.md
@@ -1,18 +1,18 @@
Change Log
=====
==========

xgboost-0.1
=====
===========
* Initial release

xgboost-0.2x
=====
============
* Python module
* Weighted samples instances
* Initial version of pairwise rank

xgboost-0.3
=====
===========
* Faster tree construction module
- Allows subsample columns during tree construction via ```bst:col_samplebytree=ratio```
* Support for boosting from initial predictions
@@ -22,7 +22,7 @@ xgboost-0.3
* Add R module

xgboost-0.4
=====
===========
* Distributed version of xgboost that runs on YARN, scales to billions of examples
* Direct save/load data and model from/to S3 and HDFS
* Feature importance visualization in R module, by Michael Benesty
@@ -34,3 +34,14 @@ xgboost-0.4
- xgboost python model is now picklable
* sklearn wrapper is supported in python module
* Experimental External memory version

on going at master
==================
* Fix List
- Fixed possible problem of poisson regression for R.
* Python module now throws an exception instead of crashing the terminal when a parameter error occurs.
* Python module now has importance plot and tree plot functions.
* Java api is ready for use
* Added more test cases and continuous integration to make each build more robust
* Improvements in sklearn compatible module
* Added pip installation functionality for python module
50 changes: 50 additions & 0 deletions CONTRIBUTORS.md
@@ -0,0 +1,50 @@
Contributors of DMLC/XGBoost
============================
XGBoost has been developed and used by a group of active community members. Everyone is more than welcome to contribute; it is a great way to make the project better and more accessible to more users.

Committers
----------
Committers are people who have made substantial contributions to the project and have been granted write access to it.
* [Tianqi Chen](https://github.com/tqchen), University of Washington
- Tianqi is a PhD working on large-scale machine learning; he is the creator of the project.
* [Tong He](https://github.com/hetong007), Simon Fraser University
- Tong is a master's student working on data mining; he is the maintainer of the xgboost R package.
* [Bing Xu](https://github.com/antinucleon)
- Bing is the original creator of xgboost python package and currently the maintainer of [XGBoost.jl](https://github.com/antinucleon/XGBoost.jl).
* [Michael Benesty](https://github.com/pommedeterresautee)
- Michael is a lawyer and data scientist in France; he is the creator of the xgboost interactive analysis module in R.

Become a Committer
------------------
XGBoost is an open source project, and we are actively looking for new committers who are willing to help maintain and lead the project.
Committers come from contributors who:
* Have made substantial contributions to the project.
* Are willing to spend time maintaining and leading the project.

New committers will be proposed by current committers, with support from more than two current committers.

List of Contributors
--------------------
* [Full List of Contributors](https://github.com/dmlc/xgboost/graphs/contributors)
- To contributors: please add your name to the list when you submit a patch to the project:)
* [Kailong Chen](https://github.com/kalenhaha)
- Kailong is an early contributor to xgboost; he is the creator of the ranking objectives in xgboost.
* [Skipper Seabold](https://github.com/jseabold)
- Skipper is the major contributor to the scikit-learn module of xgboost.
* [Zygmunt Zając](https://github.com/zygmuntz)
- Zygmunt is the master behind the early stopping feature frequently used by kagglers.
* [Ajinkya Kale](https://github.com/ajkl)
* [Boliang Chen](https://github.com/cblsjtu)
* [Vadim Khotilovich](https://github.com/khotilov)
* [Yangqing Men](https://github.com/yanqingmen)
- Yangqing is the creator of xgboost java package.
* [Engpeng Yao](https://github.com/yepyao)
* [Giulio](https://github.com/giuliohome)
- Giulio is the creator of the Windows project of xgboost.
* [Jamie Hall](https://github.com/nerdcha)
- Jamie is the initial creator of the xgboost sklearn module.
* [Yen-Ying Lee](https://github.com/white1033)
* [Masaaki Horikoshi](https://github.com/sinhrks)
- Masaaki is the initial creator of xgboost python plotting module.
* [Hongliang Liu](https://github.com/phunterlau)
- Hongliang is the maintainer of xgboost python PyPI package for pip installation.
33 changes: 32 additions & 1 deletion Makefile
@@ -1,4 +1,5 @@
export CC = gcc
#build on the fly
export CXX = g++
export MPICXX = mpicxx
export LDFLAGS= -pthread -lm
@@ -11,6 +12,12 @@ ifeq ($(OS), Windows_NT)
export CC = gcc -m64
endif

UNAME= $(shell uname)

ifeq ($(UNAME), Linux)
LDFLAGS += -lrt
endif

ifeq ($(no_omp),1)
CFLAGS += -DDISABLE_OPENMP
else
@@ -161,9 +168,33 @@ Rcheck:
make Rbuild
R CMD check --as-cran xgboost*.tar.gz

pythonpack:
#make clean
cd subtree/rabit;make clean;cd ..
rm -rf xgboost-deploy xgboost*.tar.gz
cp -r python-package xgboost-deploy
cp *.md xgboost-deploy/
cp LICENSE xgboost-deploy/
cp Makefile xgboost-deploy/xgboost
cp -r wrapper xgboost-deploy/xgboost
cp -r subtree xgboost-deploy/xgboost
cp -r multi-node xgboost-deploy/xgboost
cp -r windows xgboost-deploy/xgboost
cp -r src xgboost-deploy/xgboost

#make python

pythonbuild:
make pythonpack
python setup.py install

pythoncheck:
make pythonbuild
python -c 'import xgboost;print xgboost.core.find_lib_path()'

# lint requires dmlc to be in current folder
lint:
dmlc-core/scripts/lint.py xgboost $(LINT_LANG) src wrapper R-package
dmlc-core/scripts/lint.py xgboost $(LINT_LANG) src wrapper R-package python-package

clean:
$(RM) -rf $(OBJ) $(BIN) $(MPIBIN) $(MPIOBJ) $(SLIB) *.o */*.o */*/*.o *~ */*~ */*/*~
1 change: 1 addition & 0 deletions R-package/.Rbuildignore
@@ -3,3 +3,4 @@
\.dll$
^.*\.Rproj$
^\.Rproj\.user$
README.md
16 changes: 8 additions & 8 deletions R-package/DESCRIPTION
@@ -1,16 +1,16 @@
Package: xgboost
Type: Package
Title: eXtreme Gradient Boosting
Version: 0.4-0
Date: 2015-05-11
Title: Extreme Gradient Boosting
Version: 0.4-2
Date: 2015-08-01
Author: Tianqi Chen <tianqi.tchen@gmail.com>, Tong He <hetong007@gmail.com>, Michael Benesty <michael@benesty.fr>
Maintainer: Tong He <hetong007@gmail.com>
Description: Xgboost is short for eXtreme Gradient Boosting, which is an
efficient and scalable implementation of gradient boosting framework.
This package is an R wrapper of xgboost. The package includes efficient
Description: Extreme Gradient Boosting, which is an
efficient implementation of gradient boosting framework.
This package is its R interface. The package includes efficient
linear model solver and tree learning algorithms. The package can automatically
do parallel computation with OpenMP, and it can be more than 10 times faster
than existing gradient boosting packages such as gbm. It supports various
do parallel computation on a single machine which could be more than 10 times faster
than existing gradient boosting packages. It supports various
objective functions, including regression, classification and ranking. The
package is made to be extensible, so that users are also allowed to define
their own objectives easily.
4 changes: 2 additions & 2 deletions R-package/R/utils.R
@@ -288,7 +288,7 @@ xgb.cv.aggcv <- function(res, showsd = TRUE) {
}
ret <- paste(ret, sprintf("%f", mean(stats)), sep="")
if (showsd) {
ret <- paste(ret, sprintf("+%f", sd(stats)), sep="")
ret <- paste(ret, sprintf("+%f", stats::sd(stats)), sep="")
}
}
return (ret)
@@ -313,7 +313,7 @@ xgb.createFolds <- function(y, k = 10)
if(cuts < 2) cuts <- 2
if(cuts > 5) cuts <- 5
y <- cut(y,
unique(quantile(y, probs = seq(0, 1, length = cuts))),
unique(stats::quantile(y, probs = seq(0, 1, length = cuts))),
include.lowest = TRUE)
}
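
The `xgb.cv.aggcv` hunk above formats each cross-validation metric as the mean across folds, optionally followed by `+` and the standard deviation. A rough Python sketch of just that formatting step, for illustration; `agg_cv` is a hypothetical helper name, and the metric-name prefix that the R code builds earlier is left out:

```python
from statistics import mean, stdev


def agg_cv(stats, showsd=True):
    """Format per-fold scores as 'mean' or 'mean+sd' with six decimals.

    `stdev` is the sample standard deviation (n - 1 denominator), which is
    also what R's `stats::sd` computes. It needs at least two values.
    """
    ret = "%f" % mean(stats)
    if showsd:
        ret += "+%f" % stdev(stats)
    return ret


fold_errors = [0.1, 0.2, 0.3]  # made-up per-fold scores
print(agg_cv(fold_errors))
print(agg_cv(fold_errors, showsd=False))
```

The `stats::` prefix added in the diff does not change the result; it only makes explicit which package `sd` and `quantile` come from.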

2 changes: 1 addition & 1 deletion R-package/R/xgb.cv.R
@@ -240,7 +240,7 @@ xgb.cv <- function(params=list(), data, nrounds, nfold, label = NULL, missing =
else colnames <- colnamesMean

type <- rep(x = "numeric", times = length(colnames))
dt <- read.table(text = "", colClasses = type, col.names = colnames) %>% as.data.table
dt <- utils::read.table(text = "", colClasses = type, col.names = colnames) %>% as.data.table
split <- str_split(string = history, pattern = "\t")

for(line in split) dt <- line[2:length(line)] %>% str_extract_all(pattern = "\\d*\\.+\\d*") %>% unlist %>% as.numeric %>% as.list %>% {rbindlist(list(dt, .), use.names = F, fill = F)}
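
The loop above drops the first tab-separated field of each history line and extracts every token matching `\d*\.+\d*`. A small Python sketch of the same extraction, assuming a history line of the form shown in the example (the sample line and the name `parse_history_line` are made up for illustration):

```python
import re

# Same pattern as in the R code: optional digits, one or more dots, optional digits.
NUMBER = re.compile(r"\d*\.+\d*")


def parse_history_line(line):
    """Return the decimal numbers found after the first tab-separated field."""
    fields = line.split("\t")[1:]  # skip the leading iteration field
    return [float(tok) for field in fields for tok in NUMBER.findall(field)]


sample = "[0]\ttrain-error:0.046522+0.001200\ttest-error:0.046523+0.003000"
print(parse_history_line(sample))
```

Note that this pattern requires a decimal point, so a value printed as a bare integer would be skipped, and a stray `.` on its own would match but fail numeric conversion.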
37 changes: 18 additions & 19 deletions R-package/R/xgb.model.dt.tree.R
@@ -133,34 +133,33 @@ xgb.model.dt.tree <- function(feature_names = NULL, filename_dump = NULL, model
allTrees <- rbindlist(list(allTrees, dt), use.names = T, fill = F)
}

yes <- allTrees[!is.na(Yes),Yes]
set(allTrees, i = which(allTrees[,Feature]!= "Leaf"),
yes <- allTrees[!is.na(Yes), Yes]

set(allTrees, i = which(allTrees[, Feature] != "Leaf"),
j = "Yes.Feature",
value = allTrees[ID == yes,Feature])

set(allTrees, i = which(allTrees[,Feature]!= "Leaf"),
value = allTrees[ID %in% yes, Feature])
set(allTrees, i = which(allTrees[, Feature] != "Leaf"),
j = "Yes.Cover",
value = allTrees[ID == yes,Cover])

set(allTrees, i = which(allTrees[,Feature]!= "Leaf"),
j = "Yes.Quality",
value = allTrees[ID == yes,Quality])
value = allTrees[ID %in% yes, Cover])

no <- allTrees[!is.na(No),No]
set(allTrees, i = which(allTrees[, Feature] != "Leaf"),
j = "Yes.Quality",
value = allTrees[ID %in% yes, Quality])
no <- allTrees[!is.na(No), No]

set(allTrees, i = which(allTrees[,Feature]!= "Leaf"),
set(allTrees, i = which(allTrees[, Feature] != "Leaf"),
j = "No.Feature",
value = allTrees[ID == no,Feature])
value = allTrees[ID %in% no, Feature])

set(allTrees, i = which(allTrees[,Feature]!= "Leaf"),
set(allTrees, i = which(allTrees[, Feature] != "Leaf"),
j = "No.Cover",
value = allTrees[ID == no,Cover])
value = allTrees[ID %in% no, Cover])

set(allTrees, i = which(allTrees[,Feature]!= "Leaf"),
set(allTrees, i = which(allTrees[, Feature] != "Leaf"),
j = "No.Quality",
value = allTrees[ID == no,Quality])
value = allTrees[ID %in% no, Quality])

allTrees
}
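
The hunk above replaces `ID == yes` with `ID %in% yes`. In R, `==` compares two vectors element by element and recycles the shorter one, whereas `%in%` tests whether each element appears anywhere in the other vector. A minimal Python sketch of the difference, with made-up node IDs and hypothetical helper names; this is an illustration of the two semantics, not a statement of how data.table evaluates the expression internally:

```python
from itertools import cycle


def recycled_equal(ids, targets):
    """Mimic R's `ids == targets` for unequal lengths: targets are recycled."""
    return [i == t for i, t in zip(ids, cycle(targets))]


def is_member(ids, targets):
    """Mimic R's `ids %in% targets`: True where the id occurs in targets."""
    lookup = set(targets)
    return [i in lookup for i in ids]


ids = ["0-0", "0-1", "0-2", "1-0", "1-1", "1-2"]  # made-up node IDs
yes = ["0-1", "1-1"]                              # made-up "Yes" child IDs

print(recycled_equal(ids, yes))  # positional comparison misses both children
print(is_member(ids, yes))       # membership test finds both children
```

One caveat worth keeping in mind: a membership filter returns matching rows in table order, not in the order of `yes`, so the assignment still relies on the two orders lining up.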

4 changes: 2 additions & 2 deletions R-package/R/xgb.plot.importance.R
@@ -33,7 +33,7 @@ xgb.plot.importance <- function(importance_matrix = NULL, numberOfClusters = c(1
if (!"data.table" %in% class(importance_matrix)) {
stop("importance_matrix: Should be a data.table.")
}
if (!require(ggplot2, quietly = TRUE)) {
if (!requireNamespace("ggplot2", quietly = TRUE)) {
stop("ggplot2 package is required for plotting the importance", call. = FALSE)
}
if (!requireNamespace("Ckmeans.1d.dp", quietly = TRUE)) {
@@ -46,7 +46,7 @@ xgb.plot.importance <- function(importance_matrix = NULL, numberOfClusters = c(1
clusters <- suppressWarnings(Ckmeans.1d.dp::Ckmeans.1d.dp(importance_matrix[,Gain], numberOfClusters))
importance_matrix[,"Cluster":=clusters$cluster %>% as.character]

plot <- ggplot(importance_matrix, aes(x=reorder(Feature, Gain), y = Gain, width= 0.05), environment = environment())+ geom_bar(aes(fill=Cluster), stat="identity", position="identity") + coord_flip() + xlab("Features") + ylab("Gain") + ggtitle("Feature importance") + theme(plot.title = element_text(lineheight=.9, face="bold"), panel.grid.major.y = element_blank() )
plot <- ggplot2::ggplot(importance_matrix, ggplot2::aes(x=stats::reorder(Feature, Gain), y = Gain, width= 0.05), environment = environment())+ ggplot2::geom_bar(ggplot2::aes(fill=Cluster), stat="identity", position="identity") + ggplot2::coord_flip() + ggplot2::xlab("Features") + ggplot2::ylab("Gain") + ggplot2::ggtitle("Feature importance") + ggplot2::theme(plot.title = ggplot2::element_text(lineheight=.9, face="bold"), panel.grid.major.y = ggplot2::element_blank() )

return(plot)
}
39 changes: 34 additions & 5 deletions R-package/README.md
@@ -1,15 +1,44 @@
# R package for xgboost.
R package for xgboost
=====================

## Installation
[![CRAN Status Badge](http://www.r-pkg.org/badges/version/xgboost)](http://cran.r-project.org/web/packages/xgboost)
[![CRAN Downloads](http://cranlogs.r-pkg.org/badges/xgboost)](http://cran.rstudio.com/web/packages/xgboost/index.html)

For up-to-date version (which is recommended), please install from github. Windows user will need to install [RTools](http://cran.r-project.org/bin/windows/Rtools/) first.
Installation
------------

We are [on CRAN](https://cran.r-project.org/web/packages/xgboost/index.html) now. For the stable, pre-compiled (Windows and OS X) version, please install from CRAN:

```r
devtools::install_github('dmlc/xgboost',subdir='R-package')
install.packages('xgboost')
```

For the up-to-date version, please install from GitHub. Windows users will need to install [RTools](http://cran.r-project.org/bin/windows/Rtools/) first.

## Examples
```r
devtools::install_github('dmlc/xgboost',subdir='R-package')
```

Examples
--------

* Please visit [walk through example](demo).
* See also the [example scripts](../demo/kaggle-higgs) for Kaggle Higgs Challenge, including [speedtest script](../demo/kaggle-higgs/speedtest.R) on this dataset and the one related to [Otto challenge](../demo/kaggle-otto), including a [RMarkdown documentation](../demo/kaggle-otto/understandingXGBoostModel.Rmd).

Notes
-----

If installing the package with ```devtools::install_github``` fails with an error like the following (even after updating libxml and RCurl, as many forum posts suggest):

```
devtools::install_github('dmlc/xgboost',subdir='R-package')
Downloading github repo dmlc/xgboost@master
Error in function (type, msg, asError = TRUE) :
Peer certificate cannot be authenticated with given CA certificates
```
To get around this you can build the package locally as mentioned [here](https://github.com/dmlc/xgboost/issues/347) -
```
1. Clone the current repository and set your workspace to xgboost/R-package/
2. Run R CMD INSTALL --build . in terminal to get the tarball.
3. Run install.packages('path_to_the_tarball',repos=NULL) in R to install.
```
