Merge pull request #6 from dmlc/master
update
yanqingmen committed Dec 18, 2015
2 parents 3453b6e + 4a15939 commit f378fac
Showing 172 changed files with 4,068 additions and 1,709 deletions.
5 changes: 5 additions & 0 deletions .gitignore
@@ -66,3 +66,8 @@ java/xgboost4j-demo/tmp/
java/xgboost4j-demo/model/
nb-configuration*
dmlc-core
# Eclipse
.project
.cproject
.pydevproject
.settings/
25 changes: 18 additions & 7 deletions CHANGES.md
@@ -37,11 +37,22 @@ xgboost-0.4

on going at master
==================
* Fix List
- Fixed possible problem of poisson regression for R.
* Python module now throw exception instead of crash terminal when a parameter error happens.
* Python module now has importance plot and tree plot functions.
* Changes in R library
- fixed possible problem of poisson regression.
- switched from 0 to NA for missing values.
- exposed access to additional model parameters.
* Changes in Python library
- throws an exception instead of crashing the terminal when a parameter error happens.
- has importance plot and tree plot functions.
- accepts different learning rates for each boosting round.
- allows continuing training from a previously saved model.
- allows early stopping in CV.
- allows feval to return a list of tuples.
- allows eval_metric to handle additional formats.
- improved compatibility in sklearn module.
- additional parameters added for sklearn wrapper.
- added pip installation functionality.
- supports more Pandas DataFrame dtypes.
- added best_ntree_limit attribute, in addition to best_score and best_iteration.
* Java api is ready for use
* Added more test cases and continuous integration to make each build more robust
* Improvements in sklearn compatible module
* Added pip installation functionality for python module
* Added more test cases and continuous integration to make each build more robust.
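Several of the Python-library changes above (early stopping in CV, the new best_ntree_limit/best_score/best_iteration attributes) revolve around the same bookkeeping: track the best round seen so far and stop when no improvement appears for a fixed number of rounds. A minimal pure-Python sketch of that idea — the function name and the lower-is-better convention are assumptions for illustration, not xgboost's actual implementation:

```python
def run_with_early_stopping(scores, stopping_rounds):
    """Return (best_iteration, best_score) for a sequence of per-round
    evaluation scores, stopping once no improvement has been seen for
    `stopping_rounds` consecutive rounds (lower score = better)."""
    best_score = float("inf")
    best_iteration = 0
    for i, score in enumerate(scores):
        if score < best_score:
            best_score = score
            best_iteration = i
        elif i - best_iteration >= stopping_rounds:
            break  # no improvement for `stopping_rounds` rounds: stop early
    return best_iteration, best_score
```

For example, with scores `[0.5, 0.4, 0.41, 0.42, 0.43]` and `stopping_rounds=2`, the loop stops after round 3 and reports round 1 (score 0.4) as the best iteration.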
9 changes: 8 additions & 1 deletion CONTRIBUTORS.md
@@ -13,6 +13,8 @@ Committers are people who have made substantial contribution to the project and
- Bing is the original creator of xgboost python package and currently the maintainer of [XGBoost.jl](https://github.com/antinucleon/XGBoost.jl).
* [Michael Benesty](https://github.com/pommedeterresautee)
- Michael is a lawyer and data scientist in France; he is the creator of the xgboost interactive analysis module in R.
* [Yuan Tang](https://github.com/terrytangyuan)
- Yuan is a data scientist in Chicago, US. He contributed mostly in R and Python packages.

Become a Committer
-----------------
@@ -34,7 +36,6 @@ List of Contributors
* [Zygmunt Zając](https://github.com/zygmuntz)
- Zygmunt is the master behind the early stopping feature frequently used by kagglers.
* [Ajinkya Kale](https://github.com/ajkl)
* [Yuan Tang](https://github.com/terrytangyuan)
* [Boliang Chen](https://github.com/cblsjtu)
* [Vadim Khotilovich](https://github.com/khotilov)
* [Yangqing Men](https://github.com/yanqingmen)
@@ -49,4 +50,10 @@ List of Contributors
- Masaaki is the initial creator of xgboost python plotting module.
* [Hongliang Liu](https://github.com/phunterlau)
- Hongliang is the maintainer of xgboost python PyPI package for pip installation.
* [daiyl0320](https://github.com/daiyl0320)
- daiyl0320 contributed patches to make the xgboost distributed version more robust and scale stably on TB-scale datasets.
* [Huayi Zhang](https://github.com/irachex)
* [Johan Manders](https://github.com/johanmanders)
* [yoori](https://github.com/yoori)
* [Mathias Müller](https://github.com/far0n)
* [Sam Thomson](https://github.com/sammthomson)
6 changes: 3 additions & 3 deletions Makefile
@@ -177,19 +177,19 @@ Rcheck:
R CMD check --as-cran xgboost*.tar.gz

pythonpack:
#make clean
#for pip maintainer only
cd subtree/rabit;make clean;cd ..
rm -rf xgboost-deploy xgboost*.tar.gz
cp -r python-package xgboost-deploy
cp *.md xgboost-deploy/
#cp *.md xgboost-deploy/
cp LICENSE xgboost-deploy/
cp Makefile xgboost-deploy/xgboost
cp -r wrapper xgboost-deploy/xgboost
cp -r subtree xgboost-deploy/xgboost
cp -r multi-node xgboost-deploy/xgboost
cp -r windows xgboost-deploy/xgboost
cp -r src xgboost-deploy/xgboost

cp python-package/setup_pip.py xgboost-deploy/setup.py
#make python

pythonbuild:
28 changes: 15 additions & 13 deletions R-package/DESCRIPTION
@@ -3,33 +3,35 @@ Type: Package
Title: Extreme Gradient Boosting
Version: 0.4-2
Date: 2015-08-01
Author: Tianqi Chen <tianqi.tchen@gmail.com>, Tong He <hetong007@gmail.com>, Michael Benesty <michael@benesty.fr>
Author: Tianqi Chen <tianqi.tchen@gmail.com>, Tong He <hetong007@gmail.com>,
Michael Benesty <michael@benesty.fr>
Maintainer: Tong He <hetong007@gmail.com>
Description: Extreme Gradient Boosting, which is an
efficient implementation of gradient boosting framework.
This package is its R interface. The package includes efficient
linear model solver and tree learning algorithms. The package can automatically
do parallel computation on a single machine which could be more than 10 times faster
than existing gradient boosting packages. It supports various
objective functions, including regression, classification and ranking. The
package is made to be extensible, so that users are also allowed to define
Description: Extreme Gradient Boosting, which is an efficient implementation
of gradient boosting framework. This package is its R interface. The package
includes efficient linear model solver and tree learning algorithms. The package
can automatically do parallel computation on a single machine which could be
more than 10 times faster than existing gradient boosting packages. It supports
various objective functions, including regression, classification and ranking.
The package is made to be extensible, so that users are also allowed to define
their own objectives easily.
License: Apache License (== 2.0) | file LICENSE
URL: https://github.com/dmlc/xgboost
BugReports: https://github.com/dmlc/xgboost/issues
VignetteBuilder: knitr
Suggests:
knitr,
ggplot2 (>= 1.0.0),
DiagrammeR (>= 0.6),
ggplot2 (>= 1.0.1),
DiagrammeR (>= 0.8.1),
Ckmeans.1d.dp (>= 3.3.1),
vcd (>= 1.3),
testthat
testthat,
igraph (>= 1.0.1)
Depends:
R (>= 2.10)
Imports:
Matrix (>= 1.1-0),
methods,
data.table (>= 1.9.4),
data.table (>= 1.9.6),
magrittr (>= 1.5),
stringr (>= 0.6.2)
RoxygenNote: 5.0.1
7 changes: 6 additions & 1 deletion R-package/NAMESPACE
@@ -1,16 +1,19 @@
# Generated by roxygen2 (4.1.1): do not edit by hand
# Generated by roxygen2: do not edit by hand

export(getinfo)
export(setinfo)
export(slice)
export(xgb.DMatrix)
export(xgb.DMatrix.save)
export(xgb.create.features)
export(xgb.cv)
export(xgb.dump)
export(xgb.importance)
export(xgb.load)
export(xgb.model.dt.tree)
export(xgb.plot.deepness)
export(xgb.plot.importance)
export(xgb.plot.multi.trees)
export(xgb.plot.tree)
export(xgb.save)
export(xgb.save.raw)
@@ -23,6 +26,7 @@ importClassesFrom(Matrix,dgCMatrix)
importClassesFrom(Matrix,dgeMatrix)
importFrom(Matrix,cBind)
importFrom(Matrix,colSums)
importFrom(Matrix,sparse.model.matrix)
importFrom(Matrix,sparseVector)
importFrom(data.table,":=")
importFrom(data.table,as.data.table)
@@ -35,6 +39,7 @@ importFrom(data.table,setnames)
importFrom(magrittr,"%>%")
importFrom(magrittr,add)
importFrom(magrittr,not)
importFrom(stringr,str_detect)
importFrom(stringr,str_extract)
importFrom(stringr,str_extract_all)
importFrom(stringr,str_match)
6 changes: 2 additions & 4 deletions R-package/R/getinfo.xgb.DMatrix.R
@@ -23,7 +23,6 @@ setClass('xgb.DMatrix')
#' stopifnot(all(labels2 == 1-labels))
#' @rdname getinfo
#' @export
#'
getinfo <- function(object, ...){
UseMethod("getinfo")
}
@@ -35,15 +34,15 @@ getinfo <- function(object, ...){
#' @param ... other parameters
#' @rdname getinfo
#' @method getinfo xgb.DMatrix
setMethod("getinfo", signature = "xgb.DMatrix",
setMethod("getinfo", signature = "xgb.DMatrix",
definition = function(object, name) {
if (typeof(name) != "character") {
stop("xgb.getinfo: name must be character")
}
if (class(object) != "xgb.DMatrix") {
stop("xgb.setinfo: first argument dtrain must be xgb.DMatrix")
}
if (name != "label" && name != "weight" &&
if (name != "label" && name != "weight" &&
name != "base_margin" && name != "nrow") {
stop(paste("xgb.getinfo: unknown info name", name))
}
@@ -54,4 +53,3 @@ setMethod("getinfo", signature = "xgb.DMatrix",
}
return(ret)
})
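The getinfo method above whitelists exactly four field names (label, weight, base_margin, nrow) before dispatching to the C++ getter. The same validation logic sketched in Python — the field set is taken from the R code above, but the function name and error types are illustrative assumptions:

```python
# The four info fields the R getinfo method accepts, per the diff above.
VALID_INFO_FIELDS = {"label", "weight", "base_margin", "nrow"}

def check_info_name(name):
    """Raise on anything but the four fields the R method accepts."""
    if not isinstance(name, str):
        raise TypeError("xgb.getinfo: name must be character")
    if name not in VALID_INFO_FIELDS:
        raise ValueError("xgb.getinfo: unknown info name " + name)
    return name
```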

27 changes: 16 additions & 11 deletions R-package/R/predict.xgb.Booster.R
@@ -20,6 +20,17 @@ setClass("xgb.Booster",
#' only valid for gbtree, but not for gblinear. Set it to a value bigger
#' than 0. It will use all trees by default.
#' @param predleaf whether predict leaf index instead. If set to TRUE, the output will be a matrix object.
#'
#' @details
#' The purpose of the option \code{ntreelimit} is to let the user train a model with many
#' trees, but use only the first trees for prediction, to avoid overfitting
#' (without having to train a new model with fewer trees).
#'
#' The option \code{predleaf} is inspired by §3.1 of the paper
#' \code{Practical Lessons from Predicting Clicks on Ads at Facebook}.
#' The idea is to use the model as a generator of new features which capture non-linear
#' links from the original features.
#'
#' @examples
#' data(agaricus.train, package='xgboost')
#' data(agaricus.test, package='xgboost')
@@ -29,21 +40,16 @@ setClass("xgb.Booster",
#' eta = 1, nthread = 2, nround = 2,objective = "binary:logistic")
#' pred <- predict(bst, test$data)
#' @export
#'
setMethod("predict", signature = "xgb.Booster",
definition = function(object, newdata, missing = NULL,
setMethod("predict", signature = "xgb.Booster",
definition = function(object, newdata, missing = NA,
outputmargin = FALSE, ntreelimit = NULL, predleaf = FALSE) {
if (class(object) != "xgb.Booster"){
stop("predict: model in prediction must be of class xgb.Booster")
} else {
object <- xgb.Booster.check(object, saveraw = FALSE)
}
if (class(newdata) != "xgb.DMatrix") {
if (is.null(missing)) {
newdata <- xgb.DMatrix(newdata)
} else {
newdata <- xgb.DMatrix(newdata, missing = missing)
}
newdata <- xgb.DMatrix(newdata, missing = missing)
}
if (is.null(ntreelimit)) {
ntreelimit <- 0
@@ -52,14 +58,14 @@
stop("predict: ntreelimit must be equal to or greater than 1")
}
}
option = 0
option <- 0
if (outputmargin) {
option <- option + 1
}
if (predleaf) {
option <- option + 2
}
ret <- .Call("XGBoosterPredict_R", object$handle, newdata, as.integer(option),
ret <- .Call("XGBoosterPredict_R", object$handle, newdata, as.integer(option),
as.integer(ntreelimit), PACKAGE = "xgboost")
if (predleaf){
len <- getinfo(newdata, "nrow")
@@ -72,4 +78,3 @@ setMethod("predict", signature = "xgb.Booster",
}
return(ret)
})
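The `option` computation in the predict method above is a two-bit flag (bit 0: outputmargin, bit 1: predleaf) passed down to `XGBoosterPredict_R`. The same encoding in a small Python sketch — function names here are illustrative, not part of xgboost's API:

```python
def encode_predict_option(outputmargin=False, predleaf=False):
    """Mirror the R code's flag encoding: +1 for margin output, +2 for leaf prediction."""
    option = 0
    if outputmargin:
        option += 1
    if predleaf:
        option += 2
    return option

def decode_predict_option(option):
    """Invert the encoding back into the two boolean flags via bit masks."""
    return bool(option & 1), bool(option & 2)
```

Because the two flags occupy distinct bits, any combination round-trips unambiguously (e.g. option 3 means both margin output and leaf prediction).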

9 changes: 4 additions & 5 deletions R-package/R/predict.xgb.Booster.handle.R
@@ -5,15 +5,14 @@
#' @param object Object of class "xgb.Boost.handle"
#' @param ... Parameters pass to \code{predict.xgb.Booster}
#'
setMethod("predict", signature = "xgb.Booster.handle",
setMethod("predict", signature = "xgb.Booster.handle",
definition = function(object, ...) {
if (class(object) != "xgb.Booster.handle"){
stop("predict: model in prediction must be of class xgb.Booster.handle")
}

bst <- xgb.handleToBooster(object)
ret = predict(bst, ...)

ret <- predict(bst, ...)
return(ret)
})

3 changes: 1 addition & 2 deletions R-package/R/setinfo.xgb.DMatrix.R
@@ -21,7 +21,6 @@
#' stopifnot(all(labels2 == 1-labels))
#' @rdname setinfo
#' @export
#'
setinfo <- function(object, ...){
UseMethod("setinfo")
}
@@ -32,7 +31,7 @@ setinfo <- function(object, ...){
#' @param ... other parameters
#' @rdname setinfo
#' @method setinfo xgb.DMatrix
setMethod("setinfo", signature = "xgb.DMatrix",
setMethod("setinfo", signature = "xgb.DMatrix",
definition = function(object, name, info) {
xgb.setinfo(object, name, info)
})
11 changes: 5 additions & 6 deletions R-package/R/slice.xgb.DMatrix.R
@@ -13,7 +13,6 @@ setClass('xgb.DMatrix')
#' dsub <- slice(dtrain, 1:3)
#' @rdname slice
#' @export
#'
slice <- function(object, ...){
UseMethod("slice")
}
@@ -23,19 +22,19 @@ slice <- function(object, ...){
#' @param ... other parameters
#' @rdname slice
#' @method slice xgb.DMatrix
setMethod("slice", signature = "xgb.DMatrix",
setMethod("slice", signature = "xgb.DMatrix",
definition = function(object, idxset, ...) {
if (class(object) != "xgb.DMatrix") {
stop("slice: first argument dtrain must be xgb.DMatrix")
}
ret <- .Call("XGDMatrixSliceDMatrix_R", object, idxset,
ret <- .Call("XGDMatrixSliceDMatrix_R", object, idxset,
PACKAGE = "xgboost")

attr_list <- attributes(object)
nr <- xgb.numrow(object)
len <- sapply(attr_list,length)
ind <- which(len==nr)
if (length(ind)>0) {
ind <- which(len == nr)
if (length(ind) > 0) {
nms <- names(attr_list)[ind]
for (i in 1:length(ind)) {
attr(ret,nms[i]) <- attr(object,nms[i])[idxset]
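The slice method above carries over only those attributes whose length equals the number of rows, restricted to the selected index set. A hedged Python sketch of that bookkeeping, using a dict as a stand-in for R attributes (names and 0-based indexing are illustrative assumptions):

```python
def slice_with_attrs(nrow, attrs, idxset):
    """Keep only row-aligned attributes (length == nrow) when slicing,
    subsetting each to the entries selected by idxset (0-based here)."""
    out = {}
    for name, values in attrs.items():
        if len(values) == nrow:  # only attributes aligned with the rows
            out[name] = [values[i] for i in idxset]
    return out
```

Attributes that are not row-aligned (e.g. a scalar metadata field) are simply dropped, matching the `len == nr` test in the R code.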
