Commit 2d2f926

Merge pull request dmlc#679 from pommedeterresautee/master
Wording of R doc in new functions
2 parents a064100 + f761432 commit 2d2f926

8 files changed (+42 / -54 lines)


R-package/R/xgb.create.features.R

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@
 #' @details
 #' This is the function inspired from the paragraph 3.1 of the paper:
 #'
-#' \strong{"Practical Lessons from Predicting Clicks on Ads at Facebook"}
+#' \strong{Practical Lessons from Predicting Clicks on Ads at Facebook}
 #'
 #' \emph{(Xinran He, Junfeng Pan, Ou Jin, Tianbing Xu, Bo Liu, Tao Xu, Yan, xin Shi, Antoine Atallah, Ralf Herbrich, Stuart Bowers,
 #' Joaquin Quiñonero Candela)}

R-package/R/xgb.importance.R

Lines changed: 4 additions & 9 deletions
@@ -21,7 +21,7 @@
 #' @details
 #' This is the function to understand the model trained (and through your model, your data).
 #'
-#' Results are returned for both linear and tree models.
+#' This function is for both linear and tree models.
 #'
 #' \code{data.table} is returned by the function.
 #' The columns are :
@@ -32,8 +32,9 @@
 #' \item \code{Weight} percentage representing the relative number of times a feature have been taken into trees.
 #' }
 #'
-#' If you don't provide name, index of the features are used.
-#' They are extracted from the boost dump (made on the C++ side), the index starts at 0 (usual in C++) instead of 1 (usual in R).
+#' If you don't provide \code{feature_names}, index of the features will be used instead.
+#'
+#' Because the index is extracted from the model dump (made on the C++ side), it starts at 0 (usual in C++) instead of 1 (usual in R).
 #'
 #' Co-occurence count
 #' ------------------
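The reworded note says that, when `feature_names` is not supplied, features are identified by an index taken from the C++ model dump, which starts at 0 rather than 1. A minimal base-R sketch of mapping such indices back to names follows. It assumes you already have the 0-based indices and a vector of feature names in column order; the helper `dump_index_to_name` and the sample names are hypothetical and are not part of the xgboost package.

```r
# Hypothetical helper (not an xgboost function): translate 0-based feature
# indices, as written in a C++ model dump, into R feature names.
dump_index_to_name <- function(index, feature_names) {
  idx <- as.integer(index) + 1L  # shift from 0-based (C++) to 1-based (R)
  if (any(is.na(idx)) || any(idx < 1L | idx > length(feature_names))) {
    stop("index out of range for the supplied feature_names")
  }
  feature_names[idx]
}

# Illustrative feature names, made up for this example.
feature_names <- c("age", "income", "height")
dump_index_to_name(c("0", "2"), feature_names)  # returns c("age", "height")
```

The explicit range check is a design choice for the sketch: an off-by-one mistake would otherwise silently return `NA` instead of failing.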
@@ -47,10 +48,6 @@
 #' @examples
 #' data(agaricus.train, package='xgboost')
 #'
-#' # Both dataset are list with two items, a sparse matrix and labels
-#' # (labels = outcome column which will be learned).
-#' # Each column of the sparse Matrix is a feature in one hot encoding format.
-#'
 #' bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label, max.depth = 2,
 #' eta = 1, nthread = 2, nround = 2,objective = "binary:logistic")
 #'
@@ -114,8 +111,6 @@ xgb.importance <- function(feature_names = NULL, model = NULL, data = NULL, labe
 result
 }
 
-
-
 # Avoid error messages during CRAN check.
 # The reason is that these variables are never declared
 # They are mainly column names inferred by Data.table...

R-package/R/xgb.model.dt.tree.R

Lines changed: 14 additions & 17 deletions
@@ -1,6 +1,6 @@
-#' Convert tree model dump to data.table
+#' Parse boosted tree model text dump
 #'
-#' Read a tree model text dump and return a data.table.
+#' Parse a boosted tree model text dump and return a \code{data.table}.
 #'
 #' @importFrom data.table data.table
 #' @importFrom data.table set
@@ -13,17 +13,19 @@
 #' @importFrom stringr str_extract
 #' @importFrom stringr str_split
 #' @importFrom stringr str_trim
-#' @param feature_names names of each feature as a character vector. Can be extracted from a sparse matrix (see example). If model dump already contains feature names, this argument should be \code{NULL}.
-#' @param model dump generated by the \code{xgb.train} function. Avoid the creation of a dump file.
-#' @param text dump generated by the \code{xgb.dump} function. Avoid the creation of a dump file. Model dump must include the gain per feature and per tree (parameter \code{with.stats = T} in function \code{xgb.dump}).
-#' @param n_first_tree limit the plot to the n first trees. If \code{NULL}, all trees of the model are plotted. Performance can be low for huge models.
+#' @param feature_names names of each feature as a character vector. Can be extracted from a sparse matrix (see example). If the model already contains feature names, this argument should be \code{NULL} (default value).
+#' @param model object created by the \code{xgb.train} function.
+#' @param text \code{character} vector generated by the \code{xgb.dump} function. Model dump must include the gain per feature and per tree (parameter \code{with.stats = TRUE} in function \code{xgb.dump}).
+#' @param n_first_tree limit the plot to the \code{n} first trees. If set to \code{NULL}, all trees of the model are plotted. Performance can be low depending of the size of the model.
 #'
-#' @return A \code{data.table} of the features used in the model with their gain, cover and few other thing.
+#' @return A \code{data.table} of the features used in the model with their gain, cover and few other information.
 #'
 #' @details
-#' General function to convert a text dump of tree model to a Matrix. The purpose is to help user to explore the model and get a better understanding of it.
+#' General function to convert a text dump of tree model to a \code{data.table}.
 #'
-#' The content of the \code{data.table} is organised that way:
+#' The purpose is to help user to explore the model and get a better understanding of it.
+#'
+#' The columns of the \code{data.table} are:
 #'
 #' \itemize{
 #' \item \code{ID}: unique identifier of a node ;
@@ -35,21 +37,16 @@
 #' \item \code{Quality}: it's the gain related to the split in this specific node ;
 #' \item \code{Cover}: metric to measure the number of observation affected by the split ;
 #' \item \code{Tree}: ID of the tree. It is included in the main ID ;
-#' \item \code{Yes.X} or \code{No.X}: data related to the pointer in \code{Yes} or \code{No} column ;
+#' \item \code{Yes.Feature}, \code{No.Feature}, \code{Yes.Cover}, \code{No.Cover}, \code{Yes.Quality} and \code{No.Quality}: data related to the pointer in \code{Yes} or \code{No} column ;
 #' }
 #'
 #' @examples
 #' data(agaricus.train, package='xgboost')
 #'
-#' #Both dataset are list with two items, a sparse matrix and labels
-#' #(labels = outcome column which will be learned).
-#' #Each column of the sparse Matrix is a feature in one hot encoding format.
-#' train <- agaricus.train
-#'
-#' bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
+#' bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label, max.depth = 2,
 #' eta = 1, nthread = 2, nround = 2,objective = "binary:logistic")
 #'
-#' #agaricus.test$data@@Dimnames[[2]] represents the column names of the sparse matrix.
+#' # agaricus.train$data@@Dimnames[[2]] represents the column names of the sparse matrix.
 #' xgb.model.dt.tree(feature_names = agaricus.train$data@@Dimnames[[2]], model = bst)
 #'
 #' @export
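The new item spells out that `Yes.Feature`, `Yes.Cover`, `Yes.Quality` and their `No.*` counterparts carry data about the node that the `Yes` or `No` pointer refers to. The base-R sketch below illustrates that idea on a toy table with made-up values and IDs. It is an assumption-based illustration of the column semantics only; it does not reproduce how `xgb.model.dt.tree` builds its `data.table` internally.

```r
# Toy tree table with invented values; column names mimic those in the doc.
tree <- data.frame(
  ID      = c("0-0", "0-1", "0-2"),
  Feature = c("age", "Leaf", "Leaf"),
  Yes     = c("0-1", NA, NA),
  No      = c("0-2", NA, NA),
  Quality = c(4.2, 0.3, -0.5),
  Cover   = c(100, 60, 40),
  stringsAsFactors = FALSE
)

# For each node, find the row its Yes / No pointer refers to and copy that
# row's Feature, Cover and Quality into Yes.* / No.* columns.
for (branch in c("Yes", "No")) {
  target <- match(tree[[branch]], tree$ID)  # NA for leaves (no pointer)
  for (col in c("Feature", "Cover", "Quality")) {
    tree[[paste0(branch, ".", col)]] <- tree[[col]][target]
  }
}

tree[, c("ID", "Yes", "Yes.Feature", "Yes.Cover", "No", "No.Cover")]
```

Leaves have no children, so their `Yes.*` and `No.*` entries stay `NA`.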

R-package/R/xgb.plot.deepness.R

Lines changed: 2 additions & 1 deletion
@@ -76,6 +76,7 @@ get.paths.to.leaf <- function(dt.tree) {
 #' @details
 #' Display both the number of \code{leaf} and the distribution of \code{weighted observations}
 #' by tree deepness level.
+#'
 #' The purpose of this function is to help the user to find the best trade-off to set
 #' the \code{max.depth} and \code{min_child_weight} parameters according to the bias / variance trade-off.
 #'
@@ -88,7 +89,7 @@ get.paths.to.leaf <- function(dt.tree) {
 #' \item Weighted cover: noramlized weighted cover per Leaf (weighted number of instances).
 #' }
 #'
-#' This function is inspired by this blog post \url{http://aysent.github.io/2015/11/08/random-forest-leaf-visualization.html}
+#' This function is inspired by the blog post \url{http://aysent.github.io/2015/11/08/random-forest-leaf-visualization.html}
 #'
 #' @examples
 #' data(agaricus.train, package='xgboost')
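The `xgb.plot.deepness` details describe two quantities per depth level: the number of leaves and the distribution of weighted observations (cover). A minimal base-R sketch of both follows, assuming the depth and cover of each leaf have already been extracted; the leaf data is invented. This illustrates the idea only and is not the code behind `xgb.plot.deepness`, whose exact normalization may differ.

```r
# Toy leaf data (made up): depth of each leaf and its cover
# (weighted number of training instances reaching it).
leaves <- data.frame(
  depth = c(1L, 2L, 2L, 3L, 3L, 3L),
  cover = c(50, 20, 10, 8, 7, 5)
)

# Number of leaves at each depth.
leaf_count <- table(leaves$depth)

# Share of total cover held by leaves at each depth (sums to 1).
cover_share <- tapply(leaves$cover, leaves$depth, sum) / sum(leaves$cover)

print(leaf_count)
print(round(cover_share, 2))
```

Here half of the cover lands in the single depth-1 leaf while most leaves sit at depth 3, which is the kind of imbalance the plot is meant to expose when tuning `max.depth` and `min_child_weight`.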

R-package/man/xgb.create.features.Rd

Lines changed: 1 addition & 1 deletion
Generated file; diff not rendered.

R-package/man/xgb.importance.Rd

Lines changed: 4 additions & 7 deletions
Generated file; diff not rendered.

R-package/man/xgb.model.dt.tree.Rd

Lines changed: 14 additions & 17 deletions
Generated file; diff not rendered.

R-package/man/xgb.plot.deepness.Rd

Lines changed: 2 additions & 1 deletion
Generated file; diff not rendered.
