Xgbengine

From Eigenvector Research Documentation Wiki

Purpose

XGBoost for classification or regression using the XGBoost package. XGBENGINE is a lower-level function; most users should instead call xgb for regression or xgbda for classification.

Synopsis

model = xgbengine(x,y,options)
cv = xgbengine(x,y,options)
pred = xgbengine(x,y,model,options)

Description

Gradient boosted tree ensemble for classification or regression. XGBENGINE uses the XGBoost package to train an XGB model, apply an existing model, or return cross-validation accuracy based on the training data.
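The train and apply modes described above might be exercised as in the following sketch. The synthetic data and the hand-built options structure are illustrative assumptions. The sketch also assumes that option fields left unset fall back to their defaults, and it only calls xgbengine when the function is found on the MATLAB path.

```matlab
% Synthetic regression data (illustrative only): 40 samples, 5 variables.
rng(0);
x = rand(40, 5);
y = x(:, 1) + 2 * x(:, 2);      % target vector of length m = 40

% Hand-built options using fields documented on this page.
% Assumption: fields not set here take their default values.
options = struct();
options.xgbtype   = 'xgbr';     % regression
options.eta       = 0.1;        % single values, so no parameter search
options.max_depth = 6;
options.num_round = 500;

% Only call the toolbox function if it is available.
if ~isempty(which('xgbengine'))
    model = xgbengine(x, y, options);         % train a model
    pred  = xgbengine(x, y, model, options);  % apply the trained model
end
```

For classification, xgbtype would be set to 'xgbc' and y would hold the sample classes instead of target values.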

Cross-validation search for optimal parameter values is triggered by passing ranges for the eta, max_depth, or num_round parameters.
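As a sketch of how such a search might be requested, the options below pass several candidate values for each of the three parameters. The candidate values are illustrative, and the combination count assumes that every combination is evaluated (a full grid), which this page does not state explicitly.

```matlab
options = struct();
options.xgbtype   = 'xgbr';
options.eta       = [0.05 0.1 0.3];   % three candidate learning rates
options.max_depth = [3 6];            % two candidate tree depths
options.num_round = [100 500];        % two candidate round counts

% Enumerate candidate combinations (assumes a full grid is searched).
[E, D, R] = ndgrid(options.eta, options.max_depth, options.num_round);
combos  = [E(:) D(:) R(:)];           % one row per combination
ncombos = size(combos, 1);            % 3 * 2 * 2 = 12

% With data x and y in the workspace and xgbengine on the path:
% cv = xgbengine(x, y, options);
```

Because each combination is cross-validated separately, the number of combinations multiplies the run time, so short candidate lists are a reasonable starting point.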

XGBENGINE is implemented using the XGBoost package. User-specified values are used for XGBoost parameters (see Options below). See XGBoost Parameters for further details of these options.

Inputs

  • x = X-block (predictor block), class "double".
  • y = Y-block (predicted block), class "double": a vector of length m giving the class or target value of each sample.
  • model = XGB (Java) model produced by previous xgbengine training run.

Outputs

  • model = XGBoost Java model (if not run in cross-validation mode).
  • pred = structure array with predictions
  • valid = structure array with predictions

Options

options = a structure array with the following fields:

  • xgbtype : [ 'xgbr' | 'xgbc' ] Type of XGB to apply.
  • compression : [{'none'}| 'pca' | 'pls' ] Type of data compression to perform on the x-block prior to calculating or applying the XGB model. 'pca' uses a simple PCA model to compress the information. 'pls' uses either a pls or plsda model (depending on the xgbtype). Compression can make the XGB more stable and less prone to overfitting.
  • compressncomp : [ 1 ] Number of latent variables (or principal components) to include in the compression model.
  • compressmd : [ 'no' |{'yes'}] Use Mahalanobis Distance corrected scores from compression model.
  • cvtimelimit : Set a time limit (seconds) on each individual cross-validation sub-calculation when searching over supplied XGB parameter ranges for optimal parameters. Only relevant if parameter ranges are used for XGB parameters such as eta, num_round, or max_depth. Default is 10 seconds. A second time limit = 30*cvtimelimit is applied to any xgb calibration calculation which is not part of cross-validation.
  • cvi : { { 'rnd' 5 } } Standard cross-validation cell (see crossval) defining a split method, number of splits, and number of iterations. This cross-validation is used both for parameter optimization and for the error estimate on the final selected parameter values. Alternatively, can be a vector with the same number of elements as x has rows, with integer values indicating CV subsets (see crossval).
  • eta : [{0.1}] Value(s) to use for XGBoost 'eta' parameter. Eta controls the learning rate of the gradient boosting. Values in range (0,1].
  • max_depth : [{6}] Value(s) to use for XGBoost 'max_depth' parameter. Specifies the maximum depth allowed for the decision trees.
  • num_round : [{500}] Value(s) to use for XGBoost 'num_round' parameter. Specifies how many rounds of tree creation to perform.
  • strictthreshold : [0.5] Probability threshold for assigning a sample to a class. Affects model.classification.inclass.
  • predictionrule : [ {'mostprobable'} | 'strict' ] Governs which classification prediction statistics appear first in the confusion matrix and confusion table summaries.
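
For instance, the vector form of cvi and the two time limits implied by cvtimelimit could be set up as follows. The sample count, the five-subset random assignment, and the variable names m, nsubsets, and caltimelimit are illustrative assumptions rather than toolbox defaults.

```matlab
m        = 40;                        % number of rows in x (hypothetical)
nsubsets = 5;                         % number of CV subsets (hypothetical)

% One integer per sample; samples sharing a value form one CV subset.
rng(0);
cvi = mod(randperm(m), nsubsets) + 1; % values 1..5, eight samples each

options = struct();
options.cvi         = cvi;
options.cvtimelimit = 10;             % seconds per cross-validation sub-calculation

% Limit applied to calibration runs that are not part of cross-validation.
caltimelimit = 30 * options.cvtimelimit;   % 300 seconds
```

A hand-built vector like this is mainly useful when replicate measurements must stay in the same subset; otherwise the default cell form { 'rnd' 5 } is simpler.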

See Also

analysis, browse, knn, lwr, pls, plsda, xgb, xgbengine