Xgbengine
Latest revision as of 10:48, 27 August 2024
Purpose
XGBoost for classification or regression using the XGBoost package. XGBENGINE is a lower-level function; users are recommended to use xgb for regression or xgbda for classification instead.
Synopsis
- model = xgbengine(x,y,options)
- cv = xgbengine(x,y,options)
- pred = xgbengine(x,y,model,options)
Description
Gradient boosted tree ensemble for classification or regression. XGBENGINE uses the XGBoost package to train or apply an XGB model, or to return cross-validation accuracy based on the training data.
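The boosting idea behind the eta and num_round options can be illustrated with a deliberately small sketch. It assumes squared-error loss, a single predictor, and depth-1 trees (stumps). XGBoost itself adds regularization, second-order gradient information, and deeper trees, so this is not how XGBoost or PLS_Toolbox computes a model. The functions `fit_stump` and `boost` are hypothetical names used only for illustration.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree (one split) to the residuals of a 1-D predictor.

    Requires at least two distinct values in xs.
    """
    best = None
    for t in sorted(set(xs))[:-1]:  # thresholds that leave samples on both sides
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    _, t, lv, rv = best
    return lambda x: lv if x <= t else rv


def boost(xs, ys, eta=0.1, num_round=500):
    """Gradient boosting with squared-error loss; returns a prediction function."""
    base = sum(ys) / len(ys)  # start from the mean response
    preds = [base] * len(ys)
    trees = []
    for _ in range(num_round):  # one new tree per boosting round
        # For loss (y - F)^2 / 2 the negative gradient is the plain residual y - F.
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        # eta shrinks each tree's contribution (the learning rate).
        preds = [p + eta * tree(x) for p, x in zip(preds, xs)]
    return lambda x: base + eta * sum(tree(x) for tree in trees)


xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, 1, 5, 5, 5]
model = boost(xs, ys, eta=0.1, num_round=100)
print(round(model(1), 3), round(model(6), 3))  # 1.0 5.0
```

With a smaller eta, each round corrects less of the remaining error, so more rounds are needed to reach the same fit. This trade-off is why eta and num_round are usually tuned together.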
Cross-validation search for optimal parameter values is triggered by passing ranges for the eta, max_depth, or num_round parameters.
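One plausible reading of this range-based search is an exhaustive grid search. Every combination of the supplied eta, max_depth, and num_round values is scored by cross-validation, and the lowest error wins. The sketch below assumes that strategy; the page does not document the search method XGBENGINE actually uses. `grid_search` and the caller-supplied `cv_error` function are hypothetical. The default values mirror the documented option defaults.

```python
import itertools


def grid_search(cv_error, eta=(0.1,), max_depth=(6,), num_round=(500,)):
    """Return (best_params, best_error) over every combination of the supplied ranges.

    cv_error is a hypothetical caller-supplied function that trains and scores
    a model for one parameter combination using cross-validation.
    """
    best_params, best_err = None, float("inf")
    for e, d, n in itertools.product(eta, max_depth, num_round):
        params = {"eta": e, "max_depth": d, "num_round": n}
        err = cv_error(params)
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err


def toy_error(params):
    # Stand-in for a real cross-validation error; minimal at eta=0.3, max_depth=4.
    return (params["eta"] - 0.3) ** 2 + (params["max_depth"] - 4) ** 2


best, err = grid_search(toy_error, eta=(0.05, 0.1, 0.3), max_depth=(2, 4, 6))
print(best, err)  # {'eta': 0.3, 'max_depth': 4, 'num_round': 500} 0.0
```

The cost grows with the product of the range lengths. This is one reason a per-calculation time limit (cvtimelimit) is useful.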
XGBENGINE is implemented using the XGBoost package. User-specified values are used for XGBoost parameters (see Options below). See XGBoost Parameters for further details of these options.
Note: The PLS_Toolbox Python virtual environment must be configured in order to use this method. Find out more here: Python configuration. At this time, Python methods cannot be interrupted with the conventional CTRL+C while a model is building. Take this into account and mind your workspace when using this method.
Inputs
- x = X-block (predictor block) class "double".
- y = Y-block (predicted block) class "double": a vector of length m (the number of rows in x) indicating sample class or target value.
- model = XGB (Java) model produced by previous xgbengine training run.
Outputs
- model = XGBoost Java model (if not run in cross-validation mode).
- pred = structure array with predictions
- valid = structure array with predictions
Options
options = a structure array with the following fields:
- xgbtype : [ 'xgbr' | 'xgbc' ] Type of XGB to apply.
- compression : [{'none'}| 'pca' | 'pls' ] Type of data compression to perform on the x-block prior to calculating or applying the XGB model. 'pca' uses a simple PCA model to compress the information. 'pls' uses either a PLS or PLS-DA model (depending on the xgbtype). Compression can make the XGB model more stable and less prone to overfitting.
- compressncomp : [ 1 ] Number of latent variables (or principal components) to include in the compression model.
- compressmd : [ 'no' |{'yes'}] Use Mahalanobis Distance corrected scores from the compression model.
- cvtimelimit : Set a time limit (seconds) on each individual cross-validation sub-calculation when searching over supplied XGB parameter ranges for optimal parameters. Only relevant if parameter ranges are used for XGB parameters such as eta, num_round, or max_depth. Default is 10 seconds. A second time limit, 30*cvtimelimit, is applied to any XGB calibration calculation which is not part of cross-validation.
- cvi : { { 'rnd' 5 } } Standard cross-validation cell (see crossval) defining a split method, number of splits, and number of iterations. This cross-validation is used both for parameter optimization and for the error estimate on the final selected parameter values. Alternatively, cvi can be a vector with the same number of elements as x has rows, with integer values indicating CV subsets (see crossval).
- eta : [{0.1}] Value(s) to use for XGBoost 'eta' parameter. Eta controls the learning rate of the gradient boosting. Values in range (0,1].
- max_depth : [{6}] Value(s) to use for XGBoost 'max_depth' parameter. Specifies the maximum depth allowed for the decision trees.
- num_round : [{500}] Value(s) to use for XGBoost 'num_round' parameter. Specifies how many rounds of tree creation to perform.
- strictthreshold : [0.5] Probability threshold for assigning a sample to a class. Affects model.classification.inclass.
- predictionrule : [ {'mostprobable'} | 'strict' ] Governs which classification prediction statistics appear first in the confusion matrix and confusion table summaries.
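The 'pca' compression with compressmd = 'yes' can be pictured with the following sketch. It keeps a single principal component (compressncomp = 1) found by power iteration. It assumes that "Mahalanobis Distance corrected" means the scores are scaled to unit variance, so that Euclidean distance on the scores equals Mahalanobis distance in the component space. That interpretation, and the function name `pca_compress`, are assumptions and not PLS_Toolbox's implementation.

```python
import math


def pca_compress(x, md_correct=True, iters=200):
    """Compress an m-by-n matrix (list of rows) to one principal-component score per row.

    Power iteration finds the leading eigenvector of the covariance matrix
    (starting from all ones, which is assumed not to be orthogonal to it).
    If md_correct is True, scores are divided by their standard deviation
    (assumed meaning of 'Mahalanobis Distance corrected').
    """
    m, n = len(x), len(x[0])
    means = [sum(row[j] for row in x) / m for j in range(n)]
    xc = [[row[j] - means[j] for j in range(n)] for row in x]  # mean-centred data
    cov = [[sum(r[i] * r[j] for r in xc) / (m - 1) for j in range(n)] for i in range(n)]
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    scores = [sum(r[j] * v[j] for j in range(n)) for r in xc]
    if md_correct:
        eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(n)) for i in range(n))
        scores = [s / math.sqrt(eigval) for s in scores]  # unit-variance scores
    return scores


x = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]
scores = pca_compress(x)
print(scores)
```

The compressed scores, rather than the original columns of x, would then be passed to the tree ensemble.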
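The cvtimelimit option can be sketched as follows: a per-calculation timeout during the parameter search, plus the documented 30*cvtimelimit limit for a calibration outside cross-validation. The page does not say what happens when a sub-calculation exceeds the limit. This sketch assumes the candidate is simply dropped from the comparison. `run_with_limit`, `search_with_time_limit`, and `calibration_limit` are hypothetical names.

```python
import concurrent.futures
import time


def run_with_limit(func, params, limit_s):
    """Run func(params); return its result, or None if it exceeds limit_s seconds.

    A Python thread cannot be forcibly stopped, so a timed-out call keeps
    running in the background and its result is ignored.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, params)
    try:
        return future.result(timeout=limit_s)
    except concurrent.futures.TimeoutError:
        return None
    finally:
        executor.shutdown(wait=False)


def search_with_time_limit(cv_error, candidates, cvtimelimit=10):
    """Keep the best candidate whose CV sub-calculation finished within cvtimelimit seconds."""
    best, best_err = None, float("inf")
    for params in candidates:
        err = run_with_limit(cv_error, params, cvtimelimit)
        if err is not None and err < best_err:
            best, best_err = params, err
    return best, best_err


def calibration_limit(cvtimelimit=10):
    """Documented limit for an XGB calibration that is not part of cross-validation."""
    return 30 * cvtimelimit


def slow_cv_error(params):
    time.sleep(params["delay"])  # stand-in for an expensive CV sub-calculation
    return params["err"]


candidates = [
    {"delay": 0.0, "err": 2.0},
    {"delay": 0.5, "err": 1.0},  # best error, but too slow for the limit below
    {"delay": 0.0, "err": 1.5},
]
best, err = search_with_time_limit(slow_cv_error, candidates, cvtimelimit=0.1)
print(err, calibration_limit(10))  # 1.5 300
```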
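The two forms of cvi can be pictured with this sketch: a random split such as { 'rnd' 5 } into near-equal subsets, or a user-supplied vector of integer subset labels, one per row of x. This is not the crossval implementation. It performs a single iteration only, and `random_split` and `folds_from_labels` are hypothetical helpers.

```python
import random


def random_split(m, splits=5, seed=None):
    """Assign each of m samples to one of `splits` random, near-equal subsets (labels 1..splits)."""
    rng = random.Random(seed)
    order = list(range(m))
    rng.shuffle(order)
    labels = [0] * m
    for position, sample in enumerate(order):
        labels[sample] = position % splits + 1
    return labels


def folds_from_labels(labels):
    """Yield (subset, train_indices, test_indices), leaving out one subset at a time."""
    for subset in sorted(set(labels)):
        test = [i for i, g in enumerate(labels) if g == subset]
        train = [i for i, g in enumerate(labels) if g != subset]
        yield subset, train, test


labels = random_split(12, splits=5, seed=0)  # like { 'rnd' 5 } for 12 samples
for subset, train, test in folds_from_labels(labels):
    print(subset, test)
```

A vector form of cvi corresponds to calling `folds_from_labels` directly on the user's labels. Each sample is held out exactly once per iteration.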
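The two assignment rules named by predictionrule, and the role of strictthreshold, might work as shown below. The sketch assumes that under 'strict' a sample is assigned only when exactly one class probability exceeds strictthreshold, and is otherwise left unassigned (None). The exact PLS_Toolbox behaviour, including whether the comparison is strict or inclusive, is not stated on this page. The function names are hypothetical.

```python
def most_probable(probs):
    """'mostprobable' rule: index of the class with the highest probability."""
    return max(range(len(probs)), key=lambda k: probs[k])


def strict(probs, strictthreshold=0.5):
    """'strict' rule (assumed): the single class whose probability exceeds the
    threshold, or None when no class or more than one class does."""
    above = [k for k, p in enumerate(probs) if p > strictthreshold]
    return above[0] if len(above) == 1 else None


for probs in ([0.70, 0.20, 0.10], [0.45, 0.35, 0.20]):
    print(most_probable(probs), strict(probs))
# 0 0
# 0 None
```

When class probabilities sum to one and the threshold is 0.5, at most one class can exceed it. An ambiguous multi-class result can therefore only arise with a lower threshold.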