Ensemble

From Eigenvector Research Documentation Wiki
Revision as of 12:22, 22 July 2024 by Sean

Purpose

Predictions based on a collection of regression models.

Synopsis

[model] = ensemble(models,options);
[pred] = ensemble(x,model,options);
[valid] = ensemble(x,y,model,options);

Note: the recommended way to build and apply an ENSEMBLE model from the command line is to use the Model Object. See the wiki page on building and applying models using the Model Object.

Description

Build an ENSEMBLE model from an input cell array of calibrated regression models. Alternatively, if an ENSEMBLE model is passed in, the function makes a Y prediction for an input test X block.

Many flavors of ensembling exist. ENSEMBLE uses what is commonly known as model fusion or voting regression, which combines previously calibrated models to generate a prediction. This assumes that all models in the ensemble were calibrated on the same set of samples. Beyond that, the child models are unrestricted: they may differ in the variables used, in preprocessing, and in model type. An ENSEMBLE can be created from two or more calibrated models.

The ensemble model's calibration error is the root mean squared error of the aggregated calibration predictions of its child models. If every child model is cross-validated, the ensemble's cross-validation error is the root mean squared error of the aggregated cross-validation predictions of the child models. ENSEMBLE can also be applied to new data: each child model is applied to the test data and the resulting predictions are aggregated. Three aggregation methods are available: 'mean', 'median', and 'jackknife'.
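The 'mean' and 'median' aggregation and the resulting error can be sketched as follows. This is a minimal Python illustration of the idea only, not the PLS_Toolbox implementation; the function names `aggregate` and `rmse` and the example numbers are hypothetical and chosen for this sketch.

```python
import math
import statistics


def aggregate(child_preds, how="mean"):
    """Combine per-model predictions into one prediction per sample.

    child_preds: list of lists, one inner list per child model, each holding
    that model's prediction for every sample (all inner lists the same length).
    how: 'mean' or 'median' (hypothetical argument mirroring the option names).
    """
    reducers = {"mean": statistics.mean, "median": statistics.median}
    if how not in reducers:
        raise ValueError(f"unsupported aggregation: {how!r}")
    # zip(*child_preds) regroups the values so that each tuple holds every
    # model's prediction for a single sample.
    return [reducers[how](per_sample) for per_sample in zip(*child_preds)]


def rmse(y_true, y_pred):
    """Root mean squared error between reference and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up predictions from three child models for three samples.
child_preds = [
    [1.0, 2.0, 3.0],  # child model A
    [1.2, 2.2, 2.8],  # child model B
    [2.0, 1.8, 3.5],  # child model C
]
y = [1.0, 2.0, 3.0]  # made-up reference values

for how in ("mean", "median"):
    fused = aggregate(child_preds, how)
    print(how, fused, rmse(y, fused))
```

The error is computed once, on the aggregated predictions, rather than by averaging the child models' individual errors.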

Jackknife aggregation uses a leave-one-model-out, cross-model validation approach to generate the prediction. Assume an ENSEMBLE is composed of N child models, and that it is applied to a newly collected sample. First, each child model's prediction is recorded, resulting in a 1xN vector. Then, for each model j in the ENSEMBLE, model j's prediction is replaced by the median of the remaining N-1 models' predictions. The final prediction is the median of these N replaced values.
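The jackknife steps for a single sample can be sketched as below. This is a Python illustration of one reading of the description (leave-one-model-out medians, followed by a final median as stated for the 'jackknife' aggregation option), not the PLS_Toolbox code; the function name `jackknife_aggregate` is hypothetical.

```python
import statistics


def jackknife_aggregate(preds):
    """Jackknife-style aggregation for one sample.

    preds: list with one prediction per child model (the 1xN vector).
    """
    if len(preds) < 2:
        raise ValueError("need predictions from at least two child models")
    # Replace model j's prediction with the median of the other N-1 predictions.
    replaced = [
        statistics.median(preds[:j] + preds[j + 1:])
        for j in range(len(preds))
    ]
    # Final prediction: median of the N replaced values.
    return statistics.median(replaced)


# Made-up predictions from four child models; the last one is an outlier.
print(jackknife_aggregate([1.0, 2.0, 3.0, 10.0]))
```

Because every intermediate value is itself a median of the other models' predictions, a single outlying child model has limited influence on the result.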

Inputs

  • models = cell array of previously generated models (when calibrating the ensemble),
  • x = X-block (predictor block), class "double" or "dataset", containing numeric values (to be applied to a calibrated ensemble),
  • y = Y-block (predicted block), class "double" or "dataset", containing numeric values (to be applied to a calibrated ensemble),
  • model = previously generated ENSEMBLE model (when applying the model to new data).

Outputs

  • model = a standard model structure with the following fields (see Standard Model Structure):
    • modeltype: 'ENSEMBLE',
    • datasource: structure array with information about input data,
    • date: date of creation,
    • time: time of creation,
    • info: additional model information,
    • pred: 2-element cell array with
      • model predictions for each input block (when options.blockdetail='normal', x-block predictions are not saved and that cell will be an empty array),
    • detail: sub-structure with additional model details and results, including:
      • model.detail.ensemble.children: cell array of previously generated models.
  • pred = a structure, similar to model, for the new data.

Options

options = a structure array with the following fields:

  • algorithm : [{'fusion'}] Mode of ensemble creation. 'fusion' takes previously calibrated models and aggregates their predictions to create the ensemble.
  • aggregation : [{'mean'} | 'median' | 'jackknife'] Mode of aggregation to use for predictions. 'mean' takes, for every sample, the mean of the child models' predictions; 'median' takes the median instead. 'jackknife' uses a median leave-one-model-out approach, followed by another median for the final prediction.

See Also

ensemblesearch