Ensemblesearch
Purpose
Search for the optimal ensemble using an nchoosek method.
Synopsis
- results = ensemblesearch(models,mink,maxk,aggregation);
Description
Creating the best ensemble from a set of candidate child models can take time. ensemblesearch finds the best ensemble from the provided child models. The algorithm uses an nchoosek approach: it creates every combination of child models of size mink to maxk and tests the performance of each resulting ensemble. Including all N candidate models does not necessarily give the best ensemble; this function may reveal a subset of K candidate models that gives a lower error. An ambiguity value, which is a measure of the ensemble's diversity, is also calculated for each ensemble. Model diversity is achieved by providing child models that vary in preprocessing, included variables, number of factors, degree of over- or underfitting, and model type. It is recommended to pick the ensemble with the lowest error, minimal overfitting, and high diversity.
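The combination search described above can be illustrated with a minimal Python sketch. This is not the toolbox implementation: the function names `rmse` and `search_ensembles`, the plain-list data layout, and the example numbers are all hypothetical, and a plain RMSE against known reference values stands in for the cross-validated error that the real function reports.

```python
from itertools import combinations
from math import comb, sqrt
from statistics import mean


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def search_ensembles(predictions, y_true, mink, maxk):
    """Score every subset of candidate models with size mink..maxk.

    predictions: one list of per-sample predictions per candidate model.
    Returns (error, member_indices) tuples sorted by ascending error.
    """
    scored = []
    for k in range(mink, maxk + 1):
        for members in combinations(range(len(predictions)), k):
            # Mean aggregation: average the members' predictions per sample.
            ensemble_pred = [
                mean(predictions[m][i] for m in members)
                for i in range(len(y_true))
            ]
            scored.append((rmse(y_true, ensemble_pred), members))
    return sorted(scored)


# Hypothetical data: three candidate models, four samples.
y_true = [1.0, 2.0, 3.0, 4.0]
predictions = [
    [1.2, 2.2, 3.2, 4.2],  # model 0: biased high
    [0.8, 1.8, 2.8, 3.8],  # model 1: biased low
    [1.5, 2.5, 3.5, 4.5],  # model 2: biased further high
]
ranked = search_ensembles(predictions, y_true, mink=2, maxk=3)
best_error, best_members = ranked[0]

# Number of ensembles tested: sum of C(N, k) for k = mink..maxk.
n_ensembles = sum(comb(len(predictions), k) for k in range(2, 4))
```

In this toy data the pair of models 0 and 1 beats the full three-model ensemble because their opposite biases cancel, which is the situation the description refers to. The number of ensembles grows quickly with N and maxk, so a narrow mink to maxk range keeps the search affordable.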
Inputs
- models = cell array of previously calibrated models to use as candidate child models,
- mink = integer indicating minimum number of child models that comprise the ensemble,
- maxk = integer indicating maximum number of child models that comprise the ensemble,
- aggregation : [{'mean'} | 'median'] Mode of aggregation to use for predictions. 'mean' takes, for every sample, the mean of the child models' predictions; 'median' takes the median instead.
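The difference between the two aggregation modes can be shown with a short Python sketch. The `aggregate` helper and the numbers are hypothetical and only illustrate the per-sample arithmetic, not how the toolbox computes it.

```python
from statistics import mean, median

# Hypothetical predictions for two samples from three child models;
# model 2 is far off on the first sample.
child_predictions = [
    [10.0, 5.0],  # model 0
    [11.0, 5.5],  # model 1
    [30.0, 6.0],  # model 2
]


def aggregate(child_predictions, aggregation="mean"):
    """Combine per-model predictions into one prediction per sample."""
    combine = {"mean": mean, "median": median}[aggregation]
    # zip(*...) regroups the values so each tuple holds one sample's predictions.
    return [combine(sample) for sample in zip(*child_predictions)]


mean_pred = aggregate(child_predictions, "mean")      # [17.0, 5.5]
median_pred = aggregate(child_predictions, "median")  # [11.0, 5.5]
```

The median is less affected by a single child model that predicts far from the others, while the mean uses every prediction equally.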
Outputs
- results = a structure with the following fields:
  - bestmodel: ENSEMBLE model object with the minimum RMSECV from the search,
  - bestcandidateindices: indices into the input models cell array of the child models that make up bestmodel,
  - dso: dataset object of results from the search,
  - booldso: boolean dataset object indicating which candidates are present in each ensemble from the search.
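To make the relationship between candidate indices and a boolean presence table concrete, here is a small Python sketch. The layout (one row per ensemble, one column per candidate model) is an assumption for illustration; the actual arrangement of booldso is not specified here, and `presence_table` is a hypothetical helper.

```python
from itertools import combinations


def presence_table(n_models, mink, maxk):
    """Build one boolean row per ensemble; True marks an included candidate."""
    rows = []
    for k in range(mink, maxk + 1):
        for members in combinations(range(n_models), k):
            rows.append([i in members for i in range(n_models)])
    return rows


table = presence_table(n_models=3, mink=2, maxk=3)

# Recover the member indices of the first ensemble from its boolean row.
# Indices are 0-based here; MATLAB indices would be 1-based.
first_members = [i for i, present in enumerate(table[0]) if present]
```

With three candidates and sizes 2 to 3, the table has four rows: the three pairs followed by the full set.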