EVRIModel Objects

From Eigenvector Research Documentation Wiki
Model objects have three distinct states:

# [[#Building_from_Uncalibrated_Model_Objects|'''Empty Models''']] - Empty models can be populated with data to analyze, "meta parameters" (model building settings), and other modeling options, then models can be calibrated or built from those settings.
# [[#Working_With_Calibrated_Models|'''Calibrated Models''']] - Calibrated models contain all the model results and parameters necessary to apply that model to new data. Plots and other information can be obtained from calibrated models.
# [[#Working_With_Applied_Models_.28Predictions.29|'''Applied Models''']] - When a calibrated model is applied to new data, the result is a prediction or "applied model". This object contains all the results from applying the model to the new data. Plots and other information can be obtained from applied models.


In addition, a number of [[#General_Model_Properties_and_Methods|general properties and methods]], which are useful when working with EVRIModel objects, are available in all model states.

==Working with Model Objects in Matlab and Solo Scripting==


EVRIModels are standard Matlab objects which are manipulated using the dot notation to access properties and methods. For example, to retrieve the "model type" (modeltype) property from a model, you give the object (a.k.a. variable) name followed by .modeltype. All examples here will assume that the model is stored in a variable named "model".


<pre>model.modeltype</pre>
Most object methods can be accessed in the same way:
<pre>model.plotscores</pre>
Some methods (<tt>.apply</tt> and <tt>.crossvalidate</tt>, for example) also require additional inputs. These are passed in parentheses after the method name:
<pre>model.apply(newdata)</pre>


===Displaying Contents===
At the Matlab command line (but not in Solo Scripting), you can view the contents of a model object by simply typing its name or by using the <tt>.disp</tt> method. The display can include two views of the model:
# By Description (Desc.) : this view shows you a text description of the type of model, how it was built, and a summary of its results.
# By Contents : this view contains the raw field information from the model. Users of previous versions of PLS_Toolbox will recognize this as the previous standard display.


At the Matlab command window, you can turn either of these views on or off by clicking the [on] or [off] hyperlinks in the top display line (shown as <font color="#0000ee"><u>underlined blue</u></font> text below):
 
    PCA Model Object (Desc. ON/<font color="#0000ee"><u>[off]</u></font>  Contents ON/<font color="#0000ee"><u>[off]</u></font>)
 
==Building from Uncalibrated Model Objects==
 
When a model object has been initially created, it contains no data and no results. Many model objects' properties can then be populated with data, meta-parameters, and other settings (options) which can then be used with the <tt>.calibrate</tt> method to build a calibrated model. The <tt>.inputs</tt> property lists the specific properties that can be set for a given model type.
 
:'''NOTE:''' Some model types do NOT support calibration in this manner. In these cases, use the <tt>.cancalibrate</tt> property to determine if it allows calibration directly (1) or if it requires a call to the function named in ''modeltype'' (0). In addition, the model will clearly show the state in its display at the command line with a statement to "See _____ function to calibrate." In these cases, the only way to create a calibrated model is to access the named function directly.
 
===Example===
 
The following is an example which would build a PCA model from the data stored in the <tt>data</tt> variable with 3 principal components:
 
<pre>
model = evrimodel('pca');
model.x = data;
model.ncomp = 3;
model = model.calibrate;
</pre>
 
===Uncalibrated Model Properties===
 
The properties of an uncalibrated model depend on the model type. Typically, a value can be provided for the data to model, plus some number of "meta-parameters" which define aspects of how the model will be built. The available values are listed by the <tt>.inputs</tt> property. All models which are calibratable (<tt>.cancalibrate</tt> is equal to 1) allow modification of the <tt>.display</tt> and <tt>.plots</tt> properties.
 
The properties available for a given calibratable model type will correspond to the function of the same name as the model type. For example, the "LWR" model type has the properties: <tt>x</tt>, <tt>y</tt>, <tt>ncomp</tt>, and <tt>npts</tt>. These are identical to the inputs listed for the LWR function as described on the inputs section of the [[Lwr#Inputs|LWR documentation page]].
 
The properties which are generally available for all model types are listed below.
 
====Model Status Properties (Read-Only)====
 
{| border="1" cellpadding="5" cellspacing="0"  style="margin-left:3em"
|-
|valign="top" |
<tt>.cancalibrate</tt>
| Returns (1) if the model contains a model-building definition (see the Empty Models description above), or (0) if the model does not contain a definition and must be calibrated using the function named in the <tt>.modeltype</tt> property.
|-
|valign="top" |
<tt>.inputs</tt>
| Returns a cell array of strings indicating which properties can be set for the model in its current state. Most often this is used when a model is in an uncalibrated state and this property will indicate what parameters and data fields are available to the user to assign before calibrating the model.
|-
|valign="top" |
<tt>.validmodeltypes</tt>
| Returns a cell array of strings listing the model types which are currently valid for assignment to the <tt>.modeltype</tt> field.
|}
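As a quick check before populating a model, the status properties can be queried as sketched below. This assumes PLS_Toolbox (or Solo) is on the Matlab path and uses PCA only as an example model type. The exact strings returned by <tt>.inputs</tt> and <tt>.validmodeltypes</tt> depend on the installed version.

```matlab
% Assumes PLS_Toolbox/Solo is installed and on the Matlab path.
model = evrimodel('pca');

if model.cancalibrate
  % Cell array of the data fields and meta-parameters that can be set
  % before calling .calibrate (for PCA this includes x and ncomp)
  disp(model.inputs)
else
  % This model type must be built by calling the function named in .modeltype
  fprintf('See the %s function to calibrate.\n', model.modeltype);
end

% Model types that may be assigned to the .modeltype property
types = model.validmodeltypes;
```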
 
&nbsp;


====Modifiable Properties====
{| border="1" cellpadding="5" cellspacing="0"  style="margin-left:3em"
|-
|valign="top" |
<tt>.modeltype</tt>
| Returns the short "keyword" model type of the current model (or empty string if the model type has not been set). This keyword most often is linked to the PLS_Toolbox function that created the given model. This can be assigned to any model type listed in the <tt>.validmodeltypes</tt> property.
|-
|valign="top" |
<tt>.display</tt>
| String property indicating 'on' if command-line display should be given when calibrating or applying a model and 'off' if no display should be given.
* ''''on'''' : Display command-line output
* ''''off'''' : Do not display any output
|-
|valign="top" |
<tt>.plots</tt>
| String property indicating 'final' if plots should be displayed after calibrating or applying a model and 'none' if no plots should be displayed.
* ''''final'''' : Generate plots (if possible)
* ''''none'''' : Do not generate any plots
|-
|valign="top" |
<tt>.options</tt>
| Structure array with modifiable fields specific to each model object. For example,
<tt>.options.confidencelimit = 0.99;</tt> sets the default confidence limit for the model to 99%.
|}
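As a sketch, the modifiable properties can be set on an empty model before calibration to suppress output and change the confidence limit. The random <tt>data</tt> matrix is only a placeholder for real data, and PLS_Toolbox is assumed to be installed. <tt>confidencelimit</tt> is the option named in the table above; other option fields vary by model type.

```matlab
% Placeholder data: 30 samples x 6 variables
data = randn(30, 6);

model = evrimodel('pca');
model.display = 'off';                 % no command-line output
model.plots   = 'none';                % no plots after calibrating
model.options.confidencelimit = 0.99;  % 99% confidence limits
model.x = data;
model.ncomp = 2;
model = model.calibrate;
```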


&nbsp;
 
===Uncalibrated Model Methods===
 
Both of the methods below return a model object. In Matlab, when no output is requested, the returned model is stored back into the variable whose method was invoked. In Solo Scripting, these methods require an output variable, usually the same variable that holds the model being built. For example: <tt>m = m.calibrate</tt>
 
{| border="1" cellpadding="5" cellspacing="0"  style="margin-left:3em"
|-
|valign="top" |
<tt>.calibrate</tt>
| Build the model based on the current meta-parameters and options.
|-
|valign="top" |
<tt>.crossvalidate(''cvi'',''ncomp'')</tt>
| Build the model and cross-validate it with the supplied conditions. ''cvi'' is the cross-validation splitting as described for cvi in [[crossval]] (default = venetian blinds, with the number of splits equal to the square root of the number of samples). ''ncomp'' is the number of components (default = maximum number available).
|}
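For example, an empty PCA model could be built and cross-validated in a single step, as sketched below. The <tt>{'vet', 5}</tt> value (venetian blinds with 5 splits) is an assumption about the <tt>cvi</tt> format; check the [[crossval]] page for the forms it accepts. The data are random placeholders, and PLS_Toolbox is assumed to be installed.

```matlab
data = randn(40, 8);               % placeholder calibration data

model = evrimodel('pca');
model.x = data;
model.ncomp = 3;

% Venetian blinds with 5 splits (cvi format assumed; see crossval),
% cross-validating up to 3 components
model = model.crossvalidate({'vet', 5}, 3);
```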
 
&nbsp;
 
==Working With Calibrated Models==
 
Once calibrated, a model object contains all the results (relevant to the model type) derived from the modeled data. The object also has all the information necessary to apply that model to new data. For many models, methods exist for plotting parts of the model (scores, loadings, eigenvalues, etc.).
 
Whether or not a model has been calibrated can be determined by the <tt>.iscalibrated</tt> property which will be true (1) when the model is calibrated. If the object is a prediction from a model, its <tt>.isprediction</tt> property will be true (1) indicating it cannot be applied to new data (only the original, calibrated model can be applied).
 
===Example===
 
Given a model which has already been calibrated (either by using the calibrate method, or by calling one of the [[Modeling_Function_Overview|model building functions]] directly), the following would produce a plot of the scores for the model:
 
  model.plotscores
 
or obtain a DataSet object containing those scores:
 
  dso = model.plotscores;
 
Extracting the Hotelling's T<sup>2</sup> statistic for the first 5 samples would be done using:
 
  model.t2(1:5)
 
Applying the model to new data in the variable <tt>x_new</tt> could be done using:
 
  prediction = model.apply(x_new);
 
===Calibrated Model Properties===
 
The properties available in a calibrated model depend on the model type. Many of the properties are listed in the [[Standard Model Structure]] documentation. In Matlab, all available fields can be found using "tab completion" (type the name of the variable containing the model plus a period, then press the [Tab] key) or by using the fieldnames() function.
 
In addition to the properties (fields) listed in the Standard Model Structure information, the following "shortcut" properties give easy access to values that are usually embedded deeper in the object, or to other model analysis methods. Not all of these properties exist for all model types. See also the <tt>.display</tt> and <tt>.plots</tt> properties described in the [[#Modifiable_Properties| Uncalibrated Model Properties (Modifiable Properties) section]].

{| border="1" cellpadding="5" cellspacing="0" style="margin-left:3em"
|-
|valign="top" |
<tt>.componentnames</tt>
| Component names can be added to PCA, MCR, PURITY, and PARAFAC models after the model has been calculated. Recalculating the model clears the component names, so it is recommended to name components at the end of the model-building process. Component names help identify the components of exported models.
<pre>model.componentnames = {'Low Group 1' 'Low Zirconium' 'High Yttrium'};</pre>
|-
|valign="top" |
<tt>.detail</tt>
| As described in the [[Standard Model Structure]] pages, this field contains model-specific statistics, results, and parameters of the model. The contents are highly varied. For ease of use, any field within the <tt>.detail</tt> property can be accessed ''without'' the <tt>.detail</tt> prefix (i.e., by requesting the value directly from the "top-level" model object). For example, <tt>model.preprocessing</tt> is identical to <tt>model.detail.preprocessing</tt>.
|-
|valign="top" |
<tt>.esterror</tt>
| Returns the error estimate for each sample as defined in [[ils_esterror]]. When used with a prediction (when <tt>.isprediction</tt> is true), the prediction must have either been calculated using the <tt>.apply</tt> method or the original model must be passed as an input:
::<tt> pred.esterror(model)</tt>
|-
|valign="top" |
<tt>.loadings</tt>
| Returns the x-block loadings as a simple matrix (equivalent to <tt>.loads{2,1}</tt>).
|-
|valign="top" |
<tt>.ncomp</tt>
| Returns the number of components (PCs, LVs, etc) used in the model. For model types that do not have an adjustable parameter for number of components, a value of one (1) will be returned.
|-
|valign="top" |
<tt>.reg</tt>
| Returns the regression vector for model types: MLR, PCR, PLS, and PLSDA. For PLS-2 models, <tt>.reg</tt> will return a cell array with regression coefficients.
|-
|valign="top" |
<tt>.prediction</tt>
| Returns the property most associated with "predictions" for the given model type. Model types are:
* Decomposition (PCA, MCR, etc) - returns x-block scores for each sample (<tt>.loads{1,1}</tt>)
* Regression (PLS, PCR, SVM, etc) - returns y-block predictions (known as y_hat, usually <tt>.pred{2}</tt>)
* Classification (PLSDA, SVMDA, KNN, etc) - returns the single-class assignment for each sample as a class ID string (<tt>.classification.inclass</tt> indexed into the class ID lookup <tt>.classification.classids</tt>)
|-
|valign="top" |
<tt>.predictionlabel</tt>
| Returns the labels associated with each column of the <tt>.prediction</tt> property (see above).
|-
|valign="top" |
<tt>.q</tt>
| Returns the x-block sum squared residuals for each sample (<tt>.ssqresiduals{1}</tt>). If the ''reducedstats'' property is set to 'on', the value returned by this property is normalized to the confidence limit stored in the <tt>model.detail.reslim{1}</tt> field.
|-
|valign="top" |
<tt>.qcon</tt>
| ''See methods below''
|-
|valign="top" |
<tt>.scoredistance</tt>
| Returns the normalized k nearest neighbor score distance. This distance is used to detect "inliers" which are samples that fall within the Q and T2 limits, but are in an unusual area of score space with few other close samples. The value is as defined in [[knnscoredistance]] except that the values returned by this property are normalized to the maximum value observed for all included calibration samples. Thus, a value of 1 for a sample means that the given sample is as far away from any calibration sample(s) as was the furthest (most unusual) calibration sample. Can optionally take an integer input to indicate the number of neighbors (k) to calculate distance from. Default for k is defined in [[knnscoredistance]] and is usually 3.
::<tt> model.scoredistance(1)</tt>
When used with a prediction (when <tt>.isprediction</tt> is true), the prediction must have either been calculated using the <tt>.apply</tt> method or the original model must be passed as an input, with or without a value for k:
::<tt> pred.scoredistance(model)</tt>
::<tt> pred.scoredistance(model,1)</tt>

KNN Distance with k=1 is equivalent to the Nearest Neighbor Distance described in the ASTM standard D6122-06 "Standard Practice for Validation of the Performance of Multivariate Process Infrared Spectrophotometers" Section A3 Outlier Detection Methods sub-section A3.4 Nearest Neighbor Distance.
|-
|valign="top" |
<tt>.scores</tt>
| Returns the x-block scores for each sample (<tt>.loads{1,1}</tt>)
|-
|valign="top" |
<tt>.t2</tt>
| Returns the Hotelling's T<sup>2</sup> for the x-block (<tt>.tsqs{1}</tt>). If the ''reducedstats'' property is set to 'on', the value returned by this property is normalized to the confidence limit stored in the <tt>model.detail.tsqlim{1}</tt> field.
|-
|valign="top" |
<tt>.tcon</tt>
| ''See methods below''
|-
|valign="top" |
<tt>.uniqueid</tt>
| Returns a unique ID identifying this model based on its model type, author, and build time/date.
|-
|valign="top" |
<tt>.x</tt>
| Returns the original x-block data (when available)
|-
|valign="top" |
<tt>.xhat</tt>
| Returns the reconstructed x-block (x_hat, see [[datahat]])
|-
|valign="top" |
<tt>.y</tt>
| Returns the original y-block data (when available)
|-
|valign="top" |
<tt>.yhat</tt>
| Returns the estimated y-block (y_hat, as estimated by the model)
|}
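To make the normalization used by <tt>.scoredistance</tt> more concrete, the plain Matlab function below (no toolbox required) is a hypothetical sketch. It is not the [[knnscoredistance]] implementation. It assumes that the raw distance is the mean Euclidean distance from a sample to its k nearest calibration samples in score space; with k = 1, this reduces to the nearest-neighbor distance. The function name <tt>knn_score_distance_sketch</tt> is invented for illustration, and the sketch ignores sample exclusion. Only the division by the largest calibration distance follows the description above.

```matlab
function d = knn_score_distance_sketch(Tcal, Tnew, k)
%KNN_SCORE_DISTANCE_SKETCH Hypothetical illustration, NOT knnscoredistance.
%  ASSUMPTION: raw distance = mean Euclidean distance to the k nearest
%  calibration samples in score space, divided by the largest such value
%  among calibration samples (so the most unusual calibration sample = 1).
%    Tcal : calibration scores (ncal x ncomp)
%    Tnew : scores to evaluate (nnew x ncomp); [] returns calibration values
%    k    : number of neighbors (default 3)
if nargin < 2, Tnew = []; end
if nargin < 3 || isempty(k), k = 3; end

% Raw distance of each calibration sample to the OTHER calibration samples
ncal = size(Tcal, 1);
dcal = zeros(ncal, 1);
for i = 1:ncal
  dist = sqrt(sum(bsxfun(@minus, Tcal, Tcal(i, :)).^2, 2));
  dist(i) = [];                            % a sample is not its own neighbor
  dist = sort(dist);
  dcal(i) = mean(dist(1:min(k, numel(dist))));
end
scale = max(dcal);                         % furthest calibration sample -> 1

if isempty(Tnew)
  d = dcal / scale;
  return
end

% New samples: distance to the k nearest calibration samples, same scaling
nnew = size(Tnew, 1);
d = zeros(nnew, 1);
for i = 1:nnew
  dist = sort(sqrt(sum(bsxfun(@minus, Tcal, Tnew(i, :)).^2, 2)));
  d(i) = mean(dist(1:min(k, numel(dist)))) / scale;
end
end
```

Under this convention, a new sample with a value above 1 is farther from its calibration neighbors than even the most isolated calibration sample was.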
 
The following properties modify the behavior of the properties and methods of a calibrated model:
 
{| border="1" cellpadding="5" cellspacing="0" style="margin-left:3em"
|-
|valign="top" |
<tt>.matchvars</tt>
| Governs the use of variable alignment when applying the model to new data via the <tt>.apply()</tt> method.
* 'on' = new data will be aligned to model before application.
* 'off' = if the new data variables do not match the model's expected variables, an error will be thrown.
|-
|valign="top" |
<tt>.contributions</tt>
| Governs the detail of the T^2 and Q contributions returned by the <tt>.tcon</tt> and <tt>.qcon</tt> properties. Return contributions for:
* 'passed' = only the variables passed by the client, in the order passed. This mode allows the client to easily map contributions back to the passed data and is the preferred mode.
* 'used' = all variables used by the model, including variables which the client did not provide. Variable order is that used by the model and may not match the order passed by the client.
* 'full' = all variables used or excluded by the model, including variables which the client did not provide. Variable order is that used by the model and may not match the order passed by the client.
|-
|valign="top" |
<tt>.reducedstats</tt>
| Governs whether the Q and T^2 statistics returned by the <tt>.q</tt> and <tt>.t2</tt> properties are "reduced" using the confidence limits stored in the <tt>model.detail.reslim</tt> and <tt>model.detail.tsqlim</tt> fields.
* 'on' = statistics are normalized using the value stored in the appropriate detail sub-field.
* 'off' = statistics are returned as calculated.
|}
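The sketch below shows how these behavior-modifying properties might be set on a calibrated PCA model. It assumes PLS_Toolbox is installed and uses random placeholder data. The comments describing the reduced values as a division by the stored limits are an assumption; the table above only says the statistics are "normalized" to those limits.

```matlab
xcal = randn(30, 6);                  % placeholder calibration data
model = evrimodel('pca');
model.x = xcal;
model.ncomp = 2;
model = model.calibrate;

q_raw = model.q;                      % Q residuals as calculated
model.reducedstats = 'on';
q_red  = model.q;                     % Q normalized to the stored Q limit (assumed: divided)
t2_red = model.t2;                    % T^2 normalized to the stored T^2 limit
qlim   = model.detail.reslim{1};      % Q limit used for the reduction

model.matchvars = 'on';               % align new data to the model in .apply
model.contributions = 'passed';       % contributions only for variables passed in
```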


===Calibrated Model Methods===
The following methods are available when a model has been calibrated.
{| border="1" cellpadding="5" cellspacing="0"  style="margin-left:3em"
|-
|valign="top" |
<tt>.apply(x_new,''y_new'',''options'')</tt>       
| Applies the model to the data <tt>x_new</tt> and returns a [[#Working_With_Applied_Models_.28Predictions.29|prediction object]]. If <tt>''y_new''</tt> is supplied (and is appropriate for the model type, e.g. the model is a regression model), this data will be used as validation/test values to compare predictions against. If <tt>''options''</tt> is supplied, it is passed into the model prediction function (allowing modification of some parameters).
:'''Note:''' The [[#Modifiable_Properties|<tt>.plots</tt> and <tt>.display</tt> properties]] of a model will be used when using this method. If enabled, the method will show plots and/or command-line display as requested. The values in these properties will always override any values passed in the <tt>''options''</tt> input to the <tt>.apply</tt> method.
|-
|valign="top" |
<tt>.crossvalidate(x,''cvi'',''ncomp'')</tt>       
| If cross-validation was not performed when the model was built, this method can be used to cross-validate the model (with the conditions used to build it) and store the results in the model object. This is similar to the method of the same name in the [[#Uncalibrated_Model_Methods|Uncalibrated Model Methods]], except that it generally ''requires'' the x-block data to be provided as an input (since most models do not keep the original calibration x-block data in the calibrated model structure). The <tt>''cvi''</tt> and <tt>''ncomp''</tt> inputs are [[#Uncalibrated_Model_Methods|as defined above]].
:'''Note:''' The [[#Modifiable_Properties|<tt>.plots</tt> and <tt>.display</tt> properties]] of a model will be used when using this method. If enabled, the method will show plots and/or command-line display as requested.
|-
|valign="top" |
<tt>.ssqtable .ssqtext .ssqcell</tt>
| Return the variance captured table in different formats (MATLAB table object, raw text, or cell array, respectively).
|-
|valign="top" |
<tt>.ploteigen</tt>   
|With no outputs, this method generates a plot of the eigenvalues or other  statistics associated with changing the number of components in the model (e.g. RMSEC, misclassification rates) for the given model. With an output, no plot is generated but the DataSet object containing the data that would have been plotted is returned.
|-
|valign="top" |
<tt>.plotloads</tt>   
|With no outputs, this method generates a plot of the loadings (including all variable-specific statistics and results) for the given model. With an output, no plot is generated but the DataSet object containing the loadings is returned.
|-
|valign="top" |
<tt>.plotscores</tt>   
| With no outputs, this method generates a plot of the scores (including all sample-specific statistics and results) for the given model. With an output, no plot is generated but the DataSet object containing the scores is returned.
|-
|valign="top" |
<tt>.qcon(x)</tt>
| Returns the Q contributions (matrix of x-block residuals for each sample). For most model types, this method ''requires'' input of the x-data for which the q residuals should be calculated. (see [[qconcalc]])
|-
|valign="top" |
<tt>.tcon(''x'')</tt>
| Returns the Hotelling's T<sup>2</sup> contributions (scaled matrix of x-block projections into the model for each sample.) If the ''x'' input is omitted, the contributions for the calibration data are returned. If ''x'' is supplied, the contributions for the supplied x-data are calculated. (see [[tconcalc]])
|}
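A minimal sketch of these methods on a calibrated PCA model follows. The data are random placeholders, PLS_Toolbox is assumed to be installed, and the exact sizes and formats of the returned values may vary by model type.

```matlab
xcal = randn(30, 6);                  % placeholder calibration data
xnew = randn(10, 6);                  % placeholder new data (same variables)

model = evrimodel('pca');
model.x = xcal;
model.ncomp = 2;
model.plots = 'none';                 % .apply honors these settings
model.display = 'off';
model = model.calibrate;

pred  = model.apply(xnew);            % applied model (prediction) for xnew
eigds = model.ploteigen;              % with an output: DataSet, no plot
qc    = model.qcon(xnew);             % Q contributions (x-data required)
tc    = model.tcon;                   % T^2 contributions for the calibration data
ssq   = model.ssqtext;                % variance captured table as raw text
```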


&nbsp;

==Working With Applied Models (Predictions)==

When a model is applied to new data, the output is an applied model, also known as a prediction object. The object type itself is still an EVRIModel Object and nearly all of the methods and properties that were available [[#Working_With_Calibrated_Models|when working with a calibrated model]] are available with an applied model. The most notable difference is that any plots or sample-specific results extracted from the model will be for the data to which it was applied instead of the calibration data. For example, when a model which calculates scores is applied to new data, the resulting EVRIModel Object will contain a <tt>.scores</tt> property that is the scores calculated for the new data.

Whether a model object is a calibrated model or a model prediction can be determined by looking at the <tt>.isprediction</tt> field. Note that a prediction object cannot be applied to new data. Only the original model can be applied. However, if a model has been applied using the <tt>.apply</tt> method of a model, the original model is generally stored in the <tt>.parent</tt> field so the model could be re-applied using:  <tt>pred2 = pred.parent.apply(x_new2)</tt> where <tt>x_new2</tt> is new(er) data to apply the original model to.


===Applied Model Properties===
{| border="1" cellpadding="5" cellspacing="0" style="margin-left:3em"
|-
|valign="top" |
<tt>.isprediction</tt>
| Returns (1) if the model contains a prediction from applying a calibrated model to new data and (0) if the model is just "calibrated" or "empty".
|-
|valign="top" |
<tt>.parent</tt>
| When a model has been applied to new data using the <tt>.apply</tt> method, this property will contain a copy of the original model object. The contents of this property are automatically used when a plotting method requires both the calibration and application data.
|}
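The following sketch walks through the typical life cycle of a prediction object. It applies a model, checks <tt>.isprediction</tt>, and re-applies the stored <tt>.parent</tt> model to newer data. Variable names and random data are placeholders, and PLS_Toolbox is assumed to be installed.

```matlab
xcal  = randn(30, 6);                 % placeholder calibration data
xnew  = randn(10, 6);                 % first batch of new data
xnew2 = randn(5, 6);                  % a later batch

model = evrimodel('pca');
model.x = xcal;
model.ncomp = 2;
model = model.calibrate;

pred = model.apply(xnew);             % prediction object (still an EVRIModel)
s = pred.scores;                      % scores for xnew, not the calibration data

% A prediction cannot be applied; re-apply the original model stored in .parent
pred2 = pred.parent.apply(xnew2);

% Statistics that need the calibrated model can also be given it explicitly
sd = pred.scoredistance(model, 1);    % kNN score distance with k = 1
```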


&nbsp;

==General Model Properties and Methods==

In addition to the properties and methods described above, the following properties and methods are always available in a model, independent of the model state or model type:


===Informational Properties (Read-Only)===
{| border="1" cellpadding="5" cellspacing="0"   style="margin-left:3em"
|-
|valign="top" |
<tt>.author</tt>
| String describing the author and computer on which this model was created. Usually ''user@computername''. Given a system with assigned usernames and computer names, this is equivalent to an electronic signature on a model.
|-
|valign="top" |
<tt>.content</tt>
| Returns the "raw" model information in a form that is most similar to the model structures from previous versions of PLS_Toolbox and Solo. Generally, users need not access this field directly except to provide a model in a form more similar to old models.
|-
|valign="top" |
<tt>.downgradeinfo</tt>
| Informational string explaining the purpose of the <tt>.content</tt> field.
|-
|valign="top" |
<tt>.evrimodelversion</tt><br>
<tt>.modelversion</tt>
| Returns a string containing the model version description. The model version is almost always linked to the version of PLS_Toolbox or Solo that created the given model. The two field names here are synonymous.
|-
|valign="top" |
<tt>.info</tt>
| Returns (or displays, when no output is requested) the text description of the model. This is the same description shown at the Matlab command line when the model is viewed with the description (Desc.) view on. With an output, the results are returned as a cell array of strings.
|-
|valign="top" |
<tt>.isclassification</tt>
| Returns (1) if the model is a classification model that returns class assignments for unknowns or (0) if it is a decomposition or regression model type.
|-
|valign="top" |
<tt>.isyused</tt>
| Returns (1) if the model is using a 2-block (x-block and y-block) method.
|-
|valign="top" |
<tt>.uniqueid</tt>
| Returns a string which uniquely identifies this model including the author, author's computer, and a date/time stamp. This uniqueid can be used to safely discriminate between different models.
|-
|valign="top" |
<tt>.validmodeltypes</tt>
| Returns a cell array of strings listing the model types which are currently valid for assignment to the <tt>.modeltype</tt> field.
|}
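For instance, the informational properties can be used to label or log a model, as in the sketch below. PLS_Toolbox is assumed to be installed, and the random data only serve to produce a calibrated model.

```matlab
model = evrimodel('pca');
model.x = randn(20, 4);               % placeholder data
model.ncomp = 2;
model = model.calibrate;

desc    = model.info;                 % cell array of strings (text description)
id      = model.uniqueid;             % string distinguishing this model from others
creator = model.author;               % usually 'user@computername'
ver     = model.modelversion;         % version of the software that built the model
fprintf('%s (created by %s)\n', id, creator);
```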


&nbsp;

===General Methods===
{| border="1" cellpadding="5" cellspacing="0"  style="margin-left:3em"
|-
|valign="top" |
<tt>.disp</tt>
| Displays the contents of the model. There is no output variable from this method, it only displays the information. For access to the content, see the <tt>.info</tt> method.


{| border="1" cellpadding="5" cellspacing="0" align="left"
|-
|-
|valign="top" colspan="2"|
|valign="top" |
'''General Properties'''
<tt>.encode</tt>
| Returns m-script code which, when executed by Matlab with PLS_Toolbox, will regenerate the model contents (note: this code does not rebuild the model from raw data, but reconstitutes the content of the model from this text format description of the model.) See the [[encode]] function.
|-
|valign="top" |
<tt>.encodexml</tt>
| Returns xml descriptor of the model content. Parsing this content using the XML import functions of Solo or PLS_Toolbox (see [[parsexml]]) will regenerate the model contents from this text format description of the model.  See the [[encodexml]] function.
|-
|-
|valign="top" |
|valign="top" |
<tt>blalba</tt>
<tt>.help</tt>
| blabla
<tt>.help.predictions</tt>
| Alone without any additional sub-indexing, this method brings up the help which is most relevant for the particular model type. With the <tt>.predictions</tt> sub-field, this method returns [[Solo_Predictor_Script_Construction#Common_Return_Properties|a structure array of possible sub-fields]] that may be requested for certain properties of the current model.
|-
|-
|valign="top" |
<tt>.isnewmodel</tt>
| Test if model is newer than current version.
|}
|}


&nbsp;
&nbsp;


==Alphabetical List of Object Properties and Methods==
==Backwards Compatibility==


<pre>
In general, PLS_Toolbox and Solo models cannot be guaranteed to be backwards compatible to earlier versions of the software. This is because we may introduce a new preprocessing method, or numerical calculation option to an analysis method which simply doesn't exist in the earlier software. Although Eigenvector Research cannot guarantee that we won't make changes to our data or model formats that will "break" code which users have written, we do make every effort to make new code as compatible with old user code as much as practical. The EVRIModel Object has been similarly constructed and will, for the most part, behave much as the old PLS_Toolbox model structures. Indexing into fields and referencing models in code will appear almost identical.
apply         
 
author       
One notable exception is that the EVRIModel Object itself is stored in a format such that a model saved in the released version of PLS_Toolbox or Solo will ''not'' be readable by a version of the software released prior to the introduction of EVRIModel objects. The only way to extract the "simple" model structure format that existed prior to the EVRIModel object is to use the [[#General_Model_Properties_and_Methods|<tt>.content</tt> property]] of the model object. This will convert the top-level model into the basic format that is nominally readable by old versions. However, it is critical to note that other model format or algorithmic changes may make this backwards compatibility impossible.
calibrate     
 
cancalibrate 
An additional way to extract all model objects into their non-object form is to set the 'noobject' property to 1 (one).
content      
   setplspref('evrimodel','noobject',1)
crossvalidate 
All models (top-level models and any embedded models) loaded when that flag is enabled will be automatically extracted.
datasource   
 
date         
If backwards compatibility is truly needed, it is best to contact the [mailto:helpdesk@eigenvector.com Eigenvector Research Helpdesk].
detail       
display       
downgradeinfo 
evrimodelversion
help         
info         
iscalibrated 
isclassification
isprediction    
loadings     
modeltype     
ncomp         
parent       
ploteigen     
plotloads     
plots         
plotscores   
prediction   
q             
scores       
settings     
t2           
time         
uniqueid     
validmodeltypes
x             
xhat         
y             
yhat   
</pre>

Latest revision as of 09:01, 18 March 2024

Introduction

EVRIModel Objects provide access to the Standard Model Structure content of all models and provide some easy-to-use methods and properties for building, manipulating, and reviewing models from Matlab's command line, scripts, and functions. In addition, these properties and methods are available from Solo Scripting when using Solo_Predictor and Solo_Server. This page describes the various modes, methods, and properties of EVRIModel objects, here shortened to just "model objects".

Model objects have three distinct states:

  1. Empty Models - Empty models can be populated with data to analyze, "meta parameters" (model building settings), and other modeling options, then models can be calibrated or built from those settings.
  2. Calibrated Models - Calibrated models contain all the model results and parameters necessary to apply that model to new data. Plots and other information can be obtained from calibrated models.
  3. Applied Models - When a calibrated model is applied to new data, the result is a prediction or "applied model". This object contains all the results from applying the model to the new data. Plots and other information can be obtained from applied models.

In addition, a number of general properties and methods are available in all model states and are useful when working with EVRIModel objects.

Working with Model Objects in Matlab and Solo Scripting

EVRIModels are standard Matlab objects which are manipulated using the dot notation to access properties and methods. For example, to retrieve the "model type" (modeltype) property from a model, you give the object (a.k.a. variable) name followed by .modeltype. All examples here will assume that the model is stored in a variable named "model".

model.modeltype

Most object methods can be accessed in the same way:

model.plotscores

Some methods (.apply and .crossvalidate, for example) also require additional inputs. These are passed in parentheses after the method name:

model.apply(newdata)

Displaying Contents

At the Matlab command line (but not in Solo Scripting), you can view the contents of a model object by typing its name or by using the .disp method. The display offers two views of the model:

  1. By Description (Desc.) : this view shows you a text description of the type of model, how it was built, and a summary of its results.
  2. By Contents : this view contains the raw field information from the model. Users of previous versions of PLS_Toolbox will recognize this as the previous standard display.

At the Matlab command window, you can turn either of these sections on or off by clicking the [on] or [off] hyperlinks in the top display line (rendered as underlined blue links; shown in brackets below):

   PCA Model Object (Desc. ON/[off]  Contents ON/[off])

Building from Uncalibrated Model Objects

When a model object has been initially created, it contains no data and no results. Many model objects' properties can then be populated with data, meta-parameters, and other settings (options) which can then be used with the .calibrate method to build a calibrated model. The .inputs property lists the specific properties that can be set for a given model type.

NOTE: Some model types do NOT support calibration in this manner. Use the .cancalibrate property to determine whether a model can be calibrated directly (1) or requires a call to the function named in .modeltype (0). Such models also state this in their command-line display with the statement "See _____ function to calibrate." For these model types, the only way to create a calibrated model is to call the named function directly.

Example

The following example builds a PCA model with 3 principal components from the data stored in the variable data:

model = evrimodel('pca');
model.x = data;
model.ncomp = 3;
model = model.calibrate;

Uncalibrated Model Properties

The properties of an uncalibrated model depend on the model type. Typically, a value can be provided for the data to model, plus some number of "meta-parameters" which define aspects of how the model will be built. The list of values available is indicated by the .inputs property. All models which are calibratable (.cancalibrate is equal to 1) allow modification of the .display and .plots properties.

The properties available for a given calibratable model type will correspond to the function of the same name as the model type. For example, the "LWR" model type has the properties: x, y, ncomp, and npts. These are identical to the inputs listed for the LWR function as described on the inputs section of the LWR documentation page.

The properties which are generally available for all model types are listed below.

Model Status Properties (Read-Only)

.cancalibrate

Returns (1) if the model contains a model-building definition (see the Empty Models description above), or (0) if the model does not contain a definition and must be calibrated using the function named in the .modeltype property.

.inputs

Returns a cell array of strings indicating which properties can be set for the model in its current state. Most often this is used when a model is in an uncalibrated state and this property will indicate what parameters and data fields are available to the user to assign before calibrating the model.

.validmodeltypes

Returns a cell array of strings listing the model types which are currently valid for assignment to the .modeltype field.

 

Modifiable Properties

.modeltype

Returns the short "keyword" model type of the current model (or empty string if the model type has not been set). This keyword most often is linked to the PLS_Toolbox function that created the given model. This can be assigned to any model type listed in the .validmodeltypes property.

.display

String property indicating 'on' if command-line display should be given when calibrating or applying a model and 'off' if no display should be given.
  • 'on' : Display command-line output
  • 'off' : Do not display any output

.plots

String property indicating 'final' if plots should be displayed after calibrating or applying a model and 'none' if no plots should be displayed.
  • 'final' : Generate plots (if possible)
  • 'none' : Do not generate any plots

.options

Structure array with modifiable fields specific to each model object. For example, model.options.confidencelimit = 0.99; sets the default confidence limit for the model to 99%.

 

Uncalibrated Model Methods

Both of the methods below return a model object. In Matlab, when no output is requested, the result is stored back into the model object on which the method was invoked. In Solo Scripting, these methods require an output variable, usually the same model object being built from, for example: m = m.calibrate

.calibrate

Build the model based on the current meta-parameters and options.

.crossvalidate(cvi,ncomp)

Build the model and cross-validate it under the supplied conditions. cvi is the cross-validation splitting as described for cvi in crossval (default = venetian blinds with the number of splits equal to the square root of the number of samples). ncomp is the number of components (default = maximum number available).
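The default venetian-blinds splitting can be pictured with the short sketch below. It is an illustration only, not the crossval code. It assumes one sample per blind, so that sample i is left out in split mod(i-1,s)+1. It also assumes the split count is sqrt(n) rounded to the nearest integer. The value n = 10 is arbitrary.

```matlab
% Illustrative sketch only (not crossval): venetian-blinds assignment of
% n samples to s splits. Assumptions: one sample per blind, s = round(sqrt(n)).
n = 10;                             % example number of calibration samples
s = round(sqrt(n));                 % assumed number of splits (3 for n = 10)
split = mod((1:n)' - 1, s) + 1;     % sample i is left out in split mod(i-1,s)+1
for k = 1:s
  testRows = find(split == k);      % samples predicted in split k
  fprintf('split %d leaves out samples: %s\n', k, mat2str(testRows'));
end
```

Each sample is left out exactly once, and neighboring samples always fall into different splits.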

 

Working With Calibrated Models

Once calibrated, a model object contains all the results (relevant to the model type) derived from the modeled data. The object also has all the information necessary to apply that model to new data. For many models, methods exist for plotting parts of the model (scores, loadings, eigenvalues, etc.)

Whether or not a model has been calibrated can be determined by the .iscalibrated property which will be true (1) when the model is calibrated. If the object is a prediction from a model, its .isprediction property will be true (1) indicating it cannot be applied to new data (only the original, calibrated model can be applied).

Example

Given a model which has already been calibrated (either by using the calibrate method, or by calling one of the model building functions directly), the following would produce a plot of the scores for the model:

 model.plotscores

or obtain a DataSet object containing those scores:

 dso = model.plotscores;

Extracting the Hotelling's T2 statistic for the first 5 samples would be done using:

 model.t2(1:5)

Applying the model to new data in the variable x_new could be done using:

 prediction = model.apply(x_new);

Calibrated Model Properties

The properties available in a calibrated model depend on the model type. Many of the properties are listed in the Standard Model Structure documentation. In Matlab, all available fields can be found by using "tab completion" (type the name of the variable containing the model plus a period, then press the [Tab] key) or by using the fieldnames() function.

In addition to the properties (fields) listed in the Standard Model Structure information, the following properties are available for many models once they have been calibrated. They are "shortcuts" into the Standard Model Structure fields (properties usually embedded deeper in the object) or into other model analysis methods. Not all of these fields exist for all model types. See also the .display and .plots properties described in the Uncalibrated Model Properties (Modifiable Properties) section.

.componentnames

For models of type PCA, MCR, PURITY, and PARAFAC, component names can be added after the model is calculated. Because recalculating the model clears component names, it is recommended to name components at the end of the model-building process. Component names help identify the components of exported models.
model.componentnames = {'Low Group 1' 'Low Zirconium' 'High Yttrium'};

.detail

As described in the Standard Model Structure pages, this field contains model-specific statistics, results, and parameters of the model. The contents are highly varied. For ease of use, any field within the .detail property can be accessed without the .detail prefix (i.e., by requesting the value directly from the "top-level" model object). For example, model.preprocessing is identical to model.detail.preprocessing.

.esterror

Returns the error estimate for each sample as defined in ils_esterror. When used with a prediction (when .isprediction is true), the prediction must have either been calculated using the .apply method or the original model must be passed as an input:
pred.esterror(model)

.iscalibrated

Returns (1) if the model has been calibrated or applied and (0) if the model is in the "empty" state and has not been calibrated.

.loadings

Returns the x-block loadings as a simple matrix (equivalent to .loads{2,1}).

.ncomp

Returns the number of components (PCs, LVs, etc) used in the model. For model types that do not have an adjustable parameter for number of components, a value of one (1) will be returned.

.reg

Returns the regression vector for model types: MLR, PCR, PLS, and PLSDA. For PLS-2 models, .reg will return a cell array with regression coefficients.

.prediction

Returns the property most associated with "predictions" for the given model type. What is returned depends on the model category:
  • Decomposition (PCA, MCR, etc) - returns x-block scores for each sample (.loads{1,1})
  • Regression (PLS, PCR, SVM, etc) - returns y-block predictions (known as y_hat, usually .pred{2})
  • Classification (PLSDA, SVMDA, KNN, etc) - returns the single-class assignment for each sample as a class ID string (.classification.inclass indexed into the class ID lookup .classification.classids)
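For classification models, the lookup in the last bullet amounts to indexing the class ID list with the numeric assignments. The sketch below uses hypothetical values for inclass and classids. It assumes every sample has a nonzero assignment and does not show how unassigned samples are handled.

```matlab
% Illustrative sketch: turn numeric class assignments into class ID strings,
% in the spirit of indexing .classification.inclass into .classification.classids.
% Hypothetical values; every sample is assumed to have a nonzero assignment.
classids = {'Low', 'High'};    % hypothetical class ID lookup
inclass  = [2; 1; 2];          % hypothetical class index for each sample
labels   = classids(inclass);  % one class ID string per sample
disp(labels)
```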

.predictionlabel

Returns the labels associated with each column of the .prediction field (see above)

.q

Returns the x-block sum squared residuals for each sample (.ssqresiduals{1}). If the reducedstats property is set to 'on', the value returned by this property is normalized to the confidence limit stored in the model.detail.reslim{1} field.

.qcon

See methods below

.scoredistance

Returns the normalized k nearest neighbor score distance. This distance is used to detect "inliers" which are samples that fall within the Q and T2 limits, but are in an unusual area of score space with few other close samples. The value is as defined in knnscoredistance except that the values returned by this property are normalized to the maximum value observed for all included calibration samples. Thus, a value of 1 for a sample means that the given sample is as far away from any calibration sample(s) as was the furthest (most unusual) calibration sample. Can optionally take an integer input to indicate the number of neighbors (k) to calculate distance from. Default for k is defined in knnscoredistance and is usually 3.
model.scoredistance(1)

When used with a prediction (when .isprediction is true), the prediction must have either been calculated using the .apply method or the original model must be passed as an input, with or without a value for k:

pred.scoredistance(model)
pred.scoredistance(model,1)


KNN Distance with k=1 is equivalent to the Nearest Neighbor Distance described in the ASTM standard D6122-06 "Standard Practice for Validation of the Performance of Multivariate Process Infrared Spectrophotometers" Section A3 Outlier Detection Methods sub-section A3.4 Nearest Neighbor Distance.
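The normalization described above can be sketched as follows. This is not the knnscoredistance implementation. The sketch assumes the distance is the mean Euclidean distance in score space to the k nearest calibration samples, with a calibration sample excluded from its own neighbors. The scores are made-up values. Dividing by the largest calibration value is what makes 1 correspond to the most unusual calibration sample.

```matlab
% Illustrative sketch of a normalized k-nearest-neighbor score distance.
% Assumptions (not taken from knnscoredistance): Euclidean distance in score
% space, averaged over the k nearest calibration samples.
r = (1:20)';
Tcal = [sin(r), cos(2*r)];        % hypothetical calibration scores (20 samples, 2 PCs)
Tnew = [0 0; 6 6];                % hypothetical scores for two new samples
k = 3;                            % number of neighbors

% Pairwise Euclidean distances between the rows of A and the rows of B
pairdist = @(A, B) sqrt(sum((permute(A, [1 3 2]) - permute(B, [3 1 2])).^2, 3));

Dcal = pairdist(Tcal, Tcal);
Dcal(1:size(Dcal, 1) + 1:end) = Inf;   % a calibration sample is not its own neighbor
sortedCal = sort(Dcal, 2);
dCal = mean(sortedCal(:, 1:k), 2);     % raw distance for each calibration sample

sortedNew = sort(pairdist(Tnew, Tcal), 2);
dNew = mean(sortedNew(:, 1:k), 2);     % raw distance for each new sample

scale = max(dCal);                     % most unusual calibration sample
sdCal = dCal / scale;                  % largest calibration value becomes 1
sdNew = dNew / scale;                  % values above 1 are further out than any calibration sample
```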

.scores

Returns the x-block scores for each sample (.loads{1,1})

.t2

Returns the Hotelling's T2 for the x-block (.tsqs{1}). If the reducedstats property is set to 'on', the value returned by this property is normalized to the confidence limit stored in the model.detail.tsqlim{1} field.
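As a point of reference, the following sketch computes Hotelling's T2 for a simple PCA model in plain Matlab using svd. It assumes mean-centering is the only preprocessing and that the eigenvalues are the score variances. It illustrates the statistic only. The values returned by .t2 also depend on the model's preprocessing and on the reducedstats setting.

```matlab
% Illustrative sketch: Hotelling's T^2 for a mean-centered PCA model.
% Assumptions: mean-centering is the only preprocessing; eigenvalues are the
% score variances (n-1 denominator).
X = [2 4 1; 3 5 2; 4 4 3; 5 6 2; 6 7 4; 7 7 5];   % hypothetical 6x3 data
ncomp = 2;
n = size(X, 1);
Xc = X - mean(X, 1);                              % mean-center each column
[U, S, V] = svd(Xc, 'econ');                      % Xc = U*S*V'
T = U(:, 1:ncomp) * S(1:ncomp, 1:ncomp);          % scores
lambda = diag(S(1:ncomp, 1:ncomp)).^2 / (n - 1);  % eigenvalues (score variances)
t2 = sum(T.^2 ./ lambda', 2);                     % T^2 for each calibration sample
```

As a consistency check, the calibration T2 values of such a model always sum to (n-1) times the number of components.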

.tcon

See methods below

.uniqueid

Returns a unique ID identifying this model based on its model type, author, and build time/date.

.x

Returns the original x-block data (when available)

.xhat

Returns the reconstructed x-block (x_hat, see datahat)

.y

Returns the original y-block data (when available)

.yhat

Returns the estimated y-block (y_hat, as estimated by the model)

The following properties modify the behavior of the properties and methods of a calibrated model:

.matchvars

Governs use of variable alignment when applying the model to new data via the .apply() method.
  • 'on' = new data will be aligned to model before application.
  • 'off' = if the new data variables do not match the model's expected variables, an error will be thrown.
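The difference between the two settings can be sketched in plain Matlab with hypothetical variable names. The sketch aligns variables by label using ismember and stops with an error when a model variable is missing. This is an assumption: it is not the toolbox's alignment code, which may match on other criteria.

```matlab
% Illustrative sketch of variable alignment by label (hypothetical data).
% Assumption: variables are matched by name; the toolbox may use other criteria.
modelVars = {'Fe', 'Cu', 'Zn'};          % variables expected by the model
newVars   = {'Zn', 'Fe', 'Cu', 'Pb'};    % variables supplied with the new data
xnew      = [3 1 2 9; 30 10 20 90];      % columns follow newVars
matchvars = 'on';

[found, idx] = ismember(modelVars, newVars);
if strcmp(matchvars, 'on')
  if ~all(found)
    error('New data is missing model variables: %s', strjoin(modelVars(~found), ', '));
  end
  xaligned = xnew(:, idx);               % reorder columns and drop extra ones
elseif ~isequal(modelVars, newVars)
  error('Variables do not match the model and matchvars is ''off''.');
else
  xaligned = xnew;                       % already in the model's order
end
```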

.contributions

Governs the detail of the T^2 and Q contributions returned by the .tcon and .qcon methods. Contributions are returned for:

  • 'passed' = only the variables passed by the client, in the order passed. This mode allows the client to easily map contributions back to the passed data and is the preferred mode.
  • 'used' = all variables used by the model, including variables which the client did not provide. Variable order is that used by the model and may not match the order passed by the client.
  • 'full' = all variables used or excluded by the model, including variables which the client did not provide. Variable order is that used by the model and may not match the order passed by the client.

.reducedstats

Governs whether the Q and T^2 statistics returned by the .q and .t2 properties are "reduced" using the confidence limits stored in the model.detail.reslim and model.detail.tsqlim fields.
  • 'on' = statistics are normalized using the value stored in the appropriate detail sub-field.
  • 'off' = statistics are returned as calculated.
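The reduction is a simple division. The numbers below are hypothetical, with reslim standing in for model.detail.reslim{1}. The same division by model.detail.tsqlim{1} applies to T^2. A reduced value greater than 1 marks a sample beyond the confidence limit.

```matlab
% Illustrative sketch of "reduced" statistics (hypothetical numbers).
q      = [0.5; 1.2; 2.0];   % Q residuals as calculated (reducedstats 'off')
reslim = 1.6;               % stands in for model.detail.reslim{1}
qReduced = q / reslim;      % reducedstats 'on': normalized to the confidence limit
exceeds  = qReduced > 1;    % true for samples beyond the limit
```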

Calibrated Model Methods

The following methods are available when a model has been calibrated.

.apply(x_new,y_new,options)

Applies the model to the data x_new and returns a prediction (an applied model object). If y_new is supplied (and is appropriate for the model type, e.g. the model is a regression model), it is used as validation/test values against which the predictions are compared. If options is supplied, it is passed into the model prediction function, allowing some parameters to be modified.
Note: The .plots and .display properties of a model will be used when using this method. If enabled, the method will show plots and/or command-line display as requested. The values in these properties will always override any values passed in the options input to the .apply method.

.crossvalidate(x,cvi,ncomp)

If cross-validation was not performed when the model was built, this method can be used to cross-validate the model (under the conditions used to build it) and store the results in the model object. It is similar to the method of the same name for uncalibrated models. The difference is that it generally requires the x-block data as an input, since most models do not keep the original calibration x-block data in the calibrated model structure. The cvi and ncomp inputs are as defined above.
Note: The .plots and .display properties of a model will be used when using this method. If enabled, the method will show plots and/or command-line display as requested.

.ssqtable .ssqtext .ssqcell

Return the variance captured table in different formats: a MATLAB table object (.ssqtable), raw text (.ssqtext), or a cell array (.ssqcell).

.ploteigen

With no outputs, this method generates a plot of the eigenvalues or other statistics associated with changing the number of components in the model (e.g. RMSEC, misclassification rates) for the given model. With an output, no plot is generated but the DataSet object containing the data that would have been plotted is returned.

.plotloads

With no outputs, this method generates a plot of the loadings (including all variable-specific statistics and results) for the given model. With an output, no plot is generated but the DataSet object containing the loadings is returned.

.plotscores

With no outputs, this method generates a plot of the scores (including all sample-specific statistics and results) for the given model. With an output, no plot is generated but the DataSet object containing the scores is returned.


.qcon(x)

Returns the Q contributions (matrix of x-block residuals for each sample). For most model types, this method requires as input the x-data for which the Q residuals should be calculated (see qconcalc).

.tcon(x)

Returns the Hotelling's T2 contributions (scaled matrix of x-block projections into the model for each sample.) If the x input is omitted, the contributions for the calibration data are returned. If x is supplied, the contributions for the supplied x-data are calculated. (see tconcalc)
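For a PCA model, both kinds of contribution can be sketched in plain Matlab. The sketch assumes mean-centering is the only preprocessing. It also assumes T^2 contributions are the scores scaled by the inverse square root of the eigenvalues and projected back onto the loadings. Consult qconcalc and tconcalc for the exact definitions used by the toolbox. With these definitions, the row sums of squares of qcon and tcon reproduce each sample's Q and T^2.

```matlab
% Illustrative sketch: Q and T^2 contributions for a mean-centered PCA model.
% Assumptions: mean-centering only; T^2 contributions = T*diag(1./sqrt(lambda))*P'.
X = [2 4 1; 3 5 2; 4 4 3; 5 6 2; 6 7 4; 7 7 5];   % hypothetical calibration data
ncomp = 2;
n = size(X, 1);
mx = mean(X, 1);
[U, S, V] = svd(X - mx, 'econ');
P = V(:, 1:ncomp);                                % loadings
lambda = diag(S(1:ncomp, 1:ncomp)).^2 / (n - 1);  % eigenvalues

xnew = [4 5 3; 8 6 1] - mx;         % new samples, preprocessed like the calibration data
Tnew = xnew * P;                    % project into the model
qcon = xnew - Tnew * P';            % Q contributions: residual for each variable
tcon = (Tnew ./ sqrt(lambda')) * P';% T^2 contributions: scaled projections

q  = sum(qcon.^2, 2);               % Q for each new sample
t2 = sum(tcon.^2, 2);               % T^2 for each new sample
```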

 

Working With Applied Models (Predictions)

When a model is applied to new data, the output is an applied model, also known as a prediction object. The object type itself is still an EVRIModel Object and nearly all of the methods and properties that were available when working with a calibrated model are available with an applied model. The most notable difference is that any plots or sample-specific results extracted from the model will be for the data to which it was applied instead of the calibration data. For example, when a model which calculates scores is applied to new data, the resulting EVRIModel Object will contain a .scores property that is the scores calculated for the new data.

Whether a model object is a calibrated model or a model prediction can be determined by looking at the .isprediction field. Note that a prediction object cannot be applied to new data. Only the original model can be applied. However, if a model has been applied using the .apply method of a model, the original model is generally stored in the .parent field so the model could be re-applied using: pred2 = pred.parent.apply(x_new2) where x_new2 is new(er) data to apply the original model to.

Applied Model Properties

.isprediction

Returns (1) if the model contains a prediction from applying a calibrated model to new data and (0) if the model is just "calibrated" or "empty".

.parent

When a model has been applied to new data using the .apply method, this property will contain a copy of the original model object. The contents of this property are automatically used when a plotting method requires both the calibration and application data.

 

General Model Properties and Methods

In addition to the properties and methods described above, the following properties and methods are always available in a model independent of the model state or model type:

Informational Properties (Read-Only)

.author

String describing the author and computer on which this model was created. Usually user@computername. Given a system with assigned usernames and computer names, this is equivalent to an electronic signature on a model.

.content

Returns the "raw" model information in a form that is most similar to the model structures from previous versions of PLS_Toolbox and Solo. Generally, users need not access this field directly except to provide a model in a form more similar to old models.

.downgradeinfo

Informational string explaining the purpose of the .content field.

.evrimodelversion
.modelversion

Returns a string containing the model version description. The model version is almost always linked to the version of PLS_Toolbox or Solo that created the given model. The two field names here are synonymous.

.info

Returns (or displays with no outputs) the text description of the model. This is the same description shown at the Matlab command line when the model is viewed with content "on". With an output, the results are returned as a cell array of strings.

.isclassification

Returns (1) if the model is a classification model that returns class assignments for unknowns or (0) if it is a decomposition or regression model type.

.isyused

Returns (1) if the model is using a 2-block (x-block and y-block) method.

.uniqueid

Returns a string which uniquely identifies this model including the author, author's computer, and a date/time stamp. This uniqueid can be used to safely discriminate between different models.

.validmodeltypes

Returns a cell array of strings listing the model types which are currently valid for assignment to the .modeltype field.

 

General Methods

.disp

Displays the contents of the model. This method has no output; it only displays the information. To retrieve the content, see the .info method.

.encode

Returns m-script code which, when executed by Matlab with PLS_Toolbox, will regenerate the model contents (note: this code does not rebuild the model from raw data, but reconstitutes the content of the model from this text format description of the model.) See the encode function.

.encodexml

Returns an XML descriptor of the model content. Parsing this content using the XML import functions of Solo or PLS_Toolbox (see parsexml) will regenerate the model contents from this text format description of the model. See the encodexml function.

.help .help.predictions

Alone without any additional sub-indexing, this method brings up the help which is most relevant for the particular model type. With the .predictions sub-field, this method returns a structure array of possible sub-fields that may be requested for certain properties of the current model.

.isnewmodel

Tests whether the model is newer than the current software version.

 

Backwards Compatibility

In general, PLS_Toolbox and Solo models cannot be guaranteed to be backwards compatible with earlier versions of the software, because a newer version may introduce a preprocessing method or numerical calculation option that simply doesn't exist in the earlier software. Although Eigenvector Research cannot guarantee that we won't make changes to our data or model formats that will "break" code which users have written, we make every effort to keep new code as compatible with old user code as practical. The EVRIModel Object has been constructed with the same goal and will, for the most part, behave much like the old PLS_Toolbox model structures. Indexing into fields and referencing models in code will appear almost identical.

One notable exception is that the EVRIModel Object itself is stored in a format such that a model saved in the released version of PLS_Toolbox or Solo will not be readable by a version of the software released prior to the introduction of EVRIModel objects. The only way to extract the "simple" model structure format that existed prior to the EVRIModel object is to use the .content property of the model object. This will convert the top-level model into the basic format that is nominally readable by old versions. However, it is critical to note that other model format or algorithmic changes may make this backwards compatibility impossible.

An additional way to extract all model objects into their non-object form is to set the 'noobject' preference of evrimodel to 1 (one):

 setplspref('evrimodel','noobject',1)

All models (top-level models and any embedded models) loaded when that flag is enabled will be automatically extracted.

If backwards compatibility is truly needed, it is best to contact the Eigenvector Research Helpdesk.