Using Cross-Validation and Constrainfit: Difference between pages

From Eigenvector Research Documentation Wiki
(Difference between pages)
Jump to navigation Jump to search
imported>Jeremy
 
imported>Rasmus
m (→‎Description: Fixed a spell error (constrainTfit))
 
Line 1: Line 1:
===WARNING:===  
===Purpose===
'''This page is not complete. It is missing both equations and figures from the original document created by C. Miller. It also needs review.'''


For additional information an content see: [http://software.eigenvector.com/faq/index.php?id=108 Cross-validation FAQ]. Parts of that page may be incorporated into this document.
Finds '''A''' minimizing ||X-A*B'|| subject to constraints, given the small matrices ('''X''' ' '''B''') and ('''B''' ' '''B''')


=Background=
===Synopsis===
:  [A,diagnostics]=constrainfit(XB,BtB,Aold,options);  % Constrained
:  [A,diagnostics]=constrainfit(XB,BtB,Aold);  % Unconstrained


Cross validation is a very useful tool that serves two critical functions in chemometrics:
===Description===


# It enables an assessment of the optimal complexity of a model (for example, the number of PCs in a PCA or PCR model, or the number of LVs in a PLS model), and
CONSTRAINFIT solves the least squares problem behind bilinear, trilinear and other multilinear models. Assuming a model '''X''' = '''A'''*'''B''' ' and assuming that '''X''' and '''B''' are known, the least squares estimate of '''A''' is obtained. Rather than using '''X''' and '''B''' this algorithm uses the cross product matrices ('''X''' ' '''B''') and ('''B''' ' '''B''') which are generally smaller and less memory-demanding especially in multi-way models.
# It allows an estimation of the performance of a model when it is applied to unknown data.


For a given data set, cross validation involves a series of experiments, hereby called ''sub-validation'' experiments, each of which involves the removal of a subset of objects from a dataset (the ''test set''), construction of a model using the remaining objects in the dataset (the ''model building set''), and subsequent application of the resulting model to the removed objects. This way, each sub-validation experiment involves testing a model with objects that were ''not'' used to build the model.
CONSTRAINFIT can do a number of general types of regression problems such as nonnegativity-constrained regression, regression with column-orthogonality of '''A''' etc. These constraints are simply set in the option field 'type', e.g. option.type='nonnegativity'. Thus, for most problems, only the 'type' field needs to be set. CONSTRAINFIT will provide a least squares solution to most of these problems.


A typical cross-validation procedure usually involves more than one sub-validation experiment, each of which involves the selection of different subsets of samples for model building and model testing.  As there are several different modeling methods in chemometrics, there are also several different cross-validation methods, and these vary with respect to ''how'' the different sample subsets are selected for these sub-validation experiments. The properties of the 5 different cross-validation methods that are available in PLS_Toolbox are discussed below, and summarized in Table 1.
CONSTRAINFIT can also find '''A''' subject to different constraints on different columns. In this case, the update of '''A''' will be an improvement of the initially provided estimate '''Aold''' though not necessarilly the least squares solution. As CONSTRAINFIT is used inside iterative algorithms, an improvement is sufficient to guarantee overall convergence.


For the following descriptions, n is the total number of objects in the data set, and DS is the number of data splits specified for the cross-validation procedure, which must be less than n/2.
====Inputs====
* '''XB''' = The matrix '''X''' ' '''B'''.
* '''BtB''' = The matrix '''B''' ' '''B'''.
* '''Aold''' = An initial estimate of '''A'''.


'''Venetian Blinds:''' Each of the DS test sets are determined by selecting every DS<sup>th</sup> object in the data set, starting at objects numbered 1 through DS.
====Optional Inputs====
* '''options''' = provides definitions for which type of constraint to impose.


'''Contiguous Blocks:''' Each of the DS test sets are determined by selecting contiguous blocks of n/DS objects in the data set, starting at object number 1.
====Outputs====
* '''A''' = The improved estimate of '''A'''.


'''Random Subsets:''' DS different test sets are determined through random selection of n/DS objects in the data set, such that no single object is in the same test set. This procedure is repeated NI times, where NI is the number of iterations.
===Options===


'''Leave-One-Out:''' Each single object in the data set is used as a test set.
options =  a structure array with the following fields:


'''Custom:''' Each of the test sets is manually defined by the user. Provisions can be made to "force" specific objects to be in every test set, never be in a test set, or not be used in the cross validation procedure at all.
* '''type''': [ {'unconstrained'} | 'nonnegativity' | 'unimodality' | 'orthogonality' | 'columnorthogonal' | 'equality' | 'exponential' | 'rightprod' | 'columnwise']
::: provides quick access to most important settings
::: ''''unconstrained''''   - do unconstrained fit of '''A'''
::: ''''nonnegativity''''  - '''A''' is all nonnegative
::: ''''unimodality''''    - '''A''' has unimodal columns ''and'' is nonnegative
::: ''''orthogonality''''  - '''A''' is orthogonal ('''A''' ' '''A''' = '''I''')
::: ''''columnorthogonal''''- '''A''' has orthogonal columns ('''A''' ' '''A''' = diagonal)
::: ''''equality''''        - columns in '''A''' are subject to equality constraints. Useful for e.g. imposing closure (see settings under options.equality below)
::: ''''exponential''''    - Columns are mono-exponentials
::: ''''rightprod''''      - Fitting '''A''' subject to being of the form '''F*D''', where '''D''' is predefined (must be set in options.advanced.linearconstraints.matrix). if imposed then columnwise constraints (see below) are applied to the columns of '''F''' rather than '''A'''. Hence options.columnconstraints must be set appropriately.
::: ''''L1 penalty''''      - A is estimated using a constraint that A should be sparse - (the higher options.L1.lambda, the sparser A will be). Note: A is also constrained to be nonnegative.


'''Table 1. SEQ Table \* ARABIC 1: Properties of different cross-validation methods in PLS_Toolbox.'''
::: ''''columnwise''''      - A has other constraints than the above. These have to be defined in options.columnconstraints (see below).


{|
* '''columnconstraints''': cell where element f defines constraints on column f (only applicable if options.type = 'columnwise').
::: columnconstraints is a cell vector {f1,f2,f3, ... fF}. Each element f1, f2, etc. corresponds to one column of A. f1 defines constraints on the first column of A etc. Each constraint on a column is defined by a number. For example if f1 is 2, then nonnegativity is imposed on the first column (see definitions below). If f1 = [2 4], then first nonnegativity is imposed and then smoothness. The following constraints are available on individual columns
::: a = 0 : Unconstrained
::: a = 1 : Nonnegativity
::: a = 2 : Unimodality
::: a = 3 : Inequality (every element >= scalar). Scalar has to be in options.inequality.scalar. This is a vector of size F, one scalar for each factor
::: a = 4 : Smoothness. options.smoothness.operator can be used to hold operator (for speeding up. Won't have to be estimated each time. options.smoothness.alpha (0<alpha<1). Setting to zero means no smoothness while setting to 1 means high degree of smoothness.
:::  a = 5 : Fixed elements. The elements that are fixed are defined in options.fixed.
::: a = 6 : Not applicable
::: a = 7 : Approximate unimodality. See options.unimodality.weight
::: a = 8 : Normalize the loading vectors to norm one
::: a = 20: Functional constraint. Using simple pre- or userdefined functions, any functional constraint can be imposed on individual columns. For example, that one column is exponential. Functional constraints require that a function is written (type HELP FITGAUSS for an example).


| ||'''Venetian Blinds'''||'''Contiguous Blocks'''||'''Random Subsets'''||'''Leave-One Out'''||'''Custom'''
::'''Example:''' Fitting the second loading vector as being Gaussian:
<pre>
    NumberFactors=3;
    options.functional=cell(NumberFactors,1);
    ToFix = 2; % This constraint is for the second column
    options.functional{ToFix}.functionhandle = @fitgauss;
    % Define starting parameters
    center = 100;width = 100;height = .1;
    options.functional{ToFix}.parameters = [center width height];
    options.functional{ToFix}.additional=[]; % no additional input
</pre>


|-
When a column has more than one constraint these are generally imposed sequentially starting with the first one in options.columnconstraint. For most constraints, the order of constraints will not be important. Advise is to input constraints with smaller numbers first.


|'''Test sample selection scheme'''||                                                  ||  ||  ||  ||* User-defined subsets
* '''inequality''' : Defines a cutoff. If inequality is defined in columnwise constraints, all elements of that column will be > options.inequality.scalar. Thus, when set to zero, nonnegativity is imposed.
* '''nonnegativity''': defines which algorithm to use for imposing nonnegativity when options.type = 'nonnegativity'. If set to 0, the default NNLS algorithm is used. If set to 1, a faster columnwise update is used which only improves the current least squares fit, if set to 2, an ad hoc approach is used where '''A''' is estimated in a least squares sense and then negative numbers are set to zero. This will not provide a well-defined solution in terms of the least squares loss function. If set to 3, the NMF algorithm is used. This requires that all elements of the data array are nonnegative in order to work properly.
* '''smoothness''': defines how much smoothness is imposed when smoothness is imposed as a columnconstraint. smoothness.alpha is a number between 0 (no smoothness) and 1 (full smoothness)
* '''fixed''': options.fixed.values is a matrix of the same size as the loading matrix ('''A''') with the actual numbers to be fixed in the positions corresponding to the position in A. The remaining positions must be NaN. The degree to which elements are fixed is set in options.fixed.weight (0<weight<1). Zero means not imposed whereas one means completely fixed.
* '''advanced''': In the field advanced.linearconstraints, settings for options.type = 'rightprod' are set. If '''A''' is IxF, then linearconstraint.matrix must be and SxF matrix '''D''''. '''A''' is then found as '''F*D'''. E.g. if '''A''' has three columns, set the predefined matrix D = [1 1 0; 0 0 1], then the first and second of the three columns in A will be identical (F*D where F is to be estimated).
* '''equality''': Settings for using options.type = 'equality'. Two fields are held in equality, C and d. When imposed, CONSTRAINFIT solves for loading matrix A subject to A(i,:)*C' = d for all i. Hence if you want to impose closure and have three factors, set C=[1 1 1] and d=1.
* '''unimodality''': Set weight in options.unimodality.weight. weight==1: exact unimodality. weight==0: no unimodality
* '''functional''': For functional constrains (see above)


* Can "force" specific objects into every test set, every model set, or exclude them from the CV procedure
===Examples===


|-


|'''Parameters'''||* Number of Data Splits (DS)
'''Setting global constraints on A'''
  opt = constrainfit('options');
  opt.type='nonnegativity';
  [A]=constrainfit(XB,BtB,Aold,opt); % Nonnegative


* Maximum number of PCs/LVs||* Number of Data Splits (DS)
'''Setting constraints on just one column of A'''
  opt = constrainfit('options');
  opt.type='columnwise';
  opt.columnconstraints={0;2;0}; % If three columns
  [A]=constrainfit(XB,BtB,Aold,opt); % Second column unimodal


* Maximum number of PCs/LVs||* Number of Data Splits (DS)
<pre>
% Make a noisy dataset such that PARAFAC gives noisy loadings
load aminoacids
x = X.data;
x = x+randn(size(x))*100;


* Number of iterations (NI)
% define parafac options
op=parafac('options');


* Maximum number of PCs/LVs||* Maximum number of PCs/LVs||* Number of data splits (DS)
% set constraints in second mode to be defined columnwise
op.constraints{2}.type='columnwise';


* Object membership for each split
% Define that first column is smooth, second and third unconstrained
op.constraints{2}.columnconstraints={4 0 0};


* All user-defined
% Fit model
model = parafac(x,3,op);
</pre>


|-
Note how the first loading in the second mode is more smooth than the rest. If needed smoothness can be turned up (to one) and down (to zero) using op.constraints{2}.smoothness.alpha=0.6


|'''Number of sub-validation experiments'''||* DS||* DS||* DS \* NI||* Number of Objects||* DS
===See Also===


|-
[[parafac]]
 
|'''Number of test samples per sub-validation'''||* Number of Objects/DS (closest integers)||* Number of Objects/DS (closest integers)||* Number of Objects/DS (closest integers)||* 1||* Can vary, user defined
 
|-
 
|}
 
=Cross Validation in Solo and PLS_Toolbox=
 
==Graphical User Interface==
 
When using the graphical user interface (GUI), the cross-validation utility can be accessed in one of two ways: 1) by selecting the "4. Choose Cross-Validation" button on the Analysis Flowchart, or 2) by selecting the Tools menu on the Analysis window, then selecting Cross-Validation.  Note that a set of data must first be loaded into the Analysis interface before the cross-validation utility is made available.  Once this is done, the Cross-Validation window (Figure 1) appears.
 
Figure 1. The Cross-Validation Window, with Cross-Validation Method options shown.
 
The drop-down menu on the left is used to select the cross-validation method, and the slider bars on the right are used to specify parameters for the cross-validation method that is selected.  Note that not all of the parameters are relevant for every cross-validation method.  For example, if the "contiguous block" method is chosen, the user may only adjust two parameters: the Maximum Number of LVs and the Number of Data Splits (see Figure 2). The parameters that are available for each Cross-Validation method are specified in the second row of Table 1.
 
Figure 2. The Cross-Validation Window, with parameter selection shown for the "contiguous block" method.
 
When the Cross-Validation window in Analysis GUI is first opened, the parameters specified in the slider bars default to values that are based on the dimensionality of the data.  In this case, the loaded X and Y variables, spec1 and conc from nir_data.mat, have 30 objects, and the default number of Data Splits is the integer closest to the square root of the number of objects, which is equal to 5.
 
Once the parameters have been set to desired values, select the "Apply" button to apply these settings and keep the Cross-Validation window open, or select the "OK: button to apply these settings and close the Cross-Validation Window.  Pressing the "Reset" Button causes the parameters to revert to their default settings. A quick way to verify that your cross-validation settings will be applied to the next analysis is to check the "Tools" menu on the Analysis window (see Figure 3).  A check mark next to "Cross-Validation" indicates that your most recently specified cross-validation settings will be applied to the next analysis of the loaded data.
 
Figure 3. Verification that Cross-Validation is selected.
 
==Cross-Validation Results==
 
In the Analysis GUI, the cross-validation process accompanies the construction of the "full" model, which uses the complete set of loaded data. Consequently, once the Analysis is executed, cross-validation results and plots will be made available along with results and plots for the "full" model.  For example, the Model Results window in Analysis will include a Root Mean Square Error of Cross-Validation (RMSECV) value along with the Root Mean Square Error of Calibration (RMSEC) value (see Figure 4). The RMSECV can be defined as
 
'''        ( AUTONUMLGL  \* Arabic \e 1)'''
 
where          contains the values of the Y variable that are estimated by cross-validation (where the value for each object i is estimated using a model that was built using a set of objects that does ''not'' include object i),          contain the known values of the Y variable, and ''n'' is the total number of objects in the data set.
 
Figure 4. Analysis results, with cross-validation specified.
 
Also, selection of the "Plot Variance Captured" plot button will enable the user to plot RMSEC and RMSECV as a function of the number of latent variables retained in the model (see Figure 5). Such a plot is useful for determining the optimal number of latent variables to retain in a model that is built using the full set of data.
 
Figure 5. Plot of Calibration error and Cross-Validation error as a function of the number of latent variables retained in the PLS model.
 
In principle, the RMSEC must always decrease as the number of latent variables retained in the model increases.  However, because the RMSECV is determined from the cross-validation experiment, in which the test samples were ''not'' used to build the model that was used to test them, this value can actually ''increase'' as too many latent variables are added to the model. The optimal number of latent variables depends on the specific objectives of the modeling project, but is typically the number at which the addition of another latent variable does not greatly improve the performance of the model. In this specific case, one could choose either 5 or 8 latent variables based on this criterion. However, if the consequences of overfitting, and subsequent poor predictive model performance, are particularly high, then one could conduct several cross-validation procedures using different sets of cross-validation methods and parameters, to enable a more confident decision to be made.
 
An additional plot that is made available by including cross-validation in the Analysis GUI is the analog to the common "calibration curve" scatter plot of predicted versus actual Y values. In this case, of course, there are two possible sources of "predicted" Y values: 1) Y values predicted by a model that uses all of the available objects in the data set, and 2) Y values predicted by cross-validation.  Figure 6 shows an overlay of both of these plots, using an 8-latent-variable model built from the nir_data used in this example.
 
Figure 6. Overlay of predicted vs. actual Y value scatter plots, for calibration and cross-validation, using the nir_data discussed above.
 
==Command-Line==
 
The command line version of cross-validation is called using the function crossval. The general syntax for crossval is
 
results = crossval(x,y,rm,cvi,ncomp,options);
 
where x and y refer to the X block and Y block data (either in array or dataset object format), rm defines the regression or modeling method, cvi defines the cross-validation method, ncomp defines the maximum number of latent variables to use in cross validation, and options can be used to assign some optional parameters to the procedure. The input can be a cell array that specifies both the cross validation method and the relevant parameters for that method.  For example, cross-validation of the nir_data data set using the PLS modeling method (with the SIMPLS algorithm), and the contiguous block cross-validation method with 5 data splits and a maximum of 20 latent variables (as was specified in the GUI example discussed above, see Figure 2) would be done using the command:
 
results = crossval(spec1,conc(:,1),'sim',{'con' 5},20);
 
The output of the command-line cross-validation routine is then stored in a structured array called results, which contains the following sub-arrays:
 
        press: [5x20 double]
      cumpress: [1x20 double]
        rmsecv: [1x20 double]
        rmsec: [1x20 double]
        cvpred: [30x1x20 double]
    misclassed: {}
          reg: []
 
Details regarding the syntax and output of crossval can be obtained by typing help crossval at the Matlab prompt.
 
==Custom Cross-Validation==
 
In Solo and PLS_Toolbox, the user can also manually specify the subsets of samples to be used as tests sets in the cross-validation procedure, using the "Custom" cross-validation method. When using the GUI, if the "Custom" method in the Cross-Validation window (Figure 1) is chosen, then the user will be asked to specify a "custom cross-validation" vector that specifies the subsets of objects to be placed in each test subset for each sub-validation. This custom cross-validation vector must be a vector of integers with dimensionality ''1 x n'', where ''n'' is the total number of objects in the currently-loaded data set. The values within this vector must adhere to the following set of rules:
 
* A value of -2 indicates that the object is placed in ''every test set'' (never in a model-building set)
* A value of -1 indicates that the object is placed in ''every model-building set'' (never in a test set)
* A value of 0 indicates that the object is used for ''neither'' model-building nor model testing
* Values of 1,2,3… indicate the test set number for each object (for those objects that are used in the cross-validation procedure)
 
For example, for a data set containing 9 objects, a custom cross validation array of
 
:<tt>[-1 0 1 1 -2 2 2 0 -1]</tt>
 
will result in two sub-validation experiments: one using objects 3,4 and 5 in the test set and 1,6,7 and 9 in the model-building set, and one using objects 5,6 and 7 in the test set and 1,3,4 and 9 in the model-building set.
 
This method can be particularly useful in cases where the data sets have relatively few objects, or they are generated from a designed experiment. In such cases, it might be necessary to "force" some objects to always be either in the test set, in the model-building set, or ''out'' of the cross-validation procedure entirely, in order to avoid misleading results, as discussed below.
 
==Choosing the Cross-Validation Method==
 
There are many attributes that influence selection of the appropriate cross-validation scheme for a given situation.  These include the following:
 
* The ordering of the samples in the dataset,
* The total number of objects (and variables) in the dataset
* The presence (or lack thereof) of replicate samples in the dataset
* The specific objective(s) of the analysis,
* The consequences/costs of overly optimistic or overly pessimistic results, and
* The amount of time available to do cross-validation
 
Also, there are two "traps" that one generally wants to avoid when selecting a cross-validation scheme:
 
* The ''ill-conditioned trap'':  for a given sub-validation, the selected test set and remaining model set cover different data "spaces"- thus leading to overly pessimistic cross-validation results
* The ''replicate sample trap'': for a given sub-validation, replicates of the same physical sample are present in both the model-building set and the test set, thus leading to overly optimistic cross-validation results
 
With the above considerations in mind, several recommendations can be made about the appropriate cross-validation method and parameters for different generic situations. These recommendations are be made by cross-validation method below, and are summarized in Table 2, below.
 
===Venetian Blinds===
 
This method is simple and easy to implement, and generally safe to use if there are relatively many objects that are already in random order. For time-series data, it can be useful for estimating errors in the method from non-temporal sources, provided that a sufficient number of data splits are specified.  However, for blocked data with replicates, one must choose parameters very carefully to avoid the replicate sample trap.  Similarly, for time-series data, a low number of splits can lead to in overly optimistic results.
 
===Contiguous Blocks===
 
Like Venetian Blinds, this method is simple and easy to implement, and generally safe to use in cases where there are relative many objects in random order. For data that is ''not'' randomly ordered, one must choose parameters carefully to avoid overly pessimistic results from the ill-conditioned trap. For time-series data and batch data, this method can be convenient for assessing the temporal stability and batch-to-batch stability of a model built from the data.
 
===Random Subsets===
 
The random subset selection method is rather versatile, in that it can be used effectively in a wide range of situations, especially if one has the time to run multiple iterations of subset selections. As the number of splits is inversely proportional to the number of test samples per sub-validation, a higher number of splits can be used to avoid the ill-conditioned trap, and a lower number of splits can be used to avoid the replicate sample trap. However, a general disadvantage of this method is that the user has no control over the selection of test sample subsets for each sub-validation experiment, thus making it difficult to assess whether the results were adversely affected by either of these traps.  Fortunately, their effect can be reduced to some extent by increasing the number of iterations.
 
===Leave-One-Out===
 
This method is generally reserved for small data sets (n not greater than 20).  It might also be useful in the case of randomly distributed objects, if one has enough time or n is not too large.  However, it is not recommended even for small data sets if the objects are blocked with replicates or generated from a design of experiments (DOE), due to the replicate sample trap and ill-conditioned trap, respectively.
 
===Custom===
 
Although this method requires some extra time to determine how the object subsets are selected for each sub-validation experiment, this flexibility is critical in several cases, especially for small data sets and data generated from a design of experiments (DOE). In addition, in the case of batch data, this flexibility allows the user to specify different individual batches, or sets of batches, as test sets for each sub-validation experiment- thus enabling batch-wise model validation.
 
'''Table 2: Recommendations regarding different Cross-Validation methods, for different types of data sets.'''
 
{|
 
| ||'''Venetian Blinds'''||'''Contiguous Blocks'''||'''Random Subsets'''||'''Leave-One Out'''||'''Custom'''
 
|-
 
|'''General Properties'''
||
* Easy
* Relatively quick
||
* Easy
* Relatively quick
||
* Easy
* Can be slow, if n or number of iterations large
* Selection of subsets unknown
||
* Easiest! (Only one parameter)
* ''Avoid using if n>20''
||
* Flexible
* Requires time to determine/ construct cross validation array
|-
 
|'''Small data sets (<\~20 objects)'''
||
||
||
* OK, if many iterations done
||
* Good choice….
* ....unless designed/DOE data
||
* often needed to avoid the  ''ill-conditioned'' ''trap''
|-
 
|'''randomly-distributed objects'''
||
* Good choice
||
* Good choice
||
* Good choice
* Can take a while with large n, many iterations
||
* OK, but….
* Can take a while with large n
||
 
|-
 
|'''time-series data'''
||
* Useful for assessing  ''NON-temporal'' model errors
 
* Can be optimistic with low number of data splits
||
* Useful for assessing ''temporal ''stability of model
||
 
||
||
 
|-
 
|'''Batch data'''
||
* Useful for assessing predictability ''within'' batches/parts of batches
||
* Useful for assessing predictability ''between'' batches/parts of batches
||
 
||
 
||
* Can manually select "batch-wise" test sets
 
|-
 
|'''Blocked data (replicates)'''
||
* Beware  the ''replicate sample trap'' (optimistic results)!
||
* Good way to avoid ''replicate sample trap''
* Beware the ''ill-conditioned trap''!
||
* Can use to avoid ''replicate sample trap'' (high number of splits, high iterations preferable)
||
* overly optimistic results, due to ''replicate sample trap''
||
 
|-
 
|'''Designed Experiment (DOE) data'''
||
* Dangerous, unless object order is randomized
||
* Dangerous, unless object order is randomized
||
||
* Not recommended (''ill-conditioned trap'')
||
* often needed to avoid the ''ill-conditioned'' ''trap''
|-
 
|}

Revision as of 00:37, 23 September 2013

Purpose

Finds A minimizing ||X-A*B'|| subject to constraints, given the small matrices (X ' B) and (B ' B)

Synopsis

[A,diagnostics]=constrainfit(XB,BtB,Aold,options);  % Constrained
[A,diagnostics]=constrainfit(XB,BtB,Aold);  % Unconstrained

Description

CONSTRAINFIT solves the least squares problem behind bilinear, trilinear and other multilinear models. Assuming a model X = A*B ' and assuming that X and B are known, the least squares estimate of A is obtained. Rather than using X and B this algorithm uses the cross product matrices (X ' B) and (B ' B) which are generally smaller and less memory-demanding especially in multi-way models.

CONSTRAINFIT can do a number of general types of regression problems such as nonnegativity-constrained regression, regression with column-orthogonality of A etc. These constraints are simply set in the option field 'type', e.g. option.type='nonnegativity'. Thus, for most problems, only the 'type' field needs to be set. CONSTRAINFIT will provide a least squares solution to most of these problems.

CONSTRAINFIT can also find A subject to different constraints on different columns. In this case, the update of A will be an improvement of the initially provided estimate Aold though not necessarilly the least squares solution. As CONSTRAINFIT is used inside iterative algorithms, an improvement is sufficient to guarantee overall convergence.

Inputs

  • XB = The matrix X ' B.
  • BtB = The matrix B ' B.
  • Aold = An initial estimate of A.

Optional Inputs

  • options = provides definitions for which type of constraint to impose.

Outputs

  • A = The improved estimate of A.

Options

options = a structure array with the following fields:

  • type: [ {'unconstrained'} | 'nonnegativity' | 'unimodality' | 'orthogonality' | 'columnorthogonal' | 'equality' | 'exponential' | 'rightprod' | 'columnwise']
provides quick access to most important settings
'unconstrained' - do unconstrained fit of A
'nonnegativity' - A is all nonnegative
'unimodality' - A has unimodal columns and is nonnegative
'orthogonality' - A is orthogonal (A ' A = I)
'columnorthogonal'- A has orthogonal columns (A ' A = diagonal)
'equality' - columns in A are subject to equality constraints. Useful for e.g. imposing closure (see settings under options.equality below)
'exponential' - Columns are mono-exponentials
'rightprod' - Fitting A subject to being of the form F*D, where D is predefined (must be set in options.advanced.linearconstraints.matrix). if imposed then columnwise constraints (see below) are applied to the columns of F rather than A. Hence options.columnconstraints must be set appropriately.
'L1 penalty' - A is estimated using a constraint that A should be sparse - (the higher options.L1.lambda, the sparser A will be). Note: A is also constrained to be nonnegative.
'columnwise' - A has other constraints than the above. These have to be defined in options.columnconstraints (see below).
  • columnconstraints: cell where element f defines constraints on column f (only applicable if options.type = 'columnwise').
columnconstraints is a cell vector {f1,f2,f3, ... fF}. Each element f1, f2, etc. corresponds to one column of A. f1 defines constraints on the first column of A etc. Each constraint on a column is defined by a number. For example if f1 is 2, then nonnegativity is imposed on the first column (see definitions below). If f1 = [2 4], then first nonnegativity is imposed and then smoothness. The following constraints are available on individual columns
a = 0 : Unconstrained
a = 1 : Nonnegativity
a = 2 : Unimodality
a = 3 : Inequality (every element >= scalar). Scalar has to be in options.inequality.scalar. This is a vector of size F, one scalar for each factor
a = 4 : Smoothness. options.smoothness.operator can be used to hold operator (for speeding up. Won't have to be estimated each time. options.smoothness.alpha (0<alpha<1). Setting to zero means no smoothness while setting to 1 means high degree of smoothness.
a = 5 : Fixed elements. The elements that are fixed are defined in options.fixed.
a = 6 : Not applicable
a = 7 : Approximate unimodality. See options.unimodality.weight
a = 8 : Normalize the loading vectors to norm one
a = 20: Functional constraint. Using simple pre- or userdefined functions, any functional constraint can be imposed on individual columns. For example, that one column is exponential. Functional constraints require that a function is written (type HELP FITGAUSS for an example).
Example: Fitting the second loading vector as being Gaussian:
    NumberFactors=3;
    options.functional=cell(NumberFactors,1);
    ToFix = 2; % This constraint is for the second column
    options.functional{ToFix}.functionhandle = @fitgauss;
    % Define starting parameters
    center = 100;width = 100;height = .1;
    options.functional{ToFix}.parameters = [center width height];
    options.functional{ToFix}.additional=[]; % no additional input

When a column has more than one constraint these are generally imposed sequentially starting with the first one in options.columnconstraint. For most constraints, the order of constraints will not be important. Advise is to input constraints with smaller numbers first.

  • inequality : Defines a cutoff. If inequality is defined in columnwise constraints, all elements of that column will be > options.inequality.scalar. Thus, when set to zero, nonnegativity is imposed.
  • nonnegativity: defines which algorithm to use for imposing nonnegativity when options.type = 'nonnegativity'. If set to 0, the default NNLS algorithm is used. If set to 1, a faster columnwise update is used which only improves the current least squares fit, if set to 2, an ad hoc approach is used where A is estimated in a least squares sense and then negative numbers are set to zero. This will not provide a well-defined solution in terms of the least squares loss function. If set to 3, the NMF algorithm is used. This requires that all elements of the data array are nonnegative in order to work properly.
  • smoothness: defines how much smoothness is imposed when smoothness is imposed as a columnconstraint. smoothness.alpha is a number between 0 (no smoothness) and 1 (full smoothness)
  • fixed: options.fixed.values is a matrix of the same size as the loading matrix (A) with the actual numbers to be fixed in the positions corresponding to the position in A. The remaining positions must be NaN. The degree to which elements are fixed is set in options.fixed.weight (0<weight<1). Zero means not imposed whereas one means completely fixed.
  • advanced: In the field advanced.linearconstraints, settings for options.type = 'rightprod' are set. If A is IxF, then linearconstraint.matrix must be and SxF matrix D'. A is then found as F*D. E.g. if A has three columns, set the predefined matrix D = [1 1 0; 0 0 1], then the first and second of the three columns in A will be identical (F*D where F is to be estimated).
  • equality: Settings for using options.type = 'equality'. Two fields are held in equality, C and d. When imposed, CONSTRAINFIT solves for loading matrix A subject to A(i,:)*C' = d for all i. Hence if you want to impose closure and have three factors, set C=[1 1 1] and d=1.
  • unimodality: Set weight in options.unimodality.weight. weight==1: exact unimodality. weight==0: no unimodality
  • functional: For functional constrains (see above)

Examples

Setting global constraints on A

 opt = constrainfit('options');
 opt.type='nonnegativity';
 [A]=constrainfit(XB,BtB,Aold,opt); % Nonnegative

Setting constraints on just one column of A

 opt = constrainfit('options');
 opt.type='columnwise';
 opt.columnconstraints={0;2;0}; % If three columns
 [A]=constrainfit(XB,BtB,Aold,opt); % Second column unimodal
% Make a noisy dataset such that PARAFAC gives noisy loadings
load aminoacids
x = X.data;
x = x+randn(size(x))*100;

% define parafac options
op=parafac('options');

% set constraints in second mode to be defined columnwise
op.constraints{2}.type='columnwise';

% Define that first column is smooth, second and third unconstrained
op.constraints{2}.columnconstraints={4 0 0};

% Fit model
model = parafac(x,3,op);

Note how the first loading in the second mode is more smooth than the rest. If needed smoothness can be turned up (to one) and down (to zero) using op.constraints{2}.smoothness.alpha=0.6

See Also

parafac