Simpls

From Eigenvector Research Documentation Wiki
Latest revision as of 14:14, 23 September 2016

Purpose

Partial Least Squares regression using the SIMPLS algorithm.

Synopsis

[reg,ssq,xlds,ylds,wts,xscrs,yscrs,basis] = simpls(x,y,ncomp,options)

Description

SIMPLS performs PLS regression using the SIMPLS algorithm.

Inputs

  • x = X-block (predictor block) class "double" or "dataset", and
  • y = Y-block (predicted block) class "double" or "dataset".

Optional Inputs

  • ncomp = integer, number of latent variables to use {default = rank of X-block}, and
  • options = a structure array discussed below.

Outputs

  • reg = matrix of regression vectors, where each row is a regression vector for a given number of latent variables. If the Y-block contains multiple columns, the rows of reg are grouped by number of latent variables: the regression vectors for all columns of Y at 1 latent variable come first, followed by the regression vectors for all columns of Y at 2 latent variables, and so on. For example, with two Y-block columns and three latent variables:
<math>\begin{bmatrix}{b_{y1,1}}\\ {b_{y2,1}}\\ {b_{y1,2}}\\ {b_{y2,2}}\\ {b_{y1,3}}\\ {b_{y2,3}}\end{bmatrix}</math>
where b<sub>yn,k</sub> is the regression vector for column "n" of the Y-block calculated from "k" latent variables.
  • ssq = the sum of squares captured (ssq) with the columns:
    Column 1 = Number of latent variables (LVs)
    Column 2 = Variance captured (as a percent) in the X-block by this LV
    Column 3 = Total variance captured (%) in the X-block by all LVs up to this row
    Column 4 = Variance captured (as a percent) in the Y-block by this LV
    Column 5 = Total variance captured (%) in the Y-block by all LVs up to this row
  • xlds = X-block loadings (size: x-block columns by LVs),
  • ylds = Y-block loadings (size: y-block columns by LVs),
  • wts = X-block weights (size: x-block columns by LVs),
  • xscrs = X-block scores (size: samples by LVs),
  • yscrs = Y-block scores (size: samples by LVs),
  • basis = the basis of X-block loadings (size: x-block columns by LVs).
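The row grouping of reg can be easier to see with a small indexing sketch. The sizes, the placeholder values in reg, and the variable names rows, bk, xnew and ypred are all hypothetical; the prediction step assumes that new samples are preprocessed the same way as the calibration data.

```matlab
% Stand-in for a simpls output with ny = 2 Y-block columns,
% nx = 4 X-block variables and ncomp = 3 latent variables.
% The values are placeholders, not real regression vectors.
ny = 2; nx = 4; ncomp = 3;
reg = reshape(1:ny*ncomp*nx, ny*ncomp, nx);

k    = 2;                      % number of latent variables of interest
rows = (k-1)*ny + (1:ny);      % rows of reg belonging to the k-LV model
bk   = reg(rows,:);            % ny by nx; row n is the vector for Y-column n

% Hypothetical new samples, assumed to be preprocessed like the calibration x.
xnew  = ones(5,nx);
ypred = xnew*bk';              % 5 by ny, in the preprocessed Y-block scale
```

With a single Y-block column the expression reduces to reg(k,:), so the same indexing works for both cases.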

NOTE: In versions of SIMPLS before 3.0, the X-block scores were unit length and the X-block loadings contained the variance. As of Version 3.0, the algorithm uses the standard convention in which the X-block scores contain the variance.


The calculation of variance captured is as follows:

Xlds = ((X*wts)'X)'
Ylds = Y'YX*wts
ssqX = ΣΣ(X.^2)
ssqY = ΣΣ(Y.^2)
VarX = diag(Xlds'*Xlds)/ssqX
VarY = diag(Ylds'*Ylds)/ssqY
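The formulas above can be traced numerically with a short sketch. It rests on several assumptions: the data are synthetic, the weights are stand-ins chosen only so that X*wts has orthonormal columns (they are not SIMPLS weights), and the Y-block loadings line is read as Y'*(X*wts), because the expression as printed does not have conformable dimensions. The final line assembles a five-column table in the layout described for ssq.

```matlab
% Synthetic, mean-centered data (hypothetical sizes, illustration only).
rng(0);
n = 20; nx = 6; ny = 2; nlv = 3;
X = randn(n,nx);  X = X - repmat(mean(X),n,1);
Y = randn(n,ny);  Y = Y - repmat(mean(Y),n,1);

% Stand-in weights (NOT SIMPLS weights): rescale random directions so
% that the scores X*wts have orthonormal columns.
W0    = randn(nx,nlv);
[~,R] = qr(X*W0,0);
wts   = W0/R;

Xlds = ((X*wts)'*X)';          % nx by nlv
Ylds = Y'*(X*wts);             % ny by nlv (assumed reading of the Y formula)
ssqX = sum(sum(X.^2));         % total sum of squares in X
ssqY = sum(sum(Y.^2));         % total sum of squares in Y

VarX = diag(Xlds'*Xlds)/ssqX;  % fraction of X variance per LV
VarY = diag(Ylds'*Ylds)/ssqY;  % fraction of Y variance per LV

% LV number, X this LV (%), X total (%), Y this LV (%), Y total (%)
ssq = [(1:nlv)' 100*VarX 100*cumsum(VarX) 100*VarY 100*cumsum(VarY)];
```

Because the stand-in scores are orthonormal, each fraction lies between 0 and 1 and the cumulative columns cannot exceed 100.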

Options

options = a structure array with the following fields:

  • display: [ {'on'} | 'off' ], governs level of display, and
  • ranktest: [ 'none' | 'data' | 'scores' | {'auto'} ], governs type of rank test to perform.
    'data' = single test on X-block (faster with smaller data blocks and more components),
    'scores' = test during regression on scores matrix (faster with larger data matrices),
    'auto' = automatic selection, or
    'none' = assumes X-block has sufficient rank.
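A typical way to use the options structure might look like the sketch below. It requires PLS_Toolbox on the MATLAB path. The data are synthetic placeholders, and centering them by hand is an assumption about typical use rather than a documented requirement. The call simpls('options') returns the default options structure, whose fields can then be changed before the structure is passed in.

```matlab
% Synthetic placeholder data: 30 samples, 8 X-variables, 1 Y-variable.
x = randn(30,8);
y = x(:,1) - x(:,2) + 0.1*randn(30,1);

% Mean-center by hand (assumption: preprocessing is done outside simpls).
x = x - repmat(mean(x),30,1);
y = y - mean(y);

options          = simpls('options');  % default options structure
options.display  = 'off';              % suppress command-window display
options.ranktest = 'data';             % single rank test on the X-block

% Fit with 3 latent variables using the modified options.
[reg,ssq] = simpls(x,y,3,options);
```

Leaving ranktest at its default of 'auto' lets the function select the test itself.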

See Also

crossval, modelstruct, pcr, pls, preprocess, nippls, analysis