Simpls

From Eigenvector Research Documentation Wiki
imported>Jeremy
imported>Benjamin

Latest revision as of 13:14, 23 September 2016

Purpose

Partial Least Squares regression using the SIMPLS algorithm.

Synopsis

[reg,ssq,xlds,ylds,wts,xscrs,yscrs,basis] = simpls(x,y,ncomp,options)

Description

SIMPLS performs PLS regression using the SIMPLS algorithm.

Inputs

  • x = X-block (predictor block) class "double" or "dataset", and
  • y = Y-block (predicted block) class "double" or "dataset".

Optional Inputs

  • ncomp = integer, number of latent variables to use {default = rank of X-block}, and
  • options = a structure array discussed below.

Outputs

  • reg = matrix of regression vectors, where each row is the regression vector for one Y-block column at a given number of latent variables. If the Y-block contains multiple columns, the rows of reg are grouped by number of latent variables: the regression vectors for all columns of Y at 1 latent variable come first, followed by those for all columns of Y at 2 latent variables, etc. For example, with two Y-block columns and three latent variables the rows are ordered as
<math>\begin{bmatrix}{b_{y1,1}}\\ {b_{y2,1}}\\ {b_{y1,2}}\\ {b_{y2,2}}\\ {b_{y1,3}}\\ {b_{y2,3}}\end{bmatrix}</math>
where b_{yn,k} is the regression vector for column "n" of the Y-block calculated from "k" latent variables.
  • ssq = the sum of squares captured (ssq) with the columns:
Column 1 = Number of latent variables (LVs)
Column 2 = Variance captured (as a percent) in the X-block by this LV
Column 3 = Total variance captured (%) in the X-block by all LVs up to this row
Column 4 = Variance captured (as a percent) in the Y-block by this LV
Column 5 = Total variance captured (%) in the Y-block by all LVs up to this row
  • xlds = X-block loadings (size: x-block columns by LVs),
  • ylds = Y-block loadings (size: y-block columns by LVs),
  • wts = X-block weights (size: x-block columns by LVs),
  • xscrs = X-block scores (size: samples by LVs),
  • yscrs = Y-block scores (size: samples by LVs),
  • basis = the basis of X-block loadings (size: x-block columns by LVs).
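The row layout of reg and the link between the per-LV and cumulative columns of ssq can be illustrated with a small sketch. It is written in plain Python rather than MATLAB so that it runs on its own; the helper names reg_row and cumulative_percent are hypothetical and are not functions of the software documented here, and the percentages are made-up illustration values, not output from simpls. Row numbers are 1-based to match MATLAB indexing.

```python
def reg_row(n, k, ny):
    """Return the 1-based row of reg holding the regression vector
    for Y column n (1..ny) computed from k latent variables (k >= 1)."""
    if not 1 <= n <= ny:
        raise ValueError("n must be between 1 and ny")
    if k < 1:
        raise ValueError("k must be at least 1")
    # Rows come in groups of ny: all Y columns at 1 LV, then all at 2 LVs, ...
    return (k - 1) * ny + n


def cumulative_percent(per_lv):
    """Running total of per-LV percentages, as in a cumulative ssq column."""
    totals = []
    running = 0.0
    for value in per_lv:
        running += value
        totals.append(running)
    return totals


# Two Y columns, three LVs: rows are b_y1,1  b_y2,1  b_y1,2  b_y2,2  b_y1,3  b_y2,3
ny = 2
layout = [(n, k, reg_row(n, k, ny)) for k in (1, 2, 3) for n in (1, 2)]
for n, k, row in layout:
    print(f"Y column {n}, {k} LV(s) -> reg row {row}")

# Made-up per-LV percentages, only to show how a cumulative column is formed.
per_lv_x = [60.0, 25.0, 10.0]
print(cumulative_percent(per_lv_x))
```

In MATLAB the same arithmetic would presumably be used to pick a single regression vector, for instance row (k-1)*Ny + n of reg, where Ny is the number of Y-block columns.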

NOTE: in previous versions of SIMPLS, the X-block scores were unit length and the X-block loadings contained the variance. As of Version 3.0, this algorithm now uses standard convention in which the X-block scores contain the variance.


The calculations for Variance Captured are shown here:

Xlds = ((X*wts)'*X)'
Ylds = Y'*X*wts
ssqX = ΣΣ(X.^2)
ssqY = ΣΣ(Y.^2)
VarX = diag(Xlds'*Xlds)/ssqX
VarY = diag(Ylds'*Ylds)/ssqY
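
The X-block part of the calculation above can be traced on a tiny example. This is a plain-Python sketch, not the toolbox code: the helper names are hypothetical, and X and WTS are hand-picked (not produced by SIMPLS) so that the scores X*wts are orthonormal, which is the situation assumed here for the column sums of squares of Xlds divided by ssqX to read as fractions of the X-block variance. The Y-block calculation follows the same pattern with Ylds and ssqY.

```python
def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def variance_captured(x, wts):
    """Fraction of the X sum of squares captured per LV, following
    Xlds = ((X*wts)'*X)' and VarX = diag(Xlds'*Xlds)/ssqX."""
    scores = matmul(x, wts)                          # X*wts, samples by LVs
    xlds = transpose(matmul(transpose(scores), x))   # X-block columns by LVs
    ssq_x = sum(v * v for row in x for v in row)     # sum of all squared elements
    # diag(Xlds'*Xlds) is the sum of squares of each column of Xlds
    return [sum(v * v for v in col) / ssq_x for col in transpose(xlds)]


# Hand-picked illustration values (assumption): X*WTS is the identity matrix.
X = [[2.0, 0.0],
     [0.0, 1.0]]
WTS = [[0.5, 0.0],
       [0.0, 1.0]]

var_x = variance_captured(X, WTS)
print([round(100 * v, 1) for v in var_x])  # per-LV variance captured, in percent
```

Multiplying by 100 gives the per-LV percentages, and a running sum of those gives the cumulative totals.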

Options

options = a structure array with the following fields:

  • display: [ {'on'} | 'off' ], governs level of display, and
  • ranktest: [ 'none' | 'data' | 'scores' | {'auto'} ], governs type of rank test to perform.
'data' = single test on X-block (faster with smaller data blocks and more components),
'scores' = test during regression on scores matrix (faster with larger data matrices),
'auto' = automatic selection, or
'none' = assumes X-block has sufficient rank.

See Also

crossval, modelstruct, pcr, pls, preprocess, nippls, analysis