Auto: Difference between revisions

From Eigenvector Research Documentation Wiki
imported>Jeremy
(Importing text file)

Revision as of 14:24, 3 September 2008

Purpose

Autoscales a matrix to mean zero and unit variance.

Synopsis

[ax,mx,stdx,msg] = auto(x,options)
[ax,mx,stdx,msg] = auto(x,offset)
options = auto('options')

Description

[ax,mx,stdx] = auto(x) autoscales the matrix x. It returns the scaled matrix ax, whose columns have zero mean and unit variance, together with the vector of means mx and the vector of standard deviations stdx used in the scaling. The optional output msg returns any warning messages. If x contains missing data (NaNs), the available data are autoscaled provided the fraction missing does not exceed the thresholds specified below. mx and stdx can be used to scale new data (see SCALE).
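
The computation described above can be sketched in base MATLAB. This is an illustration, not the auto source code: it assumes the standard deviation uses MATLAB's default N-1 normalization, and the data values and the names xnew and axnew are made up for the example.

```matlab
% Sketch of standard autoscaling in base MATLAB (not the auto source code).
x = [1 10; 2 20; 3 60];        % 3 samples (rows) by 2 variables (columns)

mx   = mean(x,1);              % column means
stdx = std(x,0,1);             % column standard deviations (N-1 normalization, assumed)

% Subtract each column mean, then divide by each column standard deviation.
ax = bsxfun(@rdivide, bsxfun(@minus,x,mx), stdx);

% New data are scaled with the stored calibration statistics, not their own.
xnew  = [2 30; 4 50];          % hypothetical new samples
axnew = bsxfun(@rdivide, bsxfun(@minus,xnew,mx), stdx);
```

Keeping mx and stdx from the calibration data is what lets new samples be placed on the same scale; the SCALE function referenced above is intended for that step.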

Options

  • options = a structure array with the following fields:
      • offset: value added to the standard deviation before scaling {default = 0},
      • display: [ {'off'} | 'on' ] governs level of display to the command window,
      • matrix_threshold: fraction of missing data allowed based on the entire matrix (x) {default = 0.15},
      • column_threshold: fraction of missing data allowed based on a single column {default = 0.25},
      • algorithm: [ {'standard'} | 'robust' ] scaling algorithm. 'robust' uses MADC for scaling and the median instead of the mean; it should be used with robust techniques,
      • stdthreshold: [ 0 ] scalar or vector of standard deviation threshold values. If a standard deviation is below its corresponding threshold value, the threshold value is used in lieu of the actual value. Note that the actual standard deviation is always returned, whether or not it exceeds the threshold. A scalar value is used as the threshold for all variables, and
      • badreplacement: [ 0 ] value to use in place of standard deviation values of 0 (zero). Typical values and their effects:
          • 0 = All values of the given variable are set to zero. The variable is effectively excluded (but still expected by the model). This is also the behavior when badreplacement = inf.
          • 1 = Values different from the mean of the given variable are flagged in Q residuals with no reweighting.
          • Values >0 and <inf give the variable a different weighting in the Q residuals (values >1 down-weight the bad variables for Q residual calculations; values <1 up-weight them).
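
Two of the fields above, the missing-data thresholds and stdthreshold, can be illustrated in base MATLAB. The sketch below only mirrors the documented descriptions; how auto implements these checks internally is an assumption, and the variable names (miss, matrix_frac, column_frac, thr, stdused) are hypothetical.

```matlab
% Missing-data fractions compared with the documented default thresholds.
x = [1 NaN 5; 2 4 6; 3 6 7; 4 8 9];      % 4 samples, 3 variables, one NaN

miss        = isnan(x);
matrix_frac = mean(miss(:));             % fraction missing in the whole matrix
column_frac = mean(miss,1);              % fraction missing in each column

% "Not above the threshold" is read here as <= (an assumption).
ok = (matrix_frac <= 0.15) && all(column_frac <= 0.25);

% stdthreshold: a standard deviation below its threshold is replaced by the
% threshold for the division only; the actual value is still what is returned.
stdx    = [0.002 1.5 0.3];               % example standard deviations
thr     = 0.01;                          % scalar threshold applied to all variables
stdused = max(stdx, thr);                % values used for scaling
```

In this example the first variable would be divided by 0.01 rather than 0.002, which limits how much a nearly constant variable is inflated by the scaling.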

If the second input is a scalar (offset), it is used as the offset value and all other options are set to their default values.

The optional input offset is added to the standard deviations before scaling and can be used to suppress low-level variables that would otherwise have standard deviations near zero.
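
The effect of the offset can be seen in a small base-MATLAB sketch. It assumes the scaled data are computed as (x - mx) ./ (stdx + offset), which follows the description above but is not taken from the auto source code; the data and the offset value 0.1 are made up.

```matlab
% Column 1 is an ordinary variable; column 2 is a low-level variable.
x = [10.0 0.001; 12.0 0.002; 14.0 0.003];

mx   = mean(x,1);
stdx = std(x,0,1);                       % approximately [2 0.001]
xc   = bsxfun(@minus, x, mx);            % mean-centered data

offset = 0.1;                            % hypothetical offset value

ax_plain  = bsxfun(@rdivide, xc, stdx);            % no offset
ax_offset = bsxfun(@rdivide, xc, stdx + offset);   % offset added to stdx

% Largest absolute scaled value per column, with and without the offset.
range_plain  = max(abs(ax_plain), [], 1);
range_offset = max(abs(ax_offset), [], 1);
```

Without the offset both columns span the same scaled range. With it, the low-level column is shrunk to about one hundredth of that range while the ordinary column changes by only a few percent.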

The default options can be retrieved using: options = auto('options');
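
A typical call sequence, using only the calling forms listed in the Synopsis, might look like the following. It requires PLS_Toolbox on the MATLAB path; the data matrix and the chosen field values are placeholders, not recommendations.

```matlab
% Requires PLS_Toolbox; x is placeholder data (20 samples by 5 variables).
x = rand(20,5);

options           = auto('options');   % start from the default options structure
options.offset    = 0.01;              % example value
options.algorithm = 'robust';          % median and MADC instead of mean and std
options.display   = 'off';

[ax,mx,stdx,msg] = auto(x,options);

% Scalar shortcut: only the offset is changed, all other options stay at defaults.
[ax2,mx2,stdx2] = auto(x,0.01);
```

Starting from the structure returned by auto('options') means any field that is not set explicitly keeps its documented default.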

See Also

gscale, medcn, mncn, normaliz, npreprocess, regcon, rescale, scale, snv