Reducennsamples and Release Notes Version 8 5: Difference between pages

From Eigenvector Research Documentation Wiki
(Difference between pages)
imported>Donal
 
imported>Lyle
===Purpose===


Selects a subset of samples by removing nearest neighbors.
==Version 8.5==
Version 8.5 of PLS_Toolbox and Solo was released in September, 2017.


===Synopsis===
For general product information, see [http://www.eigenvector.com/software/pls_toolbox.htm PLS_Toolbox Product Page]. For information on Solo, see [http://www.eigenvector.com/software/solo.htm Solo Product Page].


:[sc,incl] = reducennsamples(model,minsamples,options);
(back to [[Release Notes PLS Toolbox and Solo]])
:[sc,incl] = reducennsamples(model,newdata,minsamples,options);


===Description===
==Overview==


Select a subset of samples by removing nearest neighbors.
* The primary focus of version 8.5 has been on Calibration Transfer and additional importers.
Performs a selection of samples which fill out the multivariate space by
removing ("thinning out") samples which are similar to each other based
on nearest neighbor distance. This algorithm is useful in selecting the
minimum number of samples needed to define a subspace and reduce the
number of reference measurements needed, or amount of data needed to be
stored.


* PLS_Toolbox/Solo 8.5 has been tested extensively with Matlab 2017b prerelease and should be compatible with the coming release.

Initially, the nearest neighbor of each sample is found along with the
distance between the neighbors. Of the two nearest samples, one is
excluded from the data and the distances are recalculated. This process
is repeated until either the smallest distance between the remaining
samples exceeds the maximum allowed distance (''maxdistance''), or the
number of samples falls to the lower limit (''minsamples'').
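
The thinning loop described above can be sketched in a few lines. This is an illustrative Python sketch, not the PLS_Toolbox implementation: the function name <code>thin_samples</code> is hypothetical, Euclidean distance is assumed, and because the text does not say which sample of the closest pair is excluded, the sketch arbitrarily drops the one with the higher index.

```python
import math

def thin_samples(points, minsamples, maxdistance=math.inf):
    """Return the indices of the samples retained after thinning.

    Hypothetical sketch: Euclidean distance is assumed, and the
    higher-index member of the closest pair is the one dropped.
    """
    keep = list(range(len(points)))
    while len(keep) > minsamples:
        # Find the closest pair among the samples still retained.
        best = None
        for a in range(len(keep)):
            for b in range(a + 1, len(keep)):
                i, j = keep[a], keep[b]
                d = math.dist(points[i], points[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        # Stop when the two closest samples are farther apart than maxdistance.
        if best is None or best[0] > maxdistance:
            break
        keep.remove(best[2])  # exclude one sample of the closest pair
    return keep

pts = [(0, 0), (0.1, 0), (1, 0), (1, 1), (5, 5)]
retained = thin_samples(pts, minsamples=3)
```

The brute-force pair search is quadratic per pass, which is adequate for illustration only; the ''maxsamples'' option suggests the real function also bounds the problem size.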


* Simplified interface design changes. Various controls have been moved to simplify interfaces. Seldom used settings have been moved to options in several cases (e.g., simplified [[Svm|SVM]] panel).

The source of data can be either a factor-based model (PCA, PLS, PCR, etc.)
which contains scores for all samples, or a raw data matrix or DataSet,
in which case distances are calculated in raw variable space.


* The Analysis window's confusion matrix and table toolbar icon now returns classification results for both [http://wiki.eigenvector.com/index.php?title=Sample_Classification_Predictions#Class_Pred_Most_Probable 'most probable'] and [http://wiki.eigenvector.com/index.php?title=Sample_Classification_Predictions#Class_Pred_Strict 'strict'] classifications. In the 'strict' case it also lists the 'strictthreshold' value used.

The algorithm is based on work published in:
: J.S. Shenk, M.O. Westerhaus, Crop Sci., 1991, 31, 469,
: J.S. Shenk, M.O. Westerhaus, Crop Sci., 1991, 31, 1548.


===Inputs===
==Calibration Transfer==
* [[MCCTTool | New Model-centric Calibration Transfer]] - Interface to develop instrument specific models with calibration transfer.


* '''model''' =  Standard model structure, double array, or DataSet object containing the data to select from,
* [[nlstd| NLSTD]] - Create or apply non-linear instrument transfer models (PLS_Toolbox only).
* '''newdata''': Additional data which should be considered for addition to the data provided by the ''model'' input. When provided, all ''model'' samples are used and ''newdata'' is examined for samples to fill in empty regions of the ''model'' space. Under these conditions, ''minsamples'' is the number of additional samples to select beyond those already included in ''model'' (see ''minsamples'' below),
* '''minsamples''': Minimum number of samples to retain. Sample thinning stops when the number of retained samples reaches this value. If omitted, 4 times the number of factors in the model or 1/2 the number of samples (whichever is smaller) is used.
* '''''options''''' is a structure array with fields described below:
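
As a worked illustration of the default described for ''minsamples'' (4 times the number of factors or half the number of samples, whichever is smaller), here is a minimal Python sketch. The helper name <code>default_minsamples</code> is hypothetical, and rounding half the sample count down is an assumption, since the text does not say how an odd count is handled.

```python
def default_minsamples(nfactors, nsamples):
    """Smaller of 4 * nfactors and half the number of samples.

    Assumption: half the sample count is rounded down.
    """
    return min(4 * nfactors, nsamples // 2)

# A 3-factor model built on 100 samples: 4 * 3 = 12 is smaller than 50.
small_model = default_minsamples(3, 100)
# A 10-factor model built on 30 samples: 15 is smaller than 40.
few_samples = default_minsamples(10, 30)
```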


===Outputs===
* [[sstcal| SST]] - Spectral Subspace Transformation calibration transfer.


* '''sc''' =  DataSet object containing either the scores (if a model was supplied) or the data supplied. Samples selected are included. Thinned samples are excluded.
* [[Demonstration_Datasets|Corn DSO]] - New calibration transfer demonstration data set added: 80 samples of corn measured on 3 different NIR spectrometers. Moisture, oil, protein, and starch values for each sample are also included.
* '''incl''' =  Indices of retained samples (samples not thinned as redundant).


===Options===
==Importers==


'''''options''''' is a structure array with the following fields:
* [[shimadzueemreadr]] - Imports Shimadzu EEM formatted text files.
* '''maxdistance''': [inf] Maximum allowed closest distance between samples. Sample thinning stops if the two closest samples are further apart than this value. If "inf", thinning continues until the number of samples given in minsamples is reached. If empty, the nearest distances are calculated for the initial set and 1/2 of the maximum observed distance is used,
* [[visionairxmlreadr]] - Imports Vision Air formatted XML files (X- & Y-Blocks).
* '''maxsamples''': [5000] Maximum number of samples which can be passed for down-sampling. More than this number will throw an error,
* [[pltreadr]] - Imports Vision Air model files (.plt).
* '''mustuse''': [] Indices of samples which must be used,
* '''waitbar''': [ 'no' | {'yes'} ] indicates whether a waitbar can be shown.
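
The three cases for ''maxdistance'' (a number, "inf", or empty) can be sketched as follows. This is a hedged Python illustration, not toolbox code: <code>resolve_maxdistance</code> and <code>nearest_neighbor_distances</code> are hypothetical names, Euclidean distance is assumed, and Python's <code>None</code> stands in for an empty MATLAB value.

```python
import math

def nearest_neighbor_distances(points):
    """Distance from each sample to its nearest neighbor (Euclidean, assumed)."""
    return [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]

def resolve_maxdistance(points, maxdistance=math.inf):
    """Return the distance limit that would stop the thinning.

    None (standing in for an empty value) selects half of the largest
    nearest-neighbor distance observed in the initial set; any other
    value, including inf, is used unchanged.
    """
    if maxdistance is None:
        return 0.5 * max(nearest_neighbor_distances(points))
    return maxdistance

pts = [(0, 0), (1, 0), (4, 0)]
limit = resolve_maxdistance(pts, None)  # nearest distances are 1, 1, 3
```

With the default of inf the limit never triggers, so thinning runs until ''minsamples'' is reached.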


===See Also===
* [[aqualogreadr]] - Improved capability of loading multiple files containing samples of varying sizes (PLS_Toolbox only).
[[distslct]] [[doptimal]] [[knnscoredistance]] [[stdgen]] [[stdsslct]]
 
* [[jascoeemreadr]] - Improved capability of loading multiple files containing samples of varying sizes (PLS_Toolbox only).
 
* [[hitachieemreadr]] - Improved capability of loading multiple files containing samples of varying sizes (PLS_Toolbox only).
 
* [[rawread|RAWREAD]] - Added support for new version format (3 & 4) (See MIA_Toolbox).
 
* [[spgreadr|SPGREADR]] - Added new feature: options.spectrumindex can now be an integer, an array of integers (indices; order doesn't matter), or 'all'. When loading multiple files, options.spectrumindex must be either a single value or 'all'.
 
==Exporters==
 
* [[writeplt]] - Exports EVRI Model structures to Vision Air PLT files.
 
==Other Features and Improvements==
* [[Cooksd| Cook's Distance]] - Calculates Cook's Distance for samples in a regression model.
 
* [[rpls| RPLS]] - A recursive PLS and PCR variable selection algorithm.
 
* [[manhattandist| MANHATTANDIST]] - Calculate Manhattan Distance between rows of a matrix.
 
* [[Confusionmatrix]] and [[Confusiontable]] - Classification results formatting options added. Can now specify use of mostprobable or strict classification rule.
* [[calcdifference]] - Calculate difference between two datasets.
* Added GLS Weighting to model optimizer.

Revision as of 09:35, 7 February 2019

Version 8.5

Version 8.5 of PLS_Toolbox and Solo was released in September, 2017.

For general product information, see PLS_Toolbox Product Page. For information on Solo, see Solo Product Page.

(back to Release Notes PLS Toolbox and Solo)

Overview

  • The primary focus of version 8.5 has been on Calibration Transfer and additional importers.
  • PLS_Toolbox/Solo 8.5 has been tested extensively with Matlab 2017b prerelease and should be compatible with the coming release.
  • Simplified interface design changes. Various controls have been moved to simplify interfaces. Seldom used settings have been moved to options in several cases (e.g., simplified SVM panel).
  • The Analysis window's confusion matrix and table toolbar icon now returns classification results for both 'most probable' and 'strict' classifications. In the 'strict' case it also lists the 'strictthreshold' value used.

Calibration Transfer

  • NLSTD - Create or apply non-linear instrument transfer models (PLS_Toolbox only).
  • SST - Spectral Subspace Transformation calibration transfer.
  • Corn DSO - New calibration transfer demonstration data set added: 80 samples of corn measured on 3 different NIR spectrometers. Moisture, oil, protein, and starch values for each sample are also included.

Importers

  • aqualogreadr - Improved capability of loading multiple files containing samples of varying sizes (PLS_Toolbox only).
  • jascoeemreadr - Improved capability of loading multiple files containing samples of varying sizes (PLS_Toolbox only).
  • hitachieemreadr - Improved capability of loading multiple files containing samples of varying sizes (PLS_Toolbox only).
  • RAWREAD - Added support for new version format (3 & 4) (See MIA_Toolbox).
  • SPGREADR - Added new feature: options.spectrumindex can now be an integer, an array of integers (indices; order doesn't matter), or 'all'. When loading multiple files, options.spectrumindex must be either a single value or 'all'.

Exporters

  • writeplt - Exports EVRI Model structures to Vision Air PLT files.

Other Features and Improvements

  • RPLS - A recursive PLS and PCR variable selection algorithm.
  • MANHATTANDIST - Calculate Manhattan Distance between rows of a matrix.
  • Confusionmatrix and Confusiontable - Classification results formatting options added. Can now specify use of mostprobable or strict classification rule.
  • calcdifference - Calculate difference between two datasets.
  • Added GLS Weighting to model optimizer.