Release Notes Version 7 3 and Faq more info on R Squared statistic: Difference between pages

From Eigenvector Research Documentation Wiki
==New Features in Solo and PLS_Toolbox==


===Model Optimizer Interface===
* Snapshot multiple model conditions to create modeling templates that can all be built from new data at once.
* Automatically create "combinations" of modeling conditions (including model type, method settings, preprocessing, cross-validation settings, and data include fields) then automatically build models from those conditions.
* Compare tables of results from all models to determine the best modeling conditions.
* Push existing models into Optimizer from Workspace Browser or Analysis window's model cache for comparison or combinatorial mixing.
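The "combinations" step is essentially a Cartesian product of the listed condition values. The Python sketch below illustrates that idea only; the condition names and values are hypothetical placeholders, not the Model Optimizer's actual settings, and the sketch enumerates combinations without building any models.

```python
from itertools import product

# Hypothetical modeling conditions; names and values are illustrative only.
conditions = {
    "model_type": ["pls", "pcr"],
    "preprocessing": ["mean center", "autoscale"],
    "cv_method": ["venetian blinds", "contiguous blocks"],
}

def make_combinations(conds):
    """Return one dict per combination of the listed condition values."""
    keys = list(conds)
    return [dict(zip(keys, values)) for values in product(*(conds[k] for k in keys))]

combos = make_combinations(conditions)
for combo in combos:
    print(combo)
print(len(combos), "models to build")  # 2 * 2 * 2 combinations
```

Each added option multiplies the number of models, so the number of conditions grows quickly.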


===New/Improved Importers===
* [[gwscanreadr|Guided Wave Scan and Autoscan]] importer added
* [[jcampreadr|JCAMP importer]] significantly expanded in supported types
* [[pereadr|Perkin Elmer]] importer added
* [[opusreadr|Bruker OPUS]] importer added
* [[OPSClient|OPC client support]] through OPCclient object (PLS_Toolbox and Solo_Predictor only)
* [[xclreadr|"Streaming" text importer]] algorithm in text importer to allow reading of larger text files
* Improved memory and speed performance for text parsing


===Plot Controls===
* Easily create histograms from any figure's content, showing class overlap and more (right-click the data and choose Histogram from the menu)
* Data selector toolbar allows quick select plus exclude/include, as well as access to other selection tools such as the search toolbar
* Improved peak finding with adjustable settings including labeling format, peak direction, and other algorithm settings
* Allow saving of individual settings as the default (by clicking on the disk icon to the right of the given option)
* Exported figures automatically resized for better appearance


===Automatic Calibration/Validation Data Splitting===
* Add ability to keep replicates together during the split
* Add Kennard-Stone sample selection method
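For reference, the classic Kennard-Stone algorithm starts from the two samples that are farthest apart and then repeatedly adds the sample whose distance to its nearest already-selected sample is largest, so the selected set covers the data space evenly. The Python sketch below assumes Euclidean distance on raw sample vectors and that the selected samples form the calibration set; the PLS_Toolbox implementation may differ in distance metric, starting rule, and how replicates are kept together.

```python
import math

def kennard_stone(samples, n_select):
    """Return indices of n_select samples chosen by the Kennard-Stone algorithm.

    samples: list of equal-length numeric sequences (one row per sample).
    """
    n = len(samples)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    dist = [[math.dist(a, b) for b in samples] for a in samples]
    # Start with the two samples that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda ij: dist[ij[0]][ij[1]],
    )
    selected = [first, second]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_select:
        # Add the candidate whose nearest selected neighbour is farthest away.
        best = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

points = [[0.0], [1.0], [2.0], [10.0], [5.0]]
print(kennard_stone(points, 4))
```

For these one-dimensional points, selecting four samples returns indices [0, 3, 4, 2]; the one remaining sample would go to the validation set.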


===New/Improved Preprocessing Methods===
* Fluorescence EEM data filtering
* Perform simple arithmetic operations on variable(s) (also allows "masking" of variables)
* Subtract Reference Sample (fixed background correction)
* Ratio to Reference Sample (relative signal correction)
* Gap Segment Derivatives
* Automatic Baseline Subtraction methods split into Whittaker and Weighted Least Squares
* Improved handling of drag-and-drop data in Browse; modify data simply by dragging it onto the Preprocess shortcut
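Gap-segment (Norris-style) derivatives estimate the first derivative from the difference between the averages of two segments of points separated by a gap, which smooths the signal while differentiating it. The Python sketch below uses one common convention; the parameter names `gap` and `segment`, the absence of scaling by the segment spacing, and dropping the edge points are assumptions and may not match the PLS_Toolbox method.

```python
def gap_segment_derivative(x, gap, segment):
    """First-order gap-segment derivative of a 1-D signal (one common convention).

    For each point i, returns mean(right segment) - mean(left segment), where the
    right segment starts `gap` points after i and the left segment ends `gap`
    points before i. Points too close to either end are dropped, so the output
    is shorter than the input. No scaling by the segment spacing is applied.
    """
    if gap < 1 or segment < 1:
        raise ValueError("gap and segment must be positive integers")
    out = []
    for i in range(gap + segment - 1, len(x) - gap - segment + 1):
        left = x[i - gap - segment + 1 : i - gap + 1]
        right = x[i + gap : i + gap + segment]
        out.append(sum(right) / segment - sum(left) / segment)
    return out

signal = [2.0 * j for j in range(10)]  # linear test signal with slope 2
print(gap_segment_derivative(signal, gap=1, segment=2))
```

On this linear signal every output value is 6.0: the slope of 2 times the 3-point spacing between the two segment centers.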


===Analysis===
* MCR panel to access commonly used constraints added
* PARAFAC: split-half model testing added to the Analysis toolbar
* SIMCA can base classifications and probabilities on T<sup>2</sup>, Q, both, or combined statistics
* Model Cache has improved behavior for handling old cache contents and upgrading model caches from version to version.
* Selectivity ratio confidence limits now shown on loadings plots


==New Command-line Features and Functions==
===Command-line Tool Changes===
* [[correctbias]] -allow correction of a model based on measured and predicted y-values (rather than on measured y-values and the corresponding spectra, from which predictions must first be made)
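In its simplest form, a bias correction from measured and predicted y-values estimates a constant offset as the mean of (measured minus predicted) and adds it to future predictions. The Python sketch below assumes an offset-only correction; the function names are hypothetical, it is not the correctbias implementation, and the toolbox's handling of the model itself may differ.

```python
def estimate_bias(y_measured, y_predicted):
    """Mean offset (measured minus predicted) over paired samples."""
    if len(y_measured) != len(y_predicted) or not y_measured:
        raise ValueError("need equal-length, non-empty sequences")
    return sum(m - p for m, p in zip(y_measured, y_predicted)) / len(y_measured)

def apply_bias(y_predicted, bias):
    """Shift predictions by the estimated offset."""
    return [p + bias for p in y_predicted]

measured = [10.0, 12.0, 15.0]      # made-up reference values
predicted = [9.5, 11.5, 14.5]      # made-up model predictions, all 0.5 low
bias = estimate_bias(measured, predicted)
corrected = apply_bias([13.0, 16.0], bias)  # predictions for new samples
print(bias, corrected)
```

Only paired y-values are needed, which is the point of the change: no spectra have to be re-predicted to obtain the offset.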


* [[glsw]]
**added options.downweight
**add documentation for new options (xgradient, maxperclass, downweight)


* [[flucut]]
**Changed flucut to allow for Rayleigh
**Added the option of blank subtraction to the flucut options (and changed the Raman option names in the options structure)
 
* [[crossval]]
** add bias and R^2 outputs for calibration data as: cbias and r2c
** grab bbr outputs from the 'apply' function (in case a method wants to modify it and pass back some item)
** add r2c and cbias to help
** Fix for miscalculated RMSEP (for test data) when preprocessing methods require excluded data (such as derivatives and smoothing). This bug only affected the RMSEP as a function of LVs as calculated in the model during cross-validation. It had no impact on the RMSEP calculated for the prediction, which was always correct.
** don't allow PCA cross-validation to actually reach the total number of variables minus one (because we've actually lost one variable due to leave-out). Avoids RMSECV dropping down when it reaches the total number of variables.
 
* [[exportfigure]] -add ability to resize fonts and figure size for PowerPoint or Word exports
 
* [[plotgui]]
** add new logic to avoid putting points at the edges of plots
** add statistics of selected samples to Class Statistics (when samples are selected)
** add points in sequence ('sequence') connect method into connect classes menu
** added "maximumdatasummary" option to control how many items are allowed to be shown in the Data Summary > Data mode (default is 1000)
** add notice about data being culled from data summary
** add more logic to showtempnotice callback which will allow multiple temp notices (stacked on top of each other)
** add "viewpixelscale" option to show axisscales on images even when no imageaxisscale is set
 
===Misc New Functions===
 
* [[histaxes]] -add new function to create nice-looking histograms from any figure's content
* [[wsmooth]] -Whittaker smoother
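A Whittaker smoother finds the smoothed signal z that minimizes |y - z|^2 + lambda * |D z|^2, where D takes differences of z; setting the gradient to zero gives the linear system (I + lambda * D'D) z = y. The Python sketch below assumes second-order differences and a plain dense solve, which is adequate for short signals (a practical implementation would use sparse or banded matrices). It is not the wsmooth code, and `lam` is a hypothetical parameter name.

```python
def whittaker_smooth(y, lam):
    """Second-order Whittaker smoother: solve (I + lam * D'D) z = y densely."""
    n = len(y)
    if n < 3:
        return [float(v) for v in y]
    # A = I + lam * D'D, with D the (n-2) x n second-difference matrix.
    a = [[float(i == j) for j in range(n)] for i in range(n)]
    for k in range(n - 2):
        coeffs = ((k, 1.0), (k + 1, -2.0), (k + 2, 1.0))  # row k of D
        for i, ci in coeffs:
            for j, cj in coeffs:
                a[i][j] += lam * ci * cj
    b = [float(v) for v in y]
    # Gaussian elimination; A is symmetric positive definite, so no pivoting.
    for col in range(n):
        pivot = a[col][col]
        for row in range(col + 1, n):
            factor = a[row][col] / pivot
            if factor:
                for j in range(col, n):
                    a[row][j] -= factor * a[col][j]
                b[row] -= factor * b[col]
    # Back substitution.
    z = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = b[row] - sum(a[row][j] * z[j] for j in range(row + 1, n))
        z[row] = s / a[row][row]
    return z

noisy = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
print(whittaker_smooth(noisy, 10.0))
```

Because second differences of a straight line are zero, a linear signal passes through unchanged, and larger `lam` values give smoother results.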

Latest revision as of 13:23, 2 January 2019

Issue:

Can you give me more information on the R-Squared statistic?

Possible Solutions:

R-Squared (R<sup>2</sup>) is an assessment of how well the model predicts. It is similar to RMSEC, except that it does not reveal whether the predictions are biased.
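To see why R<sup>2</sup> cannot reveal bias, note that adding a constant offset to every prediction leaves the correlation between predicted and measured values unchanged, while the root-mean-square error grows. The Python sketch below uses made-up values and computes R<sup>2</sup> as a squared correlation coefficient; it is an illustration, not PLS_Toolbox code.

```python
import math

def r_squared(measured, predicted):
    """Square of the Pearson correlation coefficient between two sequences."""
    n = len(measured)
    mm = sum(measured) / n
    mp = sum(predicted) / n
    smp = sum((m - mm) * (p - mp) for m, p in zip(measured, predicted))
    smm = sum((m - mm) ** 2 for m in measured)
    spp = sum((p - mp) ** 2 for p in predicted)
    return smp * smp / (smm * spp)

def rmse(measured, predicted):
    """Root-mean-square error between measured and predicted values."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)

measured = [1.0, 2.0, 3.0, 4.0, 5.0]      # made-up reference values
predicted = [1.1, 1.9, 3.2, 3.9, 5.1]     # made-up, essentially unbiased predictions
biased = [p + 2.0 for p in predicted]     # same predictions shifted by a constant

print(r_squared(measured, predicted), r_squared(measured, biased))  # same up to rounding
print(rmse(measured, predicted), rmse(measured, biased))            # RMSE grows with the offset
```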

You can access R<sup>2</sup> by right-clicking on a scores plot of predicted vs. measured values. It is one of the items shown in the information box ("Show on figure" places it on the figure).

Note: in other software, R<sup>2</sup> is for the MODELED data only. In PLS_Toolbox we calculate it for the DISPLAYED data. That means that if you show excluded data, or if you show predicted/test data with calibration data ("Show Cal with Test"), the R<sup>2</sup> will be for what is shown and will differ from the calibration R<sup>2</sup>. Turn off the "Show Cal with Test" checkbox in the Plot Controls window to view the R<sup>2</sup> for only the test data.
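The effect of computing R<sup>2</sup> on the displayed data can be sketched by pooling points. The Python example below uses made-up calibration and test values and computes R<sup>2</sup> as a squared correlation coefficient for the calibration points alone, the test points alone, and both together (analogous to "Show Cal with Test"); it illustrates the behavior and is not PLS_Toolbox code.

```python
def r_squared(x, y):
    """Square of the Pearson correlation coefficient between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

# Made-up measured/predicted pairs.
cal_measured = [1.0, 2.0, 3.0, 4.0, 5.0]
cal_predicted = [1.1, 1.9, 3.2, 3.9, 5.1]
test_measured = [2.0, 3.0, 4.0]
test_predicted = [2.6, 2.5, 4.4]

r2_cal = r_squared(cal_measured, cal_predicted)
r2_test = r_squared(test_measured, test_predicted)
# Everything displayed at once: calibration and test points pooled.
r2_shown = r_squared(cal_measured + test_measured, cal_predicted + test_predicted)
print(r2_cal, r2_test, r2_shown)
```

With these values the test-only R<sup>2</sup> is lowest, the calibration-only value is highest, and the pooled value falls in between, so the number reported depends on which points are shown.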

R<sup>2</sup> is calculated as the square of the correlation coefficient between the values plotted on the X and Y axes of the figure. If the only data shown are the calibration Y estimates plotted against the actual calibration Y values, this is nearly the same as the standard R<sup>2</sup> for a model as defined by, e.g., Martens and Naes.
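The Python sketch below contrasts the squared correlation coefficient with one common textbook form, R<sup>2</sup> = 1 - SS_res/SS_tot. For a least-squares straight-line fit with an intercept the two agree; for predictions with a constant offset they diverge, because only the second form penalizes the offset. The data are made-up values, and the helper names are hypothetical illustrations rather than PLS_Toolbox functions.

```python
def mean(values):
    return sum(values) / len(values)

def r2_correlation(measured, predicted):
    """R^2 as the squared correlation coefficient between the plotted axes."""
    mm, mp = mean(measured), mean(predicted)
    smp = sum((m - mm) * (p - mp) for m, p in zip(measured, predicted))
    smm = sum((m - mm) ** 2 for m in measured)
    spp = sum((p - mp) ** 2 for p in predicted)
    return smp * smp / (smm * spp)

def r2_standard(measured, predicted):
    """R^2 as 1 - SS_res / SS_tot, with SS_tot about the mean of the measured values."""
    mm = mean(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mm) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot

def least_squares_line(x, y):
    """Fitted values from an ordinary least-squares straight line with intercept."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return [intercept + slope * a for a in x]

x = [1.0, 2.0, 3.0, 4.0, 5.0]
measured = [1.2, 1.9, 3.1, 3.8, 5.2]       # made-up example values
fitted = least_squares_line(x, measured)   # calibration-style least-squares fit
print(r2_correlation(measured, fitted), r2_standard(measured, fitted))  # agree

reference = [1.0, 2.0, 3.0, 4.0, 5.0]
shifted = [r + 0.5 for r in reference]     # perfect trend, constant offset of 0.5
print(r2_correlation(reference, shifted), r2_standard(reference, shifted))  # 1.0 vs 0.875
```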


Still having problems? Please contact our helpdesk at helpdesk@eigenvector.com