Demonstration Datasets

From Eigenvector Research Documentation Wiki

Latest revision as of 16:07, 8 May 2019

List of DataSets

The following is a list of the data supplied with PLS_Toolbox. It consists of the name of the file (all end in ".mat" unless otherwise specified) along with a brief description of the contents of the given file. Each file may contain one or more variables to be used.

{| class="wikitable sortable"
|-
!File Name !! Description !! Variable Name(s) !! Variable Size !! Task !! Notes
|-
|alcohol || Biological fluid analysis of alcoholics. || alcohol || 65x52 || Classification ||
|-
|aminoacids || Fluorescence EEM of 5 amino acid samples. || X || 5x20x61 || Decomposition || Try with PARAFAC.
|-
|arch || Archeological artifact data set. || arch || 75x10 || Decomposition, Classification ||
|-
|beer || VIS-NIR transmission recorded directly on undiluted degassed beer. || beer <br />extract <br />beertest <br />extracttest || 40x926 <br />40x1 <br />20x926 <br />20x1 || Regression || Good for testing the Variable Selection Interface.
|-
|biscuit || NIR reflectance spectra of 40 samples of biscuit dough from 1200-2398 nm. || spec <br />recipe || 40x600 <br />40x4 || Regression ||
|-
|brain_weight || Body mass (kg) and brain mass (g) for 28 animals. || brains || 28x2 || Decomposition ||
|-
|bread || Sensory evaluation of breads. || bread || 10x11x8 || Decomposition || Try with PARAFAC; also good for testing EEM Filtering preprocessing.
|-
|cancer || Fluorescence EEM spectra extracted from cervical cancer images. || cancer || 563x22 || Classification, Decomposition || Unfolded EEM data.
|-
|corn_dso || 80 samples of corn measured on 3 different NIR spectrometers. Moisture, oil, protein and starch values for each sample are also included. || conc <br />m5nbs <br />m5spec <br />mp5nbs <br />mp5spec <br />mp6nbs <br />mp6spec || 80x4 <br />3x700 <br />80x700 <br />4x700 <br />80x700 <br />4x700 <br />80x700 || Regression ||
|-
|data_mid_IR || Data sets for correlation spectroscopy. || data_mid_IR || 21x130 || Correlation Spectroscopy || Use with the data_near_IR dataset.
|-
|data_near_IR || Data sets for correlation spectroscopy. || data_near_IR || 21x149 || Correlation Spectroscopy || Use with the data_mid_IR dataset.
|-
|dorrit || EEM of 27 samples with 4 fluorophores for PARAFAC. || EEM <br />yblock || 27x116x18 <br />27x4 || Decomposition, Regression || Try PARAFAC and Multi-Way PLS (NPLS).
|-
|Dupont_BSPC || 10 process variables (pressure, flow, temperature) for 55 batches with 7 steps each. || dupont_cal <br />dupont_test || 3600x10 <br />1900x10 || Batch Processor ||
|-
|etchdata || Engineering process data from semiconductor metal etch. || Etchcal <br />EtchTest || 20x12x107 <br />20x12x20 || Decomposition || Use Multi-Way PCA (MPCA). See Chemometrics Tutorial, chapter 5.
|-
|fia || UV detection of Flow Injection Analysis of hydroxy-benzaldehyde (n-way data: sample, wavelength, time). || FIA || 12x50x45 || Decomposition ||
|-
|flucuttest || Fluorescence EEM data. || z || 2x15x23 || Decomposition || Try PARAFAC.
|-
|FTIR_microscopy || FTIR microscopy transect spectra of a three-layer polymer laminate. || FTIR_microscopy || 17x81 || Decomposition || Use the PURITY program.
|-
|gcwine || Dynamic headspace GCMS data of red wines from different regions. || gcwine || 71x150x24 || Decomposition || Try PARAFAC2.
|-
|halddata || Hald cement curing data. || xblock <br />yblock || 13x4 <br />13x1 || Regression ||
|-
|lcms || LC/MS electrospray of 15 surfactant solution. || lcms || 345x1451 || Decomposition || Use the CODA-DW Tool. See Chemometrics Tutorial, chapter 9.
|-
|lcms_compare1 || Select data from LC/MS electrospray data set. || lcms_compare1 || 675x1940 || Decomposition || Try the LCMS Compare Tool. Use with lcms_compare2 and lcms_compare3. See Chemometrics Tutorial, chapter 9.
|-
|lcms_compare2 || Select data from LC/MS electrospray data set. || lcms_compare2 || 675x1940 || Decomposition || Try the LCMS Compare Tool. Use with lcms_compare1 and lcms_compare3. See Chemometrics Tutorial, chapter 9.
|-
|lcms_compare3 || Select data from LC/MS electrospray data set. || lcms_compare3 || 675x1940 || Decomposition || Try the LCMS Compare Tool. Use with lcms_compare1 and lcms_compare2. See Chemometrics Tutorial, chapter 9.
|-
|MS_time_resolved || Direct probe time profile MS of three color-coupling compounds. || MS_time_resolved || 20x757 || Decomposition || Use the PURITY program (try 3 and 4 components).
|-
|MS_time_resolved_references || Reference spectra of pure compounds from MS_time_resolved. || MS_time_resolved_references || 3x757 || || Use along with the MS_time_resolved data for comparing results from PURITY.
|-
|nir_data || NIR spectra of pseudo gasoline samples. || conc <br />spec1 <br />spec2 || 30x5 <br />30x401 <br />30x401 || Regression || Try Savitzky-Golay preprocessing.
|-
|nmr_data || NMR data for GRAM demo. || nmrdata || 20x1176 || Decomposition ||
|-
|octane || NIR spectra and octane number values of 39 gasoline samples. || spec <br />octane || 39x226 <br />39x1 || Regression, Decomposition || Try some robust methods. See Chemometrics Tutorial, chapter 13.
|-
|oesdata || Optical emission spectra from metal etch. || oes1 || 46x770 || Decomposition || Try Multivariate Curve Resolution (MCR). See Chemometrics Tutorial, chapter 9.
|-
|OliveOilData || 36 FT-IR spectra (3600 - 600 cm-1) of olive oils. || xcal <br />xtest || 36x518 <br />36x518 || Decomposition, Classification || Try Multiplicative Scatter Correction (MSC) preprocessing.
|-
|paint || Non-linear paint formulation data. || paint_cal_X <br />paint_cal_Y <br />paint_test_X <br />paint_test_Y || 49x4 <br />49x3 <br />8x4 <br />8x3 || Regression || Try with non-linear regression methods, such as SVM-R.
|-
|pcadata || Slurry Fed Ceramic Melter data. || part1 <br />part2 || 300x10 <br />200x10 || Decomposition ||
|-
|plsdata || SFCM data for PCR and PLS demos. || xblock1 <br />yblock1 <br />xblock2 <br />yblock2 || 300x20 <br />300x1 <br />200x20 <br />200x1 || Regression || Try Multiple Linear Regression (MLR). See Chemometrics Tutorial, chapter 6.
|-
|pulsdata || Time series data from a Slurry Fed Ceramic Melter for PLSPULSM demo. || melter_data || 325x3 || Decomposition ||
|-
|purvardata || Raman spectra used in Chemometrics Tutorial section "MATLAB Code for Pure Variable Method" (chapter 9). || data || 6x99 || Decomposition || Use PURITY.
|-
|purvardata_noise || Raman spectra used in Chemometrics Tutorial section "MATLAB Code for Pure Variable Method" (chapter 9). || data || 6x99 || Decomposition || Use PURITY.
|-
|raman_dust_particles || Raman spectra. || raman_dust_particles || 120x1025 || Decomposition || Use PURITY. Use with raman_dust_particles_references. See Chemometrics Tutorial section "MATLAB Code for Pure Variable Method" (chapter 9). <br />Also try different Baseline preprocessing techniques with this data (hint: order = 3).
|-
|raman_dust_particles_references || Raman spectra. || raman_dust_particles_references || 3x1025 || Decomposition || Use PURITY. Use with raman_dust_particles. See Chemometrics Tutorial section "MATLAB Code for Pure Variable Method" (chapter 9).
|-
|raman_time_resolved || Raman spectra of a time resolved reaction, used in Chemometrics Tutorial. || raman_time_resolved || 16x151 || Decomposition || Try the PURITY program.
|-
|sawdata || Surface acoustic wave sensor data. || SAWdata || 72x13 || Decomposition, Classification ||
|-
|SBRdata_EU || NIR transmission spectra of styrene-butadiene copolymers. || Xcal <br />Ycal <br />Xtest <br />Ytest || 60x141 <br />60x4 <br />10x141 <br />10x4 || Regression || Try different regression methods (CLS, MLR, PLS).
|-
|stars || Surface temperature and light intensity values for 47 stars. || stars || 47x2 || Decomposition ||
|-
|sugar || Sugar fluorescence EEM N-way data set. || sugar || 268x44x7 || Decomposition || Try Multi-Way PCA.
|-
|wine || Wine demographic data set for PCA example. || wine || 10x5 || Decomposition ||
|-
|wineregion || Metal composition of wines for classification by region. || wineregion || 38x17 || Classification ||
|}
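
Because a single file can hold several variables, it can help to list a file's contents before loading anything. The following is a minimal sketch using the standard MATLAB functions `which` and `whos`; it assumes the PLS_Toolbox demo data folder is on the MATLAB path, and the file name `corn_dso.mat` is taken from the table above. If a variable is stored as an object (for example a DataSet object), the size reported by `whos` may be the size of the object array rather than the data size listed in the table.

```matlab
% Sketch: list the variables stored in a demo MAT file without loading them.
% Assumption: the PLS_Toolbox demo data folder is on the MATLAB path.
demoFile = 'corn_dso.mat';

% Resolve the full path; which() returns an empty result if nothing is found.
fullPath = which(demoFile);
if isempty(fullPath)
    error('Demo file %s was not found on the MATLAB path.', demoFile);
end

% whos with the -file option returns one struct per stored variable.
info = whos('-file', fullPath);
for k = 1:numel(info)
    % Print name, stored size, and class for each variable.
    fprintf('%-12s %-12s %s\n', info(k).name, mat2str(info(k).size), info(k).class);
end
```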

How to Load Demo Data

For MAT files, the easiest way to load demo data is using the Load Dialog Box:

  1. From the File menu, select Load Data (or Import Workspace/MAT File).
  2. When the dialog box appears, click the From File button.
  3. Type the name of the demo MAT file to be loaded and press the Return key.
  4. Select the variables to be loaded and click the Load button.
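
The same files can also be loaded from the MATLAB command line. The sketch below is illustrative only: it uses the standard MATLAB `which` and `load` functions, assumes the PLS_Toolbox demo data folder is on the MATLAB path, and takes the file name `plsdata.mat` and the variable names `xblock1` and `yblock1` from the dataset table.

```matlab
% Sketch: load selected variables from a demo MAT file at the command line.
% Assumption: the PLS_Toolbox demo data folder is on the MATLAB path.
demoFile = 'plsdata.mat';

% Confirm the file can be found before trying to load it.
fullPath = which(demoFile);
if isempty(fullPath)
    error('Demo file %s was not found on the MATLAB path.', demoFile);
end

% Load only the calibration blocks into a struct, leaving the workspace clean.
s = load(fullPath, 'xblock1', 'yblock1');

% Display the dimensions of each loaded variable
% (the table lists 300x20 and 300x1 for these two).
disp(size(s.xblock1));
disp(size(s.yblock1));
```

Loading into a struct rather than directly into the workspace avoids overwriting existing variables that happen to share a name with a demo variable, such as `data` or `spec`.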

[Figure: Load Demo Data]


(Sub topic of PLS_Toolbox_Topics)