Model Exporter Reference Manual and SiPAT Interface: Difference between pages

From Eigenvector Research Documentation Wiki
imported>Jeremy
 
__TOC__
==Introduction==
Eigenvector Research's [[Function_Reference_Manual|PLS_Toolbox]] and [http://www.siemens.com/ Siemens' SiPAT product] can be used together for deployment of PLS_Toolbox or Solo multivariate analysis models in process control applications. This integration utilizes Mathworks' [http://www.mathworks.com/ Matlab] and functionality built into SiPAT to run custom Matlab functions. These functions can include calls to PLS_Toolbox functions.


[[Model_Exporter]] from Eigenvector Research, Inc. converts models created within the [[Software_User_Guide|PLS_Toolbox or Solo]] chemometrics modeling environments into an interpretable format for use outside of these products. These exported models can be used with a user-supplied interpreter to make predictions on new data. One prediction engine is supplied as an example.
The following discusses the PLS_Toolbox-specific configuration and provides example m-files which can be used to quickly get SiPAT and PLS_Toolbox integrated.


Model_Exporter takes as input a standard model structure created in PLS_Toolbox or Solo and outputs the model into one of three formats: an [[#XML_File_Format|XML file]] (executable by a user-supplied external parser), an [[#M-file_Format|m-file]] (executable in MATLAB – separately distributed by Mathworks, Inc – without any additional toolboxes, or LabView with their MathScript addon package) or a [[#TCL_File_Format|TCL file]] (executable in a Tcl interpreter or in the Symbion software package – by Symbion Systems, Inc.).
====Licensing====


The exported model requires very few resources to be executed. Specifically, it requires floating-point numerical calculations, a small amount of memory, and the overhead resources required by the specific interpreter.
Note that the [http://www.eigenvector.com/software/license_evri.html PLS_Toolbox license] requires that, unless special site licensing arrangements are made, you must obtain a separate PLS_Toolbox license for each instance of any PLS_Toolbox code you wish to deploy. Special licensing options are available. See the [http://www.eigenvector.com Eigenvector Website] for more information.


This documentation describes the use of Model_Exporter and of the exported [[#M-file_Format|M-file]] and [[#TCL_File_Format|TCL-file]] formats, and is also intended to help in the design of external [[#XML_File_Format|XML parsing engines]]. One example engine is supplied for the PHP language (often used for web-page scripting; see http://www.php.net for more information on PHP). Additional engines may be available; [mailto:helpdesk@eigenvector.com contact Eigenvector Research, Inc.] for more information.
==Installation and Basic Configuration==
===SiPAT Configuration===


Latest version release notes can be found at http://wiki.eigenvector.com/index.php?title=Model_Exporter_Release_Notes
The user is directed to the Siemens documentation for details on configuration of SiPAT for use with Matlab. The discussion below describes some of the PLS_Toolbox-specific considerations of this configuration.


==System Requirements==
===Matlab and PLS_Toolbox Configuration===


Model_Exporter can be executed from either the MATLAB computational environment ([http://mathworks.com Mathworks, Inc., Natick, MA]), or [[Software User Guide|Solo]] (Eigenvector Research, Inc., Wenatchee, WA). Model_Exporter converts models created by PLS_Toolbox 3.5 or higher or Solo 4.0 or higher.
With Matlab already installed (as per [http://www.mathworks.com The Mathworks installation instructions]), PLS_Toolbox can be copied onto the target computer using any of the available file types as described in the [[Installation|standard installation instructions]]. The primary differences from the standard installation will be that you will create two special files and place these files into the PLS_Toolbox/utilities folder on the target computer:
# Create a text file named '''evrilicense.lic''' and put your license code (available from the [http://download.eigenvector.com Eigenvector Research download page]) as a single line in that file. You can also obtain an evrilicense.lic file from the [mailto:helpdesk@eigenvector.com Eigenvector Helpdesk].
# Create an ''empty'' text file named '''evrinetwork.lic'''. This file will instruct PLS_Toolbox to ignore installation errors and run without warnings (necessary when calling PLS_Toolbox functions from within SiPAT.)


===Matlab-Based Exporter Requirements===
==Specific Model and M-file Configuration==


For execution of Model_Exporter within the MATLAB environment, the following is required:
The SiPAT interface into Matlab provides for the specification of a data file to load and a Matlab function to execute (along with specific input and output configuration information.)


:Matlab 6.5 or higher
In the examples given here, the data file will be in the Matlab MAT format and will contain the model to apply. The Matlab function will be given in the Matlab m-file format and will contain the specific instructions for applying the model and returning the results to SiPAT. In these examples, it is assumed that only one model is applied per method (no special calibration transfer or other pre-transformation steps are used).
:256 MB RAM (recommended – less may be required)


===Solo-Based Exporter Requirements===
===Creation of the Model MAT File===


For execution of the Model_Exporter, the following is recommended:
The model you wish to apply to new data should be saved from Matlab or Solo into a MAT file as the one and only item stored in the MAT file. In the [[Analysis Window]], this is done by selecting the menu item File > Save Model and using the [[WorkspaceBrowser_ImportingData#To_save_imported_data_to_a_.mat_file|save dialog]] to specify a filename and an item name. This will save the model into the given filename with the specified item name. Although the MAT file name can be any standard filename, it will make m-file construction easier if the item name used is always the same in all saved models. In the example here, we will assume that the item is called "model". If a different name is used, the SiPAT-specific configuration comments in the m-file (see below) will need to be modified to indicate to SiPAT the name used for the model.


:Solo+Model_Exporter 4.1 or higher
If saving the model from the Matlab command line, the following command can be used (assuming the model to use is currently named "model" in your Matlab workspace) :
:Windows 2000, XP, 2003 server, Vista, or MAC OS X (intel)
  save myfile.mat model
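
As a quick check before pointing SiPAT at the MAT file, the contents of the file can be listed to confirm that it holds exactly one item with the expected name. The following is a minimal sketch only: the placeholder structure stands in for a real model, and the temporary file location is an assumption made for illustration.

```matlab
% Placeholder standing in for a real PLS_Toolbox/Solo model structure
model = struct('modeltype', 'PLS');

% Save it as the only item in the MAT file (temporary location for this sketch)
fname = fullfile(tempdir, 'myfile.mat');
save(fname, 'model');

% List the file contents without loading them into the workspace
info = whos('-file', fname);
if numel(info) ~= 1 || ~strcmp(info(1).name, 'model')
    error('MAT file should contain exactly one item named "model".');
end
```

If the item name differs from "model", the check fails with an error rather than leaving the mismatch to be discovered when SiPAT calls the function.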
:200 MB Disk Space (for installation; some models may require additional space)
:256 MB RAM (recommended – less may be required)


===Requirements for Using Exported Models===
===Creation of the Function M-file===


The requirements to execute an exported model vary depending on the interpreter used, the number of variables in the modeled data, and the complexity of the model (i.e. the number of factors/components included in the model and the types of preprocessing used).
The second part of the SiPAT/PLS_Toolbox interface is a Matlab m-file which contains a function definition. Note: the contents of this m-file must actually be a Matlab function, meaning it must contain a function header line as shown in the examples below. It cannot be a "script" in the strict Matlab sense of that term, which implies code that is not wrapped inside a function definition.


Memory requirements depend on the precision required for the application, the number of variables in the data and the total number of factors in the model. For example, a model working on 10,000 variables and 5 factors would require around 1MB for double-precision calculations and 500KB for single-precision calculations.
Three example m-files are given below: one for use with any of the regression model types (PLS, PCR, MLR, CLS, SVM, LWR, etc), one for use with classification model types (PLSDA, KNN, SIMCA, SVMDA/SVM-C), and one for use with principal component analysis (PCA) models. Each of these m-files assumes that the input is a single vector of values (passed by SiPAT) and each returns two or three values which correspond to the predictions from the model.


The software which executes the specific file formats may have additional requirements. See the file format description sections later in this manual for where to locate model execution details.
All of these functions also assume that the input data will be two columns, where the first column contains axis scale information for the variables (such as wavelength, m/z, time, etc) and the second column is the actual measured data. If this does not fit the type of data being passed by SiPAT (e.g. no axisscale information), the initial lines:


==Supported Methods==
<pre>
    %convert second column of input data into a dataset (if not appropriate column, change next line)
    x = data(:,2);
    x = dataset(double(x'));


Model_Exporter supports the following model types:
    %Assume first column is axisscale information. If not true, comment out the next line
    x.axisscale{2} = double(data(:,1));
</pre>


:PCA – Principal Components Analysis model
should be converted as necessary to handle the input data. For example, if only a single column of values is being passed, the following can be substituted in for the above code:
:PLS – Partial Least Squares regression model
:PLSDA – Partial Least Squares discriminant analysis model
:PCR – Principal Components Regression model
:CLS – Classical Least Squares Regression model


and preprocessing methods:
<pre>
    %convert column of input data into a dataset
    x = dataset(double(data'));
</pre>


:Absolute value           
====Regression Model Predictions====
:Autoscale   
:Baseline (Specified points)   
:Derivative (SavGol)   
:Detrend
:GLS weighting           
:Log10         
:Mean center                         
:Median center
:Normalize                   
:OSC           
:Smoothing (SavGol)             
:SNV
:Sqrt Mean Scale         
:MSC


The following m-file contents are appropriate for use with regression model types:


Normalization and Baseline support windowing. Normalization supports type 1 (area) and type 2 (length) normalization, but does not support 'Inf' type normalization.
<pre>
%START SIPAT
%<CONFIG>
%  <MODEL>REGRESSION</MODEL>
%  <PREFIX></PREFIX>
%  <SUFFIX></SUFFIX>
%  <INPUTS>
%      <INPUT Name="data" XDataType="MultiValue" YDataType="Single" />
%  </INPUTS>
%  <OUTPUTS>
%      <OUTPUT Name="y" XDataType="SingleValue" YDataType="Double" />
%      <OUTPUT Name="q_x" XDataType="SingleValue" YDataType="Double" />
%      <OUTPUT Name="H_x" XDataType="SingleValue" YDataType="Double" />
%  </OUTPUTS>
%  <FUNCTION>[y,q_x,H_x]=regpred(data,model)</FUNCTION>
%</CONFIG>
%END SIPAT
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Regression using PLS_Toolbox from Eigenvector
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%This function calculates the responses y and the Q-residuals
%(q_x) and Hotellings T2 (H_x) corresponding to the input variables data
%(data) using a standard model structure.
%
%Before using this function, make sure you loaded the following data into
%the workspace:
%  model: Model structure used in PLS_Toolbox/Solo including all pretreatment.


Model_Exporter does not support replacement of missing values (values must be measured for all variables).
function [y,q_x,H_x] = regpred(data,model)


==Exporting a Model==
try
    %convert second column of input data into a dataset (if not appropriate column, change next line)
    x = data(:,2);
    x = dataset(double(x'));


===Exporting from PLS_Toolbox and MATLAB===
    %Assume first column is axisscale information. If not true, comment out the next line
    x.axisscale{2} = double(data(:,1));


Model_Exporter is easily called from the MATLAB environment. After adding the Model_Exporter to the MATLAB path, a model can be exported by simply calling the exportmodel function, passing the model structure itself, and an optional input specifying the file name and type to which the exported model should be written. When filename is omitted, Model_Exporter will prompt for a filename, file type, and location.
    %make a prediction
    opts =[];
    opts.plots='none';
    opts.display='off';
    pred_x = feval(lower(model.modeltype),x,model,opts);


     exportmodel(modelstructure,filename)
    %return prediction in y
    y = double(pred_x.pred{2});
    if isfield(pred_x,'ssqresiduals')
        q_x = double(pred_x.ssqresiduals{1,1}./pred_x.detail.reslim{1});
        H_x = double(pred_x.tsqs{1}./pred_x.detail.tsqlim{1});
    else
        q_x = 0;
        H_x = 0;
    end


Model_Exporter is also accessible from the PLS_Toolbox through the Analysis GUI. With the model to export loaded into the Analysis GUI, go to the '''File > Export Model > To Predictor…''' menu and select the file type to export from the flyout menu.
catch
    %errors are saved to the following file (change location as desired)
    fid=fopen('C:\sipat_error_sout.txt','w');
    fwrite(fid,encode(lasterror));
    fwrite(fid,encode(evalin('base','whos')));
    fclose(fid);
end
</pre>


===Exporting from Solo===
====Classification Model Predictions====


When installed with the stand-alone Solo software, a model is exported from the Analysis GUI. With the model to export loaded into the Analysis GUI, go to the '''File > Export Model > To Predictor…''' menu and select the file type to export from the flyout menu.
The following m-file contents are appropriate for use with classification model types:


===Handling Excluded Variables===
<pre>
%START SIPAT
%<CONFIG>
%  <MODEL>CLASSIFICATION</MODEL>
%  <PREFIX></PREFIX>
%  <SUFFIX></SUFFIX>
%  <INPUTS>
%      <INPUT Name="data" XDataType="MultiValue" YDataType="Single" />
%  </INPUTS>
%  <OUTPUTS>
%      <OUTPUT Name="y" XDataType="SingleValue" YDataType="Double" />
%      <OUTPUT Name="q_x" XDataType="SingleValue" YDataType="Double" />
%      <OUTPUT Name="H_x" XDataType="SingleValue" YDataType="Double" />
%  </OUTPUTS>
%  <FUNCTION>[y,q_x,H_x]=classpred(data,model)</FUNCTION>
%</CONFIG>
%END SIPAT
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Classification using PLS_Toolbox from Eigenvector
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%This function calculates the numerical class assignment, the Q-residuals
%(q_x) and Hotellings T2 (H_x) corresponding to the input variables data
%(data) using a standard model structure.
%
%Before using this function, make sure you loaded the following data into
%the workspace:
%  model: Model structure used in PLS_Toolbox/Solo including all pretreatment.


When excluded variables are detected within a model, the user will be given two options for how to handle these variables.
function [y,q_x,H_x] = classpred(data,model)


# Compress Model – Model_Exporter will attempt to remove all references to excluded variables. The created predictor will expect values for only the included variables.
try
# Use Placeholders – Model_Exporter will create a predictor which expects values for all variables, excluded or included, although excluded values will be ignored.
    %convert second column of input data into a dataset (if not appropriate column, change next line)
    x = data(:,2);
    x = dataset(double(x'));


The choice between these two methods depends on the environment in which the exported model is going to be used. If it is easier to always provide all variables to the predictor, then the “Use Placeholders” option is probably preferred. If, instead, only the included variables will be available (e.g. the excluded variables are not going to be measured), compressing the model is the correct approach.
    %Assume first column is axisscale information. If not true, comment out the next line
    x.axisscale{2} = double(data(:,1));


In general, the two methods give identical numerical results with the sole exception of models which make use of smoothing and derivative preprocessing. These methods may give slightly different “edge effects” after compressing a model and validation of such models is encouraged.
    %make a prediction
    opts =[];
    opts.plots='none';
    opts.display='off';
    pred_x = feval(lower(model.modeltype),x,model,opts);


In either case, the header information in the exported model will always reflect the number of variables expected and any labels or axisscale information for those variables.
    %return prediction in y 
    y = double(pred_x.classification.mostprobable);
    if isfield(pred_x,'ssqresiduals')
        q_x = double(pred_x.ssqresiduals{1,1}./pred_x.detail.reslim{1});
        H_x = double(pred_x.tsqs{1}./pred_x.detail.tsqlim{1});
    else
        q_x = 0;
        H_x = 0;
    end


==M-file Format==
catch
 
    %errors are saved to the following file (change location as desired)
The m-files output by [[Model Exporter]] are stand-alone. That is, they can be run by the MATLAB computational environment (available from Mathworks, Inc., http://www.mathworks.com) without any additional toolboxes, or by the LabVIEW environment (available from National Instruments, Inc., http://www.ni.com) with any MathScript-enabled package.
    fid=fopen('C:\sipat_error_sout.txt','w');
 
    fwrite(fid,encode(lasterror));
For maximum flexibility, an exported model is written as a script which expects only to find a variable named x in its workspace. This variable provides the input data to which the model should be applied. It is important to note that the variable x will be modified by the script and, thus, the caller should not expect the variable to remain unchanged. See "Creating Functions from Exported Models", below, for more information on how to isolate the script and call it as a function. (Those unfamiliar with MATLAB scripts and functions should read the MATLAB documentation describing these concepts and the associated "variable scope" documentation.)
    fwrite(fid,encode(evalin('base','whos')));
 
    fclose(fid);
===Input Data===
end
 
</pre>
The expected length (number of elements) and contents of the input x vector are defined in the comments and initial sections of the exported model script. The script, as exported, does not use this information to perform any validity testing on the input variable. This information is only provided to indicate to the user what type of data is expected.
 
The example below shows the part of an exported model which indicates the expected data size and associated context information. This particular model expects input data of ten variables as a row vector (as described by inputdata.size). The labels of these ten variables are specified in the string array inputdata.label. As there was no axisscale information in this particular data, the inputdata.axisscale value is empty.
 
inputdata.size = [ 1 10 ];
inputdata.axisscale = [ ];
inputdata.label = ['Fe';'Ti';'Ba';'Ca';'K ';'Mn';'Rb';'Sr';'Y ';'Zr'];
 
The user can make use of this information to assure the data being passed to the model is correct. Again, as written, the script provides no testing. Incorrect data sizes will be indicated by a runtime error when executing the script.
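
Because the exported script performs no checks itself, a caller may want to test the input before running it. The following is a minimal sketch, assuming the header values have been copied out of the exported file into the caller's code; the measurement values in x are invented for illustration.

```matlab
% Header values as they appear in the example exported model above
inputdata.size = [ 1 10 ];
inputdata.label = ['Fe';'Ti';'Ba';'Ca';'K ';'Mn';'Rb';'Sr';'Y ';'Zr'];

% Hypothetical new measurement to be passed to the exported script
x = [12.1 0.8 3.4 5.6 2.2 0.1 0.9 1.7 0.3 4.0];

% Check the orientation and length before the script is run
sizeok = isequal(size(x), inputdata.size);
if ~sizeok
    error('Expected x of size [%d %d] but received [%d %d].', inputdata.size, size(x));
end

% The label array has one row per variable, so it can serve as a second check
labelsok = size(inputdata.label, 1) == size(x, 2);
```

A column vector of ten values would fail the first test, which catches orientation mistakes before they surface as a less informative runtime error inside the script.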
 
===Returned Results===
 
The results available from a model prediction will be present as variables in the script's workspace. The user is responsible for making use of these variables as needed. The following list specifies the supported results which may be of interest to the user.
 
:'''scores''' - Scores for each component as a row vector.
:'''Tcon''' - Variable contributions to T2 as a row vector.
:'''Xhat''' - Model estimate of the data as a row vector.
:'''Qcon''' - Q residuals contributions (x residuals) as a row vector.
:'''T2''' - The Hotelling's T^2 as a scalar value.
:'''Q''' - The sum squared x residuals (Q value) as a scalar value.
 
Regression models return the following additional value:
 
:'''yhat''' - Model prediction for y (predicted y value) as a scalar value or vector.
 
PLSDA discriminant analysis models also return an additional value:
 
:'''prob''' - Model predicted probability of the input sample belonging to each class, where the classes are ordered as unique(y), as a vector. (y refers to the classes variable originally used in building the model).
 
===Creating Functions from Exported Models===
 
Although the exported model is written as a script which would normally operate in the base workspace of MATLAB, the user can also wrap the script into a function by simply adding a standard function definition to the script file. A function wrapper keeps the input variable x from being modified outside the function. This approach tends to be safer than a script, but is not implemented by default in order to provide the widest flexibility to the user.
 
An example function line is provided in the exported model file (commented out) along with instructions for customization. In addition, there is an example block of code (also commented out by default) which will return “expected information” about x if the function is called without any inputs.
 
In general, the function definition requires only one input, x, and can output any of the variables which are present after the script's execution. An example would be:
 
  function [scores,Q,T2,Qcon,Tcon] = mymodel(x)
 
This function definition returns the vectors: scores, Qcon, and Tcon, as well as the scalar values: Q and T2 to the caller.
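
The overall layout of a wrapped file can be sketched as follows. This is an illustration only: the body is a hand-written stand-in (mean centering followed by a one-component projection) rather than real exported content, the constants mn and p are invented, and the output list is reduced to two values. A real file would contain the exported script in place of the stand-in body.

```matlab
function [scores,Q] = mymodel(x)
% MYMODEL  Sketch of an exported model script wrapped as a function.
% The body below is a hand-written stand-in; a real exported script
% would appear in its place.

mn = [1 2 3];          % hypothetical mean-centering constants
p  = [1; 0; 0];        % hypothetical loadings (3 variables, 1 component)

x      = minus(x, mn);                  % x is overwritten, as in an exported script
scores = mtimes(x, p);                  % scores for the single component
xhat   = mtimes(scores, transpose(p));  % model estimate of the data
r      = minus(x, xhat);                % x residuals
Q      = mtimes(r, transpose(r));       % sum of squared x residuals
```

Saved as mymodel.m and called as <code>[scores,Q] = mymodel(x)</code>, the function overwrites only its own local copy of x, so the caller's variable is left unchanged.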
 
==TCL File Format==
 
The tcl-files output by [[Model Exporter]] can be run by either a stand-alone Tcl parser (for example see the "Batteries Included" ActiveTcl Distribution http://www.tcl.tk/software/tcltk/ ) or by Symbion (available from Symbion Systems, Inc., www.gosymbion.com). When run in a stand-alone Tcl parser, the La package for matrix support is required (available free from: http://www.hume.com/la/ ).
 
For maximum flexibility, an exported model is written as a Tcl script which expects only to find a variable named x in its workspace. This variable provides the input data to which the model should be applied. It is important to note that the variable x will be modified by the script and, thus, the caller should not expect the variable to remain unchanged.
 
===Input Data===
 
The expected length (number of elements) and contents of the input x vector are defined in the comments and initial sections of the exported model script. The script, as exported, does not use this information to perform any validity testing on the input variable. This information is only provided to indicate to the user what type of data is expected.
 
The example below shows the part of an exported model which indicates the expected data size and associated context information. This particular model expects input data of ten variables as a row vector (as described by inputdata.size). The labels of these ten variables are specified in the string array inputdata.label. As there was no axisscale information in this particular data, the inputdata.axisscale value is empty.
 
# inputdata.size = [ 1 10 ];
# inputdata.axisscale = [ ];
# inputdata.label = ['Fe';'Ti';'Ba';'Ca';'K ';'Mn';'Rb';'Sr';'Y ';'Zr'];
 
The user can make use of this information to assure the data being passed to the model is correct. Again, such testing is not provided by the script as written. Incorrect data sizes will be indicated by a runtime error when executing the script.
 
===Returned Results===
 
The results available from a model prediction will be present as variables in the script's workspace. The user is responsible for making use of these variables as needed. The list of output variables is the same as those listed under the [[#Returned_Results|M-file format description]].
 
==XML File Format==
 
===Numerical Matrix Definitions===
 
The XML format utilizes custom tags to define various parts of the model. For some tags, the content is a vector or matrix of values. In these cases, a comma character delineates different column elements and semicolon indicates the end of a matrix row and the beginning of the next. All white space is ignored. If a given matrix contains only one row, it is described as a "row vector". A matrix with a single column is described as a "column vector". Orientation of such vectors is critical to the mathematical operations and must be parsed appropriately.
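
The delimiter rules can be sketched in a few lines. The tag content below is invented for illustration, and the handling of a trailing semicolon is a defensive assumption rather than something the format is stated to require.

```matlab
% Content of a hypothetical constant tag: two rows, three columns
txt = sprintf(' 1.5, 2, 3;\n 4, 5, 6 ');

txt  = regexprep(txt, '\s', '');            % all white space is ignored
rows = strsplit(txt, ';');                  % a semicolon ends a matrix row
rows = rows(~cellfun(@isempty, rows));      % assumption: tolerate a trailing semicolon

% A comma separates column elements within each row
vals = cellfun(@(r) str2double(strsplit(r, ',')), rows, 'UniformOutput', false);
M    = vertcat(vals{:});                    % 2-by-3 numeric matrix
```

Content without any semicolon yields a single row (a row vector), while content with one value per row yields a column vector, so the orientation is preserved by construction.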
 
===XML Structure===
 
The XML file will consist of a top level &lt;model&gt; tag which will contain an &lt;information&gt; tag, an &lt;inputdata&gt; tag, and one or more step segments, each wrapped in a separate &lt;step&gt; tag.
 
'''&lt;model&gt;'''
:'''&lt;information&gt;'''  General information on the encoded model.
::'''&lt;source&gt;'''    Text description of file source (EVRI Model_Exporter).
::'''&lt;modeltype&gt;'''  Standard model method acronym (PCA, PLS, etc).
::'''&lt;description&gt;''' Text description of model including preprocessing, data size(s), and number of components. Each row of this multi-row string is delineated by &lt;sr&gt; (string row) tags.
::'''&lt;datasource&gt;''' Information block of modeled calibration data. &lt;datasource&gt; is a multi-cell table format. There will be one column of information for each block of data required by the given modeltype (e.g. PCA requires 1 block, PLS requires 2). Each &lt;td&gt; tag will contain a number of sub-fields describing the data used for the given block. Informational only, sub-fields may change.
:'''&lt;/information&gt;'''
:'''&lt;inputdata&gt;'''    Specific requirements for input data including the following information:
::'''&lt;size&gt;'''     Numeric class row vector describing the size expected for the input data (x). The first element of the vector gives the expected number of rows, the second is the expected number of columns.
::'''&lt;axisscale&gt;'''  Numeric class row vector providing the expected axisscale of the input values. The actual values stored in the axisscale vector are completely dependent on the application and the analytical method used and may be empty.
::'''&lt;label&gt;'''    Strings (delimited by &lt;sr&gt; sub-tags) defining the names of the variables expected in the input data (x). The names are dependent on the application and the analytical method used and may be empty.
:'''&lt;/inputdata&gt;'''
:'''&lt;step&gt;'''      Repeated tag for each step required for making a prediction using this model. Will contain the following sub-fields:
::'''&lt;sequence&gt;'''  Numeric class single value indicating the order in which this step should be performed. The steps are generally included in the XML file in sequence-order (sequence 1 will be the first step in the file), but this field can be used to assure in-order processing of steps.
::'''&lt;description&gt;''' String class description of the step (informational only)
::'''&lt;constants&gt;'''  Contains information on constants required by this step. Each constant is defined as a sub-tag herein. The name of the constant is the sub-tag name and will contain a matrix (or vector) of values to use for the given constant. See below for more information.
::'''&lt;script&gt;'''   One or more rows of strings describing the mathematical operations to perform this step. When more than one mathematical operation is to be performed, each will be given in a separate string row &lt;sr&gt; tag. However, the &lt;sr&gt; tags can be ignored because each mathematical operation is terminated with a semicolon.
:'''&lt;/step&gt;'''
''(Additional &lt;step&gt; tags located here…)''
 
'''&lt;/model&gt;'''
 
See the provided files "pcaexample.xml" and "plsexample.xml" for full examples of the XML structure.
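
Since the &lt;sequence&gt; field, rather than file position, defines the processing order, an interpreter may wish to sort the steps after parsing. The following is a minimal sketch, assuming the steps have already been read into a structure array; the field values are invented for illustration.

```matlab
% Hypothetical steps as they might be held after parsing the <step> tags
steps = struct('sequence',    {2, 1}, ...
               'description', {'Project onto loadings', 'Mean center'});

% Order the steps by their <sequence> value rather than by file position
[sorted_seq, order] = sort([steps.sequence]);
steps = steps(order);
```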
 
==Requirements for XML Interpreters==
 
To execute each of the &lt;step&gt; segments contained in the XML file, an interpreter must be able to parse the defined constants into matrices and execute the script commands. The following sections give the specifications for an interpreter.
 
===Managing of Constants and Variables===
* The interpreter must maintain a "workspace" of stored constants and variables in which the matrices can be accessed by a variable name. The name is given by the tag in which the constant was read. For example:<pre>&lt;s class="numeric" size="[1,1]">4&lt;/s></pre>
:would define a constant "s" equal to the scalar value 4.
* Constants are NOT case sensitive and any interpreter must be written to consider the upper or lower case variables as the same.
* "Constants" are just pre-defined variables. Although every effort will be made to avoid changing these values, it is NOT a rule that these "constants" cannot be changed – scripts may modify and overwrite these values. They are called "constants" because they are initially defined by the model.
* The enclosing tag for the constant will define the class of the constant (in this application, constants will always be "numeric") and will also define the size of the constant using the attribute 'size'. For example, <pre>&lt;s class="numeric" size="[1,5]"></pre> defines that the enclosed constant will be a row vector (1 row) of 5 elements (5 columns).
* Prior to the execution of the script(s), the XML interpreter must place a variable named "x" (lower-case) in its workspace. This variable must contain the data to which the model should be applied. The value of "x" will be modified by the script so, following initial assignment, no alteration of this variable should be done outside what is specified by the script.
* All constants/variables must be retained for the entirety of a given step. In many cases, the variables remaining in the workspace will contain results of interest to the caller and, therefore, all workspace values should be retained. The variable "x" must always be present.
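
One way to meet these requirements is to hold the workspace in a map keyed by lower-cased names. The sketch below is an illustration under that assumption, not a prescribed design; the variable names (ws, sizeatt and so on) and the input values are invented.

```matlab
% Workspace keyed by lower-cased names so that "S" and "s" are the same entry
ws = containers.Map('KeyType', 'char', 'ValueType', 'any');

% Constant read from <s class="numeric" size="[1,1]">4</s>
name    = 'S';
sizeatt = '[1,1]';
value   = 4;

% Confirm the parsed value agrees with the declared size attribute
expected = sscanf(sizeatt, '[%d,%d]')';
if ~isequal(size(value), expected)
    error('Constant "%s" does not match its declared size.', name);
end
ws(lower(name)) = value;

% The caller places the input data in "x" before any script is run
ws('x') = [0.2 0.4 0.6];

% Look-ups lower-case the requested name, so either case finds the constant
s_value = ws(lower('s'));
```

Because nothing is ever removed from the map, every intermediate and final result remains available to the caller once the last step has run.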
 
===Script Execution===
 
The following lists define the script commands which must be supported by the interpreter (scripts may contain only these commands). When applicable, the Matlab operator corresponding to the given function is given. Interpreters do not need to interpret these operators. They will never be used in any script and are provided here only for reference.
 
====Single Input Functions====
 
C = function(A);
  abs            Absolute Value     Removal of sign of elements
  log10          log (base 10)      Base 10 logarithm of elements
  transpose      transpose array    Exchange rows for columns ( ' )
 
====Double Input Functions====
 
C = function(A,B);
  plus           Plus                             Addition of paired elements (+)
  minus          Minus                            Subtraction of paired elements (-)
  mtimes         Matrix multiply (dot product)    Dot product of matrices (*)
  times          Array multiply                   Multiplication of paired elements (.*)
  power          Array power                      Exponent using paired elements (.^)
  rdivide        Right array divide               Division of paired elements (./)
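
In MATLAB itself these command names coincide with built-in functions, so a prototype interpreter can dispatch on the name directly. Interpreters in other languages would need their own implementation of each command. The following sketch restricts dispatch to the listed commands; the input values are invented for illustration.

```matlab
% Commands the specification allows, split by number of inputs
single_in = {'abs', 'log10', 'transpose'};
double_in = {'plus', 'minus', 'mtimes', 'times', 'power', 'rdivide'};

A = [1 -10 100];
B = [2 5 10];

% Single-input example: C = abs(A);
fname = 'abs';
if ~ismember(fname, single_in), error('Unsupported command "%s".', fname); end
C1 = feval(fname, A);

% Double-input example: C = rdivide(A,B);
fname = 'rdivide';
if ~ismember(fname, double_in), error('Unsupported command "%s".', fname); end
C2 = feval(fname, A, B);
```

Checking the name against a fixed list before calling feval keeps an interpreter from executing anything other than the nine supported commands.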
 
===Mathematical Operation Requirements===
 
* All mathematical operations are expected to be performed using signed, single precision numbers.
* With the exception of mtimes (dot product), all operations are "element-by-element". That is, the two matrices passed will be equal in size (see scalar exception below) and the mathematical operation is performed between each element of matrix A and its corresponding element in matrix B. The output matrix C is always the same size as A and B.
* Scalar Exception (except mtimes): A or B may be a scalar even if the other isn't. In this situation, the scalar input must be interpreted as an appropriately-sized matrix containing all the same value.
* mtimes (dot product) is performed using the standard linear-algebraic dot-product operation. In generic terms, the input matrix A will contain m rows and k columns, the input matrix B will contain k rows and n columns and the output matrix C will contain m rows and n columns. The following equation is used to calculate each element of the C matrix (loop for i = 1 to m and for j = 1 to n):
::<math>C_{i,j}=A_{i,1}B_{1,j} + A_{i,2}B_{2,j} + A_{i,3}B_{3,j}  + ... + A_{i,k}B_{k,j}</math>
:Subscripts indicate the row and column indexing (respectively) into the given array. When either A or B is a scalar, the mtimes operation should be handled as a "times" operation. That is, the operation becomes an element-by-element multiplication where each element of the matrix input is multiplied by the scalar value and C is the same size as the input matrix.
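
The rules above (element-by-element operation, the scalar exception, and the mtimes sum) can be sketched in Python as shown below. This is only an illustration under stated assumptions: matrices are lists of rows, scalars are plain numbers, and the helper names <code>is_scalar</code>, <code>elementwise</code>, and <code>DOUBLE_INPUT</code> are hypothetical rather than part of Model_Exporter.

```python
import operator

# Assumption: a matrix is a list of rows; anything else is a scalar.

def is_scalar(x):
    return not isinstance(x, list)

def elementwise(op, a, b):
    """Element-by-element operation, including the scalar exception."""
    if is_scalar(a) and is_scalar(b):
        return op(a, b)
    if is_scalar(a):
        return [[op(a, v) for v in row] for row in b]
    if is_scalar(b):
        return [[op(v, b) for v in row] for row in a]
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrix sizes must match")
    return [[op(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def mtimes(a, b):
    """C[i][j] = sum over p of A[i][p] * B[p][j]; scalar input means 'times'."""
    if is_scalar(a) or is_scalar(b):
        return elementwise(operator.mul, a, b)
    k = len(b)
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions must agree")
    n = len(b[0]) if b else 0
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(len(a))]

# Hypothetical lookup table from script function name to implementation.
DOUBLE_INPUT = {
    "plus": lambda a, b: elementwise(operator.add, a, b),
    "minus": lambda a, b: elementwise(operator.sub, a, b),
    "times": lambda a, b: elementwise(operator.mul, a, b),
    "power": lambda a, b: elementwise(operator.pow, a, b),
    "rdivide": lambda a, b: elementwise(operator.truediv, a, b),
    "mtimes": mtimes,
}
```

Note that Python floats are double precision; an interpreter that must match a specific numeric precision would need a different number type.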
 
===Script Execution Requirements===
* The format for a single script command is: <pre> C = function(A,B);</pre> where function is one of the above functions, A and B are the pre-defined constants / variables to use as input to function, and C is the output. Input B will be omitted for functions which require only one input. Each command of the script will end in a semicolon ";". All commands must be performed in the order in which they appear in the script.
* The expected size, axisscale, and labels associated with x will be stored in the <sourcedata> tag (if any exist). These values can be used by an XML interpreter to verify the data being analyzed.
* Constants are NOT case sensitive: an interpreter must treat the upper-case and lower-case forms of a name as the same constant.
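
The execution requirements above can be sketched as a small Python loop. The sketch assumes the script text has already been extracted from the XML file with one command per line, and it uses scalar stand-ins for two of the functions; the names <code>Workspace</code>, <code>run_script</code>, and <code>DEMO_FUNCTIONS</code> are hypothetical. Function names are looked up exactly as written, since the rules above only address the case of constants and variables.

```python
import re

# One command: C = function(A,B);  (input B is optional)
IDENT = r"[A-Za-z_]\w*"
COMMAND = re.compile(
    rf"^\s*({IDENT})\s*=\s*({IDENT})\s*\(\s*({IDENT})\s*(?:,\s*({IDENT})\s*)?\)\s*;\s*$"
)

class Workspace:
    """Variable store that treats names as case-insensitive."""

    def __init__(self):
        self._values = {}

    def get(self, name):
        return self._values[name.lower()]

    def set(self, name, value):
        self._values[name.lower()] = value

def run_script(script, functions, workspace):
    """Execute commands strictly in order; no control structures exist."""
    for line in script.splitlines():
        if not line.strip():
            continue
        match = COMMAND.match(line)
        if match is None:
            raise ValueError(f"unsupported command: {line!r}")
        out, func, a, b = match.groups()
        args = [workspace.get(a)]
        if b is not None:
            args.append(workspace.get(b))
        workspace.set(out, functions[func](*args))
    return workspace

# Scalar-only stand-ins, used purely to demonstrate the control flow.
DEMO_FUNCTIONS = {"abs": abs, "plus": lambda a, b: a + b}

ws = Workspace()
ws.set("x", -3.0)
ws.set("One", 1.0)  # constants are pre-defined before the script runs
run_script("T1 = abs(X);\ny = plus(t1,one);", DEMO_FUNCTIONS, ws)
result = ws.get("Y")  # 4.0: abs(-3.0) = 3.0, then 3.0 + 1.0
```

Storing every name in lower case is one simple way to meet the case-insensitivity requirement; the results are then read back from the workspace when the script finishes.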
 
===Returned Results===
 
The results returned by a model prediction will be present as variables in the interpreter's workspace upon completion of the XML parsing. The returned results are the same as those listed for the [[Model_Exporter_M-file_Format#Returned_Results|M-file format]].
 
==Requirements for XML Writers==
 
The following rules are to be followed by the script creation algorithm of Model_Exporter. These rules may be of interest to script interpreters, but should not have any critical impact on interpreter design.
 
* Nesting of functions is not allowed. Functions can only take variables or pre-defined constants as input.
* NO iterative processes are supported. All scripts must be straight-through executing (no control structures such as "if", "while", etc. are supported).
* Missing data replacement is not supported.
* As of version 1.0 of this product, only variables or pre-defined constants may be used in a function. No "in-line" constants may be used. For example:
 
    C = minus(A,1);
 
is invalid because the constant "1" is used in-line. Instead, the value 1 must be pre-defined as a named constant, and the command must pass the name of that constant as the second input.
 
* Variables are NOT case sensitive and any interpreter must be written to consider the upper or lower case variables as the same. Note, however, that the Matlab output of this function will be case-sensitive code so the scripts should try to be consistent in case, even if other interpreters won't care.
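
An interpreter or test harness could check a command against the writer rules above before running it. The Python sketch below is a hypothetical illustration, not part of Model_Exporter; the function name <code>check_command</code>, the list of control words, and the message texts are assumptions.

```python
import re

IDENT = r"[A-Za-z_]\w*"
COMMAND = re.compile(rf"^\s*({IDENT})\s*=\s*({IDENT})\s*\((.*)\)\s*;\s*$")
# Assumed set of Matlab-style control words to reject.
CONTROL_WORDS = {"if", "else", "elseif", "for", "while", "switch", "end"}

def check_command(line):
    """Return None if the command obeys the writer rules, else a reason."""
    stripped = line.strip()
    first_word = re.match(IDENT, stripped)
    if first_word and first_word.group(0).lower() in CONTROL_WORDS:
        return "control structures are not supported"
    match = COMMAND.match(stripped)
    if match is None:
        return "command is not of the form C = function(A,B);"
    inner = match.group(3)
    if "(" in inner or ")" in inner:
        return "nested function calls are not allowed"
    args = [arg.strip() for arg in inner.split(",")]
    if len(args) not in (1, 2):
        return "functions take one or two inputs"
    for arg in args:
        if re.fullmatch(IDENT, arg) is None:
            return "in-line constants are not allowed; use a pre-defined constant"
    return None
```

For example, <code>check_command("C = minus(A,1);")</code> reports the in-line constant, while <code>check_command("C = minus(A,one);")</code> returns <code>None</code>.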

Revision as of 14:47, 20 December 2011

Introduction

Eigenvector Research's PLS_Toolbox and Siemens' SiPAT product can be used together for deployment of PLS_Toolbox or Solo multivariate analysis models in process control applications. This integration utilizes Mathworks' Matlab and functionality built into SiPAT to run custom Matlab functions. These functions can include calls to PLS_Toolbox functions.

The following discusses the PLS_Toolbox-specific configuration and provides example m-files which can be used to quickly get SiPAT and PLS_Toolbox integrated.

Licensing

Note that the PLS_Toolbox license requires that, unless special site licensing arrangements are made, you must obtain a separate PLS_Toolbox license for each instance of any PLS_Toolbox code you wish to deploy. Special licensing options are available. See the Eigenvector Website for more information.

Installation and Basic Configuration

SiPAT Configuration

The user is directed to the Siemens documentation for details on configuration of SiPAT for use with Matlab. The discussion below describes some of the PLS_Toolbox-specific considerations of this configuration.

Matlab and PLS_Toolbox Configuration

With Matlab already installed (as per The Mathworks installation instructions), PLS_Toolbox can be copied onto the target computer using any of the available file types as described in the standard installation instructions. The primary differences from the standard installation will be that you will create two special files and place these files into the PLS_Toolbox/utilities folder on the target computer:

  1. Create a text file named evrilicense.lic and put your license code (available from the Eigenvector Research download page) as a single line in that file. You can also obtain an evrilicense.lic file from the Eigenvector Helpdesk.
  2. Create an empty text file named evrinetwork.lic. This file will instruct PLS_Toolbox to ignore installation errors and run without warnings (necessary when calling PLS_Toolbox functions from within SiPAT).

Specific Model and M-file Configuration

The SiPAT interface into Matlab provides for the specification of a data file to load and a Matlab function to execute (along with specific input and output configuration information.)

In the examples given here, the data file will be in the Matlab MAT format and will contain the model to apply. The Matlab function will be given in the Matlab m-file format and will contain the specific instructions for applying the model and returning the results to SiPAT. These examples assume that only one model is applied per method (no special calibration transfer or other pre-transformation steps are used).

Creation of the Model MAT File

The model you wish to apply to new data should be saved from Matlab or Solo into a MAT file as the one and only item stored in the MAT file. In the Analysis Window, this is done by selecting the menu item File > Save Model and using the save dialog to specify a filename and an item name. This will save the model into the given filename with the specified item name. Although the MAT file name can be any standard filename, it will make m-file construction easier if the item name used is always the same in all saved models. In the example here, we will assume that the item is called "model". If a different name is used, the SiPAT-specific configuration comments in the m-file (see below) will need to be modified to indicate to SiPAT the name used for the model.

If saving the model from the Matlab command line, the following command can be used (assuming the model to use is currently named "model" in your Matlab workspace):

 save myfile.mat model

Creation of the Function M-file

The second part to the SiPAT/PLS_Toolbox interface is a Matlab m-file which contains a function definition (Note: The contents of this m-file must actually be a Matlab function, meaning it must contain a function header line as shown in the scripts below. It cannot be a "script" in the strict Matlab definition of that term which implies code that is not wrapped inside a function definition.)

Three example m-files are given below: one for use with any of the regression model types (PLS, PCR, MLR, CLS, SVM, LWR, etc), one for use with classification model types (PLSDA, KNN, SIMCA, SVMDA/SVM-C), and one for use with principal component analysis (PCA) models. Each of these m-files assumes that the input is a single vector of values (passed by SiPAT) and each returns two or three values which correspond to the predictions from the model.

All of these functions also assume that the input data will be two columns, where the first column contains axis scale information for the variables (such as wavelength, m/z, time, etc.) and the second column is the actual measured data. If this does not fit the type of data being passed by SiPAT (e.g. no axisscale information), the initial lines:

    %convert second column of input data into a dataset (if not appropriate column, change next line)
    x = data(:,2);
    x = dataset(double(x'));

    %Assume first column is axisscale information. If not true, comment out the next line
    x.axisscale{2} = double(data(:,1));

should be converted as necessary to handle the input data. For example, if only a single column of values is being passed, the following can be substituted in for the above code:

    %convert column of input data into a dataset
    x = dataset(double(data'));

Regression Model Predictions

The following m-file contents are appropriate for use with regression model types:

%START SIPAT
%<CONFIG>
%   <MODEL>REGRESSION</MODEL>
%   <PREFIX></PREFIX>
%   <SUFFIX></SUFFIX>
%   <INPUTS>
%       <INPUT Name="data" XDataType="MultiValue" YDataType="Single" />
%   </INPUTS>
%   <OUTPUTS>
%       <OUTPUT Name="y" XDataType="SingleValue" YDataType="Double" />
%       <OUTPUT Name="q_x" XDataType="SingleValue" YDataType="Double" />
%       <OUTPUT Name="H_x" XDataType="SingleValue" YDataType="Double" />
%   </OUTPUTS>
%   <FUNCTION>[y,q_x,H_x]=regpred(data,model)</FUNCTION>
%</CONFIG>
%END SIPAT
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Regression using PLS_Toolbox from Eigenvector
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%This function calculates the responses y, the Q-residuals (q_x), and
%Hotelling's T2 (H_x) for the input data (data) using a standard model
%structure. q_x and H_x are returned as ratios to their model limits.
%
%Before using this function, make sure you loaded the following data into
%the workspace:
%  model: Model structure used in PLS_Toolbox/Solo including all pretreatment.

function [y,q_x,H_x] = regpred(data,model)

try
    %convert second column of input data into a dataset (if not appropriate column, change next line)
    x = data(:,2);
    x = dataset(double(x'));

    %Assume first column is axisscale information. If not true, comment out the next line
    x.axisscale{2} = double(data(:,1));

    %make a prediction
    opts =[];
    opts.plots='none';
    opts.display='off';
    pred_x = feval(lower(model.modeltype),x,model,opts);

    %return prediction in y  
    y = double(pred_x.pred{2});
    if isfield(pred_x,'ssqresiduals') 
        q_x = double(pred_x.ssqresiduals{1,1}./pred_x.detail.reslim{1});
        H_x = double(pred_x.tsqs{1}./pred_x.detail.tsqlim{1});
    else
        q_x = 0;
        H_x = 0;
    end

catch
    %errors are saved to the following file (change location as desired)
    fid=fopen('C:\sipat_error_sout.txt','w');
    fwrite(fid,encode(lasterror));
    fwrite(fid,encode(evalin('base','whos')));
    fclose(fid);
end

Classification Model Predictions

The following m-file contents are appropriate for use with classification model types:

%START SIPAT
%<CONFIG>
%   <MODEL>CLASSIFICATION</MODEL>
%   <PREFIX></PREFIX>
%   <SUFFIX></SUFFIX>
%   <INPUTS>
%       <INPUT Name="data" XDataType="MultiValue" YDataType="Single" />
%   </INPUTS>
%   <OUTPUTS>
%       <OUTPUT Name="y" XDataType="SingleValue" YDataType="Double" />
%       <OUTPUT Name="q_x" XDataType="SingleValue" YDataType="Double" />
%       <OUTPUT Name="H_x" XDataType="SingleValue" YDataType="Double" />
%   </OUTPUTS>
%   <FUNCTION>[y,q_x,H_x]=classpred(data,model)</FUNCTION>
%</CONFIG>
%END SIPAT
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Classification using PLS_Toolbox from Eigenvector
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%This function calculates the numerical class assignment (y), the Q-residuals
%(q_x), and Hotelling's T2 (H_x) for the input data (data) using a standard
%model structure. q_x and H_x are returned as ratios to their model limits.
%
%Before using this function, make sure you loaded the following data into
%the workspace:
%  model: Model structure used in PLS_Toolbox/Solo including all pretreatment.

function [y,q_x,H_x] = classpred(data,model)

try
    %convert second column of input data into a dataset (if not appropriate column, change next line)
    x = data(:,2);
    x = dataset(double(x'));

    %Assume first column is axisscale information. If not true, comment out the next line
    x.axisscale{2} = double(data(:,1));

    %make a prediction
    opts =[];
    opts.plots='none';
    opts.display='off';
    pred_x = feval(lower(model.modeltype),x,model,opts);

    %return prediction in y  
    y = double(pred_x.classification.mostprobable); 
    if isfield(pred_x,'ssqresiduals') 
        q_x = double(pred_x.ssqresiduals{1,1}./pred_x.detail.reslim{1});
        H_x = double(pred_x.tsqs{1}./pred_x.detail.tsqlim{1});
    else
        q_x = 0;
        H_x = 0;
    end

catch
    %errors are saved to the following file (change location as desired)
    fid=fopen('C:\sipat_error_sout.txt','w');
    fwrite(fid,encode(lasterror));
    fwrite(fid,encode(evalin('base','whos')));
    fclose(fid);
end