Solo Predictor Script Construction

From Eigenvector Research Documentation Wiki
Revision as of 22:08, 1 March 2011 by imported>Jeremy

Solo_Predictor and Solo provide a simple, flexible scripting language with which clients can send instructions to load data, apply a model to that data ("make a prediction"), and retrieve results. A typical exchange follows this sequence:

  1. load data
  2. load model
  3. apply model to data (make a prediction)
  4. return prediction results

This section describes the details of how to format a script, what commands are available and how the commands are used. The next section gives several quick-start example scripts which can be used as templates to perform some standard analyses.

Some familiarity with multivariate analysis and modeling is presumed by this chapter. The user is directed to the PLS_Toolbox and Solo Manual and Tutorial to learn more about specific modeling methods and multivariate analysis in general.

It may also be useful to review the Solo Predictor Script Commands Summary as a starting point for familiarizing yourself with the scripting language.

Workspace Objects

Each command in a script creates or operates on objects stored by Solo_Predictor for the client. Objects include:

DataSet Objects – contain data to be used in predictions.

Preprocessing Objects – contain instructions for how to apply preprocessing to a DataSet.

Calibration Model Objects – contain details on a calibration model from PLS_Toolbox or Solo.

Calibration Transfer Model Objects – contain details on a calibration transfer model from PLS_Toolbox or Solo.

Prediction Objects – contain results from applying a calibration.

Some other object types can be created and modified using the Advanced Scripting with Objects methods.

When created, each object must be given a unique name (up to 64 characters in length) using only letters, numbers, and the underscore character ( _ ). Object names may not contain spaces and may not start with a number ("a1" is allowed, "1" is not), but are otherwise unlimited. Giving a new object the same name as a previously existing object overwrites the original object.
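
A client may want to check names before sending a script. The Python sketch below encodes the naming rules above as a regular expression. It assumes that "letters" means ASCII letters and that a leading underscore is acceptable, neither of which is stated explicitly here; the function name is hypothetical.

```python
import re

# Naming rules from the text: letters, digits, and underscores only,
# no leading digit, at most 64 characters.
# Assumptions: ASCII letters only; a leading underscore is permitted.
_NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,63}")


def is_valid_object_name(name: str) -> bool:
    """Return True if name satisfies the documented naming rules."""
    return _NAME_PATTERN.fullmatch(name) is not None
```

With this check, "a1" is accepted while "1" and "my data" are rejected.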

All objects exist in a persistent "workspace" that remains intact from one request call to another. In addition, this workspace is usually unique to the individual client, so that no client can access the objects in another client's workspace (however, see the privateworkspace option in the Installation and Configuration section).

Script Commands

Script commands fall into these categories:

1. Importing commands – bring data or objects into Solo_Predictor

obj_data = 'content'

2. Object creation commands – create advanced object types (see Advanced Scripting with Objects)

obj = @objecttype(properties)

3. Model and Preprocessing Application commands – apply an object to data

obj_result = obj_data | obj_model

4. Return Value Request commands – request the value contained in an object.

obj_result

5. Response Message Format commands – set the output format for returned results and errors

:format

6. Write To File and Export commands – create an output file containing results

:writefile

7. Other commands

:command

8. Comments – not processed by Solo_Predictor.

//comment
#comment
%comment

Each script can contain one or more commands, so more than one operation can be performed in a single call. Multiple commands are separated by semicolons. For example:

 command; command; command

White-space characters in a command (i.e. spaces, tabs, line-feeds) are generally ignored. They can be included for readability but are not generally required. The exceptions to this rule are noted below (for example, in some of the data importing formats where white-space may be required.)
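
As a minimal sketch of the separator rule, the Python helper below joins individual commands into one script. The helper name is hypothetical, and the commands are the placeholder forms listed above rather than a working analysis.

```python
def build_script(commands):
    """Join individual commands into one script, separated by semicolons."""
    # Strip surrounding white-space and any trailing semicolon so that
    # commands are not separated by doubled semicolons.
    cleaned = [c.strip().rstrip(";").strip() for c in commands]
    return ";\n".join(c for c in cleaned if c)


script = build_script([
    "obj_data = 'content'",
    "obj_result = obj_data | obj_model",
    "obj_result",
])
```

The linefeed after each semicolon is only for readability, since white-space between commands is generally ignored.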

All script commands are also summarized in the Solo Predictor Script Commands Summary.

Importing Commands

Nearly all scripts start with one or more importing commands. To perform a prediction, a client must request the import of data and a model (at a minimum). Importing commands create data, models, or preprocessing objects in the client's workspace. These can be created from:

  1. Reading the object from one of a number of supported file types, or
  2. Creating the object directly from text in the script.

In either case, the format of the command is the new object's name, followed by an equal sign and the content for the object in single quotes:

 obj_1 = 'content'

The content takes various forms depending on the source of the object. Note that if the content needs to include a single quote, it should be escaped by adding a backslash in front of the quote (\'). This is most often necessary when importing an XML object (see below).
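
The escaping rule can be applied on the client before the script is sent. This Python sketch (the helper name is hypothetical) escapes only single quotes, as described above; whether other characters, such as existing backslashes, need special treatment is not covered here.

```python
def import_command(name, content):
    """Build an import command, escaping single quotes in the content."""
    # Each single quote inside the content is prefixed with a backslash.
    # No other characters are altered.
    escaped = content.replace("'", "\\'")
    return f"{name} = '{escaped}'"
```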

Importing From a File

Any of the objects used by Solo_Predictor can be loaded from a disk file. This is the most common form of import command when a client program can only save data to disk, and is also the most common command used to load a model. A file that is stored either locally on the server's computer or on a network-mounted drive can be loaded by providing the path and filename (including extension) of the file enclosed in single quotes.

obj_1 = 'C:/full/path/filename.ext'

Note that the path to the file is always relative to the server, not the client. Solo_Predictor will automatically recognize the filetype and use the appropriate file reader to import the file.

If reading from a MATLAB .mat file containing more than one variable, a specific variable to load must be specified in the import command. This is done by appending a question mark followed by the variable name, all inside the single quotes.

obj_1 = 'C:/full/path/filename.mat?variable'

This is only necessary when the .mat file contains more than one variable. If only one variable is present, that variable will be loaded without a specific variable name being specified.
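
A client that builds these commands programmatically might use a small helper such as the following Python sketch. The helper name is hypothetical; the path and variable name are the placeholders used above.

```python
def file_import_command(name, path, variable=None):
    """Build an import command for a file that is visible to the server."""
    # The optional variable name is appended after a question mark,
    # inside the same pair of single quotes as the path.
    target = path if variable is None else f"{path}?{variable}"
    return f"{name} = '{target}'"
```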

The Features and Supported Methods section discusses valid file types, and other types may be available. In general, the file types supported by the Analysis GUI and Workspace Browser in Solo and PLS_Toolbox are also supported by Solo_Predictor.

Creating Data From Script Text

Data can be created from a list of comma-separated values by simply passing those values as text between the quotation marks:

obj_1 = '1,2,3,4'

This would create an object that contains the values 1 through 4. Note that this form of input provides no means to assign labels or axis values corresponding to these values. As a result, no variable alignment or correction for missing or extra variables can be performed. It is assumed that the number of variables passed is appropriate for use with the model of interest.
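
For a client holding its measurements in memory, the command text can be generated from a sequence of numbers. This Python sketch is illustrative only: the helper name is hypothetical, and how the server parses exponent notation (for example 1e-05) is not covered here.

```python
def data_command(name, values):
    """Build an import command from a flat sequence of numbers."""
    # repr() keeps full floating-point precision; integers are written
    # without a decimal point.
    text = ",".join(repr(v) for v in values)
    return f"{name} = '{text}'"
```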

Creating Objects from XML

Any of the objects used by Solo_Predictor can be created from an XML format that is supported by both PLS_Toolbox and Solo. In this form, the content of the import command is an object's XML description.

obj_1 = '<obj>(XML formatted content)</obj>'

Although XML input is more complex, it enables the widest range of features supported by Solo_Predictor, including variable replacement and alignment. To simplify its use, a template XML file showing the tags necessary to create a DataSet object is discussed in DataSet XML Format. This template can be used to create and pass appropriate XML for these objects.

Any object that has been exported from Solo or PLS_Toolbox as XML (including DataSet, Model, and Preprocessing objects) can be stored and passed into Solo_Predictor to re-create the given object.

Advanced Object Creation and Modification

Objects can be created from an empty object definition using the "at" symbol prefix before an object type and its properties in parentheses:

obj = @objecttype(properties)

The exact objects available depend on the product being used, but most support DataSet objects and EVRIScript objects. In addition, the inputs required by these objects differ.

Creating and working with objects is discussed on the Advanced Scripting with Objects page.

Application of Models and Preprocessing Commands

Once a calibration model, calibration transfer model, or preprocessing object has been imported into Solo_Predictor, it can be applied to a DataSet object using the application command. This command consists of an output object name, an equal sign, a DataSet object's name, the bar character, and the model or preprocessing object to apply. The output of the application command depends on the type of object being applied.

 modifieddata = data_obj | preprocessing_obj
 modifieddata = data_obj | caltransfer_obj
 prediction_obj = data_obj | model_obj

When applying a preprocessing object to a DataSet object, the result is always another DataSet which contains the modified data. This DataSet can then be used in subsequent application commands or retrieved using a Return Value Request command, described below.

When applying a calibration transfer model to data, the result is a modified version of the original DataSet object that can be used in a subsequent command (much like with preprocessing objects.)

A calibration model applied to data outputs a prediction object. Prediction objects are similar in content to a model, but contain the results of applying the model to the given data. It is most often the fields of this prediction object which a client will want to retrieve as the final result of a model application.

Requesting Return Values

The eventual goal of most scripts is to return one or more values to the client. A large number of different statistics, results, and information are available. These are all stored as properties of the objects in the client's workspace. The specific value or values an individual client will need to retrieve depends largely on the model being applied and the intended use of the model. Some typical requests will be discussed later.

A script can specify the values to return by sending a return request command. This command specifies an object's name followed by a period and the name of the property (also known as "field") to return.

 Obj.property

The specified object's property will be returned to the client when the script finishes. Properties may contain strings, single numeric values, numeric vectors, numeric arrays or complex objects. The format of the contents will be based on the Response Message Format described later in this section.

Note that the returned value will be the value of the object and property at the point in the script where the statement occurs. Changing or clearing the object or property after the request statement will not affect the returned value.

Retrieving Multiple Values

A given script can only return a single set of values. This can be from a single return request command or multiple "compatible" return request commands. Commands are compatible if the specified values can be concatenated horizontally into a row vector or matrix. To retrieve multiple compatible values, the script can simply include several return request commands within the body of the script (with terminating semicolons as is always necessary between commands.) For example:

Obj_A.property1;
Obj_A.property2;
Obj_B.property1;

If the commands refer to objects or data tables which cannot be combined in this manner (e.g. different number of rows or incompatible objects), Solo_Predictor will return a script error.

If multiple incompatible properties need to be retrieved by a script, multiple connections will need to be made to Solo_Predictor retrieving the desired properties one at a time. Remember that because objects are persistent from call to call, the objects created by one call to Solo_Predictor remain available for subsequent calls. See the Scripting Examples for an example of retrieving multiple values. Another option to retrieve multiple outputs is the Write to File command discussed later in this section.
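
The compatibility rule amounts to horizontal concatenation: every returned value must have the same number of rows. The Python sketch below illustrates that rule on the client side using lists of rows. It is not Solo_Predictor's implementation, and the function name is hypothetical.

```python
def concatenate_horizontally(blocks):
    """Join 2-D blocks (lists of rows) side by side.

    Raises ValueError when the blocks have different numbers of rows,
    which is the situation the text describes as incompatible.
    """
    row_counts = {len(block) for block in blocks}
    if len(row_counts) > 1:
        raise ValueError(f"incompatible row counts: {sorted(row_counts)}")
    if not blocks:
        return []
    n_rows = len(blocks[0])
    return [sum((block[i] for block in blocks), []) for i in range(n_rows)]
```

For example, a 1x2 block and a 1x1 block combine into a single row of three values, while a one-row block and a two-row block cannot be combined.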

Common Return Properties

Model and prediction objects can be queried for a description of the properties which are typically used in predictions from the given model type. Sending the command:

 Model_Obj.help.predictions

will return an XML description of properties of Model_Obj which the client program might want to make available to the user. This description consists of a single <tr> tag enclosing multiple <td> tags, each describing one available property of the model using the following tags:

  • <label> contains a text description of the property's contents
  • <field> contains the full property name which should be used to access the given value. Add the contents of this field onto the string name of the prediction or model object followed by a period: pred.field
  • <dimension> contains a string describing the type of return value provided in this property for the prediction of a single sample. This will be "scalar" (single value), "vector", or "matrix" and can be used by the client to ignore return value types which it cannot manage. "vector" means a single row of numbers. "matrix" means a table of values.

For example, the following XML fragment describes two fields available from a PCA model:

<tr>
  <td>
    <label class="string">Scores</label>
    <field class="string">loads{1}</field>
    <dimension class="string">vector</dimension>
  </td>
  <td>
    <label class="string">Hotelling's T^2</label>
    <field class="string">tsqs{1}</field>
    <dimension class="string">scalar</dimension>
  </td>
</tr>

The first <td> tag describes the Model_Obj.loads{1} property which contains Scores and will be returned as a vector. The second <td> tag describes the Model_Obj.tsqs{1} property which contains the Hotelling's T^2 (T-squared) value which will be returned as a scalar (single value).

The client can use this list to populate a table (or other GUI) with outputs available from the predictions using a given model. Note that the help.predictions list is the same from a model object and any prediction objects made from it.
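
As a sketch of how a client might consume this description, the Python code below parses the fragment shown above with the standard library and builds the return request for each property. It assumes the fragment has already been extracted from the server's response; the function name and the object name "pred" are hypothetical.

```python
import xml.etree.ElementTree as ET

HELP_XML = """<tr>
  <td>
    <label class="string">Scores</label>
    <field class="string">loads{1}</field>
    <dimension class="string">vector</dimension>
  </td>
  <td>
    <label class="string">Hotelling's T^2</label>
    <field class="string">tsqs{1}</field>
    <dimension class="string">scalar</dimension>
  </td>
</tr>"""


def list_properties(xml_text, object_name):
    """Return one dict per described property, with a ready-to-send request."""
    root = ET.fromstring(xml_text)
    properties = []
    for entry in root:  # each child element describes one property
        field = entry.findtext("field").strip()
        properties.append({
            "label": entry.findtext("label").strip(),
            "request": f"{object_name}.{field}",
            "dimension": entry.findtext("dimension").strip(),
        })
    return properties


props = list_properties(HELP_XML, "pred")
# Keep only the values a scalar-only client can handle.
scalars = [p["request"] for p in props if p["dimension"] == "scalar"]
```

Filtering on the dimension value lets a client skip properties whose return type it cannot manage.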

Because only model and prediction objects provide self-description of useful properties, the following tables, organized by object type, are provided to guide the user to some of the standard properties which clients may be interested in retrieving. Note that some properties may contain additional indexing information in either "curly" braces { } or in standard parentheses ( ) and that this indexing must be included as shown.

Table 1. Commonly used properties for DataSet objects

Property        Description / Content
.data           The numerical data
.label{2}       Labels for the variables (if any)
.axisscale{2}   Numerical axis scale for the variables (if any)

Table 2. Commonly used properties for Prediction and Model objects

Property        Description / Content
.scores         The numeric scores of a model
.t2             The Hotelling's T-squared value. These values will generally be "reduced" so that a value of 1 is at the pre-defined confidence limit. See the .detail.options.confidencelimit property to retrieve the confidence limit used for the reduced value.
.q              The sum-squared residuals (a.k.a. SPE). These values will generally be "reduced" so that a value of 1 is at the pre-defined confidence limit.
.prediction     The y prediction(s) for regression models.
.tcon           The Hotelling's T-squared contributions.
.qcon           The Q contributions (X-block residuals).

Response Message Format

All results returned by Solo_Predictor are text-based but the exact format of the text can be selected to best match the client program's ability to parse text. The format is selected using a format command which consists of a colon followed by a format keyword. A format command can be located anywhere within your script, before or after a return request command. As always, each command must be separated from other commands by semicolons. The following are valid output formats:

:xml

Selects the XML format. This consists of three tags: result, error, date.

result : Contains any output produced by a "return value" command. The class of the returned value is given in the class attribute of this tag. Standard classes are "string" and "numeric" but other complex objects may be returned. Numeric values are given in comma- and semicolon-delimited format where commas delimit row-wise elements in vectors and arrays and semicolons delimit column-wise elements in vectors and arrays. For example, an array with two rows of 5 numbers would be returned as:

 1,2,3,4,5;6,7,8,9,10

In addition, the result tag will have a size attribute included with any string or numeric value. It will give the expected result's size in rows and columns. This can be used by the client to prepare an appropriately-sized matrix for the parsed result and for error checking on the parsed result.

error : Contains a text description of any errors which occurred during the script execution. If no errors occurred, the error tag will be empty.

date : Contains the date and time the request was received by the server.

In XML format, a response with no errors and no results will return as an XML structure with result and error tags empty. Date will contain the relevant date information as usual.

Example:

<response>
<result     class="numeric" size="[2,5]">
1,2,3,4,5;6,7,8,9,10
</result>
<error     class="string"/>
<date     class="string">Thu 06 Sep 2007 14:21:08</date>
</response>
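
A client-side parser for this response might look like the following Python sketch. It assumes a numeric result whose size attribute is formatted as in the example above; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

RESPONSE = """<response>
<result     class="numeric" size="[2,5]">
1,2,3,4,5;6,7,8,9,10
</result>
<error     class="string"/>
<date     class="string">Thu 06 Sep 2007 14:21:08</date>
</response>"""


def parse_numeric_response(xml_text):
    """Return the numeric result as a list of rows; raise on a reported error."""
    root = ET.fromstring(xml_text)
    error = (root.findtext("error") or "").strip()
    if error:
        raise RuntimeError(error)
    result = root.find("result")
    text = (result.text or "").strip()
    # Semicolons separate rows; commas separate the values within a row.
    rows = [[float(v) for v in row.split(",")] for row in text.split(";")]
    # Use the size attribute as a consistency check on the parsed values.
    n_rows, n_cols = (int(n) for n in result.get("size").strip("[]").split(","))
    if len(rows) != n_rows or any(len(r) != n_cols for r in rows):
        raise ValueError("parsed result does not match the size attribute")
    return rows


rows = parse_numeric_response(RESPONSE)
```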

:plain

Selects the plain-text format. If any errors occurred, this will be an error message starting with the text: "ERROR:" and will supersede any results.

When no errors occurred, the content of any return values will be given in a space- and linefeed-delimited format where spaces delimit row-wise elements in vectors and arrays and linefeeds delimit column-wise elements in vectors or arrays. For example, an array with two rows of 5 numbers would appear as:

1     2     3     4     5
6     7     8     9    10

Note that multiple spaces will almost always be used between row-wise elements.

In plain-text format, a response with no errors and no results will return as an empty (null) message.
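
The plain format is simple to parse on the client. This Python sketch assumes a numeric result and treats an empty reply as "no result"; the function name is hypothetical.

```python
def parse_plain_response(text):
    """Parse a plain-format reply into a list of numeric rows."""
    stripped = text.lstrip()
    if stripped.startswith("ERROR:"):
        raise RuntimeError(stripped[len("ERROR:"):].strip())
    # split() with no argument collapses the runs of spaces between values.
    return [[float(v) for v in line.split()]
            for line in text.splitlines() if line.strip()]
```

An empty reply yields an empty list, which matches the "no errors and no results" case described above.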

:html

Selects an HTML-friendly format. This output is appropriate for display by any standard HTML parser such as a web browser. All text is enclosed in preformatted text tags (<pre>). If any errors occurred, they will be displayed much like the plain format. Otherwise, the result will be included in an XML format (same as the result field in the xml format).

If more than one format command is included in a script, only the last-encountered format command will be used.

The default format for the response is set by the default_format option. See the Installation and Configuration section for more information.

Write To File Command

Another method of returning results to a client is the write-to-file command. This command provides a simple way to create a specifically-named output file based on a user-created "template" file. This command is particularly useful when the client is unable to parse returned strings, when passing particularly complicated return values, or when Solo_Predictor is being used by legacy systems which expect a file response. See the example scripts section for an example of its use.

NOTE: By default, the writefilefolder option in the configuration file, default.xml, is blank. This disables the writefile command. This option must be configured before using the writefile command in any script.

The general format of the command is:

:writefile 'path/output.ext' 'path/template.tem' -append

When encountered in a script, this command specifies the output filename and extension ('path/output.ext', which should include the entire path to the output file), the name of the template file ('path/template.tem' also including a full path) and a flag indicating if the results should be appended to the end of the output file rather than replacing the file ("-append"). All values except the output filename are optional. If the -append flag is omitted, any existing output file is overwritten. If the template filename is omitted, a template file named 'output.tem' is expected in the same folder as the output file. Note that the only acceptable template file type is ".tem".

The template file can consist of static text and replacement keys. Static text is written to the output file exactly as it appears in the template file (including linefeeds, spaces, tabs and all other text characters). Replacement keys consist of any return request command, described earlier, inside square brackets [ ]. When the output file is written, replacement keys are replaced with the value(s) indicated by the contained return request command. For example, consider the following template text:

1, "residuals", [pred.q]
2, "tsquared", [pred.t2]
3, "y-prediction [[conc]]", [pred.prediction]

This template would create an output file like:

1, "residuals", 0.3225352
2, "tsquared", 0.1212351
3, "y-prediction [conc]", 12.25252

Note the following:

  • The <writefilefolder> option in the default.xml file configures the folder into which files can be written and template files may be read. If this option is set to empty (blank string), then no :writefile commands will be permitted.
  • An optional formatting instruction can be included inside the replacement key brackets. This instruction consists of a simple fprintf instruction (%_._f) followed by a comma and the usual return request command. For example:
[%i,pred.prediction]
The percent sign can be followed by optional precision indicators (_._) but must be followed by a format character (usually i, f, g, or e). For more information on formatting commands, see the SPRINTF documentation.
  • When the return request command inside of a replacement key consists of more than one value, all values will be returned as a comma-separated list. To return in any other format, the template file must be hard-coded to access each value individually: [pred.prediction(1)] [pred.prediction(2)]
  • To include square brackets as static text in a template, use double brackets [[ and ]].
  • When using the -append mode, make certain to include any linefeed characters required. Without linefeed characters, each subsequent output will be appended to the same line.
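
To preview what a template will produce, the substitution rules above can be imitated on the client. The Python sketch below is an illustration of those rules, not Solo_Predictor's implementation: values are supplied in a dictionary keyed by the return request text, and Python's % operator stands in for fprintf, so formatted output may differ in detail. All names are hypothetical.

```python
import re

# Matches an escaped bracket pair or a replacement key such as [pred.q].
_KEY = re.compile(r"\[\[|\]\]|\[([^\[\]]*)\]")


def render_template(template, values):
    """Fill replacement keys in template text from a dict of values."""
    def substitute(match):
        token = match.group(0)
        if token == "[[":
            return "["
        if token == "]]":
            return "]"
        key = match.group(1).strip()
        fmt = None
        if key.startswith("%"):
            # Optional format instruction, e.g. [%i,pred.prediction]
            fmt, key = (part.strip() for part in key.split(",", 1))
        value = values[key]
        items = value if isinstance(value, (list, tuple)) else [value]
        # Multiple values are written as a comma-separated list.
        return ",".join((fmt % v) if fmt else str(v) for v in items)

    return _KEY.sub(substitute, template)


template = (
    '1, "residuals", [pred.q]\n'
    '2, "tsquared", [pred.t2]\n'
    '3, "y-prediction [[conc]]", [pred.prediction]\n'
)
values = {"pred.q": 0.3225352, "pred.t2": 0.1212351, "pred.prediction": 12.25252}
output = render_template(template, values)
```

With the values shown, the rendered text matches the example output file above.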

Export command

Any object can be exported to one of several filetypes using the command:

   :export 'output.ext' object

The extension on output filename 'output.ext' defines the filetype and must be one of the following:

.xml = XML encoding of the object
.mat = MAT file containing the object
.m = m-file encoding of the object (Matlab m-code which, when executed, will recreate the object)
.csv = CSV encoding of the object (only valid for dataset objects or numeric arrays)
.smat = SMAT file containing the object (only available if secureMAT file feature is installed)

Note: This command requires the writefilefolder configuration option to be set.

Other Commands

Several other commands exist for various operations. Each consists of a colon followed by the command name and can generally be used anywhere within a script.

:include 'path/script.txt'

Include the contents of the specified file in the current script. The indicated text file (which can have any file name and extension but must include the full path to the file) will be executed by Solo_Predictor as if it had been passed explicitly in the current script.

This command permits a client to pass a simple :include command to invoke a possibly complicated script. The included script is always executed in the same workspace as the calling script and the calling script has access to any objects created by the included script.

:clear

Clears all objects from the client's workspace. An error will result if a script clears all objects from the workspace, then subsequently attempts to refer to a cleared object.

:list

Returns a list of the names of all objects in the client's workspace. This is generally only useful as the last instruction in a given script as subsequent return value requests will overwrite the output of this command.

Scripting Examples

Reading data from file (local or network)

This example reads data from an SPC-format file located on a remote computer, reads a PLS model from a .mat file on the local computer (the model is assumed to be the only variable in that file), and applies the model to the data. Finally, it returns the predictions from that application.

data = '//datacollector/datafolder/spectrum1.spc';
model = 'C:/modelfolder/mymodel.mat';
pred = data | model;
pred.prediction

Recall that spaces and linefeeds are not necessary (only the semicolons are necessary) but they are included in the above script to help readability.

Two subsequent calls then retrieve first the T2 value:

pred.t2

and then the Q (sum squared residuals) values:

pred.q

Reading data from passed CSV

This example starts by pre-reading a PCA model (when the client first starts), then makes predictions using that model by passing data directly to the server in a comma-separated values format. The Q contributions and Hotelling's T2 contributions are then retrieved.

When the client starts, it sends the single-command script

model = 'C:/modelfolder/mymodel.mat';

This model will remain in memory as long as the server is running.

Next, the following script is executed each time the client is ready to make a prediction. It passes the seven values expected by the model and makes a prediction.

:plain;
data = '0,100,1242,2320,14,-50,232';
pred = data | model;

Because this script returns no outputs and uses the "plain" response format, the successful application of the model will be indicated by an empty message returned by Solo_Predictor. The client can easily test for any errors by checking for a non-empty return string from the server.

With a successful model application, the client can retrieve the Q contributions:

:plain; pred.qcon

Again, the script requests a "plain" response format so the Q contributions (which are a row-vector of numbers) are returned in a space-delimited format. Finally, another script is sent to retrieve the T2 contributions:

:plain; pred.tcon

The space-delimited return values from this script can also be easily parsed by the client.

Reading data from passed XML

To modify the previous script to pass an XML data structure and include a series of labels with the data, the following script would be used for prediction calls:

:plain;
data = '<obj class="dataset">
<data class="numeric"> 0,100,1242,2320,14,-50,232</data>
  <label>
    <set>
      <mode>2</mode>
      <name>variable labels</name>
      <content>
        <sr>tempA</sr>
        <sr>Aspeed</sr>
        <sr>tempB</sr> 
        <sr>Bspeed</sr> 
        <sr>Rate</sr> 
        <sr>SetPoint</sr> 
        <sr>Time</sr> 
      </content>
    </set>
  </label>
</obj>';
pred = data | model;

For more information on the XML format, see DataSet_XML_Format.
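
A client can generate this XML from its own values and labels. The Python sketch below mirrors only the tags used in the example above and assumes that white-space between tags is optional; the function name is hypothetical, and the DataSet XML Format page remains the reference for the full format.

```python
from xml.sax.saxutils import escape


def dataset_import_command(name, values, labels):
    """Build an import command carrying a one-row DataSet with variable labels."""
    if len(values) != len(labels):
        raise ValueError("need exactly one label per value")
    data = ",".join(str(v) for v in values)
    # escape() protects &, < and > inside label text.
    label_tags = "".join(f"<sr>{escape(label)}</sr>" for label in labels)
    xml = (
        '<obj class="dataset">'
        f'<data class="numeric">{data}</data>'
        "<label><set><mode>2</mode><name>variable labels</name>"
        f"<content>{label_tags}</content></set></label>"
        "</obj>"
    )
    # Single quotes inside the content must be backslash-escaped.
    return f"{name} = '" + xml.replace("'", "\\'") + "'"


command = dataset_import_command(
    "data",
    [0, 100, 1242, 2320, 14, -50, 232],
    ["tempA", "Aspeed", "tempB", "Bspeed", "Rate", "SetPoint", "Time"],
)
```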

Wait for file and output to results file

Wait for file provides a simple method to interface to programs which do not have the capability to send data through sockets or an ActiveX interface. In this example, we will assume that a client program has been configured to save data files (in .xy format) to a specified "drop folder". Solo_Predictor will monitor this folder and, when a new file appears, it will load the data into an object named "data" (this name is not configurable) and execute an analysis script we will call "analyze.scr".

For this mode, the following options must be configured:

waitforfile must be "on"
waitfolder must give the full path to the drop folder ("c:/df" in this example)
waitfilespec must be defined as "*.xy" (to ignore all other file types in the folder.)
waitscript must give the full path to the "analyze.scr" file.

The analyze.scr will contain the steps we want to use to analyze the results, and it will contain the instructions to write the results out to a results file.

analyze.scr

modl = '//network-drive/modelfolder/mymodel.mat';
pred = data | modl;
:writefile 'c:/df/results.txt' 'c:/df/template.tem'

The first line loads the model from a .mat file located on a network drive. The second line applies the model to the data loaded from the dropped file. The final line writes the results out to a file named "results.txt" in a folder named "df". The template file would be defined to output any of the results needed from this application. For an example, see the :writefile definition in the Script Construction section.