Parsemixed

From Eigenvector Research Documentation Wiki
Revision as of 14:32, 10 June 2014 by imported>Scott (→‎Options)

Purpose

Parse numerical and text data into a DataSet Object.

Synopsis

data = parsemixed(a,b,options);
data = parsemixed(a,delim,options);
data = parsemixed(a,options);

Description

Given a numerical array a and a matching cell array of text b, PARSEMIXED returns a DataSet object built from a "logical" interpretation of the numerical and text data. It identifies contiguous blocks of numbers and then attempts to interpret the text as labels and label names for each block of data.

Note that this function is also called by xclreadr and xlsreadr, among other importing functions. In most cases, those functions support passing options directly to PARSEMIXED to customize importing. See the Options section below for the available importing options.

Inputs

  • a = a numerical array containing the numerical portion of the data to parse (NOTE: NaNs are allowed), OR a text or cell array of data to parse (see (delim) below).
  • b = a cell array of the same size as (a) but containing any strings which were not interpretable as numbers.
  • delim = passed in place of (b), the delimiter used to parse the plain text in (a). If omitted, (a) is searched for a delimiter that is common to nearly all lines.
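
The two main calling forms described above can be sketched as follows. This is a minimal illustration, not a tested recipe: the sample values are invented, the use of empty strings in (b) wherever (a) holds a number is an assumption, and passing (a) as a single newline-separated character string is one reading of "text" in the input description.

```matlab
% Form 1: numeric array plus a same-sized cell array of text.
% NaN marks the elements of (a) that were not numbers.
% ASSUMPTION: (b) holds '' wherever (a) holds a number.
a = [NaN NaN NaN
     NaN 1.0 2.0
     NaN 3.0 4.0];
b = {''  'x1' 'x2'
     'A' ''   ''
     'B' ''   ''};
data1 = parsemixed(a, b);

% Form 2: plain text plus an explicit delimiter.
% ASSUMPTION: (a) may be one char string with newline-separated lines.
txt = sprintf('%s\n', 'id,x1,x2', 'A,1.0,2.0', 'B,3.0,4.0');
data2 = parsemixed(txt, ',');
```

In both cases the intent is a 2x2 block of numbers with the text row and column available as labels; the exact assignment of labels depends on how PARSEMIXED interprets the text.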

Optional Inputs

  • options = discussed below.

Outputs

  • data = a DataSet object formed from the parsing of the input data.

Options

options = a structure array with the following fields:

  • labelcols: [] Specifies one or more columns of the file which should be interpreted as text labels for rows, even if parsable as numbers.
  • labelrows: [] Specifies one or more rows of the file which should be interpreted as text labels for columns, even if parsable as numbers.
  • includecols: [] Specifies one or more columns of the file which should be interpreted as the "include" field for ROWS of the matrix (i.e. this column specifies which rows should be included). Multiple items in this list are combined using a logical "and" (all must be "1" for the row to be included).
  • includerows: [] Specifies one or more rows of the file which should be interpreted as the "include" field for COLUMNS of the matrix (see the notes on includecols above).
  • classcols: [] Specifies one or more columns of the file which should be interpreted as classes for rows of the data.
  • classrows: [] Specifies one or more rows of the file which should be interpreted as classes for columns of the data.
  • axisscalecols: [] Specifies one or more columns of the file which should be interpreted as axisscales for rows of the data.
  • axisscalerows: [] Specifies one or more rows of the file which should be interpreted as axisscales for columns of the data.
  • ignorecols: [] Specifies one or more columns of the file which should be ignored and not imported.
  • ignorerows: [] Specifies one or more rows of the file which should be ignored and not imported.
  • parseengine: [ 'simple' | {'regexp'} ] Governs which text parse engine to use. 'regexp' supports far more formats, including double-quoted strings and the delimiter and format options below. 'simple' is a less-featured parser which reproduces the behavior of older parsemixed versions. 'regexp' is not supported in MATLAB 6.5.
  • multipledelim: [ 'single' | {'multiple'} ] Governs how to handle consecutive delimiters with no content between them. 'multiple' treats consecutive delimiters as a sequence of empty elements (NaNs). 'single' treats a run of successive delimiters as a single delimiter.
  • leadingdelim: [ 'ignore' | {'missing'} ] Governs handling of delimiters which appear at the beginning of a line. If 'ignore', any leading delimiters are ignored. If 'missing', each leading delimiter is taken to indicate a missing value and NaN is placed into the corresponding element.
  • euformat: [ {'off'} | 'on' ] Governs the use of European Union format for decimals. 'on' expects decimal values to use a comma to separate the whole and fractional parts of a number, e.g. 3,23 = 3.23. NOTE: cannot be used with comma delimiters.
  • maxpreviewcols: [48] Number of columns to show in visual mode.
  • maxpreviewrows: [50] Number of rows to show in visual mode.
  • compactdata: [ 'no' | {'yes'} ] Specifies if columns and rows which are entirely excluded should be permanently removed from the table.
  • transpose: [ {'no'} | 'yes' ] Specifies whether the parsed data is transposed (samples are in columns), in which case the resulting DataSet is transposed after parsing.
  • waitbar: [ 'off' | {'on'} ] Specifies whether waitbars should be shown while the data is being processed.
  • useimporttool: [ {'off'} | 'on' ] Use GUI to identify label/class/axis rows and columns.
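
As an illustration of how several of these options might be combined, the sketch below parses semicolon-delimited text that uses decimal commas. The sample data is invented, and building the options structure with only a few fields set (relying on the remaining fields taking their defaults) is an assumption rather than documented behavior.

```matlab
% ASSUMPTION: a partially filled options structure is accepted and
% unspecified fields fall back to the defaults listed above.
options = struct();
options.labelcols = 1;      % column 1 holds row labels, although numeric
options.classcols = 2;      % column 2 holds a class for each row
options.euformat  = 'on';   % decimal commas: 3,23 is read as 3.23
options.waitbar   = 'off';  % suppress waitbars during parsing

% Semicolon delimiter, because euformat cannot be used with comma delimiters.
txt = sprintf('%s\n', 'id;batch;x1;x2', ...
                      '101;1;3,23;4,50', ...
                      '102;2;5,10;6,75');

data = parsemixed(txt, ';', options);
```

Here labelcols is set because the id column would otherwise be parsable as numbers, which is the case the option description calls out.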

See Also

areadr, dataset, importtool, xclreadr, xlsreadr