Parsexml

From Eigenvector Research Documentation Wiki

Latest revision as of 10:20, 7 July 2015

Purpose

Convert XML file to a MATLAB structure.

Synopsis

[object,theStruct] = parsexml(filename,nooutertag);

If input filename is omitted, the user will be prompted for a file name to read.

Description

Creates a MATLAB object from an XML file. The format of the file must follow that used by ENCODEXML. Each XML tag is encoded as a field in a MATLAB structure. The top-level tag becomes the single field at the top level of the returned structure, and all sub-tags become sub-fields of it. The contents of each field are determined by the tag's 'class' attribute. When 'class' is omitted, a single-entry (non-array) structure is assumed. Tags with a 'class' attribute are encoded using the following rules:

  • class="string": Contents encoded as a string or a padded string array. For a multi-row string, each row should be enclosed in <sr> tags.
    <oneitem class="string">Just One String</oneitem>
    <multirow class="string">
      <sr>Row 1 string</sr>
      <sr>Row 2 string</sr>
      ...
    </multirow>
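The padding rule for multi-row strings can be illustrated with a minimal Python sketch. This is not the toolbox's MATLAB code: the helper name `decode_string` is hypothetical, and right-padding with spaces to the longest row is an assumption based on how MATLAB character arrays are usually padded.

```python
import xml.etree.ElementTree as ET

def decode_string(elem):
    """Decode a class="string" tag (hypothetical helper, not parsexml itself)."""
    rows = [sr.text or "" for sr in elem.findall("sr")]
    if not rows:
        # No <sr> children: the tag holds a single string.
        return elem.text or ""
    # Assumption: rows are right-padded with spaces to a common width.
    width = max(len(row) for row in rows)
    return [row.ljust(width) for row in rows]

single = ET.fromstring('<oneitem class="string">Just One String</oneitem>')
multi = ET.fromstring(
    '<multirow class="string">'
    "<sr>Row 1 string</sr><sr>Row 2 is longer</sr>"
    "</multirow>"
)

one = decode_string(single)    # a plain string
padded = decode_string(multi)  # a list of equal-width rows
print(repr(one))
print([repr(row) for row in padded])
```

Every row of `padded` has the same length, which is what a padded string array requires.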
  • class="numeric"  : Contents of the tag must be a comma-delimited list of values with rows delimited by semicolons. Every row must contain the same number of values or an error will result. Multi-way matrices can be encapsulated in <tn mode="i"> tags where i is the mode that the enclosed item expands on (i>=3).
Encoding: Numeric class contents can be encoded as comma-separated values (csv), which is the default, or using base64 encoding. Supply the encoding attribute when the contents use an encoding other than CSV. Options include:
  • encoding = "csv" (default)
  • encoding = "base64"
When base64 encoding is used, the additional attribute precision can be included to specify the precision of the numerical values encoded. Options include:
  • precision="64" for 64-bit double precision values (default)
  • precision="32" for 32-bit single precision values
  • precision="8" for 8-bit unsigned integer values
  • precision="1" for boolean logical values
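As a rough illustration of how the precision attribute relates to the decoded bytes, here is a minimal Python sketch. The little-endian byte order and the helper name `decode_base64_values` are assumptions, not documented behavior of parsexml. The precision="1" case is left out because this page does not describe how logical values are packed.

```python
import base64
import struct

# precision attribute -> struct format code.
# Assumption: values are stored little-endian; the page does not state this.
FORMATS = {"64": "d", "32": "f", "8": "B"}

def decode_base64_values(text, precision="64"):
    """Decode base64 text into a flat list of numbers (hypothetical helper)."""
    raw = base64.b64decode(text.strip())
    code = FORMATS[precision]
    count = len(raw) // struct.calcsize("<" + code)
    return list(struct.unpack("<%d%s" % (count, code), raw))

# Round trip: pack three doubles, encode them, then decode them again.
payload = base64.b64encode(struct.pack("<3d", 1.0, 2.5, -4.0)).decode("ascii")
values = decode_base64_values(payload, "64")
print(values)

# Three unsigned 8-bit values.
small = decode_base64_values(base64.b64encode(bytes([1, 2, 255])).decode("ascii"), "8")
print(small)
```

The sketch returns a flat list only. Reshaping it into rows and columns would need the tag's size attribute and knowledge of the element ordering, neither of which is covered here.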
Example: row vector
    <item class="numeric"> 1,2,3,4 </item> 
    
Example: 2-way matrix
    <item class="numeric"> 11,12,13,14; 21,22,23,24 </item> 
    
Example: 3-way
    <item class="numeric">
      <tn mode="3"> 
        111,112,113,114; 121,122,123,124 
      </tn> 
      <tn mode="3"> 
        211,212,213,214; 221,222,223,224
      </tn> 
    </item> 
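The CSV rules above (commas between values, semicolons between rows, equal row lengths, one <tn mode="3"> tag per slab) can be sketched in Python as follows. This is an illustrative reading of the format, not the toolbox's implementation. The names `parse_matrix` and `parse_numeric` are hypothetical, and only mode 3 is handled.

```python
import xml.etree.ElementTree as ET

def parse_matrix(text):
    """Split 'a,b; c,d' into rows of floats and check that rows match in length."""
    rows = [[float(v) for v in row.split(",")] for row in text.strip().split(";")]
    if len({len(row) for row in rows}) != 1:
        raise ValueError("every row must contain the same number of values")
    return rows

def parse_numeric(elem):
    """Return a list of rows, or a list of slabs when <tn> tags are present."""
    slabs = elem.findall("tn")
    if slabs:
        return [parse_matrix(slab.text) for slab in slabs]
    return parse_matrix(elem.text)

two_way = parse_numeric(ET.fromstring(
    '<item class="numeric"> 11,12,13,14; 21,22,23,24 </item>'))

three_way = parse_numeric(ET.fromstring(
    '<item class="numeric">'
    '<tn mode="3"> 111,112,113,114; 121,122,123,124 </tn>'
    '<tn mode="3"> 211,212,213,214; 221,222,223,224 </tn>'
    "</item>"))

print(two_way)
print(len(three_way), len(three_way[0]), len(three_way[0][0]))
```

The nested lists here are indexed slab first (`three_way[slab][row][col]`), whereas a MATLAB array places the third mode last.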
    
  • class="cell"  : Contents encoded as Matlab cell. The format of contents is the same as HTML table tags (<tr> for a new row, <td> for a new container/column) with the added tag of <tn mode="i"> to describe a multi-dimensional cell (see class="numeric").
Example: 3-way cell (with strings in each cell)
    <item class="cell"> 
      <tn mode="3"> 
         <tr> <td>slab 1, row 1, col 1</td> <td>slab 1, row 1, col 2</td> </tr>
         <tr> <td>slab 1, row 2, col 1</td> <td>slab 1, row 2, col 2</td> </tr>
      </tn> 
      <tn mode="3"> 
         <tr> <td>slab 2, row 1, col 1</td> <td>slab 2, row 1, col 2</td> </tr>
         <tr> <td>slab 2, row 2, col 1</td> <td>slab 2, row 2, col 2</td> </tr>
      </tn> 
    </item> 
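To show how the table-style tags map onto a cell layout, here is a small Python sketch that reads a well-formed version of the 3-way cell example into nested lists. The helper name `parse_cell` is hypothetical, and the sketch assumes every cell holds plain text. Cells that carry their own class attribute are not handled.

```python
import xml.etree.ElementTree as ET

def parse_cell(elem):
    """Read <tr>/<td> rows, grouped into slabs when <tn> tags are present."""
    def table(node):
        return [[(td.text or "").strip() for td in tr.findall("td")]
                for tr in node.findall("tr")]
    slabs = elem.findall("tn")
    return [table(slab) for slab in slabs] if slabs else table(elem)

xml_text = """
<item class="cell">
  <tn mode="3">
    <tr> <td>slab 1, row 1, col 1</td> <td>slab 1, row 1, col 2</td> </tr>
    <tr> <td>slab 1, row 2, col 1</td> <td>slab 1, row 2, col 2</td> </tr>
  </tn>
  <tn mode="3">
    <tr> <td>slab 2, row 1, col 1</td> <td>slab 2, row 1, col 2</td> </tr>
    <tr> <td>slab 2, row 2, col 1</td> <td>slab 2, row 2, col 2</td> </tr>
  </tn>
</item>
"""

cells = parse_cell(ET.fromstring(xml_text))
print(cells[1][0][1])
```

As with the numeric sketch, the nested lists are indexed slab first, then row, then column.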
    
  • class="structure" : Used for structure arrays ONLY. Contents are encoded into a structure array using array notation identical to that described for class="cell". A structure of size [1 1] does not need array notation and must not be marked with this class attribute. Instead, the contents of the structure should simply be enclosed within the tag as sub-tags.
  • class="dataset"  : Contents will be interpreted as a DataSet Object. Any tags that do not map to valid DataSet Object fields will be ignored. See the DataSet definition for details on valid fields and ENCODEXML for examples of the DataSet XML format. Also see the simplified "dso" class below.
  • class="dso" : Contents will be interpreted as a DataSet Object using the simplified DataSet object definition (see DataSet XML Format). This format is generally much easier to use to define a DataSet.
  • NOTE: "Size" attribute: Tags of class "numeric", "cell", or "structure" (structure-array only) should also include the attribute size="[...]", which gives the size of the tag's contents. The size value must be enclosed in square brackets and must be at least two elements long (use [0,0] for empty). For example, <myvalue class="numeric" size="[3,4]"> says that the field myvalue will be numeric with 3 rows and 4 columns. Size can be multi-dimensional as needed (size="[2,4,6,2]" implies that the tag contents will be a 4-dimensional array of the given sizes).
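The constraints on the size attribute (square brackets, at least two elements, [0,0] for empty) can be checked with a short Python sketch. The helper names `parse_size` and `expected_count` are hypothetical, and accepting spaces as well as commas between dimensions is an assumption.

```python
import math

def parse_size(attr):
    """Parse a size attribute such as "[3,4]" into a list of dimensions."""
    text = attr.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("size must be enclosed in square brackets")
    # Assumption: dimensions may be separated by commas or by spaces.
    dims = [int(d) for d in text[1:-1].replace(",", " ").split()]
    if len(dims) < 2:
        raise ValueError("size must have at least two elements")
    return dims

def expected_count(attr):
    """Number of values that a tag with this size attribute should contain."""
    return math.prod(parse_size(attr))

print(parse_size("[3,4]"), expected_count("[3,4]"))
print(parse_size("[2,4,6,2]"), expected_count("[2,4,6,2]"))
print(expected_count("[0,0]"))
```

Comparing `expected_count` against the number of values actually found in the tag is one way to catch a mismatch between the declared size and the contents before loading the file.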

Input

  • filename = name of the XML file to convert. If filename is omitted, the user will be prompted for a file name to read.

Optional Input

  • nooutertag = [ {false} | true ] When set to true, the outer-most XML tag is stripped from the resulting output (object). This gives direct access to the object itself rather than to a structure with the object as its first and only field.
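The effect of nooutertag can be pictured with a Python sketch that mimics the two output shapes using dictionaries. This is an analogy, not the toolbox code. The names `element_to_dict` and `parse_xml_text` are hypothetical, and leaf tags are kept as raw text for simplicity, whereas real input files would mark them with class attributes.

```python
import xml.etree.ElementTree as ET

def element_to_dict(elem):
    """Map a tag to a dict of its sub-tags; leaves become stripped text."""
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    return {child.tag: element_to_dict(child) for child in children}

def parse_xml_text(xml_text, nooutertag=False):
    """Return {top_tag: contents}, or only the contents when nooutertag is true."""
    root = ET.fromstring(xml_text)
    inner = element_to_dict(root)
    return inner if nooutertag else {root.tag: inner}

xml_text = "<model><name>demo</name><info><author>me</author></info></model>"

wrapped = parse_xml_text(xml_text)                    # top-level tag is the only key
stripped = parse_xml_text(xml_text, nooutertag=True)  # contents directly

print(wrapped["model"]["name"])
print(stripped["name"])
```

With the default setting the caller has to step through the single top-level field first. With nooutertag set, that extra level disappears.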

Outputs

  • object = MATLAB object.
  • theStruct = the pre-parsed XML object, which allows access to raw field attributes and other content that cannot be converted into a MATLAB object.

See Also

autoimport, encodexml, textreadr, xclreadr