Boxplot

From Eigenvector Research Documentation Wiki
Revision as of 09:02, 29 October 2009 by imported>Jeremy

Purpose

Box plot showing various statistical properties of a data matrix.

Synopsis

boxplot(x,cls);
boxplot(ax,x,cls);
boxplot(ax,x,cls, ...);
boxplot(...,param1,val1,param2,val2,...);
boxplot(...,options);

Description

Generates a "box plot" in which each column of the supplied data matrix, x, is plotted as a separate box/whisker set. The box spans the middle 50% of the data (25th to 75th percentile), the whiskers show the range of the data not considered outliers, the mean is shown as a dot, and the median is shown as a horizontal line.

The following features are included for each column:

  • bottom and top of the box are the 25th and 75th percentiles, Q(0.25) and Q(0.75), with IQR = Q(0.75)-Q(0.25) ['Tag' = 'Box'].
  • dot inside the box is the mean ['Tag' = 'Mean'].
  • horizontal line inside the box is the median ['Tag' = 'Median'].
  • whiskers extend to the most extreme data points not considered outliers, i.e.:
    • Sh = the highest point < Q(0.75)+qrtlim*IQR ['Tag' = 'Upper Adjacent Value'] and
    • Sl = the lowest point > Q(0.25)-qrtlim*IQR ['Tag' = 'Lower Adjacent Value'].

Outliers are determined and plotted using the following rules:

  • values > Q(0.75)+qrtlim*IQR and <= Q(0.75)+qrtlimx*IQR are plotted with an open circle ['Tag' = 'Outliers'].
  • values > Q(0.75)+qrtlimx*IQR are plotted with a closed circle ['Tag' = 'OutliersX'].
  • values <= Q(0.25)-qrtlim*IQR and > Q(0.25)-qrtlimx*IQR are plotted with an open circle ['Tag' = 'Outliers'].
  • values <= Q(0.25)-qrtlimx*IQR are plotted with a closed circle ['Tag' = 'OutliersX'].
  • The default is qrtlim = 1.5 and qrtlimx = 3 (see options below).
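
The rules above can be sketched for a single column using only base MATLAB functions. This is an illustrative sketch, not the code used by boxplot: the data are made up, the helper q is hypothetical, and the quantiles are assumed to be linearly interpolated between sorted values, which may differ from the rule boxplot actually applies.

```matlab
% Sketch of the box, whisker, and outlier rules for one column of data.
% ASSUMPTION: quantiles are linearly interpolated between sorted values;
% the interpolation rule used internally by boxplot is not documented here.
x       = [1 2 3 4 5 6 7 8 9 18 30]';   % hypothetical example data
qrtlim  = 1.5;                           % default limit for outliers
qrtlimx = 3;                             % default limit for extreme outliers

xs = sort(x);
n  = numel(xs);
q  = @(p) interp1(1:n, xs, 1 + (n-1)*p); % hypothetical quantile helper

q25     = q(0.25);                       % bottom of the box
q75     = q(0.75);                       % top of the box
iqrange = q75 - q25;                     % IQR = Q(0.75)-Q(0.25)

hi  = q75 + qrtlim*iqrange;              % upper outlier limit
hix = q75 + qrtlimx*iqrange;             % upper extreme-outlier limit
lo  = q25 - qrtlim*iqrange;              % lower outlier limit
lox = q25 - qrtlimx*iqrange;             % lower extreme-outlier limit

Sh = max(xs(xs < hi));                   % upper adjacent value (whisker end)
Sl = min(xs(xs > lo));                   % lower adjacent value (whisker end)

% Open-circle and closed-circle points, following the inequalities above
outliers  = xs((xs > hi & xs <= hix) | (xs <= lo & xs > lox));
outliersX = xs(xs > hix | xs <= lox);
```

For these data the sketch gives a box from 3.5 to 8.5, whiskers ending at 1 and 9, one outlier (18), and one extreme outlier (30).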

For ease of handle graphics manipulation, the axes children have been given the tag names shown in square brackets above.
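
As a minimal sketch of how the tags might be used, the example below looks up plot elements with the standard findobj function and restyles them with set. The data are hypothetical, and it is an assumption that the tagged objects accept the LineWidth and MarkerSize properties.

```matlab
% Hypothetical example data: 50 samples in 4 columns
x = randn(50,4);
boxplot(x);

% Find children of the current axes by the tag names listed in this page
hMedian = findobj(gca,'Tag','Median');
hOut    = findobj(gca,'Tag','Outliers');
hOutX   = findobj(gca,'Tag','OutliersX');

% ASSUMPTION: the tagged objects support these common graphics properties
set(hMedian,'LineWidth',2);   % thicker median lines
set(hOut,'MarkerSize',8);     % larger outlier markers
set(hOutX,'MarkerSize',8);    % larger extreme-outlier markers
```

If the data contain no outliers, the corresponding findobj calls return empty handle arrays.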

Inputs

  • x = MxN matrix of class 'double' or 'dataset'
For boxplot(x), N boxes corresponding to columns are plotted.

Optional Inputs

  • cls = flag / indicator variable governing how the data are grouped or classed. cls can be a numeric, cell, or character array. If numeric or cell, cls must have either as many elements as (x) or N elements. If a character array, cls must have either as many rows as (x) has elements or N rows. If cls has N elements, N boxes corresponding to the columns are plotted. If cls has as many elements as (x), length(unique(cls)) boxes are plotted. cls and (options) are not supplied together in the same call.
  • ax = target axes handle to plot to. For boxplot(ax,x,cls), the plot will be a child of (ax).
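
The calling forms described above might be used as follows. This is a sketch with made-up data; the variable names are illustrative only.

```matlab
% One box per column: hypothetical data with 60 samples and N = 4 columns
x = randn(60,4);
boxplot(x);

% One column split into groups: cls has as many elements as the data
y   = randn(60,1);
cls = [ones(20,1); 2*ones(20,1); 3*ones(20,1)];
figure
boxplot(y,cls);               % length(unique(cls)) = 3 boxes

% Plot into a specific axes by passing its handle first
figure
ax = subplot(1,2,1);
boxplot(ax,y,cls);            % the plot becomes a child of ax
```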

Outputs

  • s = 6xN matrix of results. The rows of s correspond to [Sl; Q(0.25); Q(0.5); Q(0.75); Sh; mean].
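
A short sketch of reading the result matrix, assuming the output is requested in the usual way as s = boxplot(...) (this form is not listed in the Synopsis). The row indices follow the order given above.

```matlab
% Hypothetical example data: 100 samples in 3 columns
x = randn(100,3);
s = boxplot(x);               % ASSUMPTION: output requested as s = boxplot(...)

Sl      = s(1,:);             % lower adjacent values
q25     = s(2,:);             % Q(0.25)
med     = s(3,:);             % Q(0.5), the median
q75     = s(4,:);             % Q(0.75)
Sh      = s(5,:);             % upper adjacent values
mn      = s(6,:);             % column means
iqrange = q75 - q25;          % interquartile range for each column
```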

Options

options = a structure array with the following fields. For boxplot(...,param1,val1,param2,val2,...), the parameter names and values must correspond to the options fields and values given below.

  • useclass: [{'yes'} | 'no'], applies when (x) is a DataSet object. With options.useclass = 'yes', x.label{2} is used as the tick label when length(unique(cls))==N; otherwise, if N==1, x.classid{1} is used.
  • prctlo: [0.25] low percentile {default = 0.25 for 25th percentile}
  • prcthi: [0.75] high percentile {default = 0.75 for 75th percentile}
  • qrtlim: [1.5] limit for defining outliers
  • qrtlimx: [3] limit for defining extreme outliers
  • boxwidth: [0.45] box width
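
The two ways of setting options might look like the sketch below. The data and chosen values are illustrative. Every documented field is filled in the structure because it is not stated here whether a partial structure is accepted.

```matlab
% Hypothetical data: one column of 60 samples in three groups
y   = randn(60,1);
cls = [ones(20,1); 2*ones(20,1); 3*ones(20,1)];

% Parameter/value pairs following the data and class inputs
boxplot(y,cls,'qrtlim',2,'boxwidth',0.3);

% Options structure, passed without cls
x = randn(60,4);
options          = struct;
options.useclass = 'yes';
options.prctlo   = 0.25;
options.prcthi   = 0.75;
options.qrtlim   = 2;         % wider limit for outliers
options.qrtlimx  = 4;         % wider limit for extreme outliers
options.boxwidth = 0.3;       % narrower boxes
figure
boxplot(x,options);
```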

See Also

hline, plotgui