Boxplot

From Eigenvector Research Documentation Wiki
Revision as of 12:02, 20 December 2011 by imported>Jeremy

Purpose

Box plot showing various statistical properties of a data matrix.

Synopsis

boxplot(x,...);
boxplot(x,cls,...);
boxplot(ax,x,cls,...);
boxplot(...,param1,val1,param2,val2,...);
boxplot(...,options);

Description

Generates a "box plot" that includes a box spanning the middle 50% of the data (the interquartile range, IQR), whiskers showing the robust data range, individual outliers, and markers for the mean and median. Each column of the supplied data matrix, x, is plotted as a separate box/whisker set.

(Figure: example box plot)

The following features are included for each column:

  • The bottom and top of the box are the 25th and 75th percentiles (defined as Q(0.25) and Q(0.75), respectively). The height of the box is called the interquartile range (IQR) and is defined as IQR = Q(0.75)-Q(0.25). ['Tag'='Box']
  • The whiskers extend to the most extreme data points that are not considered outliers (see below for the definition of outliers). The whisker ends are called the 'Upper Adjacent Value' and 'Lower Adjacent Value'. ['Tag'='Upper Adjacent Value', 'Tag'='Lower Adjacent Value']
  • The horizontal line inside the box (or the dot in a circle "target" - see the medianstyle option) represents the median ['Tag'='Median'].
  • The dot inside the box is the mean ['Tag'='Mean'].
  • Data points that fall outside the box by more than a specified multiple of the IQR are considered "outliers". There are two categories of outliers, standard and extreme, defined by how far beyond the box the points fall. By default, these multiples are 1.5 and 3.0 (see options qrtlim and qrtlimx, below) and are used as follows:
      • Standard outliers lie between 1.5*IQR and 3.0*IQR beyond the edge of the box and are plotted with an open circle. ['Tag'='Outliers']
      • Extreme outliers lie more than 3.0*IQR beyond the edge of the box and are plotted with a closed circle. ['Tag'='OutliersX']
For example, if Q(0.25) = 4 and Q(0.75) = 6, then IQR = 2.0. Values between 6+(1.5*2) and 6+(3*2) (= 9 and 12), or between 4-(3*2) and 4-(1.5*2) (= -2 and 1), are considered standard outliers. Values above 12 or below -2 are considered extreme outliers.
  • When selected via the notch option, notches or markers show the estimated 95% confidence limits for the median. If the confidence intervals for two medians do not overlap, the medians can be considered different at the 95% confidence level. The notch can be displayed either as a narrowing of the box at the median (the start and end of the tapered region mark the upper and lower confidence limits) or as upper and lower triangle markers centered on the 95% confidence limits. ['Tag'='NotchHi', 'Tag'='NotchLo']
The total range between the upper and lower notch bounds is calculated using the formula:
width = 1.58*IQR / sqrt(n)
where n is the number of data values in the given column.
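The outlier rules, adjacent values and notch formula above can be sketched in Python as follows. This is not the toolbox's MATLAB code. The function names `classify_outliers` and `notch_limits` are hypothetical. The quartiles are passed in directly, as in the worked example, so that no particular percentile definition is assumed. Two further assumptions: points lying exactly on a limit count toward the less severe category, and the notch range is centered on the median.

```python
import math


def classify_outliers(values, q1, q3, qrtlim=1.5, qrtlimx=3.0):
    """Split values into non-outliers, standard outliers and extreme outliers.

    q1 and q3 are the box edges Q(0.25) and Q(0.75); qrtlim and qrtlimx are the
    IQR multiples for standard and extreme outliers. Points exactly on a limit
    are treated as the less severe category (an assumption).
    """
    iqr = q3 - q1
    std_lo, std_hi = q1 - qrtlim * iqr, q3 + qrtlim * iqr    # standard-outlier limits
    ext_lo, ext_hi = q1 - qrtlimx * iqr, q3 + qrtlimx * iqr  # extreme-outlier limits
    inside, standard, extreme = [], [], []
    for v in values:
        if v < ext_lo or v > ext_hi:
            extreme.append(v)
        elif v < std_lo or v > std_hi:
            standard.append(v)
        else:
            inside.append(v)
    # Whisker ends (adjacent values) are the most extreme non-outlying points.
    lower_adj = min(inside) if inside else None
    upper_adj = max(inside) if inside else None
    return inside, standard, extreme, lower_adj, upper_adj


def notch_limits(median, iqr, n):
    """Notch bounds, assuming the total width 1.58*IQR/sqrt(n) is centered on the median."""
    width = 1.58 * iqr / math.sqrt(n)
    return median - width / 2, median + width / 2


# Worked example from the text: Q(0.25) = 4, Q(0.75) = 6, so IQR = 2.
inside, standard, extreme, lo, hi = classify_outliers(
    [0, 1.5, 4.5, 5, 5.5, 10, 13, -3], q1=4, q3=6)
print(standard)   # [0, 10]
print(extreme)    # [13, -3]
print(lo, hi)     # 1.5 5.5
```

In this example, 0 and 10 fall between the 1.5*IQR and 3.0*IQR limits. The values 13 and -3 lie beyond 12 and -2, so they are extreme.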

To simplify Handle Graphics manipulation, the axes children are given the tag names shown in square brackets [ ] above.

Inputs

  • x = MxN matrix of class 'double' or 'dataset'
For boxplot(x), N boxes corresponding to columns are plotted.

Optional Inputs

  • cls = class or grouping variable that governs how the data are grouped. cls can be a numeric, cell, or character array. If numeric or cell, cls must have either as many elements as x (M*N) or N elements. If a character array, it must have either as many rows as x has elements or N rows. cls and options cannot be input together. For boxplot(x,cls): if cls has N elements, N boxes corresponding to the columns are plotted; if cls has as many elements as x, length(unique(cls)) boxes are plotted, one per unique class.
  • ax = target axes handle to plot to. For boxplot(ax,x,cls), the plot will be a child of (ax).
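The case where cls has one entry per element of x, giving length(unique(cls)) boxes, can be sketched in Python as grouping the values by label. The helper `group_by_class` is hypothetical. It also assumes that labels are matched to elements in MATLAB's column-major order, as with x(:). The toolbox's actual pairing rule is not stated here.

```python
import numpy as np


def group_by_class(x, cls):
    """Group the elements of x by class label: one group (box) per unique label.

    Assumes cls supplies one label per element of x, matched in column-major
    order (as MATLAB's x(:) would be).
    """
    values = np.asarray(x, dtype=float).ravel(order="F")
    labels = np.asarray(cls).ravel(order="F")
    if labels.size != values.size:
        raise ValueError("cls must have one label per element of x")
    return {label: values[labels == label] for label in np.unique(labels)}


x = np.array([[1, 10],
              [2, 20],
              [3, 30]])            # M = 3 rows, N = 2 columns
cls = np.array([[1, 2],
                [1, 2],
                [2, 2]])           # same shape as x
groups = group_by_class(x, cls)
for label, vals in groups.items():
    print(label, vals)             # one box would be drawn per unique class
```

In this example, two classes produce two boxes, even though x has two columns that mix the classes.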

Outputs

  • s = 6xN matrix of results. The rows of s correspond to [Sl; Q(0.25); Q(0.5); Q(0.75); Sh; mean].
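The following Python sketch builds a 6xN matrix with the row layout listed above. It rests on two assumptions. First, Sl and Sh are taken to be the lower and upper adjacent values (whisker ends), which this page does not define explicitly. Second, percentiles use NumPy's default linear interpolation, which may differ from the toolbox's definition. `boxplot_summary` is a hypothetical name.

```python
import numpy as np


def boxplot_summary(x, prctlo=0.25, prcthi=0.75, qrtlim=1.5):
    """Return a 6 x N array with rows [Sl; Q(lo); median; Q(hi); Sh; mean].

    Assumption: Sl and Sh are the lower and upper adjacent values, i.e. the
    most extreme points within qrtlim*IQR of the box. Percentiles use NumPy's
    default linear interpolation.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]                         # treat a vector as one column
    s = np.empty((6, x.shape[1]))
    for j in range(x.shape[1]):
        col = x[:, j]
        q_lo, med, q_hi = np.quantile(col, [prctlo, 0.5, prcthi])
        iqr = q_hi - q_lo
        # Non-outlying points: within qrtlim*IQR of the box edges.
        keep = (col >= q_lo - qrtlim * iqr) & (col <= q_hi + qrtlim * iqr)
        inside = col[keep]
        s[:, j] = [inside.min(), q_lo, med, q_hi, inside.max(), col.mean()]
    return s


x = np.column_stack([[1, 2, 3, 4, 100], [10, 20, 30, 40, 50]])
s = boxplot_summary(x)
print(s.shape)   # (6, 2)
```

For the first column, the value 100 is an outlier, so Sh is 4. The mean (22) still includes the outlier.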

Options

options = a structure array with the following fields:

For boxplot(...,param1,val1,param2,val2,...), the parameter names must match the option field names below, and each value must be a valid value for that field.
  • useclass: [{'yes'} | 'no'], used when x is a DataSet object:
options.useclass = 'yes' uses x.label{2} as the tick labels when length(unique(cls))==N; otherwise, if N==1, x.classid{1} is used.
  • prctlo: [0.25] low percentile {default = 0.25 for 25th percentile}
  • prcthi: [0.75] high percentile {default = 0.75 for 75th percentile}
  • qrtlim: [1.5] multiple of the IQR beyond the box that defines standard outliers
  • qrtlimx: [3] multiple of the IQR beyond the box that defines extreme outliers
  • boxwidth: [0.45] box width
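Overriding option defaults with param/value pairs can be sketched in Python as follows, using the defaults listed above. `DEFAULT_OPTIONS` and `apply_param_pairs` are hypothetical names. Rejecting unknown names is an assumption of this sketch and may not match the toolbox's own parsing.

```python
DEFAULT_OPTIONS = {
    "useclass": "yes",
    "prctlo": 0.25,
    "prcthi": 0.75,
    "qrtlim": 1.5,
    "qrtlimx": 3.0,
    "boxwidth": 0.45,
}


def apply_param_pairs(defaults, *pairs):
    """Return a copy of defaults with fields overridden by name/value pairs.

    Mimics boxplot(...,param1,val1,param2,val2,...). Unknown names raise an
    error here (an assumption; the toolbox's behavior may differ).
    """
    if len(pairs) % 2:
        raise ValueError("parameters must come in name/value pairs")
    opts = dict(defaults)                      # leave the defaults untouched
    for name, value in zip(pairs[::2], pairs[1::2]):
        if name not in opts:
            raise KeyError(f"unknown option: {name}")
        opts[name] = value
    return opts


opts = apply_param_pairs(DEFAULT_OPTIONS, "qrtlim", 2.0, "boxwidth", 0.3)
print(opts["qrtlim"], opts["boxwidth"])   # 2.0 0.3
```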

See Also

hline, plotgui