Boxplot

From Eigenvector Research Documentation Wiki
Revision as of 13:17, 9 June 2014 by imported>Jeremy (→‎Options)

Purpose

Box plot showing various statistical properties of a data matrix.

Synopsis

s = boxplot(x,...);
boxplot(x,cls,...);
boxplot(ax,x,cls, ...);
boxplot(...,param1,val1,param2,val2,...);
boxplot(...,options);

Description

Generates a "box plot" for each column of the supplied data matrix, x. Each column is plotted as a separate box/whisker set consisting of a box spanning the middle 50% of the data (the interquartile range, IQR), whiskers showing the robust data range, markers for outliers, and markers for the mean and median.

[Figure: example box plot]

The following features are included for each column:

  • The bottom and top of the box are the 25th and 75th percentiles (defined as Q(0.25) and Q(0.75) respectively). The height of the box is called the Interquartile Range (IQR) and is defined as IQR = Q(0.75)-Q(0.25). ['Tag'='Box']
  • The whiskers extend to the most extreme data points which are not considered outliers (see below for the definition of outliers). The whisker ends are called the 'Upper Adjacent Value' and 'Lower Adjacent Value'. ['Tag'='Upper Adjacent Value', 'Tag'='Lower Adjacent Value']
  • The horizontal line inside the box (or the dot in a circle "target" - see the medianstyle option) represents the median ['Tag'='Median'].
  • The dot inside the box is the mean ['Tag'='Mean'].
  • Data which fall outside the IQR box by more than a specified multiple of the IQR are considered "outliers". There are two categories of outliers, Standard and Extreme, distinguished by how far outside the box the points fall. By default, the limits are 1.5 and 3.0 times the IQR (see options qrtlim and qrtlimx, below) and are applied as follows:
    • Standard Outliers fall between 1.5*IQR and 3.0*IQR outside the box and are plotted with an open circle ['Tag'='Outliers'].
    • Extreme Outliers fall more than 3.0*IQR outside the box and are plotted with a closed circle ['Tag'='OutliersX'].
For example, if Q(0.25) = 4 and Q(0.75) = 6, then IQR = 2.0. Values between 6+(1.5*2) = 9 and 6+(3*2) = 12, or between 4-(1.5*2) = 1 and 4-(3*2) = -2, are considered standard outliers. Values above 12 or below -2 are considered extreme outliers.
  • When selected via the notch option, notches or markers show the estimated 95% confidence limits for the median. If the confidence intervals for two medians do not overlap, the medians can be assumed to differ at the 95% confidence level. The notch can be shown either as a narrowing of the box at the median (where the start and end of the tapered region indicate the upper and lower confidence limits) or as a pair of triangle markers centered on the upper and lower 95% confidence limits. ['Tag'='NotchHi', 'Tag'='NotchLo']
The total range between the upper and lower notch bounds is calculated using the formula:
width = IQR*1.58 / sqrt(n)
where n is the number of data items in the given column.
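
The outlier limits, adjacent values, and notch range described above can be checked numerically. The following is a minimal sketch, written in Python rather than MATLAB so that it runs without the toolbox; it is not the toolbox's implementation. The helper names (classify, adjacent_values, notch_range) are hypothetical, the quartiles are fixed at the values from the worked example rather than computed from the data, and a point lying exactly on a limit is assumed not to cross it, which this page does not specify.

```python
import math


def classify(value, q_lo, q_hi, qrtlim=1.5, qrtlimx=3.0):
    """Label a value as 'inlier', 'standard' or 'extreme' relative to the box."""
    iqr = q_hi - q_lo
    # Distance outside the box edges; zero for values inside the box.
    distance = max(q_lo - value, value - q_hi, 0.0)
    if distance > qrtlimx * iqr:
        return "extreme"
    if distance > qrtlim * iqr:  # assumption: a point exactly on the limit is kept
        return "standard"
    return "inlier"


def adjacent_values(data, q_lo, q_hi, qrtlim=1.5):
    """Whisker ends: the most extreme data points that are not outliers."""
    inliers = [v for v in data if classify(v, q_lo, q_hi, qrtlim) == "inlier"]
    return min(inliers), max(inliers)


def notch_range(iqr, n):
    """Total notch range, taking the documented formula at face value."""
    return 1.58 * iqr / math.sqrt(n)


# Quartiles from the worked example: Q(0.25) = 4 and Q(0.75) = 6, so IQR = 2.
q_lo, q_hi = 4.0, 6.0
data = [-3.0, 0.0, 1.5, 4.0, 5.0, 6.0, 8.5, 10.0, 13.0]

for v in data:
    print(v, classify(v, q_lo, q_hi))

print("adjacent values:", adjacent_values(data, q_lo, q_hi))
print("notch range for n = 25:", notch_range(q_hi - q_lo, 25))
```

With these inputs, 10 and 0 land in the standard band (9 to 12 and 1 to -2), 13 and -3 are extreme, and the whiskers stop at 1.5 and 8.5. Passing different qrtlim and qrtlimx values moves the bands in the same way the options of the same names are described as doing.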

For ease of handle graphics manipulation, the axes children have been given the tag names shown in square brackets above.

Inputs

  • x = MxN matrix of class 'double' or 'dataset'
For boxplot(x), N boxes corresponding to columns are plotted.

Optional Inputs

  • cls = flag / indicator variable governing how the data are grouped or classed. cls can be a numeric, cell, or character array. A numeric or cell cls must contain either N elements or as many elements as (x); a character cls must contain either N rows or as many rows as (x) has elements. If cls has N elements, N boxes corresponding to the columns of (x) are plotted. If cls has as many elements as (x), length(unique(cls)) boxes are plotted, one per class. cls and (options) are not input together.
  • ax = target axes handle to plot to. For boxplot(ax,x,cls), the plot will be a child of (ax).
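
The case where cls has one label per data value can be pictured as a simple grouping step. The sketch below is in Python so it can be run without the toolbox, and it is an illustration only: group_by_class is a hypothetical helper, and it uses a single list of values because this page does not state how a matrix x is flattened or in what order the resulting boxes appear.

```python
from collections import defaultdict


def group_by_class(values, cls):
    """Collect values under their class label; one group corresponds to one box."""
    if len(values) != len(cls):
        raise ValueError("cls must supply one label per data value")
    groups = defaultdict(list)
    for value, label in zip(values, cls):
        groups[label].append(value)
    return dict(groups)


values = [4.1, 5.0, 6.2, 3.9, 5.5, 7.1]
cls = ["a", "b", "a", "c", "b", "a"]

groups = group_by_class(values, cls)
for label, members in groups.items():
    print(label, members)

# The number of groups equals the number of unique class labels.
print(len(groups) == len(set(cls)))
```

Each group would then be summarized and drawn as its own box, which is why the number of boxes equals length(unique(cls)) rather than the number of columns.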

Outputs

  • s = 7xN matrix of results, with one column per box. The rows of s correspond to [Sl; Q(0.25); Q(0.5); Q(0.75); Sh; mean; count].

Options

options = a structure array with any of the following fields. When called using the form boxplot(...,param1,val1,param2,val2,...), the parameter names and values must correspond to the option fields and values given below.

____
  • useclass: [{'yes'} | 'no'], when (x) is a DataSet object:
options.useclass = 'yes' will use x.label{2} as the tick label when length(unique(cls))==N, otherwise if N==1 then x.classid{1} will be used.
  • prctlo: [0.25] low percentile {default = 0.25 for 25th percentile}
  • prcthi: [0.75] high percentile {default = 0.75 for 75th percentile}
  • qrtlim: [1.5] limit for defining outliers, as a multiple of the IQR beyond the box
  • qrtlimx: [3] limit for defining extreme outliers, as a multiple of the IQR beyond the box
____
  • plotstyle: [{'traditional'}| 'compact' ] 'compact' automatically chooses default options to produce a plot that is easier to read when many boxes are displayed. See Notes, below, for these defaults.
  • boxwidth: [0.45] box width
  • boxstyle: [{'outline'}| 'filled' ] Style of box. Note that 'filled' will change the colors used for the median object(s).
  • boxcolor: ['b'] Defines the color of the IQR box. String color or vector giving the fractional [red green blue] color code.
____
  • meancolor: ['r'] Defines the color to use for the mean marker.
  • meansymbol: ['.'] Defines symbol to use for the mean symbol.
____
  • outliersize: [6] Size of the outlier markers, in points (n/72 inch).
  • symbol: ['ob'] Marker (and optional color) of outlier markers. Extreme outliers will use the filled version of the given symbol.
  • jitter: [0] Governs addition of random off-axis offset to make overlapping outliers visible. Indicates maximum +/- offset to allow (using a uniform distribution). If zero, all outliers are in-line with no offset.
____
  • medianstyle: [{'line'}| 'target'] Governs method of displaying the median:
'line' = Draws a line for the median.
'target' = Draws a dot inside a white circle.
  • notch: [{'off'} | 'on' | 'marker'] Governs display of 5% significance range for median comparisons. If the notch ranges for two medians do not overlap, they are considered different at the 95% confidence level.
'off' = Do not show median comparison marks.
'on' = Indicates intervals using notches on IQR box when plotstyle is 'traditional' or triangular markers when plotstyle is 'compact'.
'marker' = Indicates intervals using triangular markers for upper and lower ranges.
____
  • labelorientation: [{'horizontal'}| 'inline' ] Governs directionality of labels.
'inline' = Rotates the labels to be vertical. This is the default when plotstyle is 'compact'.
'horizontal' = Leaves the labels horizontal. This is the default when plotstyle is 'traditional'.
  • legend: [ 'off' |{'on'}] Governs "nice legend" mode. When 'on', the plot is created such that the standard legend tool produces a nice-looking legend (but not all graphical objects can be "found" by users wanting to do customization). When 'off', every graphical object will be displayed in the legend box (creating a very poor-looking legend). 'off' should only be used when the caller is going to customize the plot using findobj graphical commands, which require all graphical object tags to be visible. Using 'on' is recommended unless such customization via MATLAB graphics commands is desired.

Notes

  1. plotstyle = 'compact' uses the following defaults. The user can override any of these by simply passing an alternative value explicitly:
    boxstyle = 'filled'
    medianstyle = 'target'
    symbol = 'o'
    outliersize = 4
    jitter = 0.5
    labelorientation = 'inline'
  2. This function has a known conflict with the MathWorks (MW) Statistics Toolbox function of the same name. It is intended to be as compatible as practical with the MW version, and many of the above options are designed to resemble their MW counterparts, but such compatibility is not guaranteed. For more information, see [our FAQ on this issue]
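
The precedence described in note 1 (standard defaults, then the 'compact' defaults, then any value passed explicitly) can be sketched as a layered merge. This is an illustration in Python, not the toolbox's code: resolve_options is a hypothetical helper, and only the six options listed in note 1 are modeled, using the default values given in the Options section.

```python
# Defaults listed in the Options section for the six options that 'compact' changes.
TRADITIONAL_DEFAULTS = {
    "boxstyle": "outline",
    "medianstyle": "line",
    "symbol": "ob",
    "outliersize": 6,
    "jitter": 0,
    "labelorientation": "horizontal",
}

# Replacement defaults listed in note 1 for plotstyle = 'compact'.
COMPACT_DEFAULTS = {
    "boxstyle": "filled",
    "medianstyle": "target",
    "symbol": "o",
    "outliersize": 4,
    "jitter": 0.5,
    "labelorientation": "inline",
}


def resolve_options(plotstyle="traditional", **explicit):
    """Return effective option values; explicitly passed values always win."""
    if plotstyle not in ("traditional", "compact"):
        raise ValueError("plotstyle must be 'traditional' or 'compact'")
    options = dict(TRADITIONAL_DEFAULTS)
    if plotstyle == "compact":
        options.update(COMPACT_DEFAULTS)
    options.update(explicit)  # user-supplied values override either set of defaults
    options["plotstyle"] = plotstyle
    return options


# Compact style, but with jitter switched back off by the caller.
for name, value in resolve_options("compact", jitter=0).items():
    print(name, "=", value)
```

In this model a call such as boxplot(x,'plotstyle','compact','jitter',0) would keep the filled boxes and target-style medians of the compact style while placing all outliers in line.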

Examples

  • The generic boxplot which uses the defaults of 'plotstyle'='traditional' and 'notch'='off':
boxplot(x)

[Figure: resulting box plot]

  • The generic boxplot with median notches turned on:
boxplot(x,'notch','on')

[Figure: resulting box plot]

  • The "compact" plotstyle (accommodates more columns) with median notches indicated by triangle markers:
boxplot(x,'plotstyle','compact','notch','on')

[Figure: resulting box plot]

  • Filled box style with median notches shown:
boxplot(x,'boxstyle','filled','notch','on')

[Figure: resulting box plot]

See Also

hline, plotgui, summary