Boxplot: Difference between revisions

From Eigenvector Research Documentation Wiki
imported>Jeremy
No edit summary
Line 5: Line 5:
===Synopsis===
===Synopsis===


: boxplot(x,cls);
: boxplot(x,...);
: boxplot(ax,x,cls);
: boxplot(x,cls,...);
: boxplot(ax,x,cls, ...);
: boxplot(ax,x,cls, ...);
: boxplot(...,param1,val1,param2,val2,...);
: boxplot(...,param1,val1,param2,val2,...);
Line 13: Line 13:
===Description===
===Description===


Generates a "box plot" which includes a box indicating the inner 50th percentile of the data, whiskers showing data range, and mean and median shown as points. Each column of the supplied data matrix, '''x''', is plotted as a separate box/whisker set.
Generates a "box plot" which includes a box indicating the inner 50th percentile of the data (known as the interquartile range, IQR), whiskers showing robust data range, outliers, and mean and median shown as points. Each column of the supplied data matrix, '''x''', is plotted as a separate box/whisker set.


[[Image:Boxplot.png|||boxplot]]
[[Image:Boxplot.png|||boxplot]]
Line 19: Line 19:
The following features are included for each column:
The following features are included for each column:


* top and bottom of the box are the 25th and 75th percentiles Q(0.25) and Q(0.75) with IQR = Q(0.75)-Q(0.25). ['Tag'='Box']
* The top and bottom of the box are the 25th and 75th percentiles (defined as Q(0.25) and Q(0.75) respectively). The size of the box is called the Interquartile Range (IQR) and is defined as IQR = Q(0.75)-Q(0.25). ['Tag'='Box']
* dot inside the box is the mean ['Tag' = 'Mean'].
* The whiskers extend to the most extreme data points which are not considered outliers (see below for definition of outliers.) These extreme whiskers are called the 'Upper Adjacent Value' and 'Lower Adjacent Value' ['Tag'='Upper Adjacent Value', 'Tag'='Lower Adjacent Value']
* horizontal line inside the box is the median ['Tag' = 'Median'].
* The horizontal line inside the box (or the dot in a circle "target" - see the medianstyle option) represents the median ['Tag'='Median'].
* The whiskers extend to the most extreme data points not considered outliers i.e.:
* The dot inside the box is the mean ['Tag'='Mean'].
** Sh = the highest point < Q(0.75)+qrtlim*IQR ['Tag' = 'Upper Adjacent Value'] and
* Data which falls outside the IQR box by a specific amount are considered "outliers". There are two categories of outliers, Standard and Extreme, which are defined by how far outside the IQR the points fall. By default, these limits are 1.5 and 3.0 (see options qrtlim and qrtlimx, below.) The limits are used as defined below:
** Sl = the lowest point > Q(0.25)-qrtlim*IQR ['Tag' = 'Lower Adjacent Value']].
:* Standard Outliers fall between 1.5*IQR and 3.0*IQR outside of the IQR and are plotted with an open circle ['Tag'='Outliers']
 
:* Extreme Outliers fall greater than 3.0*IQR outside the IQR and are plotted with a closed circle ['Tag'='OutliersX'].
Outliers are determined and plotted using the following rules:
:For example, if the lower and upper ranges on the IQR are 4 and 6, the IQR=2.0 and values between 6+(1.5*2) and 6+(3*2) ( = 9 and 12) or between 4-(1.5*2) and 4-(3*2) ( = 1 and -2) are considered standard outliers. Samples above 12 or below -2 are considered extreme outliers.
 
* When selected via the notch option, notches or markers can be viewed which show the estimated 95% confidence limits for the median. If the confidence limits for two medians do not overlap, they can be assumed to be different at the 95% confidence limit. The notch can be viewed either as a narrowing of the box at the median (where the start and end of the tapered region indicate the upper and lower confidence limits) or as an upper and lower triangle marker which are centered on the 95% confidence limits. ['Tag'='NotchHi', 'Tag'='NotchLo']
* values > Q(0.75)+qrtlim*IQR and <= Q(0.75)+qrtlimx*IQR are plotted with an open circle ['Tag' = 'Outliers'].
:The total range between the upper and lower notch bounds is calculated using the formula:
* values > Q(0.75)+qrtlimx*IQR are plotted with a closed circle ['Tag' = 'OutliersX'].
::width = IQR*1.58 / sqrt(n)
* values <= Q(0.25)-qrtlim*IQR and > Q(0.25)-qrtlimx*IQR are plotted with an open circle['Tag' = 'Outliers'].
:where n is the number of data items in the given column.
* values <= Q(0.25)-qrtlimx*IQR are plotted with a closed circle ['Tag' = 'OutliersX'].
* The default is qrtlim  = 1.5 and qrtlimx = 3 (see options below).


For ease of handle graphics manipulation, the axes children have been given tag names as defined in the [ ] above.
For ease of handle graphics manipulation, the axes children have been given tag names as defined in the [ ] above.
Line 59: Line 57:
* '''qrtlim''': [1.5]  limit for defining outliers
* '''qrtlim''': [1.5]  limit for defining outliers
* '''qrtlimx''': [3]    limit for defining extreme outliers
* '''qrtlimx''': [3]    limit for defining extreme outliers
* '''boxwidth''': [0.45] box width
* '''boxwidth''': [0.45] box width



Revision as of 12:02, 20 December 2011

Purpose

Box plot showing various statistical properties of a data matrix.

Synopsis

boxplot(x,...);
boxplot(x,cls,...);
boxplot(ax,x,cls, ...);
boxplot(...,param1,val1,param2,val2,...);
boxplot(...,options);

Description

Generates a "box plot" which includes a box spanning the middle 50% of the data (known as the interquartile range, IQR), whiskers showing the robust data range, outliers, and markers for the mean and median. Each column of the supplied data matrix, x, is plotted as a separate box/whisker set.

[Figure: boxplot]

The following features are included for each column:

  • The bottom and top of the box are the 25th and 75th percentiles (defined as Q(0.25) and Q(0.75) respectively). The size of the box is called the Interquartile Range (IQR) and is defined as IQR = Q(0.75)-Q(0.25). ['Tag'='Box']
  • The whiskers extend to the most extreme data points which are not considered outliers (see the definition of outliers below). The whisker end points are called the 'Upper Adjacent Value' and 'Lower Adjacent Value' ['Tag'='Upper Adjacent Value', 'Tag'='Lower Adjacent Value'].
  • The horizontal line inside the box (or the dot in a circle "target" - see the medianstyle option) represents the median ['Tag'='Median'].
  • The dot inside the box is the mean ['Tag'='Mean'].
  • Data which fall outside the IQR box by more than a specified amount are considered "outliers". There are two categories of outliers, Standard and Extreme, which are defined by how far beyond the edge of the box the points fall, measured in multiples of the IQR. By default, these limits are 1.5 and 3.0 (see options qrtlim and qrtlimx, below). The limits are used as follows:
      • Standard Outliers fall between 1.5*IQR and 3.0*IQR beyond the edge of the box and are plotted with an open circle ['Tag'='Outliers'].
      • Extreme Outliers fall more than 3.0*IQR beyond the edge of the box and are plotted with a closed circle ['Tag'='OutliersX'].
    For example, if Q(0.25) = 4 and Q(0.75) = 6, then IQR = 2.0. Values between 6+(1.5*2) = 9 and 6+(3*2) = 12, or between 4-(1.5*2) = 1 and 4-(3*2) = -2, are considered standard outliers. Values above 12 or below -2 are considered extreme outliers.
  • When selected via the notch option, notches or markers are displayed which show the estimated 95% confidence limits for the median. If the confidence limits for two medians do not overlap, the medians can be assumed to be different at the 95% confidence level. The notch can be shown either as a narrowing of the box at the median (where the start and end of the tapered region indicate the upper and lower confidence limits) or as upper and lower triangle markers centered on the 95% confidence limits. ['Tag'='NotchHi', 'Tag'='NotchLo']
    The total range between the upper and lower notch bounds is calculated using the formula:
        width = IQR*1.58 / sqrt(n)
    where n is the number of data items in the given column.
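
The outlier limits and notch width described above can be checked by hand. The following is a minimal sketch using the numbers from the worked example. The values of Q1, Q3, med, n and vals are made up for illustration, the variable names are not toolbox names, and centering the notch on the median is an assumption. Points lying exactly on a limit are deliberately avoided, since their handling is not specified here.

```matlab
% Illustration values only (not toolbox output).
Q1 = 4;  Q3 = 6;            % Q(0.25) and Q(0.75)
med = 5; n = 100;           % median and number of items in the column
qrtlim = 1.5; qrtlimx = 3;  % default outlier limits

iqrVal = Q3 - Q1;                              % IQR = 2
upperFences = Q3 + [qrtlim qrtlimx]*iqrVal;    % [9 12]
lowerFences = Q1 - [qrtlim qrtlimx]*iqrVal;    % [1 -2]

% Classify some sample values against the limits.
vals = [-3 0 2 5 8 10 13];
isExtreme  = vals > upperFences(2) | vals < lowerFences(2);
isStandard = (vals > upperFences(1) | vals < lowerFences(1)) & ~isExtreme;

% Total notch range from the formula above.
notchWidth = iqrVal*1.58/sqrt(n);              % 0.316
% Assumption: the notch is centered on the median.
notchBounds = med + [-0.5 0.5]*notchWidth;
```

Here -3 and 13 are flagged as extreme outliers, 0 and 10 as standard outliers, and the remaining values fall inside the limits.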

For ease of handle graphics manipulation, the axes children have been given tag names as defined in the [ ] above.
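
As a hedged illustration of how the tags might be used, the sketch below looks up plot elements by tag with the standard MATLAB findobj and set functions. It assumes a box plot has already been drawn in the current axes and that the tagged objects are line objects with Color and MarkerSize properties, which is not stated on this page.

```matlab
% Assumes a box plot already exists in the current axes (e.g. after boxplot(x)).
ax = gca;
hMedian  = findobj(ax,'Tag','Median');     % median line(s) or target marker(s)
hExtreme = findobj(ax,'Tag','OutliersX');  % extreme outliers

% Assumption: the tagged objects are lines, so these properties exist.
if ~isempty(hMedian)
    set(hMedian,'Color','r');
end
if ~isempty(hExtreme)
    set(hExtreme,'MarkerSize',8);
end
```

If no object carries the requested tag, findobj returns an empty array and nothing is changed.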

Inputs

  • x = MxN matrix of class 'double' or 'dataset'
For boxplot(x), N boxes corresponding to columns are plotted.

Optional Inputs

  • cls = flag / indicator variable governing how the data are grouped or classed. cls can be a numeric, cell, or character array. If numeric or cell, cls must have either as many elements as (x) or N elements. If a character array, it must have either as many rows as (x) has elements or N rows. If cls has N elements, N boxes corresponding to the columns are plotted. If cls has as many elements as (x), then length(unique(cls)) boxes are plotted. cls and (options) cannot be supplied in the same call.
  • ax = target axes handle to plot to. For boxplot(ax,x,cls), the plot will be a child of (ax).
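
The two accepted sizes of cls and the resulting number of boxes can be sketched as follows. The data and class values are invented for illustration, and the snippet only counts boxes according to the rule above; it does not call boxplot or show how elements are assigned to groups.

```matlab
% Illustration data: M = 6 rows, N = 2 columns.
x = reshape(1:12, 6, 2);
N = size(x,2);

% Case 1: one class entry per element of x (12 entries, 3 distinct classes).
clsPerElement = repmat([1 2 3], 1, 4);
nBoxesPerElement = length(unique(clsPerElement));   % 3 boxes

% Case 2: one class entry per column of x (N entries).
clsPerColumn = {'A','B'};
nBoxesPerColumn = N;                                % 2 boxes, one per column
```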

Outputs

  • s = 6xN matrix of results. The rows of s correspond to [Sl; Q(0.25); Q(0.5); Q(0.75); Sh; mean].
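
A short sketch of reading individual statistics out of s, based on the row order given above. The matrix below is a hypothetical 6x2 result typed in by hand for illustration, not actual output from boxplot.

```matlab
% Hypothetical result for N = 2 columns, rows ordered as
% [Sl; Q(0.25); Q(0.5); Q(0.75); Sh; mean].
s = [ 1.0  0.5
      4.0  2.0
      5.0  3.0
      6.0  4.5
      9.0  7.0
      5.2  3.1 ];

lowerAdjacent = s(1,:);          % Sl, lower whisker end
medians       = s(3,:);          % Q(0.5)
upperAdjacent = s(5,:);          % Sh, upper whisker end
means         = s(6,:);
iqrVals       = s(4,:) - s(2,:); % IQR per column: [2 2.5]
```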

Options

options = a structure array with the following fields:

For boxplot(...,param1,val1,param2,val2,...), the parameter names and values must correspond to the options fields and values given below.
  • useclass: [{'yes'} | 'no'], when (x) is a DataSet object:
options.useclass = 'yes' will use x.label{2} as the tick label when length(unique(cls))==N, otherwise if N==1 then x.classid{1} will be used.
  • prctlo: [0.25] low percentile {default = 0.25 for 25th percentile}
  • prcthi: [0.75] high percentile {default = 0.75 for 75th percentile}
  • qrtlim: [1.5] limit for defining outliers, as a multiple of the IQR beyond the box
  • qrtlimx: [3] limit for defining extreme outliers, as a multiple of the IQR beyond the box
  • boxwidth: [0.45] box width
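
The options structure and the parameter/value form carry the same information. The sketch below builds a structure with a few non-default values and converts it into a parameter/value list. Whether a structure containing only some of the fields is accepted is an assumption, so the calls themselves are left as comments that mirror the Synopsis; x is a placeholder for the user's data.

```matlab
% Override a few defaults (the values are arbitrary illustrations).
% Assumption: fields left out keep their default values.
options = struct('qrtlim',2,'qrtlimx',4,'boxwidth',0.3);

% Build the equivalent parameter/value list from the same structure.
pv = [fieldnames(options) struct2cell(options)]';
pv = pv(:)';   % {'qrtlim',2,'qrtlimx',4,'boxwidth',0.3}

% With PLS_Toolbox on the path, either form from the Synopsis could be used:
%   boxplot(x,options);
%   boxplot(x,pv{:});
```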

See Also

hline, plotgui