Wednesday, March 4, 2020

An Introduction to the Interquartile Range

The interquartile range (IQR) is the difference between the third quartile and the first quartile. The formula for this is:

IQR = Q3 - Q1

There are many measures of the variability of a data set. Both the range and the standard deviation tell us how spread out our data are. The problem with these descriptive statistics is that they are quite sensitive to outliers. A measure of spread that is more resistant to the presence of outliers is the interquartile range.

Definition of Interquartile Range

As seen above, the interquartile range is built upon the calculation of other statistics. Before determining the interquartile range, we first need to know the values of the first quartile and the third quartile. (Of course, the first and third quartiles depend upon the value of the median.) Once we have determined the first and third quartiles, the interquartile range is easy to calculate: all we have to do is subtract the first quartile from the third quartile. Because it is the range between two quartiles, this statistic is called the interquartile range.

Example

To see an example of the calculation of an interquartile range, we will consider the data set 2, 3, 3, 4, 5, 6, 6, 7, 8, 8, 8, 9. The five-number summary for this data set is:

Minimum of 2
First quartile of 3.5
Median of 6
Third quartile of 8
Maximum of 9

With 12 values, the median is the average of the 6th and 7th values, (6 + 6)/2 = 6. The first quartile is the median of the lower six values, (3 + 4)/2 = 3.5, and the third quartile is the median of the upper six values, (8 + 8)/2 = 8. Thus we see that the interquartile range is 8 - 3.5 = 4.5.

The Significance of the Interquartile Range

The range gives us a measure of how spread out our entire data set is. The interquartile range, which tells us how far apart the first and third quartiles are, indicates how spread out the middle 50% of our data set is.

Resistance to Outliers

The primary advantage of using the interquartile range rather than the range to measure the spread of a data set is that the interquartile range is much less sensitive to outliers. To see this, we will look at an example.
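The comparison can be reproduced with a short Python sketch. It assumes the median-of-halves convention for quartiles (take the median of the lower half and of the upper half of the sorted data, leaving out the middle value when the count is odd), which reproduces the quartiles of the example data set. Other conventions, such as the linear interpolation that NumPy's `percentile` uses by default, can give slightly different quartiles. The helper names `quartiles` and `spread_summary` are illustrative only.

```python
from statistics import median, stdev

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves method.

    Assumed convention: the lower half is every value before the middle
    of the sorted data and the upper half every value after it; with an
    odd count the middle value is excluded from both halves.
    Needs at least two values.
    """
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    upper = values[half + 1:] if n % 2 else values[half:]
    return median(lower), median(values), median(upper)

def spread_summary(data):
    """Return the range, sample standard deviation, and IQR of data."""
    q1, _, q3 = quartiles(data)
    return {
        "range": max(data) - min(data),
        "stdev": stdev(data),  # sample standard deviation (n - 1 denominator)
        "iqr": q3 - q1,
    }

original = [2, 3, 3, 4, 5, 6, 6, 7, 8, 8, 8, 9]
with_outlier = original[:-1] + [100]  # replace the maximum, 9, with 100

for label, data in [("original", original), ("with outlier", with_outlier)]:
    s = spread_summary(data)
    print(f"{label}: range={s['range']}, stdev={s['stdev']:.2f}, IQR={s['iqr']}")
# original: range=7, stdev=2.34, IQR=4.5
# with outlier: range=98, stdev=27.37, IQR=4.5
```

Only the largest value changes, so both halves keep the same middle values and the quartiles stay at 3.5 and 8, while the range and standard deviation grow sharply.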
From the data set above, we have an interquartile range of 4.5, a range of 9 - 2 = 7, and a sample standard deviation of 2.34. If we replace the highest value of 9 with an extreme outlier of 100, then the standard deviation becomes 27.37 and the range becomes 100 - 2 = 98. Even though these values shift drastically, the first and third quartiles are unaffected, and thus the interquartile range does not change.

Use of the Interquartile Range

Besides being a less sensitive measure of the spread of a data set, the interquartile range has another important use. Because of its resistance to outliers, the interquartile range is useful for identifying when a value is an outlier. The interquartile range rule is what tells us whether we have a mild or a strong outlier. To look for an outlier, we must look below the first quartile or above the third quartile. How far we should go depends upon the value of the interquartile range.
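The exact distances are not given here, so the following is a sketch under an assumption. A widely used convention (often called Tukey's fences) treats a value as a mild outlier if it lies more than 1.5 × IQR below Q1 or above Q3, and as a strong outlier if it lies more than 3 × IQR beyond them. The code assumes those multipliers and computes quartiles with the same median-of-halves method that gives Q1 = 3.5 and Q3 = 8 for the example data; `quartiles` and `classify_outliers` are hypothetical helper names.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q3) using the median-of-halves method (assumed convention)."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    upper = values[half + 1:] if n % 2 else values[half:]
    return median(lower), median(upper)

def classify_outliers(data, mild=1.5, strong=3.0):
    """Return a dict mapping each outlier in data to 'mild' or 'strong'.

    Assumed rule: a value more than mild * IQR outside [Q1, Q3] is a mild
    outlier; more than strong * IQR outside it is a strong outlier.
    """
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    labels = {}
    for x in data:
        distance = max(q1 - x, x - q3, 0)  # how far x sits outside [Q1, Q3]
        if distance > strong * iqr:
            labels[x] = "strong"
        elif distance > mild * iqr:
            labels[x] = "mild"
    return labels

# The example data with the maximum of 9 replaced by 100.
data = [2, 3, 3, 4, 5, 6, 6, 7, 8, 8, 8, 100]
print(classify_outliers(data))  # {100: 'strong'}
```

Here IQR = 4.5, so the mild cutoffs sit 6.75 beyond the quartiles (below -3.25 or above 14.75) and the strong cutoffs sit 13.5 beyond them (below -10 or above 21.5). Replacing 100 with 20 instead would make that value a mild outlier.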
