Estimate Measures of Central Tendency for Already Grouped Data

Estimates the mean, median, and mode of already grouped data given the interval ranges and the frequencies of each group.

grouped_mean(frequencies, intervals, sep = NULL, trim = NULL)

grouped_mode(frequencies, intervals, sep = NULL, trim = NULL, method = 1)

grouped_median(frequencies, intervals, sep = NULL, trim = NULL)

Arguments

frequencies	A vector of frequencies.
intervals	A 2-column `matrix` with the same number of rows as the length of frequencies, with the first column being the lower class boundary, and the second column being the upper class boundary. Alternatively, `intervals` may be a character vector, and you may specify `sep` (and possibly, `trim`) to have the function automatically create the required `matrix`.
sep	Optional character that separates the lower and upper class boundaries if `intervals` is entered as a character vector.
trim	Optional leading or trailing characters to trim from the character vector being used for `intervals`. There is an in-built pattern to trim the breakpoint labels created by `base::cut()`. If you are using a `grouped_*` function on the output of `cut` (where, for some reason, you no longer have access to the original data), you can use `trim = "cut"`.
method	A single value (1 or 2) determining which method will be used to estimate the grouped mode. See the Details section for the two approaches.
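
To make the role of `sep` and `trim` concrete, here is a minimal base R sketch of how a character vector of class labels could be turned into the 2-column `matrix` described above. The helper `intervals_to_matrix()` and its `strip_brackets` argument are hypothetical names used only for illustration; this is not the package's internal code, and it assumes the boundaries are plain numbers that do not themselves contain `sep`.

```r
# Hypothetical helper: split labels such as "1500-1600" or "(0.9,10.8]"
# into a matrix of lower (column 1) and upper (column 2) boundaries.
intervals_to_matrix <- function(labels, sep, strip_brackets = FALSE) {
  labels <- as.character(labels)          # also accepts factors, e.g. from cut()
  if (strip_brackets) {
    # Remove the brackets that base::cut() puts around its labels
    labels <- gsub("[][()]", "", labels)
  }
  parts <- strsplit(labels, sep, fixed = TRUE)
  matrix(as.numeric(unlist(parts)), ncol = 2, byrow = TRUE)
}

m_plain <- intervals_to_matrix(c("1500-1600", "1600-1700"), sep = "-")
m_cut   <- intervals_to_matrix(factor("(0.901,10.8]"), sep = ",",
                               strip_brackets = TRUE)
```

Here `m_plain` has one row per class, and `m_cut` shows the same idea applied to a label in the style produced by `cut`.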

Value

A single numeric value representing the grouped mean, median, or mode, depending on which function was called.

Details

Calculation of Grouped Mean

The following formula is used to calculate the grouped mean:

$$M = \frac{\sum f\times x}{n}$$

Where:

f = The frequency of each class
x = The midpoint of each class
n = The sum of the frequencies
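
As a rough illustration, the grouped mean can be reproduced by hand in base R, taking x as the midpoint of each class (the usual convention for a grouped mean). The variable names below are illustrative, the data are the salary classes from the Examples section, and this is a sketch of the formula rather than the implementation of `grouped_mean()`.

```r
# Class boundaries and frequencies (salary data from the Examples section)
lower <- seq(1500, 2400, by = 100)
upper <- lower + 100
freq  <- c(110, 180, 320, 460, 850, 250, 130, 70, 20, 10)

midpoints <- (lower + upper) / 2   # x: midpoint of each class
n <- sum(freq)                     # n: sum of the frequencies

sketch_mean <- sum(freq * midpoints) / n
sketch_mean
#> [1] 1898.75
```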

Calculation of Grouped Median

The following formula is used to calculate the grouped median:

$$M = L +\frac{\frac{n}{2}-cf}{f} \times c$$

Where:

L = The lower boundary of the median class
n = The sum of the frequencies
cf = The cumulative frequency of all classes before the median class
f = The frequency of the median class
c = The width of the median class
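
The median formula can be traced step by step in base R. This sketch assumes the median class is the first class whose cumulative frequency reaches n/2; the variable names are illustrative (`width` stands in for c so that `c()` is not masked), and the data are the salary classes from the Examples section.

```r
lower <- seq(1500, 2400, by = 100)
upper <- lower + 100
freq  <- c(110, 180, 320, 460, 850, 250, 130, 70, 20, 10)

n     <- sum(freq)
cum_f <- cumsum(freq)

# Median class: first class whose cumulative frequency reaches n / 2
k     <- which(cum_f >= n / 2)[1]
L     <- lower[k]                          # lower boundary of the median class
cf    <- if (k > 1) cum_f[k - 1] else 0    # cumulative frequency before it
f     <- freq[k]                           # frequency of the median class
width <- upper[k] - lower[k]               # c in the formula

sketch_median <- L + ((n / 2 - cf) / f) * width
sketch_median
#> [1] 1915.294
```

With these data the median class is 1900-2000, so the estimate is 1900 + (1200 - 1070) / 850 × 100.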

Calculation of Grouped Mode

The following formula is used to calculate the grouped mode if method = 1:

$$M = L + \left ( \frac{f1-f0}{\left ( 2 \times f1 \right ) - f0 - f2} \right ) \times c$$

Where:

L = The lower boundary of the modal class
f1 = The frequency of the modal class
f0 = The frequency of the class before the modal class
f2 = The frequency of the class after the modal class
c = The width of the modal class

Keep in mind that while it is easy to identify the modal class, the mode of the source data may not even fall within that class. Additionally, data can have more than one mode or, conversely, no mode.
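
The method = 1 formula can be sketched in base R as follows. Two assumptions are made here that the text above does not state: ties for the highest frequency are resolved by taking the first such class, and a missing neighbouring class (when the modal class is the first or last one) is treated as having frequency 0. The data are the salary classes from the Examples section.

```r
lower <- seq(1500, 2400, by = 100)
upper <- lower + 100
freq  <- c(110, 180, 320, 460, 850, 250, 130, 70, 20, 10)

k     <- which.max(freq)                            # modal class (first maximum)
L     <- lower[k]                                   # its lower boundary
f1    <- freq[k]                                    # frequency of the modal class
f0    <- if (k > 1) freq[k - 1] else 0              # class before (assumed 0 if none)
f2    <- if (k < length(freq)) freq[k + 1] else 0   # class after (assumed 0 if none)
width <- upper[k] - lower[k]                        # c in the formula

sketch_mode1 <- L + ((f1 - f0) / (2 * f1 - f0 - f2)) * width
sketch_mode1
#> [1] 1939.394
```

Note that the denominator is zero when the modal class and both of its neighbours share the same frequency, in which case this estimate is undefined.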

The following formula is used to calculate the grouped mode if method = 2:

$$M = (3 \times x) - (2 \times y)$$

Where:

x = The grouped median
y = The grouped mean
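
As a small sketch of method = 2, the estimate combines a grouped median and a grouped mean, both recomputed by hand here for the salary classes from the Examples section. The variable names are illustrative, and the mean is taken over class midpoints.

```r
lower <- seq(1500, 2400, by = 100)
upper <- lower + 100
freq  <- c(110, 180, 320, 460, 850, 250, 130, 70, 20, 10)
n     <- sum(freq)

# y: grouped mean, using class midpoints
y <- sum(freq * (lower + upper) / 2) / n

# x: grouped median, using the first class whose cumulative frequency reaches n / 2
cum_f <- cumsum(freq)
k     <- which(cum_f >= n / 2)[1]
cf    <- if (k > 1) cum_f[k - 1] else 0
x     <- lower[k] + ((n / 2 - cf) / freq[k]) * (upper[k] - lower[k])

sketch_mode2 <- 3 * x - 2 * y
sketch_mode2
#> [1] 1948.382
```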

Examples


mydf <- structure(list(salary = c("1500-1600", "1600-1700", "1700-1800",
        "1800-1900", "1900-2000", "2000-2100", "2100-2200", "2200-2300",
        "2300-2400", "2400-2500"), number = c(110L, 180L, 320L, 460L,
        850L, 250L, 130L, 70L, 20L, 10L)), .Names = c("salary", "number"),
        class = "data.frame", row.names = c(NA, -10L))
mydf
#>       salary number
#> 1  1500-1600    110
#> 2  1600-1700    180
#> 3  1700-1800    320
#> 4  1800-1900    460
#> 5  1900-2000    850
#> 6  2000-2100    250
#> 7  2100-2200    130
#> 8  2200-2300     70
#> 9  2300-2400     20
#> 10 2400-2500     10

with(mydf, grouped_median(frequencies = number, intervals = salary, sep = "-"))
#> [1] 1915.294

## Example with intervals manually specified
Freq <- mydf$number
X <- cbind(c(1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400),
           c(1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500))

grouped_median(Freq, X)
#> [1] 1915.294

# Using `cut`
set.seed(1)
x <- sample(100, 100, replace = TRUE)
y <- data.frame(table(cut(x, 10)))

with(y, grouped_mean(Freq, Var1, sep = ",", trim = "cut"))
#> [1] 50.69503
mean(x)
#> [1] 50.87

with(y, grouped_median(Freq, Var1, sep = ",", trim = "cut"))
#> [1] 47.2
median(x)
#> [1] 45

## Note that the mode might be really far off depending on the approach used
with(y, grouped_mode(Freq, Var1, sep = ",", trim = "cut"))
#> [1] 84.6
with(y, grouped_mode(Freq, Var1, sep = ",", trim = "cut", method = 2))
#> [1] 40.20994
tail(sort(table(x)))
#> x
#> 44 48 51 87 89 70 
#>  3  3  3  4  4  5