Calculate the Median of Already Grouped Data

Calculates the median of data that have already been grouped, given the interval ranges and the frequency of each group.
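The usual interpolation formula for the median of grouped data is shown below. It reproduces the 1915.294 result in the Examples, but treat it as a description of the standard technique rather than a copy of the package's source. Here $N$ is the total frequency and the median class is the first class whose cumulative frequency reaches $N/2$. $L$ is that class's lower boundary, $F$ is the cumulative frequency of all classes below it, $f$ is its frequency, and $h$ is its width.

```latex
\text{median} = L + \frac{N/2 - F}{f} \times h
```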

GroupedMedian(frequencies, intervals, sep = NULL, trim = NULL)

Arguments

frequencies	A vector of frequencies.
intervals	A 2-row `matrix` with one column per element of `frequencies`. The first row gives the lower class boundaries and the second row gives the upper class boundaries. Alternatively, `intervals` may be a character vector of interval labels, such as a column of your `data.frame` containing values like `"1500-1600"`. In that case, specifying `sep` (and, if needed, `trim`) lets `GroupedMedian` build the required `matrix` for you.
sep	Optional. The character that separates the lower and upper boundaries when `intervals` is given as a character vector (for example, `"-"` in `"1500-1600"`).
trim	Optional. Characters to remove from `intervals` before it is split on `sep`. For example, if you are working with the labels produced by `cut` (say, because you no longer have access to the original data), you can use the preset trim pattern `"cut"` to strip the brackets and parentheses that `cut` adds to its labels.
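A minimal sketch of the label-to-matrix conversion that `sep` and `trim` describe follows. `make_interval_matrix` is a hypothetical helper, not part of the package. Treating `trim = "cut"` as "remove `(`, `)`, `[` and `]`" is an assumption based on the `(a,b]` labels that `cut` produces by default.

```r
# Hypothetical helper (not part of the documented package): turn interval
# labels such as "1500-1600" into the 2-row matrix GroupedMedian() expects.
make_interval_matrix <- function(labels, sep, trim = NULL) {
  labels <- as.character(labels)  # factors (e.g. from table()) become text
  if (!is.null(trim)) {
    # Assumption: "cut" means strip the ( ) [ ] that cut() puts around its
    # labels; any other value is treated as a regular expression to remove.
    pattern <- if (identical(trim, "cut")) "\\[|\\]|\\(|\\)" else trim
    labels <- gsub(pattern, "", labels)
  }
  # Split each label on the separator, matched as literal text
  parts <- strsplit(labels, sep, fixed = TRUE)
  # One column per label: row 1 = lower boundary, row 2 = upper boundary
  vapply(parts, as.numeric, numeric(2))
}

make_interval_matrix(c("1500-1600", "1600-1700"), sep = "-")
make_interval_matrix(c("(0.5,10]", "(10,20]"), sep = ",", trim = "cut")
```

The resulting matrix can be passed as `intervals` directly, without `sep` or `trim`.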

Value

A single numeric value representing the grouped median.
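The following sketch computes that value with the standard grouped-median interpolation. It is not the package's source: `grouped_median_sketch` is a hypothetical name. It also assumes the classes are contiguous, sorted in increasing order, and aligned with `frequencies`.

```r
# Sketch of grouped-median interpolation. Assumes `intervals` is a 2-row
# matrix (row 1 = lower boundaries, row 2 = upper boundaries) whose sorted,
# contiguous columns line up with `frequencies`.
grouped_median_sketch <- function(frequencies, intervals) {
  stopifnot(is.matrix(intervals), nrow(intervals) == 2,
            ncol(intervals) == length(frequencies))
  cf <- cumsum(frequencies)                   # cumulative frequencies
  half <- sum(frequencies) / 2                # N / 2
  k <- which(cf >= half)[1]                   # median class: first to reach N / 2
  below <- if (k > 1) cf[k - 1] else 0        # F: frequency of classes below it
  lower <- intervals[1, k]                    # L: lower boundary of median class
  width <- intervals[2, k] - intervals[1, k]  # h: width of median class
  unname(lower + (half - below) / frequencies[k] * width)
}
```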

References

http://stackoverflow.com/a/18931054/1270695

Author

Ananda Mahto

Examples


mydf <- structure(list(salary = c("1500-1600", "1600-1700", "1700-1800",
        "1800-1900", "1900-2000", "2000-2100", "2100-2200", "2200-2300",
        "2300-2400", "2400-2500"), number = c(110L, 180L, 320L, 460L,
        850L, 250L, 130L, 70L, 20L, 10L)), .Names = c("salary", "number"),
        class = "data.frame", row.names = c(NA, -10L))
mydf
#>       salary number
#> 1  1500-1600    110
#> 2  1600-1700    180
#> 3  1700-1800    320
#> 4  1800-1900    460
#> 5  1900-2000    850
#> 6  2000-2100    250
#> 7  2100-2200    130
#> 8  2200-2300     70
#> 9  2300-2400     20
#> 10 2400-2500     10

GroupedMedian(frequencies = mydf$number, intervals = mydf$salary, sep = "-")
#> [1] 1915.294
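The first result can be checked by hand. The frequencies sum to 2400, so half of the observations is 1200. The cumulative frequencies run 110, 290, 610, 1070, 1920, and so on, so the 1200th observation falls in the 1900-2000 class. That class has frequency 850 and 1070 observations below it, which gives:

```latex
1900 + \frac{1200 - 1070}{850} \times 100 = 1900 + 15.294\ldots \approx 1915.294
```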

## Example with intervals manually specified
X <- rbind(c(1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400),
           c(1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500))

GroupedMedian(mydf$number, X)
#> [1] 1915.294
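The same boundary matrix can be built more compactly with `seq()`. This is simply an alternative way to write `X`, with each class 100 units wide; `X2` is an illustrative name.

```r
# Row 1 holds the lower class boundaries, row 2 the upper boundaries.
lower <- seq(1500, 2400, by = 100)
X2 <- unname(rbind(lower, lower + 100))  # drop the "lower" row name
```

Because `X2` is identical to `X`, `GroupedMedian(mydf$number, X2)` gives the same result as `GroupedMedian(mydf$number, X)`.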

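## Example using the interval labels produced by cut().
## sample() draws different values for the same seed in R < 3.6.0 and
## R >= 3.6.0, so the result may differ depending on the R version.
set.seed(1)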
x <- sample(100, 100, replace = TRUE)
y <- data.frame(table(cut(x, 10)))

GroupedMedian(y$Freq, y$Var1, sep = ",", trim = "cut")
#> [1] 47.2