Calculates the median of data that have already been grouped, given the interval ranges and the frequency of each group.

GroupedMedian(frequencies, intervals, sep = NULL, trim = NULL)

Arguments

frequencies

A vector of frequencies.

intervals

A 2-row matrix with as many columns as there are elements in frequencies, where the first row holds the lower class boundaries and the second row holds the upper class boundaries. Alternatively, intervals may be a column of interval labels in your data.frame; in that case, specify sep (and possibly trim) so that GroupedMedian creates the required matrix automatically.
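As a rough illustration of the matrix layout described above, interval labels can be split into lower and upper boundaries with base R. This is only a sketch of the idea and not necessarily how GroupedMedian does it internally; the helper name intervals_to_matrix is hypothetical.

```r
# Hypothetical helper (not part of the package): split "lower-upper" labels
# into a 2-row numeric matrix with one column per interval.
intervals_to_matrix <- function(intervals, sep = "-") {
  parts <- strsplit(as.character(intervals), sep, fixed = TRUE)
  # Each element of `parts` is c(lower, upper); vapply binds them column-wise.
  vapply(parts, as.numeric, numeric(2))
}

m <- intervals_to_matrix(c("1500-1600", "1600-1700", "1700-1800"))
m[1, ]  # lower class boundaries
m[2, ]  # upper class boundaries
```

Note that this simple split assumes the separator does not also appear inside the numbers themselves (for example, "-" with negative boundaries).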

sep

Optional. The character that separates the lower and upper boundaries when the intervals are supplied as a character vector (for example, "-" in "1500-1600").

trim

Optional. Characters to trim from the vector before splitting. For example, if you are working with the output of cut (and, for some reason, no longer have access to the original data), you can use the pre-set trim pattern "cut".
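To make the trimming step concrete, here is a minimal base R sketch of cleaning labels produced by cut before splitting them. The regular expression is an assumption about what such a pattern might look like, not the definition of the pre-set "cut" pattern.

```r
# Assumption: cut() labels look like "(0.9,10.8]". Strip the parentheses and
# brackets first, then split on the comma to get lower and upper boundaries.
labels <- c("(0.9,10.8]", "(10.8,20.7]", "(20.7,30.6]")
trimmed <- gsub("\\(|\\)|\\[|\\]", "", labels)
bounds <- vapply(strsplit(trimmed, ",", fixed = TRUE), as.numeric, numeric(2))
bounds  # row 1: lower boundaries, row 2: upper boundaries
```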

Value

A single numeric value representing the grouped median.
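For orientation, the usual textbook grouped median is L + ((N/2 - cf) / f) * w, where L is the lower boundary of the median class, N the total frequency, cf the cumulative frequency before the median class, f the frequency of the median class, and w its width. The sketch below implements that formula under the assumption that intervals is already a 2-row matrix; the name grouped_median_sketch is hypothetical, and the package's own implementation may differ in detail.

```r
# Hypothetical sketch of the standard grouped-median formula; the function
# name and internals are assumptions, not the package's code.
grouped_median_sketch <- function(frequencies, intervals) {
  cum_freq <- cumsum(frequencies)
  half <- sum(frequencies) / 2
  k <- which(cum_freq >= half)[1]             # index of the median class
  lower <- intervals[1, k]                    # lower boundary L
  width <- intervals[2, k] - intervals[1, k]  # class width w
  below <- if (k > 1) cum_freq[k - 1] else 0  # cumulative frequency cf
  lower + (half - below) / frequencies[k] * width
}

freq <- c(110, 180, 320, 460, 850, 250, 130, 70, 20, 10)
X <- rbind(seq(1500, 2400, by = 100), seq(1600, 2500, by = 100))
result <- grouped_median_sketch(freq, X)
result
#> [1] 1915.294
```

With these salary data, N/2 = 1200 falls in the 1900-2000 class (cf = 1070, f = 850, w = 100), giving 1900 + (130 / 850) * 100, which agrees with the value reported in the Examples section.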

References

http://stackoverflow.com/a/18931054/1270695

Author

Ananda Mahto

Examples

mydf <- structure(list(
  salary = c("1500-1600", "1600-1700", "1700-1800", "1800-1900", "1900-2000",
             "2000-2100", "2100-2200", "2200-2300", "2300-2400", "2400-2500"),
  number = c(110L, 180L, 320L, 460L, 850L, 250L, 130L, 70L, 20L, 10L)),
  .Names = c("salary", "number"), class = "data.frame",
  row.names = c(NA, -10L))
mydf
#>       salary number
#> 1  1500-1600    110
#> 2  1600-1700    180
#> 3  1700-1800    320
#> 4  1800-1900    460
#> 5  1900-2000    850
#> 6  2000-2100    250
#> 7  2100-2200    130
#> 8  2200-2300     70
#> 9  2300-2400     20
#> 10 2400-2500     10
GroupedMedian(frequencies = mydf$number, intervals = mydf$salary, sep = "-")
#> [1] 1915.294
## Example with intervals manually specified
X <- rbind(c(1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400),
           c(1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500))
GroupedMedian(mydf$number, X)
#> [1] 1915.294
set.seed(1)
x <- sample(100, 100, replace = TRUE)
y <- data.frame(table(cut(x, 10)))
GroupedMedian(y$Freq, y$Var1, sep = ",", trim = "cut")
#> [1] 47.2