Create random data.frame — sorandf • overflow

sorandf creates a data frame with columns corresponding to different types of random data. It is intended for making reproducible examples easier.

sorandf(
  rows = 10L,
  cols = c("race", "gender", "age"),
  names = make.names(cols)
)

sorandf_add(newf, name)

sorandf_reset()

Arguments

rows	integer. Number of rows in the data.frame.
cols	character. Vector of column types. Each type must have a function defining it (see Details).
names	character. Vector of names of columns in data.frame output.
newf	function. Function to add to the sorandf functions. Must take exactly one parameter, n.
name	character. Name given to the new function; use this name in cols to request the new column type.
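
The requirement on newf can be checked before registering a function. The following is a minimal base R sketch; check_generator is a hypothetical helper that is not part of the package, and the length check is an assumption based on every column needing one value per row.

```r
# Hypothetical helper, not part of the package: checks that a candidate
# function takes exactly one parameter named n and returns n values.
check_generator <- function(f, n = 5L) {
  is.function(f) &&
    identical(names(formals(f)), "n") &&
    length(f(n)) == n
}

ok  <- check_generator(function(n) sample(1:10, n, replace = TRUE))
bad <- check_generator(function(n, k) rnorm(n))  # two parameters, so rejected
```

A function that passes this check should be safe to hand to sorandf_add, though the package may apply its own validation.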

Details

Each column type must have a function defining it. New functions can be added with sorandf_add, and sorandf_reset restores the defaults. The following column types are defined by default:

id = function(n) paste0("ID.", 1:n)

group = function(n) sample(c("control", "treat"), n, replace = TRUE)

hs.grad = function(n) sample(c("yes", "no"), n, replace = TRUE)

race = function(n) sample(c("black", "white", "asian"), n, replace = TRUE, prob=c(.25, .5, .25))

gender = function(n) sample(c("male", "female"), n, replace = TRUE)

age = function(n) sample(18:40, n, replace = TRUE)

m.status = function(n) sample(c("never", "married", "divorced", "widowed"), n, replace = TRUE, prob=c(.25, .4, .3, .05))

political = function(n) sample(c("democrat", "republican", "independent", "other"), n, replace= TRUE, prob=c(.35, .35, .20, .1))

n.kids = function(n) rpois(n, 1.5)

income = function(n) sample(c(seq(0, 30000, by=1000), seq(0, 150000, by=1000)), n, replace=TRUE)

score = function(n) rnorm(n)
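
The lookup from column type to function described above can be sketched in base R. This is an illustration only and not the package's source code: the generators list and make_random_df are hypothetical names, and only three of the defaults are copied in.

```r
# Illustrative sketch only; NOT the package's implementation.
# A named list acts as the registry mapping column types to functions.
generators <- list(
  id     = function(n) paste0("ID.", 1:n),
  gender = function(n) sample(c("male", "female"), n, replace = TRUE),
  age    = function(n) sample(18:40, n, replace = TRUE)
)

# Hypothetical stand-in for sorandf(): call one function per requested type.
make_random_df <- function(rows = 10L, cols = c("gender", "age"),
                           names = make.names(cols)) {
  unknown <- setdiff(cols, names(generators))
  if (length(unknown) > 0) {
    stop("No function defined for: ", paste(unknown, collapse = ", "))
  }
  columns <- lapply(generators[cols], function(f) f(rows))
  setNames(as.data.frame(columns, stringsAsFactors = FALSE), names)
}

set.seed(42)  # fix the seed so the random columns are reproducible
df <- make_random_df(5, c("id", "age"), names = c("card", "years"))
```

Under this design, adding a column type amounts to adding an entry to the list, and resetting amounts to restoring the original list. Calling set.seed first makes the generated data repeatable, which matters when the data frame is part of a reproducible example.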

Author

Sebastian Campbell, Tyler Rinker

Examples

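if (FALSE) {
# 15 rows of id, age and score, renamed in the output
sorandf(15, c("id", "age", "score"), names = c("card", "years", "points"))

# Add a new column type called "newf" and use it.
# sample(1:10, n) draws without replacement, so rows must not exceed 10.
sorandf_add(function(n) {sample(1:10, n)}, "newf")
sorandf(10, c("gender", "race", "newf"))
# Restore the default set of functions
sorandf_reset()
}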