Showing posts from July 2018

Systematic Sampling in R

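sampleBy from the doBy package was a useful function in R, but it is no longer supported. As a replacement I wrote the sample.by function below. It splits a data frame by one column, draws the given proportion of rows at random from each group, and returns both the sampled rows and the remaining rows.

## usage : sample.by(data_as_dataframe, column_index, ratio_of_sample)
##         returns a list of sample.df and rest.df

sample.by <- function(df, by.col.loc = 1, prop = 0.1) {
  sample.df <- data.frame()
  rest.df <- data.frame()
  # one data frame per distinct value of the grouping column
  dat <- split(df, df[[by.col.loc]])
  for (i in seq_along(dat)) {
    size <- NROW(dat[[i]])
    n <- round(size * prop)
    # TRUE for the n rows drawn at random from this group;
    # sample.int avoids the sample(x) surprise when x has length one
    picked <- seq_len(size) %in% sample.int(size, n)
    sample.df <- rbind(sample.df, dat[[i]][picked, , drop = FALSE])
    rest.df <- rbind(rest.df, dat[[i]][!picked, , drop = FALSE])
  }
  # shuffle the rows so the groups are interleaved in the output
  list(sample.df = sample.df[sample.int(NROW(sample.df)), , drop = FALSE],
       rest.df = rest.df[sample.int(NROW(rest.df)), , drop = FALSE])
}

## example
sample.by(iris, 5, 0.7)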
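To make the per-group draw easier to check, here is a minimal, self-contained sketch of the same idea in base R. It does not call sample.by. The helper name draw.group and the seed value are illustrative assumptions, not part of the original function.

```r
# Hypothetical helper (an assumption for illustration): returns a logical
# vector of length `size` with round(size * prop) randomly chosen TRUE values.
draw.group <- function(size, prop) {
  seq_len(size) %in% sample.int(size, round(size * prop))
}

set.seed(1)  # arbitrary seed, only so the run is repeatable

# Split iris by its 5th column (Species) and mark 70% of each group.
groups <- split(iris, iris[[5]])
picked <- lapply(groups, function(g) draw.group(NROW(g), 0.7))

# Number of rows drawn and left over in each group.
n.sample <- sapply(picked, sum)
n.rest <- sapply(picked, function(p) sum(!p))
print(n.sample)
print(n.rest)
```

iris has 50 rows per species, so a proportion of 0.7 should give 35 sampled rows and 15 remaining rows in every group, whatever the seed.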