The theory of inference from simple random samples (SRSs) is fundamental in statistics; many statistical techniques and formulae assume that the data are an SRS. True random samples are rare; in practice, people tend to draw samples by using pseudo-random number generators (PRNGs) and algorithms that map a set of pseudo-random numbers into a subset of the population. Most statisticians take for granted that the software they use "does the right thing," producing samples that can be treated as if they are SRSs. In fact, the PRNG and the algorithm for drawing samples matter enormously. We show, using basic counting principles, that some widely used methods cannot generate all SRSs of a given size, and those that can do not always do so with equal frequencies in simulations. We compare the "randomness" and computational efficiency of commonly used PRNGs to PRNGs based on cryptographic hash functions, which avoid these pitfalls. We judge these PRNGs by their ability to generate SRSs and find in simulations that their relative merits vary by seed, population and sample size, and sampling algorithm. These results are not limited to SRSs; they have implications for all resampling methods, including the bootstrap, MCMC, and Monte Carlo integration.
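The counting argument mentioned in the abstract can be illustrated with a minimal sketch. It is not the speaker's code. It assumes only the pigeonhole principle: a deterministic sampling algorithm driven by a PRNG with b bits of internal state can produce at most 2^b distinct samples, so if the number of possible size-k subsets of a population of n exceeds 2^b, some SRSs can never be drawn. The function names and the `state_bits` parameter are hypothetical, chosen for illustration.

```python
from fractions import Fraction
from math import comb


def reachable_fraction_upper_bound(state_bits: int, n: int, k: int) -> Fraction:
    """Upper bound on the fraction of size-k subsets of an n-item population
    that a deterministic sampler can produce when its PRNG has `state_bits`
    bits of state (assumption: at most 2**state_bits distinct outputs)."""
    n_states = 2 ** state_bits      # distinct PRNG states, hence distinct runs
    n_samples = comb(n, k)          # number of possible simple random samples
    return min(Fraction(1), Fraction(n_states, n_samples))


def smallest_unreachable_k(state_bits: int, n: int):
    """Smallest sample size k for which there are more size-k subsets than
    PRNG states, or None if every sample size fits within the state space."""
    n_states = 2 ** state_bits
    for k in range(n // 2 + 1):     # comb(n, k) is increasing up to k = n // 2
        if comb(n, k) > n_states:
            return k
    return None


if __name__ == "__main__":
    # Hypothetical example: a generator with 32 bits of state, population of 50.
    print(smallest_unreachable_k(32, 50))
    print(float(reachable_fraction_upper_bound(32, 50, 25)))
```

Under these assumptions, a generator with 32 bits of state applied to a population of 50 already falls short at sample size 10, since there are more than 2^32 subsets of that size. The bound says nothing about whether the reachable samples occur with equal frequency, which is the separate question the simulations address.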
The Berkeley Statistics Annual Research Symposium (BSTARS) surveys the latest research developments in the department, with an emphasis on possible applications to statistical problems encountered in industry. The conference consists of keynote lectures given by faculty members, talks by PhD students about their thesis work, and presentations of industrial research by alliance members. The day-long symposium gives graduate students, faculty, and industry partners an opportunity to connect, discuss research, and review the newest developments on campus and in the field of statistics.
Kellie Ottoboni is a former BIDS Data Science Fellow and a graduate of UC Berkeley's Department of Statistics. Her research at BIDS focused on using robust nonparametric statistics and machine learning to make causal inferences from data in the health and social sciences. The goal was to make reliable inferences while making minimal assumptions about the models generating the data. In addition to developing new statistical methods and studying their theoretical properties, Kellie wrote open source software implementing nonparametric methods in R and Python.