Whether and how well people can behave randomly is of interest in many areas of psychological research. The ability to generate randomness is often investigated using random number generation (RNG) tasks, in which participants are asked to generate a sequence of numbers that is as random as possible. However, there is no consensus on how best to quantify the randomness of responses in human-generated sequences. Traditionally, psychologists have used measures of randomness that directly assess specific features of human behavior in RNG tasks, such as the tendency to avoid repetition or to systematically generate numbers that have not been generated in the recent choice history, a behavior known as cycling. Other disciplines have proposed measures of randomness that are based on a more rigorous mathematical foundation and are less restricted to specific features of randomness, such as algorithmic complexity. More recently, variants of these measures have been proposed to assess systematic patterns in short sequences. We report the first large-scale integrative study to compare measures of specific aspects of randomness with entropy-derived measures based on information theory and measures based on algorithmic complexity. We compare the ability of the different measures to discriminate between human-generated sequences and truly random sequences based on atmospheric noise, and provide a systematic analysis of how the usefulness of randomness measures is affected by sequence length. We conclude with recommendations that can guide the selection of appropriate measures of randomness in psychological research.