Conflicting results in the literature raise the question of how reliable findings from single experiments on pedestrian crowd movement are. An important example is the effect of social groups on crowd egress times from confined spaces, for which both increases and decreases have been reported. We identify only six comparable studies and conduct the first analysis on this topic that quantitatively integrates evidence from multiple studies while accounting for their different sample sizes. The aggregated findings suggest that social groups increase average egress times, but there is insufficient evidence to reject the null hypothesis of no effect. The conflicting results across published studies are thus likely to have arisen by chance, as the experiments are statistically underpowered to detect a small effect. We find no evidence of publication bias in terms of findings or statistical power. Our work provides a quantitative basis for discussing the statistical reliability of experiments in light of the high context-dependency of pedestrian dynamics.