Increasing access to granular “big” data, such as electronic health records (EHRs), insurance claims, and population registries, has the potential to offer unique and important insights into perinatal and paediatric epidemiology. Big data on health outcomes can be linked to novel data sources, such as environmental sensors, genomic biobanks, wearable devices, or social media, to assess determinants of health that have historically been difficult to measure. In addition, studies using big data can often be completed more quickly, at lower cost, and with fewer ethical concerns than primary data collection. Yet the promise of these advances must be met with renewed attention to study rigour to minimise new threats to validity that are unique to big data. Further, studies using big data require highly computational workflows that may be subject to biases introduced while processing and analysing the data, underscoring the importance of adopting ‘best practices for reproducibility’. We outline the challenges and opportunities for rigour and reproducibility in the context of big data perinatal and paediatric epidemiology (Figure 1).

Scientific rigour requires careful application of the scientific method in the design, conduct, analysis, interpretation, and reporting of a study to ensure results are robust and minimally biased. Randomised controlled trials (RCTs) are thought to provide the highest-quality evidence but are difficult to conduct in perinatal research due to narrow eligibility criteria and heightened ethical concerns in pregnancy and infancy. In the absence of randomised trials, rigorous evidence can be obtained from observational studies if selection bias and confounders, both measured and unmeasured, are sufficiently accounted for.

Observational studies using big data may be particularly subject to selection bias because the data are typically generated for clinical or commercial purposes rather than for research. Marginalised communities may be under-represented in EHR and claims datasets because they receive less frequent care due to structural barriers. Wearables, commercial genetic tests, and social media may be more commonly used in populations with higher income and education levels. Relying on big data cannot fix the systematic exclusion of certain populations from epidemiologic research, and biased sampling frames may be harder to detect in pre-existing data sources. Studies constructing cohorts from population registries, EHRs, or claims data are also vulnerable to left truncation bias, common in perinatal studies, because pregnancies typically enter the data only after clinical detection, excluding early pregnancy losses. If the factors that determine whether an individual is represented in the data also affect the perinatal relationship being studied, the resulting estimates will be biased. Selection biases are not easily ameliorated with big data, which only underscores the importance of both classical and novel approaches to detecting and minimising bias.

G-methods (e.g., inverse probability weighting and doubly robust estimation) and quasi-experimental designs (e.g., instrumental variables and Mendelian randomisation) can reduce bias and mimic RCTs in big data observational epidemiology. However, the rigour of evidence produced with these designs may be highly sensitive to investigators' assumptions.
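To make the mechanics concrete, the sketch below applies inverse probability weighting, the simplest of the g-methods named above, to simulated data in R. The variable names, effect sizes, and single measured confounder are all hypothetical; a real analysis would model many confounders and use robust or bootstrap variance estimation.

```r
# Minimal inverse probability weighting (IPW) sketch on simulated data.
# All variables and effect sizes are hypothetical.
set.seed(42)
n <- 5000
confounder <- rnorm(n)                                 # e.g., standardised maternal age
exposure   <- rbinom(n, 1, plogis(0.5 * confounder))   # exposure depends on confounder
outcome    <- rbinom(n, 1, plogis(-1 + 0.4 * exposure + 0.6 * confounder))
dat <- data.frame(confounder, exposure, outcome)

# 1. Model the probability of exposure given measured confounders.
ps <- predict(glm(exposure ~ confounder, family = binomial, data = dat),
              type = "response")

# 2. Construct stabilised inverse probability weights.
p_exp <- mean(dat$exposure)
dat$w <- ifelse(dat$exposure == 1, p_exp / ps, (1 - p_exp) / (1 - ps))

# 3. Weighted linear probability model for the marginal risk difference.
#    Model-based standard errors are invalid with weights; use robust or
#    bootstrap variance estimators in practice.
msm <- glm(outcome ~ exposure, family = gaussian, data = dat, weights = w)
coef(msm)["exposure"]
```

Even in this idealised setting, the weighted estimate is only as credible as the assumption that no unmeasured confounders remain.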
In a Mendelian randomisation study, for example, Diemer et al.1 demonstrated how estimated risk differences for prenatal alcohol exposure and attention deficit hyperactivity disorder are sensitive to choices in instrument selection and homogeneity assumptions. Employing causal inference methods with big data does not necessarily eliminate bias, and investigators must carefully assess causal assumptions and be transparent about potential violations.

Big data may be susceptible to measurement error because they are not primarily collected for research purposes. Insurance claims and EHRs have become popular data sources for epidemiologic research, but the codes and fields they contain are not used consistently in clinical practice and may therefore have low sensitivity and specificity. A recent study comparing automated EHR data extraction with manual chart abstraction for obstetrics research found that automatically extracted measurements had high reliability overall but lower accuracy for variables related to care processes (e.g., labour induction) or requiring provider interpretation (e.g., postpartum haemorrhage).2 Even sophisticated computational tools can be subject to classical forms of information bias. Validation studies and quantitative bias analyses are useful tools to assess the extent of, and potentially correct for, measurement error and misclassification bias in big data studies.
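As a concrete illustration, the sketch below applies one simple form of quantitative bias analysis, the Rogan–Gladen correction, to risks computed from a misclassified binary outcome in R. The sensitivity, specificity, and cell counts are hypothetical stand-ins for values a validation study would supply.

```r
# Simple quantitative bias analysis: correct risks computed from a
# misclassified, EHR-derived outcome using sensitivity and specificity
# estimated in a validation study. All numbers are hypothetical.
se <- 0.85   # sensitivity of the outcome code
sp <- 0.97   # specificity of the outcome code

a1 <- 120; n1 <- 1000   # observed cases / total among the exposed
a0 <- 80;  n0 <- 1000   # observed cases / total among the unexposed

# Rogan-Gladen correction: true risk = (observed risk + sp - 1) / (se + sp - 1)
correct <- function(p) (p + sp - 1) / (se + sp - 1)

rd_observed  <- a1 / n1 - a0 / n0
rd_corrected <- correct(a1 / n1) - correct(a0 / n0)
round(c(observed = rd_observed, corrected = rd_corrected), 3)
```

Probabilistic bias analysis extends this approach by drawing sensitivity and specificity from plausible distributions rather than fixing them at single values.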
Machine learning (ML) can be used to construct new variables from high-dimensional data and free-text fields, such as the clustering algorithms used by Petersen et al.3 to learn placental features from tissue samples. However, opaque training processes can mask how information is extracted and make it difficult to assess biases in model outputs. Algorithmic bias, in which predictive performance differs across subgroups, can introduce differential misclassification if model accuracy varies with factors that are informative to the research question at hand. Algorithmic fairness and bias mitigation methods are essential to minimising these biases in future big data perinatal and paediatric studies.4

While big data may include numerous variables that could be controlled for as potential confounders, care must be taken to avoid over-adjustment bias. For example, gestational age is commonly adjusted for as a confounder when it may in fact mediate the causal relationship under study.5 Similarly, whether it is appropriate to treat pregnancy history as a time-dependent confounder depends on the specific research question.6 Directed acyclic graphs are an essential tool for distinguishing confounders from mediators and minimising over-adjustment and collider biases.5, 6

Across scientific disciplines, the “reproducibility crisis” has exposed major vulnerabilities in the practice of science that are likely to grow as big data are used more frequently. New guidelines for rigour, reproducibility, and transparency from the National Institutes of Health and other funding bodies are an important first step.7 Yet assessing rigour and reproducibility during funding decisions is not enough; researchers must embrace best practices for rigour and reproducibility at every stage of their research (Figure 1).

Researchers – no matter how well-intentioned or conscientious – are prone to confirmation bias. During data curation and analysis, analysts make countless seemingly innocuous judgements that could cumulatively tilt results towards the expected effect. In addition, selective reporting of results from multiple statistical models (i.e., “p-hacking”) can produce apparent effects where none exist. Conducting data processing and analyses with a masked version of the exposure or treatment variable can reduce confirmation bias because the preliminary estimates analysts see while checking code do not reflect the true study findings.8 Documenting and registering statistical pre-analysis plans holds researchers accountable to the methods they specified before seeing the data.

Big data studies involve code at every step, from constructing a study population to producing statistical estimates. Adopting reproducible computational workflows can reduce the chance of errors and streamline replication efforts. Best practices for maintaining these workflows include the use of open-source statistical software, modular analysis code, bash scripts, version control, and decision logs.8 Open-source software, such as R and Python, can be used to manage large multi-step analyses and compare intermediate data objects during replication. Breaking an analysis into modular, self-contained scripts allows researchers to assess the results of each step and efficiently identify points of replication failure. Bash scripts can be used to rerun the scripts in the appropriate order to replicate study findings. Version control tools, such as Git and hosting platforms like GitHub, keep a history of scripts and allow researchers to annotate changes over time; they can be used to pinpoint the specific edits that caused a change in results and make coding processes transparent. Researchers can use decision logs to clearly note necessary deviations from the pre-analysis plan, which increases transparency in reporting. Ideally, a study's computational workflow can be replicated from data extraction through table and figure generation, even by investigators who were not part of the original study team. Having multiple analysts internally replicate data processing and analysis scripts before publication can catch and resolve unintentional errors in code and identify data processing decisions that strongly influence the results.8

Publishing data and code alongside a study manuscript allows for greater transparency and makes results findable, accessible, interoperable, and reusable (FAIR).9 Shared data and code facilitate replication of study results by other investigators and may ultimately lead to more rapid scientific advances. However, many big data sources contain sensitive information that cannot be made publicly available. Synthetic data can be generated to allow publication of de-identified data that retain the statistical properties of the original, sensitive dataset. For example, Braddon et al.10 generated synthetic data from sensitive EHRs using parametric and non-parametric methods and showed that the synthetic data could closely replicate estimates obtained from the original data. When generating synthetic data, context-dependent rules should be enforced to avoid unrealistic relationships between variables.
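A minimal sketch of one parametric approach in R appears below: variables are synthesised sequentially from models fitted to the original data, and a context-dependent rule is enforced afterwards. The dataset, variable names, and coefficients are hypothetical toys; dedicated tools such as the synthpop R package implement these methods far more completely.

```r
# Minimal parametric synthetic data sketch. The 'original' data are
# simulated here as a stand-in for a sensitive dataset; all variables
# and coefficients are hypothetical.
set.seed(123)
n <- 1000
orig <- data.frame(mat_age = rnorm(n, 29, 5),
                   smoker  = rbinom(n, 1, 0.15))
orig$birthweight <- 3300 + 20 * (orig$mat_age - 29) -
  200 * orig$smoker + rnorm(n, 0, 400)

# Synthesise variables sequentially, each conditional on those before it.
synth <- data.frame(mat_age = rnorm(n, mean(orig$mat_age), sd(orig$mat_age)))

p_smoke <- predict(glm(smoker ~ mat_age, family = binomial, data = orig),
                   newdata = synth, type = "response")
synth$smoker <- rbinom(n, 1, p_smoke)

bw_model <- lm(birthweight ~ mat_age + smoker, data = orig)
synth$birthweight <- predict(bw_model, newdata = synth) +
  rnorm(n, 0, sigma(bw_model))

# Context-dependent rule: birthweight cannot be negative.
synth$birthweight <- pmax(synth$birthweight, 0)

# Check that analyses on synthetic data approximate the original estimates.
rbind(original  = coef(lm(birthweight ~ smoker, data = orig)),
      synthetic = coef(lm(birthweight ~ smoker, data = synth)))
```

In this toy example, regression coefficients estimated from the synthetic data should approximate those from the original data, mirroring the comparison made by Braddon et al.10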
With careful implementation, these developments in data sharing and security can help make perinatal epidemiologic research more transparent and reproducible.

Articles in this special “Big Data” issue of Paediatric and Perinatal Epidemiology showcase the promise of big data for perinatal and paediatric epidemiology. Yet, with big data comes big responsibility. Using novel causal inference or machine learning methods with big data does not guarantee that a study is unbiased, and foundational epidemiologic methods to maximise study rigour remain just as essential as ever. As studies using big data become increasingly computationally intensive, best practices in reproducibility must become standard tools in the epidemiology toolbox.

Anna Nguyen is a doctoral student in the Department of Epidemiology and Population Health at the Stanford School of Medicine. Her doctoral research focuses on using novel causal inference methods to study the effects of infectious disease interventions on maternal and child health outcomes. She is a consulting data scientist at Interwell Health, where she helps develop predictive models that project the risk of chronic kidney disease using claims data. She holds an MPH in Epidemiology and Biostatistics and a BA in Data Science and Public Health from the University of California, Berkeley.

Jade Benjamin-Chung is an Assistant Professor in the Department of Epidemiology and Population Health at the Stanford School of Medicine. Her research focuses on identifying effective interventions to reduce environmentally mediated infectious diseases in mothers and young children. She completed doctoral training in Epidemiology and a master's degree in Biostatistics at the University of California, Berkeley.

Research reported in this publication was supported, in part, by the National Heart, Lung, and Blood Institute of the National Institutes of Health under the Training in Advanced Data and Analytics for Behavioural and Social Sciences Research Program (T32HL151323). Research reported in this publication was also supported, in part, by the National Institute of Allergy and Infectious Diseases of the National Institutes of Health (K01AI141616; Jade Benjamin-Chung, PI). Jade Benjamin-Chung is a Chan Zuckerberg Biohub Investigator.