Task‐fMRI researchers have great flexibility in how they analyze their data, with multiple methodological options to choose from at each stage of the analysis workflow. While the development of tools and techniques has broadened our horizons for comprehending the complexities of the human brain, a growing body of research has highlighted the pitfalls of such methodological plurality. In a recent study, we found that the choice of software package used to run the analysis pipeline can have a considerable impact on the final group‐level results of a task‐fMRI investigation (Bowring et al., 2019; hereafter BMN). Here we revisit our work, seeking to identify the stages of the pipeline where the greatest variation between analysis software arises. We carry out further analyses on the three datasets evaluated in BMN, employing a common processing strategy across parts of the analysis workflow and then applying procedures from three software packages (AFNI, FSL, and SPM) across the remaining steps of the pipeline. We use quantitative methods to compare the resulting statistical maps and isolate the stages of the workflow where the three packages diverge most. Across all datasets, we find that variation between the packages' results is largely attributable to a handful of individual analysis stages, and that these sources of variability are heterogeneous across the datasets (e.g., the choice of first‐level signal model had the most impact for the balloon analog risk task dataset, while the first‐level noise model and the group‐level model were more influential for the false belief and antisaccade task datasets, respectively). We also observe areas of the analysis workflow where changing the software package causes minimal differences in the final results, finding that the group‐level results were largely unaffected by which software package was used to model the low‐frequency fMRI drifts.
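As an illustration of the kind of quantitative map comparison described above, the sketch below computes two common similarity measures between a pair of group-level statistic maps: the Pearson correlation of voxel values and the Dice coefficient of their suprathreshold masks. This is a minimal, hypothetical example using NumPy on synthetic arrays; the function name, the fixed z-threshold, and the toy data are assumptions for illustration, not the exact procedure used in the study.

```python
import numpy as np

def compare_stat_maps(map_a, map_b, z_thresh=1.96):
    """Compare two statistic maps (hypothetical helper).

    Returns the Pearson correlation of voxel values and the Dice
    coefficient of the two suprathreshold (activation) masks.
    """
    a, b = map_a.ravel(), map_b.ravel()
    corr = float(np.corrcoef(a, b)[0, 1])
    mask_a, mask_b = a > z_thresh, b > z_thresh
    denom = mask_a.sum() + mask_b.sum()
    # Dice = 2|A∩B| / (|A|+|B|); define as 1.0 when both masks are empty.
    dice = 2.0 * np.logical_and(mask_a, mask_b).sum() / denom if denom else 1.0
    return corr, dice

# Toy example: a second map that is a noisy copy of the first,
# standing in for the same analysis run in a different package.
rng = np.random.default_rng(0)
map_pkg1 = rng.normal(size=(8, 8, 8))
map_pkg2 = map_pkg1 + 0.1 * rng.normal(size=(8, 8, 8))
corr, dice = compare_stat_maps(map_pkg1, map_pkg2)
```

In practice the maps would be loaded from NIfTI files (e.g., with nibabel) and masked to the brain before comparison; the thresholded-mask overlap is sensitive to the chosen threshold, which is why unthresholded correlation is often reported alongside Dice.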