Proliferation markers and profiles have been recommended for guiding the choice of systemic treatments for breast cancer. However, the best molecular marker or test to use has not yet been identified. We did this study to identify factors that drive proliferation and its associated features in breast cancer and to assess their association with clinical outcomes and response to chemotherapy. We applied an artificial neural network-based integrative data mining approach to data from three cohorts of patients with breast cancer: the Nottingham discovery cohort (n=171), the Uppsala cohort (n=249), and the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) cohort (n=1980). We then identified the genes with the greatest effect on other genes in the resulting interactome map. Sperm-associated antigen 5 (SPAG5) featured prominently in our interactome map of proliferation, and we chose to take it forward in our analysis on the basis of its fundamental role in the function and dynamic regulation of mitotic spindles, mitotic progression, and chromosome segregation fidelity.
We investigated the clinicopathological relevance of SPAG5 gene copy number aberrations, mRNA transcript expression, and protein expression. We analysed the associations of each with breast cancer-specific survival, disease-free survival, distant relapse-free survival, pathological complete response, and residual cancer burden in the Nottingham discovery cohort, the Uppsala cohort, the METABRIC cohort, a pooled untreated lymph node-negative cohort (n=684), a multicentre combined cohort (n=5439), the Nottingham historical early stage breast cancer cohort (Nottingham-HES; n=1650), the Nottingham early stage oestrogen receptor-negative breast cancer adjuvant chemotherapy cohort (Nottingham-oestrogen receptor-negative-ACT; n=697), the Nottingham anthracycline neoadjuvant chemotherapy cohort (Nottingham-NeoACT; n=200), the MD Anderson taxane plus anthracycline-based neoadjuvant chemotherapy cohort (MD Anderson-NeoACT; n=508), and the multicentre phase 2 neoadjuvant clinical trial cohort (phase 2 NeoACT; NCT00455533; n=253). In the METABRIC cohort, we detected SPAG5 gene gain or amplification at the Ch17q11.2 locus in 206 (10%) of 1980 patients overall, 46 (19%) of 237 patients with a PAM50-HER2 phenotype, and 87 (18%) of 488 patients with a PAM50-LumB phenotype. Copy number aberration leading to SPAG5 gain or amplification, high SPAG5 transcript concentrations, and high SPAG5 protein concentrations were associated with shorter overall breast cancer-specific survival (METABRIC cohort [copy number aberration]: hazard ratio [HR] 1·50, 95% CI 1·18-1·92, p=0·00010; METABRIC cohort [transcript]: 1·68, 1·40-2·01, p<0·0001; and Nottingham-HES-breast cancer cohort [protein]: 1·68, 1·32-2·12, p<0·0001).
In multivariable analysis, high SPAG5 transcript and SPAG5 protein expression were associated with reduced breast cancer-specific survival at 10 years compared with lower expression (Uppsala: HR 1·62, 95% CI 1·03-2·53, p=0·036; METABRIC: 1·27, 1·02-1·58, p=0·034; untreated lymph node-negative cohort: 2·34, 1·24-4·42, p=0·0090; and Nottingham-HES: 1·73, 1·23-2·46, p=0·0020). In patients with oestrogen receptor-negative breast cancer and high SPAG5 protein expression, anthracycline-based adjuvant chemotherapy increased overall breast cancer-specific survival compared with no chemotherapy (Nottingham-oestrogen receptor-negative-ACT cohort: HR 0·37, 95% CI 0·20-0·60, p=0·0010). Multivariable analysis showed high SPAG5 transcript concentrations to be independently associated with longer distant relapse-free survival after taxane plus anthracycline neoadjuvant chemotherapy (MD Anderson-NeoACT: HR 0·68, 95% CI 0·48-0·97, p=0·031). In multivariable analysis, both high SPAG5 transcript and high SPAG5 protein concentrations were independent predictors of a higher proportion of patients achieving a pathological complete response after combination cytotoxic chemotherapy (MD Anderson-NeoACT: odds ratio [OR] 1·71, 95% CI 1·07-2·74, p=0·024; Nottingham-NeoACT: 8·75, 2·42-31·62, p=0·0010). SPAG5 is a novel amplified gene on Ch17q11.2 in breast cancer. The transcript and protein products of SPAG5 are independent prognostic and predictive biomarkers that might have clinical utility for identifying sensitivity to combination cytotoxic chemotherapy, especially in oestrogen receptor-negative breast cancer. Funding: Nottingham Hospitals Charity and the John and Lucille van Geest Foundation.