This paper develops a statistical learning approach to identify potentially new high-temperature ferroelectric piezoelectric perovskite compounds. Unlike most computational studies on crystal chemistry, where the starting point is some form of electronic structure calculation, we use a data-driven approach to initiate our search. This is accomplished by identifying patterns of behaviour between discrete scalar descriptors associated with crystal and electronic structure and the reported Curie temperature (TC) of known compounds; extracting design rules that govern critical structure–property relationships; and discovering in a quantitative fashion the exact role of these materials descriptors. Our approach applies linear manifold methods for data dimensionality reduction to discover the dominant descriptors governing structure–property correlations (the ‘genes’) and Shannon entropy metrics coupled to recursive partitioning methods to quantitatively assess the specific combination of descriptors that govern the link between crystal chemistry and TC (their ‘sequencing’). We use this information to develop predictive models that can suggest new structures/chemistries and/or properties. In this manner, BiTmO3–PbTiO3 and BiLuO3–PbTiO3 are predicted to have a TC of 730°C and 705°C, respectively. A quantitative structure–property relationship model, similar to those used in biology and drug discovery, not only predicts our new chemistries but also validates published reports.
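To make the idea of Shannon entropy coupled to recursive partitioning concrete, the following is a minimal sketch of a single partitioning step: each descriptor is scored by the largest information gain (entropy reduction) that a single threshold split can achieve, and the descriptors are then ranked by that score. It is an illustration under stated assumptions, not the procedure used in this work. The descriptor names (`descriptor_a`, `descriptor_b`), their values, the "low"/"high" TC classes, and the helper functions are all hypothetical and synthetic.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting the samples at value <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * shannon_entropy(left) \
        + (len(right) / n) * shannon_entropy(right)
    return shannon_entropy(labels) - weighted


def best_split(values, labels):
    """Return (threshold, gain) for the midpoint split with the highest gain."""
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return max(((t, information_gain(values, labels, t)) for t in candidates),
               key=lambda pair: pair[1])


# Synthetic toy data in arbitrary units (hypothetical, NOT taken from the paper).
descriptors = {
    "descriptor_a": [1.0, 2.0, 3.0, 7.0, 8.0, 9.0],
    "descriptor_b": [5.0, 1.0, 9.0, 2.0, 8.0, 4.0],
}
tc_class = ["low", "low", "low", "high", "high", "high"]

# Rank descriptors by the information gain of their best single split.
ranking = sorted(
    ((name, *best_split(vals, tc_class)) for name, vals in descriptors.items()),
    key=lambda row: row[2],
    reverse=True,
)

for name, threshold, gain in ranking:
    print(f"{name}: split at {threshold:.2f}, gain = {gain:.3f} bits")
```

In a full recursive partitioning scheme, the same scoring would be applied again within each resulting subset, so that the sequence of chosen descriptors and thresholds forms a tree of rules linking descriptor combinations to the target class.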