When fitting three-parameter flood frequency models to annual maximum (AM) flood series, lack of fit can be mitigated by censoring potentially influential low flows (PILFs). An alternative and less-studied approach is to apply mixture probability models with four or more parameters. These trade greater flexibility in fitting AM series against the need to handle degeneracy, which arises when there is insufficient information to identify the mixture components. However, degeneracy and the lack of a robust inference framework present a significant barrier to the adoption of such models. This study investigated the potential of the most parsimonious mixture model, the four-parameter Two-Component Extreme Value (TCEV) model. A Bayesian framework using Markov chain Monte Carlo sampling was developed to robustly characterize parameter uncertainty, even in the presence of degeneracy. Two new posterior diagnostics, based on the strength of the TCEV components, were developed to help identify degeneracy. With this inference framework in place, the study evaluated the potential of TCEV in a case study of 31 catchments in eastern Australia with records longer than 70 years. The evaluation used records ranging from short to long to compare the goodness-of-fit and extrapolation uncertainty of TCEV and the Log Pearson III (LP3) model. The LP3 approach (LP3-PILF) censors PILFs following the procedure described in Australian Rainfall and Runoff. With respect to goodness-of-fit, we found that in most cases TCEV fitted AM flood peaks well without the need to censor or stratify the data, and overall no clear difference emerged between TCEV and LP3-PILF. However, contrary to expectation, TCEV produced confidence intervals for high-flow quantiles that were consistently narrower than those of LP3-PILF, even in the presence of degeneracy.
While more case studies in different regions are needed to confirm the potential of TCEV, this study is a reminder that goodness-of-fit is a necessary but not sufficient criterion for selecting the probability model that best represents flood frequency.
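For readers unfamiliar with TCEV, the sketch below shows the four-parameter model in the form in which it is commonly written: the annual non-exceedance probability is F(x) = exp(-Λ1·exp(-x/θ1) - Λ2·exp(-x/θ2)), where (Λ1, θ1) describe the basic flood component and (Λ2, θ2) the outlying component, usually with θ2 > θ1. It is an illustrative sketch under stated assumptions, not the study's implementation: the function names are hypothetical, the lower bound at zero flow that some formulations impose is ignored, and quantiles are found by plain bisection.

```python
import math


def _rate(x, lam, theta):
    """Expected annual number of exceedances of x for one component: lam * exp(-x / theta)."""
    if lam == 0.0:
        return 0.0
    arg = -x / theta
    if arg > 700.0:  # math.exp would overflow; the rate is effectively infinite
        return math.inf
    return lam * math.exp(arg)


def tcev_cdf(x, lam1, theta1, lam2, theta2):
    """Annual non-exceedance probability F(x) of the (unbounded) TCEV distribution."""
    return math.exp(-(_rate(x, lam1, theta1) + _rate(x, lam2, theta2)))


def tcev_loglik(data, lam1, theta1, lam2, theta2):
    """Log-likelihood of a series of annual maxima; -inf outside the parameter space."""
    if lam1 <= 0.0 or theta1 <= 0.0 or lam2 < 0.0 or theta2 <= 0.0:
        return -math.inf
    total = 0.0
    for x in data:
        # f(x) = F(x) * (r1/theta1 + r2/theta2) with r_i = lam_i * exp(-x/theta_i);
        # the second factor is evaluated in log space (log-sum-exp) to avoid underflow.
        log_terms = [math.log(lam1 / theta1) - x / theta1]
        if lam2 > 0.0:
            log_terms.append(math.log(lam2 / theta2) - x / theta2)
        m = max(log_terms)
        log_g = m + math.log(sum(math.exp(t - m) for t in log_terms))
        total += -(_rate(x, lam1, theta1) + _rate(x, lam2, theta2)) + log_g
    return total


def tcev_quantile(p, lam1, theta1, lam2, theta2, tol=1e-10):
    """Flow x with F(x) = p (annual exceedance probability 1 - p), found by bisection."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")

    def cdf(x):
        return tcev_cdf(x, lam1, theta1, lam2, theta2)

    # Expand a bracket [lo, hi] with F(lo) < p <= F(hi); F is increasing in x.
    step = max(theta1, theta2)
    lo = hi = 0.0
    while cdf(lo) >= p:
        lo -= step
        step *= 2.0
    step = max(theta1, theta2)
    while cdf(hi) < p:
        hi += step
        step *= 2.0

    # Bisection keeps the invariant F(lo) < p <= F(hi).
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

Setting Λ2 = 0, or θ2 = θ1 (which merges the two components into a single Gumbel component with rate Λ1 + Λ2), collapses TCEV to a Gumbel distribution. These limits illustrate one way the mixture components can become unidentifiable, i.e. the kind of degeneracy discussed above. A log-likelihood of this form could be combined with priors inside an MCMC sampler, but the study's actual priors, sampler, and posterior diagnostics are not reproduced here.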