The Modified Rankin Scale (mRS) is the most commonly used functional outcome measure in stroke research, but its usefulness is limited by variable inter-rater reliability (IRR). Various interventions to improve the application of the mRS have been described. We aimed to compare the psychometric properties of different approaches to mRS assessment.
Multidisciplinary databases (MEDLINE, EMBASE, Health and Psychosocial Instruments [OVID], CINAHL, PsycINFO [EBSCO]) were searched for studies in adult humans with stroke that described psychometric properties of the mRS. Two researchers independently screened 20% of titles and abstracts, reviewed all full-text studies, extracted data, and conducted a risk of bias (ROB) analysis. The primary outcomes for random-effects meta-analysis were IRR measured by kappa (K) and weighted kappa (KW). Validity and inter-modality reliability measures (Spearman's rho, KW) were also summarised.
From 897 titles, 46 studies were eligible, covering twelve different approaches to the mRS and 8608 participants. ROB was high in 14 (30.4%) studies. Overall reliability was substantial (n = 29 studies, K = 0.65, 95% CI: 0.58-0.71), but IRR was higher for novel approaches to the mRS, for example the Rankin Focussed Assessment (n = 2 studies, K = 0.94, 95% CI: 0.90-0.98), than for the standard mRS (n = 13 studies, K = 0.55, 95% CI: 0.46-0.64). Reliability improved after mRS training was introduced (before training: K = 0.56, 95% CI: 0.44-0.67; after training: K = 0.69, 95% CI: 0.62-0.77). Validity ranged from poor to excellent, and overall concurrent validity of the novel scales was excellent (n = 6 studies, KW = 0.86, 95% CI: 0.75-0.97). Agreement between face-to-face and telephone administration was substantial (n = 5 studies, KW = 0.80, 95% CI: 0.74-0.87).
The mRS is a valid measure of function, but IRR remains an issue. These findings are limited by high ROB and possible publication bias.
Interventions to improve mRS reliability (training, structured interview, adjudication) appear beneficial, but no single intervention fully resolves reliability concerns.
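The two statistical steps named above can be illustrated with a short sketch: computing unweighted kappa (K) or weighted kappa (KW) from a rater-by-rater agreement table, and then pooling per-study kappas with a random-effects model. The abstract does not say which random-effects estimator was used, so the DerSimonian-Laird method here is an assumption, as is pooling kappa on its raw scale and recovering each study's standard error from its 95% CI. The function names (`cohen_kappa`, `se_from_ci`, `dersimonian_laird`) are hypothetical helpers for illustration, and the example inputs are invented, not data from the review.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cohen_kappa(table: Sequence[Sequence[int]], weights: Optional[str] = None) -> float:
    """Kappa for a k x k agreement table (rows: rater A, columns: rater B).

    weights=None gives unweighted kappa (K); "linear" or "quadratic" gives
    weighted kappa (KW), which gives partial credit to near-misses on an
    ordinal scale such as the mRS (grades 0-6).
    """
    if weights not in (None, "linear", "quadratic"):
        raise ValueError("weights must be None, 'linear' or 'quadratic'")
    k = len(table)
    n = sum(sum(row) for row in table)
    row_p = [sum(row) / n for row in table]
    col_p = [sum(table[i][j] for i in range(k)) / n for j in range(k)]

    def penalty(i: int, j: int) -> float:
        # Disagreement weight for ratings (i, j); always 0 on the diagonal.
        if weights is None:
            return 0.0 if i == j else 1.0
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    observed = sum(penalty(i, j) * table[i][j] / n for i in range(k) for j in range(k))
    expected = sum(penalty(i, j) * row_p[i] * col_p[j] for i in range(k) for j in range(k))
    return 1.0 - observed / expected


def se_from_ci(lower: float, upper: float, z: float = 1.959964) -> float:
    """Assumed approximation: standard error from a symmetric 95% CI."""
    return (upper - lower) / (2.0 * z)


def dersimonian_laird(
    estimates: List[float], std_errors: List[float], z: float = 1.959964
) -> Tuple[float, float, float, float]:
    """Random-effects pooled estimate, 95% CI bounds and tau^2 (DerSimonian-Laird)."""
    w = [1.0 / se ** 2 for se in std_errors]          # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, estimates)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, estimates))  # Cochran's Q
    df = len(estimates) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0   # between-study variance
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * y for wi, y in zip(w_re, estimates)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    return pooled, pooled - z * se_pooled, pooled + z * se_pooled, tau2


if __name__ == "__main__":
    # Hypothetical agreement table for two raters on a 3-level scale.
    table = [[10, 3, 0], [2, 12, 4], [0, 3, 11]]
    print("K  =", round(cohen_kappa(table), 3))
    print("KW =", round(cohen_kappa(table, "linear"), 3))

    # Hypothetical per-study kappas with 95% CIs, pooled under the assumed model.
    studies = [(0.50, 0.38, 0.62), (0.62, 0.51, 0.73), (0.45, 0.30, 0.60)]
    ks = [k for k, _, _ in studies]
    ses = [se_from_ci(lo, hi) for _, lo, hi in studies]
    print(dersimonian_laird(ks, ses))
```

Weighted kappa is usually preferred for an ordinal scale like the mRS because a one-grade disagreement is penalised less than a disagreement of several grades; for a 2 x 2 table the linear and quadratic weights reduce to unweighted kappa.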