The inter-observer reliability of three scoring systems (Excitement Score, Face Mask Acceptance Score, Steward Score) was assessed by pairs of independent evaluators who observed 21 children during emergence from general anaesthesia. Each scoring system was analysed using the intraclass correlation coefficient (ICC), which gave values of 0.997 and 0.988 for the Excitement and Face Mask Acceptance Scores, respectively. The corresponding values for the Steward Score were 0.956, 0.924 and 0.295 for its 'consciousness', 'airway' and 'movement' subdivisions, respectively. Ambiguous wording and imprecise instructions may explain the lower correlations for the Steward Score. The reliability of a scoring system cannot be assumed, and systems of undocumented reliability call into question the results of the studies that employ them.
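The abstract does not state which form of the ICC was used. The sketch below assumes the two-way random-effects, absolute-agreement, single-rater form (Shrout and Fleiss ICC(2,1)), one common choice for pairs of observers scoring the same subjects. The function name `icc_2_1` and the toy ratings are illustrative only and are not the study's data.

```python
from statistics import mean


def icc_2_1(ratings):
    """Two-way random-effects, absolute-agreement, single-rater ICC(2,1).

    Assumption: this ICC form is chosen for illustration; the source does not
    specify which ICC variant was applied.

    ratings: one row per subject (e.g. a child), each row holding the score
    given by every rater (e.g. a pair of independent observers).
    """
    n = len(ratings)      # number of subjects
    k = len(ratings[0])   # number of raters per subject
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]                        # per subject
    col_means = [mean(row[j] for row in ratings) for j in range(k)]   # per rater

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    # Mean squares
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Shrout & Fleiss ICC(2,1): systematic rater differences count as disagreement
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical scores from two observers for three children
print(icc_2_1([[1, 1], [2, 2], [3, 3]]))  # identical scores give 1.0
print(icc_2_1([[1, 2], [2, 3], [3, 4]]))  # a constant one-point offset lowers agreement
```

Because this form measures absolute agreement, an observer who consistently scores one point higher than the other still reduces the coefficient, which is the behaviour one would want when asking whether two evaluators can be used interchangeably.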