We aimed to develop a formal Farsi (Persian) translation of the Appraisal of Guidelines for Research and Evaluation (AGREE) clinical guideline appraisal instrument, and we examined whether group discussion improves the reliability of scores. We followed a multi-step translation process that included independent translations of the instrument and extensive assessment of face validity and fluency. We used both versions of the instrument to appraise 11 guidelines from three specialities. After the first appraisal, the raters discussed each guideline in groups and had the opportunity to revise their scores individually. In total, 96 appraisals were conducted. Intraclass correlation coefficients, ICC(1,1), were calculated for the domain scores obtained with the two versions at each time point. We observed no statistically significant differences between the mean values obtained from the English and the translated versions of AGREE, and the scores at the two time points. The average domain scores, as well as their reliability, rose significantly after discussion. The Farsi version of the AGREE instrument yields scores comparable to those of the original version, despite a lower reliability. Revision of scores after group discussion leads to higher reliability, probably by helping the raters recognize what they might have overlooked during the short time available for assessment.
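The abstract reports ICC(1,1) values without stating the formula. As a minimal sketch, assuming the standard one-way random-effects, single-rater definition, ICC(1,1) = (MSB - MSW) / (MSB + (k - 1) MSW), where MSB and MSW are the between-guideline and within-guideline mean squares and k is the number of ratings per guideline, the calculation could look like the following. The function name `icc_1_1` and the example scores are hypothetical and are not taken from the study.

```python
from typing import Sequence


def icc_1_1(scores: Sequence[Sequence[float]]) -> float:
    """One-way random-effects, single-rater ICC(1,1).

    scores[i][j] is the j-th rating given to target i (e.g. one guideline's
    domain score). Every target must have the same number of ratings k >= 2.
    The result is undefined (division by zero) if all scores are identical.
    """
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need >= 2 targets, each with the same k >= 2 ratings")

    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]

    # Mean squares from a one-way ANOVA with targets as the grouping factor.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(scores, row_means) for x in row
    ) / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


if __name__ == "__main__":
    # Hypothetical domain scores (percent) for four guidelines, three raters each.
    example = [
        [55.0, 60.0, 50.0],
        [80.0, 75.0, 85.0],
        [30.0, 45.0, 40.0],
        [65.0, 70.0, 60.0],
    ]
    print(f"ICC(1,1) = {icc_1_1(example):.3f}")
```

In this form the coefficient equals 1 when all raters of a guideline agree exactly, and it can be negative when raters disagree more within guidelines than the guidelines differ from one another.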