Introduction/Background
Inguinal hernia is one of the most common conditions seen in pediatric practice, affecting 3-5% of newborns. An incarcerated or strangulated hernia can result in intestinal necrosis and testicular or ovarian atrophy, while improper repair technique can lead to recurrence, testicular torsion, or injury to spermatic cord structures. Reduction of hernia contents back into the abdomen is a non-operative procedure that involves a series of simple but specific maneuvers that can prevent incarceration until surgery can be performed.1 No simulation models currently exist to teach either an open approach to pediatric inguinal hernia repair or the maneuvers of inguinal hernia reduction. Because inguinal hernias are common, every physician should be adept at performing these procedures. The objective of this project is to develop two simulators and evaluate validity evidence for each: a surgical simulator for pediatric inguinal hernia repair and a clinical simulator for pediatric inguinal hernia reduction.

Methods
Surgical Simulator: The surgical simulator consists of a plastic base with a pigmented silicone rubber covering. The spermatic cord model is secured inside the base and consists of a balloon representing the hernia sac, two elastic strings representing the spermatic vessels and vas deferens, and plastic wrap and silicone representing the surrounding tissues. Fourteen novice raters and seventeen expert raters (N=31) from five institutions in France (Nice, Paris, Strasbourg, Limoges, and Angers) performed the inguinal hernia repair on the simulator. Participants then completed a self-report rating scale ranging from 1 (Don’t know) to 5 (Very realistic, no changes needed). Validity evidence relevant to test content and internal structure was evaluated using many-facet Rasch analysis.
Analysis indicated that Attendings gave higher ratings (Observed Average (OA)=4.4/5.0) than Interns (OA=4.0) and Residents (OA=3.8), p=.001, and that ratings differed overall across institutions, with the highest ratings given by participants from Limoges (OA=4.56), followed by Nice (OA=4.25), Angers (OA=4.01), Strasbourg (OA=3.89), and Paris (OA=3.64), p=.01. Ratings across the seven domains (Realism of materials, Realism of experience, Value, Internal anatomy, Ability to perform tasks, Relevance, Physical attributes) ranged from 3.91 to 4.41. The observed global rating was 2.37/4.0.

Clinical Simulator: The clinical simulator consists of a modified baby doll with a pigmented silicone rubber covering. The hernia model inside the doll is a sponge inside a balloon, which can be manipulated from one end of the balloon to the other. Forty-eight novice raters, twenty expert raters, and ten non-identified raters (N=86) from the five institutions in France performed the inguinal hernia reduction on the simulator. Participants then completed a similar self-report rating scale ranging from 1 to 5. Validity evidence relevant to test content and internal structure was evaluated using many-facet Rasch analysis. Analysis indicated no statistically significant differences among Attending-specialists (OA=3.9/5.0), Attending-generalists (OA=3.9), and Residents (OA=3.8), p=.97, but ratings differed overall across institutions, with the highest ratings given by participants from Limoges and Angers (OA=3.9), followed by Strasbourg and Nice (OA=3.8), and Paris (OA=3.7), p=.001. Ratings across the six domains (Ability to perform tasks, Realism of experience, Realism of materials, Physical attributes, Relevance, Value) ranged from 3.5 to 4.2. The observed global rating was 2.4/4.0.

Conclusion
For the surgical simulator, participants agreed that the model has a great deal of value as a training and testing tool and is highly relevant to their clinical practice. For the clinical simulator, participants agreed that the model has some value as a training and testing tool and is relevant to their clinical practice. For both models, the observed global ratings indicate some need for improvement prior to implementation. Ratings of individual items within each domain will help us identify areas for improvement.