Background
Southern Illinois University School of Medicine (SIUSOM) collects large amounts of data every day. SIUSOM and similar healthcare systems are continually looking for better ways to use such data to understand and address population-level problems. The purpose of this study is to analyze the administrative dataset for pediatric patients served by SIUSOM to uncover patterns that correlate specific demographic information with diagnoses of pediatric diseases. The study uses a cross-sectional database of medical billing information for all pediatric patients served by SIUSOM between June 2013 and December 2016. The dataset consists of about 980.9K clinical visits for 65.4K unique patients and includes patient demographic identifiers such as sex, date of birth, race, anonymous zip code, and primary and secondary insurance plan, as well as the related pediatric diagnosis codes. The goal is to find previously unknown correlations in this database.
Method
We proposed a two-step methodology to derive previously unknown correlations in the SIUSOM administrative database. First, class association rule mining, a well-established data mining method, was used to generate hypotheses and derive associations of the form D → M, where D is the diagnosis code of a pediatric disease and M is a patient demographic identifier (age, sex, anonymous zip code, insurance plan, or race). The resulting associations were pruned and filtered using measures such as lift, odds ratio, relative risk, and confidence. The final associations were selected by a pediatric doctor based on their clinical significance. Second, each association rule in the final set was further validated using multiple logistic regression, which also yielded adjusted odds ratios.
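
The filtering measures named above can be illustrated for a single rule D → M from a 2×2 contingency table. The following is a minimal sketch, not the study's implementation: the function name `rule_measures`, the cell labels `a`, `b`, `c`, `d`, and the example counts are hypothetical, the relative risk is assumed to compare the risk of diagnosis D between patients with and without demographic M, and the confidence interval uses the standard Woolf (log) approximation. The odds ratio computed here is crude (unadjusted), unlike the adjusted odds ratios obtained from logistic regression.

```python
import math


def rule_measures(a, b, c, d, z=1.96):
    """Interestingness measures for a rule D -> M from a 2x2 table.

    Hypothetical cell layout (counts of patients):
        a: has diagnosis D and demographic M
        b: has diagnosis D, lacks demographic M
        c: lacks diagnosis D, has demographic M
        d: lacks both
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for these formulas")
    n = a + b + c + d

    support = a / n                      # P(D and M)
    confidence = a / (a + b)             # P(M | D)
    lift = confidence / ((a + c) / n)    # P(M | D) / P(M)

    # Assumption: risk of D among patients with M vs. without M.
    relative_risk = (a / (a + c)) / (b / (b + d))

    # Crude odds ratio with a Woolf (log-scale) confidence interval.
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    ci = (math.exp(log_or - z * se_log_or), math.exp(log_or + z * se_log_or))

    return {
        "support": support,
        "confidence": confidence,
        "lift": lift,
        "relative_risk": relative_risk,
        "odds_ratio": odds_ratio,
        "odds_ratio_ci": ci,
    }


if __name__ == "__main__":
    # Made-up counts for illustration only; not taken from the SIUSOM data.
    for name, value in rule_measures(30, 70, 120, 780).items():
        print(name, value)
```

A rule would typically be kept only if, for example, its lift exceeds 1 and the odds ratio interval excludes 1; any such thresholds are a choice for the analyst and are not specified here.
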
Results
Several associations were found correlating specific patients’ residential zip codes with the diagnosis codes for viral hepatitis carrier, exposure to communicable diseases, screening for mental and developmental disorders in childhood, history of allergy to medications, disturbance of emotions specific to childhood, and acute sinusitis. In addition, the results show that, in the SIUSOM pediatric population, African American patients are more likely than White patients to be screened for mental and developmental disorders (odds ratio (OR): 3.56, 95% confidence interval (CI): [3.29, 3.85]).
Conclusion
Class association rule mining is an effective method for detecting signals in a large patient administrative database and for generating hypotheses that correlate patients’ demographics with diagnoses of pediatric diseases. Post-processing of the hypotheses generated by this method is necessary to prune spurious associations and select a set of clinically relevant hypotheses.