Abstract
The analysis of enormous datasets with missing data entries is a standard task in biological and medical data processing. Large-scale, multi-institution clinical studies are typical examples of such datasets. Such datasets make it possible to search for multi-parametric relations, since in so much data one is likely to find a sufficient number of subjects with the required parameter combinations. In particular, finding combinatorial biomarkers for a given condition also requires a very large dataset. For fast and automatic discovery of multi-parametric relations, association-rule finding tools have been used in the data-mining community for more than two decades. Here we present the SCARF webserver for generalized association rule mining. Association rules are of the form a AND b AND … AND x → y, meaning that the presence of properties a AND b AND … AND x implies property y. Our algorithm finds generalized association rules, since it also finds logical disjunctions (i.e., ORs) on the left-hand side, allowing the discovery of more complex rules in the database in a more compressed form. This feature also helps reduce the typically very large result tables of such studies, since a single rule with ORs on its left-hand side can subsume dozens of classical rules. The capabilities of the SCARF algorithm were demonstrated in mining the Alzheimer’s database of the Coalition Against Major Diseases (CAMD) in our recent publication (Archives of Gerontology and Geriatrics Vol. 73, pp. 300–307, 2017). Here we describe the webserver implementation of the algorithm.
Highlights
Introduction and motivation
An enormous amount of data is generated every day in biological experiments and clinical investigations.
Association rules are automatically found patterns in large databases where, say, each human patient has a number of attributes or parameter values. The rules describe implication-like relations between these attributes, such as this one: (high cholesterol level) AND (high blood pressure) → (heart disease).
The computed generalized association rules, with conjunctions and disjunctions in their LHS, have two remarkable properties: (i) since any Boolean function can be represented as an AND of ORs of the variables and the negations of the variables, these generalized association rules are universal in describing Boolean functions, and (ii) short generalized association rules are capable of describing many non-generalized association rules in one formula, since, e.g., the LHS (a OR b) AND (c OR d) AND (e OR f) is equivalent to the OR of eight ternary conjunctions; this generalized LHS compresses the LHS of eight non-generalized rules.
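The compression claim in property (ii) can be checked mechanically. The following is a minimal sketch, not part of the SCARF implementation: the variable names a to f and the helper functions are hypothetical, chosen only to mirror the example LHS. It expands the AND-of-ORs form into classical conjunctions by picking one variable from each OR-clause, then compares both forms on every truth assignment.

```python
from itertools import product

# Hypothetical generalized LHS in AND-of-ORs form:
# (a OR b) AND (c OR d) AND (e OR f)
clauses = [("a", "b"), ("c", "d"), ("e", "f")]


def expand(clauses):
    """Pick one variable per OR-clause; each pick is one classical conjunction."""
    return [frozenset(choice) for choice in product(*clauses)]


def eval_generalized(clauses, assignment):
    """True if every OR-clause has at least one true variable."""
    return all(any(assignment[v] for v in clause) for clause in clauses)


def eval_expanded(conjunctions, assignment):
    """True if at least one classical conjunction is fully satisfied."""
    return any(all(assignment[v] for v in conj) for conj in conjunctions)


conjunctions = expand(clauses)
variables = sorted({v for clause in clauses for v in clause})

# Compare the two forms on all 2^6 truth assignments.
equivalent = all(
    eval_generalized(clauses, dict(zip(variables, values)))
    == eval_expanded(conjunctions, dict(zip(variables, values)))
    for values in product([False, True], repeat=len(variables))
)

print(len(conjunctions))  # number of classical LHSs covered by the one generalized LHS
print(equivalent)
```

With three two-variable clauses the expansion yields 2 × 2 × 2 = 8 ternary conjunctions, matching the count stated above.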
Summary
An enormous amount of data is generated every day in biological experiments and clinical investigations. Association rules describe implication-like relations between the attributes in such data, for example:

(high cholesterol level) AND (high blood pressure) → (heart disease)

These rules have a left-hand side (LHS), to the left of the → symbol, and a right-hand side (RHS), to the right of it. Rules are characterized by measures such as:
– Support: the number of data items (e.g., patients) for which both the LHS and the RHS are true.
– Confidence: the support divided by the LHS support, i.e., by the number of data items for which the LHS is true. In our example, it is the fraction of patients with high cholesterol AND high blood pressure who have heart disease.

In association rule mining, the association rules with pre-defined minimum support, confidence and lift values need to be found [6, 7]. We present the SCARF webserver that computes generalized association rules, where the LHS can contain disjunctions (i.e., ORs) in addition to the ANDs of classical association rules. The computed generalized association rules, with conjunctions and disjunctions in their LHS, have two remarkable properties: (i) since any Boolean function can be represented as an AND of ORs of the variables and the negations of the variables, these generalized association rules are universal in describing Boolean functions, and (ii) short generalized association rules are capable of describing many non-generalized association rules in one formula, since, e.g., the LHS (a OR b) AND (c OR d) AND (e OR f) is equivalent to the OR of eight ternary conjunctions; this generalized LHS compresses the LHS of eight non-generalized rules.
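To make the support and confidence definitions concrete, here is a minimal sketch on a made-up table of six patients. The records, the attribute names and the helper functions are assumptions for illustration only; this is not the SCARF algorithm, and it does not model missing data entries. The LHS is represented as a list of OR-clauses that are ANDed together, so a classical rule is simply the case where every clause has a single attribute.

```python
# Toy patient records (invented for illustration); each record is the set
# of attributes that hold for that patient.
patients = [
    {"high_cholesterol", "high_blood_pressure", "heart_disease"},
    {"high_cholesterol", "high_blood_pressure", "heart_disease"},
    {"high_cholesterol", "high_blood_pressure"},
    {"high_cholesterol"},
    {"high_blood_pressure", "heart_disease"},
    set(),
]


def lhs_holds(lhs, record):
    """lhs is a list of OR-clauses; all clauses must hold (AND of ORs)."""
    return all(any(attr in record for attr in clause) for clause in lhs)


def support_and_confidence(lhs, rhs, records):
    """Support: records where LHS and RHS hold. Confidence: support / LHS support."""
    lhs_support = sum(1 for r in records if lhs_holds(lhs, r))
    support = sum(1 for r in records if lhs_holds(lhs, r) and rhs in r)
    confidence = support / lhs_support if lhs_support else 0.0
    return support, confidence


# Classical rule: (high cholesterol) AND (high blood pressure) -> (heart disease)
classical_lhs = [("high_cholesterol",), ("high_blood_pressure",)]
support, confidence = support_and_confidence(classical_lhs, "heart_disease", patients)

# Generalized rule: (high cholesterol OR high blood pressure) -> (heart disease)
generalized_lhs = [("high_cholesterol", "high_blood_pressure")]
gen_support, gen_confidence = support_and_confidence(
    generalized_lhs, "heart_disease", patients
)

print(support, confidence)
print(gen_support, gen_confidence)
```

In this toy table the classical LHS holds for three patients, two of whom have heart disease, so the support is 2 and the confidence is 2/3. The OR-based LHS holds for five patients, three of whom have heart disease, giving support 3 and confidence 3/5.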