Epistasis detection is a fundamental application in bioinformatics and biomedicine, providing important insights into the relationship between the human genome and the occurrence of certain diseases. Exhaustive epistasis detection approaches yield an accurate and deterministic solution, at the cost of high computational complexity, especially when targeting high-order epistasis. While recent works employ vectorization and cache-blocking techniques to alleviate this burden, these solutions are now limited by the maximum performance of the functional units of computing systems. Thus, further improving the performance of epistasis detection requires reducing the number of memory transfers and computations it performs. To tackle this issue, this work proposes SpEpistasis, which performs three-way epistasis detection by exploiting the sparsity of the dataset: by storing only its non-zero elements, it reduces the number of operations needed for epistasis detection. To achieve this goal, a new hybrid format for representing the input dataset is proposed, which stores a subset of the data in the compressed sparse row (CSR) format. Moreover, new sparse-aware algorithmic approaches are proposed to leverage both the hybrid format and the vector capabilities of current CPUs from Intel, AMD, and ARM. The experimental results show that SpEpistasis provides speedups of up to 3.7×, and average speedups of around 1.8× and 1.33×, when compared with other state-of-the-art works.
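To make the sparse idea concrete, the following is a minimal sketch of how a compressed sparse row (CSR) layout can cut the work of a three-way count. It is not the SpEpistasis hybrid format or algorithm, whose details are not given here. It assumes genotypes are encoded as 0, 1, or 2, that only one rare genotype value is stored sparsely, and that rows are SNPs and columns are samples. The function names (`to_csr`, `row_samples`, `count_triple`) and the toy data are hypothetical.

```python
from typing import List, Tuple


def to_csr(rows: List[List[int]], target: int) -> Tuple[List[int], List[int]]:
    """For each SNP (row), keep only the sample indices whose genotype equals `target`.

    Returns the usual CSR pair: `indptr` (row offsets) and `indices` (column ids).
    """
    indptr = [0]
    indices: List[int] = []
    for row in rows:
        indices.extend(j for j, g in enumerate(row) if g == target)
        indptr.append(len(indices))
    return indptr, indices


def row_samples(indptr: List[int], indices: List[int], snp: int) -> List[int]:
    """Sample indices stored for one SNP."""
    return indices[indptr[snp]:indptr[snp + 1]]


def count_triple(indptr: List[int], indices: List[int], a: int, b: int, c: int) -> int:
    """Number of samples in which SNPs a, b and c all carry the stored genotype.

    Only the stored (non-zero) entries are visited, so the cost depends on how
    many samples carry the genotype, not on the total number of samples.
    """
    common = set(row_samples(indptr, indices, a))
    common &= set(row_samples(indptr, indices, b))
    common &= set(row_samples(indptr, indices, c))
    return len(common)


# Toy dataset (assumed for illustration): 3 SNPs x 6 samples.
genotypes = [
    [0, 2, 0, 2, 1, 2],
    [2, 2, 0, 2, 0, 0],
    [0, 2, 1, 2, 0, 2],
]
indptr, indices = to_csr(genotypes, target=2)
triple_count = count_triple(indptr, indices, 0, 1, 2)
```

In this toy case the three SNPs share genotype 2 only in samples 1 and 3, so `triple_count` is 2. A production implementation would more likely intersect sorted index arrays with vector instructions than build Python sets; the sets are used here only to keep the sketch short.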