Formal Concept Analysis (FCA) is a data analysis technique with applications in data mining, artificial intelligence, software engineering, and other fields. FCA algorithms are computationally expensive, and their recursion trees are highly irregular and dynamic. Several distributed FCA algorithms have been proposed to exploit parallelism within and across machines. However, none of these distributed approaches can recover from failures in the system. We propose RD-FCA, the first resilient distributed framework for FCA, which uses a novel load-balancing strategy, handles fail-stop failures, and provides at-least-once semantics for concept discovery. Its asynchronous snapshot mechanism with incremental updates reduces snapshot overhead and minimizes the recalculation of concepts during recovery. RD-FCA also supports the dynamic addition of workers to improve performance. Compared to MapReduce-based approaches, RD-FCA is an order of magnitude faster. We show through extensive evaluation that RD-FCA recovers efficiently from single, multiple, independent, and cascading failures.