Abstract
How can we find patterns and anomalies in a tensor, i.e., a multi-dimensional array, in an efficient and directly interpretable way? How can we do this in an online environment, where a new tensor arrives at each time step? Finding patterns and anomalies in multi-dimensional data has many important applications, including building safety monitoring, health monitoring, cyber security, terrorist detection, and fake user detection in social networks. Standard tensor decomposition results are not directly interpretable, and the few methods that aim to increase interpretability need to be made faster, more memory-efficient, and more accurate for large and quickly generated data in the online environment. We propose two versions of a fast, accurate, and directly interpretable tensor decomposition method, called CTD, that is based on an efficient sampling method. The first is the static version of CTD, i.e., CTD-S, which provably guarantees up to 11× higher accuracy than that of the state-of-the-art method. CTD-S is also made up to 2.3× faster and up to 24× more memory-efficient than the state-of-the-art method by removing redundancy. The second is the dynamic version of CTD, i.e., CTD-D, which is the first interpretable dynamic tensor decomposition method ever proposed. It is also made up to 82× faster than the already fast CTD-S by exploiting factors at the previous time step and by reordering operations. With CTD, we demonstrate how the results can be effectively interpreted in online distributed denial of service (DDoS) attack detection and online troll detection.
Highlights
How can we find patterns and anomalies in a multi-dimensional array in an efficient and directly interpretable way? How can we do this in an online environment, where new data arrive at each time step? Many real-world data are multi-dimensional and can be modeled as sparse tensors.
We propose CTD, a fast, accurate, and directly interpretable tensor decomposition method
Q1: What is the performance of our static method CTD-S compared to the competing method TENSOR-CUR? Q2: How does the performance of CTD-S and TENSOR-CUR change with regard to the sample size parameter? Q3: What is the performance of our dynamic method CTD-D compared to the static method CTD-S? Q4: What are the results of applying CTD-D to online distributed denial of service (DDoS) attack detection and online troll detection?
Summary
How can we find patterns and anomalies in a multi-dimensional array in an efficient and directly interpretable way? How can we do this in an online environment, where new data arrive at each time step? Many real-world data are multi-dimensional and can be modeled as sparse tensors. TENSOR-CUR [16], the state-of-the-art sampling-based static tensor decomposition method, has many redundant fibers, including duplicates, in its factors. This redundancy causes higher memory usage and longer running time. In Tucker-type sampling-based tensor decomposition (e.g., ApproxTensorSVD [14] and FBTD (fiber-based tensor decomposition) [15]), factor matrices for all modes are either sampled or generated; the overhead of generating a factor matrix for each mode makes these methods too slow for real-time analysis. CTD-S is a mode-α LR tensor decomposition method and is interpretable since R consists of independent fibers sampled from X.
7: Compute the residual: $\vec{res} \leftarrow X_{(\alpha)}(:, i'_k) - R U R^{T} X_{(\alpha)}(:, i'_k)$
8: if $\|\vec{res}\|$ … $\|X_{(\alpha)}(:, i'_k)\|$
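The residual test in steps 7 and 8 can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that U acts as the inverse of R^T R, so that R U R^T projects onto the span of the fibers already kept, and it computes the same projection residual through an orthonormal basis (modified Gram-Schmidt) instead of forming U explicitly. The function name `select_independent_fibers`, the tolerance `eps`, and the direction of the comparison (keep a fiber when its relative residual exceeds `eps`) are hypothetical, since the condition in step 8 is truncated in the text.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def select_independent_fibers(fibers, eps=1e-6):
    """Return indices of fibers that are not (nearly) in the span of earlier kept ones.

    Assumption: a fiber is kept when ||res|| > eps * ||x||, where res is the
    residual of x after projection onto the span of the kept fibers.
    """
    basis = []  # orthonormal basis of span(kept fibers)
    kept = []   # indices of kept fibers (these would form the columns of R)
    for idx, x in enumerate(fibers):
        norm_x = math.sqrt(dot(x, x))
        if norm_x == 0.0:
            continue  # an all-zero fiber carries no information
        # res = x - P x, with P the projector onto span(kept fibers)
        res = list(x)
        for q in basis:
            c = dot(q, res)
            res = [r - c * qi for r, qi in zip(res, q)]
        norm_res = math.sqrt(dot(res, res))
        if norm_res > eps * norm_x:
            basis.append([r / norm_res for r in res])
            kept.append(idx)
    return kept


# Toy sample of fibers: index 1 duplicates index 0, index 3 is a linear combination.
fibers = [
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [1.0, 2.0, 0.0],
    [0.0, 0.0, 3.0],
]
kept = select_independent_fibers(fibers)
print(kept)  # [0, 2, 4]
```

In this toy run the duplicate and the linearly dependent fiber are discarded, which is the kind of redundancy removal the summary credits for the lower memory usage and running time of CTD-S relative to TENSOR-CUR.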