Abstract

Although mobile application ecosystems have grown tremendously in recent years, retrieving the content of mobile applications, which is key to mobile content search engines, still faces major challenges. Compared with web content retrieval, capturing content in mobile applications is much more difficult because of the diversity of applications and the lack of Uniform Resource Locator indices. To address these issues, we propose and implement a user interaction-driven mobile content retrieval (UMCR) system, which is the first mobile content crawler in the current literature. UMCR is a distributed system containing many measurement nodes, each of which combines user interaction path traversing (UIPT) with Deep Packet Inspection (DPI) to obtain mobile content. UIPT determines the user interaction events in various applications to capture static content such as text and images; it adopts a traversal depth termination scheme and an optional cut-off component to balance content coverage against traversing efficiency. Meanwhile, the DPI-based analysis extracts videos and uncovers infrastructural information and performance metrics. In addition, a distributed traversal scheduling method is designed for UIPT tasks to improve throughput and scalability in large-scale content retrieval. Experiments on retrieving the content of 64 real mobile applications demonstrate that UMCR can handle diverse mobile applications efficiently. The scheduler improves throughput by 3 times compared with the legacy arbitrary task assignment strategy.
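
The balance between coverage and efficiency in UIPT can be illustrated with a minimal sketch. It is not the UMCR implementation: it assumes the screens of an application can be modeled as a graph of states, and the names `traverse_ui`, `APP_GRAPH`, `max_depth`, and `cutoff` are hypothetical, introduced only for illustration. The walk stops expanding states at a fixed depth and skips any state rejected by an optional cut-off predicate.

```python
from collections import deque

# Hypothetical UI graph: each screen maps to the screens reachable
# from it through one user interaction (tap, swipe, ...).
APP_GRAPH = {
    "home": ["news", "settings"],
    "news": ["article", "home"],
    "article": ["comments"],
    "settings": ["about"],
}


def traverse_ui(start, neighbors, max_depth, cutoff=None):
    """Breadth-first walk over UI states with a depth limit.

    max_depth -- states at this depth are visited but not expanded
                 (the depth termination rule, as assumed here).
    cutoff    -- optional predicate; states for which it returns True
                 are skipped together with everything behind them.
    Returns the states in the order they were first reached.
    """
    visited = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        state, depth = queue.popleft()
        if depth >= max_depth:
            continue  # depth termination: do not expand further
        for nxt in neighbors.get(state, []):
            if nxt in visited:
                continue  # avoid loops such as news -> home
            if cutoff is not None and cutoff(nxt):
                continue  # optional cut-off: prune this branch
            visited.add(nxt)
            order.append(nxt)
            queue.append((nxt, depth + 1))
    return order


if __name__ == "__main__":
    print(traverse_ui("home", APP_GRAPH, max_depth=2))
    print(traverse_ui("home", APP_GRAPH, max_depth=3,
                      cutoff=lambda s: s == "settings"))
```

In this sketch a smaller `max_depth` visits fewer screens per application, while the cut-off predicate lets a caller exclude branches that are unlikely to hold content, such as a settings menu.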
