SenseLens: An Efficient Social Signal Conditioning System for True Event Detection

Abstract

This article narrows the gap between physical sensing systems that measure physical signals and social sensing systems that measure information signals by (i) defining a novel algorithm for extracting information signals (building on results from text embedding) and (ii) showing that it increases the accuracy of truth discovery, i.e., the separation of true information from false or manipulated information. The work is applied in the context of separating true and false facts on social media, such as Twitter and Reddit, where users post predominantly short microblogs. The new algorithm decides how to aggregate the signal across words in a microblog for the purpose of clustering microblogs in the latent information signal space, where it is easier to separate true and false posts. Although previous literature has extensively studied the problem of short-text embedding/representation, this article improves on previous work in three important respects: (1) Our work constitutes unsupervised truth discovery, requiring no labeled input or prior training. (2) We propose a new distance metric for efficient short-text similarity estimation, which we call Semantic Subset Matching, that improves our ability to meaningfully cluster microblog posts in the latent information signal space. (3) We introduce an iterative framework that jointly improves microblog clustering and truth discovery. The evaluation shows that the approach improves the accuracy of truth discovery by 6.3%, 2.5%, and 3.8% (constituting a 38.9%, 14.2%, and 18.7% reduction in error, respectively) on three real Twitter data traces.
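To make the flavor of the approach concrete, here is a minimal, illustrative sketch of an unsupervised loop that alternates microblog clustering with source-reliability estimation. It is not the paper's algorithm: plain bag-of-words cosine similarity stands in for the Semantic Subset Matching metric, and the greedy clustering, 0.5 similarity threshold, and noisy-or truth score are simplifying assumptions.

```python
import math

def cosine(a, b):
    # Cosine similarity between two sparse bag-of-words vectors (dicts).
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster_posts(vectors, threshold=0.5):
    # Greedy single-pass clustering: assign each post to the first
    # cluster whose representative (first member) is similar enough.
    clusters = []  # each cluster is a list of post indices
    for i, v in enumerate(vectors):
        for cl in clusters:
            if cosine(vectors[cl[0]], v) >= threshold:
                cl.append(i)
                break
        else:
            clusters.append([i])
    return clusters

def truth_discovery(vectors, sources, iters=5):
    # Jointly refine per-cluster truth scores and per-source
    # reliabilities by alternating the two estimates.
    reliability = {s: 0.5 for s in sources}
    clusters = cluster_posts(vectors)
    truth = []
    for _ in range(iters):
        # Noisy-or: a cluster is likely true if at least one of its
        # supporting sources is reliable.
        truth = [1.0 - math.prod(1.0 - reliability[sources[i]] for i in cl)
                 for cl in clusters]
        # A source's reliability is the mean truth score of the
        # clusters it supports.
        for s in reliability:
            scores = [t for cl, t in zip(clusters, truth)
                      if any(sources[i] == s for i in cl)]
            if scores:
                reliability[s] = sum(scores) / len(scores)
    return clusters, truth, reliability
```

With a few corroborating near-duplicate posts from several sources plus one isolated post, the loop drives the corroborated cluster's truth score toward 1 while the isolated source's reliability stays at its prior.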

Similar Papers
  • Book Chapter
  • Cited by 8
  • 10.1007/978-3-030-02922-7_29
On the Discovery of Continuous Truth: A Semi-supervised Approach with Partial Ground Truths
  • Jan 1, 2018
  • Yi Yang + 2 more

In many applications, information regarding the same object can be collected from multiple sources. However, these multi-source data are often reported inconsistently. In light of this challenge, truth discovery has emerged to identify the truth for each object from multi-source data. Most existing truth discovery methods assume that ground truths are completely unknown, and they focus on unsupervised approaches that jointly estimate object truths and source reliabilities. However, in many real-world applications, a set of ground truths may be partially available. In this paper, we propose a semi-supervised truth discovery framework to estimate continuous object truths. With the help of ground truths, even a small amount, the accuracy of truth discovery can be improved. We formulate the semi-supervised truth discovery problem as an optimization task in which object truths and source reliabilities are modeled as variables. The ground truths are modeled as a regularization term whose contribution to the source weight estimation can be controlled by a parameter. Experiments show that the proposed method is more accurate and efficient than existing truth discovery methods.

  • Research Article
  • Cited by 32
  • 10.1109/tbdata.2017.2669308
Scalable Uncertainty-Aware Truth Discovery in Big Data Social Sensing Applications for Cyber-Physical Systems
  • Dec 1, 2020
  • IEEE Transactions on Big Data
  • Chao Huang + 2 more

Social sensing is a new big data application paradigm for Cyber-Physical Systems (CPS), where a group of individuals volunteer (or are recruited) to report measurements or observations about the physical world at scale. A fundamental challenge in social sensing applications lies in discovering the correctness of reported observations and the reliability of data sources without prior knowledge of either. We refer to this problem as truth discovery. While prior studies have made progress on addressing this challenge, two important limitations exist: (i) current solutions do not fully explore the uncertainty aspect of human-reported data, which leads to sub-optimal truth discovery results; (ii) current truth discovery solutions are mostly designed as sequential algorithms that do not scale well to large-scale social sensing events. In this paper, we develop a Scalable Uncertainty-Aware Truth Discovery (SUTD) scheme to address the above limitations. The SUTD scheme solves a constraint estimation problem to jointly estimate the correctness of reported data and the reliability of data sources while explicitly considering the uncertainty of the reported data. To address the scalability challenge, SUTD is designed to run on a Graphics Processing Unit (GPU) with thousands of cores, and is shown to run two to three orders of magnitude faster than sequential truth discovery solutions. In the evaluation, we compare our SUTD scheme to state-of-the-art solutions using three real-world datasets collected from Twitter: Paris Attack, Oregon Shooting, and Baltimore Riots, all in 2015. The evaluation results show that our new scheme significantly outperforms the baselines in terms of both truth discovery accuracy and execution time.

  • Research Article
  • Cited by 18
  • 10.1016/j.jnca.2023.103811
BLIND: A privacy preserving truth discovery system for mobile crowdsensing
  • Dec 18, 2023
  • Journal of Network and Computer Applications
  • Vincenzo Agate + 3 more


  • Research Article
  • 10.1121/1.393133
Laser recording information and pilot signals for tracking on a grooveless recording
  • Jun 1, 1986
  • The Journal of the Acoustical Society of America
  • Hisao Kinjo + 1 more

An information signal recording system records an information signal, first and second reference signals for tracking control, and a third reference signal for switching between the first and second reference signals at the time of reproduction on a recording disc. The first and second reference signals, of different frequencies, are alternately recorded on intermediate parts of the recording disc, at positions between the centerlines of adjacent information signal tracks. The third reference signal, of a still different frequency, is recorded at a predetermined position on every information signal track. A reproducing system reproduces the information signal together with the third reference signal, and the first and second reference signals, using a single reproducing element. A tracking control signal is produced from the first and second reference signals, switched by the third reference signal separated from the reproduced information signal.

  • Research Article
  • Cited by 30
  • 10.1109/tcyb.2021.3120134
Interlayer Link Prediction in Multiplex Social Networks Based on Multiple Types of Consistency Between Embedding Vectors.
  • Apr 1, 2023
  • IEEE Transactions on Cybernetics
  • Rui Tang + 5 more

Online users are typically active on multiple social media networks (SMNs), which constitute a multiplex social network. With improvements in cybersecurity awareness, users increasingly choose different usernames and provide different profiles on different SMNs. Thus, it is becoming increasingly challenging to determine whether given accounts on different SMNs belong to the same user; this can be expressed as an interlayer link prediction problem in a multiplex network. To address the challenge of predicting interlayer links, feature or structure information is leveraged. Existing methods that use network embedding techniques to address this problem focus on learning a mapping function to unify all nodes into a common latent representation space for prediction; positional relationships between unmatched nodes and their common matched neighbors (CMNs) are not utilized. Furthermore, the layers are often modeled as unweighted graphs, ignoring the strengths of the relationships between nodes. To address these limitations, we propose a framework based on multiple types of consistency between embedding vectors (MulCEV). In MulCEV, the traditional embedding-based method is applied to obtain the degree of consistency between the vectors representing the unmatched nodes, and a proposed distance consistency index based on the positions of nodes in each latent space provides additional clues for prediction. By associating these two types of consistency, the effective information in the latent spaces is fully utilized. In addition, MulCEV models the layers as weighted graphs to obtain representations. In this way, the higher the strength of the relationship between nodes, the more similar their embedding vectors in the latent representation space will be.
The results of our experiments on several real-world and synthetic datasets demonstrate that the proposed MulCEV framework markedly outperforms current embedding-based methods, especially when the number of training iterations is small.

  • Conference Article
  • Cited by 17
  • 10.1109/smartcomp.2016.7501723
Towards Emotional-Aware Truth Discovery in Social Sensing Applications
  • May 1, 2016
  • Jermaine Marshall + 1 more

This paper develops a new principled framework to solve an emotional-aware truth discovery problem in social sensing applications. Social sensing has emerged as a new application paradigm of cyber-physical systems with humans in the loop, where a large crowd of social sensors (humans or devices on their behalf) are recruited to report, or spontaneously report, observations about the physical environment at scale. A fundamental problem in social sensing applications lies in ascertaining the correctness of the reported observations (often called claims) and the reliability of data sources. We refer to this problem as truth discovery. While significant efforts have been made to address the truth discovery problem, an important aspect has not been fully explored in previous studies: how to deal with emotional claims. A common assumption in previous work is that all claims are factual (i.e., either true or false). However, unlike physical sensors, humans are likely to incorporate personal emotions and sentiments into their reported observations (e.g., tweets, blogs), which can easily confuse current truth discovery solutions and lead to inaccurate results. In this paper, we develop a new emotional-aware truth discovery scheme that explicitly incorporates the emotional information of human-reported data into an analytical framework. The new truth discovery scheme solves a maximum likelihood estimation problem to determine both claim correctness and source reliability. We compare our emotional-aware scheme with state-of-the-art baselines through three real-world case studies using Twitter data feeds. The evaluation results show that our new scheme outperforms all compared baselines and significantly improves truth discovery accuracy in social sensing applications.

  • Research Article
  • Cited by 92
  • 10.1109/tnet.2021.3110052
Towards Personalized Privacy-Preserving Truth Discovery Over Crowdsourced Data Streams
  • Feb 1, 2022
  • IEEE/ACM Transactions on Networking
  • Xiaoyi Pang + 5 more

Truth discovery is an effective paradigm that can reveal the truth from crowdsourced data with conflicts, enabling data-driven decision-making systems to make quick and smart decisions. Increasing privacy concerns prompt users to perturb or encrypt their private data before outsourcing, which poses significant challenges for truth discovery. Although several privacy-preserving truth discovery mechanisms have been proposed, none of them take personal privacy expectations into consideration. In this work, we propose a novel personalized privacy-preserving truth discovery (PPPTD) framework over crowdsourced data streams to achieve timely and accurate truth discovery while guaranteeing the protection of individual privacy. The key challenges of PPPTD lie in improving the accuracy of truth estimation from perturbed streaming data with personalized protection levels. To address these challenges, we first develop a personalized budget initialization mechanism to quantify each user's privacy protection requirement and allocate personalized privacy budgets to users according to their privacy requirements. Then we propose a deviation-aware weighted aggregation method to improve the accuracy of truth discovery from streaming data with varying degrees of perturbation. To achieve a privacy-utility tradeoff, we further propose an influence-aware adaptive budget adjustment mechanism that adaptively re-allocates privacy budgets to users based on the evolution of their influence in the weighted aggregation. We prove that PPPTD achieves ε-differential privacy over the whole data generated by users and satisfies individual personalized privacy requirements. Extensive experiments on two real-world datasets demonstrate the effectiveness of PPPTD.
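As a rough illustration of how personalized privacy budgets interact with aggregation accuracy, the sketch below applies the standard Laplace mechanism per user and then weights users by the noise variance their budget implies. This is a simplification, not PPPTD itself: the deviation-aware and influence-aware mechanisms are omitted, and the inverse-variance weighting rule and all parameter values are assumptions.

```python
import math
import random

def laplace_noise(scale, rng):
    # Sample from Laplace(0, scale) via inverse-CDF sampling.
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def perturb(value, epsilon, sensitivity=1.0, rng=random):
    # Epsilon-differentially-private Laplace mechanism: a larger
    # personal budget epsilon means less noise added locally.
    return value + laplace_noise(sensitivity / epsilon, rng)

def aggregate(perturbed, epsilons):
    # Weight each user inversely to the variance of their noise.
    # Laplace(0, b) has variance 2*b^2 with b = 1/epsilon, so an
    # inverse-variance weight is proportional to epsilon^2.
    weights = [e * e for e in epsilons]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, perturbed)) / total
```

Users who demand stronger protection (smaller ε) contribute noisier values and are accordingly down-weighted, which is the basic privacy-utility tradeoff the paper's adaptive mechanisms refine.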

  • Conference Article
  • Cited by 62
  • 10.1109/icdcs.2017.196
Towards Scalable and Dynamic Social Sensing Using A Distributed Computing Framework
  • Jun 1, 2017
  • Daniel Yue Zhang + 6 more

With the rapid growth of online social media and ubiquitous Internet connectivity, social sensing has emerged as a new crowdsourcing application paradigm of collecting observations (often called claims) about the physical environment from humans or devices on their behalf. A fundamental problem in social sensing applications lies in effectively ascertaining the correctness of claims and the reliability of data sources without knowing either of them a priori, which is referred to as truth discovery. While significant progress has been made on the truth discovery problem, some important challenges have not been well addressed yet. First, existing truth discovery solutions do not fully solve the dynamic truth discovery problem, where the ground truth of claims changes over time. Second, many current solutions are not scalable to large-scale social sensing events because of the centralized nature of their truth discovery algorithms. Third, the heterogeneity and unpredictability of social sensing data traffic pose additional challenges to resource allocation and system responsiveness. In this paper, we developed a Scalable Streaming Truth Discovery (SSTD) solution to address the above challenges. In particular, we first developed a dynamic truth discovery scheme based on Hidden Markov Models (HMMs) to effectively infer the evolving truth of reported claims. We further developed a distributed framework to implement the dynamic truth discovery scheme using Work Queue in the HTCondor system. We also integrated the SSTD scheme with an optimal workload allocation mechanism to dynamically allocate resources (e.g., cores, memory) to truth discovery tasks based on their computation requirements. We evaluated SSTD through real-world social sensing applications using Twitter data feeds.
The evaluation results on three real-world data traces (i.e., Boston Bombing, Paris Shooting and College Football) show that the SSTD scheme is scalable and outperforms the state-of-the-art truth discovery methods in terms of both effectiveness and efficiency.
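To illustrate the general idea behind HMM-based dynamic truth discovery (not SSTD's actual model), a standard forward filter over a binary truth state looks roughly like this; the persistence probability `p_stay` and report accuracy `p_correct` are assumed values chosen for illustration:

```python
def hmm_truth_filter(reports, p_stay=0.9, p_correct=0.8):
    # Forward-filter the probability that a claim is true at each
    # time step, given a stream of binary reports (1 = reported true).
    # Hidden states: 0 = claim false, 1 = claim true.
    # Transition: the hidden truth persists with probability p_stay.
    # Emission: a report matches the hidden truth with p_correct.
    belief = [0.5, 0.5]  # uniform prior over {false, true}
    history = []
    for obs in reports:
        # Predict: apply the transition model to the current belief.
        pred = [p_stay * belief[0] + (1 - p_stay) * belief[1],
                p_stay * belief[1] + (1 - p_stay) * belief[0]]
        # Update: weight each state by the likelihood of obs.
        like = [p_correct if obs == s else 1 - p_correct for s in (0, 1)]
        post = [l * p for l, p in zip(like, pred)]
        z = sum(post)
        belief = [p / z for p in post]
        history.append(belief[1])  # P(claim true | reports so far)
    return history
```

Feeding it a report stream such as `[1, 1, 1, 0, 0, 0, 0]` shows the belief rising while confirmations arrive and decaying once reports flip, which is the evolving-truth behavior a static truth discovery model cannot capture.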

  • Conference Article
  • Cited by 35
  • 10.1109/mass.2015.39
Towards Time-Sensitive Truth Discovery in Social Sensing Applications
  • Oct 1, 2015
  • Chao Huang + 2 more

This paper develops a new principled framework for exploiting time-sensitive information to improve truth discovery accuracy in social sensing applications. This work is motivated by the emergence of social sensing as a new paradigm of collecting observations about the physical environment from humans or devices on their behalf. These observations may be true or false, and hence are viewed as binary claims. A fundamental problem in social sensing applications lies in ascertaining the correctness of claims and the reliability of data sources. We refer to this problem as truth discovery. Time is a critical dimension that needs to be carefully exploited in truth discovery solutions. In this paper, we develop a new time-sensitive truth discovery scheme that explicitly incorporates source responsiveness and claim lifespan into a rigorous analytical framework. The new truth discovery scheme solves a maximum likelihood estimation problem to determine both claim correctness and source reliability. We compare our time-sensitive scheme with state-of-the-art baselines through an extensive simulation study and a real-world case study. The evaluation results show that our new scheme outperforms all compared baselines and significantly improves truth discovery accuracy in social sensing applications.
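The maximum-likelihood style common to this line of work can be sketched as an EM-like alternation between claim-correctness posteriors and source reliabilities. This is a generic sketch under simplifying assumptions (binary claims, independent sources, a flat 0.5 prior), not the paper's time-sensitive estimator, which additionally models source responsiveness and claim lifespan; the source and claim names below are hypothetical.

```python
def ml_truth_discovery(reports, iters=10, prior=0.5):
    # reports: dict mapping (source, claim) -> 1 (asserts) or 0 (disputes).
    # Alternates a Bayesian E-step (claim correctness given source
    # reliabilities) with an M-step (reliability = expected fraction
    # of a source's reports that agree with the estimated truths).
    sources = {s for s, _ in reports}
    claims = {c for _, c in reports}
    rel = {s: 0.7 for s in sources}  # initial reliability guess
    truth = {}
    for _ in range(iters):
        # E-step: posterior probability that each claim is true.
        for c in claims:
            p_t, p_f = prior, 1.0 - prior
            for (s, cc), obs in reports.items():
                if cc != c:
                    continue
                p_t *= rel[s] if obs == 1 else 1.0 - rel[s]
                p_f *= 1.0 - rel[s] if obs == 1 else rel[s]
            truth[c] = p_t / (p_t + p_f)
        # M-step: reliability = expected agreement with the truths.
        for s in sources:
            agree, n = 0.0, 0
            for (ss, c), obs in reports.items():
                if ss != s:
                    continue
                agree += truth[c] if obs == 1 else 1.0 - truth[c]
                n += 1
            rel[s] = agree / n
    return truth, rel
```

With two agreeing sources and one contrarian, the iteration converges to high correctness for the corroborated claim and low reliability for the dissenting source, without any labeled ground truth.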

  • Research Article
  • Cited by 13
  • 10.1109/access.2022.3175306
Leveraging Knowledge-Based Features With Multilevel Attention Mechanisms for Short Arabic Text Classification
  • Jan 1, 2022
  • IEEE Access
  • Iyad Alagha

With the widespread sharing of short texts on social media platforms, there is a growing need for effective short-text classification methods. However, short-text classification has always been challenging due to the ambiguity and data sparsity of short texts. A common solution is to enrich the short text with additional semantic features extracted from external knowledge sources, such as Wikipedia, to help the classifier better decide on the correct class. Most existing works, however, have focused on text written in English and benefited from the existence of entity-linking tools based on English knowledge bases. When it comes to the Arabic language, the exploitation of external knowledge to support the classification of short Arabic text has not been widely explored. This work presents an approach for the classification of short Arabic text that exploits both Wikipedia-based features and the attention mechanism for effective classification. First, Wikipedia entities mentioned in the short text are identified. Then, Wikipedia categories associated with the identified entities are retrieved and filtered to retain only the most relevant categories. A deep learning model with multiple attention mechanisms is then used to encode the short text and the associated category set. Finally, the short-text and category representations are combined and fed into the classification layer. The use of the attentive model with category filtering highlights the most important features while reducing the effect of improper features. Finally, the proposed model is evaluated by comparing it with several deep learning models.

  • Research Article
  • Cited by 5
  • 10.3390/app122111278
RSVN: A RoBERTa Sentence Vector Normalization Scheme for Short Texts to Extract Semantic Information
  • Nov 7, 2022
  • Applied Sciences
  • Lei Gao + 3 more

With the explosive growth of short texts on the Web and an increasing number of Web corpora consisting of short texts, short texts are playing an important role in various Web applications. Entity linking is a crucial task for knowledge graphs and a key technology in the field of short texts that affects the accuracy of many downstream tasks in natural language processing. However, compared to long texts, entity linking for Chinese short texts is a challenging problem due to their heavy colloquialism and insufficient context. Moreover, existing methods for entity linking in Chinese short texts underutilize semantic information and ignore the interaction between label information and the original short text. In this paper, we propose a RoBERTa sentence vector normalization scheme for short texts to fully extract semantic information. First, the proposed model utilizes RoBERTa to fully capture contextual semantic information. Second, the anisotropy of RoBERTa's output sentence vectors is corrected using the standard Gaussian of a flow model, which enables the sentence vectors to characterize the semantics more precisely. In addition, the interaction between label embedding and text embedding is employed to improve NIL entity classification. Experimental results demonstrate that the proposed model outperforms existing research results and mainstream deep learning methods for entity linking on two Chinese short-text datasets.

  • Research Article
  • Cited by 21
  • 10.1016/j.dss.2019.113142
Developing insights from social media using semantic lexical chains to mine short text structures
  • Aug 26, 2019
  • Decision Support Systems
  • Cecil Eng Huang Chua + 3 more


  • Research Article
  • Cited by 7
  • 10.1109/access.2017.2780182
Latent Dirichlet Truth Discovery: Separating Trustworthy and Untrustworthy Components in Data Sources
  • Jan 1, 2018
  • IEEE Access
  • Liyan Zhang + 3 more

The discovery of truth is a critical step toward effective information and knowledge utilization, especially in Web services, social media networks, and sensor networks. Typically, a set of sources with varying reliability claim observations about a set of objects, and the goal is to jointly discover the true fact for each object and the trustworthy degree of each source. In this paper, we propose a latent Dirichlet truth (LDT) discovery model to approach this problem. It defines a random field over all possible configurations of the trustworthy degrees of sources and facts, and the most probable configuration is inferred by a maximum a posteriori criterion over the observed claims. We note that a typical source is usually made of mixed trustworthy and untrustworthy components, since it can make true or false claims on different objects. While most existing algorithms do not attempt to separate the untrustworthy component from the trustworthy one in each source, the proposed model explicitly identifies the untrustworthy component in each source. This makes the LDT model more capable of separating the trustworthy and untrustworthy components, which in turn improves the accuracy of truth discovery. Experiments on real datasets show competitive results compared with existing algorithms.

  • Research Article
  • Cited by 57
  • 10.1109/tcomm.2016.2641949
Exploiting Full-Duplex Receivers for Achieving Secret Communications in Multiuser MISO Networks
  • Jan 9, 2017
  • IEEE Transactions on Communications
  • Berk Akgun + 2 more

We consider a broadcast channel in which a multi-antenna transmitter (Alice) sends $K$ confidential information signals to $K$ legitimate users (Bobs) in the presence of $L$ eavesdroppers (Eves). Alice uses MIMO precoding to generate the information signals along with her own (Tx-based) friendly jamming. Interference at each Bob is removed by MIMO zero-forcing. This, however, leaves a "vulnerability region" around each Bob, which can be exploited by a nearby Eve. We address this problem by augmenting Tx-based friendly jamming (TxFJ) with Rx-based friendly jamming (RxFJ), generated by each Bob. Specifically, each Bob uses self-interference suppression (SIS) to transmit a friendly jamming signal while simultaneously receiving an information signal over the same channel. We minimize the powers allocated to the information, TxFJ, and RxFJ signals under given guarantees on the individual secrecy rate for each Bob. The problem is solved for the cases when the eavesdropper's channel state information is known/unknown. Simulations show the effectiveness of the proposed solution. Furthermore, we discuss how to schedule transmissions when the rate requirements need to be satisfied on average rather than instantaneously. Under special cases, a scheduling algorithm that serves only the strongest receivers is shown to outperform one that schedules all receivers.

  • Research Article
  • Cited by 58
  • 10.1109/tkde.2015.2504928
Truth Discovery in Crowdsourced Detection of Spatial Events
  • Feb 29, 2016
  • IEEE Transactions on Knowledge and Data Engineering
  • Robin Wentao Ouyang + 3 more

The ubiquity of smartphones has led to the emergence of mobile crowdsourcing tasks such as the detection of spatial events when smartphone users move around in their daily lives. However, the credibility of those detected events can be negatively impacted by unreliable participants with low-quality data. Consequently, a major challenge in mobile crowdsourcing is truth discovery, i.e., to discover true events from diverse and noisy participants' reports. This problem is uniquely distinct from its online counterpart in that it involves uncertainties in both participants' mobility and reliability. Decoupling these two types of uncertainties through location tracking will raise severe privacy and energy issues, whereas simply ignoring missing reports or treating them as negative reports will significantly degrade the accuracy of truth discovery. In this paper, we propose two new unsupervised models, i.e., Truth finder for Spatial Events (TSE) and Personalized Truth finder for Spatial Events (PTSE), to tackle this problem. In TSE, we model location popularity, location visit indicators, truths of events, and three-way participant reliability in a unified framework. In PTSE, we further model personal location visit tendencies. These proposed models are capable of effectively handling various types of uncertainties and automatically discovering truths without any supervision or location tracking. Experimental results on both real-world and synthetic datasets demonstrate that our proposed models outperform existing state-of-the-art truth discovery approaches in the mobile crowdsourcing environment.
