Abstract

With the increasing use of online social networking platforms, online surveys are widely used in many fields, e.g., public health, business and sociology, to collect samples and to infer population characteristics from the self-reported data of respondents. Although online surveys can protect the privacy of respondents, self-reporting suffers from low response rates and unreliable answers when the survey contains sensitive questions, such as those on drug use, sexual behaviors, abortion or criminal activity. To overcome this limitation, this paper develops an approach that collects second-order information from the respondents, i.e., asking them about the characteristics of their friends instead of asking about the respondents’ own characteristics directly. We then infer the population variable with the Hansen-Hurwitz estimator under two classic sampling strategies (simple random sampling and random walk-based sampling). The method is evaluated by simulations on both artificial and real-world networks. Results show that the method generates population estimates with high accuracy without knowing the respondents’ own characteristics, and that the biases of the estimates under various settings are relatively small and within acceptable limits. The new method offers an alternative way of implementing surveys online and is expected to collect more reliable data and to improve population inference on sensitive variables.

Highlights

  • Online social networking platforms, e.g., Facebook and Twitter, on which users share their daily lives and build social relations with others, provide a tremendous amount of data for researchers to study social phenomena and to validate theoretical models [1,2].

  • Simulation with simple random sampling: We first implemented the developed methods on different networks with varying characteristics and studied the performance of the estimator developed for simple random sampling, i.e., SEC1.

  • An analysis of variance (ANOVA) test [41] indicated no significant difference in average bias among estimates for networks with different average degrees (p-value = 0.94).

Summary

Introduction

Online social networking platforms, e.g., Facebook and Twitter, on which users share their daily lives and build social relations with others, provide a tremendous amount of data for researchers to study social phenomena and to validate theoretical models [1,2]. Compared with offline surveys such as face-to-face interviews, online surveys are cost-efficient, easy to implement through social networking platforms, and can protect the privacy of respondents because no interviewer is present [10]. From samples collected by popular sampling strategies, such as simple random sampling and random walk-based sampling, the population mean is easy to infer when self-reported data on the respondents’ own characteristics are available [11,12]. When the respondents are randomly selected from the population, the population mean can be estimated by the sample mean. When the respondents are selected via a crawler-like random walk, the population mean is typically estimated by re-weighting the observations according to nodal degree [15,16,17].

Source: Entropy 2018, 20, 480; doi:10.3390/e20060480; www.mdpi.com/journal/entropy
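As a rough illustration of the two classic estimators for self-reported data mentioned above, the following Python sketch contrasts the plain sample mean with a degree re-weighted mean. It is not the paper's own friend-report estimator. The function names and toy data are hypothetical, and the inverse-degree weighting is an assumption: it is the common form of the correction, motivated by the fact that a simple random walk on a connected undirected network visits a node with stationary probability proportional to its degree.

```python
from typing import Sequence


def srs_mean(values: Sequence[float]) -> float:
    """Sample mean, the usual estimate under simple random sampling."""
    if not values:
        raise ValueError("sample must not be empty")
    return sum(values) / len(values)


def degree_reweighted_mean(values: Sequence[float],
                           degrees: Sequence[int]) -> float:
    """Inverse-degree weighted mean for a random-walk sample.

    Assumption: nodes are visited with probability proportional to
    their degree, so each observation is weighted by 1 / degree to
    offset the over-sampling of well-connected respondents.
    """
    if not values or len(values) != len(degrees):
        raise ValueError("values and degrees must be non-empty and aligned")
    if any(d <= 0 for d in degrees):
        raise ValueError("every sampled node must have positive degree")
    weights = [1.0 / d for d in degrees]
    weighted_sum = sum(w * y for w, y in zip(weights, values))
    return weighted_sum / sum(weights)


# Toy sample (hypothetical): 1 = respondent reports the trait, 0 = does not.
reports = [1, 1, 0, 0]
node_degrees = [4, 4, 1, 2]

naive = srs_mean(reports)                                  # 0.5
corrected = degree_reweighted_mean(reports, node_degrees)  # 0.25
```

In this toy sample the two high-degree respondents both report the trait, so the unweighted mean (0.5) overstates its prevalence relative to the degree-corrected estimate (0.25).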
