Textual Factors: A Scalable, Interpretable, and Data-Driven Approach to Analyzing Unstructured Information
We introduce a general approach for analyzing large-scale text-based data, combining the strengths of neural network language processing and generative statistical modeling to create a factor structure of unstructured data for downstream regressions typically used in the social sciences. We generate textual factors by (i) representing texts using vector word embeddings, (ii) clustering the vectors using locality-sensitive hashing to generate supports of topics, and (iii) identifying relatively interpretable spanning clusters (i.e., textual factors) through topic modeling. Our data-driven approach captures complex linguistic structures while ensuring computational scalability and economic interpretability, plausibly attaining certain advantages over and complementing other unstructured data analytics used by researchers, including emergent large language models. We conduct initial validation tests of the framework and discuss three types of applications: (i) enhancing prediction and inference with texts, (ii) interpreting (non–text-based) models, and (iii) constructing new text-based metrics and explanatory variables. We illustrate each of these applications using examples in finance and economics, such as macroeconomic forecasting from news articles, interpreting multifactor asset pricing models from corporate filings, and measuring theme-based technology breakthroughs from patents. Finally, we provide a flexible statistical package of textual factors for online distribution to facilitate future research and applications. This paper was accepted by David Simchi-Levi, finance. Funding: The authors gratefully acknowledge the financial support from the Ewing Marion Kauffman Foundation, the Becker Friedman Institute of Economics, the Fama-Miller Center for Research in Finance, INQUIRE Europe, the Kenan Institute of Private Enterprise, and the Risk Institute at OSU Fisher College of Business (while L. W. Cong was a fellow at the institute).
W. Zhu acknowledges financial support from the Tsinghua University Initiative Scientific Research Program [Grant 2022Z04W02016], the Tsinghua University School of Economics and Management [Research Grant 2022051002], and the National Natural Science Foundation of China [Grant 72442014]. Supplemental Material: The online appendices and data files are available at https://doi.org/10.1287/mnsc.2020.01180 .
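Step (ii) of the textual-factor pipeline, bucketing word vectors with locality-sensitive hashing, can be illustrated with a minimal sketch. It assumes random-hyperplane (sign) hashing, one common LSH family for cosine similarity; the abstract does not say which LSH family the authors use, and the toy vectors and the names `make_hyperplanes`, `lsh_signature`, and `bucket_words` are hypothetical.

```python
import random
from collections import defaultdict


def make_hyperplanes(dim, n_planes, seed=0):
    """Draw n_planes random Gaussian normal vectors in R^dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]


def lsh_signature(vec, planes):
    """One bit per hyperplane: 1 if vec lies on the non-negative side, else 0."""
    return tuple(
        int(sum(p_i * v_i for p_i, v_i in zip(plane, vec)) >= 0) for plane in planes
    )


def bucket_words(embeddings, planes):
    """Group words whose vectors share a signature; each bucket is a candidate cluster."""
    buckets = defaultdict(list)
    for word, vec in embeddings.items():
        buckets[lsh_signature(vec, planes)].append(word)
    return dict(buckets)


# Hypothetical 3-d "embeddings"; real word vectors have hundreds of dimensions.
toy = {
    "stock": [0.9, 0.1, 0.0],
    "equity": [0.85, 0.15, 0.05],
    "rain": [-0.8, 0.0, 0.6],
    "storm": [-0.75, 0.05, 0.65],
}
planes = make_hyperplanes(dim=3, n_planes=4, seed=42)
for signature, words in bucket_words(toy, planes).items():
    print(signature, words)
```

Vectors pointing in the same direction always receive the same signature; two vectors at angle θ disagree on each bit with probability θ/π, so near-synonyms usually, but not always, land in the same bucket. Practical LSH schemes therefore combine several independent hash tables.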
- Research Article
- 10.1287/mnsc.2023.4906
- Sep 7, 2023
- Management Science
Gifts are important instruments for forming bonds in interpersonal relationships. Our study analyzes the phenomenon of gift contagion in online groups. Gift contagion encourages social bonds by prompting further gifts; it may also promote group interaction and solidarity. Using data on 36 million online red packet gifts on a large social site in East Asia, we leverage a natural experimental design to identify the social contagion of gift giving in online groups. Our natural experiment is enabled by the randomization of the gift amount allocation algorithm on the platform, which addresses the common challenge of causal identification in observational data. Our study provides evidence of gift contagion: On average, receiving one additional dollar causes a recipient to send 18 cents back to the group within the subsequent 24 hours. Decomposing this effect, we find that it is mainly driven by the extensive margin: more recipients are triggered to send red packets. Moreover, we find that this effect is stronger for “luckiest draw” recipients, suggesting the presence of a group norm regarding the next red packet sender. Finally, we investigate the moderating effects of group- and individual-level social network characteristics on gift contagion as well as the causal impact of receiving gifts on group network structure. Our study has implications for promoting group dynamics and designing marketing strategies for product adoption. This paper was accepted by Axel Ockenfels, behavioral economics and decision analysis. Funding: T. Liu was supported by Natural Science Foundation of China [Grant 72222005] and Tsinghua University [Grant 2022Z04W01032]. J. Tang was supported by Natural Science Foundation of China for Distinguished Young Scholar [Grant 61825602]. Supplemental Material: The data files and online appendices are available at https://doi.org/10.1287/mnsc.2023.4906 .
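The identification strategy rests on the platform randomly allocating a red packet's total amount among recipients. The abstract does not describe the platform's actual algorithm, so the following is a hypothetical stand-in: it splits an amount in cents uniformly at random over all splits in which every recipient gets at least one cent, and flags the “luckiest draw” recipient (largest share). The function names are illustrative only.

```python
import random


def split_red_packet(total_cents, n_recipients, rng=None):
    """Hypothetical random split: every recipient gets >= 1 cent; shares sum to total."""
    if n_recipients < 1 or total_cents < n_recipients:
        raise ValueError("need at least one cent per recipient")
    rng = rng or random.Random()
    # Choose n-1 distinct cut points in 1..total-1; the gaps between them are the shares.
    cuts = sorted(rng.sample(range(1, total_cents), n_recipients - 1))
    bounds = [0] + cuts + [total_cents]
    return [bounds[i + 1] - bounds[i] for i in range(n_recipients)]


def luckiest_draw(shares):
    """Index of the largest share (ties go to the earliest recipient)."""
    return max(range(len(shares)), key=lambda i: shares[i])


shares = split_red_packet(1000, 5, random.Random(7))  # a $10 packet for 5 members
print(shares, "luckiest:", luckiest_draw(shares))
```

Because the amount each member receives comes from a random draw rather than from the sender's choice, variation in "dollars received" is plausibly unrelated to the recipient's own traits, which is the property the natural experiment exploits.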
- Research Article
- 10.2139/ssrn.3307057
- Jan 4, 2019
- SSRN Electronic Journal
Textual Factors: A Scalable, Interpretable, and Data-driven Approach to Analyzing Unstructured Information
- Research Article
- 10.1287/mnsc.2023.03679
- Oct 21, 2025
- Management Science
We analyze the optimal boundary for open data in an economy where financial and real-sector participants access both open and private data. The distinctive features of open access and nonrivalrous usage of open data enable its dual roles as a public information source and innovation input but raise privacy concerns. Our model reveals a novel tradeoff: Although enhanced private data precision and data skills substitute for open data’s information source role, their ability to amplify innovation benefits (via improved investment efficiency) establishes a crucial complementary relationship. This induces a crowding-in effect on the optimal open data boundary under low uncertainty but a crowding-out effect under high uncertainty. The innovation role of open data further generates nonmonotonic effects, yielding complex nonlinear impacts on market and real efficiency. These findings highlight critical policy tradeoffs in balancing innovation, market efficiency, and privacy in the digital age. This paper was accepted by Bo Becker, finance. Funding: Z. Wang acknowledges financing from the National Natural Science Foundation of China [Grants 72442025 and 72272028] and the Graduate Education Reform of Dongbei University of Finance and Economics [Grant yjzd202309]. Z. Qiu acknowledges financing from the Major Program of the National Natural Science Foundation of China [Grant 72192804] and the National Key Research and Development Program of China [Grant 2023YFC3304701]. Supplemental Material: The online appendices and data files are available at https://doi.org/10.1287/mnsc.2023.03679 .
- Research Article
- 10.1287/mnsc.2023.4731
- Mar 17, 2023
- Management Science
Battery swapping for electric vehicle refueling is reviving and thriving. Despite a captivating sustainable future where swapping batteries will be as convenient as refueling gas today, a tension is mounting in practice (beyond the traditional “range anxiety” issue): On one hand, it is desirable to maximize battery proximity and availability to customers. On the other hand, capacitated urban power grids may restrict decentralized charging to slow speeds. To reconcile this tension, some cities are embracing an emerging infrastructure network: Decentralized swapping stations replenish charged batteries from centralized charging stations. It remains unclear how to design such a network or whether pooling charging demands will save costs or batteries. In this paper, we model this new urban infrastructure network. This task is complicated by non-Poisson swaps and by the intertwined stochastic operations of swapping, charging, stocking, and circulating batteries among swapping and charging stations. We tackle these complexities by deriving analytical models, which enrich the classical batched repairable-inventory theory. We next propose a joint location and repairable-inventory model for citywide deployment of hub charging stations, with a nonconvex nonconcave objective function. We solve this problem exactly by exploiting submodularity and combining constraint-generation and parameter-search techniques. Even for solving convexified problems, our algorithm brings a speedup of at least three orders of magnitude relative to the Gurobi solver. The major insight is twofold: The benefit of pooling charging demands alone is not enough to justify the adoption of the “swap-locally, charge-centrally” network; instead, the main justification is that faster charging accessible at centralized charging stations significantly reduces the system-wide battery stock level.
In a broader sense, this work deepens our understanding of how mobility and energy are coupled toward enabling smart cities. This paper was accepted by Chung Piaw Teo, optimization. Funding: Y. Zhang acknowledges the support from the National Natural Science Foundation of China [Grants 71871023, 72271029, and 72061127001]. W. Qi acknowledges the support from the National Natural Science Foundation of China [Grants 72272014 and 72188101] and the Natural Sciences and Engineering Research Council of Canada [Grant RGPIN-2019-04769]. N. Zhang acknowledges the support from the China Scholarship Council [202106030140]. Supplemental Material: The data files and online appendices are available at https://doi.org/10.1287/mnsc.2023.4731 .
- Research Article
- 10.1287/mnsc.2022.02792
- Feb 1, 2025
- Management Science
Hedge funds with larger macroeconomic-risk betas do not earn higher returns, in contrast to the theoretically predicted risk-return trade-off. Meanwhile, high macro-beta funds deliver higher returns than low macro-beta funds following a low-sentiment period, whereas the risk-return relation is flat following a high-sentiment period. We show that the sophisticated management of hedge funds explains this pattern. The relation between funds’ macro-risk betas and the timing abilities/investor flows is sentiment dependent, and such variation likely drives the contrasting beta-return trade-offs after high- and low-sentiment periods. A similar pattern is also observed in mutual funds. This paper was accepted by Lin William Cong, finance. Funding: X. Zhu acknowledges financial support from the National Natural Science Foundation of China [Grant 72203035] and the Ministry of Education Project of Humanities and Social Sciences [Grant 22YJC790194]. Z. Chen acknowledges financial support from the National Natural Science Foundation of China [Grant 72222004] and Tsinghua University [Grant 20225080020]. Supplemental Material: The online appendix and data files are available at https://doi.org/10.1287/mnsc.2022.02792 .
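A fund's macro-risk beta is its return sensitivity to a macroeconomic risk factor. The sketch below assumes a single-factor time-series OLS regression (beta = covariance / variance) and a median split into high- and low-beta groups; the abstract does not state the paper's exact factor, estimation window, or portfolio construction, and the fund names and return series are made up for illustration.

```python
from statistics import mean


def ols_beta(returns, factor):
    """Slope of an OLS regression of returns on one factor: cov(r, f) / var(f)."""
    r_bar, f_bar = mean(returns), mean(factor)
    cov = sum((r - r_bar) * (f - f_bar) for r, f in zip(returns, factor))
    var = sum((f - f_bar) ** 2 for f in factor)
    return cov / var


def split_by_beta(fund_returns, factor):
    """Return (high-beta funds, low-beta funds), split at the median beta."""
    betas = {name: ols_beta(r, factor) for name, r in fund_returns.items()}
    ranked = sorted(betas, key=betas.get, reverse=True)
    half = len(ranked) // 2
    return ranked[:half], ranked[half:]


# Hypothetical monthly macro-factor realizations and fund returns.
macro = [0.01, -0.02, 0.03, 0.0, 0.015]
funds = {
    "Fund A": [2.0 * x + 0.001 for x in macro],
    "Fund B": [0.5 * x for x in macro],
    "Fund C": [-1.0 * x for x in macro],
    "Fund D": [1.0 * x for x in macro],
}
high, low = split_by_beta(funds, macro)
print("high macro-beta:", high, "low macro-beta:", low)
```

The abstract's test then compares subsequent returns of the high- and low-beta groups, separately after high- and low-sentiment periods.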
- Research Article
- 10.1287/mnsc.2023.01459
- Feb 11, 2025
- Management Science
Policies geared toward environmental and economic improvement could unexpectedly lead to negative consequences in other dimensions. Such cases raise a red flag for economists and policymakers who aim to deliver comprehensive and sensible policy evaluations. This article investigates antisocial behaviors in response to the Clean Winter Heating Policy (CWHP), which seeks to improve outdoor air quality. Our results show that participating villagers are more likely to illegally burn agricultural waste and exhibit lower prosociality in incentivized dictator games and public goods games. We further explore treatment heterogeneities and find that two channels are likely to play a part. First, the CWHP was perceived as a negative income shock; as a result, villagers would want to reduce their expenditure on straw disposal and behave less generously in the incentivized games. Second, the CWHP could trigger discontent and directly affect social preferences. Additional evidence suggests that the antisocial (less prosocial) responses could have been avoided by granting larger upfront subsidies. This paper was accepted by Axel Ockenfels, behavioral economics and decision analysis. Funding: J. Cao gratefully acknowledges financial support from the National Natural Science Foundation of China [Grants 72243007 and 72250064] and the Ministry of Science and Technology of the People’s Republic of China [Grant 2023YFE0112900]. T. X. Liu gratefully acknowledges financial support from the National Natural Science Foundation of China [Grants 72222005 and 72342032]. R. Ma gratefully acknowledges financial support from the National Natural Science Foundation of China [Grants 72134006 and 72304272]. A. Sun gratefully acknowledges financial support from the National Natural Science Foundation of China [Grant 72373157], Fundamental Research Funds for the Central Universities, and the Research Funds of Renmin University of China [Grant 22XNA003].
The authors are also thankful for the support from the Energy Foundation, China Southern Power Grid Co., Ltd., Research Center for Green Economy and Sustainable Development and Institute for Global Development of Tsinghua University, and the Harvard-China Project on Energy, Economy and Environment. Supplemental Material: The online appendix and data files are available at https://doi.org/10.1287/mnsc.2023.01459 .
- Research Article
- 10.1287/msom.2023.0531
- Apr 29, 2025
- Manufacturing & Service Operations Management
Problem definition: Distributionally robust optimization (DRO) is widely used to address uncertainties inherent in operations management (OM) problems. Recently, an alternative goal-driven framework, robust satisficing (RS), has been proposed. RS aims to attain a prescribed target, such as avoiding overshooting the cost budget, as much as possible under uncertainty. The goal-driven modeling philosophy fits many OM problems, yet direct comparisons between DRO and RS are lacking. In this paper, we uncover connections between DRO and RS. Methodology/results: Assuming that both models are based on the Wasserstein metric and consider a risk-aware convex objective function affected by uncertain parameters, we demonstrate that they share the same solution family. We establish the correspondence between the radius parameter in DRO and the target parameter in RS such that the optimal solutions to the two models coincide. Inspired by the globalized distributionally robust counterpart (GDRC), we extend the analysis to GDRC and globalized robust satisficing (GRS). We reveal that GDRC and GRS have the same solution families as DRO and RS, respectively. More importantly, we establish novel results on the equivalence of DRO, GDRC, RS, and GRS models under the previously stated conditions. Managerial implications: The equivalence results help unify performance bounds of DRO and RS models. Specifically, each model now has an additional set of theoretical guarantees from the other model, and any bounds derived for one model automatically apply to the other equivalent models via a parameter mapping. Despite the theoretical equivalence, the performance of the DRO and RS models can vary depending on how the model parameters are selected. The experimental findings show how these differences emerge when transitioning from theory to practice.
Additionally, the experiments provide insights for practitioners, such as how the use of cross-validation can help reflect the true model preferences, particularly when only a few validation points are available. Funding: The research of Z. Wang and L. Ran was supported by the National Natural Science Foundation of China [Grants 72272014, 91746210, and 72061127001]. Z. Wang’s research was also supported by the National Natural Science Foundation of China [Grant 72242106]. The research of M. Zhou was supported by the National Natural Science Foundation of China [Grants 72301075 and 72293564/72293560]. Supplemental Material: The online appendices are available at https://doi.org/10.1287/msom.2023.0531 .
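For reference, the two models compared in this abstract are commonly written in the forms below (standard formulations from the Wasserstein DRO and robust satisficing literature, not reproduced from this paper; the notation is ours). Here $f(\mathbf{x}, \tilde{\xi})$ is the risk-aware convex cost, $\hat{\mathbb{P}}$ the empirical distribution, $W$ the Wasserstein distance, $r \ge 0$ the DRO radius, and $\tau$ the RS target.

```latex
\begin{aligned}
\text{(DRO)}\quad & \min_{\mathbf{x} \in \mathcal{X}} \;
  \sup_{\mathbb{P}:\, W(\mathbb{P}, \hat{\mathbb{P}}) \le r} \;
  \mathbb{E}_{\mathbb{P}}\big[f(\mathbf{x}, \tilde{\xi})\big] \\
\text{(RS)}\quad & \min_{\mathbf{x} \in \mathcal{X},\; k \ge 0} \; k
  \quad \text{s.t.} \quad
  \mathbb{E}_{\mathbb{P}}\big[f(\mathbf{x}, \tilde{\xi})\big] - \tau
  \le k \, W(\mathbb{P}, \hat{\mathbb{P}}) \quad \forall \mathbb{P}
\end{aligned}
```

Setting $\mathbb{P} = \hat{\mathbb{P}}$ (where $W = 0$) shows that an RS solution must satisfy $\mathbb{E}_{\hat{\mathbb{P}}}[f(\mathbf{x}, \tilde{\xi})] \le \tau$, so the target cannot be set below the best achievable empirical cost. The equivalence result in the abstract maps each radius $r$ to a target $\tau$ (and vice versa) under which both problems share an optimal $\mathbf{x}$.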
- Research Article
- 10.1287/mnsc.2022.02414
- Sep 3, 2024
- Management Science
We propose a model of bank monitoring and borrower financial misreporting. Using the staggered liberalization of the banking sector in China as a natural experiment, we find that, consistent with the model’s prediction, entry by more efficient foreign banks reduces corporate misreporting fraud. Fraud reduction is greatest among borrowers of foreign banks, but fraud also drops among borrowers of domestic banks, suggesting a spillover effect. As predicted by the model, fraud reduction is greatest for borrowers with higher levels of fixed assets or lower levels of current assets. Our evidence suggests that improved bank monitoring reduces financial misreporting. This paper was accepted by Tomasz Piskorski, finance. Funding: M. Li acknowledges support from the National Science Foundation of China [Project 71402078] and the Social Science Foundation of Tsinghua University [Project 2013WKZD004]. Supplemental Material: The online appendices and data files are available at https://doi.org/10.1287/mnsc.2022.02414 .
- Research Article
- 10.1287/msom.2022.1135
- Aug 19, 2022
- Manufacturing & Service Operations Management
Problem definition: In brick-and-mortar fashion retail stores, inventory stockouts are frequent. When a specific size of a fashion product is out of stock, the unmet demand might not be completely lost because of spillovers to adjacent sizes of the same style or to other styles. Little research has been done to study consumer response to stockouts of fashion products because researchers had limited access to proprietary data of fashion retailers and because it is challenging to estimate stockout-based demand spillover patterns using existing approaches due to the enormous number of stockkeeping units (SKUs) and frequent stockouts in fashion retail stores. To fill this void in the literature, we empirically estimate the stockout-based demand spillover effect in a fashion retail setting. Methodology/results: We obtain a large-scale data set from a fashion retail chain selling world-renowned sportswear brands. The retail stores in the sample are dedicated to products of a single brand. Using around 1.5 million granular and real-time sales and inventory records of 217 stores, 503 men’s footwear products, and 4,024 SKUs over a two-year period, we develop a difference-in-differences framework to estimate the stockout-based cross-size demand spillover effect. We demonstrate the validity of this framework by conducting a pretrend test and a placebo test. We find that roughly 51.7% of the unmet demand of an out-of-stock SKU spills over to adjacent sizes of the same style when they are in stock: 25.1% to the adjacent-larger size and 26.6% to the adjacent-smaller size. The cross-size demand spillover effect is larger in regular stores than in flagship stores, larger for casual sports shoes than for specialized sports shoes, and larger for low-price products than for high-price products. 
Adapting an existing attribute-based demand model to our setting, we estimate that roughly 20.2% of the unmet demand of an out-of-stock SKU spills over to different styles when they are in stock. Taken together, these estimations suggest that about 28.1% of the unmet demand of an out-of-stock SKU becomes lost sales. We further find that when stockouts are widespread among SKUs, stockout-based demand spillovers are significantly reduced, resulting in much increased lost sales. Managerial implications: First, we empirically quantify the stockout-based cross-size demand spillover effect and its impact on lost sales in a brick-and-mortar fashion retail setting. Second, our simulation analysis shows that incorporating the cross-size demand spillover effect into the sportswear retail chain’s proactive transshipment decision can substantially reduce its transshipment cost and improve its profitability. Funding: S. Li and S. Huang were supported by the National Natural Science Foundation of China [Grant 72188101] and the Center for Data Centric Management in the Department of Industrial Engineering at Tsinghua University. Supplemental Material: The online appendices are available at https://doi.org/10.1287/msom.2022.1135 .
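The spillover estimates reported in this abstract combine as a simple accounting identity. The worked arithmetic below uses only the abstract's own figures, each expressed as a share of an out-of-stock SKU's unmet demand when the alternatives are in stock.

```latex
\begin{aligned}
\text{cross-size spillover} &= 25.1\% \;(\text{larger}) + 26.6\% \;(\text{smaller}) = 51.7\% \\
\text{total spillover} &= 51.7\% \;(\text{cross-size}) + 20.2\% \;(\text{cross-style}) = 71.9\% \\
\text{lost sales} &= 100\% - 71.9\% = 28.1\%
\end{aligned}
```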
- Conference Article
- 10.1109/iccoins49721.2021.9497187
- Jul 13, 2021
A high volume of unstructured data is being generated from diverse and heterogeneous sources. The unstructured data analytics process is used to extract valuable insights from these sources, but unlocking useful and usable information is critical for analytics. Despite advancements in technologies, data preparation still requires an inordinate amount of time to manipulate unstructured data into a usable form. Although several data manipulation and preparation techniques have been proposed for unstructured big data, relatively limited research has addressed the usability issues of unstructured data. To bridge this gap, this study identifies the usability issues of unstructured big data in the analytical process. A usability enhancement model is proposed for unstructured big data to improve its subjective and objective efficacy for data preparation and manipulation activities. Moreover, concept mapping, an essential element for improving the usability of unstructured big data, is incorporated into the proposed model along with usability rules. These rules reduce the usability gap between data availability and its usefulness for an intended purpose. The proposed research model will help improve the efficiency of unstructured big data analytics.
- Research Article
- 10.31635/ccschem.022.202202410
- Dec 22, 2022
- CCS Chemistry
A Cleavable Self-Inclusion Conjugate with Enhanced Biocompatibility and Antitumor Bioactivity
- Research Article
- 10.1287/mnsc.2023.04183
- Oct 28, 2025
- Management Science
Health sensing for chronic disease management creates immense benefits for social welfare. Existing health sensing studies primarily focus on the prediction of physical chronic diseases. Depression, a widespread complication of chronic diseases, is, however, understudied. We draw on the medical literature to support depression detection using motion sensor data. To involve humans in this decision making, safeguard trust, and ensure algorithm transparency, we develop an interpretable deep learning model: temporal prototype network (TempPNet). TempPNet is built on emergent prototype learning models. To accommodate the temporal characteristics of sensor data and the progressive nature of depression, TempPNet differs from existing prototype learning models in its ability to capture temporal progressions of prototypes. Extensive empirical analyses using real-world motion sensor data show that TempPNet outperforms state-of-the-art benchmarks in depression detection. Moreover, TempPNet explains its decisions by visualizing the temporal progression of depression and its corresponding symptoms detected from sensor data. We further conduct a user study and consult a medical expert panel to demonstrate its superiority over the benchmarks in interpretability. This study offers an algorithmic solution for an impactful social good: collaborative care of chronic diseases and depression in health sensing. Methodologically, it contributes to extant literature with a novel interpretable deep learning model for depression detection from sensor data. Patients, doctors, and caregivers can deploy our model on mobile devices to monitor patients’ depression risks in real time. Our model’s interpretability also allows human experts to participate in the decision making by reviewing the interpretation and making informed interventions. This paper was accepted by D. J. Wu, information systems. Funding: J. Xie and X. Fang are supported by the University of Delaware Research Foundation Strategic Initiatives Grant and the Alfred Lerner College of Business and Economics Research Grant. X. Zhao acknowledges financial support from the National Natural Science Foundation of China [Grant 72401172] and the Fundamental Research Funds for the Central Universities [Grants 2023110139 and 2023110318]. J. Xie and X. Fang did not receive any form of support from, nor do they have any affiliation with, X. Zhao’s funding sources. Supplemental Material: The online appendices and data files are available at https://doi.org/10.1287/mnsc.2023.04183 .
- Research Article
- 10.1287/mnsc.2023.4733
- Mar 10, 2023
- Management Science
As the industrial Internet-of-things (IIoT) becomes increasingly valuable, manufacturers are eager to establish IIoT-based platforms for preventative maintenance (PM). These platforms reposition the roles of manufacturers and reshape the patterns of the after-sales service market. Manufacturers can adopt either the competitive strategy, introducing improved after-sales services to compete with independent maintenance, repair, and operations firms (MROs), or the “coopetitive” strategy, simultaneously opening the platform to these MROs. However, relevant research on this topic remains scarce. Our study fills this gap by investigating a manufacturer’s decision on the IIoT-based platform and the subsequent relationship with an MRO. First, we find that, interestingly, it is sometimes beneficial for the manufacturer to establish the platform even when the product value is relatively low and IIoT adoption increases PM cost. Next, we find that even if the royalty revenue is lower than the increased IIoT costs, the manufacturer may sometimes still adopt the coopetitive strategy of opening the platform. Moreover, with the opening of the platform, the manufacturer invests more in technology, even in the competitive market. Furthermore, whenever the manufacturer opens the platform, the MRO can profit more by accessing it. Hence, there is sometimes a win-win equilibrium with IIoT adoption. In addition, we find that opening the platform sometimes reduces customer surplus but generates more social welfare. Our findings offer insightful takeaways on IIoT adoption for the manufacturer’s decisions on establishing and opening the platform, the MRO’s reaction, and policymakers’ welfare policies. This paper was accepted by Jeannette Song, operations management. Funding: This work was supported by the National Natural Science Foundation of China [Grants 71922009, 72188101, 71871080, and 72071057].
Supplemental Material: The data files and online appendices are available at https://doi.org/10.1287/mnsc.2023.4733 .
- Research Article
- 10.1287/mnsc.2022.01530
- May 8, 2024
- Management Science
This paper examines the incentive to register for deceased organ donation under alternative organ allocation priority rules, which may prioritize registered donors and/or patients with higher valuations for organ transplantation. Specifically, the donor priority rule grants higher priority on the organ waiting list to those who have previously registered as donors. The dual-incentive priority rules allocate organs based on donor status, followed by individual valuations within the same donor status, or vice versa. Both theoretical and experimental results suggest that the efficacy of the donor priority rule and the dual-incentive priority rules critically depends on the information environment. When organ transplantation valuations are unobservable prior to making donation decisions, the hybrid dual-incentive rules generate higher donation rates. In contrast, if valuations are observable, the dual-incentive priority rules create unbalanced incentives between high- and low-value agents, potentially undermining the efficacy of the hybrid dual-incentive rules in increasing overall donation rates. This paper was accepted by Marie Claire Villeval, behavioral economics and decision analysis. Funding: This research is supported by the National Natural Science Foundation of China [Grants 72173103, 72373127, and 71988101], the Singapore Ministry of Education (MOE) Academic Research Fund Tier 1 [RG57/20], and the Open Foundation of Key Laboratory of Interdisciplinary Research of Computation and Economics (Shanghai University of Finance and Economics), Ministry of Education of China. Supplemental Material: The online appendices and data files are available at https://doi.org/10.1287/mnsc.2022.01530 .
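The priority rules described in this abstract amount to lexicographic orderings of the organ waiting list. The sketch below encodes them as Python sort keys; the `Patient` fields, the two-level (high/low) valuations, and tie-breaking by list order are illustrative assumptions, not the paper's experimental design.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    name: str
    registered_donor: bool
    valuation: float  # hypothetical value of receiving a transplant


def donor_priority(p):
    # Registered donors first (False sorts before True); no further ordering.
    return (not p.registered_donor,)


def donor_then_value(p):
    # Donor status first, then higher valuation within the same donor status.
    return (not p.registered_donor, -p.valuation)


def value_then_donor(p):
    # The "vice versa" rule: higher valuation first, then donor status among equals.
    return (-p.valuation, not p.registered_donor)


def waiting_list(patients, rule):
    """Order patients for allocation; Python's stable sort keeps ties in input order."""
    return [p.name for p in sorted(patients, key=rule)]


patients = [
    Patient("A", False, 10.0),
    Patient("B", True, 5.0),
    Patient("C", True, 10.0),
    Patient("D", False, 5.0),
]
for rule in (donor_priority, donor_then_value, value_then_donor):
    print(rule.__name__, waiting_list(patients, rule))
```

Writing each rule as a tuple key makes the difference between the two hybrid rules explicit: only the order of the tuple components changes.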
- Research Article
- 10.1287/mnsc.2022.01896
- May 9, 2025
- Management Science
Firms increasingly use a combination of image and text description when displaying products and engaging consumers. Existing research has examined consumers’ response to text and image stimuli separately but has yet to systematically consider how the semantic relationship between image and text impacts consumer choice. In this research, we conduct a series of multimethod empirical studies to examine the congruence between image- and text-based product representation. First, we propose a deep-learning approach to measure image-text congruence by building a state-of-the-art two-branch neural network model based on wide residual networks and bidirectional encoder representations from transformers. Next, we apply our method to data from an online reading platform and discover a U-shaped effect of image-text congruence: Consumers’ preference toward a product is higher when the congruence between the image and text representation is either high or low than when the congruence is at the medium level. We then conduct experiments to establish the causal effect of this finding and explore the underlying mechanisms. We further explore the generalizability of the proposed deep-learning model and our substantive finding in two additional settings. Our research contributes to the literature on consumer information processing and generates managerial implications for practitioners on how to strategically pair images and text on digital platforms. This paper was accepted by Duncan Simester, marketing. Funding: J. Cao acknowledges financial support from Young Scientists Fund of National Natural Science Foundation of China [Grant 72402192], the General Research Fund [Grant 17501423] and Early Career Scheme [Grant 27502521] of the Research Grants Council of Hong Kong, and the Institute of Behavioural and Decision Science, the University of Hong Kong (HKU). Supplemental Material: The online appendices and data files are available at https://doi.org/10.1287/mnsc.2022.01896 .
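Once an image and its text are mapped into a shared embedding space, congruence can be scored as cosine similarity, and a U-shaped effect is typically tested with a squared congruence term in the regression. The sketch below assumes both; the vectors are placeholders for the outputs of the paper's two-branch (wide residual network plus BERT) model, which is not reproduced here, and the function names and coefficients are hypothetical.

```python
import math


def cosine_congruence(img_vec, txt_vec):
    """Cosine similarity between an image embedding and a text embedding."""
    dot = sum(a * b for a, b in zip(img_vec, txt_vec))
    norm = math.sqrt(sum(a * a for a in img_vec)) * math.sqrt(sum(b * b for b in txt_vec))
    return dot / norm


def quadratic_turning_point(b1, b2):
    """For y = b0 + b1*c + b2*c**2, return (vertex c*, whether the curve is U-shaped)."""
    return -b1 / (2 * b2), b2 > 0


# Placeholder embeddings standing in for the image and text branches.
image_emb = [0.2, 0.7, 0.1, 0.4]
text_emb = [0.3, 0.5, 0.0, 0.6]
print("congruence:", cosine_congruence(image_emb, text_emb))

# Hypothetical regression coefficients on congruence and congruence squared.
vertex, u_shaped = quadratic_turning_point(b1=-1.2, b2=1.5)
print("preference is lowest near c* =", vertex, "| U-shaped:", u_shaped)
```

A positive squared-term coefficient with a vertex inside the observed congruence range is what a U-shaped finding like the one above would look like in such a regression.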
- Research Article
- 10.1287/mnsc.2023.03895
- Nov 4, 2025
- Management Science
- Research Article
- 10.1287/mnsc.2024.08469
- Nov 3, 2025
- Management Science
- Research Article
- 10.1287/mnsc.2023.01192
- Nov 3, 2025
- Management Science
- Research Article
- 10.1287/mnsc.2024.08815
- Nov 3, 2025
- Management Science
- Research Article
- 10.1287/mnsc.2023.03157
- Nov 3, 2025
- Management Science
- Research Article
- 10.1287/mnsc.2025.eb.v71n11
- Nov 1, 2025
- Management Science
- Research Article
- 10.1287/mnsc.2024.06469
- Nov 1, 2025
- Management Science
- Research Article
- 10.1287/mnsc.2024.04939
- Oct 30, 2025
- Management Science
- Research Article
- 10.1287/mnsc.2024.05922
- Oct 29, 2025
- Management Science
- Research Article
- 10.1287/mnsc.2022.02048
- Oct 29, 2025
- Management Science