Abstract
In the Big Data era, informational systems involving humans and machines are being deployed in multifarious societal settings. Many use data analytics as subcomponents for descriptive, predictive, and prescriptive tasks, often trained using machine learning. Yet when analytics components are placed in large-scale sociotechnical systems, it is often difficult to characterize how well the systems will act, measured with criteria relevant in the world. Here, we propose a system modeling technique that treats data analytics components as 'noisy black boxes' or stochastic kernels, which, together with elementary stochastic analysis, provides insight into fundamental performance limits. An example application is helping prioritize people's limited attention, where learning algorithms rank tasks using noisy features and people sequentially select from the ranked list. This paper demonstrates the general technique by developing a stochastic model of analytics-enabled sequential selection, deriving fundamental limits using concomitants of order statistics, and assessing those limits in terms of system-wide performance metrics such as screening cost and value of objects selected. Connections to sample complexity for bipartite ranking are also made.
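The 'noisy black box' abstraction can be made concrete with a minimal sketch: the analytics component is modeled as a stochastic kernel that maps an object's true value to a random score. Additive Gaussian noise is an illustrative assumption here, not the paper's specific model, and all names are hypothetical.

```python
import random

def analytics_kernel(true_value, noise_sd=0.5, rng=random):
    """A 'noisy black box': a stochastic kernel mapping an object's
    true value to a random score. Additive Gaussian noise is an
    illustrative choice, not a claim about any particular algorithm."""
    return true_value + rng.gauss(0.0, noise_sd)

# Objects have latent true values; the person only sees the noisy scores,
# and so screens objects in score order rather than true-value order.
rng_values = random.Random(1)
values = [rng_values.gauss(0.0, 1.0) for _ in range(5)]

rng_scores = random.Random(2)
scores = [analytics_kernel(v, rng=rng_scores) for v in values]

# True values read off in score order: concomitants of order statistics.
concomitants = [v for _, v in sorted(zip(scores, values), reverse=True)]
```

Ranking by score and then examining the induced ordering of true values is exactly the concomitant-of-order-statistics viewpoint the abstract refers to.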
Highlights
There is an emerging ubiquity to data analytics that have multifarious machine learning and data mining algorithm subcomponents and that are embedded in sociotechnical systems, such as firms and cities
Data analytics have emerged as a key driver of value in business operations and allow firms to differentiate themselves in competitive markets (Apte et al., 2003; Davenport and Harris, 2007; Varshney and Mojsilović, 2011)
In the remainder of this paper, we demonstrate the approach of treating machine learning components as stochastic kernels in analyzing the performance of sociotechnical systems, through an example of sequential selection
Summary
There is an emerging ubiquity to data analytics that have multifarious machine learning and data mining algorithm subcomponents and that are embedded in sociotechnical systems, such as firms and cities. The deliberately simple theoretical approach is meant to yield insights for consumption by potential users of data systems, such as business executives or city government officials. Such users are interested in understanding the basic trade-offs present in these systems under metrics they care about, knowing how much value an algorithm deployment effort can provide, and determining whether it is worthwhile spending time and energy developing specific advanced algorithms. They are typically not interested in detailed evaluation of specific algorithm performance. We describe how the approach was successfully used by human resource executives in a large multinational corporation and by government officials in a medium-sized American city
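The trade-offs such users care about can be illustrated with a small Monte Carlo sketch of analytics-enabled sequential selection, under assumed Gaussian values and additive Gaussian score noise (illustrative choices; the function names and parameters are hypothetical). It compares the value captured by selecting from a noisily ranked list against an oracle ranking, and measures a simple screening cost.

```python
import random
import statistics

def simulate_selection(n_items=1000, n_select=10, noise_sd=0.5,
                       n_trials=200, seed=0):
    """Monte Carlo sketch of analytics-enabled sequential selection.

    The analytics component is treated as a stochastic kernel mapping a
    true value V to a noisy score S = V + noise. Items are ranked by S;
    the true values read off in that order are concomitants of order
    statistics. Returns the mean value captured relative to an oracle
    that ranks by V, and the mean screening depth needed to reach all
    n_select truly best items.
    """
    rng = random.Random(seed)
    ratios, depths = [], []
    for _ in range(n_trials):
        values = [rng.gauss(0.0, 1.0) for _ in range(n_items)]   # true values
        scores = [v + rng.gauss(0.0, noise_sd) for v in values]  # noisy scores
        ranked = sorted(range(n_items), key=scores.__getitem__, reverse=True)

        # Value captured by taking the analytics-ranked top n_select
        picked = sum(values[i] for i in ranked[:n_select])
        oracle = sum(sorted(values, reverse=True)[:n_select])
        ratios.append(picked / oracle)

        # Screening cost: how deep in the ranked list one must look to
        # encounter every one of the n_select truly best items
        best = set(sorted(range(n_items), key=values.__getitem__,
                          reverse=True)[:n_select])
        depths.append(max(pos for pos, i in enumerate(ranked, 1) if i in best))
    return statistics.mean(ratios), statistics.mean(depths)

ratio, depth = simulate_selection()
print(f"value vs oracle: {ratio:.2f}, mean screening depth: {depth:.1f}")
```

Sweeping `noise_sd` in such a simulation shows the qualitative trade-off an executive might ask about: how quickly screening cost grows, and captured value shrinks, as the analytics component gets noisier.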