Abstract

This paper investigates the effectiveness of time-dependent data in improving the quality of AI-based products and services. Time-dependency means that data loses its relevance to a problem over time. This loss degrades the algorithm's performance and thereby reduces the business value it creates. We model time-dependency as a shift in the probability distribution and derive several counter-intuitive results. We prove theoretically that even an infinite amount of data collected over time may have limited value for predicting the future, and that an algorithm trained on a current dataset of bounded size can attain similar performance. Moreover, we prove that increasing data volume by including older datasets may put a company at a disadvantage. Based on these results, we address the question of how data volume creates a competitive advantage. We argue that time-dependency weakens the barrier to entry that data volume creates for a business, so much so that competing firms equipped with a limited, but sufficient, amount of current data can attain better performance. This result, together with the fact that older datasets may degrade algorithms' performance, casts doubt on the significance of first-mover advantage in AI-based markets. We complement our theoretical results with an experiment in which we empirically measure the value loss of text data for the next-word prediction task. The empirical measurements confirm the significance of time-dependency and value depreciation in AI-based businesses. For example, after seven years, 100MB of text data becomes as useful as 50MB of current data for the next-word prediction task.
