Who will Win the Data Science Competition? Insights from KDD Cup 2019 and Beyond

Hao Liu, Dejing Dou, Qingyu Guo, Hengshu Zhu, Hui Xiong, Fuzhen Zhuang, Shenwen Yang

doi:10.1145/3511896

Abstract

Data science competitions are increasingly popular: they let enterprises collect advanced, innovative solutions and let contestants sharpen their data science skills. Most existing studies of data science competitions focus on improving task-specific techniques, such as algorithm design and parameter tuning. However, little effort has been made to understand the data science competition itself. To this end, in this article, we shed light on teams’ competition performance and investigate how that performance evolves in the crowd-sourced competitive innovation context. Specifically, we first acquire and construct multi-sourced datasets from various data science competitions, including the KDD Cup 2019 machine learning competition and beyond. Then, we conduct an empirical analysis to identify and quantify a rich set of features that are significantly correlated with teams’ future performance. Using a team’s rank as a proxy for its performance, we observe a “the stronger, the stronger” rule; that is, top-ranked teams tend to keep their advantage and dominate weaker teams for the rest of the competition. Our results also confirm that teams with diversified backgrounds tend to achieve better performance. After that, we formulate the team future-rank prediction problem and propose the Multi-Task Representation Learning (MTRL) framework to model both static and dynamic features. Extensive experimental results on four real-world data science competitions demonstrate that a team’s future performance can be predicted well using MTRL. Finally, we envision that our study will not only help competition organizers better understand their competitions, but also provide strategic implications for contestants, such as guidance on team formation and submission strategy.

Journal: ACM Transactions on Knowledge Discovery from Data
Publication Date: Apr 5, 2022
Citations: 2

