Abstract

Prediction over tabular data is a crucial task in many real-life applications. Recent advances in deep learning have given rise to various deep models for tabular data prediction. A common and essential step in these models is to vectorize the raw input features of tabular data into dense embeddings. Choosing a suitable embedding dimension for each feature is challenging, yet necessary to improve the model's performance and reduce the memory cost of model parameters. Existing solutions to embedding dimensionality search choose dimensions from a restricted candidate set. This restriction improves search efficiency but can produce suboptimal embedding dimensions that hurt the model's predictive performance. In this paper, we develop AutoSrh, a flexible embedding dimensionality search framework that can select different dimensions for different features through differentiable optimization. The key idea of AutoSrh is to relax the search space to be continuous and optimize the selection of embedding dimensions via gradient descent. After optimization, AutoSrh performs embedding pruning to derive the mixed embedding dimensions and retrains the model to further improve performance. Extensive experiments on five real-world tabular datasets demonstrate that AutoSrh achieves better predictive performance than existing approaches with 1.1x to 1.6x lower training time cost, and preserves the model's predictive performance while reducing embedding parameters by 50% to 95%.
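The search-then-prune idea described above can be illustrated with a toy sketch. This is not AutoSrh's actual formulation. It assumes each feature has a fixed maximum embedding size and gives every (feature, dimension) pair a continuous gate sigmoid(logit), which relaxes the discrete "keep or drop" choice. The real prediction loss is replaced by a hypothetical per-dimension utility minus a sparsity penalty lam. Gradient descent on the logits pushes the gates of useful dimensions up and the others down. Pruning then keeps only the dimensions whose gate exceeds 0.5. The function search_dims, the utility values, and the feature names are illustrative assumptions, and the retraining step is omitted.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def search_dims(utilities, lam=0.1, lr=1.0, steps=200, threshold=0.5):
    """Toy continuous relaxation of embedding dimension search (hypothetical).

    utilities: dict mapping a feature name to a list of assumed per-dimension
    utilities (a stand-in for how much each dimension reduces the real loss).
    Returns the number of dimensions kept per feature after pruning.
    """
    # One continuous gate logit per (feature, dimension); logit 0 -> gate 0.5.
    logits = {f: [0.0] * len(u) for f, u in utilities.items()}
    for _ in range(steps):
        for f, u in utilities.items():
            for j, u_j in enumerate(u):
                g = sigmoid(logits[f][j])
                # Toy loss for this gate: -u_j * g + lam * g.
                # Its derivative w.r.t. the logit is (lam - u_j) * g * (1 - g).
                grad = (lam - u_j) * g * (1.0 - g)
                logits[f][j] -= lr * grad
    # Pruning: keep a dimension only if its learned gate exceeds the threshold.
    return {
        f: sum(1 for z in zs if sigmoid(z) > threshold)
        for f, zs in logits.items()
    }
```

With lam=0.1, calling search_dims({"user_id": [0.9, 0.5, 0.3, 0.05], "hour": [0.4, 0.02, 0.01, 0.0]}) keeps 3 dimensions for user_id and 1 for hour. Each feature ends up with its own embedding size, rather than a size picked from a fixed candidate set.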
