Abstract
This paper examines the challenging problem of new-user cold starts in subset-labelled and extremely sparsely labelled big data. We introduce a new Isle of Wight Supply Chain (IWSC) dataset demonstrating these characteristics. We also introduce a new technique addressing these challenges, the Transitive Semantic Relationships (TSR) model, which infers potential relationships from user and item text content and a few labelled examples. We perform both implicit and explicit evaluation of TSR as a recommender system. For new-user cold starts, we achieve a hit-rate@10 of 77% on a collection of 630 items with only 376 supply-chain consumer labels, and 67% with only 142 supply-chain supplier labels, demonstrating a high level of performance even with extremely few labels in challenging cold-start scenarios. TSR is suitable for any dataset featuring few labels and user and item content, where similarity of content indicates similar relationship-forming capability. TSR can be used as a standalone recommender system or to complement existing high-performance recommender models that require more labels or do not support cold starts.
Highlights
New Big Data recommendation systems face a high barrier to entry due to the large labelled data requirement of most existing recommendation techniques, such as collaborative filtering, and of bespoke deep learning models such as that of Suglia et al. [23].
A practical example of the former might be a collection of resumes and a collection of job adverts, while an example of the latter might be descriptions of companies looking for supply chain opportunities, as in the Isle of Wight Supply Chain (IWSC) dataset on which we evaluate Transitive Semantic Relationships (TSR) later in this paper.
It is of particular interest that this model was fine-tuned on the SNLI dataset [5], a set of sentence pairs labelled as contradiction, entailment, or unrelated. We speculate that this may require the model to learn linguistic features similar to those needed for the supply chain inference task: the ability to discern whether pairs of descriptions are entailed or contradictory is essential to human judgements for this task, in particular in determining whether companies serve similar supply chain roles.
Summary
New Big Data recommendation systems face a high barrier to entry due to the large labelled data requirement of most existing recommendation techniques, such as collaborative filtering, and of bespoke deep learning models such as that of Suglia et al. [23]. In this paper we investigate this data problem and the limitations of existing recommender systems. We then introduce a new technique for providing personalised recommendations to new users, even in highly challenging datasets where few labels are available for most items and no labels are known for the new user, and without requiring the user to answer a questionnaire. We achieve this by using user and item content information, in the form of natural language descriptive text, to expand on the few or sparsely distributed known relationships in the dataset.
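The idea of expanding a few known relationships through content similarity can be illustrated with a minimal sketch. This is not the TSR implementation: the bag-of-words cosine similarity is a stand-in for a learned sentence embedding, and the function names, the max-similarity scoring rule, and the example companies are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def bow(text):
    """Bag-of-words vector; a crude stand-in for a sentence embedding."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def recommend(new_user_text, user_texts, labels, k=10):
    """Rank items for an unlabelled user via labelled users with similar text.

    user_texts: {user_id: description}
    labels: iterable of (user_id, item_id) known relationships
    Scoring rule (an assumption): an item's score is the highest similarity
    between the new user and any labelled user linked to that item.
    """
    query = bow(new_user_text)
    scores = defaultdict(float)
    for user_id, item_id in labels:
        sim = cosine(query, bow(user_texts[user_id]))
        scores[item_id] = max(scores[item_id], sim)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, score in ranked[:k] if score > 0]

# Hypothetical data: two labelled companies and one known link each.
example_users = {
    "bakery": "artisan bakery producing bread and cakes",
    "garage": "car repair garage and vehicle servicing",
}
example_labels = [("bakery", "flour_mill"), ("garage", "parts_wholesaler")]

# A new user with no labels is matched through description text alone.
top_items = recommend("small bakery selling bread and pastries",
                      example_users, example_labels)
```

In this toy setting the new bakery has no labels of its own, yet it inherits the existing bakery's supplier link because their descriptions overlap. A real system would replace `bow` with a stronger text representation.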