Abstract

In this study, we propose self-retrieval learning, a self-supervised learning method for information retrieval that does not require an annotated dataset. In self-retrieval learning, keywords extracted from documents are used as queries to construct training data that imitate the query-corpus relationship, such that each document is trained to retrieve itself. Conventional supervised learning for information retrieval requires query-document pairs as training data, whereas self-retrieval learning does not. Moreover, it uses only the text of the documents in the target domain, without relying on information such as reference lists or other documents linked to the query. In our experiments, we applied self-retrieval learning to an EU and UK legal document retrieval task using the DRMM retrieval model. We found that self-retrieval learning not only eliminates the need for a supervised dataset but also outperforms supervised learning with the same model in retrieval accuracy. © 2024 Institute of Electrical Engineers of Japan and Wiley Periodicals LLC.
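
The abstract does not specify how keywords are selected or how the pseudo training pairs are formatted, so the following is only a minimal sketch of the general idea, assuming a simple TF-IDF-style keyword scorer and (pseudo-query, positive document) pairs; the function and parameter names are illustrative, not the authors' implementation.

```python
# Illustrative sketch of self-retrieval pair construction: extract top-scoring
# keywords from each document and use them as a pseudo-query whose positive
# (relevant) example is the document itself. The keyword scoring below is a
# plain TF-IDF heuristic chosen for the sketch, not necessarily the method
# used in the paper.
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def build_self_retrieval_pairs(corpus, num_keywords=5):
    """Return (pseudo_query, positive_document) pairs for training a retriever."""
    doc_tokens = [tokenize(doc) for doc in corpus]
    # Document frequencies for the TF-IDF keyword score.
    df = Counter(term for tokens in doc_tokens for term in set(tokens))
    n_docs = len(corpus)
    pairs = []
    for doc, tokens in zip(corpus, doc_tokens):
        tf = Counter(tokens)
        scores = {t: tf[t] * math.log(n_docs / (1 + df[t])) for t in tf}
        keywords = sorted(scores, key=scores.get, reverse=True)[:num_keywords]
        # The pseudo-query is built from the document's own keywords, so the
        # document "retrieves itself" during training.
        pairs.append((" ".join(keywords), doc))
    return pairs

if __name__ == "__main__":
    corpus = [
        "The regulation applies to the processing of personal data within the Union.",
        "The act amends provisions on consumer contracts in the United Kingdom.",
    ]
    for query, doc in build_self_retrieval_pairs(corpus, num_keywords=3):
        print(query, "->", doc[:45])
```

Pairs produced this way could then be used as the training data for a neural ranking model such as DRMM in place of annotated query-document pairs.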
