Siamese BERT-Based Model for Web Search Relevance Ranking Evaluated on a New Czech Dataset

Matěj Kocián, Vladimír Kadlec, Jakub Náplava, Daniel Štancl

doi:10.1609/aaai.v36i11.21502

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: Jun 28, 2022
Citations: 6

Affiliation: Seznam.cz (Czech Republic)

#Czech Dataset #Commercial Search Engine
Abstract

Web search engines focus on serving highly relevant results within hundreds of milliseconds. Pre-trained language transformer models such as BERT are therefore hard to use in this scenario due to their high computational demands. We present our real-time approach to the document ranking problem leveraging a BERT-based Siamese architecture. The model is already deployed in a commercial search engine and improves production performance by more than 3%. For further research and evaluation, we release DaReCzech, a unique dataset of 1.6 million Czech user query-document pairs with manually assigned relevance levels. We also release Small-E-Czech, an Electra-small language model pre-trained on a large Czech corpus. We believe these resources will support the endeavours of both the search relevance and the multilingual-focused research communities.