Abstract

We are in the middle of a historic paradigm shift, a change similar in scale to those that confronted the Library of Alexandria twenty-two centuries ago. Metadata, indexes and taxonomies were the paradigm during the age of paper and print, and librarians and publishers relied on them for searching. Now the number of documents has grown to levels that make those traditional tools less efficient for users and less affordable for publishers. In the last three decades, however, search technologies have produced new solutions such as direct queries, relevance ranking and faceted results, along with the promise of conceptual search engines and ontologies. Yet this integration of legal knowledge has not proven scalable in large databases: improvements in recall come at the cost of precision and performance. We have focused on one key behavior of legal experts in legal searches: the creation of “better queries” as a result of their knowledge of the domain and of search techniques. The same happens in classical taxonomy-based searches, but in full-text search we can try to encode part of that knowledge in the search engine itself. To achieve this goal, we have developed both the technology to semantically analyze documents and queries and a methodology to populate a dictionary with 10,000 concepts and 40,000 expressions. The system has been put into production on a database of 3 million legal documents. In addition to the semantic improvements, these developments have yielded significant improvements in the relevance algorithm and complementary tools such as dynamic summaries and query reformulation through local context analysis.

