Abstract
Full-text search is a widely applied query interface for accessing and filtering the content of life-science databases. But its high flexibility and intuitiveness are paid for with potentially imprecise and incomplete query results. To reduce this drawback, query assistance systems suggest those combinations of keywords with the highest potential to match most of the relevant data records. Widespread approaches are syntactic query corrections that catch misspellings and support the expansion of words by suffixes and prefixes. Synonym expansion approaches apply thesauri, ontologies, and query logs. All need laborious curation and maintenance. Furthermore, access to query logs is in general restricted. Approaches that infer related queries from a user’s query profile, such as research field, geographic location, co-authorship, or affiliation, require user registration and public accessibility of that profile, which conflicts with privacy concerns. To overcome these drawbacks, we implemented LAILAPS-QSM, a machine learning approach that reconstructs possible linguistic contexts of a given keyword query. The context is inferred from the text records stored in the databases that are going to be queried or, for general-purpose query suggestion, extracted from PubMed abstracts and UniProt data. The supplied tool suite enables the pre-processing of these text records and the subsequent computation of customized distributed word vectors. The latter are used to suggest alternative keyword queries. The query suggestion quality was evaluated for plant science use cases. Locally present experts enabled a cost-efficient quality assessment in the categories trait, biological entity, taxonomy, affiliation, and metabolic function, which was performed using ontology term similarities. The mean information content similarity of LAILAPS-QSM for 15 representative queries is 0.70, with 34% scoring above 0.80. In comparison, the information content similarity for query suggestions made by human experts is 0.90. The software is available either as a tool set to build and train dedicated query suggestion services or as an already trained general-purpose RESTful web service. The service uses open interfaces so that it can be seamlessly embedded into database frontends. The Java implementation uses highly optimized data structures and streamlined code to provide fast and scalable responses to web service calls. The source code of LAILAPS-QSM is available under the GNU General Public License version 2 in a Bitbucket Git repository: https://bitbucket.org/ipk_bit_team/bioescorte-suggestion
Highlights
In order to retrieve and explore database content, query interfaces are required
Human-computer interfaces (HCI) make use of these application programming interfaces (API) to provide frontends for interacting with the databases
After analyzing query logs of the LAILAPS search engine [18] and The Arabidopsis Information Resource (TAIR) [19], we found that more than 61.5% of queries are composed of more than one term
Summary
In order to retrieve and explore database content, query interfaces are required. Viewed simply, these are brokers between the user’s information needs and the database content, which is accessible through declarative query languages like SQL or imperative application programming interfaces (APIs). Keywords need to be tokenized, i.e. phrases are decomposed into words, prefixes and punctuation are removed, abundant tokens are filtered, and spelling is corrected or terms are mapped to a representative vocabulary and synonyms. These tasks can be parameterized to match the particular properties of the database and search engine.
Mikolov [11] published the Word2vec algorithm. It uses a log-linear neural network to quickly train vector representations of words from large text corpora. An already compiled version is available as a JAR package for immediate use on the command line: https://bitbucket.org/ipk_bit_team/bioescorte-suggestion The final trained model is stored as a binary-encoded associative list of word to word-vector pairs.
To provide the query suggestion module as a remote API to the community, a RESTful web service was implemented. It accepts a user-typed query and returns a JSON-based document with the top five alternative query suggestions. An example API call for alternatives for “heading date” is http://webapps.ipk-gatersleben.de/lailapssuggestion/api/v1/semanticsuggestion/heading%20date The JSON result is { “suggestions”:
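As an illustration of the API call described above, the following minimal Python sketch builds the request URL for a keyword query and, optionally, fetches the JSON document. The base URL is taken from the example call in the text. The function names `build_suggestion_url` and `fetch_suggestions` are hypothetical, as is the assumption that the service answers a plain GET request with a UTF-8 JSON body. The excerpt shows only the opening of the result (a top-level "suggestions" key), so the sketch returns the decoded document unchanged rather than assuming its layout. The service may no longer be reachable at this address.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base URL taken from the example call in the text.
BASE_URL = (
    "http://webapps.ipk-gatersleben.de/lailapssuggestion"
    "/api/v1/semanticsuggestion/"
)


def build_suggestion_url(query: str) -> str:
    """Return the request URL for a keyword query (hypothetical helper)."""
    # quote() with safe="" percent-encodes spaces as %20 and also escapes
    # "/", so the whole query stays a single path segment. Whether the
    # service expects "/" to be escaped is an assumption.
    return BASE_URL + quote(query.strip(), safe="")


def fetch_suggestions(query: str, timeout: float = 10.0):
    """Call the service and return the decoded JSON document unchanged.

    Assumption: the service answers a plain GET with a UTF-8 JSON body.
    Only the top-level "suggestions" key is shown in the text, so no
    further structure is assumed here.
    """
    with urlopen(build_suggestion_url(query), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only builds and prints the URL; no network access is needed.
    print(build_suggestion_url("heading date"))
```

Building the URL separately from the network call keeps the encoding step easy to check offline: for “heading date” it reproduces the example URL ending in `heading%20date`.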