Abstract

There is a vast amount of freely available data containing insights that can transform and accelerate progress in science, technology, and business. The great majority of these insights go undiscovered because analyzing data is so difficult and time-consuming. A key source of this difficulty is that questions and analyses that take human scientists only a few seconds to formulate in human language can take days to carry out with existing computer interfaces and data analysis software. For example, consider the task of determining the five molecules that most highly correlate with the expression of genes regulating neurotransmitters involved in creating or erasing long-term memories. Today, this task can take a highly trained researcher days to accomplish, even though all the required data is publicly available and the computational power to analyze it is very inexpensive. Such analyses are expensive because of the computer interfaces people must use to perform them. One cannot simply ask a computer in ordinary human language, “Which are the five molecules that most highly correlate with the expression of genes regulating neurotransmitters involved in creating or erasing long-term memories?” and get the answer. Instead, because computers' natural language processing abilities are limited, people must use cumbersome software and often write specially tailored programs to perform such analyses. If these limitations were overcome, scientists would become dramatically more productive. They could generate and explore hypotheses that were previously too time-consuming to pursue, and they could ask and answer literally orders of magnitude more questions than before. A key challenge in raising the natural language processing abilities of computers to the level where they become useful for data analysis is the gulf between the nature of human and computer languages.
Human language is generally ambiguous, incomplete, and often non-literal, while computer languages are specifically and purposely designed to be unambiguous, complete, and literal. Although simple queries such as “restaurants near Boston” can be understood by search engines and products such as Siri, many of the most important use cases require computers to understand far more complex utterances.