Incorporating linguistic structure into statistical language models

Ronald Rosenfeld
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA

Published: 15 April 2000
https://doi.org/10.1098/rsta.2000.0588

Cite this article: Rosenfeld, Ronald (2000). Incorporating linguistic structure into statistical language models. Phil. Trans. R. Soc. A 358, 1311–1324. http://doi.org/10.1098/rsta.2000.0588

Abstract

Statistical language models estimate the distribution of natural language for the purpose of improving various language technology applications. Ironically, the most successful models of this type take little advantage of the nature of language. I review the extent to which various aspects of natural language are captured in current models. I then describe a general framework, recently developed at our laboratory, for incorporating arbitrary linguistic structure into a statistical framework, and present a methodology for eliciting linguistic features currently missing from the model. Finally, I ponder our failure heretofore to integrate linguistic theories into a statistical framework, and suggest possible reasons for it.
This Issue: 15 April 2000, Volume 358, Issue 1769. Discussion Meeting Issue ‘Computers, language and speech: formal theories and statistical data’ organized by the Royal Society and the British Academy.

Article Information
DOI: https://doi.org/10.1098/rsta.2000.0588
Published by: Royal Society
Print ISSN: 1364-503X
Online ISSN: 1471-2962
History: Published online 15/04/2000; published in print 15/04/2000
Keywords: Human language technologies; Statistical language modelling; Feature induction

