Abstract
The Andersen-Forbes BH database is based upon the text of L, from which cantillations have been omitted, obvious errors corrected, orthographic words segmented or ligatured by rule, and homographs resolved. Each segment has an associated set of grammatical features and an assistive gloss. Our linguistic preferences favour data-driven over theory-driven analyses, language performance over language competence, and quantitative over qualitative language models. In our research, we rely on successive approximations, planning at least one step ahead at each stage. In representing grammatical structure, we opt for simple descriptive features displayed in a single-level environment that allows representation of non-binary, discontinuous, and ambiguous situations. As our work has progressed, refinements and extensions have been added, among them naïve semantic categories, constituent licensing relations, and semantic roles. With proper care, the grammatical data may be productively probed. As we enhance the database’s consistency, we are also extending its linguistic coverage and refining its search methodology.