Computational Modeling of Language Development
As has long been argued in the field of first language acquisition, the input to grammar learning is impoverished: the grammatical structures of utterances are not present in the input, explicit corrections are rare, and in pro-drop languages such as Mandarin Chinese, verb arguments are often omitted from utterances. While this observation has in the past led to proposals that various aspects of language are innate, only more recently have researchers studied children with an eye towards the abilities they bring to the task of language learning.
Taking the view, as in construction grammar, that grammars consist of conventionalized mappings between form and meaning, I work on a computational model of child grammar learning that demonstrates how the problem of impoverished input is alleviated by bootstrapping from context. By tracking situational and discourse events with a dynamically updated model of context, the learner can partially infer the meaning structures of utterances, which in turn provides leverage for hypothesizing new grammatical structures. Using Embodied Construction Grammar, a unification-based grammar formalism, my model learns early grammatical constructions from situated language input within a Bayesian learning framework.
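To make the Bayesian trade-off concrete, here is a minimal sketch, not the model's actual implementation. It assumes a toy representation in which a construction (the hypothetical `Construction` class) pairs a word pattern, possibly containing slots such as `<Object>`, with a meaning label that the learner is assumed to have already inferred from context. The score is an unnormalized log posterior: a simplicity prior that penalizes grammar size, plus the log likelihood of the situated utterances. The prior, the uniform slot-filling likelihood, the `floor` value and all names are illustrative assumptions.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Construction:
    """Hypothetical toy construction: a form pattern paired with a meaning label."""
    name: str
    form: tuple   # words, or slots written as "<Role>" that match any single word
    meaning: str  # label of the meaning schema the construction evokes

def is_slot(element):
    return element.startswith("<") and element.endswith(">")

def matches(form, words):
    """True if the form pattern lines up word-for-word with the utterance."""
    return len(form) == len(words) and all(
        is_slot(f) or f == w for f, w in zip(form, words)
    )

def log_prior(grammar, cost_per_symbol=1.0):
    # Simplicity prior (assumed): log P(G) = -cost * description length of G,
    # where each construction costs one symbol per form element plus one for its meaning.
    return -cost_per_symbol * sum(len(c.form) + 1 for c in grammar)

def utterance_prob(grammar, words, meaning, vocab_size):
    # P(u | G) (assumed): choose a construction uniformly, then fill each slot
    # with a word drawn uniformly from a vocabulary of size vocab_size.
    if not grammar:
        return 0.0
    total = sum(
        (1.0 / vocab_size) ** sum(is_slot(f) for f in c.form)
        for c in grammar
        if c.meaning == meaning and matches(c.form, words)
    )
    return total / len(grammar)

def log_posterior(grammar, corpus, vocab_size, floor=1e-6):
    # Unnormalized log P(G | D) = log P(G) + sum over utterances of log P(u | G).
    # Utterances the grammar cannot analyze get a small floor probability.
    ll = sum(
        math.log(max(utterance_prob(grammar, w, m, vocab_size), floor))
        for w, m in corpus
    )
    return log_prior(grammar) + ll

# Toy corpus: (words, meaning inferred from context) pairs.
corpus = [(("throw", "ball"), "Throw"),
          (("throw", "block"), "Throw"),
          (("throw", "doll"), "Throw")]
general = [Construction("THROW-OBJECT", ("throw", "<Object>"), "Throw")]
specific = [Construction(f"THROW-{o.upper()}", ("throw", o), "Throw")
            for o in ("ball", "block", "doll")]
print(log_posterior(general, corpus, vocab_size=10))   # about -9.91
print(log_posterior(specific, corpus, vocab_size=10))  # about -12.30
```

Under these assumptions the single general construction outscores the three item-specific ones: it pays a likelihood cost for its open slot but saves more in description length, which is the kind of trade-off that lets a Bayesian learner prefer generalizations.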
Situated Language Understanding
Language is used in context, so a system for language understanding must take contextual information into account. This holds across languages but is particularly salient for pro-drop languages such as Mandarin, where arguments (e.g. syntactic subjects and objects) are often omitted and a complete understanding can only be obtained by retrieving the omitted referents from context.
In our group, we worked on an improved grammar representation and supporting processing machinery to handle omissible and optional constructional constituents. This allows patterns of productive argument omission to be represented succinctly in the grammar without losing the generalizations across those patterns. The backbone of this support is a best-fit constructional analyzer, developed by my colleague John Bryant, together with a model of context that is dynamically updated through simulation. The current context model is implemented as a simple relational database, but we believe it can serve as a basis for extensions that handle more complex linguistic phenomena such as mental spaces.
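The following Python sketch illustrates how a relational context model can supply omitted arguments. It is a toy, not the best-fit analyzer or the group's actual context model: the single `entity` table, the recency-based salience heuristic, the `Constituent` class, the `THROW` construction for Mandarin 'reng' ("throw") and the `analyze` function are all hypothetical. Explicitly expressed arguments are assumed to have been identified by an upstream analyzer; any omissible role left unfilled is resolved to the most recently seen entity of the required type, using Python's standard `sqlite3` module.

```python
import sqlite3
from dataclasses import dataclass

@dataclass(frozen=True)
class Constituent:
    role: str
    type: str
    omissible: bool

# Hypothetical toy construction for Mandarin "reng" (throw), in which both
# the thrower and the thrown object may be left unexpressed.
THROW = ("reng", [Constituent("thrower", "Animate", True),
                  Constituent("theme", "Object", True)])

def make_context():
    # Hypothetical schema: one table of entities currently in context.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE entity (id TEXT PRIMARY KEY, type TEXT, last_seen INTEGER)")
    return db

def update_context(db, entity_id, entity_type, time):
    # Record (or refresh) an entity as the situation and discourse unfold.
    db.execute("INSERT OR REPLACE INTO entity VALUES (?, ?, ?)",
               (entity_id, entity_type, time))

def most_recent(db, entity_type):
    # Salience approximated by recency: the latest entity of the required type.
    row = db.execute(
        "SELECT id FROM entity WHERE type = ? ORDER BY last_seen DESC LIMIT 1",
        (entity_type,),
    ).fetchone()
    return row[0] if row else None

def analyze(construction, explicit, db):
    """Bind roles from explicit fillers, filling omitted omissible roles from context.

    `explicit` maps role -> filler for arguments actually present in the utterance.
    Returns None if some role can be filled neither explicitly nor from context.
    """
    _, constituents = construction
    bindings = {}
    for c in constituents:
        if c.role in explicit:
            bindings[c.role] = explicit[c.role]
        elif c.omissible and (filler := most_recent(db, c.type)) is not None:
            bindings[c.role] = filler
        else:
            return None
    return bindings

db = make_context()
update_context(db, "mother", "Animate", 1)
update_context(db, "ball", "Object", 2)
update_context(db, "child", "Animate", 3)

# "reng" alone: both arguments omitted and recovered from context.
print(analyze(THROW, {}, db))                    # {'thrower': 'child', 'theme': 'ball'}
# "mama reng": thrower expressed, theme recovered from context.
print(analyze(THROW, {"thrower": "mother"}, db))  # {'thrower': 'mother', 'theme': 'ball'}
```

Recency is only a stand-in for salience here; the point of the sketch is that a single construction with omissible constituents covers both the full and the reduced utterance, with context filling the gaps.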
NTL Framework
Most of my work is based on the notions of embodied semantics and simulation semantics advocated by the Neural Theory of Language (NTL) project. More information can be found on the NTL webpage at:
http://www.icsi.berkeley.edu/NTL