Hidden Markov Models (HMM)
- State transition rules: P(noun | det adj) = 0.23
- Output rules: P(dog | noun) = 0.008
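As a minimal sketch of how these two rule types combine, the joint probability of a tagged word sequence is the product of one transition and one output probability per word. Only P(noun | det adj) = 0.23 and P(dog | noun) = 0.008 come from the rules above; every other number, the "<s>" start marker, and the function name joint_log_prob are illustrative assumptions.

```python
import math

# Transition probabilities P(tag | two previous tags).
# Only ("det", "adj", "noun") = 0.23 is from the text; the rest are made up.
TRANSITION = {
    ("<s>", "<s>", "det"): 0.30,
    ("<s>", "det", "adj"): 0.20,
    ("det", "adj", "noun"): 0.23,
}

# Output probabilities P(word | tag).
# Only ("noun", "dog") = 0.008 is from the text; the rest are made up.
OUTPUT = {
    ("det", "the"): 0.50,
    ("adj", "big"): 0.01,
    ("noun", "dog"): 0.008,
}

def joint_log_prob(words, tags):
    """Log of P(words, tags): sum of log transition and log output terms."""
    prev2, prev1 = "<s>", "<s>"  # hypothetical start-of-sentence padding
    total = 0.0
    for word, tag in zip(words, tags):
        total += math.log(TRANSITION[(prev2, prev1, tag)])
        total += math.log(OUTPUT[(tag, word)])
        prev2, prev1 = prev1, tag
    return total

log_p = joint_log_prob(["the", "big", "dog"], ["det", "adj", "noun"])
print(math.exp(log_p))
```

Working in log space avoids underflow on longer sentences. A real tagger would also search over all tag sequences (for example with the Viterbi algorithm) rather than score a single given one.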
Probabilistic Context Free Grammars (PCFG)
Extends the HMM with nonterminal symbols and probabilistic rewrite rules.
- noun-phrase ::= det adj noun, p = 0.18
- sentence ::= noun-phrase verb noun-phrase, p = 0.29
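A rough sketch of how rule probabilities are used: the probability of a parse tree is the product of the probabilities of the rules applied in it. The two rules and their probabilities are the ones listed above; the tuple-based tree representation and the name tree_prob are assumptions made for illustration, and word-level (lexical) rules are left out, so leaf categories contribute a factor of 1.

```python
# Rule probabilities from the two example rules above.
RULES = {
    ("sentence", ("noun-phrase", "verb", "noun-phrase")): 0.29,
    ("noun-phrase", ("det", "adj", "noun")): 0.18,
}

def tree_prob(tree):
    """Probability of a parse tree.

    A tree is either a bare string (a leaf category such as "det")
    or a pair (symbol, [children]).
    """
    if isinstance(tree, str):
        return 1.0  # lexical rules are omitted in this sketch
    symbol, children = tree
    child_symbols = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = RULES[(symbol, child_symbols)]
    for child in children:
        p *= tree_prob(child)
    return p

noun_phrase = ("noun-phrase", ["det", "adj", "noun"])
tree = ("sentence", [noun_phrase, "verb", noun_phrase])
p = tree_prob(tree)  # 0.29 * 0.18 * 0.18
print(p)
```

The sentence rule is applied once and the noun-phrase rule twice, so the tree above scores 0.29 x 0.18 x 0.18.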
Unsupervised PCFG learning has not been demonstrated.
Natural language is ambiguous; the correct parse depends on semantics. The same surface form gets three different readings:
- I ate pizza with pepperoni.
- I ate pizza with Joe.
- I ate pizza with a fork.
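To make the point concrete, here is a toy sketch in which the attachment of the "with" phrase is chosen purely from the meaning of its object. The semantic classes, the lookup tables, and the function name attach_with_phrase are all hypothetical; they only illustrate that syntax alone cannot separate the three sentences.

```python
# Hypothetical semantic lexicon; the classes are illustrative assumptions.
SEMANTIC_CLASS = {
    "pepperoni": "food",
    "Joe": "person",
    "fork": "utensil",
}

# How "with X" is read, given the semantic class of X:
# (word the phrase attaches to, role it plays).
READING = {
    "food": ("pizza", "topping"),      # modifies the noun
    "person": ("ate", "companion"),    # modifies the verb
    "utensil": ("ate", "instrument"),  # modifies the verb
}

def attach_with_phrase(obj):
    """Pick the attachment site for 'with <obj>' in 'I ate pizza with <obj>'."""
    return READING[SEMANTIC_CLASS[obj]]

for obj in ["pepperoni", "Joe", "fork"]:
    print(obj, attach_with_phrase(obj))
```

All three sentences have the same word categories in the same order, yet each ends up with a different attachment or role.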
Artificial languages (C++, Java) are processed syntax first, semantics last
- Lexical - extract tokens
- Syntax - parse
- Semantics - code generation
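The three stages can be sketched for a toy arithmetic language. This is a minimal illustration rather than how any real C++ or Java compiler is built: the grammar, the stack-machine instruction names (PUSH, ADD, MUL), and the function names are assumptions, and there is no error handling.

```python
import re

def tokenize(source):
    """Lexical stage: extract number, operator, and parenthesis tokens."""
    return re.findall(r"\d+|[+*()]", source)

def parse(tokens):
    """Syntax stage: build a tree for the grammar
    expr ::= term ('+' term)*, term ::= factor ('*' factor)*,
    factor ::= number | '(' expr ')'."""
    pos = 0

    def factor():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            node = expr()
            pos += 1  # skip the closing ")"
            return node
        return int(tok)

    def term():
        nonlocal pos
        node = factor()
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            node = ("*", node, factor())
        return node

    def expr():
        nonlocal pos
        node = term()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            node = ("+", node, term())
        return node

    return expr()

def generate(tree):
    """Semantics stage: emit code for a hypothetical stack machine."""
    if isinstance(tree, int):
        return [f"PUSH {tree}"]
    op, left, right = tree
    return generate(left) + generate(right) + ["ADD" if op == "+" else "MUL"]

code = generate(parse(tokenize("2 + 3 * 4")))
print(code)
```

Each stage consumes only the output of the previous one, so meaning is assigned after the structure is fully known. That ordering is the reverse of the human sequence described next.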
Humans learn semantics before syntax
- Lexical - Infants segment speech at 7-10.5 months (Jusczyk, 1996)
- Semantics - First words (nouns and verbs) at 12 months
- Syntax - Phrases by age 2, simple sentences by age 3