Learning Word Boundaries
Goal: Establish the learnability of natural language.
Can natural language be learned just from examples?
- Osherson, Stob, Weinstein (1986): Not all languages are learnable
(for example, the class of recursive languages is not learnable).
- Jusczyk (1996): Infants learn to segment speech into words
at 10.5 months (before learning any words).
- Hutchens and Alder (1998): Words start at high entropy boundaries.
Find the word boundaries without knowing English vocabulary
Is there a boundary between suit and case in suitcase?
Is H(c|suit) + H(t|case) > 4.2 bpc? (n = 5)
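The entropy test can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method as originally implemented: H(c|suit) is read as the entropy of the character that follows the (n-1)-character context "suit", H(t|case) as the entropy of the character that precedes the context "case", and both are estimated from plain n-gram counts over the same unspaced text. The function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def context_entropies(text, n):
    """Map each (n-1)-char context to the entropy (in bits) of the next char."""
    k = n - 1
    counts = defaultdict(Counter)
    for i in range(len(text) - k):
        counts[text[i:i + k]][text[i + k]] += 1
    entropies = {}
    for ctx, nxt in counts.items():
        total = sum(nxt.values())
        entropies[ctx] = -sum(c / total * math.log2(c / total)
                              for c in nxt.values())
    return entropies

def boundary_scores(text, n):
    """Score gap i (between text[i-1] and text[i]) as forward + backward entropy."""
    k = n - 1
    fwd = context_entropies(text, n)
    # Assumption: the backward model is the same estimator run on reversed text.
    bwd = context_entropies(text[::-1], n)
    scores = {}
    for i in range(k, len(text) - k + 1):
        left = text[i - k:i]           # context before the gap, e.g. "suit"
        right = text[i:i + k][::-1]    # context after the gap, reversed for lookup
        scores[i] = fwd.get(left, 0.0) + bwd.get(right, 0.0)
    return scores

def predict_boundaries(text, n, threshold):
    """Return the gap positions whose summed entropy exceeds the threshold."""
    return sorted(i for i, s in boundary_scores(text, n).items() if s > threshold)
```

With text holding a corpus with spaces removed, predict_boundaries(text, 5, 4.2) would apply the n = 5 test from the question above. Raw counts are used here for brevity; any smoothing or separate training text is left out.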
Boundary detection in Alice in Wonderland
| n || Threshold (bpc) || Recall / Precision
| 2 || 7.5 || 0.41
| 3 || 6.9 || 0.63
| 4 || 5.7 || 0.75
| 5 || 4.2 || 0.77
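Recall and precision for predicted boundaries could be scored as below. This is a minimal sketch that assumes the true boundaries are the positions where spaces were removed from the original text; how punctuation and line breaks were treated in the experiment is not stated, so they are ignored here. The helper names are hypothetical.

```python
def true_boundaries(spaced_text):
    """Return (unspaced text, set of gap positions where a space was removed)."""
    chars, gaps = [], set()
    for ch in spaced_text:
        if ch == " ":
            if chars:                  # ignore leading spaces
                gaps.add(len(chars))
        else:
            chars.append(ch)
    gaps.discard(len(chars))           # a trailing space is not an internal gap
    return "".join(chars), gaps

def recall_precision(predicted, actual):
    """Recall = hits / true boundaries; precision = hits / predicted boundaries."""
    predicted, actual = set(predicted), set(actual)
    hits = len(predicted & actual)
    recall = hits / len(actual) if actual else 0.0
    precision = hits / len(predicted) if predicted else 0.0
    return recall, precision
```

For example, "the cat sat" yields the unspaced text "thecatsat" with true boundaries at positions 3 and 6; predicting boundaries at 3 and 5 then gives one hit out of two in both directions.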
See also A Note on Lexical Acquisition in Text
without Spaces (unpublished).