Learning Word Boundaries
Goal: Establish the learnability of natural language.
Can natural language be learned just from examples?
- Osherson, Stob, Weinstein (1986): Not all languages are learnable
(for example, the class of recursive languages is not learnable).
- Jusczyk (1996): Infants learn to segment speech into words
at 10.5 months (before learning any words).
- Hutchens and Alder (1998): Words start at high entropy boundaries.
Find the word boundaries, without knowing English vocabulary, in:
alicewasbeginningtogetverytiredofsittingbyhersisteronthebankandofhavingnothing
todoonceortwiceshehadpeepedintothebookhersisterwasreadingbutithadnopicturesorc
onversationsinitandwhatistheuseofabookthoughtalicewithoutpicturesorconversatio
Boundary entropy
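Is there a boundary between suit and case in suitcase?
H(c|suit) + H(t|case) > 4.2 bpc? (n = 5)
Here H(c|suit) is the entropy of the character that follows suit, and
H(t|case) is the entropy of the character that precedes case, both in
bits per character (bpc).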
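The boundary test above can be sketched in Python. This is a minimal illustration, not the method used to produce the results on this page: it estimates both entropies from raw n-gram counts over the text itself, with no smoothing. The function names and the parameter k are hypothetical, and it assumes n counts the context plus the predicted character, so k = n - 1 (4 for "suit" and "case" when n = 5).

```python
import math
from collections import Counter, defaultdict


def context_entropies(text, k, reverse=False):
    """Map each length-k context to the entropy (in bits) of the adjacent character.

    With reverse=True the text is reversed first, so each (reversed) context
    is scored by the character that precedes it in the original text.
    """
    if reverse:
        text = text[::-1]
    counts = defaultdict(Counter)
    for i in range(len(text) - k):
        counts[text[i:i + k]][text[i + k]] += 1
    entropies = {}
    for context, following in counts.items():
        total = sum(following.values())
        entropies[context] = sum(
            c / total * math.log2(total / c) for c in following.values()
        )
    return entropies


def boundary_scores(text, k):
    """Score each position i as H(next char | left context) + H(previous char | right context)."""
    forward = context_entropies(text, k)
    backward = context_entropies(text, k, reverse=True)
    scores = {}
    for i in range(k, len(text) - k + 1):
        left = text[i - k:i]           # k characters before position i
        right = text[i:i + k][::-1]    # k characters after position i, reversed to match the keys
        # Contexts never seen with a neighbor contribute 0 (an assumption of this sketch).
        scores[i] = forward.get(left, 0.0) + backward.get(right, 0.0)
    return scores


def predict_boundaries(text, k, threshold):
    """Return the sorted positions whose boundary score exceeds the threshold."""
    return sorted(i for i, s in boundary_scores(text, k).items() if s > threshold)
```

For the suitcase question with n = 5, one would call predict_boundaries(text, 4, 4.2) on the unspaced text; position i means a boundary is proposed between text[i - 1] and text[i].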
Boundary detection in Alice in Wonderland
n | Threshold (bpc) | Recall/Precision
---|---|---
2 | 7.5 | 0.41
3 | 6.9 | 0.63
4 | 5.7 | 0.75
5 | 4.2 | 0.77
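Scores like those in the table could be computed by comparing predicted boundaries with the true ones recovered from the original spaced text. The sketch below is one way to do that under stated assumptions: the helper names are hypothetical, boundaries are character offsets into the unspaced text, and recall and precision are returned separately even though the table reports a single figure per row.

```python
def true_boundaries(spaced_text):
    """Strip whitespace and return (unspaced text, set of word-boundary offsets)."""
    words = spaced_text.split()
    boundaries, position = set(), 0
    for word in words[:-1]:
        position += len(word)
        boundaries.add(position)  # offset where the next word starts
    return "".join(words), boundaries


def recall_precision(predicted, actual):
    """Recall = hits / true boundaries; precision = hits / predicted boundaries."""
    predicted, actual = set(predicted), set(actual)
    hits = len(predicted & actual)
    recall = hits / len(actual) if actual else 0.0
    precision = hits / len(predicted) if predicted else 0.0
    return recall, precision
```

For example, true_boundaries("alice was beginning") gives the text "alicewasbeginning" with boundaries at offsets 5 and 8; a predictor that proposes offsets 5 and 9 would then have recall 0.5 and precision 0.5.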
See also A Note on Lexical Acquisition in Text
without Spaces (unpublished).