Language Models
Character models     Corpus    Size    H (bpc)  Reference
LZ (compress)        book1     731KB   3.151    compress 4.3d (1990)
LZ (zip)             book1     731KB   2.956    PKZIP 2.04e (1993)
LZ (gzip -9)         book1     731KB   2.921    gzip 1.2.4 (1993)
PPMC5 (ha -a2)       book1     731KB   2.141    ha 0.98, Hirvola 1993
BW (szip)            book1     731KB   2.102    szip 1.05x, Schindler 1997, 1998
Neural net, 2 layer  book1     731KB   2.062    Mahoney, 1999
PPMZ (boa -m15)      book1     731KB   1.962    boa 0.58b, Sutton 1998
PPMZ (rkive)         book1     731KB   1.943    rkive 1.91b1, Taylor 1998
BW (szip)            book1     768KB   2.345    szip 1.05x
BW                   book1     768KB   2.49     Burrows, Wheeler, 1994
                     Hector    103MB   2.01
PPM*                 book1     768KB   2.40     Cleary, Teahan, Witten, 1995
Neural net, 3 layer  Münchner  600KB   2.89     Schmidhuber, Heil, 1996
PPM5                 Malone    46KB    2.402    Teahan, Cleary, 1996
                     Malone    6.6MB   1.598
PPM5+bigrams         Malone    6.6MB   1.488    Teahan, Cleary, 1997
PPM5                 WSJ       15.4MB  1.602    Teahan, Cleary, 1997
Symbol ranking       Calgary   3.1MB   3.1      Fenwick, 1997
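The H (bpc) column gives bits per character. For a compressor such as gzip, that figure is 8 × compressed size / original size. For a predictive model, it is the average of −log2 p over the characters, where each probability is assigned before the character is seen. The sketch below computes both, under two stated assumptions. First, it uses Python's zlib, which implements the same DEFLATE method as gzip but omits the gzip file header, so its sizes differ by a few bytes. Second, the predictive model is a simplified adaptive order-k context model with add-one smoothing. It is not PPM, which adds escape probabilities and blends across context orders. Both function names are hypothetical.

```python
import math
import zlib
from collections import defaultdict


def bpc_from_compressed(data: bytes, level: int = 9) -> float:
    """Bits per character of the zlib (DEFLATE) encoding of data."""
    return 8 * len(zlib.compress(data, level)) / len(data)


def adaptive_order_k_bpc(data: bytes, k: int = 2) -> float:
    """Average code length, in bits per byte, of an adaptive order-k model.

    Each byte is predicted from the counts gathered so far in its preceding
    k-byte context (shorter at the start of the data) and charged -log2 p.
    Only then is it added to the counts, so a decoder could rebuild the
    same model step by step.
    """
    counts = defaultdict(lambda: [0] * 256)  # context -> count per byte value
    totals = defaultdict(int)                # context -> total count
    bits = 0.0
    for i, c in enumerate(data):
        ctx = data[max(0, i - k):i]
        p = (counts[ctx][c] + 1) / (totals[ctx] + 256)  # add-one (Laplace) estimate
        bits -= math.log2(p)
        counts[ctx][c] += 1
        totals[ctx] += 1
    return bits / len(data)


if __name__ == "__main__":
    sample = b"the cat sat on the mat and the dog sat on the log. " * 100
    print(f"zlib -9 : {bpc_from_compressed(sample):.3f} bpc")
    print(f"order-2 : {adaptive_order_k_bpc(sample, k=2):.3f} bpc")
```

Add-one smoothing over all 256 byte values is the simplest choice. Its cost is that it spends a lot of probability on unseen bytes in sparse high-order contexts. The escape mechanism in PPM is one way around that problem.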
Lexical models           Corpus  Size    H       Reference
WDLZW                    text?   62KB    2.88    Jiang, Jones, 1992
Bigram                   LOB     6MB     2.104   Ney, Essen, Kneser, 1995
Trigram                  WSJ     250MB   1.325   Kneser, Ney, 1995
Trigram                  WSJ     250MB   1.341   Seymore, Rosenfeld, 1996
5-gram scaled            NAB     1.32GB  1.301   Kneser, 1996
n-gram+phrases           SWB     11MB    1.226*  Ries, Buo, Waibel, 1996
4-gram scaled            WSJ     250MB   1.284   Ristad, Thomas, 1997
Trigram                  BNC     550MB   1.398   Clarkson, Robinson, 1997
Trigram+distant bigrams  WSJ     25MB    1.437   Martin, Ney, Zaplo, 1999
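Word-level models like these are usually evaluated by per-word perplexity. To compare them with character models, the per-word cross-entropy (log2 of the perplexity) has to be spread over the characters of the test text. The sketch below assumes that convention and counts every character, spaces included. The table does not say exactly how each cited result was converted. The function names and example numbers are illustrative and are not taken from any row.

```python
import math


def perplexity_to_bpc(perplexity: float, n_words: int, n_chars: int) -> float:
    """Convert a per-word perplexity into bits per character.

    log2(perplexity) is the cross-entropy in bits per word; multiplying by
    the word count and dividing by the character count of the same text
    gives bits per character.
    """
    bits_per_word = math.log2(perplexity)
    return bits_per_word * n_words / n_chars


def bpc_to_perplexity(bpc: float, chars_per_word: float) -> float:
    """Inverse conversion, given the average characters per word."""
    return 2 ** (bpc * chars_per_word)


if __name__ == "__main__":
    # Hypothetical test set: 1000 words, 5600 characters including spaces.
    print(perplexity_to_bpc(128, 1000, 5600))  # 7 bits/word over 5.6 chars/word: 1.25
```

The result depends heavily on the characters-per-word ratio. The same perplexity can therefore map to different bpc values on corpora with different average word lengths.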
Semantic models            Corpus     Size    H       Reference
Bigram+topic+cache         LOB        6MB     2.028   Kneser, Steinbiss, 1993
Phrase bigrams+5 topics    RR         330KB   0.823*  Giachin, 1995
Func. & content trigram    Verbmobil  281KB   1.927*  Geutner, 1996
Trigram+triggers           WSJ        210MB   1.221   Rosenfeld, 1996
Trigram+triggers           WSJ        25MB    1.446   Simons, Ney, Martin, 1997
Trigram+topic+cache        BNC        550MB   1.303   Clarkson, Robinson, 1997
Bigram+LSA                 WSJ        250MB   1.325   Bellegarda, 1998
Trigram+cache+topic (IR)   WSJ        210MB   1.283   Mahajan, Beeferman, Huang, 1999
Trigram+topic              SWB        11.5MB  1.211*  Khudanpur, Wu, 1999
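In several of these rows, "cache" refers to boosting the probability of words that occurred recently in the same document. The sketch below shows the idea, assuming plain linear interpolation between a base model and a unigram distribution over the last N words. The cited systems differ in how they weight and combine their components, and some add topic models on top. The class name, the `lam` weight, and the `base_prob` callable are hypothetical.

```python
from collections import Counter, deque


class CacheInterpolatedModel:
    """Mix a static base model with a unigram cache of recent words.

    P(w | h) = lam * P_base(w | h) + (1 - lam) * P_cache(w), where P_cache is
    the relative frequency of w among the last cache_size observed words.
    """

    def __init__(self, base_prob, cache_size=200, lam=0.9):
        self.base_prob = base_prob          # callable: (word, history) -> probability
        self.lam = lam                      # weight on the base model
        self.cache = deque(maxlen=cache_size)
        self.counts = Counter()             # word -> count inside the cache

    def prob(self, word, history):
        p_base = self.base_prob(word, history)
        if not self.cache:                  # nothing cached yet: base model only
            return p_base
        p_cache = self.counts[word] / len(self.cache)
        return self.lam * p_base + (1 - self.lam) * p_cache

    def observe(self, word):
        """Add a word to the cache, evicting the oldest one when full."""
        if len(self.cache) == self.cache.maxlen:
            self.counts[self.cache[0]] -= 1  # deque drops this item on append
        self.cache.append(word)
        self.counts[word] += 1


if __name__ == "__main__":
    uniform = lambda word, history: 0.25    # hypothetical 4-word vocabulary
    model = CacheInterpolatedModel(uniform, cache_size=100, lam=0.9)
    model.observe("a")
    print(model.prob("a", ()), model.prob("b", ()))
```

Both components are probability distributions over the same vocabulary, provided every cached word is in the base model's vocabulary. Under that condition the mixture is also a distribution for any `lam` between 0 and 1.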
Syntactic models  Corpus  Size    H       Reference
Trigram+POS       Office  137MB   1.702   Jelinek, Mercer, Roukos, 1990
Tagged            Malone  6.6MB   1.433   Teahan, Cleary, 1998
                  WSJ     5.63MB  1.490
Human models       Corpus  Size  H        Reference
Character ranking  Malone  ?     0.6-1.3  Shannon, 1950
Gambling           Malone  ?     < 1.3    Cover, King, 1978
Gambling           Malay   ?     < 1.3    Tan, 1981
* Speech corpus (excluded).
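The human estimates come from two experimental procedures. In character ranking, a subject guesses the next character until correct. Shannon bounded the entropy using q_i, the fraction of positions where the i-th guess was right: the upper bound is −Σ q_i log2 q_i and the lower bound is Σ i (q_i − q_{i+1}) log2 i. In Cover and King's gambling method, the subject bets a fraction of capital on each possible next character at fair odds over a 27-symbol alphabet. The entropy estimate is then log2 27 − (1/n) log2 S_n, where S_n is the capital after n bets. The sketch below implements both formulas. The function names and inputs are hypothetical, and no subject data is included.

```python
import math


def shannon_bounds(q):
    """Lower and upper entropy bounds (bits per character) from guess ranks.

    q[i] is the fraction of positions at which the subject's (i+1)-th guess
    was the correct character. Both bounds assume q sums to 1; the lower
    bound also assumes q is non-increasing.
    """
    q = list(q) + [0.0]  # q_{n+1} = 0 closes the lower-bound sum
    upper = -sum(p * math.log2(p) for p in q if p > 0)
    lower = sum((i + 1) * (q[i] - q[i + 1]) * math.log2(i + 1)
                for i in range(len(q) - 1))
    return lower, upper


def gambling_estimate(bets_on_actual, alphabet_size=27):
    """Cover-King style entropy estimate from proportional betting.

    bets_on_actual[j] is the fraction of capital the subject placed on the
    character that actually occurred at step j. At fair odds the capital is
    multiplied by alphabet_size * bet, so log2 S_n is the sum below.
    """
    n = len(bets_on_actual)
    log2_wealth = sum(math.log2(alphabet_size * b) for b in bets_on_actual)
    return math.log2(alphabet_size) - log2_wealth / n


if __name__ == "__main__":
    print(shannon_bounds([0.7, 0.2, 0.1]))   # hypothetical rank frequencies
    print(gambling_estimate([0.5, 0.8, 0.3]))  # hypothetical bets
```

With uniform guess ranks over k candidates, both Shannon bounds equal log2 k. The gambling estimate reduces to −(1/n) Σ log2 b_j, which is the subject's cross-entropy on the text.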