
A Neural Network Character Model

Goal: Establish a framework for a "complete" model.

Prediction (language modeling) + arithmetic encoding = text compression.
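
The link between prediction and compression can be made concrete: an ideal arithmetic coder spends about -log2(p) bits on a symbol to which the model assigned probability p. The sketch below is illustrative only and computes that ideal cost for a bit sequence. The function name and the example probabilities are assumptions, not part of the original model.

```python
import math

def ideal_code_length_bits(probs_of_one, bits):
    """Ideal cost, in bits, of coding `bits` when the model assigned
    probability p to "next bit is 1" before seeing each bit."""
    total = 0.0
    for p_one, bit in zip(probs_of_one, bits):
        # Probability the model gave to the bit that actually occurred.
        p_actual = p_one if bit == 1 else 1.0 - p_one
        total += -math.log2(p_actual)
    return total

bits = [1, 0, 1, 1]
# A model that is confident and right pays well under 1 bit per bit.
confident = ideal_code_length_bits([0.9, 0.1, 0.9, 0.9], bits)
# A model that always says 0.5 pays exactly 1 bit per bit (no compression).
uninformed = ideal_code_length_bits([0.5, 0.5, 0.5, 0.5], bits)
```

Better predictions therefore translate directly into smaller output, which is why the rest of the slide concentrates on the predictor.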

Input layer - sliding window of last 5 characters
Hidden layer - 4 × 10⁶ contexts of 1-5 characters
Output - 1 unit = P(next bit is a 1)

First layer

Fixed weights
Computed using a fast hash function
Only 5 hidden units active at any time
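
One way to picture the first layer: each of the five context lengths (1 to 5 characters) is hashed to a single hidden-unit index, so exactly five units fire per prediction. The sketch below assumes `zlib.crc32` as a stand-in for the unspecified fast hash, and assumes the bits of the current character seen so far are mixed into the key. Both choices, and the name `active_units`, are hypothetical.

```python
import zlib

HIDDEN_UNITS = 4_000_000   # from the slide: 4 x 10^6 contexts
MAX_ORDER = 5              # contexts of 1 to 5 characters

def active_units(history: bytes, partial_bits: str = "") -> list:
    """Return one hidden-unit index per context order (1..5).

    `history` holds the characters seen so far; `partial_bits` holds the
    already-known bits of the character being predicted, e.g. "01".
    zlib.crc32 is a stand-in hash, not the one used in the original model.
    """
    units = []
    for order in range(1, MAX_ORDER + 1):
        context = history[-order:]  # last `order` characters
        # Prefix with the order so contexts of different lengths get distinct keys.
        key = bytes([order]) + context + b"|" + partial_bits.encode("ascii")
        units.append(zlib.crc32(key) % HIDDEN_UNITS)
    return units

units = active_units(b"Alice was beginning", partial_bits="01")
```

Because the mapping is a fixed hash, nothing in this layer is trained. Distinct contexts can collide on the same unit, which is the price paid for keeping the table at a fixed size.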

Second layer

Weights adjusted to minimize prediction error
Learning rate is a function of outcomes (count of 0's and 1's) at each connection
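
A minimal sketch of the second layer, under stated assumptions: the output is a sigmoid of the summed weights of the active units, and each weight moves in proportion to the prediction error. The slide does not give the learning-rate function, so the schedule here (a rate that shrinks as a connection's 0/1 counts grow, with a floor) is a hypothetical stand-in, as are the class name and the `lr_floor` parameter.

```python
import math
from collections import defaultdict

class OutputUnit:
    """Single output unit estimating P(next bit is 1)."""

    def __init__(self, lr_floor=0.02):
        self.weights = defaultdict(float)            # one weight per hidden unit
        self.counts = defaultdict(lambda: [0, 0])    # [count of 0s, count of 1s]
        self.lr_floor = lr_floor                     # assumed lower bound on the rate

    def predict(self, active):
        total = sum(self.weights[i] for i in active)
        total = max(-30.0, min(30.0, total))         # keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-total))

    def update(self, active, bit):
        error = bit - self.predict(active)
        for i in active:
            n0, n1 = self.counts[i]
            # Assumed schedule: rarely seen connections adapt quickly,
            # frequently seen ones settle down.
            rate = max(self.lr_floor, 1.0 / (n0 + n1 + 1.5))
            self.weights[i] += rate * error
            self.counts[i][bit] += 1

unit = OutputUnit()
active = [3, 17, 42, 99, 1000]      # five active hidden units (arbitrary indices)
before = unit.predict(active)
for _ in range(20):
    unit.update(active, 1)           # the same context keeps producing a 1
after = unit.predict(active)
```

Only the five active connections are touched per bit, so the cost of a prediction and an update is independent of the size of the hidden layer.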

Results

Tuned on Alice in Wonderland, tested on book1 (Far from the Madding Crowd, Hardy)

Program	Version	Year	Type	Alice (bpc)	Compress time	Decompress time	Book1 (bpc)
compress	4.3d	1990	Lempel-Ziv	3.270	1 sec	0.5 sec	3.486
pkzip	2.04e	1993	Lempel-Ziv	2.884	6 sec	<0.5 sec	3.288
gzip -9	1.2.4	1993	Lempel-Ziv	2.848	7 sec	0.5 sec	3.250
szip -b41 -o0	1.05Xf	1998	Burrows-Wheeler	2.239	9 sec	6 sec	2.345
ha a2	0.98	1993	PPMC5	2.171	16 sec	16 sec	2.453
boa -m15	0.58b	1998	PPMZ	2.061	33 sec	34 sec	2.204
rkive -mt3	1.91b1	1998	PPMZ	2.055	134 sec	117 sec	2.120
Neural Network	P6	1999	NN	2.129	26 sec	26 sec	2.283
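
The bpc figures in the table are bits of compressed output per character of input, so lower is better. As a rough aid to reading them, the sketch below converts bpc to a fraction of the original size, assuming 8-bit input characters. The helper names are hypothetical.

```python
def bits_per_character(compressed_bytes: int, original_chars: int) -> float:
    """bpc = total compressed bits divided by number of input characters."""
    return 8.0 * compressed_bytes / original_chars

def size_fraction(bpc: float, bits_per_input_char: int = 8) -> float:
    """Compressed size as a fraction of the original, assuming 8-bit characters."""
    return bpc / bits_per_input_char

# Book1 figures taken from the table above.
nn_fraction = size_fraction(2.283)     # neural network
gzip_fraction = size_fraction(3.250)   # gzip -9
```

On these numbers the neural network leaves book1 at under 29% of its original size, against roughly 41% for gzip -9.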

See also "Fast Text Compression with Neural Networks," to appear, AAAI Proceedings, 2000.