Input:      the cat in the hat
                 +----------+
            +----+------+   |
            v    v      |   |
Compressed: the cat in (4)h(2)
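The figure above can be read as a sequence of literals and back-references. The following is a minimal sketch of a decoder for that idea, not any particular format. It assumes each pointer is stored as an (offset, length) pair, whereas the figure shows only the length and draws the offset as an arrow. The token layout and the name `lz77_decode` are hypothetical.

```python
def lz77_decode(tokens):
    """Expand a list of literal strings and (offset, length) back-references."""
    out = []
    for tok in tokens:
        if isinstance(tok, str):
            out.extend(tok)                 # literal text is copied as is
        else:
            offset, length = tok            # assumed pointer layout
            start = len(out) - offset       # where the earlier match begins
            for i in range(length):
                # Copy one character at a time so overlapping copies also work.
                out.append(out[start + i])
    return "".join(out)

# (4) points back 11 characters to "the ", (2) points back 11 characters to "at".
tokens = ["the cat in ", (11, 4), "h", (11, 2)]
print(lz77_decode(tokens))  # the cat in the hat
```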
Compression                           P(a) = .04
                   +-----------+      P(b) = .003      +---------+
the cat in th_ --> | Predictor | -->      ...      --> | Encoder | --> X
                   +-----------+      P(e) = .3        +---------+     |
                                          ...               ^          |
                                                        e --+          |
                                                                       |
                                                            +----------+
Decompression                         P(a) = .04            v
                   +-----------+      P(b) = .003      +---------+
the cat in th_ --> | Predictor | -->      ...      --> | Decoder | --> e
             ^     +-----------+      P(e) = .3        +---------+     |
             |                            ...                          |
             +---------------------------------------------------------+
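The key point of the diagram is that the compressor and decompressor run the same predictor on the same history, so the decoder can invert the encoder. The sketch below illustrates only that feedback loop. As a simplifying assumption it replaces the real encoder with a rank coder (it emits the position of the actual symbol in the predictor's ranking), and it uses a toy order-0 frequency model. `Predictor`, `compress`, `decompress`, and `ALPHABET` are hypothetical names, and symbols outside `ALPHABET` are not handled.

```python
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz "

class Predictor:
    """Toy adaptive model: ranks symbols by how often they have been seen."""
    def __init__(self):
        self.counts = Counter()

    def ranking(self):
        # Most frequent first; ties fall back to alphabet order so that
        # both sides always compute the identical ranking.
        return sorted(ALPHABET, key=lambda s: (-self.counts[s], ALPHABET.index(s)))

    def update(self, symbol):
        self.counts[symbol] += 1

def compress(text):
    model, codes = Predictor(), []
    for ch in text:
        codes.append(model.ranking().index(ch))  # stand-in for the Encoder
        model.update(ch)                         # actual symbol updates the model
    return codes

def decompress(codes):
    model, out = Predictor(), []
    for code in codes:
        ch = model.ranking()[code]               # stand-in for the Decoder
        out.append(ch)
        model.update(ch)                         # decoded symbol is fed back
    return "".join(out)

codes = compress("the cat in the hat")
print(decompress(codes))  # the cat in the hat
```

Because the decoder updates its model only with symbols it has already decoded, its predictions stay in step with the compressor's at every position.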
0                   .7  .8          1
+-----------------------------------+
| a |b| c|d| e | ... | t |  .....   |
+-----------------------------------+
              /              \
       /                         \
.7          .74   .76              .8
+-----------------------------------+
|  a  |||| e || h |  i  |||| o | .. |
+-----------------------------------+
         /              \
    /                         \
.74     .746       .752           .76
+-----------------------------------+
|    a    ||   e   ||   i   |   ..  |   "the" = .75 (.11 in binary)
+-----------------------------------+
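P(the) = P(t) P(h|t) P(e|th) = .1 x .2 x .3 = .752 - .746 = .006
Optimum code length = log2(1/.006) ≈ 7.38 bits
Arithmetic encoding is always within 1 bit of optimal (8 bits or less here). In this example the interval happens to contain .75, so the 2-bit code 11 is enough.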
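The interval arithmetic above can be checked with a short sketch. It assumes the per-symbol ranges read off the figure: t occupies [.7, .8), h given t occupies [.4, .6) of the t interval, and e given th occupies [.3, .6) of the th interval. The last two are derived from the figure's endpoints rather than stated in it. The function names are hypothetical, and exact fractions are used to avoid rounding error.

```python
from fractions import Fraction
from math import ceil, log2

def narrow(ranges):
    """Shrink [0, 1) by each symbol's (low, high) share of the current interval."""
    low, high = Fraction(0), Fraction(1)
    for lo, hi in ranges:
        width = high - low
        low, high = low + width * lo, low + width * hi
    return low, high

def shortest_code(low, high):
    """Return the fewest bits b such that 0.b (binary) lies in [low, high)."""
    k = 1
    while True:
        n = ceil(low * 2**k)                 # smallest k-bit fraction >= low
        if Fraction(n, 2**k) < high:
            return format(n, "b").zfill(k)
        k += 1

ranges = [
    (Fraction(7, 10), Fraction(8, 10)),      # t
    (Fraction(4, 10), Fraction(6, 10)),      # h given t (derived from figure)
    (Fraction(3, 10), Fraction(6, 10)),      # e given th (derived from figure)
]
low, high = narrow(ranges)
p = high - low
print(float(low), float(high), float(p))     # 0.746 0.752 0.006
print(round(log2(float(1 / p)), 2))          # 7.38
print(shortest_code(low, high))              # 11
```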