# Character models

### Lempel-Ziv

Z (compress), ZIP, GZ (gzip), GIF (very fast, but poor compression)
```
Input:        the cat in the hat

                   +----------+
              +----+------+   |
              v    v      |   |
Compressed:   the cat in (4)h(2)
```
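
The back-references in the figure can be reproduced with a toy greedy matcher. The sketch below is a simplification, not the algorithm used by compress, ZIP, or gzip: the names `lz_compress` and `lz_decompress` and the `(offset, length)` token format are assumptions made for illustration. The figure shows only the lengths (4) and (2); the offset is added here so that the decoder knows where to copy from.

```python
def lz_compress(text, min_len=2):
    """Toy Lempel-Ziv: emit literals or (offset, length) back-references."""
    tokens = []
    i = 0
    while i < len(text):
        best_len, best_off = 0, 0
        # Try every earlier start position and keep the longest match.
        for j in range(i):
            k = 0
            while i + k < len(text) and text[j + k] == text[i + k]:
                k += 1
            if k > best_len:
                best_len, best_off = k, i - j
        if best_len >= min_len:
            tokens.append((best_off, best_len))  # copy from earlier output
            i += best_len
        else:
            tokens.append(text[i])               # no useful match: literal
            i += 1
    return tokens


def lz_decompress(tokens):
    """Invert lz_compress by replaying literals and copies."""
    out = []
    for tok in tokens:
        if isinstance(tok, tuple):
            off, length = tok
            for _ in range(length):
                out.append(out[-off])  # one character at a time, so overlaps work
        else:
            out.append(tok)
    return "".join(out)


tokens = lz_compress("the cat in the hat")
restored = lz_decompress(tokens)
```

For the input above, the matcher emits eleven literals followed by `(11, 4)`, `'h'`, `(11, 2)`, which corresponds to `the cat in (4)h(2)` in the figure.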

### Predictive arithmetic encoding

PPMC, PPMZ, neural network (slower but better compression)
```
Compression                              P(a) = .04
                       +-----------+     P(b) = .003     +---------+
  the cat in th_  -->  | Predictor | --> ...         --> | Encoder | --> X
                       +-----------+     P(e) = .3       +---------+     |
                                         ...                  ^          |
                                                          e --+          |
                                                                         |
                                                              +----------+
Decompression                            P(a) = .04           v
                       +-----------+     P(b) = .003     +---------+
  the cat in th_  -->  | Predictor | --> ...         --> | Decoder | --> e
               ^       +-----------+     P(e) = .3       +---------+     |
               |                         ...                             |
               +---------------------------------------------------------+

  0                     .7   .8       1
  +-----------------------------------+
  | a |b| c|d|  e  | ... |  t | ..... |
  +-----------------------------------+
     /                            \
   /                                \
  .7           .74     .76           .8
  +-----------------------------------+
  |  a  |||| e ||   h   | i |||| o |..|
  +-----------------------------------+
      /                           \
    /                               \
  .74        .746         .752       .76
  +-----------------------------------+
  |    a      ||     e     ||  i | .. |    "the" = .75  (.11 in binary)
  +-----------------------------------+

```
P(the) = P(t)P(h|t)P(e|th) = .752 - .746 = .006
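
The interval narrowing in the figure can be checked with a few lines of Python. This is a minimal sketch, assuming the relative sub-ranges below, which are read off the three bars of the figure (`t` occupies .7 to .8 of the first bar, `h` occupies .4 to .6 of the second, `e` occupies .3 to .6 of the third). The table `STEPS` and the function `narrow` are hypothetical names, and exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction as F

# (symbol, relative low, relative high) within the current interval.
# Values are read off the figure; a real coder would get them from the predictor.
STEPS = [
    ("t", F(7, 10), F(8, 10)),  # P(t)    = .1
    ("h", F(4, 10), F(6, 10)),  # P(h|t)  = .2
    ("e", F(3, 10), F(6, 10)),  # P(e|th) = .3
]


def narrow(steps):
    """Shrink [0, 1) once per symbol and return the final (low, high)."""
    low, width = F(0), F(1)
    for _symbol, rel_low, rel_high in steps:
        # The right-hand side uses the old width for both new values.
        low, width = low + width * rel_low, width * (rel_high - rel_low)
    return low, low + width


low, high = narrow(STEPS)
p_the = high - low  # width of the final interval = P(the)
```

The final interval is [.746, .752), and its width equals the product of the three conditional probabilities, .006.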

Optimum code length = log2(1/0.006) ≈ 7.38 bits

Arithmetic encoding is always within 1 bit of optimal, so encoding "the" takes 8 bits or less.
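
The bound can be illustrated by searching for the shortest binary fraction that falls inside the final interval [.746, .752). The sketch below is an illustration only; `shortest_code` is a hypothetical helper, and it ignores how a real decoder learns where the message ends (for example from a stored length or an end-of-message symbol).

```python
import math
from fractions import Fraction as F


def shortest_code(low, high):
    """Return the fewest bits b1..bk such that 0.b1..bk (binary) is in [low, high)."""
    k = 1
    while True:
        n = math.ceil(low * 2**k)      # smallest k-bit fraction that is >= low
        if F(n, 2**k) < high:
            return format(n, "0{}b".format(k))
        k += 1


low, high = F(746, 1000), F(752, 1000)
code = shortest_code(low, high)

optimum = math.log2(1 / 0.006)         # about 7.38 bits
guaranteed = math.ceil(optimum)        # a fraction with this many bits always fits
```

Here the search stops at the two bits `11`, the binary fraction .11 = .75, because .75 happens to lie inside the interval. That is a lucky case: in general only `guaranteed` bits (8 here) are certain to be enough, since binary fractions with k bits are spaced 2^-k apart and 2^-8 is smaller than the interval width of .006.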