Character models

Lempel-Ziv

Z (compress), ZIP, GZ (gzip), GIF: very fast, but poor compression

  Input:        the cat in the hat

                     +----------+
                +----+------+   |
                v    v      |   |
  Compressed:   the cat in (4)h(2)
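
The back-references in the diagram can be reproduced with a small greedy parser. This is only a sketch of the general Lempel-Ziv idea, not the exact format used by compress, ZIP, gzip or GIF; the names `lz77_tokens`, `lz77_expand` and `min_match` are made up for illustration. Each token is either a literal character or a (distance, length) pair pointing back into the text already seen.

```python
def lz77_tokens(text, min_match=2):
    """Greedy parse into literals and (distance, length) back-references."""
    tokens = []
    i = 0
    while i < len(text):
        best_len, best_dist = 0, 0
        # Try every earlier start position and keep the longest match.
        for start in range(i):
            length = 0
            while (i + length < len(text)
                   and text[start + length] == text[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - start
        if best_len >= min_match:
            tokens.append((best_dist, best_len))   # copy from earlier text
            i += best_len
        else:
            tokens.append(text[i])                 # plain literal
            i += 1
    return tokens


def lz77_expand(tokens):
    """Rebuild the text by replaying literals and back-references."""
    out = []
    for tok in tokens:
        if isinstance(tok, tuple):
            dist, length = tok
            for _ in range(length):
                out.append(out[-dist])
        else:
            out.append(tok)
    return "".join(out)


tokens = lz77_tokens("the cat in the hat")
print(tokens)
print(lz77_expand(tokens))
```

On the slide's input the parse is the literals "the cat in ", a 4-character copy, the literal "h", and a 2-character copy, matching (4)h(2) above.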

Predictive arithmetic encoding

PPMC, PPMZ, neural network: slower, but better compression

Compression                              P(a) = .04
                       +-----------+     P(b) = .003     +---------+
  the cat in th_  -->  | Predictor | --> ...         --> | Encoder | --> X
                       +-----------+     P(e) = .3       +---------+     |
                                         ...                  ^          |
                                                          e --+          |
                                                                         |
                                                              +----------+
Decompression                            P(a) = .04           v
                       +-----------+     P(b) = .003     +---------+
  the cat in th_  -->  | Predictor | --> ...         --> | Decoder | --> e
               ^       +-----------+     P(e) = .3       +---------+     |
               |                         ...                             |
               +---------------------------------------------------------+
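
The two diagrams can be sketched in code to show why decompression works: both sides run the same predictor on the same history, so the decoder sees the same probabilities the encoder used. Everything here is an assumption made for illustration: the order-1 counting model in `predict` is a simple stand-in (not PPMC, PPMZ or a neural network), `ALPHABET` is an assumed symbol set, X is kept as an exact fraction instead of a bit string, and the message length is passed to the decoder separately.

```python
from fractions import Fraction

ALPHABET = " abcdefghijklmnopqrstuvwxyz"   # assumed symbol set for this sketch


def predict(history):
    """Predictor: estimate P(next char) from what followed the last char so far."""
    counts = {c: 1 for c in ALPHABET}      # start at 1 so no symbol has P = 0
    prev = history[-1] if history else None
    for a, b in zip(history, history[1:]):
        if a == prev:
            counts[b] += 1
    total = sum(counts.values())
    return {c: Fraction(n, total) for c, n in counts.items()}


def narrow(low, width, probs, symbol):
    """Shrink [low, low + width) to the slice that belongs to `symbol`."""
    for c in ALPHABET:
        if c == symbol:
            return low, width * probs[c]
        low += width * probs[c]
    raise ValueError(f"symbol {symbol!r} not in ALPHABET")


def compress(text):
    """Encoder: feed each actual character and its predicted P into the interval."""
    low, width = Fraction(0), Fraction(1)
    for i, ch in enumerate(text):
        low, width = narrow(low, width, predict(text[:i]), ch)
    return low + width / 2                 # X: any number inside the final interval


def decompress(x, length):
    """Decoder: pick the symbol whose slice contains X, then feed it back."""
    out = ""
    low, width = Fraction(0), Fraction(1)
    for _ in range(length):
        probs = predict(out)               # same history, so same predictions
        for c in ALPHABET:
            lo2, w2 = narrow(low, width, probs, c)
            if lo2 <= x < lo2 + w2:
                out += c
                low, width = lo2, w2
                break
    return out


message = "the cat in the hat"
x = compress(message)
print(decompress(x, len(message)))
```

A better predictor assigns higher probability to the character that actually occurs, which leaves a wider final interval and therefore a shorter code.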



   0                     .7   .8       1
   +-----------------------------------+
   | a |b| c|d|  e  | ... |  t | ..... |
   +-----------------------------------+
       /                            \
     /                                \
   .7           .74     .76           .8
   +-----------------------------------+
   |  a  |||| e ||   h   | i |||| o |..|
   +-----------------------------------+
       /                           \
     /                               \
  .74        .746         .752       .76
   +-----------------------------------+
   |    a      ||     e     ||  i | .. |    "the" = .75  (.11 in binary)
   +-----------------------------------+

P(the) = P(t)P(h|t)P(e|th) = (.1)(.2)(.3) = .752 - .746 = .006
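
The narrowing shown in the three bars can be checked with exact arithmetic. This is a sketch only: the slice positions and probabilities in `STEPS` are read off the figure, and the names `STEPS` and `narrow_all` are made up for illustration.

```python
from fractions import Fraction as F

# (symbol, where its slice starts within the current interval, its probability)
STEPS = [
    ("t", F(7, 10), F(1, 10)),   # t fills .7 to .8 of [0, 1)
    ("h", F(4, 10), F(2, 10)),   # h fills .74 to .76 of [.7, .8)
    ("e", F(3, 10), F(3, 10)),   # e fills .746 to .752 of [.74, .76)
]


def narrow_all(steps):
    """Return the interval [low, high) after each symbol is encoded."""
    low, width = F(0), F(1)
    intervals = []
    for _symbol, start, p in steps:
        low += width * start     # move to the start of the symbol's slice
        width *= p               # the slice is p times as wide as its parent
        intervals.append((low, low + width))
    return intervals


for (symbol, _, _), (low, high) in zip(STEPS, narrow_all(STEPS)):
    print(symbol, float(low), float(high))
```

The width of the last interval is the product of the three probabilities, which is why the final width equals P(the).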

Optimum code length = log2(1/0.006) ≈ 7.38 bits

Arithmetic encoding is always within 1 bit of the optimum code length (here, 8 bits or fewer).
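
The code-length figures can be checked with a short search for the shortest binary fraction inside the final interval. This is a sketch under the assumption that the decoder learns the message length by some other means; the function name `shortest_binary_fraction` is made up for illustration.

```python
from fractions import Fraction
from math import ceil, log2


def shortest_binary_fraction(low, high):
    """Fewest bits b such that some k / 2**b lies in [low, high); returns (b, k)."""
    bits = 1
    while True:
        k = ceil(low * 2**bits)            # smallest multiple of 2**-bits >= low
        if Fraction(k, 2**bits) < high:
            return bits, k
        bits += 1


low, high = Fraction(746, 1000), Fraction(752, 1000)
p = high - low                             # P(the) = .006
optimum = log2(1 / p)                      # ideal code length in bits
bits, k = shortest_binary_fraction(low, high)

print(round(optimum, 2))                   # ideal length
print(ceil(optimum))                       # worst case for an interval this wide
print(bits, format(k, f"0{bits}b"))        # bits actually needed here, and the code
```

For this interval the search stops at 2 bits because .75 happens to fall inside it. The 8-bit bound is the worst case: multiples of 1/256 are spaced more closely than .006, so any interval of that width must contain one.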