Matt Mahoney
Last update: July 21, 2009.
This page is no longer maintained. The newest version can be found at
http://mattmahoney.net/dc/text.html
This competition ranks lossless data compression programs by the compressed size (including the size of the decompression program) of the first 10^9 bytes of the XML text dump of the English version of Wikipedia on Mar. 3, 2006. About the test data.
The goal of this benchmark is not to find the best overall compression program, but to encourage research in artificial intelligence and natural language processing (NLP). A fundamental problem in both NLP and text compression is modeling: the ability to distinguish between high probability strings like recognize speech and low probability strings like reckon eyes peach. Rationale.
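The link between modeling and compression can be made concrete: an ideal coder spends about -log2(p) bits on a string to which the model assigns probability p. The following is a minimal sketch, assuming a toy add-one smoothed word unigram model and a made-up training sentence. Both are illustrations only and are not part of the benchmark.

```python
import math
from collections import Counter

# Toy training text (hypothetical); a real model would learn from far more data.
TRAINING = "it is hard to recognize speech and easy to recognize speech errors".split()
COUNTS = Counter(TRAINING)
VOCAB_SIZE = len(COUNTS) + 1  # +1 reserves probability mass for unseen words


def word_probability(word):
    """Add-one smoothed unigram probability of a single word."""
    return (COUNTS[word] + 1) / (len(TRAINING) + VOCAB_SIZE)


def code_length_bits(phrase):
    """Ideal code length: the sum of -log2 p(word) over the phrase."""
    return sum(-math.log2(word_probability(w)) for w in phrase.split())


likely = code_length_bits("recognize speech")
unlikely = code_length_bits("reckon eyes peach")
print(f"recognize speech:  {likely:.1f} bits")
print(f"reckon eyes peach: {unlikely:.1f} bits")
```

The phrase the model considers more probable gets the shorter code, which is why a better language model yields a smaller compressed file.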
This is an open benchmark. Anyone may contribute results. Please read the rules first.
Compression improvements to the first 10^8 bytes are eligible for the Hutter Prize, with 50,000 euros of funding.
Compressors are ranked by the compressed size of enwik9 (10^9 bytes) plus the size of a zip archive containing the decompressor. Options are selected for maximum compression at the cost of speed and memory. Other data in the table does not affect rankings. This benchmark is for informational purposes only. There is no prize money for a top ranking. Notes about the table:
Compression Compressed size Decompressor Total size Time (ns/byte) Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note ------- ------- ---------- ----------- ----------- ----------- ----- ----- --- --- ---- durilca'kingsize -t2 -m13000 -o40 16,258,380 127,695,666 333,790 xd 128,029,456 1413 1805 13000 PPM 31 paq8hp12any -8 16,230,028 132,045,026 330,700 x 132,375,726 56993 1850 CM 22 drt|lpaq9m 9 17,964,751 143,943,759 110,579 x 144,054,338 2107 2151 1542 CM 26 xwrt 3.2|ppmonstr J (note 13) 18,456,706 148,915,761 79,404 sx 148,995,165 2987 2546 1650 PPM xwrt 3.2 -l14 -b255 -m96 -s -e40000 -f200 18,679,742 151,171,364 52,569 s 151,223,933 2537 2328 1691 CM nanozip 0.06a w32c -cc -m1670m 18,754,787 151,295,782 0 xd 151,295,782 2156 2173 1670 CM 26 WinRK 3.03 pwcm +td 800MB SFX 18,612,453 156,291,924 99,665 xd 156,391,589 68555 800 CM 10 ppmonstr J -m1700 -o16 19,055,092 157,007,383 42,019 x 157,049,402 3574 ~3600 1700 PPM slim 23d -m1700 -o12 19,077,276 159,772,839 69,453 x 159,842,292 5232 ~5400 1700 PPM bwmonstr 0.02 20,307,295 160,468,597 69,401 x 160,537,998 331801 156147 590 BWT 30 10 bbb m1000 20,847,290 164,032,650 11,227 s 164,043,877 4524 2619 1401 BWT paq9a -9 19,974,112 165,193,368 13,749 s 165,207,117 3997 4021 1585 CM uda 0.300 19,393,460 166,272,261 11,264 x 166,283,525 25282 25174 180 CM nanozipltcb 20,494,670 166,251,135 239,124 x 166,490,259 348 185 1729 BWT reorder_v2|bcm 0.08 e477 20,665,536 168,598,121 80,661 x 168,678,782 552 420 2385 BWT 28 cmm4 v0.1e 96 20,569,034 172,669,955 31,314 x 172,701,269 2052 2056 1321 CM ccmx 1.30 7 20,857,925 174,142,092 15,014 x 174,157,106 1313 1338 1332 CM bit 0.7 -p=5 20,823,204 174,425,039 62,493 x 174,487,532 2050 2100 663 CM 26 mcomp 2.00 -mw -M320m 21,103,670 174,388,351 172,531 x 174,560,882 473 399 1643 BWT 26 epmopt|epm r9 -m800 -n20 --fixedorder:12 19,713,502 174,817,424 141,101 x 174,958,525 3179 3376 800 PPM 20 WinUDA 2.91 mode 3 (194 MB) 20,332,366 174,975,730 17,203 x 
174,992,933 23610 23473 194 CM dark 0.51 -b333mf 21,169,819 175,471,417 34,797 x 175,506,214 533 453 1692 BWT FreeArc 0.40pre-4 -mppmd:1012m:o13:r1 20,931,605 175,254,732 748,202 x 176,002,934 1175 1216 1046 PPM hook v1.4 1700 21,990,502 176,648,663 37,004 x 176,685,667 741 695 1777 DMC 26 7zip 4.46a -m0=ppmd:mem=1630m:o=10 ... 21,197,559 178,965,454 0 xd 178,965,454 503 546 1630 PPM 23 M99 v2.1 e -m 239m 21,251,170 178,910,174 68,052 x 178,978,226 713 535 1500 BWT ash 04a /m700 /o10 19,963,105 180,735,542 11,137 x 180,746,679 6100 5853 700 CM pimple2 20,871,457 180,251,530 78,642 x 180,330,172 18474 17992 128 CM ocamyd LTCB 1.0 -s0 -m3 21,285,121 182,359,986 21,030 x 182,381,016 108960~110000 300 DMC 6 bee 0.79 b0154 -m3 -d8 20,975,994 182,373,904 57,046 x 182,430,950 9295 9285 512 PPM 30 uhbc 1.0 -m3 -b100m 20,930,838 182,918,172 56,242 x 182,974,414 1569 809 800 BWT ppmd J1 -m256 -o10 -r1 21,388,296 183,964,915 11,099 s 183,976,014 880 895 256 PPM tc 5.2 dev 2 21,481,399 184,939,711 41,112 x 184,980,823 3637 3655 230 CM ppmvc v1.1 -m256 -o8 -r1 21,484,294 186,208,405 25,241 x 186,233,646 898 913 272 PPM chile 0.4 -b=244141 22,218,917 186,979,614 11,530 s 186,991,144 2513 512 1426 BWT CTXf 0.75 pre b1 -me 22,072,783 191,008,871 57,337 x 191,066,298 1112 1037 78 PPM rings 1.5 9 21,848,093 191,067,972 44,565 x 191,112,537 172 189 426 BWT m03exp 2005-02-15 32MB blocks 21,948,192 191,250,500 44,593 x 191,295,093 ~4800 ~2100 256 BWT Stuffit 12.0.0.17 -m=4 -l=16 -x=30 22,105,654 190,372,707 2,658,122 xd 193,030,829 628 658 1062 PPM ppmx 0.03 22,572,808 193,643,464 54,964 x 193,698,428 777 784 609 PPM 26 40 enc 0.15 aq 22,156,982 195,604,166 94,888 x 195,699,054 6843 6868 50 CM sbc 0.970r2 -ad -m3 -b63 22,470,539 197,066,203 99,094 xd 197,165,297 1733 313 224 BWT WinRAR 3.60b3 -mc7:128t+ -sfxWinCon.sfx 22,713,569 198,454,545 0 xd 198,454,545 506 415 128 PPM quark v0.95r beta -m1 -d25 -l8 22,988,924 198,600,023 80,264 x 198,680,287 27952 217 534 LZ77 bssc 0.95 alpha 
-b16383 23,117,061 201,810,709 45,489 x 201,856,198 578 217 140 BWT 4 M1 0.3b e8-m103b1-mh 23,456,037 207,931,967 23,150 s 207,955,117 383 412 33 CM 26 uharc 0.6b -mx -md32768 23,911,123 208,026,696 73,608 xd 208,100,304 1666 1330 50 PPM GRZipII 0.2.4 -b8m 23,846,878 208,993,966 41,645 s 209,035,641 312 216 58 BWT 4x4 0.2a 4t (grzip:m1:h18) 23,833,244 208,787,642 317,097 x 209,104,739 386 240 269 BWT rzm 0.07h 24,361,070 210,126,103 17,667 x 210,143,770 2336 81 160 ROLZ 50 pim 2.50 best 24,303,638 210,124,895 330,901 x 210,455,796 764 ~764 88 PPM CTW 0.1 -d6 -n16M -f16M 23,670,293 211,995,206 43,247 x 212,038,452 19221 19524 144 CM boa 0.58b -m15 24,322,643 213,845,481 55,813 x 213,901,294 3953 ~4100 17 PPM TarsaLZP Aug 8 2007 25,134,862 215,301,412 2,843 xd 215,304,255 249 287 341 LZP lzturbo 0.94 -59 -b100 -p0 24,763,542 217,342,694 152,254 x 217,494,948 5196 20 1450 LZ77 26 LZPXj 1.2h 9 25,205,783 217,880,584 4,853 s 217,885,437 783 717 1316 PPM scmppm 0.93.3 -l 9 25,198,832 217,867,392 37,043 s 217,904,435 708 644 20 PPM PX v1.0 24,971,871 219,091,398 3,054 s 219,094,452 1838 1809 66 CM 3 DGCA 1.10 default+SFX 25,203,248 219,655,072 0 xd 219,655,072 858 270 76 Squeez 5.20.4600 sqx2.0 32MB Ultra 25,118,441 220,004,873 91,019 xd 220,095,892 2575 116 365 60 fpaq2 25,287,775 221,242,386 3,429 s 221,245,815 20183 20186 131 CM dmc c 1800000000 25,320,517 222,605,607 2,220 s 222,607,827 676 721 1800 DMC flashzip 0.94 -m2 -s7 -b5 26,236,095 226,981,882 35,996 x 227,017,878 2451 87 132 ROLZ 26 balz 1.13 ex 26,421,416 228,337,644 49,024 x 228,286,668 3700 190 206 ROLZ lzpm 0.11 9 26,501,542 229,083,971 46,824 x 229,130,795 15395 57 740 ROLZ qazar 0.0pre5 -l7 -d9 -x7 26,455,170 229,846,871 71,959 x 229,918,830 5738 903 105 LZP qc 0.050 -8 26,763,343 232,784,501 46,100 x 232,830,601 8218 1503 151 ppms J -o5 26,310,248 233,442,414 16,467 x 233,458,881 330 354 1.8 PPM WinTurtle 1.60 512 MB buffer 28,379,612 245,217,944 160,090 x 245,378,034 273 237 583 PPM cabarc 1.00.0601 
-m lzx:21 28,465,607 250,756,595 51,917 xd 250,808,853 1619 15 20 LZ77 70 sr3 28,926,691 253,031,980 5,611 x 253,037,591 130 146 68 SR bzip2 1.0.2 -9 29,008,736 253,977,839 30,036 x 254,007,875 379 129 8 BWT quad v1.11 -x 29,110,579 256,145,858 13,387 s 256,159,245 956 116 34 ROLZ WinACE -sfx -m5 -d4096 29,481,470 257,237,710 0 xd 257,237,710 1080 77 4 tornado 0.4a -11 30,157,610 258,761,459 42,516 s 258,803,975 783 25 1513 LZ77 sr3c 1.0 29,731,019 266,035,006 7,701 x 266,042,707 160 145 5 SR 26 lzc v0.08 10 30,611,315 266,565,255 11,364 x 266,576,619 302 63 550 LZ77 packet 0.90b -m4 -s9 31,208,752 273,176,127 32,305 x 273,208,432 3871 48 10 LZ77 bzp 0.2 31,563,865 283,908,295 36,808 x 283,945,103 110 120 3 LZP ha 0.98 a2 31,250,524 285,739,328 28,404 x 285,767,732 2010 1800 0.8 PPM 80 lcssr 0.2 -b7 -l9 34,549,048 296,160,661 8,802 x 296,169,463 8186 8281 1184 SR csc2 34,119,354 298,385,256 9,092 x 298,394,348 141 201 49 LZP 26 slug 1.27 35,093,954 309,201,454 6,809 x 309,208,263 32 28 14 ROLZ kzip May 13 2006 /b1024 35,016,649 310,188,783 29,184 xd 310,217,967 6063 62 121 LZ77 2 uc2 rev 3 pro -tst 35,384,822 312,767,652 123,031 x 312,890,683 360 63 4 LZ77 thor 0.95 e4 35,795,184 314,092,324 49,925 x 314,142,249 64 34 16 LZP gzip124hack 1.2.4 -9 36,273,716 321,050,648 62,653 x 321,113,301 149 19 1 LZ77 gzip 1.3.5 -9 36,445,248 322,591,995 38,801 x 322,630,796 101 17 1.6 LZ77 Info-ZIP 2.3.1 -9 36,445,373 322,592,120 57,583 x 322,649,703 104 35 0.1 LZ77 pkzip 2.0.4 -ex 36,556,552 323,403,526 29,184 xd 323,432,710 171 50 2.5 LZ77 90 jar (Java) 0.98-gcc cvfM 36,520,144 323,747,582 19,054 x 323,766,636 118 95 1.2 LZ77 PeaZip better, no integrity check 36,580,548 323,884,274 561,079 x 324,445,353 243 243 8 LZ77 20 lzgt3a 37,444,440 334,405,713 4,387 xd 334,410,100 1581 2886 2 LZ77 lzss 0.01 ex 38,254,303 337,565,308 44,555 x 337,609,863 9708 14 625 LZ77 lzuf Apr.15.2009 38,036,810 338,488,945 4,070 xd 338,493,015 446 40 2 LZ77 26 pucrunch -d -c0 39,199,165 350,265,471 
34,359 s 350,299,830 2649 463 2 LZ77 lzop v1.01 -9 41,217,688 366,349,786 54,438 x 366,404,224 289 12 1.8 LZ77 lzw 0.2 41,960,994 367,633,910 671 s 367,634,581 3597 31 18 LZW arbc2z 38,756,037 379,054,068 6,255 sd 379,060,323 2659 2674 68 PPM xdelta 3.0u -9 44,288,463 389,302,725 107,985 x 389,410,710 1021 30 47 LZ77 100 srank 1.1 -C8 43,091,439 409,217,739 6,546 x 409,224,285 51 45 2 SR QuickLZ 1.30b (quick3) 46,378,438 410,633,262 44,202 x 410,677,464 48 12 3 LZ77 compress 4.3d 45,763,941 424,588,663 16,473 x 424,605,136 103 70 1.8 LZW BriefLZ 1.05 46,638,341 425,384,313 5,298 x 425,389,611 66 18 2 LZ77 lzrw3-a 48,009,194 438,253,704 4,750 x 438,258,454 38 17 2 LZ77 fcm1 45,402,225 447,305,681 1,116 s 447,306,797 228 261 1 CM1 runcoder1 46,883,939 458,125,932 5,488 s 458,131,420 140 156 4 o1 26 FastLZ Jun 12 2007 54,658,924 493,066,558 7,065 xd 493,073,623 18 13 1 LZ77 flzp v1 57,366,279 497,535,428 3,942 s 497,539,370 78 38 8 LZP fpaq0f2 56,916,872 558,645,708 3,066 x 558,648,769 222 207 0.4 o0 110 ppp 61,657,971 579,352,307 1,472 s 579,353,779 80 59 1 SR lzbw1 0.8 67,620,436 590,235,688 21,751 x 590,257,439 15 12 55 LZP 26 NTFS LZNT1 76,955,648 636,870,656 0 636,870,656 10 9 0.1 LZ77 26 shindlet_fs 62,890,267 637,390,277 1,275 xd 637,391,552 113 103 0.6 o0 arb255 63,501,996 644,561,595 4,871 sd 644,566,466 2551 2574 1.6 o0 compact 63,862,371 648,370,029 3,600 sd 648,373,629 216 164 0.2 o0 lzp2 74,358,722 655,709,055 5,855 xd 655,714,910 11 9 15 LZP 26 barf (2 passes) 76,074,327 758,482,743 983,782 s 759,466,525 756 53 4 LZ77 arb2x v20060602 99,642,909 995,674,993 3,433 sd 995,678,426 2616 2464 1.6 o0b
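The "Total size" column in the table above is the enwik9 compressed size plus the zipped decompressor size, and rows are ordered by that sum. The sketch below recomputes the ranking key for four rows copied from the table. The helper names are hypothetical, and the time conversion assumes the reported ns/byte rate holds over the whole 10^9 byte file.

```python
# (program, enwik9 compressed size, zipped decompressor size), copied from the table.
ENTRIES = [
    ("gzip 1.3.5", 322_591_995, 38_801),
    ("paq8hp12any", 132_045_026, 330_700),
    ("bzip2 1.0.2", 253_977_839, 30_036),
    ("durilca'kingsize", 127_695_666, 333_790),
]


def total_size(entry):
    """Ranking key: compressed enwik9 plus the zipped decompressor."""
    _, enwik9_size, decompressor_size = entry
    return enwik9_size + decompressor_size


def seconds_for_enwik9(ns_per_byte):
    """Run time over 10^9 bytes, assuming the per-byte rate holds for the whole file."""
    return ns_per_byte * 10**9 / 1e9


ranked = sorted(ENTRIES, key=total_size)
for rank, entry in enumerate(ranked, start=1):
    print(rank, entry[0], f"{total_size(entry):,}")

# paq8hp12any is listed at 56993 ns/byte for compression.
print(f"{seconds_for_enwik9(56993) / 3600:.1f} hours")
```

Because enwik9 is 10^9 bytes, a time in ns/byte is numerically equal to the run time in seconds.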
                Compression       Compressed size      Decompressor  Total size   Time (ns/byte)
Program         Options           enwik8      enwik9   size (zip)    enwik9+prog  Comp  Decomp  Mem  Alg  Note
-------         -------           ----------  -------  ------------  -----------  ----  ------  ---  ---  ----
hipp 5819       /o8               20,555,951  (fails)  36,724 x                   5570  5670    719  CM
XMill 0.8       -w -P -9 -m800    26,579,004  (fails)  114,764 xd                 616   530     800  PPM
lzp3o2                            33,041,439  (fails)  23,427 xd                  230   270     151  LZP
Programs that properly decompress enwik8 and don't use external dictionaries are still eligible for the Hutter Prize.
Compression Compressed size Decompressor Total size Time (ns/byte) Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note ------- ------- ---------- ----------- ----------- ----------- ----- ----- --- --- ---- rdmc 0.06b 33,181,612 1394 1381 DMC 6 ESP v1.92 36,651,292 223 LZ77 16
Pareto frontier: compressed size vs. compression time as of Aug. 18, 2008 from the main table (options for maximum compression).
Pareto frontier: compressed size vs. memory as of Aug. 18, 2008
(options for maximum compression).
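A program is on the Pareto frontier if no other program is both smaller and faster. The following is a minimal sketch of that test for size versus compression time, using six rows copied from the main table. The frontier it prints is for this sample only, not for the full table or the original plots.

```python
def pareto_frontier(points):
    """Return names of points not dominated in (size, time); smaller is better in both."""
    frontier = []
    best_time = float("inf")
    # Walk from smallest to largest; a point survives only if it is
    # faster than every point that compresses better than it.
    for name, size, time in sorted(points, key=lambda p: (p[1], p[2])):
        if time < best_time:
            frontier.append(name)
            best_time = time
    return frontier


ROWS = [  # (program, enwik9+prog total size, compression time in ns/byte)
    ("durilca'kingsize", 128_029_456, 1413),
    ("paq8hp12any", 132_375_726, 56993),
    ("nanozipltcb", 166_490_259, 348),
    ("bzip2 1.0.2", 254_007_875, 379),
    ("slug 1.27", 309_208_263, 32),
    ("gzip 1.3.5", 322_630_796, 101),
]

print(pareto_frontier(ROWS))
```

The same function gives the size versus memory frontier if the third field holds memory in MB instead of time.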
I only test the latest supported version of a program. I attempt to find the
options that select the best compression, but will not generally do an exhaustive
search. If an option advertises maximum compression or memory, I don't try the alternatives.
If you know of a better combination, please let me know.
I will select the maximum memory setting that does not cause disk thrashing, usually about 1800 MB.
If the compressor is not downloadable as a zip file then I will compress the source or
executable (whichever archive is smaller) plus any other needed files (dictionaries) into a single zip
archive using 7zip 4.32 -tzip -mx=9.
If no executable is available I will attempt to compile in C or C++
(MinGW 3.4.2, Borland 5.5 or Digital Mars), Java 1.5.0, MASM, NASM, or gas.
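The rule for measuring the decompressor (zip the source or the executable, whichever archive is smaller, together with any needed files) can be sketched as follows. This uses Python's zipfile module as a stand-in for 7zip 4.32 -tzip -mx=9, so the sizes will not match the benchmark's figures. The file names and contents are hypothetical.

```python
import io
import zipfile


def zipped_size(files):
    """Size in bytes of a zip archive holding the given {name: bytes} files."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED, compresslevel=9) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return len(buffer.getvalue())


def decompressor_size(source_files, executable_files, extra_files):
    """Smaller of the two candidate archives, each including any extra needed files."""
    with_source = zipped_size({**source_files, **extra_files})
    with_executable = zipped_size({**executable_files, **extra_files})
    return min(with_source, with_executable)


# Hypothetical inputs for illustration only.
source = {"decomp.c": b"int main(void) { return 0; }\n" * 200}
executable = {"decomp.exe": bytes(range(256)) * 40}
dictionary = {"words.dic": b"compression\ndictionary\nbenchmark\n"}
print(decompressor_size(source, executable, dictionary))
```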
1. Reported by Guillermo Gabrielli, May 16, 2006. Timed on a Celeron D325 2.53 GHz with 256 MB RAM under Windows XP SP2. I have not verified results submitted by others. Timing information, when available,
may vary widely depending on the test machine used.
The numbers in the headings are the compression ratios on enwik9.
durilca and durilca'light 0.5 by Dmitry Shkarin
(Apr. 1, 2006) are closed source, experimental command line file compressors
based on ppmd/ppmonstr with filters for text,
exe, and data with fixed length records (wav, bmp, etc). durilca'light is a faster
version with less compression. Unfortunately both
crash on enwik9. Decompression is verified on enwik8.
The -m700 option selects 700 MB of memory. (It appears to use
substantially more for enwik9 according to Windows task manager).
-o12 selects PPM order 12 (optimal for enwik9 -t0).
-t0 (default) turns off text modeling, which hurts compression but is necessary
to compress enwik9 (although decompression still crashes). -t2(3) turns
on text preprocessing (dictionary; thus the increased decompressor size).
-t2 also supports 3 additive flags (4, 8, 16) which have no effect on this
data, thus -t2(31) or -t2 (default is 31) give the same compression as -t(3).
durilca 0.5(Hutter)
was released 1457Z Aug. 16, 2006. It does not use external dictionaries.
When run with 1 GB memory (-m700), -o13 is optimal. With 2 GB (-m1650), -o21 is optimal.
The unzipped .exe file is 86,016 bytes.
durilca4linux_1
(0825Z Aug 23 2006)
is a Linux version of durilca 0.5(Hutter) which successfully compresses enwik9 and
decompresses with UnDur
(23,375 bytes zipped, 42,065 bytes uncompressed). All
versions of durilca require memory specified by -m plus memory to read the input file
into memory. In Windows, this exceeds the 2 GB process limit regardless of available
RAM and swap. Thus, enwik9 compresses
only under Linux with 2 GB real memory and 1 GB additional swap.
The -o12 option is optimal for enwik9 (tested under 64 bit SuSE 10.0 by the author),
-o24 for enwik8 (verified by me under 64 bit Ubuntu 2.6.15).
durilca4linux_2
(Oct. 16, 2006)
is a closed source Linux version specialized for this benchmark.
It includes a warning that use on other files may cause data loss.
It requires AMD64 Linux and 3 GB of memory (2 GB for enwik8).
The decompressor files (EnWiki.dur and UnDur)
are contained within a 241,322 byte zip file in the rar distribution. To compress:
durilca4linux_3
(dictionary version v1)
was released Feb. 21, 2008. Like version 2, it requires extraction of EnWiki.dur
before compressing or decompressing, and may not work with files other than
enwik8 and enwik9. As tested, requires 64-bit Linux, 4 GB RAM, and 5 GB RAM+swap.
undur3 v2 contains
an improved dictionary (version v2), released Apr. 22, 2008,
for DURILCA4Linux_3. The compression
and decompression programs are the same. The decompression program UnDur (Linux
executable) is included. To compress, download durilca4linux_3 and replace the
dictionary (EnWiki.dur) with this one. The options are -m3600 (3600 MB memory),
-o14 (order 14 PPM), -t2 (text model 2).
undur3 v3,
released May 22, 2008,
uses an improved dictionary but the same compressor and decompressor as v1 and v2.
The dictionary contains 123,995 lowercase words separated by NUL bytes.
Of these, 5579 words occur more than once (wasted space?).
I tested option -m1500 under Ubuntu Linux with 2 GB memory.
At -m1500 top reports 2157 MB virtual memory and 1894 MB real memory. -m1600
caused disk thrashing.
durilca'kingsize (July 21, 2009) runs under 64 bit Windows and requires 13 GB memory.
It is designed to work only on this benchmark and not in general. The dictionary
file EnWiki.fsd must be extracted first from EnWiki.dur before compression or
decompression. Requires msvcr90.dll. enwik8 can be compressed with -m1200 (1.2 GB).
paq8hp12any is the top ranked program of the
PAQ series of context mixing compressors,
described below in chronological order. All can be found at this link, except as
noted. paq8hp* series compressors can also be found
here.
All programs are free, GPL open source, command line archivers. Most take a
single option controlling memory usage.
p5, p6, and p12 (Matt Mahoney, May 13, 2000) use a neural network
with 256K or 4M inputs, no hidden layer and a single output to predict
the next bit of input,
given hashes of various contexts to select active inputs. The output
is arithmetic coded. p5 uses 1 MB memory
and context orders 0 to 3. p6 uses 16 MB and orders 0-5. p12
uses 16 MB, orders 1-4 and word-level orders 0-1 as an optimization
for text. The programs take no options. The algorithm is described in
M. Mahoney,
Fast Text Compression with Neural Networks, Proc. AAAI FLAIRS, Orlando, 2000
(C) 2000, AAAI.
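The idea behind p5, p6, and p12 (hashed contexts selecting active inputs of a network with no hidden layer and one output) can be sketched as below. This is not the original code: the hash function (CRC-32), the learning rate, and the class and function names are assumptions, and the arithmetic coder is replaced by a count of ideal code length in bits.

```python
import math
import zlib


class HashedBitPredictor:
    """Single-layer network: one weight per hashed context, one sigmoid output."""

    def __init__(self, table_bits=18, orders=(0, 1, 2, 3), learning_rate=0.1):
        self.size = 1 << table_bits          # 2^18 = 256K inputs
        self.weights = [0.0] * self.size
        self.orders = orders
        self.learning_rate = learning_rate

    def active_inputs(self, history, partial):
        """Hash (order, last `order` bytes, bits of the current byte so far) to inputs."""
        slots = []
        for order in self.orders:
            context = bytes(history[max(0, len(history) - order):]) if order else b""
            key = bytes([order]) + context + bytes([partial])
            slots.append(zlib.crc32(key) % self.size)
        return slots

    def predict(self, slots):
        """Probability that the next bit is 1: sigmoid of the summed active weights."""
        total = sum(self.weights[s] for s in slots)
        total = max(-30.0, min(30.0, total))  # keep exp() in range
        return 1.0 / (1.0 + math.exp(-total))

    def update(self, slots, p, bit):
        """Online gradient step on coding cost: move active weights by the error."""
        for s in slots:
            self.weights[s] += self.learning_rate * (bit - p)


def ideal_compressed_bits(data, predictor):
    """Sum of -log2 P(actual bit), the size an ideal arithmetic coder would approach."""
    bits = 0.0
    history = bytearray()
    for byte in data:
        partial = 1  # a leading 1 marks how many bits of this byte are known
        for i in range(7, -1, -1):
            bit = (byte >> i) & 1
            slots = predictor.active_inputs(history, partial)
            p = predictor.predict(slots)
            bits += -math.log2(p if bit else 1.0 - p)
            predictor.update(slots, p, bit)
            partial = (partial << 1) | bit
        history.append(byte)
    return bits


sample = b"abracadabra " * 50
bits = ideal_compressed_bits(sample, HashedBitPredictor())
print(f"{bits / 8:.0f} bytes estimated for {len(sample)} input bytes")
```

A decompressor would run the same predictor on the bits decoded so far, so both sides make identical predictions without transmitting the model.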
paq1 (Matt Mahoney, Jan. 6, 2001) replaces the neural network in p5, p6, p12
with a fixed weighted averaging
of model outputs. Described in an unpublished report, M. Mahoney,
The PAQ1
Data Compression Program, 2002.
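Fixed weighted averaging can be illustrated with a short sketch. It assumes each model reports how many 0 bits and 1 bits it has seen in its current context and that the weights are constants chosen in advance. The particular weights and counts below are made up, and the paq1 report should be consulted for the actual scheme.

```python
def mix_fixed(model_counts, weights, epsilon=0.5):
    """Fixed-weight average of per-model (zeros, ones) counts, giving P(next bit = 1)."""
    ones = sum(w * n1 for w, (_, n1) in zip(weights, model_counts))
    total = sum(w * (n0 + n1) for w, (n0, n1) in zip(weights, model_counts))
    # epsilon keeps the estimate away from 0 and 1 when evidence is thin.
    return (ones + epsilon) / (total + 2 * epsilon)


# Hypothetical counts from three models, with longer contexts weighted more heavily.
counts = [(10, 30), (1, 5), (0, 2)]
weights = [1, 4, 9]
print(f"{mix_fixed(counts, weights):.3f}")
```

Unlike the neural network it replaced, nothing here is learned: a model that predicts badly on a given file keeps its weight.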
paq6 (Matt Mahoney and Serge Osnach, Dec. 30, 2003) evolved as a series
of improvements to paq1. It is described in
M. Mahoney,
Adaptive Weighing of Context Models for Lossless Data Compression,
Florida Tech. Technical Report CS-2005-16, 2005. The most significant
improvements are replacing the fixed model weights with adaptive linear
mixing (Matt Mahoney), and SSE (secondary symbol estimation) postprocessing
on the output probability, and modeling of sparse contexts (Serge Osnach).
Other models were added for x86 executable code, and automatic detection
of fixed length records in binary data.
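SSE postprocessing can be pictured as a small table that maps an input probability, together with a small context, to a corrected probability learned from past outcomes. The sketch below is a simplification under stated assumptions: it quantizes the probability linearly into 33 buckets and uses a fixed adaptation rate, neither of which is taken from paq6. The class name is hypothetical.

```python
class SSE:
    """Refines an input probability using a small context (simplified sketch)."""

    def __init__(self, contexts, buckets=33, rate=0.02):
        self.buckets = buckets
        self.rate = rate
        # Each row starts as the identity map: bucket j holds probability j/(buckets-1).
        self.table = [[j / (buckets - 1) for j in range(buckets)]
                      for _ in range(contexts)]

    def refine(self, p, context):
        """Interpolate between the two buckets nearest p; remember them for update()."""
        position = p * (self.buckets - 1)
        self.low = min(int(position), self.buckets - 2)
        self.weight = position - self.low
        self.row = self.table[context]
        return (self.row[self.low] * (1.0 - self.weight)
                + self.row[self.low + 1] * self.weight)

    def update(self, bit):
        """Move the two buckets used by the last refine() toward the observed bit."""
        low, w = self.low, self.weight
        self.row[low] += self.rate * (1.0 - w) * (bit - self.row[low])
        self.row[low + 1] += self.rate * w * (bit - self.row[low + 1])


sse = SSE(contexts=2)
before = sse.refine(0.3, context=0)
for _ in range(200):
    sse.refine(0.3, context=0)
    sse.update(1)  # in this context the bit keeps turning out to be 1
after = sse.refine(0.3, context=0)
print(f"{before:.3f} -> {after:.3f}")
```

If the upstream models are systematically under-confident in some context, the table learns the correction for that context only.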
paqar 4.5 (Alexander Ratushnyak, Feb. 13, 2006)
is the last of a long series of improvements to paq6 by
Alexander Ratushnyak (paqar: multimixer model, .exe preprocessor, other model
improvements), Przemyslaw Skibinski (WRT text preprocessing), Berto Destasio (model tuning),
Fabio Buffoni (speed optimizations), David A. Scott
(arithmetic coder optimizations), Jason Schmidt (model improvements), and
Johan de Bock (compiler optimizations). For text, the biggest improvement was from
WRT (Word Reducing Transform),
which replaces words with shorter codes from an external English dictionary.
WRT was added in PAsQDa 1.0 on Jan. 18, 2005.
WRT is described in
P. Skibiński, Sz. Grabowski, and S. Deorowicz,
Revisiting
dictionary-based compression, Software - Practice & Experience, 35 (15),
pp. 1455-1476, December 2005.
There were a great number of versions by many contributors, mostly in 2004 when the PAQ
series moved to the top of most compression benchmarks and attracted interest.
Prior to PAQ, the top ranked programs were generally closed source.
paq8f (Matt Mahoney, Feb. 28, 2006) evolved from paq7 (Dec. 24, 2005) as a
complete rewrite of paq6/paqar. The important improvements were replacing the
adaptive linear mixing of models with a neural network (coded in MMX assembler),
a more memory-efficient mapping of contexts to bit histories using a cache-aligned
hash table, adaptive mapping of bit histories to probabilities,
and models for bmp, tiff, and jpeg images. It models text using whole-word
contexts and case folding, like all versions back to p12, but lacks WRT text
preprocessing. It served as a baseline for the Hutter prize. Details are
in the source code comments.
paq8g (Przemyslaw Skibinski, Mar. 3, 2006) adds back WRT text preprocessing.
paq8h (Alexander Ratushnyak, Mar. 24, 2006) added additional contexts
to the neural network mixer. It was top ranked on enwik9 (but not enwik8)
when the Hutter prize was launched on Aug. 6, 2006. This is the 78th version
since p5.
raq8g by Rudi Cilibrasi,
released 0721Z Aug. 16, 2006, is a modification of paq8f. It adds
a NestModel to model nesting of parentheses and brackets.
The test below for -7 is based on
a Windows compile, raq8g.exe.
The test for -8 was under Linux. The unzipped Linux executable is 27,660 bytes.
paq8hp1
(source code)
by Alexander Ratushnyak, 1945Z Aug. 21, 2006. It is a modification of paq8h
using a custom dictionary tuned to enwik8 for the Hutter prize. Because the
Hutter prize requires no external dictionaries, the dictionary is spliced into
the .exe file during the build process. When run, it creates
the dictionary as a temporary file. The program must be run in the current
directory (not in your PATH or with an explicit path), or else it can't find
this file. The unzipped paq8hp1.exe is 206,764 bytes.
Decompression was verified for enwik8 (60730 ns/b for -8, 60660 ns/b for -7).
enwik9 is pending.
paq8hp2
(source code)
by Alexander Ratushnyak, 0233Z Aug. 28, 2006 is an improved version of paq8hp1
submitted for the Hutter prize. paq8hp2.exe size is 205,276 bytes.
It differs from paq8hp1 mainly in that the 43K word dictionary for 2-3 byte codes is sorted alphabetically.
The 80 most frequent words, coded as 1 byte before compression, are grouped by syntactic type
(pronoun, preposition, etc).
paq8hp3
(source code)
by Alexander Ratushnyak, released Aug. 29, 2006 is an improved version of paq8hp2
submitted for the Hutter prize on Sept. 3, 2006.
The 80 dictionary words coded with 1 byte and 2560 words coded with 2 bytes
are organized into semantically related groups or by common suffixes.
The 40,960 words with 3 byte codes are sorted from the last character in reverse
alphabetical order. paq8hp3.exe is 178,468 bytes unzipped.
enwik9 decompression is not yet verified. For enwik8, decompression is verified
with time 60300 ns/b compression, 60220 ns/b decompression.
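The three code-length tiers described above (80 words at 1 byte, 2560 at 2 bytes, 40,960 at 3 bytes) can be summarized in a small sketch. It assumes the tiers occupy consecutive positions in the dictionary, which the text does not state, and it does not attempt to reproduce the actual byte values of the codes.

```python
ONE_BYTE_WORDS = 80
TWO_BYTE_WORDS = 2560
THREE_BYTE_WORDS = 40_960
DICTIONARY_WORDS = ONE_BYTE_WORDS + TWO_BYTE_WORDS + THREE_BYTE_WORDS


def code_length(index):
    """Bytes used to code the word at a 0-based dictionary position (assumed layout)."""
    if not 0 <= index < DICTIONARY_WORDS:
        raise ValueError("word is not in the dictionary")
    if index < ONE_BYTE_WORDS:
        return 1
    if index < ONE_BYTE_WORDS + TWO_BYTE_WORDS:
        return 2
    return 3


def bytes_saved(word, index):
    """Reduction in size when `word` is replaced by its code, before any modeling."""
    return len(word.encode("utf-8")) - code_length(index)


print(DICTIONARY_WORDS)
print(bytes_saved("the", 0), bytes_saved("compression", 5000))
```

The tier sizes add up to 43,600 words, of which 43,520 use 2 or 3 byte codes.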
paq8hp4
(source code)
by Alexander Ratushnyak, released and submitted for the Hutter prize on
Sept. 10, 2006, is an improved version of paq8hp3.
The dictionary is further organized into semantically related groups among 3-byte codes.
The unzipped size of paq8hp4.exe is 206,336 bytes.
paq8hp5
(source code)
by Alexander Ratushnyak, released Sept. 20, 2006, is an improved version of paq8hp4,
submitted for the Hutter prize on Sept. 25, 2006.
The unzipped size of paq8hp5.exe is 174,616 bytes (in spite of a slightly larger dictionary).
The dictionary size is optimized for enwik8; a larger dictionary would improve compression
of enwik9. Decompression is verified for enwik8 only (-8 at 74640 ns/b).
A Linux port of paq8hp5 is by
Лъчезар Илиев Георгиев (Luchezar Georgiev), Oct 26, 2006
(mirror).
paq8hp6
(source code)
by Alexander Ratushnyak, released Oct. 29, 2006, is an improved version of paq8hp5.
It was submitted as a Hutter prize candidate on Nov. 6, 2006.
Unzipped paq8hp6.exe size is 170,400 bytes.
The -8 option was not tested on enwik9 due to disk thrashing on my 2 GB PC. Compression was about
25% finished after 9 hours.
paq8j by Bill Pettis,
Nov. 13, 2006, is based on paq8f (no dictionary) with model improvements taken
from paq8hp5. It is a general purpose compressor like paq8f, not specialized for text.
paq8ja.zip by Serge Osnach, Nov. 16, 2006, is an improvement
of paq8j, using additional contexts based on character classifications.
paq8jb.zip by Serge Osnach, Nov. 22, 2006, adds
contexts using the distance to an anchor byte (x00, space, newline, xff)
combined with previous characters.
The -8 test caused some minor disk thrashing at 2 GB
memory under WinXP Home (82% CPU usage). Time reported is wall time.
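A context built from the distance to an anchor byte combined with previous characters might look like the sketch below. The anchor set comes from the description above, but how paq8jb counts the distance, how many previous bytes it uses, and how it hashes the result are not stated, so those details are assumptions.

```python
ANCHORS = {0x00, 0x20, 0x0A, 0xFF}  # x00, space, newline, xff


def anchor_contexts(data, previous=1):
    """Yield (bytes since the last anchor, preceding bytes) for each position in data."""
    distance = 0  # 0 immediately after an anchor, and at the start of the input
    for i in range(len(data)):
        yield (distance, bytes(data[max(0, i - previous):i]))
        distance = 0 if data[i] in ANCHORS else distance + 1


for context in anchor_contexts(b"to be\n"):
    print(context)
```

In text, the distance to the last space or newline is the position within the current word or line, which is why such a context can help.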
paq8jc.zip by Serge Osnach, Nov. 28, 2006, improves the
record model for better compression of some binary files, although it is
slightly worse for text. Time for -8 is wall time at 72% CPU usage.
paq8hp7a
by Alexander Ratushnyak, Dec. 7, 2006, was intended to supersede
paq8hp6 as a Hutter prize entry,
then was withdrawn on Dec. 10, 2006 with the release of paq8hp7.
Unzipped executable size is 151,664 bytes. -8 for enwik9 (but not enwik8) caused
disk thrashing on my computer (2 GB, WinXP).
paq8hp7
(source code) by
Alexander Ratushnyak, Dec. 10, 2006, as a Hutter prize entry.
Unzipped paq8hp7.exe size is 152,556 bytes.
paq8jd by
Bill Pettis,
Dec. 30, 2006, improves on paq8j with additional SSE (APM) stages.
enwik8 -8 caused some disk thrashing at 2 GB memory.
paq8hp8
(source code)
by Alexander Ratushnyak, Jan. 18, 2007, as a Hutter prize entry
(replacing an incorrect version posted 2 days earlier).
Unzipped size is 152,692 bytes. The dictionary is identical to paq8hp7.
paq8k is by
Bill Pettis, Feb. 13, 2007.
paq8hp9
(mirror)
(source code)
by Alexander Ratushnyak, Feb. 20, 2007, is a Hutter prize entry.
Only the -7 option works.
The unzipped size of paq8hp9.exe is 112,628 bytes.
paq8hp9any
(Feb. 23, 2007) by Alexander Ratushnyak
is a paq8hp9 -7 compatible version with external dictionary where all options work.
However the zipped program is larger and -8 was not tested due to disk thrashing,
so results are unchanged.
paq8l by
Matt Mahoney, Mar. 8, 2007, is based on paq8jd. It adds a DMC model
and minor improvements.
paq8hp10
(mirror), Mar. 26, 2007,
by Alexander Ratushnyak was derived from paq8hp9 as a Hutter prize entry.
The unzipped size is 103,224 bytes. Only the -7 option works.
paq8hp10any,
(source code),
Mar. 31, 2007, by Alexander Ratushnyak is archive compatible with paq8hp10 -7 but
works with other memory options. When run, paq8hp10.exe and both dictionary
files should be in the current directory. This program is not a Hutter prize entry.
paq8hp11
(mirror)
by Alexander Ratushnyak, Apr. 30, 2007, is a Hutter prize entry.
paq8hp11.exe is 99,816 bytes. Like paq8hp10, it works only with the -7 option.
paq8hp11any
(source code)
by Alexander Ratushnyak, May 2, 2007, is a paq8hp11 variant
that accepts any memory option. It was optimized for
speed rather than size. It includes two dictionary files which must
be present in the current directory when run, unlike paq8hp11 where the
dictionary is self extracted. -8 selects 1850 MB memory. -7 produces
the same archive as paq8hp11. Run speeds for -8 enwik8 are 76770+76820 ns/B.
paq8hp12
(mirror)
by Alexander Ratushnyak, May 14, 2007, is a Hutter prize entry.
paq8hp12.exe size is 99,696 bytes. It works only with the -7 option like paq8hp11.
paq8hp12any
(source code)
by Alexander Ratushnyak, May 20, 2007, is a paq8hp12 variant that accepts
any memory option (like paq8hp11any). The -7 option produces an archive
identical to that of paq8hp12.
paq8hp12any was
updated
(mirror)
(mirror)
on Jan. 9, 2009
to fix a compiler issue and add a 64 bit Linux version.
Compressed file format was not changed. It was not retested.
paq8fthis2
by Jan Ondrus, Aug. 12, 2007, is paq8f with an improved model for compressing JPEG
images. It is otherwise archive compatible with paq8f for data without JPEG images (such as
enwik8 and enwik9).
paq8n by Matt Mahoney,
Aug. 18, 2007, combines paq8l with the JPEG model from paq8fthis2.
paq8o and paq8osse by
Andreas Morphis, Aug. 22, 2007, are paq8n with an improved model for .bmp images.
There are two executables that produce identical archives. paq8o.exe is for
Pentium MMX or higher. paq8osse.exe is for newer processors that support SSE2 instructions
like the Pentium 4. It is about 8% faster, but uses more memory.
Both use the same C++ source but use
different (but equivalent) assembler code to implement the neural network mixer.
paq8osse.exe was compiled with Intel C++, which produces slightly faster executables than
g++ used in earlier versions. The current version is
paq8o ver. 2 (Aug. 24, 2007),
which fixes the file name extension (was .paq8n) but does not change compression.
The benchmark is based on the first version.
paq8o3 by KZ, Sept. 11, 2007,
combines paq8o with an improved JPEG model from paq8fthis3 (Jan Ondrus, Sept. 8, 2007)
and an improved model for grayscale PGM images from paq8i
(Pavel Holoborodko, Aug. 18, 2006). Text compression is unchanged from paq8l, paq8m,
paq8o, or paq8o2.
paq8o4 v1 by KZ, Sept. 15, 2007,
includes a grayscale .bmp model (based on the grayscale PGM model). Text compression
is unaffected. It was compiled with Intel C++.
paq8o4 v2 by
Matt Mahoney, Sept. 17, 2007,
is a port to g++ which allows wildcards, directory traversal, and directory creation,
but is 8% slower. It is archive compatible with v1.
paq8o6 by KZ,
Sept. 28, 2007, is based
on paq8o5 by KZ,
Sept. 21, 2007 with the improved JPEG model from
paq8fthis4
by Jan Ondrus, Sept. 27, 2007. paq8o5 is paq8o4 with an improved StateMap
from lpaq1. The improved compression of enwik8 comes from this StateMap.
Compression of enwik8 is unchanged from paq8o5 to paq8o6.
paq8o7 by
KZ, Oct. 16, 2007, improves paq8o6 with improved JPEG compression and support
for 4 and 8 bit BMP images. Text is not affected.
paq8o8 by
KZ, Oct. 23, 2007, further improves the JPEG compression of paq8o7.
paq8o8-jun7
is a DOS port of paq8o8 by Rugxulo, June 7, 2008.
paq8o10t
is by KZ, June 11, 2008.
Discussion.
decomp8 is a Hutter
Prize entry by Alexander Ratushnyak, Mar. 23, 2009. It consists of a
decompressor (Windows executable only) and an archive (archive8.bin) which
decompresses to enwik8. There is no compressor. During decompression, the
program creates a temporary file containing a dictionary similar to the one
used in paq8hp12. The command to decompress is "decomp8 archive8.bin enwik8".
The total size (not zipped) is 15,986,677 bytes.
paq8p3 is
by KZ, Apr. 19, 2009.
paq8p3 v2 is
by KZ, Apr. 21, 2009.
decomp8b is
an update to the Hutter prize entry
decomp8 by Alexander Ratushnyak, Apr. 22, 2009. Total size
(not zipped) is 15,958,674 bytes.
decmprs8 is
an update to the Hutter prize entry
decomp8b by Alexander Ratushnyak, May 23, 2009. Total size
(not zipped) is 15,949,688 bytes. To decompress: decmprs8.exe archive8.dat enwik8
Options select memory usage as shown in the table. Early versions took no options.
paq8hp1 through paq8hp12 can be used as a preprocessor to other compressors
by compressing with option -0. In the following tests on ppmonstr, options were tuned
for the best possible compression of enwik8 with 2 GB memory (1.65 GB available under WinXP).
The xml-wrt 2.0 options are -l0 -w -s -c -b255 -m100 -e2300 (level 0, turn off word containers,
turn off space modeling, turn off containers, 255 MB buffer for dictionary, 100 MB buffer,
2300 word dictionary).
The xml-wrt 3.0 options are -l0 -b255 -m255 -3 -s -e7000 (-3 = optimize for PPM).
xml-wrt prepends the dictionary to its output.
To make the comparison fair, the compressed size of the dictionary
must be added. This is done in two ways, first by compressing the preprocessed text
and dictionary and adding the compressed sizes,
and second by prepending the dictionary to the preprocessed
text before compression. The first method compresses about 1-2 KB smaller.
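The two ways of charging for the dictionary can be written out as follows. This sketch uses zlib as a stand-in for ppmonstr, so the sizes and even the direction of the difference need not match the results reported here. The sample text and dictionary are made up.

```python
import zlib


def separate_total(text, dictionary):
    """Method 1: compress text and dictionary separately and add the sizes."""
    return len(zlib.compress(text, 9)) + len(zlib.compress(dictionary, 9))


def prepended_total(text, dictionary):
    """Method 2: prepend the dictionary to the text and compress once."""
    return len(zlib.compress(dictionary + text, 9))


# Hypothetical stand-ins for the preprocessed text and its dictionary.
dictionary = b"\n".join([b"compression", b"dictionary", b"benchmark", b"wikipedia"])
text = b"the benchmark measures compression of wikipedia text " * 40

print("separate: ", separate_total(text, dictionary))
print("prepended:", prepended_total(text, dictionary))
```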
The uncompressed size of each dictionary for paq8hp1 through paq8hp4
is 398,210 bytes. They contain
identical words, but in different order. The first two dictionaries are identical.
They compress smaller because they are sorted alphabetically.
The dictionary for paq8hp5 is 411,681 bytes. It contains all of the words in
the first 4 dictionaries plus 1280 new words (44,880 total).
The transform done by paq8hp1 through paq8hp5
is based on WRT by Przemyslaw Skibinski, which first appeared
in PAsQDa and paqar, and later in paq8g and xml-wrt. The steps are as follows:
lpaq versions 1 through 8 may be downloaded here.
lpaq9* can be downloaded here.
lpaq1 is a free,
open source (GPL) file compressor by Matt Mahoney, July 24, 2007. It uses context mixing.
It is a "lite" version of paq8l, about 35 times faster at the cost of about
10% in compression. The "9" option selects maximum memory. The options
range from 0 (6 MB) to 9 (1.5 GB). Memory usage is 3 + 3*2^N MB,
N = 0..9.
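As a quick check of the memory formula, the following sketch evaluates 3 + 3*2^N for the two endpoints quoted above. The function name lpaq1_memory_mb is only an illustration and is not part of lpaq1.

```python
def lpaq1_memory_mb(n):
    """Approximate lpaq1 memory use in MB for option n, using 3 + 3*2^n."""
    if not 0 <= n <= 9:
        raise ValueError("option must be in 0..9")
    return 3 + 3 * 2 ** n

# Option 0 gives 6 MB and option 9 gives 1539 MB (about 1.5 GB).
for n in (0, 9):
    print(n, lpaq1_memory_mb(n))
```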
The compressor mixes 7 contexts: orders 1, 2, 3, 4, 6, a unigram word context
(consecutive letters, case insensitive), and a matched bit context. The contexts
(except the matched bit) are mapped to nonstationary bit histories using
nibble-aligned hash tables, then mapped to bit prediction
probabilities using stationary adaptive tables with bit counts to control adaptation rate.
The matched bit context maps the predicted bit (based on a context match),
match length and order-1 context (or order 0 if no match) to a bit prediction.
The probabilities are combined
in the logistic domain (log(p/(1-p))) using a single layer neural network selected
by a small context (3 high bits of last byte + context order), then passed through
2 SSE stages (orders 0 and 1) and arithmetic coded. Except for one model for
ASCII text, there are no specialized models for binary data, .exe, .bmp, .jpeg, etc.
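The mixing step described above can be sketched roughly as follows. This is a simplified floating point illustration, not lpaq1's code: the class name Mixer, the initial weight 0.3, and the learning rate 0.02 are assumptions chosen for the example, and the SSE stages and arithmetic coder are omitted.

```python
import math

def stretch(p):
    """Map a probability to the logistic domain: log(p/(1-p))."""
    p = min(max(p, 1e-6), 1.0 - 1e-6)  # clamp to avoid log(0)
    return math.log(p / (1.0 - p))

def squash(x):
    """Inverse of stretch; x is clamped to avoid overflow in exp."""
    x = min(max(x, -30.0), 30.0)
    return 1.0 / (1.0 + math.exp(-x))

class Mixer:
    """Single layer network with one weight vector per small context."""

    def __init__(self, n_inputs, n_contexts, rate=0.02):
        # Hypothetical initial weight and learning rate.
        self.w = [[0.3] * n_inputs for _ in range(n_contexts)]
        self.rate = rate

    def predict(self, probs, ctx):
        """Combine the model predictions for the weight set chosen by ctx."""
        self.x = [stretch(p) for p in probs]
        self.ctx = ctx
        self.p = squash(sum(w * x for w, x in zip(self.w[ctx], self.x)))
        return self.p

    def update(self, bit):
        """Move the selected weights to reduce the coding cost of bit."""
        err = bit - self.p
        ws = self.w[self.ctx]
        for i, x in enumerate(self.x):
            ws[i] += self.rate * err * x
```

A model whose predictions keep turning out right gains weight and one that keeps being wrong loses it, so the mixed probability drifts toward the better inputs for each context.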
lpaq2 by
Alexander Ratushnyak, Sept. 20, 2007, contains some speed optimizations.
lprepaq 1.2 by Christian Schneider, Sept. 29, 2007,
is lpaq1 combined with precomp as a preprocessor. precomp compresses JPEG files
and also expands data segments compressed with zlib, often making them more
compressible. This preprocessing has no effect on text files.
lpaq3 and elpaq3 by
Alexander Ratushnyak, Sept. 29, 2007, has two versions with the same source
code. When compiled with
-DWIKI, the result is elpaq3 which is tuned for large text files. The normal
compile produces lpaq3.
lpaq3a by
Alexander Ratushnyak, Sept. 30, 2007, improves compression on some files
over lpaq3 (but not enwik8/9). The archive also contains lpaq3e.exe, which is
an archive-compatible Intel compile of elpaq3.exe.
lpaq4 and lpaq4e
(mirror)
are by Alexander Ratushnyak, Oct. 1, 2007. lpaq4e is tuned for large text files.
lpaq5 and lpaq5e
are by Alexander Ratushnyak, Oct. 16, 2007. Option 9 selects 1542 MB memory.
lpaq5e is tuned for large text files. It includes separate programs
for compression only (lpaq5e-c.exe) and decompression only (lpaq5e-d.exe).
Tests were done with these programs, rather
than the version that does both (lpaq5e.exe).
lpaq6 and lpaq6e
are by Alexander Ratushnyak, Oct. 22, 2007. Option 9 selects 1542 MB memory.
lpaq6e is tuned for large text files. lpaq6 includes an E8E9 transform for
compressing x86 executables.
lpaq7 and lpaq7e
(mirror)
are by Alexander Ratushnyak, Oct. 31, 2007.
lpaq8 and lpaq8e
are by Alexander Ratushnyak, Dec. 10, 2007. The executables are packed with upack.
zip -9 would make them larger.
lpaq1a by
Matt Mahoney, Dec. 21, 2007, uses the same model as lpaq1 but replaces the
arithmetic coder with the asymmetric binary coder from fpaqb.
lpq1 by
Matt Mahoney, Dec. 23, 2007, is an archiver (not a file compressor) based
on lpaq1 option 7.
drt|lpaq9e
(mirror) is by
Alexander Ratushnyak, Feb. 20, 2008. It is specialized for English text.
It includes a separate program drt.exe (without source code) which performs
a dictionary transform prior to compression with lpaq9e. The option 9 is
for lpaq9e which selects maximum memory. The program size is computed by adding
lpaq9e.exe, drt.exe, and the compressed dictionary, which must be uncompressed
with lpaq9e before running. The size is smaller without a zip archive.
Decompression consists of uncompressing the dictionary with lpaq9e,
uncompressing the transformed file with lpaq9e, and reversing the transform
with drt. Run times are for the sum of all three operations
(1+62+2943, 1+2929+45 sec).
lpaq9f by
Alexander Ratushnyak, Apr. 27, 2008, works like lpaq9e. Run times are
(2+55+2801, 2+2819+38 sec). drt uses 8 MB for compression and 4 MB
for decompression.
lpaq9g by
Alexander Ratushnyak, May 23, 2008, works like lpaq9e. Run times are
(2+51+2691, 2+2682+38 sec).
lpaq9h by
Alexander Ratushnyak, June 3, 2008, works like lpaq9e. Run times are
(2+53+2530, 2+2529+44 sec).
lpaq9i by
Alexander Ratushnyak, June 13, 2008, works like lpaq9e. Run times are
(2+59+2425, 2+2453+46 sec). drt.exe and the dictionary file
(tmpdict0.dic) are unchanged in all versions starting with lpaq9f.
lpaq9j
(mirror)
by Alexander Ratushnyak, Aug. 17, 2008, has a new version of drt.exe and
dictionary. Run times are (2+58+2365, 2+2358+48 sec).
lpaq9k
(mirror)
is by Alexander Ratushnyak, Sept. 30, 2008. Run times are (2+59+2336,
2+2346+47 sec). Decompressor size is as 3 files (not zipped).
lpaq9l
(mirror)
is by Alexander Ratushnyak, Dec. 2, 2008. Run times are (2+41+2132,
2+2179+40 sec) on the computer described in note 26, and
(2+58+2338, 2+2422+50) on the computer used to test all the earlier versions.
Decompressor size is as 3 files (not zipped).
lpaq9m
(mirror)
is by Alexander Ratushnyak, Feb. 20, 2009. Run times are
(2+38+2067, 2+2111+38). Decompressor size is 3 files (not zipped).
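The three-part run times can be converted to the ns/byte figures used in the main table. The sketch below assumes the times are in seconds and were measured on enwik9 (10^9 bytes); the function name ns_per_byte is illustrative.

```python
def ns_per_byte(step_seconds, size_bytes=10**9):
    """Sum the per-step times in seconds and convert to nanoseconds per byte."""
    return sum(step_seconds) * 1e9 / size_bytes

# lpaq9m: dictionary + drt + lpaq9m for compression, and the reverse order
# of operations for decompression.
comp = ns_per_byte([2, 38, 2067])
decomp = ns_per_byte([2, 2111, 38])
print(comp, decomp)
```

Under that assumption the results are 2107 and 2151 ns/byte, which agree with the Comp and Decomp columns for drt|lpaq9m in the main table.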
drt may be combined with other compressors to improve compression.
The following were obtained using drt and tmpdict0.dic (from lpaq9i)
with ppmonstr J (PPM). Option -m1650 selects 1650 MB memory. -r1 partially
rebuilds the model when memory is exhausted. -o selects the PPM model order.
Compression time is for ppmonstr only. Mem8 is actual memory used to compress
enwik8.drt. enwik9.drt always uses 1650 MB. As a separate compressor, the
compressor size would be 147,915 for a zip file containing drt.exe, ppmonstr.exe,
and tmpdict0.pmm (tmpdict0.dic compressed with ppmonstr -m1650 -r1 -o64).
Total size would be 148,047,289.
For drt 9j, the decompressor size is 149,468 and total size is 147,196,757.
xml-wrt 2.0 and higher and xwrt 3.2
can be used as either a standalone compressor or as a preprocessor to other compressors.
The table below shows the best known settings for enwik9 and enwik8 for xml-wrt 3.0 and 2.0 as
a preprocessor to ppmonstr var. J, the best known combination for which xml-wrt improves compression.
xml-wrt 1.0 is a preprocessor only.
See also xml-wrt and xwrt as a standalone compressor.
xml-wrt 1.0
(XML Word Reducing Transform)
is a free command line single file preprocessor with source code
by Przemyslaw Skibinski, May 10, 2006.
It is not intended to compress files by itself (although it does somewhat).
Rather, it is intended to improve the compressibility of text and XML files by replacing
common words and XML substrings with shorter symbols. (So it is actually LZW with a
static dictionary prepended to the output).
It improves compression for most programs except for those
that already have English text models such as paq8h. Some additional results
are shown below for combinations with some other compressors.
The following table shows the compressed size (without decompressor
except SFX) of enwik8 before and after the XML-WRT transform with option -f180
for several compressors. A ratio less than 1 means that XML-WRT improves compression.
The -f option (default -f6) selects the minimum word frequency required to have it
added to the dictionary. The optimal setting depends on the input size. When used
with ppmd or ppmonstr (the best compressors improved by XML-WRT), the optimal settings
are about -f180 for enwik8 and -f1800 for enwik9, which results in a dictionary of 7697
words for enwik8 and 6657 words for enwik9.
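The idea of replacing frequent words with short symbols, with a minimum frequency threshold like -f, can be sketched as follows. This is a toy illustration and not the XML-WRT format: the private-use code points, the one-line dictionary header, the restriction to runs of ASCII letters, and all function names are assumptions made for the example.

```python
import re
from collections import Counter

PUA = 0xE000       # first Unicode private-use code point (assumed code space)
MAX_WORDS = 6400   # size of that private-use block

def build_dictionary(text, min_freq):
    """Return words occurring at least min_freq times, most frequent first."""
    counts = Counter(re.findall(r"[A-Za-z]+", text))
    words = [w for w, n in counts.items() if n >= min_freq and len(w) >= 2]
    words.sort(key=lambda w: (-counts[w], w))
    return words[:MAX_WORDS]

def encode(text, words):
    """Replace dictionary words with one-character codes; prepend the dictionary."""
    if any(PUA <= ord(ch) < PUA + MAX_WORDS for ch in text):
        raise ValueError("input already uses the reserved code points")
    code = {w: chr(PUA + i) for i, w in enumerate(words)}
    body = re.sub(r"[A-Za-z]+", lambda m: code.get(m.group(), m.group()), text)
    return " ".join(words) + "\n" + body

def decode(encoded):
    """Read the dictionary from the first line and expand the codes."""
    header, _, body = encoded.partition("\n")
    words = header.split()
    return "".join(
        words[ord(ch) - PUA] if PUA <= ord(ch) < PUA + len(words) else ch
        for ch in body
    )
```

Raising min_freq shrinks the prepended dictionary but leaves more words unreplaced, which is one way to see why the best threshold grows with the input size.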
The following table shows the effect of the -f and -o options for ppmonstr -m800 enwik9.
The best combination found is -f1800 -o8.
The following table shows that the optimal setting for -f is lower for smaller files
(with ppmd):
The default values of -s (disable spaces model) and -t (disable try smaller word)
appear to work best on this data.
xml-wrt 2.0
released June 14, 2006 (updated June 19, 2006)
has additional transform options, and also includes LZ77 (zlib)
and LZMA (LZ with arithmetic coding) compression. When used as a preprocessor,
this compression is turned off. enwik9 was compressed using the options:
The option -l0 turns off compression. -w turns off word containers. -s turns off
space modeling (this hurts compression in version 1.0 but helps in 2.0). -c turns
off word and number containers (independent of -w and -n. -n hurts compression).
-b255 sets memory for the dictionary to 255 MB, the maximum. -m100 sets the
memory buffer to 100 MB, which is not maximum (255 MB), but larger values hurt
compression. -e10000 sets the dictionary size to 10000 words. (The dictionary
size can also be controlled with -f as in version 1.0, but using -e is less dependent
on input size so it helps with enwik8). Additional tests showing the effects of -e, -m, and -o:
The optimal values of -w -c -s -n (turn off number containers) and
-t (turn off try shorter words) were determined on enwik7 and enwik8
but not tested on enwik9.
A bug fix for LZMA compression, released June 19, 2006, does not change any
values for the June 14, 2006 version (using the -l0 option).
However the compressed source code increases from 25,290 bytes to 25,354 bytes.
The June 14 version is no longer published. The URL is unchanged.
xml-wrt 3.0
(Sept. 14, 2006) option -3 means to optimize the default settings for PPM compressors.
Version 3.0 also has a FastPAQ8 compressor for standalone compression
which was tested separately.
xwrt 3.2 (see below) with ppmonstr J has the following results.
ppmonstr option -o64 is optimal for enwik8, but -o10 is optimal for enwik9.
-m1650 selects 1650 MB memory.
xwrt option -2 optimizes for PPM. -b255 selects buffer size 255 MB for building
the dictionary. -m255 selects 255 MB memory buffer. -s turns off space modeling.
-f64 sets minimum word frequency for the dictionary to 64. Program size and
times are xwrt + ppmonstr. Memory usage is 512 MB for xwrt, 1650 MB for ppmonstr.
xml-wrt 2.0
is a free command line file compressor with source available, by Przemyslaw Skibinski,
June 19, 2006. It uses LZMA (LZ77 + arithmetic coding) with preprocessing for modeling text,
XML tags, dates, and numbers. It may also be used as a preprocessor for input
to other compressors. Version 1.0 was strictly a preprocessor without built-in compression.
The -l6 option selects maximum LZMA compression. -b255 selects maximum buffer
size of 255 MB for building a dynamic dictionary. -m255 selects maximum memory.
-s turns off spaces modeling. -f8 sets the minimum word frequency for dictionary
inclusion to 8 (default is 6).
xml-wrt 3.0
(Sept. 14, 2006)
includes a stripped-down version of PAQ8 (-l11 option) in addition to LZMA compression.
xwrt 3.2
(Oct. 29, 2007) is a dictionary preprocessor frontend to LZMA, PPMVC and lpaq6 as
well as a standalone preprocessor. Option -l14 selects lpaq6 option 9 (1542 MB).
-b255 selects 255 MB memory (maximum) for building the dictionary. -m96 selects
96 MB buffer during compression. (Higher values cause out of memory error).
-s turns off space modeling. -e40000 limits the dictionary size to 40000 words.
-f200 limits the dictionary to words that occur at least 200 times.
nanozip 0.01a is a free, experimental,
closed source GUI and command line archiver by Sami Runsas, July 14, 2008.
For these tests, the command line version (smaller executable) was used. It compresses
using several algorithms (fastest to best): LZP (options -cf and -cF), LZ77
(-cd, -cD), BWT (-co, -cO, uses 5N block size)
and CM (-cc). The uppercase options (-cF, -cD, -cO) compress
better but slower than the corresponding lowercase options and may use more memory.
The default compression mode is -co (fast BWT).
-m1500m selects 1500 MB memory, although the reported memory usage may differ and
the actual memory usage (Cmem, Dmem, in MB)
measured with Task Manager is usually lower than reported.
The program will use less memory depending on available physical memory when run.
-forcemem was used to override this.
For all tests, -nm was used to turn off checksums and not store timestamps or file
permissions. For -cO, the program uses a LZ77 variant (called LZT)
instead of BWT for binary files. -txt is an optimization for text files with -co or -cO.
nanozip 0.03a was released July 31, 2008. Only -cc was tested.
nanozip 0.05a was released Oct. 20, 2008. Options are as in 0.01a and include
-nm -forcemem.
nanozip 0.06a was released Feb. 13, 2009. Options are as in 0.01a and include
-nm -forcemem. w32c creates a self extracting archive (.exe file).
WinRK 3.0.3 is a commercial
GUI archiver by Malcolm Taylor
(Mar. 6, 2006). It is top ranked on some benchmarks.
Unfortunately it is not available for free download (as of May 16, 2006). The
"free trial" expires as soon as you install it.
(Update, Sept. 11, 2006: versions 3.0.2 and 3.0.3 are no longer available for download.
They appear to have been withdrawn last month).
WinRK in PWCM mode (Paq Weighted Context
Modeling) is based on the paq7/8 algorithm with text dictionary preprocessing
and specialized models for wav, bmp, and exe files. Version 3.0.2 was based on
the earlier paq6 algorithm which uses adaptive linear model mixing rather than
a neural network which mixes bitwise predictions from models
in the logistic (log p/(1-p)) domain. The +td and -td options turn English dictionary
preprocessing on or off respectively. 800MB selects the memory limit. When not
specified, PWCM appears to allocate all available memory except leaving 8 MB.
RK and RKC are predecessors of WinRK so I don't plan to test them.
ppmonstr, ppmd, and ppms var. J are
free command line file compressors by Dmitry Shkarin (model) and
Dmitry Subbotin (range coder), Feb. 16, 2006. (ppms on Feb. 21, 2006).
ppmonstr is a slower, experimental version of ppmd with better compression.
Source code is available for ppms and ppmd but not ppmonstr.
ppms is a small memory (1 MB) version of ppmd.
They all use PPMII (PPM with information inheritance). The -m256
option selects 256 MB memory (maximum for ppmd). The -o10 option selects
PPM order 10. (Higher orders use up memory faster which hurts
compression). When ppmd runs out of memory, it discards the
model and starts over. The -r1 option (default in ppmonstr)
tells ppmd to back up and partially rebuild the model before resuming compression.
The default options for ppmd are -m10 -o4 -r0 which are designed for reasonably
good compression with high speed and low memory usage (see table below).
ppms accepts only options -o2 through -o8. The default is -o5. This also gives
the best compression on enwik8. Task Manager shows 1.8 MB memory used.
ppmd was updated to J1 on May 10, 2006 to fix a bug. Compression benchmarks are unchanged
except the size of the compressor (11,099 bytes as zipped source code).
ppmonstr is unchanged.
slim 23d is a free, closed source command line
archiver by Serge Voskoboynikov, Sept 21, 2004. It uses a PPMII core
(ppmd/ppmonstr) by Dmitry Shkarin with filters for special file types including text.
The -m700 option selects 700 MB of memory. (I found -m800 causes
disk thrashing at 1 GB). The -o10 option selects order 10 PPM. (-o12 and -o16
caused slim to fail on enwik9, creating an empty archive and exiting after about 60% completion with 1 GB.
Smaller files were OK. There was no error with 2 GB).
As with other PPM compressors (ppmd, ppmonstr), using a higher order improves
compression but consumes memory faster. For enwik8, -o32 is optimal with 700MB available,
but lower orders are better for enwik9.
bwmonstr 0.01 was released Mar. 18, 2009.
bwmonstr 0.02 was released July 8, 2009. It uses a compressed representation internally,
thus memory usage is less than the 1 GB block size. It compresses the entire input file in
a single block and will fail if there is not enough memory. The program is multi-threaded
even on a single block. Times shown are for a single core processor, but would be faster on
a multi-core processor.
reorder2 is an alphabet reordering program by Eugene Shelwien.
drt is the dictionary preprocessor from lpaq9m by Alexander Ratushnyak.
The m1000 command selects 1000 MB block size. Thus, enwik9 is suffix sorted in one block.
This is accomplished by sorting 16 smaller blocks, writing the pointers to 4 GB
of temporary files, and merging them. The inverse transform is done in memory without
building a linked list. Rather, the next position is found by looking up the
approximate location in an index of size n/16 and finding the exact location by
linear search.
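The inverse transform idea (no array of n links, only a sparse index plus a short linear scan) can be sketched as below. This is an illustration in the spirit of the description, not bbb's code: it uses a NUL end marker and a naive rotation sort for the forward transform, and the names bwt, inverse_bwt, and STEP are assumptions.

```python
from bisect import bisect_right

STEP = 16  # keep the position of every 16th occurrence of each symbol

def bwt(s, end="\0"):
    """Naive forward BWT of s plus a unique end marker (for demonstration only)."""
    t = s + end
    rows = sorted(range(len(t)), key=lambda i: t[i:] + t[:i])
    return "".join(t[i - 1] for i in rows)

def inverse_bwt(last, end="\0", step=STEP):
    """Invert the BWT using a sparse occurrence index instead of a full link array."""
    symbols = sorted(set(last))
    counts = {c: 0 for c in symbols}
    index = {c: [] for c in symbols}
    for pos, c in enumerate(last):
        if counts[c] % step == 0:
            index[c].append(pos)      # sampled occurrence positions
        counts[c] += 1
    starts, total = [], 0
    for c in symbols:                 # first row of each symbol in sorted order
        starts.append(total)
        total += counts[c]
    row = last.index(end)             # row holding the original string
    out = []
    for _ in range(len(last) - 1):
        k = bisect_right(starts, row) - 1
        c = symbols[k]                # first character of the current row
        rank = row - starts[k]        # which occurrence of c this is
        pos = index[c][rank // step]  # approximate location from the index
        remaining = rank % step
        while remaining:              # exact location by linear search
            pos += 1
            if last[pos] == c:
                remaining -= 1
        out.append(c)
        row = pos
    return "".join(out)
```

The index holds roughly n/16 positions in total, so the extra memory is small compared with a full array of n pointers, at the cost of a short scan per output symbol.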
bbb.exe Win32 executable
compiled with MinGW g++ 3.4.2 and UPX 1.24w.
bbb Linux executable, supplied by
Phil Carmody (Aug. 31, 2006). Compiled with g++-4.1 -Wall -O2 -o bbb bbb.cpp; strip bbb
bbb has a faster mode for both compression and decompression that does a "normal"
BWT using 5x blocksize in memory. Output format is the same for fast and slow mode
for both compression and decompression. A file compressed in fast mode can be
decompressed in slow mode on another computer with less memory, and vice versa.
The mode has no effect on the compressed file contents.
Recommended usage for best compression: For files smaller than 20% of available
memory, use fast mode and one block. For example, if you have 1 GB memory (800 MB
available under Windows) and foo is 100 MB:
bbb results by block size are shown below.
Gain is the compression improvement obtained by using a larger block size.
Gain(blocksize) is defined as C(blocksize/10)/C(blocksize) - 1 where
C(x) means the compressed size of enwik9 with block size x.
Compression times are fast modes for block sizes 10 through 10^8
and slow mode for 10^9 on a 2.2 GHz Athlon-64 with 2 GB memory under WinXP Home SP2.
uda 0.300 is a free, experimental
file compressor by dwing, July 16, 2006. It is a modification of PAQ8H with optimizations
for speed. It takes no options. The decompressor size is for uda.exe, since this is smaller
than the corresponding zip file.
nanozipltcb is a free file compressor
by Sami Runsas, July 25, 2008. It uses BWT. It takes no options. It is a customized version of
nanozip, similar to -cO -txt -m1700m, but
tuned to this benchmark. Files compressed with
nanozipltcb are not compatible with nanozip.
bcm 0.03
(discussion) is a free
command line compressor by Ilia Muraviev, Feb. 9, 2009. It uses BWT with a fixed
block size of 32 MB and an order 0 CM back end. It takes no command line options.
bcm 0.04
(discussion) was released
Feb. 11, 2009. It increases the block size to 64 MB and has modeling improvements
including interpolated SSE.
bcm 0.05
(discussion)
was released Mar. 5, 2009. The option -b327680 selects 327680 KB block size. It uses
5x block size memory.
bcm 0.07
(discussion)
was released Mar. 15, 2009.
bcm 0.08
(discussion)
was released May 31, 2009. The command e370 means to use a block size of 370 MB.
Memory usage is 5 times block size. Larger values gave an "out of memory" error
under 32 bit Windows Vista with 3 GB memory.
reorder v2
(discussion)
is an alphabet reordering preprocessor for BWT compressors by Eugene Shelwien,
May 26, 2009.
xlt
is a pair of 256 byte files that defines the alphabet permutation used
by reorder, released June 4, 2009 by Eugene Shelwien.
cmm1 is a free,
open source (GPL) file compressor by Christopher Mattern, Sept. 18, 2007.
It uses context mixing with LZP preprocessing.
cmm2
was released Dec. 10, 2007 without source code.
cmm2 080113
was released Jan. 13, 2008 without source code.
cmm3 080207
(test release) was released Feb. 7, 2008 without source code.
cmm4 v0.0
(test release) was released Mar. 14, 2008 without source code.
cmm4 v0.1e
was released Apr. 20, 2008 without source code. It takes a 2 digit option "wm"
(e.g. 96 meaning w=9, m=6). Memory usage is 2^w MB for a sliding
window, and 12*2^m MB for a context mixing model
(order 1,2,3,4,6). On my machine m=7 caused disk thrashing.
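As a worked example of the "wm" option, the sketch below splits the two digits and applies the formulas 2^w MB and 12*2^m MB. The function name cmm4_memory_mb is illustrative only.

```python
def cmm4_memory_mb(option):
    """Return (window MB, model MB, total MB) for a two digit 'wm' option."""
    if len(option) != 2 or not option.isdigit():
        raise ValueError("expected two digits, e.g. '96'")
    w, m = int(option[0]), int(option[1])
    window = 2 ** w        # sliding window
    model = 12 * 2 ** m    # context mixing model
    return window, model, window + model

print(cmm4_memory_mb("96"))
```

For option 96 this gives 512 MB + 768 MB = 1280 MB, close to the 1321 MB shown in the main table. By the same formula m=7 would need 1536 MB for the model alone.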
Description by the author:
CMM4 0.1e Is a variable order context mixing coder, it predicts using
the four "highest" (ranking: 643210) models in each bit coding step and,
in addition, the match model input. Orders 0 and 1 are implemented using
a table lookup, all higher orders use nibble based hashing. Matches are
found using order 4 and 6 LZP, the pointers and a quick exclusion hash
are stored within the model's hashing tables. The mixer joins the 4 (or
5 in presence of a match model) predictions and outputs them to a SSE
stage. A mixer (similar to (L)PAQ) is selected based on the last byte's
4 MSBs and on the coding order. The SSE context is made of an order 0
context and qunatized combination of the previous symbol rank, the match
length and partially matched symbol. This results in a notable
compression increase on redundant data. The model's counters are
quantized using the PAQ's state machine since CMM4 (will be replaced).
Despite the use of hashing most data structures are tuned to never cross
a cache line per nibble (the models) or octet (the mixer) (only SSE
does). The core compression performance is equivalent to LPAQ1/2, while
being faster. In addition there's a filter framework, which currently
implements an x86 transform and will be extended.
ccm 1.1.1a
(Feb. 23, 2007) has only one version.
ccm 1.1.2a
(Mar. 2, 2007) includes a ccm_low version using less memory, which was not tested.
ccm 1.20a
(Mar. 21, 2007) has only one version.
ccm 1.20d (Apr. 8, 2007)
has two versions: ccm using 99MB memory and ccmx using 210 MB for better
compression. Only ccmx was tested.
ccm 1.21
(mirror)
(Apr. 22, 2007)
includes an option to select memory usage. 7 selects maximum memory, 1300 MB.
Only the high compression version (ccmx) was tested.
ccm 1.30
(mirror)
was released Jan. 7, 2008. Only ccmx 7 (high compression version,
maximum memory) was tested.
bit 0.1 is a free, closed
source file compressor by Osman Turan, Dec. 19, 2007. It uses ROLZ optimized
for binary files. It takes no options.
bit 0.2b is an archiver,
released June 14, 2008.
Option -m lwcm selects the compression type (lightweight context mixing).
This is the only type supported. Option -mem 9 selects maximum memory.
This option ranges from 0 to 9 and uses 3 + 2^opt MB memory.
The program uses order 1, 2, 3, 4, and 6 context mixing with 2 SSE stages
as discussed here.
Comments by author:
LWCX (Light-Weight Context Mixing) is a codec of BIT Archiver.
It's designed for getting high compression ratio with acceptable speed
(Not enough fast currently). LWCX is a bit-wise context mixing schema which
tries to mix order-n models (order 012346). The statistics are gathered by
the counters which predict next bit by semi-stationary update rule. After
gathering the predictions from all models, a neural network (similar to PAQ's
neural network) tries to output a new mixed prediction. The mixed prediction
is processed by a 2D SSE stage which have 32 vertices. Finally, a carryless
arithmetic coder codes the given bit with final prediction.
Most of data structures are designed for avoiding cache misses. Order-0
and order-1 models' statistics stored in a direct lookup table. Higher orders
(order 2346) models' statistics stored in a large hash table. Hash table size
can be selected by "-mem N" option (memory usage is 3+2^(N+1) MB, N ranges
0 to 9). The codec locates a hash entry per only coding nibble.
bit 0.7 has options
-p=1 through -p=5 to select memory usage of 10 + 20*2^p MB.
mcomp
x32 v2.00 is a free, closed source,
command line file
compressor by Malcolm Taylor (author of WinRK), released Aug. 23, 2008. It uses a large
number of algorithms, although not the same ones as WinRK. There is a 32 bit version
(mcomp_x32.exe) and a 64 bit version (mcomp_x64.exe) for Windows. Only the 32 bit
version was tested (in 32-bit Vista). It displays the following help message:
pofile(s) means input file and output file. When run with no compression options, the
program decompresses. Test results are as follows on a dual core 2 GHz Pentium T3200 with 3 GB
as in note 26.
-mb produces bzip2 compatible format. -M has no effect. Memory usage is fixed at 4 MB.
-mc uses DMC. If memory is greater than -M512, then the program aborts with a failed assertion.
-md and -md64 are supposed to generate deflate and deflate64 formats (zip or gzip). However
-mdf and -md64f (fast modes) crash immediately during compression. The other modes decompress to
files that are the correct size but not identical to the original. Run times are very slow due
to most of the CPU time spent in the kernel (up to 90%) as reported by timer 3.01.
-mp uses PPMD var. J, but allows more memory (up to about 1800 MB). The original program
was limited to 256 MB. The optimal orders are different for enwik8 and enwik9.
Higher orders help compression, but lower orders save memory on larger files. The
maximum order is -o16. Higher values have no effect.
Decompression is slow due to 55% of the CPU time spent in the kernel. Normally this
is around 1% and decompression speed would be the same as compression.
-msl and -msm ignore the -M option and use 1 MB memory, resulting in poor compression.
-mw (experimental BWT) is the only option that uses both cores. All others result
in 50% CPU usage on a 2 core processor. The -M option actually
selects the block size, not total memory usage. Memory usage is 5x block size if one core is used,
or 10x if both are used. Both are used only if enough memory is available. The default is to
split the file in half and compress the two halves in parallel. However, better but slower compression
can be obtained by using -M to select one block for the whole file. Maximum memory is 2 GB, even
if more is available. For enwik9, -M320 selects 3 blocks, which are compressed in series on one core.
For two cores, time reported is wall time.
Process time for -mw -M320m is 187% of wall time for compression and 139% for decompression.
epmopt + epm r9 is an experimental,
closed source
command line optimizer and file compressor by Serge Osnach, Oct. 16, 2003. It was
intended for enc r16, but development on that project has stopped at enc r15, according
to the web page (in Russian). The program has two parts: epm, a
PPM compressor with text preprocessing, and epmopt, which attempts to optimize
the parameters to epm by compressing repeatedly and varying the options one at a
time until there is no more improvement. The input to epmopt may be different
than epm, and supports optimization on sets of files matching patterns in
specified sets of directories. The options to epm are memory limit, PPM order,
and 20 undocumented options each specified by a single digit. The exact same options
must be passed to the decompressor. In the results, I added 27 bytes to the
compressed file sizes to account for this information. enwik9 was compressed
and decompressed as follows:
Warning: epm failed to decompress correctly on enwik7 (first
10^7 bytes). In the output, some linefeeds were changed
to spaces. This happened with all parameter combinations I
tested including defaults: epm c enwik7 enwik7.epm.
Decompression was bit-exact for enwik5, enwik6, enwik8 and enwik9.
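The search strategy attributed to epmopt above (vary one option at a time, keep any change that helps, stop when a full pass finds no improvement) can be sketched as follows. Since epm's options are undocumented, this example substitutes Python's zlib as the compressor; the option table and function names are assumptions for the illustration.

```python
import zlib

# Stand-in option space: three real zlib.compressobj parameters.
OPTIONS = {
    "level": range(1, 10),
    "wbits": range(9, 16),
    "memLevel": range(1, 10),
}

def compressed_size(data, opts):
    """Compress data with the given options and return the output length."""
    c = zlib.compressobj(opts["level"], zlib.DEFLATED,
                         opts["wbits"], opts["memLevel"])
    return len(c.compress(data) + c.flush())

def optimize(data, start):
    """Vary one option at a time until a full pass yields no improvement."""
    best = dict(start)
    best_size = compressed_size(data, best)
    improved = True
    while improved:
        improved = False
        for name, values in OPTIONS.items():
            for v in values:
                trial = dict(best, **{name: v})
                size = compressed_size(data, trial)
                if size < best_size:
                    best, best_size, improved = trial, size, True
    return best, best_size
```

The result is only a local optimum: no single option change improves it, but a combination of changes still might. The chosen options must be recorded for the decompressor when the format requires them, as noted above for epm.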
WinUDA 0.291 is a
free, closed source GUI
archiver by dwing, July 4, 2005. It uses context mixing and is
derived from paq6. Mode 3 is the slowest (about 3x slower than
mode 0) and uses the most memory, 194 MB.
dark v0.51 is a free, closed source
archiver by Malyshev Dmitry Alexandrovich, Jan. 2, 2007. It uses BWT + distance coding without preprocessors.
The -b333m option selects 333 MB
blocks. -f (-f0 in 0.40 and 0.46, not supported in 0.32) forces no segmentation.
Memory usage is 5 times the block size for compression
(6x prior to v0.46).
opendark ver. A is an open source version of dark. The supplied Windows dark.exe
crashed when decompressing enwik9 (size is 177,675,818).
Decompression works up to -b127m. opendark does not support the -f option.
FreeArc 0.36 is a free, open source archiver
by Bulat Ziganshin, Feb. 21, 2007. It incorporates 7 compression libraries - PPMd,
GRZipII, LZMA (7zip), plus BCJ (7zip), REP (rzip-like), dynamic dictionary and LZP
preprocessors. The option -m9 selects maximum compression (dict + LZP + PPMd for text
files, REP+LZMA for binary). -lc1600000000 limits
memory to 1.6 GB (same as -lc1600m). There is an option to use ppmonstr as an external
compressor, which was not included in the test.
FreeArc 4.0 pre-4 ppmd generally gives the best compression for text. It will also call ppmonstr
as an external program, but this mode was not tested, even though it compresses better.
For this test, the Windows command line version was tested. The option
-mppmd:1012m:o13:r1 is equivalent to ppmd -m1012 -o13 -r1, selecting 1012 MB memory,
order 13, and partial reinitialization of the model when memory is exhausted.
Note that ppmd normally allows only up to -m256. This program was tested with 2 GB
memory but values higher than -m1012 caused the program to crash during compression.
After each input bit, the next state represents a context obtained by appending that
bit on the right and possibly dropping bits on the left.
States are cloned (copied) whenever the incoming and outgoing counts exceed certain limits.
This has the effect of creating a new context in which no bits are dropped.
In the example below, the state representing context 110 (dropping 2 bits from the
previous context) is cloned by creating a new state 11110 because the incoming 0
transition count (ny for y=0) from state 1111 exceeded a limit. The new context is
longer because it does not drop any bits. This transition is moved to point to the
new state. Other incoming transitions (not shown) remain pointing to the
original state. The outgoing transitions are copied. The counts of the original
state are distributed to the new state in proportion to the moved transition's
contribution to those counts, which is w = ny/(n0+n1).
Normally, the initial set of contexts begin on byte boundaries. The cloning
mechanism ensures that new contexts also have this property.
In hook v0.2, the counts are 32 bit floating point numbers initialized to 0.1. The initial
state machine has 256*255 states representing bytewise order 1 contexts with uniform
statistics. When memory is exhausted, the model is discarded and the state machine
is reinitialized.
A new state is cloned when ny > limit and n0+n1-ny > length, where limit and length
are parameters. The optimal parameters for enwik8 and enwik9 are "c 7 2 6",
c means compress, 7 selects
the maximum of 1 GB memory (64M states at 16 bytes each, minimum is 8 MB memory),
2 is the limit (range 1 to 7),
and 6 selects a length of 32 (possible values are 1, 2, 3, 4, 8, 16, 32, 64).
Larger lengths are better for large files because they
conserve memory at the expense of compression.
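The cloning rule can be sketched as below, using the condition ny > limit and n0+n1-ny > length and the weight w = ny/(n0+n1) from the description. This is a minimal illustration, not hook's code: the State class, the function maybe_clone, and the use of Python floats are assumptions, and memory limits and coding are left out.

```python
class State:
    """One DMC state: counts n[0], n[1] and next states for bits 0 and 1."""

    def __init__(self, init=0.1):
        self.n = [init, init]
        self.next = [None, None]

def maybe_clone(src, y, limit, length):
    """Follow src's transition on bit y, cloning the target state if warranted.

    Returns the state that src.next[y] points to afterward.
    """
    tgt = src.next[y]
    ny = src.n[y]                       # count of the incoming transition
    total = tgt.n[0] + tgt.n[1]         # n0 + n1 of the target
    if ny > limit and total - ny > length:
        w = ny / total                  # share of the counts due to this transition
        new = State()
        new.next = list(tgt.next)       # outgoing transitions are copied
        for b in (0, 1):
            new.n[b] = tgt.n[b] * w     # counts are split in proportion w
            tgt.n[b] -= new.n[b]
        src.next[y] = new               # only this transition is moved
        return new
    return tgt
```

For example, with target counts (30, 10), ny = 8, limit 2, and length 16, the clone receives (6, 2) and the original keeps (24, 8), so the total count is conserved.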
hook v0.3 (Jan. 11, 2007) allows up to 1.8 GB memory (first option = 9)
and uses double precision predictions in the 32 bit arithmetic coder.
hook v0.3a (Jan. 12, 2007) initializes the counts to 0.125 (instead of 0.1) and uses 24 bit
precision in the arithmetic coder (instead of 32 bit).
hook v0.4 (Jan. 15, 2007) initializes counts to 0.1. Argument 2 selects length 3 (not 2).
hook v0.5b (Jan. 22, 2007) adds an LZP preprocessor. If the next byte to be coded is the
same as the byte that occurred in the last matching 3 byte context, then this is indicated
by coding a flag bit in an order 3 model (32 MB memory), and a match length coded by DMC
with a fixed size of 128 MB. If there is no match, then the literal byte is coded by
another variable sized DMC model. The parameters "c 1600000000 2 64 1 6" select compression
(c), 1.6 GB for the DMC literal model (1600000000), a limit of 2 (minimum count for the cloned
state), length of 64 (minimum remaining count for the state to be cloned), LZP selected (1),
and a minimum match length of 6.
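The LZP flag idea can be sketched as follows. This is a simplified illustration rather than hook's preprocessor: it emits one flag per byte instead of a flag plus a match length, uses a Python dict instead of a fixed-size hashed table, ignores the minimum match length parameter, and does not model the flags or literals. The function names are assumptions.

```python
def lzp_encode(data, order=3):
    """Emit (1, None) when the byte matches the one last seen after the same
    context of the previous `order` bytes, else (0, literal byte)."""
    table = {}
    out = []
    for i, b in enumerate(data):
        ctx = bytes(data[max(0, i - order):i])
        hit = table.get(ctx) == b
        out.append((1, None) if hit else (0, b))
        table[ctx] = b                 # remember what followed this context
    return out

def lzp_decode(tokens, order=3):
    """Rebuild the data by repeating the encoder's table updates."""
    table = {}
    data = bytearray()
    for flag, lit in tokens:
        ctx = bytes(data[max(0, len(data) - order):])
        b = table[ctx] if flag else lit
        data.append(b)
        table[ctx] = b
    return bytes(data)

tokens = lzp_encode(b"abcabcabcabc")
print(sum(flag for flag, _ in tokens), "of", len(tokens), "bytes predicted")
```

On repetitive input most bytes reduce to a predictable flag, leaving only the unmatched literals for the main model to code.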
hook v0.6 (Feb. 7, 2007) removes the "length" parameter (effectively infinite). The
arguments "c 1600 4 1 6" mean to compress (c), use 1600 MB memory, set the "limit" parameter
to 4, turn on LZP preprocessing (1) with a minimum match length of 6. The "limit" parameter
is the minimum count for an outbound DMC state transition to clone the state. Limit was
tuned on enwik8.
hook v0.6b (Feb. 8, 2007) includes support for files up to 2^64 bytes (compiled
by Ilia Muraviev. Earlier versions were compiled with MinGW g++ 3.4.5 by Matt Mahoney.)
"limit" was tuned on both enwik8 and enwik9. Higher values
conserve memory at the expense of compression on smaller files.
hook v0.6c (Feb. 14, 2007) stores the input filename in the compressed file and uses
it during decompression.
hook v0.7 (Mar. 10, 2007) uses 325 MB more memory than advertised so it was tested with
a lower option.
hook v0.7b (Mar. 12, 2007) reduces the excess memory to 94 MB.
hook v0.8 was released Mar. 17, 2007. Some additional results on enwik9
decreasing the rate at which the state machine fills up and is flushed:
hook v0.8b (Mar. 18, 2007) has some LZP improvements.
hook v0.8c (Mar. 19, 2007) is a minor bug fix. Compressed sizes are 1 byte
larger than v0.8b.
hook v0.8d was released Mar. 21, 2007.
hook v0.8e was released Mar. 27, 2007.
hook v0.9 (Apr. 6, 2007) is closed source. It requires a processor that
supports SSE instructions. It has some speed improvements and
an E8/E9 filter for improved compression of .exe files. Memory usage is
the second argument + 60MB.
freehook 0.2
is an open source port of hook v0.8e from C++ to C by Eugene Ortmann, Apr. 7, 2007.
The supplied .exe file requires SSE instructions (Pentium 3 or higher),
but the source can be recompiled for other processors.
hook v0.9b (Apr 10, 2007) replaces floating point arithmetic with integer
arithmetic, so that archives are compatible across different processors.
Note: I reduced the memory setting from 1800 to 1700 to prevent disk thrashing,
which was a problem in earlier tests. I will do this from now on.
This hurts enwik9 compression (but not enwik8) slightly, from 180,444,546
to 180,582,601. Actual memory usage is 60 MB over.
freehook 0.3
(Apr 10, 2007) has only very minor changes from 0.2 but is
slightly faster due to different g++ compiler options. Compression is the
same as 0.2. Memory usage is about 160 MB over.
hook v0.9c (May 8, 2007) has some speed improvements in the arithmetic
coder. It compresses the same size as v0.9b.
hook v1.0 (Sept. 20, 2007) is closed source. The only option is
memory size in MB.
The zip file linked above contains all versions (C++ source and Win32 .exe).
hook 1.1 (Nov. 13, 2007)
improves BMP and WAV compression.
hook 1.3 was
released Dec. 14, 2007, modified Dec. 15, 2007.
hook 1.4 was
released Apr. 29, 2009.
7zip 4.42 is an open source GUI and command line archiver
by Igor Pavlov, May 14, 2006. It compresses to 7z, zip, gzip, ppmd.H and tar format,
optionally encrypts with AES, and will uncompress several other formats.
7z is the default format. It uses LZMA compression, a variation of LZ77.
The option -mx=9 selects ultra (maximum) compression in this mode. The option
-sfx7zCon.sfx creates a console-based self extracting executable by prepending
a 131,584 byte decompressor. This is slightly smaller than the Windows GUI version
(132,096 bytes) and much smaller than the decompression program itself as a zipped
self extracting download (817,795 bytes). The best compression is with ppmd.
The options are -m0=ppmd:mem=768m:o=10 equivalent to ppmd var H (with minor changes)
order 10 with 768 MB memory.
The following include the best known option combinations for 7zip on enwik8
in ppmd (PPM), 7z (LZMA), bzip2 (BWT) and zip (LZ77) formats.
M99
(mirror) is a free
file compressor by Michael Maniscalco, originally written in 1999 and ported
to Windows on Mar. 27, 2007. It uses BWT, based on MSufSort 3.1.
M99 is a predecessor to M03. Command line is:
Version 2.1 was released Apr. 19, 2007.
M99 2.2.1,
released July 18, 2008,
has an optimization to compress the contents of TAR files separately. For other files,
it increases the size by 1 byte.
pimple 1.43 beta is
a free, closed source GUI archiver by Ilia Muraviev, Apr. 24, 2006. It uses
context mixing.
pimple2 is a
command line file compressor, June 11, 2007.
ash 04a
is a free, experimental command line file compressor by
Eugene D. Shelwien, Dec. 5, 2003. The /m700 option
selects 700 MB memory limit. (/m800 causes disk thrashing with 1 GB).
/o10 selects model order 10.
This gives good results on smaller files when memory
is constrained, but I did not try to optimize it.
There is a /s option to select SSE depth that
gives good results for the default value of /s5
so I did not try to optimize it either. Other results:
ash /m1700 /o10 and /o12 failed to compress enwik9 with 2 GB memory
(error: could not allocate a block).
enwik8 compressed to 19,713,239 using /o10 and
19,446,859 using /o12.
ocamyd LTCB 1.0 is a modification by Mauro Vezzosi on June 20, 2006
of Frank Schwellinger's ocamyd-1.65-final. The option -s0 selects
maximum compression. -m3 selects 300 MB memory (the maximum for
the test machine), but it supports up to -m8.
ocamyd 1.66.final, by Frank Schwellinger, Feb. 1, 2007,
includes the -f option to prevent flushing and rebuilding
the DMC model when memory is exhausted.
The following table shows the effect of the -s and -m options on ocamyd 1.65.final
on enwik8.
Times are in ns/byte, process (kernel+user) time by timer 3.01,
~ indicates global (wall) time.
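For readers converting between run times and the ns/byte unit used in these tables, a small Python helper can make the arithmetic explicit. The function name and the sample timing are hypothetical; only the unit definition comes from the text.

```python
ENWIK8_BYTES = 10**8
ENWIK9_BYTES = 10**9


def ns_per_byte(seconds, size_bytes):
    """Convert a run time in seconds to nanoseconds per input byte."""
    return seconds * 1e9 / size_bytes


# A hypothetical 100 second run on enwik8 is 1000 ns/byte.
print(ns_per_byte(100, ENWIK8_BYTES))
# On enwik9, ns/byte is numerically equal to the run time in seconds.
print(ns_per_byte(96, ENWIK9_BYTES))
```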
bee 0.78 build 0154
is an open source (Delphi Object Pascal)
command line archiver (with optional GUI)
by Andrew Filinsky and Melchiorre Caruso, Sept. 23, 2005.
It uses PPM. The -m3 option selects maximum compression (default
is -m1). The -d8 option selects 512 MB memory, the maximum that does
not cause disk thrashing (default is -d2 = 10 MB).
bee includes beeopt, a parameter optimizer similar to epmopt.
This was not tested. bee comes preconfigured with parameters
trained on .txt and .xml files (and other types) in file bee.ini. This was tested by renaming
enwik7 (first 10^7 bytes)
to enwik7.txt and enwik7.xml but compression was worse.
The executable size is a zip archive containing
bee.exe and bee.ini. This is much smaller than the zipped source code download.
Additional results on enwik8:
TC 5.2 dev 2
is an experimental command line file compressor, currently under development
by Ilia Muraviev. It takes no options.
5.0 Dev 1 uses LZP.
Dev 4 includes an improved hash table to conserve memory and a faster range coder compared to dev. 2,
but compression is the same.
Starting with 5.0 dev 6, LZP literals
and match lengths are encoded using PPMC (PPM with fixed escape probabilities to lower orders).
Dev 7 and 9 use order 3-1-0 PPMC.
tc 5.0 dev 11
(July 24, 2006) is the last of this series.
tc 5.1 dev 1
uses ROLZ (reduced offset LZ) with PPM order 1-0 for literals,
offset set reduced with order 2 context, and a 16 MB dictionary.
tc 5.1 dev 2 has improved
parsing and is archive compatible with dev 1.
tc 5.1 dev 5 uses
ROLZ plus context mixing (instead of PPM) for order 2 literals.
tc 5.1 dev 7
uses improved parsing (flexible parsing) and adds SSE.
tc 5.1 dev 7x
uses a larger dictionary.
tc 5.2 dev 2 uses FPW
(fast PAQ weighting).
ppmvc v1.1
is a free, command line file compressor by Przemysław Skibiński, May 12, 2006,
based on PPMd var. J by Dmitry Shkarin. It uses variable length contexts
as described in the paper,
P. Skibinski and Sz. Grabowski. Variable-length contexts for PPM.
Proceedings of the IEEE Data Compression Conference (DCC04), pp. 409-418, 2004
(not available online).
The command line options are the same as in PPMd: -o8 selects order 8, -m256 selects
256 MB memory, -r1 partially rebuilds the model when memory is exhausted. I tuned
the compressor to -o8 on enwik8. There are additional options related to VC
compression (which must be specified during decompression), but I used the
defaults since there is no guidance on how to set them.
chile 0.3d-1 is a free,
command line file compressor as C source code by Alexandru Mosoi, May 29, 2006.
It uses BWT. The option -b40000 selects a block size of 40000 KB, which requires about
785 MB of memory for compression and 240 MB for decompression. Version 0.3d-1
is identical to version 0.3d except that the maximum block size
was increased from 2048 KB to 99999 KB. For this test the program was compiled for Windows
using MinGW 3.4.2 as specified in the Makefile.
chile 0.4
(Jan. 27, 2007)
introduces a faster algorithm for building suffix arrays that uses less memory (7N).
The option -b=244141 selects the block size in KB (to split enwik9 into 4 equal parts).
It was compiled using MinGW gcc 3.4.5 with options -W -Wall -fomit-frame-pointer -g -O3
and tested in WinXP Home with 2 GB memory.
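The block size of 244141 follows from dividing enwik9 into 4 parts and rounding up to whole kilobytes. The Python check below assumes that 1 KB means 1024 bytes (which is what reproduces the figure); the function name is illustrative.

```python
ENWIK9_BYTES = 10**9


def block_size_kb(file_bytes, parts, kb=1024):
    """Smallest block size in KB such that `parts` blocks cover the file."""
    block_bytes = parts * kb
    return (file_bytes + block_bytes - 1) // block_bytes   # ceiling division


print(block_size_kb(ENWIK9_BYTES, 4))
```

Rounding down instead would leave a small fifth block, since 4 blocks of 244140 KB cover slightly less than 10^9 bytes.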
CTXf 0.75 pre-beta 1
is a free, closed source command line archiver by Nikita Lesnikov, Sept. 20, 2003.
It uses PPM with preprocessing for text, exe and multimedia files.
The option -me selects extreme (best) compression. It uses about 78 MB memory
in Windows task manager.
rings 0.1 is a free, closed
source, experimental file compressor by Nania Francesco Antonio, Sept. 21, 2007.
It uses LZP with order-2 coding of literals and arithmetic coding.
It takes no command line options.
rings 0.2 (Nov. 16, 2007) includes improved BMP, WAV, TIFF, and PGM filters.
rings 0.3 was released Dec. 21, 2007.
rings 1.0 was released Feb. 8, 2008. It uses 50 MB for compression and 43 MB
for decompression.
rings 1.1 was released Feb. 13, 2008 with the same memory usage. It uses CM with LZP
preprocessing for faster compression.
rings 1.2 was released Mar. 4, 2008 with the same memory usage.
rings 1.3 was released Apr. 2, 2008. It uses 54 MB for compression and 47 MB for decompression.
rings 1.4c was released Apr. 14, 2008. It has an option (1-9) which selects
memory usage. Each increment doubles usage. Memory usage and run time are greater
for decompression than compression. For option 9, compression uses 526 MB and
decompression uses 789 MB. The program uses BWT. The transformed data is encoded
using MTF (move to front), pre-Huffman coding followed by arithmetic coding.
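The move-to-front step mentioned above can be sketched in a few lines of Python. This is the textbook MTF transform over a 256-symbol alphabet, shown for illustration only; it is not taken from rings, and the function names are assumptions. The BWT, Huffman, and arithmetic coding stages are omitted.

```python
def mtf_encode(data):
    """Replace each byte by its current rank, then move it to the front."""
    alphabet = list(range(256))
    ranks = []
    for b in data:
        r = alphabet.index(b)
        ranks.append(r)
        alphabet.pop(r)
        alphabet.insert(0, b)
    return ranks


def mtf_decode(ranks):
    """Invert mtf_encode by replaying the same list updates."""
    alphabet = list(range(256))
    out = bytearray()
    for r in ranks:
        b = alphabet.pop(r)
        out.append(b)
        alphabet.insert(0, b)
    return bytes(out)


# Runs of equal bytes, which BWT output tends to contain, become runs of zeros.
print(mtf_encode(b"aaabbb"))
print(mtf_decode(mtf_encode(b"banana")))
```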
rings 1.5 was released Apr. 21, 2008. It improves compression and is symmetric
with regard to memory usage. Options are like 1.4c.
m03exp-2005-01-27 is an
experimental, closed source GUI file compressor by mij4x, Jan. 27, 2005.
It uses BWT, implementing the M03 algorithm by Michael A. Maniscalco,
with a maximum block size of 8 MB. (Note on the GUI: to compress
or decompress, drop a file on the program window. Right click to
select options).
m03exp-2005-02-15
(Feb. 15, 2005) supports blocks up to 32MB but is otherwise identical.
Stuffit 9.0 is a commercial GUI archiver by Allume Systems,
now Smith Micro. This was the current version as of May, 2006.
Note: their free 30 day trial required registration and a credit card number
which was charged if you forgot to cancel. The options tested were:
Stuffit
12.0.0.17 (compression technology version 12.0.0.21) was released Jan. 31, 2008.
It includes lossless compression of JPEG and MP3 files and lossy recompression
of zip archives, GIF, TIFF, PNG, and PDF files. It supports a native SITX format
as well as zip, gzip, rar, bzip2, compress, tar, cab, and some more obscure
formats. It is multithreaded for multicore support, although I tested it on
a single core processor. I only tested
the native general-purpose formats. For these tests, I used the command
line programs console_stuff.exe and console_unstuff.exe to reduce the executable size
and measure run time more accurately. The options are
-m=1 (LZ77-Huffman), -m=2 (LZ77-arithmetic), -m=4 (PPM), -m=8 (BWT), -l (level 2-16,
higher is slower but better), -x (memory extents, max 30, higher uses more memory).
The best compression for text is -m=4 (PPM) with maximum
memory -x=30. (In the GUI but not the command line, above 29 causes an out of memory
error with 2 GB RAM). The -l option apparently has no effect on PPM.
The decompressor size is based on console_unstuff.exe and the minimum set of
5 .dll files needed to run it (4 common plus Plugins/sitx.dll).
The full GUI installer (without Office plugins)
zips to 17,051,856 bytes. The tested version was a complimentary copy provided
by the company.
Stuffit 2009 13.0.0.19 (compression technology 13.0.0.24) was released Dec. 19, 2008.
I tested as with Stuffit 12, however the technique of finding the minimal set
of .dll files that I used in Stuffit 12 did not work (internal error)
so I had to include the zipped distribution
size (StuffIt2009.exe), which includes many other compression formats and a GUI.
The tested version was a complimentary copy provided by the company.
ppmx 0.02
was released Dec. 2, 2008. It uses order 9 PPM with hashed context tables,
as discussed here.
There is also a
core 2 duo version
which is faster, although it runs on only one core, and has a slightly larger
executable. Note that the table below is misleading because on enwik8 the regular
version compressed at 976 ns/byte (12% longer) and decompressed at 992 ns/byte
(4.5% longer) than the core 2 duo version.
ppmx 0.03
(discussed here) was
released Dec. 22, 2008.
ppmx 0.04
(discussed here)
was released Jan. 5, 2009. It uses order 12-5-3-2-1-0 PPM and 280 MB.
sbc 0.970r2
is a free, closed source command line archiver and file encryptor
by Sami, June 27 2005. Compression options suggest it uses BWT.
The -m3 option selects maximum compression, requiring 32 MB memory
(-m1 is minimum). The -b63 option
selects maximum block size (32 MB, requiring 192 MB additional memory).
-ad disables adaptive block size reduction
for homogeneous data. SBC runs faster with smaller block sizes and minimum
compression as shown:
The model order was tuned on enwik8. Additional results are shown
for order 10,
for -m5 (maximum compression), and for normal compression as a .exe and
.rar file. The decompressor in the last case is zipped unrar.exe.
M1 0.2a
is a free, open source (GPL) file compressor by Christopher Mattern,
released Oct. 3, 2008. It uses context mixing with only two contexts.
The contexts are 64 bits with some bits masked out. The masks and several other
parameters were selected by a combination of genetic and hill climbing
algorithms running for several hours to 3 days
to optimize compression on this benchmark as discussed
here.
M1 0.3
was released Jan. 2, 2009.
M1 0.3b
was released Apr. 12, 2009. This version takes a configuration file created
by an optimization version of the program. The configuration file is required
by the decompressor (and is included in the program size).
e8-m103b1-mh
is a parameter file for M1 0.3b obtained by mhajicek after about 3 days of CPU time
running M1's genetic optimization program on enwik8.
rzm 0.07h was released
Apr. 24, 2008. Advertised memory usage is unchanged.
pim 2.01 is a free GUI archiver by
Ilia Muraviev, based on PPMd by Dmitry Shkarin, using PPM. Version 2.01
was released June 14, 2007. It has options to model color images and
.exe files. These make no difference on text and were turned off.
It was timed with a watch.
pim 2.04 beta was released July 21, 2007. It has PPMd as its only option.
pim 2.10 was released July 31, 2007. Older versions are no longer supported.
pim 2.50 was released July 22, 2008. It supports 3 compression modes: store,
normal, and best. Only best was tested. It compresses in PPMd, bzip2 and DCL
formats and extracts BALZ, QUAD, ZIP, JAR, PK3, PK4 and QUAKE PAK archives.
CTW 0.1 is a free, command line file compressor with source
code by Erik Franken and Marcel Peeters, Nov. 13, 2002. It uses CTW (context tree weighting),
a type of context-mixing algorithm (with single bit prediction and arithmetic coding) combining
the predictions of different order contexts. Statistics are stored in a suffix tree.
The -d6 option selects order 6 (depth of context tree). -n16M selects the maximum of 16M nodes
for the tree (using 128 MB memory). -f16M selects the maximum 16 MB file buffer
(for rebuilding pruned contexts). The default values of all other options were tested on
enwik6 and found optimal. For -d, there is a tradeoff between compression and memory usage
as with PPM compressors. -d6 was found optimal on both enwik7 and enwik8.
boa 0.58b
is a free, closed source command line archiver by Ian Sutton, Apr. 2, 1998.
It uses PPM. The -m15 option selects maximum memory, 15 MB.
TarsaLZP Aug 8 2007
is a free, experimental file compressor with public domain source code (FASM)
by Piotr Tarsa.
Older versions used order 3 LZP to code the last 16 matches at order 3,
followed by order 2 PPM encoding of literals.
It takes no command line options but compression/decompression settings may be specified in
an initialization file. For this test, default settings were used and others were not tried.
The Jul 30 2007 version uses 2 LZP models, one with a 4 byte context and one 8 byte.
The program selects the one that gives a higher probability of a match. There is no
initialization file.
The Aug 8 2007 version uses 341 MB memory for compression and 333 MB for decompression.
The interim Aug 10 2007
version runs at high priority. (CAUTION, this will make your computer unusable while running).
lzturbo 0.1 (Oct. 5, 2007)
is threaded for parallel execution on multicore machines. The maximum
compression level is -59 where it uses 248 MB for compression and a peak
of 72 MB for decompression. Other modes compress much faster. The read-only
bug was fixed.
lzturbo 0.9
was released Feb. 25, 2008. Decompression memory peaks at 79 MB.
lzturbo 0.94
was released Apr. 11, 2009. The option -59 selects method 5, compression level 9
for maximum compression. -b100 selects a block size of 100 MB for independent
compression in separate threads. The default is 32 MB. -p0 forces the
compressor to run on one core. By default the program runs on all cores, but
this causes the program to run out of memory with -59 because each thread uses 1450 MB.
Decompression ran on 2 cores with a process time of 20 seconds per core
and wall time of 28 seconds using about 300 MB memory. Faster modes tested
below are run on 2 cores with average process time per core shown.
LZPXj 1.2h, Mar. 6, 2007, uses LZP + PPM with a preprocessor for x86 executables.
It has just one option (1-9) which selects memory usage.
The default is 6. The maximum is 9. Each increment doubles usage.
scmppm 0.93.3 is
a GPL open source command line compressor for XML files by James Cheney and
Joaquín Adiego, Oct. 3, 2005, and using PPMd var. I
code by Dmitry Shkarin. It works by grouping XML data by tag, then compressing
with ppmd (similar to XMill). scmppm is distributed as UNIX source code only. For this test
it was compiled and run under WinXP using the latest version of Cygwin, g++, flex, and make as
of May 24, 2006. To compile I had to add the line extern "C" int fileno(FILE*);
to lex.yy.c.
The -l 9 option selects maximum compression.
fpaq0s2 is a
free, open source (GPL) file compressor by Nania Francesco Antonio, Sept. 29, 2006.
It is an order 2 model based on the order 0 compressor fpaq0s by David A. Scott,
which is based on fpaq0 by Matt Mahoney by modifying the arithmetic coder.
fpaq0x is the same order 2 model based directly on fpaq0.
fpaq0x1a is an order 3 model (hashed context) using fpaq0's arithmetic coder.
fpaq0s2b is a similar model based on fpaq0s. Both were released Oct. 1, 2006.
fpaq0x1b (Oct. 6, 2006) switches between different models up to order 3.
fpaq0s3 (Oct. 8, 2006) uses a simple order 0 model on groups of 3 bytes.
fpaq0s4 (Oct. 12, 2006) uses a combined order 0-1-2, PPM and LZ model.
fpaq0s5 (Oct. 15, 2006) improves on fpaq0s4. Memory usage is 200 MB when
run at normal priority and 160 MB when run at below normal priority (WinXP Home).
fpaq2 (Oct. 21, 2006) uses a combination context mixing and PPM algorithm.
fpaq0s6 (Oct. 30, 2006) improves on fpaq0s5.
fastari (Nov. 7, 2006) is an order 2 compressor with an all new arithmetic coder
and greater speed.
fpaq3 (Nov. 20, 2006) is an order 3 compressor.
fpaq3b (Dec. 2, 2006) is a bitwise order 28 compressor.
fpaq3c (Dec. 21, 2006) is an improved bitwise order 28 compressor.
fpaq3d (Dec. 28, 2006) adds an option to fpaq3c to select memory
usage from 16 MB to 2 GB. Option 6 selects 1 GB memory (the highest tested).
All programs are here.
flashzip 0.2 was released Jan. 11, 2008. It is compatible with version 0.1 but faster.
Note: in both versions, CPU utilization during compression is about 28% to 35%. Times
shown are process times.
flashzip 0.3 was released Feb. 4, 2008. It uses ROLZ plus arithmetic coding. It
takes an option x for better compression (slower) and 1 through 5,
where 5 is the slowest (best compression).
flashzip 0.9 was released June 28, 2008. Option -m2 selects method 2 (default
is -m1). -b1 through -b5 select buffer size, which affects memory usage.
Default is -b3. -s1 through -s7 selects match length and speed. Default is
-s1 (fastest, worst compression).
flashzip 0.91 was released Aug. 17, 2008. Options are like version 0.9.
Memory usage was increased
to 198 MB for compression and 138 MB for decompression using settings for
best compression. Minimum requirement is 10 MB and 6 MB.
flashzip 0.93a
was released Mar. 9, 2009.
flashzip 0.94 was released Mar. 25, 2009.
balz 1.02 is a free,
closed source file compressor by Ilia Muraviev, Mar. 8, 2008. It uses LZ77
with arithmetic coding, a 512K buffer with Storer and Szymanski parsing.
It takes no options. Memory usage is 346 MB for compression and 18 MB for
decompression.
balz 1.06, May 9, 2008, has two compression
options, e for normal and ex for better but slower compression. Both options use
67 MB for compression and 48 MB for decompression.
balz 1.07
was released May 14, 2008. It uses 132 MB for compression and 95 MB for decompression.
balz 1.08
was released May 20, 2008. It uses 200 MB for compression and 126 MB for decompression.
Only mode ex was tested.
balz 1.09
was released May 21, 2008. It uses 128 MB for decompression. Only mode ex was tested.
balz 1.12
was released June 3, 2008. It uses 123 MB for decompression.
balz 1.13
was released June 11, 2008. It uses 127 MB for decompression.
balz 1.15 was released as open source
on July 8, 2008. It uses 67 MB for compression and 49 MB for decompression.
lzpm 0.02 is a free, closed source
file compressor by Ilia Muraviev, Apr. 19, 2007. It uses LZ77. It takes no options.
lzpm 0.03, Apr. 28, 2007,
uses more memory for compression (181 MB), but still uses 20 MB for decompression.
lzpm 0.04, May 4, 2007,
uses ROLZ. Memory usage is 83 MB for compression and 20 MB for decompression.
The new design uses circular hash chains for better speed on binary files,
but a little slower for text.
lzpm 0.06, May 19, 2007,
improves compression over 0.04 with the same memory usage.
lzpm 0.07, Aug. 6, 2007,
and later versions use 280 MB for compression and 20 MB for decompression.
lzpm 0.08, Aug. 8, 2007.
lzpm 0.09, Aug. 15, 2007.
lzpm 0.10, Aug. 23, 2007.
lzpm
0.11, Sept. 5, 2007,
takes the command 1..9 to choose the compression level (fastest...maximum).
1 uses greedy parsing. 2..8 use 1..7 byte lookahead. 9 uses unbounded lookahead.
All modes use 723 MB for compression and 77 MB for decompression.
lzpmlite 0.11, Sept. 13, 2007,
is a "lite" version of lzpm, using about half as much memory and twice as fast.
Options range from 1..9
with 1 being fastest and 9 for best compression. (3 is a good compromise).
All modes use 362 MB for compression and 39 MB for decompression.
lzpm 0.13 was released
Dec. 1, 2007.
lzpm 0.14 was released
Jan. 1, 2008. It uses 40 MB for decompression.
lzpm 0.15 was released
Jan. 16, 2008. It uses 40 MB for decompression.
The -d9 option selects maximum dictionary size. -x7 selects
maximum hash level (most memory). -l7 selects maximum search level
(slowest).
See ppmonstr above.
turtle 0.01
is a free, experimental, closed source file compressor by
Nania Francesco Antonio, June 1, 2007. It uses PPM. It takes no options.
turtle 0.02
was released June 2, 2007. Compression is identical.
turtle 0.03
was released June 5, 2007. It is faster and improves compression slightly.
The file name is stored in the compressed file.
turtle 0.04
was released June 8, 2007. It recognizes several different file types.
turtle 0.05
was released June 12, 2007. It improves compression at the cost of time and memory.
turtle 0.07
was released June 23, 2007. It includes a model for audio files.
WinTurtle 1.2 is a Windows GUI
version of turtle, released Aug. 16, 2007. It uses PPM with LZP preprocessing.
It detects .tar, .iso, .nrg, .wav, .aiff, .bmp, .exe, .pdf, .log and text files.
Compression times are wall times. Note: the user interface is not fully functional.
To compress a file, click "Drive", click on "Buffer" until it is set to 512 MB (it does not
work until you click "Drive" first, also 1 GB caused program to crash on enwik8),
select "File/compress single file" from the upper menu,
then select the input file and output archive from the two file dialogs.
The program adds a .tur extension to the output archive. To decompress,
select File/open archive, click on the file name, click Select, click Extract,
and select an output folder from the file dialog.
WinTurtle 1.21, Aug. 16, 2007,
fixes an unrelated bug but is otherwise the same as 1.2.
WinTurtle 1.30 was released Aug. 30, 2007.
WinTurtle 1.60 was
released Jan. 1, 2008.
Compression is as follows.
A 20-bit hashed order-4 context is mapped into the last 3 bytes seen
in that context in a move-to-front queue, plus a consecutive hit count.
Queue positions (hits) or literals (misses) are arithmetic coded using
the count and an order-1 context (order-0 if the count is more than 3)
as secondary context. After a byte is coded, it is moved to the front of the queue.
The hit count is updated as follows: incremented (max 63) if the first byte
is matched, set to 1 if any other byte is matched, or set to 0 in case of a miss.
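The queue and hit count bookkeeping described above can be sketched as follows. This Python fragment models a single table slot only and is not the program's source: the function names, the use of a Python list for the 3-byte queue, and the tuple used to represent the secondary context are assumptions, and the hashing and arithmetic coding are omitted.

```python
MAX_COUNT = 63      # hit count ceiling stated in the text
QUEUE_SIZE = 3      # last 3 bytes seen in the context


def update_slot(queue, count, byte):
    """Update one slot's hit count and move-to-front queue after coding byte."""
    if queue and queue[0] == byte:
        count = min(count + 1, MAX_COUNT)   # first byte matched
    elif byte in queue:
        count = 1                           # some other queued byte matched
    else:
        count = 0                           # miss
    if byte in queue:
        queue.remove(byte)
    queue.insert(0, byte)                   # coded byte moves to the front
    del queue[QUEUE_SIZE:]                  # keep only the 3 most recent
    return count


def secondary_context(count, prev_byte):
    """Count plus order-1 context, or order-0 when the count exceeds 3."""
    return (count, prev_byte) if count <= 3 else (count, None)


queue, count = [], 0
for b in b"eeeate":
    count = update_slot(queue, count, b)
print(count, bytes(queue))
```

In the run above, the final "e" is found in the queue but not at the front, so the count is reset to 1 rather than incremented.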
sr3
(mirror)
is a modification by
Nania Francesco Antonio, Oct. 28, 2007. The context table size is increased
from 4 MB to 64 MB, which effectively increases the context from order-4 to
order-5. This helps compression on larger files, but makes it worse for some
smaller files. The program also detects file type. For .bmp files, the order is
decreased. For .wav files, the input is split into separate 1 byte wide streams
for each audio sample. There is no separate compressor and decompressor program.
bzip2 1.0.2 is an open source command line
single file compressor by Julian Seward, released Dec. 30, 2001.
It uses BWT. The -9 option selects maximum compression.
bzip2 1.0.3
(May 22, 2005) compresses very slightly larger but is faster, as shown by
the following table. The decompressor
size is based on zipped bunzip2.exe. This is smaller than the source
(724,919 bytes as a zip download).
quad is a free file compressor by
Ilia Muraviev. Only the latest version (now open source) is supported, so only that version
appears in the main table.
As described by the author:
QUAD uses ROLZ compression (Reduced Offset LZ). It makes use of an order-2 context to
reduce the offset set that is matched to. This can be regarded as a fast large
dictionary LZ. Literals and Match Lengths fits in a single alphabet which is coded
using an order-2-0 PPM with Full Exclusion. Match indexes are coded using an order-0
model. QUAD uses a 16 MB dictionary. For selectable compression speed and ratio, QUAD
uses different parsing schemes: with Normal mode (Default) QUAD uses a Lazy Matching;
with Max mode (-x option) QUAD uses a variant of Flexible Parsing. In addition, QUAD
has an E8/E9 transformer for better executable compression which is always enabled.
quad 1.01a (Dec. 24, 2006) used LZ77. It was closed source and took no options.
quad 1.04a (Feb. 8, 2007) used LZP. Memory was expanded for this version
only, however it is no longer supported.
quad 1.07beta (Feb. 22, 2007)
included the "x" option for better compression.
quad 1.08 was released Mar. 12, 2007. Quad became open source.
quad 1.10 was released Mar. 19, 2007. -x selects maximum compression.
quad 1.11 (Apr. 4, 2007) uses ROLZ.
quad 1.11HASH2
(Apr. 5, 2007, experimental, no source code) produces the same size archives, but uses
a hash table for faster compression.
quad 1.12 was released Apr. 7, 2007.
WinACE 2.61 is a shareware GUI/command line archiver,
Mar. 8, 2006. It compresses in ACE and ZIP formats and decompresses
many others. ACE decompresses much faster than it compresses,
suggesting it is based on LZ77. The option -m5 selects maximum compression.
-d4096 selects the maximum dictionary size of 4 MB (default is -d1024 = 1 MB).
-sfx creates a self extracting archive, which adds less space than the
program itself.
tornado 0.1 is a free, open source file
compressor by Bulat Ziganshin, Apr. 16, 2007. It uses LZ77 with arithmetic coding.
The -9 option selects a predefined compression profile for maximum compression.
There are custom options for hash table size, hash chain length, block size, type
of coder, and an option to force or prohibit cache matching. Some of these options
might give better compression, but were not tested.
tornado 0.3 has options -1
through -12. Each increment approximately doubles compression time and memory usage.
Decompression time is fast in all cases, but memory usage is approximately 2/3 that
of compression (for the LZ77 buffer). -12 caused disk thrashing and was not tested
for enwik9. There are several other options that were not tested.
tornado 0.4a
was released June 1, 2008. It includes Windows and Linux versions. There is a small
version (tor-small.exe) which does not include some of the advanced options.
The advanced options were not tested. Option -12 caused disk thrashing (2 GB memory)
when enwik9 reached 80% compression, so -11 was used instead.
lzc 0.03 was released May 11, 2007.
lzc 0.04 was released May 16, 2007. All versions up to 0.04
use 107 MB memory for decompression.
lzc 0.05b was released May 26, 2007. It has options from 1 (fastest)
to 16 (best compression). It uses 771 MB to compress and 390 MB to decompress.
All versions through 0.05b are linked in the above archive.
lzc 0.06b was released Aug. 27, 2007.
It uses 790 MB (peak) for compression and 409 MB (peak) for decompression.
lzc 0.07 was released Oct. 24, 2007.
Options range from 1 (fastest) to 10 (slowest).
lzc 0.08 was released Nov. 15, 2007.
It improves BMP and WAV compression.
packet 0.01 is a free,
experimental file compressor by Nania Francesco Antonio, May 11, 2008.
It uses LZP. It takes no options.
packet 0.02, May 16, 2008,
improves compression for .wav files and supports files over 2 GB.
packet 0.03b, May 20, 2008,
uses LZ77, 3 MB for compression, and 1 MB for decompression. It takes an optional
argument 'x' meaning better but slower compression, and a level 1 through 6, where
6 is slowest with best compression.
packet 0.90b, June 18, 2008,
has options -m1 to -m4 (method) and -s0 to -s9 (intensity). All options use
10 MB for compression and 2 MB for decompression.
lcssr 0.2 (Dec. 3, 2007, same website)
(mirror with .exe)
is derived from symbra. It drops the secondary symbol queue
and instead uses a variable length context based on the length of the
longest match as with LZ77/LZP. The option -b7 selects a 1152 MB buffer
for finding context matches.
csc2 is a free,
experimental, closed source file compressor by ForTheKing, Apr. 18, 2009.
It uses LZP with order 1 modeling of literals and range coding over a 270
size alphabet. The program takes no options. It recognizes whether the input
file is compressed, and if so, decompresses it.
slug 1.27,
May 7, 2007, uses a ROLZ variant with an 8 MB non-sliding window and semi-dynamic
Huffman coding trees rebuilt every 4KB (more frequently near the beginning of a file).
uc2 (UltraCompressor II
revision 3 pro) is a commercial (free for noncommercial use) command line and GUI
archiver for DOS by Nico de Vries, June 1, 1995. It uses LZ77 and Huffman coding.
The -tst option selects maximum compression.
uc2 includes a program for converting archives to self extracting
programs (uc2sea) which produced smaller files (enwik8.exe = 35,397,343 bytes,
enwik9.exe = 312,759,499 bytes), but in this mode decompression failed for enwik9,
truncating the last 21 bytes of output. uc2sea works by first extracting the
archive and then recompressing it using a slightly different algorithm.
thor 0.94 alpha
(mirror)
(mirror)
was released Apr. 22, 2007. exx is a new mode to select maximum compression.
Times shown are process times excluding disk I/O. (Actual times are 96 sec. to compress,
75 sec. to decompress).
thor 0.95
(mirror),
May 8, 2007, has 5 compression options: e1 through e4 are LZP in order of increasing
compression; e5 is LZ77. Note that e5 is best on enwik8 but e4 on enwik9.
thor 0.96a,
Aug. 23, 2007, works like 0.95.
gzip
1.3.5 is an open source single file command line compressor
by Jean-loup Gailly and Mark Adler, Sept. 30, 2002.
It uses LZ77 (flate, but not compatible with zip).
The -9 option selects maximum compression although its effect is small (see below).
Info-ZIP 2.3.1 (Mar. 8, 2005)
is a free, open source
archiver for many operating systems. It uses the standard LZ77 "flate" format, like
gzip and many zip-compatible programs. (The sizes are exactly 125 bytes larger
than gzip). This test was under Linux
(Ubuntu 2.6.15.27-amd64-generic) on a 2.2 GHz Athlon-64.
Uncompression was with UnZip 5.52 (Feb. 28, 2005), both part of the normal
Ubuntu distribution. The -9 option selects maximum compression.
The Windows version 2.32 is dated June 19, 2006.
pkzip 2.04e is a commercial
(free trial) command line archiver by PKWARE Inc.
written Jan 25, 1993. It uses LZ77 (flate format).
The option -ex selects maximum compression. The decompressor is pkunzip 2.04e.
Times are wall times. (Timer doesn't show process times for DOS programs).
There are many programs that produce zip files. I don't plan to test them all.
jar 0.98-gcc is an open
source command line archiver by Bryan Burns, 2002. It uses LZ77 (zip). It is included with Java (1.5.0_06) and
is normally used to create .jar files for compiled Java applications and applets, but it can
also be used as an archiver. It has no compression options.
The cvf options create an archive. The M option says to not add a manifest file.
Note: this is not the jar compressor from Arjsoft.
PeaZip 1.0 by Giorgio Tani (Nov. 6, 2006)
is a GPL open source GUI archiver
supporting several common formats. The format tested is the native format which uses zlib
(gzip algorithm). The "better" option chooses best compression (equivalent to gzip -9).
Integrity check (checksum) and encryption are turned off.
lzgt1
(click on lzgt3a.zip) is one of a group
of free, open source, experimental file compressors by Gerald R. Tamayo, released
July 17, 2008. It uses LZT (Lempel-Ziv-Tamayo) compression, an LZ77 variant
in which the decompressor rebuilds a list of matches sorted by context match
length and the match length is implied or partially implied by the position
in the list. lzgt implements LZT using a 4K sliding window, 32 byte
look-ahead buffer and 3 bit code length. lzgt1 is like lzgt
but uses a 16K sliding window and 128 byte look-ahead buffer.
lzgt2 eliminates the code length entirely. lzgt3 is an improved version
of lzgt2. All programs have separate decompressors (lzgtd1, etc) and are
compiled for DOS (and Windows).
lzgt3a was added Oct. 25, 2008. It uses a 128K window size, 64K
lookahead buffer, and improved coding.
lzss 0.01 is a free,
experimental file compressor by Ilia Muraviev, Aug. 1, 2008. It uses
LZSS, a byte aligned LZ77 variant with matches encoded with an 18 bit pointer
and 6 bit length field, and 1 bit flags to distinguish matches
from literals. It is discussed here.
Compression options are e (fast) or ex (smaller). The program is designed
for fast decompression. The program uses 625 MB for compression and 33 MB
for decompression.
lzuf is a free, experimental open source
file compressor by Gerald R. Tamayo, Apr. 15, 2009 (but tested the previous day due to
time zone differences between the U.S. and the Philippines). It uses LZ77 with folded unary encoding
of match lengths. It takes no arguments. It has a separate decompression program, lzufd.exe.
The most recent version was written in Visual C and ported to Windows as a
cross compressor intended to produce self extracting archives for the
Commodore. By default, pucrunch appends a 276 byte header containing 6510 code to
extract the file. There are also standalone decompressors written in 6510
assembler and in Z80 assembler. I could not test in these environments, so I
used the -d -c0 options to turn off the self extracting feature, which requires
the (larger) Win32 external compressor/decompressor.
There are two additional limitations. First, the decompressor appends a 2 byte
header to indicate the load address, which is required by the Commodore. To
make the decompressed file bitwise identical, this must be stripped off. Second,
the input file size is limited to 64,936 bytes. The author tested a modified
version without a file size limit on the Calgary corpus, but this modified version
was not posted, so I did not use it.
To overcome these limitations
I wrote the following Perl scripts to compress and decompress. The first script
compresses by splitting the input into blocks of 64,936 bytes, compressing them
separately, and appending the compressed files each with a 2 byte header to indicate
the block size. The second script decompresses each block one at a time, strips
off the 2 byte Commodore header, and appends them. Each script takes the input
and output files as command line arguments. The second script is included in
the decompressor size.
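The scripts themselves are not reproduced here. As an illustration only, the framing they describe might look like the following Python sketch. It assumes zlib as a stand-in for the external pucrunch call, a little-endian 2 byte size header, and hypothetical helper names; none of these come from the original Perl.

```python
import struct
import zlib

BLOCK = 64936  # pucrunch input size limit stated above


def pack_blocks(data: bytes, compress=zlib.compress) -> bytes:
    """Compress each block separately and prefix it with a 2 byte size."""
    out = bytearray()
    for i in range(0, len(data), BLOCK):
        packed = compress(data[i:i + BLOCK])
        out += struct.pack("<H", len(packed))  # byte order is an assumption
        out += packed
    return bytes(out)


def unpack_blocks(blob: bytes, decompress=zlib.decompress) -> bytes:
    """Decompress the blocks one at a time and append them."""
    out = bytearray()
    pos = 0
    while pos < len(blob):
        (size,) = struct.unpack_from("<H", blob, pos)
        pos += 2
        # With real pucrunch output, the 2 byte Commodore load address
        # would be stripped from each decompressed block here.
        out += decompress(blob[pos:pos + size])
        pos += size
    return bytes(out)


def block_count(n: int) -> int:
    """Number of blocks (and compressor runs) needed for n input bytes."""
    return -(-n // BLOCK)
```

With this block size, block_count(10**9) gives 15400, which matches the number of pucrunch runs mentioned for enwik9.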
pucrunch suggests using -p1 and -m6 options to improve compression
but these do not help.
Run times are wall times. Using scripts, Timer 3.01 does not provide
useful process times, since it times Perl rather than pucrunch.
The decompression time (463 sec) is probably high because Windows Task Manager
shows that pucrunch is running only a small fraction of the time, perhaps 10%.
Most of the time is probably the overhead of file I/O and running pucrunch
15,400 times.
lzop v1.01 is a free, open source (GPL) command line
file compressor by Markus F.X.J. Oberhumer, Apr. 27, 2003. A newer version, 1.02 rc1
was released July 25, 2005, but no Win32 executable was available for download
as of May 29, 2006. lzop uses LZ77. It is designed for high speed. -9 selects
maximum compression. lzop is I/O bound. timer 3.01 reports the decompression
process time as 12 seconds. The remaining 38 seconds is due to disk access.
lzw v0.2 was released with
public domain source code for the decompressor, which zips to 671 bytes. The file
format is as follows. There is no header or trailer.
Each 16 bit code word is in machine dependent order
(LSB first on x86). Codes 0-255 represent single bytes of the same value.
Codes 256-65535 are assigned in ascending order by concatenating the decoded
values of the previous two codes. After assigning code 65535, new codes are
assigned by replacing the oldest codes first, starting with 256.
Data is decoded into a rotating buffer of size 16 MiB (2^24 bytes)
by copying a string from elsewhere in the buffer. Neither the original nor
copied string crosses the buffer boundary, and they do not overlap each other.
No new symbol is added after decoding the first byte of the buffer.
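The code assignment rule can be sketched in Python as follows. This is my reading of the format description, not the released decoder: it assumes little-endian code words (as on x86), keeps the dictionary as a plain table instead of pointers into the 16 MiB rotating buffer, and ignores the buffer boundary rules. The function name is hypothetical.

```python
import struct


def decode_codes(stream: bytes) -> bytes:
    """Decode a sequence of 16 bit codes into bytes."""
    table = {}        # code 256..65535 -> decoded string
    next_code = 256
    prev = None       # decoded value of the previous code
    out = bytearray()
    for (code,) in struct.iter_unpack("<H", stream):
        cur = bytes([code]) if code < 256 else table[code]
        out += cur
        if prev is not None:
            # New code = previous two decoded values concatenated.
            # After 65535, wrap to 256 and replace the oldest codes first.
            table[next_code] = prev + cur
            next_code = next_code + 1 if next_code < 65535 else 256
        prev = cur
    return bytes(out)
```

For example, the codes 97, 98, 256, 257 decode to "a", "b", "ab", "bab", because code 256 is assigned "ab" after the second code and code 257 is assigned "bab" after the third.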
arbc2z is a free, experimental command line
file compressor with source code by David A. Scott, June 23, 2006.
It is a bijective order-2 (PPM) arithmetic coder. A bijective
coder has the property that all inputs to the decompressor are valid and produce distinct outputs.
The above archive also contains arbc2, which uses a different method of handling of the zero frequency problem,
arbc1 (order 1), and arbc0 (order 0), all of which are bijective.
The -C8 option selects the maximum number of contexts, 2^18.
For this test, the C source code was compiled with MinGW 3.4.5:
Version 0.9 (Oct. 22, 2006) is a faster version (quick.exe)
which handles large (64 bit) files.
Version 1.20 (Mar. 15, 2007) is an archiver rather than a file compressor.
Version 1.30 beta
(Apr. 16, 2007) has 4 modes (0-3) with 4 separate executables.
Only version 3 (quick3.exe, max compression) was tested.
Version 1.30 (Aug. 14, 2007) modes 0, 1, and 2 are compatible with version 1.20,
but mode 3 (best compression) is new.
Version 1.40 (Nov. 13, 2007) is an experimental version designed for better speed.
It has only one mode.
compress 4.3d is the Windows version of the UNIX compress
command, released Jan 18, 1990. It uses LZW and has no compression options.
BriefLZ
1.05
is a free, open source (C and MASM) file compressor by Joergen Ibsen,
Jan. 15, 2005. It uses LZ77. It takes no options.
It uses about 2 MB memory for compression and about 900 KB for decompression.
lzrw1 (Mar. 31, 1991)
is byte-aligned LZ77 with a 12 bit offset and 4 bit length field
allowing lengths 3-16. Each group of 16 phrases (pointers or literals)
is preceded by 2 flag bytes to distinguish pointers from literals.
Matches are found using a 4K hash table without confirmation which is
updated after each phrase. It uses 16K of memory plus the input and
output buffers.
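A decoder for a format of this shape is short. The Python sketch below is illustrative only: the bit layout inside the flag bytes and inside each 2 byte pointer (high nibble of the first byte holds the top 4 offset bits, low nibble holds length minus 1) is an assumption, and the function name is hypothetical.

```python
def decode_lzrw1_like(src: bytes) -> bytes:
    """Decode groups of 16 items, each group preceded by 2 flag bytes."""
    out = bytearray()
    pos = 0
    while pos < len(src):
        flags = src[pos] | (src[pos + 1] << 8)  # one flag bit per item
        pos += 2
        for i in range(16):
            if pos >= len(src):
                break
            if flags & (1 << i):
                # Pointer: 12 bit offset and 4 bit length in 2 bytes.
                b0, b1 = src[pos], src[pos + 1]
                pos += 2
                offset = ((b0 & 0xF0) << 4) | b1
                length = (b0 & 0x0F) + 1
                for _ in range(length):
                    # Copy byte by byte so overlapping matches work.
                    out.append(out[-offset])
            else:
                out.append(src[pos])  # literal byte
                pos += 1
    return bytes(out)
```

Because every item is a whole number of bytes and the flags are read 16 at a time, the decoder needs no bit-level I/O, which is the point of the byte-aligned design.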
lzrw1-a (June 25, 1991)
is lzrw1 except that the length field represents values 3-18.
lzrw2 (June 29, 1991)
replaces the offset with a 12 bit index into a rotating table
of offsets, allowing the last 4K phrases (rather than 4K bytes) to
be reached. The decompressor must reconstruct the phrase table
(but not the hash table). It uses 24K memory plus buffers.
lzrw3 (June 30, 1991)
replaces the 12 bit offset with a 12 bit index into the
hash table. The decompressor must reconstruct the hash table.
It uses 16K memory plus buffers.
lzrw3-a (July 15, 1991)
uses a deep hash table (8 offsets per hash) with LRU replacement.
It uses 16K memory plus buffers.
lzrw5 (July 17, 1991) uses LZW. The dictionary is implemented as a tree.
It uses up to 384K memory plus buffers.
There is an experimental lzrw4, but it was never fully implemented.
All of the compression algorithms were originally implemented as
memory to memory compression functions in C, not as complete programs.
I wrote a driver program which divides the input into 1 MB blocks (except lzrw5),
compresses them independently by calling the provided functions, and
writes the compressed size as a 4 byte number followed by the compressed
data. However, compression could be improved by using larger blocks at
the cost of more memory. For lzrw5 the block size is 64K because the program
is not guaranteed to work correctly for larger blocks. It did work on this
benchmark for a 192K block size, but not for 256K. The distribution linked
above uses a 64K block size.
runcoder1
is a free, open source (GPL) file compressor by Andrew Polar, Mar. 30, 2009.
It uses an order 1 model with arithmetic coding. It takes no options.
The program is available as source code (C++) only. For this test
it was compiled with MinGW g++ 3.4.2 with options -O2 -march=pentiumpro
-fomit-frame-pointer -s for 32-bit Vista as noted in note 26.
FastLZ is a free, open source compression
library and file compressor by Ariya Hidayat, announced June 12, 2007 with
no date or version number, and downloaded and tested on June 16, 2007.
It uses byte-aligned LZ77. The software was released
as source code only (in C). For this test it was compiled with MinGW gcc 3.4.5
as suggested by README.TXT (plus -s to strip debugging info):
flzp v1 is a free,
open source file compressor by Matt Mahoney, June 18, 2008. It uses byte-oriented LZP.
The input is divided into blocks such that at least 33 byte values never occur, or 64KB,
whichever is smaller, then uses those bytes to code an end of block symbol plus match
lengths from 2 up to the number of unused bytes - 1. A match length is decoded by
finding the most recent context hash match in a 4 MB rotating buffer and outputting
the bytes that follow. It uses a 1M hash table and an order 4 context hash.
Each block begins with a 32 byte bitmap to distinguish symbols for matches from literals.
flzp can be used as a preprocessor to a low order compressor like fpaq0 or ppmd -o3
to improve compression and speed.
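The block splitting rule and the per-block bitmap can be sketched in Python as below. This is not the flzp source: the function names are hypothetical, and the bit order and polarity of the bitmap (a set bit marks a byte value that occurs as a literal) are assumptions.

```python
def split_blocks(data: bytes, min_unused: int = 33, max_len: int = 65536):
    """Cut data so each block leaves at least min_unused byte values unused
    and is at most max_len bytes long."""
    blocks = []
    start = 0
    seen = set()
    for i, b in enumerate(data):
        too_many = b not in seen and len(seen) + 1 > 256 - min_unused
        if i - start >= max_len or too_many:
            blocks.append(data[start:i])
            start, seen = i, set()
        seen.add(b)
    if start < len(data):
        blocks.append(data[start:])
    return blocks


def literal_bitmap(block: bytes) -> bytes:
    """32 byte (256 bit) map of the byte values that occur in the block."""
    bitmap = bytearray(32)
    for b in set(block):
        bitmap[b >> 3] |= 1 << (b & 7)
    return bytes(bitmap)
```

The byte values whose bits are clear are the ones left free to code the end of block symbol and the match lengths.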
fpaq0 uses a 32-bit carryless arithmetic coder to code binary decisions
and output one byte at a time. fpaq1 uses a 64 bit coder. fpaq0b uses
a 32 bit coder but counts carries and outputs a bit at a time to
achieve greater internal precision. fpaq0s improves on fpaq0b by
using the compressed EOF to encode the uncompressed EOF, unlike the
other models which code an extra bit for each byte to indicate the end.
fpaq02 extends this idea to 64 bits.
All programs except fpaq are C++ source code and compiled as follows
with MinGW 3.4.2 (where %1 is the program name):
fpaq0p by Ilia
Muraviev, Apr. 15, 2007, uses an adaptive order 0 model. Instead of keeping
a 0,1 count for each context, it keeps a probability and updates it by
adjusting by 1/32 of the error. This is faster because it avoids a division
instruction.
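A minimal Python sketch of this kind of coder follows: an order 0 bitwise model with the shift-based update, driving a 32 bit carryless binary arithmetic coder. It is written from the description here rather than from the fpaq sources, so details differ. In particular, it flushes 4 bytes at the end and passes the byte count to the decoder instead of coding EOF flags, and all names are hypothetical.

```python
MASK = 0xFFFFFFFF


def update(p: int, bit: int) -> int:
    """Move the 12 bit estimate p = P(bit = 1) by 1/32 of the error."""
    return p + ((4096 - p) >> 5) if bit else p - (p >> 5)


def compress(data: bytes) -> bytes:
    probs = [2048] * 256               # one estimate per bitwise context
    x1, x2 = 0, MASK                   # current range, inclusive
    out = bytearray()
    for byte in data:
        ctx = 1
        for i in range(7, -1, -1):
            bit = (byte >> i) & 1
            xmid = x1 + ((x2 - x1) >> 12) * probs[ctx]
            if bit:
                x2 = xmid
            else:
                x1 = xmid + 1
            probs[ctx] = update(probs[ctx], bit)
            ctx = (ctx << 1) | bit
            while ((x1 ^ x2) & 0xFF000000) == 0:   # leading byte is settled
                out.append(x2 >> 24)
                x1 = (x1 << 8) & MASK
                x2 = ((x2 << 8) & MASK) | 0xFF
    out += x1.to_bytes(4, "big")       # flush (4 bytes, an assumption)
    return bytes(out)


def decompress(code: bytes, n: int) -> bytes:
    probs = [2048] * 256
    x1, x2 = 0, MASK
    x = int.from_bytes(code[:4], "big")
    pos = 4
    out = bytearray()
    for _ in range(n):
        ctx = 1
        for _ in range(8):
            xmid = x1 + ((x2 - x1) >> 12) * probs[ctx]
            bit = 1 if x <= xmid else 0
            if bit:
                x2 = xmid
            else:
                x1 = xmid + 1
            probs[ctx] = update(probs[ctx], bit)
            ctx = (ctx << 1) | bit
            while ((x1 ^ x2) & 0xFF000000) == 0:
                x1 = (x1 << 8) & MASK
                x2 = ((x2 << 8) & MASK) | 0xFF
                nxt = code[pos] if pos < len(code) else 0
                x = ((x << 8) & MASK) | nxt
                pos += 1
        out.append(ctx & 0xFF)
    return bytes(out)
```

The update uses only shifts and adds. A count-based model would instead compute something like n1/(n0+n1) for every bit, which needs a division.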
fpaqa by
Matt Mahoney, Dec. 15, 2007, is the first implementation of Jarek Duda's
asymmetric binary coder, described in section 3 of
Optimal encoding on discrete lattice with translational invariant constrains
using statistical algorithms, 2007.
The model is based on fpaq0p (adaptive order 0), but with probabilities
modeled with 16 bits resolution (instead of 12) to improve compression.
The source (GPL) can be compiled with -DARITH to substitute the arithmetic coder
from fpaq0 and fpaq0p for the asymmetric coder.
An asymmetric coder has a single N-bit integer state variable x, as opposed to
two variables (low and high) in an arithmetic coder, which allows a lookup
table implementation. In fpaqa, N=10. A bit d (0 or 1) with probability q = P(d = 1)
(0 < q < 1, a multiple of 2^-N) is coded:
To reduce the size of the coding tables, q is quantized to R=7 bits on a nonlinear
scale with closer spacing near 0 and 1. The quantization is such that ln(q/(1-q))
is a multiple of 1/8 between -8 and 8.
In the source, N, R, and B are adjustable parameters up to N=12, R=7.
Larger values improve compression at the expense of speed and memory.
fpaqa uses 2^(N+R+2) + 5*B/4 bytes for compression
and 2^(N+R+1) bytes for decompression.
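The nonlinear quantization of q described above can be illustrated with a small Python sketch. It assumes rounding to the nearest level and clamping to 128 levels (indexes -64 to 63) so that the index fits in R=7 bits. How fpaqa actually treats the endpoints is not stated here, and the function names are hypothetical.

```python
import math


def level(k: int) -> float:
    """Probability q whose ln(q/(1-q)) equals k/8."""
    return 1.0 / (1.0 + math.exp(-k / 8))


def quantize(q: float) -> int:
    """Index of the nearest level for a probability 0 < q < 1."""
    stretch = math.log(q / (1.0 - q))
    k = round(stretch * 8)
    return max(-64, min(63, k))   # 128 levels (clamping is an assumption)
```

Adjacent levels are about 0.03 apart near q = 1/2 but far closer together near 0 and 1, where a small error in q costs the most in compression.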
fpaqb
(Matt Mahoney, Dec. 17, 2007, updated to ver 2 on Dec. 20, 2007)
is a revision of fpaqa, using the same model, but using an asymmetric
coder that uses direct calculations in place of lookup tables to update
the state. This allows higher precision to improve compression (eliminating
a 0.03% penalty), saving memory, and allowing bytewise I/O (x in range
2^N to 2^(N+8)-1 for N=12). Compression
is about the same speed as fpaqa but decompression is 28% faster.
Ver. 2 is faster but maintains archive compatibility with ver. 1.
fpaq0m
by Ilia Muraviev, Dec. 20, 2007,
uses arithmetic coding and 2 order 0 models averaged
together, one with fast update (rate 1/16) and one slow (1/64).
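The idea of averaging a fast and a slow estimate can be sketched for a single context in Python. This uses floating point and hypothetical names for clarity; fpaq0m itself is not written this way. The second function measures the ideal coded size in bits that an arithmetic coder driven by the averaged prediction would approach.

```python
import math


class TwoRateBit:
    """Average of a fast (1/16) and a slow (1/64) estimate of P(bit = 1)."""

    def __init__(self):
        self.fast = 0.5
        self.slow = 0.5

    def p(self) -> float:
        return (self.fast + self.slow) / 2

    def update(self, bit: int) -> None:
        self.fast += (bit - self.fast) / 16   # adapts quickly, but noisy
        self.slow += (bit - self.slow) / 64   # adapts slowly, but stable


def code_length_bits(bits) -> float:
    """Ideal code length of a bit sequence under the averaged model."""
    model = TwoRateBit()
    total = 0.0
    for b in bits:
        p1 = model.p()
        total -= math.log2(p1 if b else 1.0 - p1)
        model.update(b)
    return total
```

The fast estimate tracks changes in the statistics within a few dozen bits, while the slow one gives a steadier estimate once the statistics settle.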
fpaq0mw
by Eugene Shelwien, Dec. 21, 2007, modifies fpaq0m by using a weighted
mix of a fast (1/16) and slow (1/256) adapting order 0 model, where
the weight is adjusted dynamically to favor the better model.
fpaqc
(Matt Mahoney, Dec. 24, 2007) is fpaqb with some optimizations
to the asymmetric coder.
fpaq0pv2
(Ilia Muraviev, Dec. 26, 2007) is a speed optimized version of fpaq0p
with arithmetic coding.
fpaq0r
by Alexander Ratushnyak, Jan. 9, 2008, is an order 0 model with arithmetic
coding. The model is tuned for better text compression. When compiled
with -DSLOWER (fpaq0rs.exe), the arithmetic coder uses higher precision
for better compression with a small speed penalty.
fpaq0f
by Matt Mahoney, Jan. 28, 2008, uses an adaptive order 0 model which
includes the bit history (as an 8 bit state) in each context.
(It is controversial whether this is really "order 0").
It uses arithmetic coding with 16 bit probabilities (rather than 12 bits).
fpaq0f2
by Matt Mahoney, Jan. 30, 2008, uses a simplified bit history consisting
of just the last 8 bits, plus some minor improvements.
fpaq0pv3
by Nania Francesco Antonio, Apr 04, 2008, is compatible with fpaq0p but 20-30% faster.
fpaq0pv4
including fpaq0pv4nc and fpaq0pv4nc0, are speed optimizations by Eugene
Shelwien, Apr. 6, 2008, as discussed
here.
fpaq0pv4 is compatible with fpaq0p but faster. The nc and nc0 variants dispense with the
extra EOF flags in each byte.
fpaq0pv5
by Nania Francesco Antonio, Apr 6, 2008, is a modification to fpaq0pv4.
fpaq0pv4a
including fpaq0pv4anc and fpaq0pv4anc0 are bug fixes to fpaq0pv4 by
Eugene Shelwien, Apr. 7, 2008, as discussed above.
fpaq0pv4b by
Eugene Shelwien, Apr. 18, 2008, replaces the arithmetic coder with
sh_v1m port (uses carries), Windows I/O, and other optimizations as discussed
here.
The Intel-compiled .exe only runs on Intel machines. I tested
fpaq0pv4b1 which was
patched on May 19, 2008 to run on AMD machines.
NTFS disk compression is used in Microsoft
Windows when the "compress files to save disk space"
checkbox is checked in the folder properties dialog box. Disk compression was
introduced in NTFS v1.2 in mid 1995 according to
Wikipedia.
The compression format is called LZNT1. The algorithm is proprietary. However, it was
reverse engineered
(in Russian, see also here).
The algorithm is LZSS (similar to lzrw1).
The format consists of groups of 8 symbols each preceded by 8 flag bits packed
into a byte. A 0 bit indicates a literal symbol, which is decoded by copying it.
A 1 bit indicates a 2 byte offset-length pair which is decoded by going back 'offset'
bytes in the output and copying the next 'length'+3 bytes. An offset-length pair
uses a variable number of bits allocated
to the offset (from 4 to 12) depending on the position in the file, and any
remaining bits allocated to the length of the match. A 12 bit offset would
correspond to a 4 KB block on disk.
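The variable split of the 16 bit offset-length pair can be sketched in Python. The exact rule that maps position to field width is not given above, so this assumes the offset field is just wide enough to reach back to the start of the 4 KB block, with a minimum of 4 bits. Any bias on the stored offset is left out, and the function names are hypothetical.

```python
def field_widths(pos: int):
    """Return (offset_bits, length_bits) for a pair decoded when pos bytes
    (1 <= pos <= 4096) of the current 4 KB block are already output."""
    offset_bits = max(4, (pos - 1).bit_length())   # assumed rule, 4 to 12
    return offset_bits, 16 - offset_bits


def split_pair(pair: int, pos: int):
    """Split a 16 bit pair into (offset, length to copy)."""
    offset_bits, length_bits = field_widths(pos)
    offset = pair >> length_bits
    length = (pair & ((1 << length_bits) - 1)) + 3   # 'length'+3 bytes
    return offset, length
```

Early in a block few bits are needed for the offset, so up to 12 bits remain for the length and long matches are cheap. Near the end of the block the offset takes 12 bits and only 4 remain for the length.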
I tested by copying enwik9 between folders with the compression turned on
in one folder, and compared with times to copy between
two folders both with compression turned off.
I tried each copy twice and took the second time, which was at most 1 second
faster than the first copy. I used the test machine in note 26 running Windows
Vista Home Premium SP1 32 bit with 3 GB memory and
a 200 GB disk between folders on the same partition.
Copying between two uncompressed folders takes 41 seconds. Copying to a compressed
folder takes 51 seconds, or a difference of 10 seconds.
Copying from a compressed folder takes 35 seconds. I estimated 9 seconds for
decompression by assuming that copying the compressed file directly
would take 26 seconds based on its size of 636 MB. (This is probably wrong
because the file would be cached in memory uncompressed, but the alternative
is a negative time for decompression. Copying either the compressed or uncompressed
file to NUL: takes 2 seconds on the second try).
Times were recorded with a watch because timer 3.01 will not time built-in commands
like 'copy'. Task Manager does not show any processes consuming CPU time or memory
during copying. However, memory use should be insignificant (under 16 KB) for
LZSS with 4 KB blocks. Sizes are as reported by right clicking on the compressed
file in Explorer as 'size on disk'. The size of the decompression program is not known.
compact
(man page)
is a file compressor by Colin L. Mc Master, Feb. 28, 1979. It was written in K&R C for
VAX/PDP11 and SUN under Berkeley UNIX. It uses adaptive order-0 Huffman coding.
The (separate) decompression program rebuilds the Huffman tree, so it need not be transmitted.
Neither program takes options. compact deletes the input file and creates an output file
with a .C extension. uncompact deletes the compressed file and restores the original.
compact was later superseded by compress, which gives better compression.
For this test, compact was compiled using the provided Makefile and tested under
Ubuntu Linux. Minor source code corrections were needed to compile under gcc.
However, the decompressor size is based on the original code. A port to Windows would
be possible but would require more source code changes.
A similar program, barfest.exe, compresses the million random digits file to
1 byte, rather than the Calgary corpus. The decompressor size is
455,755 bytes (zipped).
hipp v0.5819
is an experimental command line file compressor with source code available by
Bogatov Roman, Aug. 19, 2005. It uses context mixing with ordinary and optionally sparse
(fixed gap) contexts, using a suffix tree with path compression to store statistics.
The options are /m to specify the memory limit in MB (default /m2048),
/o to specify primary context order, i.e. the depth of the suffix tree
with path compression (default /o256), /do to set max
deterministic order (actual order with path decompression) (default /do256, do >= o),
/so to set the number of sparse contexts (default /so0). Sparse contexts
are useful for binary data but generally not text. Memory usage increases
with the size of the file and with /o and /so (but not /do). Also, if the
memory limit is exceeded then an error occurs. Unfortunately enwik9 cannot
be compressed at all because initialization requires more than 800 MB.
Some results for enwik8:
Unfortunately, the compressor will not accept truncated XML files such as this benchmark.
It can be made to work by appending the following 38 bytes to enwik8 or enwik9
to create a properly formed XML file (a trailing newline is optional but was not used):
In theory, using no compression (-N) would allow XMill to be used as a preprocessor to other
compressors. However, the decompressor will not accept either enwik8 or enwik9 (with closing
tags appended) if processed with -N (reports "corrupt file").
xmill 0.9.1
(Mar. 15, 2004) also fails to decompress enwik9 and fails to decompress either file with -N.
All programs report "malloc failed" on enwik9. The LZP algorithms
use very little memory themselves, but these implementations allocate
input and output buffers all at once. This fails for enwik9 because of
the 2 GB process limit in Windows.
lzp1 is both a compressor and decompressor. To decompress, use -d as
the third argument. lzp2 is a compressor only. There is a source code
decompressor "lzp2d" but I was unsuccessful in compiling it.
It allows an unexplained option "HuffType" which I did not experiment with.
lzp3o2 has a separate decompressor "lzp3o2d.exe" included in the distribution.
This page is maintained by Matt Mahoney, mmahoney (at) cs.fit.edu
Notes about compressors
2. Decompression size and time for pkzip 2.0.4. kzip only compresses.
3. Reported by Ilia Muraviev (author of PX, TC, pimple), June 10-July 18, 2006. Timed on a P4 3.0 GHz, 1GB RAM, WinXP SP2.
4. enwik9 reported by Johan de Bock, May 19, 2006. Timed on Intel Pentium-4 2.8 GHz 512KB L2-cache, 1024MB DDR-SDRAM.
5. Compressed with paq8h (VC++ compile) and decompressed with paq-8h (Intel compile of same source code).
Normally compression and decompression are the same speed.
6. ocamyd 1.65.final and LTCB 1.0 reported by Mauro Vezzosi, May 30-June 20, 2006.
Timed on a 1.91 GHz AMD Athlon XP 2600+, 512 MB, WinXP Pro 2002 SP2
using timer 3.01. ocamyd 1.66.final reported Feb. 3, 2007.
Times are process times.
7. Under development by Mauro Vezzosi, May 24, 2006.
8. Reported by Denis Kyznetsov (author of qazar), June 2, 2006.
9. Reported by sportman, May 24, 2006. Timed on an Intel Pentium D 830 dual core 3.0GHz,
2 x 512MB DDR2-SDRAM PC4300 533Mhz memory timing 4-4-4-12 (833.000KB free),
Windows XP Home SP2. CPU was at 52% so apparently only one of 2 cores was used.
Decompression verified on enwik8 only (not timed, about 2.5 hours).
WinRK compression options: Model size 800MB,
Audio model order: 255,
Bit-stream model order: 27,
Use text dictionary: Enabled,
Fast analyses: Disabled,
Fast executable code compression: Disabled
10. Reported by Malcolm Taylor (author of WinRK), May 24, 2006.
Timed on an Athlon X2 4400+ with 2GB, running WinXP 64. Decompression not tested.
Decompressor size is based on SFX stub size reported by Artyom (A.A.Z.), Sept. 2, 2007,
although it was not tested this way.
11. Reported by sportman, May 25, 2006. CPU as in note 9.
12. Reported by sportman, May 30, 2006. CPU as in 9 (50% utilized).
13. xwrt 3.2 options are -2 -b255 -m250 -s -f64. ppmonstr J options are -o10 -m1650.
14. Reported by Michael A Maniscalco, June 15, 2006.
15. Reported by Jeremiah Gilbert on the Hutter group, Aug. 18, 2006. Tested under Linux on a dual Xeon
1.6 GHz(lv) (overclocked to 2.13 GHz) with 2 GB memory. Time is user+sys (real=196500 ns/byte).
16. Reported by Anthony Williams, Aug. 19-22, 2006. Timed on a 2.53 GHz Pentium 4 with 512 MB under WinXP Home SP2.
17. Tested Aug. 20, 2006 under Ubuntu Linux 2.6.15 on a 2.2 GHz Athlon-64 with 2 GB memory. Time is approximate
wall time due to disk thrashing. User+sys time is 153600 ns/byte compress, 148650 decompress.
18. Reported by Dmitry Shkarin (author of durilca4linux), Aug. 22-23, 2006 for durilca4linux_1;
and Oct. 16-18, 2006 for durilca4linux_2. 3 GB memory usage is RAM + swap.
Tested on AMD Athlon X2 4400+, 2.22 GHz, 2 GB memory under SuSE Linux AMD64 v10.0.
durilca4linux_3 reported Feb. 21, 2008 using 4 GB RAM + 1 GB swap. v2 reported Apr. 22, 2008.
v3 reported May 22, 2008.
19. enwik8 confirmed by sportman, Sept. 20, 2006. Compression time 61480 ns/byte timed on a
2 x dual core (only one core active) Intel Woodcrest 2GHz with 1333MHz fsb and 4GB 667MHz CL5 memory under
SiSoftware Sandra Lite 2007.SP1 (10.105). Dhrystone ALU 37,014 MIPS, Whetstone iSSE3 25,393 MFLOPS,
Integer x8 iSSE4 220,008 it/s, Floating-point x4 iSSE2 119,227 it/s.
20. Reported by Giorgio Tani (author of PeaZip) on Nov. 10, 2006. Tested on a MacBook Pro,
Intel T2500 Core Duo CPU (one core used),
with 512 MB memory under WinXP SP2. Time is combined compression and decompression.
21. enwik9 -8 reported by sportman, Dec. 12-13, 2006. Hardware as note 19. enwik9
decompression not verified. paq8hp7 -8 enwik8 compression was reported as 16,417,650
(4 bytes longer; the size depends on the length of the input filename, which was
enwik8.txt rather than enwik8).
I verified enwik8 -7 and -8 decompression.
22. paq8hp8 -8 enwik9 reported by sportman, Jan. 18, 2007.
paq8hp10 -8 enwik9 on Apr. 2, 2007. paq8hp11 -8 enwik9 on May 10, 2007.
paq8hp12 -8 enwik8/9 on May 20, 2007.
Hardware as in note 19. Decompression verified for enwik8 only.
23. 7zip 4.46a options were -m0=PPMd:mem=1630m:o=10 -sfx7xCon.sfx
24. paq8o8-intel (intel compile of paq8o8) -1, paq8o8z-jun7 (DOS port of paq8o8) -1
reported by Rugxulo on Jun 10, 2008.
Timed on an AMD64x2 TK-53 Tyler 1.7 GHz laptop with Vista Home Premium SP1.
25. paq8o8z -1 enwik8 (DJGPP compile) reported by Rugxulo on Jun 17, 2008.
Tested on a 2.52 GHz P4 Northwood, no HTT, WinXP Home SP2.
26. Tested on a Gateway M-7301U laptop with 2.0 GHz dual core Pentium T3200
(1MB L2 cache), 3 GB RAM, Vista SP1, 32 bit. Run times are similar to my
older computer.
27. enwik9 size reported by Eugene Shelwien, Mar. 5, 2009.
enwik8 size and all speeds are tested as in note 26.
28. Reported by Eugene Shelwien on a Q6600, 3.3 GHz, WinXP SP3, ramdrive:
bcm 0.06 on Mar. 15, 2009, bcm 0.08 on June 1, 2009.
29. Reported by kaitz (KZ): paq8p3 on Apr. 19, 2009, v2 on Apr. 21, 2009.
30. Reported by Sami Runsas (author of bwmonstr), July 14, 2009. Tested on an Athlon XP 2200 (Win32).
31. Reported by Dmitry Shkarin, July 21, 2009. Tested on a 3.8 GHz Q9650 with 16 GB
memory under Windows XP 64bit Pro SP2. Requires msvcr90.dll.
About the Compressors
.1280 durilca
./DURILCA d EnWiki.dur
./DURILCA e -m1800 -o10 -t2 enwik9
To decompress:
./UnDur EnWiki.dur
./UnDur enwik9.dur
The first step extracts a compressed dictionary. It is organized in a similar
manner to paq8hp2-paq8hp5 in that
syntactically related words and words with the same suffix are grouped together.
Results are reported by the author under Suse Linux 10.0.
I verified enwik8 only (6480 ns/b to compress on a 2.2 GHz Athlon 64 with
2 GB memory under Ubuntu Linux). enwik9 caused disk thrashing.
Compression Compressed size Decompressor Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Notes
------- ------- ---------- ----------- ----------- ----------- ----- ----- -----
durilca'light 0.5 -m650 -o12 21,089,993 178,562,475 1,495,422 x 180,057,897 1227 (fails)
durilca 0.5 -m700 -o12 -t0 19,227,202 162,117,578 74,292 x 162,191,870 4140 (fails)
-m800 -o128 19,321,003 164,298,178 74,292 x 165,372,470 7718 (fails)
-m700 -o12 -t2(3) 18,520,589 (fails) 1,507,312 x 3330 3940
durilca 0.5(Hutter) -m700 -o13 -t2 18,128,339 (fails) 77,295 x 5905
-m1650 -o21 -t2 17,958,687 (fails) 77,295 x 6140 6140
durilca4linux_1 -m700 -o13 -t2 18,128,334 23,375 xd 5950 5880
-m1750 -o12 -t2 18,027,888 146,521,559 23,375 xd 146,544,934 5500 7301 18
-m1750 -o24 -t2 17,949,422 23,375 xd 6190 6780
durilca4linux_2 -m1800 -o10 '-t2(11)' 17,002,831 136,536,189 241,322 xd 136,777,511 4249 4827 18
-m1800 -o10 -t2 16,998,300 136,596,818 241,322 xd 136,838,140 4405 4894 18
durilca4linux_3 v1 -m3600 -o14 -t2 16,356,063 129,933,145 345,957 xd 130,279,102 3649 3715 18
-m1200 -o32 -t2 16,348,796 4170 4178 18
durilca4linux_3 v2 -m3600 -o14 -t2 16,323,581 129,670,441 344,525 xd 130,014,966 3628 3639 18
-m1200 -o32 -t2 16,316,255 4148 4157 18
durilca4linux_3 v3 -m3600 -o14 -t2 16,292,414 129,469,384 339,990 xd 129,809,374 3624 3627 18
-m1200 -o32 -t2 16,285,285 4135 4138 18
-m1500 -o6 -t2 16,517,051 133,674,565 3852
-m1500 -o7 -t2 16,418,799 132,239,495 4006
-m1500 -o8 -t2 16,368,632 131,722,213 4149
-m1500 -o9 -t2 16,335,259 131,549,901 339,990 xd 131,889,891 4261 4344
-m1500 -o10 -t2 16,316,775 131,574,739 4405
-m1500 -o11 -t2 16,306,086 131,707,901 4544
-m1500 -o12 -t2 16,299,411 131,807,298 4554
-m1500 -o14 -t2 16,292,414 132,238,662 4763
-m1500 -o16 -t2 16,289,512 132,516,825 4879
-m1500 -o32 -t2 16,285,285 134,238,759 5440
durilca'kingsize -m13000 -o40 -t2 16,258,380 127,695,666 333,790 xd 128,029,456 1413 1805 31
.1323 paq8hp12any
To compress: paq8hp11 -7 enwik8.paq8hp11 enwik8
To decompress: paq8hp11 enwik8.paq8hp11
Compression Compressed size Decompressor Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Note
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- ----
p5 31,255,092 9,298 s 3421 1 6
p6 25,377,998 9,421 s 4190 16 6
p12 24,714,219 9,598 s 4160 16 6
paq1 22,156,982 16,436 s 7800 7790 50
paq6 v2 -8 19,589,267 26,548 s 47624 808
paqar 4.5 -7 18,388,609 414,164 s 118690 119010 470
paq8f -7 18,289,559 34,371 x 68960 854
-8 18,075,265 34,371 x 69170 1693
paq8g -7 17,817,246 804,867 s 44130 854
paq8h -7 17,674,700 147,195,723 801,612 s 147,997,335 56511 57278 854 5
raq8g -7 18,132,399 33,483 x 84555 84793 1089
-8 17,923,022 27,660 x 337430 ~330000 2095 17
-8 17,923,022 27,660 x 196540 ~196000 2095 15
paq8hp1 -7 17,566,769 205,783 x 60170 60660 748
-8 17,397,023 142,477,977 205,783 x 142,683,760 63317 1595
paq8hp2 -7 17,390,490 204,557 x 62000 62330 747
-8 17,223,661 141,145,684 204,557 x 141,350,241 65323 1584
paq8hp3 -7 17,241,280 177,477 x 61360 59690 742
-8 17,085,021 139,905,045 177,477 x 140,082,522 63420 1586
paq8hp4 -7 17,039,173 198,525 x ~65000 65110 755
-8 16,889,237 138,188,695 198,525 x 138,387,220 67956 68120 1598
paq8hp5 -7 16,898,402 161,887 x 76300 77710 900 19
-8 16,761,044 137,017,311 161,887 x 137,179,198 ~85153 75162 1787
paq8hp6 -7 16,731,800 138,828,889 166,715 x 138,995,604 74953 73707 941
-8 16,568,451 135,281,289 166,715 x 135,448,004 60865 1807 21
paq8j -7 18,208,284 39,366 s 138030 138260 959
-8 17,991,628 39,366 s 138990 136500 1896
paq8ja -7 18,184,224 39,781 s 148560 143200 993
-8 17,968,233 39,781 s 154700 153990 1965
paq8jb -7 18,180,081 39,982 s 148570 148200 1009
-8 17,964,363 39,982 s 188590 190190 1999
paq8jc -7 18,185,705 40,064 s 150910 152080 1017
-8 17,970,943 40,064 s 224410 234900 2015
paq8hp7a -7 16,592,672 137,441,743 150,678 x 137,592,421 79795 940
-8 16,431,239 150,678 x 76940 77600 1790
paq8hp7 -7 16,579,500 151,633 x 79620 79660 940
-8 16,417,646 133,835,408 151,633 x 133,987,041 66074 1850 21
paq8jd -7 18,158,159 40,460 s 157340 156350 1030
-8 17,943,042 40,460 s 406730 2028
paq8hp8 -7 16,528,353 151,711 x 79580 79970 940
-8 16,372,960 133,271,398 151,711 x 133,423,109 64639 1849 22
paq8k -8 18,239,915 41,881 s 457150 1463
paq8hp9 -7 16,516,789 136,676,674 111,653 x 136,788,327 84529 85957 940
paq8l -6 18,518,485 35,955 x 133910 435
-7 18,168,563 35,955 x 134770 837
-8 17,916,450 35,955 x 136000 136390 1643
paq8hp10 -7 16,490,947 102,256 x 86720 88890 940
paq8hp10any -8 16,335,197 132,979,531 333,925 x 133,313,456 55639 1849 22
paq8hp11 -7 16,459,515 98,851 x 129540 128530 947
paq8hp11any -8 16,304,862 132,757,799 327,608 s 133,085,407 57503 1850 22
paq8hp12 -7 16,381,959 98,745 x 130820 131480 936
paq8hp12any -7 16,381,959 330,700 x 78860 76190 941
-8 16,230,028 132,045,026 330,700 x 132,375,726 56993 1850 22
paq8fthis2 -8 18,075,265 34,846 x 69100 69310 1693
paq8n -8 17,916,420 37,402 x 134880 135480 1643
paq8o -8 17,916,451 42,389 s 135850 135260 1643
paq8osse -8 17,916,451 42,290 s 125260 124570 1778
paq8o3 -8 17,916,450 43,745 s 134580 134530 1636
paq8o4 v1 -8 17,916,450 43,876 s 126780 126560 1636
paq8o6 -8 17,904,721 44,883 s 139530 139520 1712
paq8o7 -8 17,904,756 45,979 s 139140 138530 1574
paq8o8 -8 17,904,756 46,381 s 139370 139150 1574
paq8o8-intel -1 22,260,679 46,381 s 24687 37 24
paq8o8z-jun7 -1 22,260,679 49,085 s 25919 37 24
-1 22,260,680 29639 37 25
paq8o10t -8 17,772,821 50,865 s 144250 143720 1591
decomp8 15,970,425 16,252 xd 78180 936 26
paq8p3 -7 18,044,229 150,709,834 57,288 s 150,767,122 72412 803 29
paq8p3 v2 -7 17,990,788 86891 803 29
-8 17,759,875 87305 1574 29
decomp8b 15,942,290 16,384 xd 74790 934 26
decmprs8 15,932,968 16,720 xd 76080 936 26
Preprocessor Compressor enwik8 dict total dict+enwik8
------------ ---------- ---------- ------- ---------- ---------
paq8hp1 -0 | ppmonstr J -m1650 -o64 18,322,077 81,190 18,403,267 18,403,991
paq8hp2 -0 | ppmonstr J -m1650 -o64 18,266,424 81,190 18,347,614 18,349,587
paq8hp3 -0 | ppmonstr J -m1650 -o64 18,197,797 107,583 18,305,380 18,306,690
paq8hp4 -0 | ppmonstr J -m1650 -o64 18,170,944 107,590 18,278,534 18,280,098
paq8hp5 -0 | ppmonstr J -m1650 -o64 18,154,921 111,935 18,266,856 18,267,556
xml-wrt 2.0 | ppmonstr J -m1650 -o64 18,625,624
xml-wrt 3.0 | ppmonstr J -m1650 -o64 18,494,374
(none) ppmonstr J -m1650 -o16 19,062,555
ppmonstr J -m1650 -o32 19,084,964
ppmonstr J -m1650 -o64 19,098,634
WRT has additional capabilities depending on input, such as skipping encoding if little or no
text is detected. The dictionary format is one word per line (linefeed only) with a 13 line header.
.1440 drt|lpaq9m
Prog Opt enwik8 enwik9 prog Total Comp Deco Mem Alg Note
---- --- ---------- ----------- ---- ----------- ---- ---- ---- --- ----
lpaq1 9 19,755,948 164,508,919 6,676 x 164,515,595 3646 3594 1539 CM
lpaq2 9 19,755,471 164,496,295 6,888 x 164,503,183 3260 3354 1539 CM
lprepaq 1.2 9 19,755,989 164,509,300 189,891 x 164,699,191 8696 7888 1582 CM
lpaq3 9 19,580,276 165,600,121 7,514 x 165,607,635 3695 3735 1542 CM
elpaq3 9 19,392,604 160,081,507 7,377 x 160,088,884 3411 3454 1542 CM
lpaq3a 9 19,585,951 165,661,890 12,004 s 165,673,894 4177 4163 1542 CM
lpaq3e 9 19,392,604 160,081,507 12,004 s 160,093,511 3967 3932 1542 CM
lpaq4 9 19,583,905 165,603,612 7,117 x 165,610,729 3693 3697 1542 CM
lpaq4e 9 19,358,662 159,675,213 6,990 x 159,682,203 3383 3422 1542 CM
lpaq5 9 19,455,395 161,410,276 8,382 x 161,418,658 3614 3630 1542 CM
lpaq5e 9 19,078,767 156,194,860 7,841 xd 156,202,701 3428 3605 1542 CM
lpaq6 9 19,562,861 165,224,012 8,848 x 165,232,860 3586 3624 1542 CM
lpaq6e 9 19,054,076 155,943,020 8,866 x 155,951,886 3420 3478 1542 CM
lpaq7 9 19,557,894 162,359,435 9,078 x 163,368,513 3922 3850 1542 CM
lpaq7e 9 19,039,516 155,840,757 8,570 x 155,849,327 3477 3490 1542 CM
lpaq8 9 19,523,803 161,987,713 9,676 x 161,997,389 3682 3718 1542 CM
lpaq8e 9 18,982,007 155,232,477 8,888 x 155,241,365 3424 3475 1542 CM
lpaq1a 9 19,759,778 164,547,926 8,558 x 164,556,484 3462 3423 1540 CM
lpq1 19,888,399 168,467,267 9,151 x 168,476,408 3389 3402 387 CM
drt|lpaq9e 9 18,151,024 145,628,635 110,844 x 145,739,479 3006 2975 1542 CM
drt|lpaq9f 9 18,079,247 144,877,844 110,864 x 144,988,708 2858 2859 1542 CM
drt|lpaq9g 9 18,069,107 144,838,636 110,318 x 144,948,954 2744 2722 1542 CM
drt|lpaq9h 9 18,067,711 144,763,248 110,376 x 144,873,624 2585 2575 1542 CM
drt|lpaq9i 9 18,065,347 144,752,858 110,149 x 144,863,007 2486 2501 1542 CM
drt|lpaq9j 9 18,056,997 144,687,646 110,135 x 144,797,781 2425 2408 1542 CM
drt|lpaq9k 9 18,007,677 144,277,379 110,785 x 144,388,164 2397 2395 1542 CM
drt|lpaq9l 9 17,979,724 144,082,479 110,479 x 144,192,958 2398 2474 1542 CM
drt|lpaq9l 9 17,979,724 144,082,479 110,479 x 144,192,958 2175 2221 1542 CM 26
drt|lpaq9m 9 17,964,751 143,943,759 110,579 x 144,054,338 2107 2151 1542 CM 26
Compressors options enwik8 enwik9 Comp Mem8
------------------- ---------------- ---------- ----------- ---- ----
drt 9i | ppmonstr J -m1650 -r1 -o10 18,185,633 147,936,682 2509 825
-m1650 -r1 -o11 18,166,961 147,899,374 2634 895
-m1650 -r1 -o12 18,152,982 147,907,628 2661 953
-m1650 -r1 -o16 18,142,625 148,306,179 2888 1109
-m1650 -r1 -o32 18,124,722 149,857,650 3361 1371
-m1650 -r1 -o64 18,122,785 151,343,426 3870 1554
-m1650 -r1 -o128 18,130,333 1650
drt 9j | ppmonstr J -m1650 -r1 -o11 18,165,440 147,859,151 2636
-m1650 -r1 -o64 18,120,770 2603
.1489 xwrt | ppmonstr
Compressed size Decompressor Total size Time (ns/byte)
Program/options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg
------------------------------------------------------------------- ---------- ----------- ----------- ----------- ----- ----- --- ---
xml-wrt 3.0 -l0 -b255 -m255 -3 -s -e20000 | ppmonstr J -m1650 -o10 18,592,499 150,004,636 82,466 sx 150,087,102 3067 2708 1650 PPM
xml-wrt 3.0 -l0 -b255 -m255 -3 -s -e7000 | ppmonstr J -m1650 -o64 18,494,374 82,466 sx 3500 3340 1650 PPM
xml-wrt 2.0 -l0 -w -s -c -b255 -m100 -e10000 | ppmonstr J -m1700 -o10 18,794,295 150,651,873 67,309 sx 150,719,182 2715 ~2650 1700 PPM
xml-wrt 2.0 -l0 -w -s -c -b255 -m100 -e2300 | ppmonstr J -m1650 -o64 18,625,624 67,309 sx 3550 3360 1650 PPM
xml-wrt 2.0 -l0 -w -s -c -b255 -m100 -e10000 | ppmonstr J -m800 -o8 18,863,790 154,223,582 67,309 sx 154,290,891 2820 800 PPM
xml-wrt 1.0 -f800 | ppmonstr J -m800 -o8 19,043,178 154,749,585 56,837 sx 154,806,422 2702 ~2700 800 PPM
Compression Compressed size Decompressor Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Notes
------- ------- ---------- ----------- ----------- ----------- ----- ----- -----
xml-wrt 1.0|ppmonstr J -f1800 | -m800 -o10 18,965,658 155,066,074 56,837 sx 155,122,911 2905 2809
xml-wrt 1.0|slim23d -f1800 | -m700 -o12 19,163,987 156,734,571 69,453 x 156,804,024 4702 4717
xml-wrt 1.0|ppmd J1 -f1800 | -m256 -o8 -r1 21,128,019 178,154,529 25,917 s 178,180,446 717 722
Program Options enwik8 enwik8.xwrt Ratio Alg
------- ------- ----------- ---------- ------ ---
paq8h -7 17,674,700 18,341,959 1.0378 CM
ppmonstr J -o10 -m800 19,338,065 18,886,224 0.9766 PPM
slim23d -m700 -o10 19,264,094 18,938,602 0.9830 PPM
WinUDA 2.91 mode 3 (194 MB) 20,332,366 20,859,165 1.0259 CM
ppmd J1 -o10 -m256 -r1 21,388,296 20,945,220 0.9793 PPM
uhbc 1.0 -m3 -b100m 20,930,838 21,171,204 1.0115 BWT
M03exp 32 MB 21,948,192 21,583,059 0.9834 BWT
sbc -ad -m3 -b63 22,470,539 22,216,425 0.9887 BWT
WinRAR 3.60b3 -mc7:128t+ -sfxWinCon.sfx 22,713,569 22,457,785 0.9887 PPM
PX 1.0 24,971,871 22,818,070 0.9137 CM
uharc 0.6b -mx -md32768 23,911,123 22,915,299 0.9583 PPM
chile 0.3d-1 -b=40000 23,408,335 22,884,519 0.9776 BWT
cabarc 1.00.0601 -m lzx:21 28,465,607 25,739,214 0.9042 LZ77
WinACE -sfx -m5 30,919,182 27,112,651 0.8769
bzip2 1.0.3 29,008,758 27,339,845 0.9425 BWT
gzip 1.3.5 -9 36,445,248 30,403,738 0.8342 LZ77
pkzip 2.0.4 36,934,712 30,729,525 0.8432 LZ77
thor 0.9a ex 41,670,916 32,586,444 0.7820
compress 4.3d 45,763,941 38,485,494 0.8409 LZW
Original size 100,000,000 52,174,989 0.5217
-f -o7 -o8 -o9 -o10 -o11 -o12 -o16 -o32
--- ----------- ----------- ----------- ----------- ----------- ----------- ----------- -----------
100 155,908,621
200 155,775,164
300 155,653,815
500 154,884,542 155,367,681 155,465,355 155,547,660
600 154,787,455 155,497,645
800 154,749,585
1000 154,909,136 154,794,501 154,951,751 155,122,278 155,306,526 155,409,926 155,948,066 157,901,320
1500 155,092,513 154,895,455 154,999,654 155,073,186 155,306,526 155,301,322
1800 155,191,178 154,924,936 155,036,534 155,066,074 155,366,281 155,297,828
2000 154,998,528 155,296,112
3000 155,379,959
Compression Compressed size Decompressor Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp
------- ------- ---------- ----------- ----------- ----------- ----- -----
xml-wrt 1.0 -f1800 (70,826,140)(532,089,443) (14,818 s)(532,104,261) (115) (103)
+ ppmd J -m256 -o8 -r1 21,128,019 178,154,529 41,653 sx 178,196,182 712 723
xml-wrt 1.0 -f180 (52,174,989)(468,964,104) (14,818 s)(468,978,922) (113) (103)
+ ppmd J -m256 -o8 -r1 20,910,527 178,215,315 41,653 sx 178,256,968 690 699
ppmd J -m256 -o10 -r1 21,388,296 183,964,915 26,835 x 183,991,750 880 895
xml-wrt -f1800 enwik9 | ppmonstr -m800 -o12
-------------------------------------------
(default) 154,924,936
-s 155,040,558
-t 155,421,035
-s -t 155,542,575
xml-wrt -l0 -w -s -c -b255 -m100 -e10000 enwik9
ppmonstr e -o8 -m800 enwik9.xwrt
xml-wrt 2.0 options ppmonstr J enwik9
-------------------------------- ---------- -----------
-l0 -w -s -c -b255 -m100 -e10000 | -m800 -o8 154,223,582
-l0 -w -s -c -b255 -m100 -e8000 | -m800 -o8 154,234,621 (smaller -e)
-l0 -w -s -c -b255 -m100 -e12000 | -m800 -o8 154,239,769 (larger -e)
-l0 -w -s -c -b255 -m50 -e10000 | -m800 -o8 154,259,117 (smaller -m)
-l0 -w -s -c -b255 -m100 -e10000 | -m800 -o7 154,322,272 (smaller -o)
-l0 -w -s -c -b255 -m150 -e10000 | -m800 -o8 154,426,554 (larger -m)
-l0 -w -s -c -b255 -m100 -e10000 | -m800 -o9 154,445,811 (larger -o)
xwrt 3.2 options ppmonstr J opt enwik8 enwik9 program size total Comp Decomp Mem
---------------------- -------------- ---------- ----------- ----------------- ----------- -------- ------- ----
-2 -b255 -m255 -s -f64 -o10 -m1650 18,456,706 148,915,761 52,569s + 26,835x 148,995,165 475+2512 43+2503 1650
-2 -b255 -m255 -s -f64 -o64 -m1650 18,397,126 210+2810 50+2884 1527
.1512 xwrt
Compression Compressed size Decompressor Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- --- ----
xml-wrt 2.0 -l6 -b255 -m255 -s -f8 23,199,202 196,914,328 25,354 s 196,939,682 905 70 525 LZ77
xml-wrt 3.0 -l11 -b255 -m255 -f24 19,663,305 165,274,422 40,447 s 165,314,869 4398 4317 416 CM
xwrt 3.2 -l14 -b255 -m96 -s -e40000 -f200 18,679,742 151,171,364 52,569 s 151,223,933 2537 2328 1691 CM
.1514 nanozip
Program Options enwik8 enwik9 zip size Total Comp Deco Cmem Dmem (reported) Alg Note
-------- ----------- ---------- ----------- --------- ----------- ---- ---- ---- ---- ---- ---- --- ----
nz 0.01a -cf 46,381,713 24 24 96 404 404 LZP
-cf -m1500m 46,381,713 417,351,980 266,797 x 417,618,777 26 31 975 978 1476 1476 LZP
-cF 40,733,125 62 43 155 404 404 LZP
-cF -m1500m 40,733,125 359,192,720 359,459,517 63 40 1040 1045 1476 1476 LZP
-cd 33,241,150 127 28 89 422 402 LZ77
-cd -m1500m 33,001,952 292,180,617 292,447,414 156 28 768 687 1546 1474 LZ77
-cD 29,384,997 288 27 282 466 258 LZ77
-cD -m1500m 29,253,158 258,513,190 258,779,987 323 31 1020 693 1314 994 LZ77
-co 21,838,721 391 186 333 431 336 BWT
-co -m1500m 20,503,629 176,470,974 176,737,771 448 221 1667 1160 1810 1294 BWT
-co -m1500m -txt 20,503,629 170,711,387 170,978,184 336 234 1074 1120 1471 1463 BWT
-cO 21,623,801 465 247 333 431 266 BWT
-cO -m1500m 20,306,489 174,770,662 175,037,459 511 269 1378 1135 1810 1294 BWT
-cO -m1500m -txt 20,306,489 169,092,652 169,359,449 393 280 1074 1274 1471 1463 BWT
-cO -m1670m -txt 20,306,489 167,509,921 167,776,718 403 284 1170 1325 1633 1625 BWT
-cc 18,994,349 2975 2910 360 436 435 CM
-cc -m1500m 18,723,413 152,654,332 152,921,129 3147 3091 1556 1556 1524 1523 CM
nz 0.03a -cc -m1670m 18,679,094 151,668,563 263,953 x 151,932,516 3058 3003 1700 1700 1700 1699 CM
nz 0.05a -cf -m1670m 46,381,713 18 22 100 LZP
-cF -m1670m 40,608,638 66 41 164 LZP
-cd -m1670m 31,555,257 96 29 289 LZ77
-cD -m1670m 27,811,031 182 35 170 LZ77
-co -m1670m 20,499,411 351 177 626 BWT
-cO -m1670m 20,302,501 422 240 642 BWT
-cc -m1670m 18,638,419 151,176,555 288,449 x 151,465,004 3032 2975 1668 CM
nz 0.06a -co -m1670m 20,499,412 250 183 441 BWT 26
-cO -m1670m 20,302,502 300 243 457 BWT 26
-cc -m1670m 18,636,515 151,177,510 336,273 x 151,513,783 2143 2137 1670 CM 26
w32c -cc -m1670m 18,754,787 151,295,782 0 xd 151,295,782 2156 2173 1670 CM 26
.1563 WinRK
Compression Compressed size Decompressor Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Alg Notes
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- -----
WinRK 3.03 PWCM (800MB +td) 18,612,453 156,291,924 3,017,362 x 159,309,286 68555 CM 10
WinRK 3.03 PWCM 18,612,551 156,349,910 3,017,362 x 159,367,272 102973 ~90000 CM 9
WinRK 3.03 FPW1 (800MB +td) 19,035,564 24950 10
WinRK 3.03 PWCM (800MB -td) 19,060,620 88310 CM 10
WinRK 3.03 Efficient 21,157,165 5380 PPM 10
WinRK 3.03 Normal (PPMd) 22,322,981 620 PPM 10
WinRK 3.03 PWCM (800MB +td) 18,612,453 156,291,924 99,665 xd 156,391,589 68555 CM 10
.1570 ppmonstr, ppmd, ppms
Compression Compressed size Decompressor Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp
------- ------- ---------- ----------- ----------- ----------- ----- -----
ppmonstr J -m1700 -o16 19,055,092 157,007,383 42,019 x 157,049,402 3574 ~3600
ppmonstr J -m800 -o16 19,230,657 161,496,685 42,019 x 161,538,704 3783 ~3800
ppmd J -m256 -o10 -r1 21,388,296 183,964,915 11,099 s 183,976,014 880 895
ppmd J -m10 -o4 -r0 26,275,353 236,509,791 11,099 s 236,520,890 194 206
ppms J -o5 26,310,248 233,442,414 16,467 x 233,458,881 330 354
-o2 36,866,748 102
-o3 30,242,535 135
-o4 27,030,761 246
-o6 26,644,863 449
-o7 27,028,318 492
-o8 27,343,283 532
.1598 slim
Compression Compressed size Decompressor Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp
------- ------- ---------- ----------- ----------- ----------- ----- -----
slim23d -m1700 -o12 19,077,276 159,772,839 69,453 x 159,842,292 5232 ~5400
slim23d -m700 -o32 19,226,339 (failed) 69,453 x 6530 6770
slim23d -m700 -o10 19,264,094 162,529,098 69,453 x 162,598,551 5175 5360
.1605 bwmonstr
bwmonstr 0.00 is a free, experimental, closed source file compressor by
Sami Runsas, Mar. 10, 2009. It uses BWT. The program takes no options.
It loads the input file into a single block and allocates 1.25 times the
block size in memory for either compression or decompression (like BBB).
Thus, it is able to transform enwik9 in a single block.
Compression Compressed size Decompressor Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note
------- ------- ---------- ----------- ----------- ----------- ----- ----- ---- --- ----
bwmonstr 0.00 20,401,888 161,249,951 27,772 x 161,277,723 15638 13028 1224 BWT 26
bwmonstr 0.01 20,379,365 161,026,258 32,163 x 161,058,420 15695 14135 1224 BWT 26
bwmonstr 0.02 20,307,295 160,468,597 69,401 x 160,537,998 331801 156147 590 BWT 30
reorder2|bwmonstr 0.02 20,229,555 590 BWT 30
drt|bwmonstr 0.02 19,750,461 450 BWT 30
.1640 bbb
bbb ver. 1 is a free, open source (GPL) command line file compressor by
Matt Mahoney, Aug. 31, 2006. It uses a memory-efficient BWT allowing blocks
up to 80% of available memory. The transformed data is compressed with an
order-0 PAQ-like model: the previous bits of the current byte are mapped
first to a bit history, then through a 6-level probability-correcting
adaptive chain before bitwise arithmetic coding.
g++ -Wall -O2 -Os -march=pentiumpro -fomit-frame-pointer -s -o bbb.exe
upx bbb.exe
bbb cfm100 foo foo.bbb (c = compress, f = fast, m100 = 100 MB blocks)
bbb df foo.bbb foo.out (d = decompress, f = fast)
If the file is 20% to 80% of available memory, use one block in slow mode. If foo
is 500 MB:
bbb cm500 foo foo.bbb
bbb d foo.bbb foo.out
If the file is over 80% of memory, use 80% of memory as the block size in slow mode.
If foo is 1 GB:
bbb cm640 foo foo.bbb
bbb d foo.bbb foo.out
The model requires about an additional 6 MB that should be subtracted
from available memory.
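The block size rules above can be sketched as a small helper. This is only an illustration: the function name `bbb_compress_args` is hypothetical, sizes are taken in whole MB, and the choice of fast mode for files under 20% of memory is an assumption based on the `cfm100` example rather than a stated rule.

```python
MODEL_OVERHEAD_MB = 6  # "about an additional 6 MB" for the model


def bbb_compress_args(file_mb, available_mb):
    """Suggest a bbb option string (e.g. 'cm500') for a file of file_mb MB.

    Hypothetical helper; thresholds follow the 20% / 80% rules in the text.
    """
    usable = available_mb - MODEL_OVERHEAD_MB
    if usable <= 0:
        raise ValueError("not enough memory for the model")
    if file_mb * 5 > usable * 4:
        # File is over 80% of memory: slow mode, block size = 80% of memory.
        return "cm%d" % (usable * 4 // 5)
    if file_mb * 5 >= usable:
        # File is 20% to 80% of memory: slow mode, whole file in one block.
        return "cm%d" % file_mb
    # Assumption: smaller files fit in one block in fast mode.
    return "cfm%d" % file_mb


if __name__ == "__main__":
    # With about 800 MB usable, this reproduces the two examples above.
    print(bbb_compress_args(500, 806))
    print(bbb_compress_args(1000, 806))
```

With 806 MB available (800 MB after subtracting the model), the helper returns cm500 for a 500 MB file and cm640 for a 1 GB file, matching the commands shown above.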
Block   enwik8       enwik9        Gain   Comp ns/b
-----   ----------   -----------   ----   ---------
10^1    66,414,034   646,449,572          4359
10^2    56,241,619   542,912,447   .191   2169
10^3    45,500,201   435,597,745   .246   1907
10^4    37,006,646   343,663,203   .267   1802
10^5    30,946,413   275,172,983   .249   1838
10^6    26,661,555   233,555,297   .178   2095
10^7    23,460,457   204,355,672   .142   2499
10^8    20,847,290   182,162,626   .122   3106
10^9    20,847,290   164,032,650   .110   4524
.1651 paq9a
paq9a is a free, open source, command line archiver by Matt Mahoney, Dec. 31, 2007.
It is a context mixing compressor with an LZP preprocessor to improve speed
for highly redundant files. Matches to a context length of 12 or more are
coded as 1 bit, and literals as 9 bits. Context mixing differs from paq8 in
that it uses a chain of 2-input mixers rather than one mixer with many inputs.
It mixes sparse order-1 contexts with gaps of 3, 2, 1, 0, then orders 2 through 6,
then text word orders 0 and 1. Option -9 selects maximum memory.
Compression Compressed size Decompressor Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- ---
paq9a -9 19,974,112 165,193,368 13,749 s 165,207,117 3997 4021 1585 CM
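The chain of 2-input mixers described for paq9a can be illustrated with a minimal sketch. It is not paq9a's code: the class names, the learning rate, the initial weights, and the floating point arithmetic are all assumptions. Each mixer combines the running prediction with one more model's prediction in the logistic (stretch) domain, and every mixer is trained on the coded bit.

```python
import math


def stretch(p):
    """ln(p / (1 - p)), with p clamped away from 0 and 1."""
    p = min(max(p, 1e-6), 1.0 - 1e-6)
    return math.log(p / (1.0 - p))


def squash(x):
    """Inverse of stretch: 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


class Mixer2:
    """One 2-input mixer with weights trained by gradient descent."""

    def __init__(self, lr=0.02):        # learning rate is an assumption
        self.w = [0.5, 0.5]
        self.lr = lr
        self.x = [0.0, 0.0]
        self.p = 0.5

    def mix(self, p_a, p_b):
        self.x = [stretch(p_a), stretch(p_b)]
        self.p = squash(self.w[0] * self.x[0] + self.w[1] * self.x[1])
        return self.p

    def update(self, bit):
        err = bit - self.p              # prediction error for this mixer
        for i in range(2):
            self.w[i] += self.lr * err * self.x[i]


class MixerChain:
    """Feeds each mixer's output, plus the next model, into the next mixer."""

    def __init__(self, n_models):
        self.mixers = [Mixer2() for _ in range(n_models - 1)]

    def predict(self, probs):
        p = probs[0]
        for mixer, q in zip(self.mixers, probs[1:]):
            p = mixer.mix(p, q)
        return p

    def update(self, bit):
        for mixer in self.mixers:
            mixer.update(bit)


if __name__ == "__main__":
    chain = MixerChain(3)
    for _ in range(500):
        chain.predict([0.9, 0.5, 0.5])  # one useful model, two uninformative
        chain.update(1)
    print(chain.predict([0.9, 0.5, 0.5]))
```

A chain of n-1 two-input mixers does roughly the same work per bit as one n-input mixer, but each stage only has to learn how much to trust one new model against the combined prediction so far.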
.1662 uda
.1664 nanozipltcb
Compression Compressed size Decompressor Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg
------- ---------------- ---------- ----------- ----------- ----------- ----- ----- --- ---
nanozip 0.01 -cO -m1670m -txt 20,306,489 167,509,921 266,797 x 167,776,718 403 284 1325 BWT
nanozipltcb 20,494,670 166,251,135 239,124 x 166,490,259 348 185 1729 BWT
.1686 bcm
Compression Compressed size Decompressor Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- --- ----
bcm 0.03 22,007,655 192,194,478 67,988 x 192,262,466 517 437 164 BWT 26
bcm 0.04 21,450,604 185,368,446 69,553 x 185,455,999 578 486 329 BWT 26
bcm 0.05 -b327680 20,770,671 172,180,796 69,040 x 172,249,836 684 535 1642 BWT 26
-b406991 171,857,720 69,040 x 171,926,760 2030 BWT 27
bcm 0.07 -b327680 20,770,673 172,180,037 60,990 x 172,241,027 818 578 1642 BWT 26
-b488282 169,396,680 60,990 x 169,457,670 472 341 2440 BWT 28
bcm 0.08 e370 20,744,613 171,891,509 61,666 x 171,953,175 948 709 1900 BWT 26
e477 20,744,613 169,179,098 61,666 x 169,232,764 545 418 2385 BWT 28
reorder_v2|bcm 0.08 e477 20,677,205 168,694,909 80,149 x 168,775,058 548 422 2385 BWT 28
reorder_V2|bcm 0.08 e477 xlt 20,665,536 168,598,121 80,661 x 168,678,782 552 420 2385 BWT 28
.1727 cmm4
Compression Compressed size Decompressor Total size Time (ns/byte)
Program Opt enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg
------- --- ---------- ----------- ----------- ----------- ----- ----- --- ---
cmm1 23,495,627 207,266,867 18,785 x 207,285,652 1165 1198 50 CM
cmm2 23,477,008 208,268,161 17,901 x 208,286,062 1756 1849 32 CM
cmm2 080113 22,303,128 191,477,052 18,263 x 191,495,315 2180 2127 329 CM
cmm3 080207 21,212,766 179,633,451 18,700 x 179,652,151 2328 ~2609 395 CM
cmm4 v0.0 21,459,665 186,395,591 18,042 x 186,413,633 1807 1849 116 CM
cmm4 v0.1e 96 20,569,034 172,669,955 31,314 x 172,701,269 2052 2056 1321 CM
.1741 ccm
ccm 1.03a is one of 3 versions of a free file compressor by Christian Martelock, Feb. 11, 2007.
It uses context mixing. The 3 versions are ccm (fastest, uses 17 MB memory),
ccm_high (slower but better compression), and ccm_extra (best compression, uses
100 MB memory). The programs take no options.
Compression Compressed size Decompressor Total size Time (ns/byte)
Program enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg
------- ---------- ----------- ----------- ----------- ----- ----- --- ---
ccm 1.0.3a 27,667,346 240,296,736 7,217 x 240,303,953 676 679 17 CM
ccm_high 1.0.3a 25,412,726 221,177,776 7,229 x 221,185,005 1119 1171 17 CM
ccm_extra 1.0.3a 24,027,805 207,273,926 7,230 x 207,281,156 1341 1353 100 CM
ccm 1.1.1a 22,824,629 197,271,467 9,019 x 197,280,486 1247 1252 82 CM
ccm 1.1.2a 22,675,768 195,965,427 8,502 x 195,973,929 1161 1183 83 CM
ccm 1.20a 21,350,295 182,784,655 13,346 x 182,798,001 1794 1801 210 CM
ccmx 1.20d 21,310,303 182,379,461 13,468 x 182,392,929 1383 1485 210 CM
ccmx 7 1.21 20,819,656 174,161,536 21,139 x 174,182,675 1521 1493 1324 CM
ccmx 7 1.30 20,857,925 174,142,092 15,014 x 174,157,106 1313 1338 1332 CM
.1744 bit
Compressor Opt enwik8 enwik9 Prog Total Comp Decomp Mem Alg Note
--------- --- --------- ----------- ------- ----------- ---- ---- --- ---- ----
bit 0.1 31,186,930 271,705,328 35,400 x 271,740,728 535 83 35 ROLZ
bit 0.2b -m lwcm -mem 9 21,971,587 189,881,180 63,665 x 189,944,845 2708 2747 1052 CM
bit 0.7 -p=5 20,823,204 174,425,039 62,493 x 174,487,532 2050 2100 663 CM 26
.1745 mcomp
LibMComp Demo Compressor (v2.00).
Copyright (c) 2008 M Software Ltd.
mcomp [options] pofile(s)
Options:
-m[..] Compression method:
b - BZIP2.
c - Experimental DMC codec.
d - Optimised deflate (df - fast, dx - max)
d64 - Optimised deflate64 (d64f - fast, d64x - max)
lz - Optimised LZ (lzf - fast, lzx - max)
f - Optimised ROLZ (ff - fast, fx - max)
f3 - Optimised ROLZ3 (f3f - fast, f3x - max)
p - PPMd var.J.
sl - Bitstream (LSB first).
sm - Bitstream (MSB first).
w - Experimental BWT codec.
-MNN[k,m] Model size (in kb (default) or Mb, default 64M).
-oNN Order (for Bitstream and PPMd).
-np Display no progress information.
Compressor Opt enwik8 enwik9 Prog Total Comp Decomp Mem Alg Note
--------- --- --------- ----------- ------- ----------- ---- ---- --- ---- ----
mcomp_x32 -mb 29,997,076 2070 970 4 BWT -M has no effect
-mc 23,546,185 1350 1410 50 DMC
-mc -M512m 22,561,089 1520 322 DMC max memory
-mdf fails
-md 35,436,114 2140 1421 4 LZ77 fails
-mdx 35,383,881 2240 1420 4 LZ77 fails
-md64f fails
-md64x 32,983,178 28930 1310 4 LZ77 fails
-mlz 24,648,445 3090 50 595 LZ77
-mf 24,331,132 2240 78 149 ROLZ
-mf -M1800m 23,187,091 3320 77 414 ROLZ
-mfx -M1800m 23,182,541 3410 81 414 ROLZ
-mf3x -M1800m 23,098,116 3850 112 415 ROLZ
-mp -M1800m -o10 21,039,213 177,948,781 172,531 x 178,121,312 4580 12180 1847 PPM
-mp -M1800m -o12 20,917,657 179,193,238 172,531 x 179,365,769 5180 1847 PPM
-mp -M1800m -o16 20,868,127 181,150,814 172,531 x 181,323,345 5750 1847 PPM
-msl -M1800m -o12 54,428,147 6510 6480 1 CM? -M has no effect
-msm 59,731,673 5880 5810 1 CM? -M has no effect
-mw 21,805,857 188,095,082 172,531 x 188,267,613 356 232 660 BWT 2 cores
-mw -M180m 21,103,670 179,838,392 172,531 x 180,010,923 329 284 1850 BWT 2 cores
-mw -M320m 21,103,670 174,388,351 172,531 x 174,560,882 473 399 1643 BWT 1 core
.1749 epmopt | epm
epmopt -m800 -n20 --fixedorder:12 enwik6 .
epm c01286014321245957352513 enwik9 enwik9.epm -m800
epm d01286014321245957352513 enwik9.epm enwik9.tmp -m800
The optimization data was enwik6, the first 10^6 bytes
of the input file. epmopt compressed this about 100 times in
368 seconds with different options, making 35 passes through
the list of 20 undocumented parameters, adjusting each one up
or down one at a time. The fixed parameters
were -m800 (800 MB memory limit) and PPM order 12 (--fixedorder:12,
which is also the first 3 digits of the parameter string). Allowing
epmopt to set the PPM order on a smaller training file will cause it to
choose too large a value, hurting compression. I only tested
orders 10, 12, and 20 on enwik8, and 12 gave the best compression.
The -n20 option tells epmopt to tune all 20 parameters. The parameter
string is written to the file enc.ini. The -m800 option need
not be the same for epmopt and epm but must be the same
for epm during compression and decompression.
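The tuning procedure described above (repeated passes over a parameter list, nudging each value up or down one step and keeping changes that shrink the output) can be sketched as a simple coordinate search. This is not epmopt's code. The function `tune` and its arguments are hypothetical, and zlib's level, window bits, and memory level stand in for epm's undocumented parameters.

```python
import zlib

# Stand-in training data (an assumption; epmopt used enwik6).
SAMPLE = b"the quick brown fox jumps over the lazy dog. " * 200


def zlib_cost(params, data=SAMPLE):
    """Compressed size of data for params = [level, wbits, mem_level]."""
    level, wbits, mem_level = params
    c = zlib.compressobj(level, zlib.DEFLATED, wbits, mem_level)
    return len(c.compress(data) + c.flush())


def tune(cost, params, lo, hi, passes=35):
    """Adjust one parameter at a time by +1 or -1, keeping improvements."""
    best = list(params)
    best_cost = cost(best)
    for _ in range(passes):
        improved = False
        for i in range(len(best)):
            for step in (1, -1):
                trial = list(best)
                trial[i] += step
                if not lo[i] <= trial[i] <= hi[i]:
                    continue            # stay inside the allowed range
                c = cost(trial)
                if c < best_cost:
                    best, best_cost = trial, c
                    improved = True
                    break               # move on to the next parameter
        if not improved:
            break                       # a full pass changed nothing
    return best, best_cost


if __name__ == "__main__":
    start = [1, 9, 1]
    best, size = tune(zlib_cost, start, lo=[1, 9, 1], hi=[9, 15, 9])
    print(start, zlib_cost(start), "->", best, size)
```

As the text notes for the PPM order, a search like this only finds settings that suit the training file, so parameters whose best value depends on input size are better fixed by hand.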
.1749 WinUDA
.1755 dark
Compression Compressed size Decompressor Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- --- ----
dark 0.32b July 9, 2006 -b128m 21,414,479 185,844,554 31,076 x 185,875,590 481 407 790 BWT
dark 0.40b Aug. 14, 2006 -b128mf0 21,243,259 184,271,115 34,688 x 184,305,803 471 316 790 BWT
dark 0.46 Aug. 23, 2006 -b160mf0 21,231,325 181,904,374 40,780 x 181,945,154 488 404 813 BWT
-b333mf0 21,231,325 175,955,412 40,780 x 175,996,192 432 425 1692 BWT
opendark A Nov. 14, 2006 -b333m 21,432,727 (fails) 10,089 s 450 390 1692 BWT
-b127m 21,432,727 185,985,101 10,089 s 185,995,190 389 331 652 BWT 26
dark 0.51 Jan. 2, 2007 -b333mf 21,169,819 175,471,417 34,797 x 175,506,214 533 453 1692 BWT
.1760 FreeArc
Compression Compressed size Decompressor Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- ---
FreeArc 0.36 -m9 -lc1600000000 21,153,231 184,498,111 372,457 s 184,870,568 665 517 1600 PPM
FreeArc 0.40 pre-4 -mppmd:1012m:o13:r1 20,931,605 175,254,732 748,202 x 176,002,934 1175 1216 1046 PPM
.1766 hook
hook v0.2 is a free, open source (GPL) command line file compressor by
Nania Francesco Antonio, Jan. 8, 2007. It uses DMC: a state machine
in which each state represents a bitwise context. Each state has 2 outgoing
transitions corresponding to next bits 0 and 1, and a count n0 or n1 associated
with each transition. Bit y (0 or 1) is compressed by arithmetic coding with probability
ny/(n0+n1) (where ny is n0 or n1 according to y), and then ny is incremented.
Before cloning:
  1111 --(y=0, count ny)--> 110
  110 --(count n0)--> 1100
  110 --(count n1)--> 1101
After cloning 110 to 11110:
  1111 --(y=0, count ny)--> 11110
  11110 --(count n0*w)--> 1100
  11110 --(count n1*w)--> 1101
  110 --(count n0*(1-w))--> 1100
  110 --(count n1*(1-w))--> 1101
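The prediction, update, and cloning steps can be sketched as follows. This is a generic DMC illustration, not hook's source. The `State` class and function names are hypothetical, and the weight w is assumed to be ny/(n0+n1), the incoming transition count divided by the target state's total count, which the diagram does not define.

```python
class State:
    """A DMC state: counts n[0], n[1] and next-state links for bits 0 and 1."""

    def __init__(self, n0=0.2, n1=0.2):  # small initial counts are an assumption
        self.n = [n0, n1]
        self.next = [None, None]


def predict(state, y):
    """Probability that the next bit is y: ny / (n0 + n1)."""
    return state.n[y] / (state.n[0] + state.n[1])


def update(state, y):
    """Increment the count of the transition taken and follow it."""
    state.n[y] += 1
    return state.next[y]


def clone(src, y):
    """Clone the target of src's y transition and redirect src to the copy.

    The target's counts are split between the copy (fraction w) and the
    original (fraction 1 - w). Assumption: w = ny / (n0 + n1).
    """
    old = src.next[y]
    w = min(1.0, src.n[y] / (old.n[0] + old.n[1]))
    new = State()
    for b in (0, 1):
        new.n[b] = old.n[b] * w          # n0*w, n1*w
        old.n[b] = old.n[b] * (1.0 - w)  # n0*(1-w), n1*(1-w)
        new.next[b] = old.next[b]        # copy keeps the same successors
    src.next[y] = new
    return new


if __name__ == "__main__":
    s1111, s110, s1100, s1101 = State(3, 1), State(8, 4), State(), State()
    s1111.next[0] = s110
    s110.next = [s1100, s1101]
    s11110 = clone(s1111, 0)             # w = 3 / (8 + 4) = 0.25
    print(s11110.n, s110.n)              # [2.0, 1.0] [6.0, 3.0]
```

Splitting the counts proportionally leaves the predictions of both states unchanged at the moment of cloning. They diverge only as later bits update the two states separately.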
hook08 params enwik9
------------ -----------
c 1700 1 1 6 183,175,857
c 1700 2 1 6 181,578,888
c 1700 3 1 6 181,220,553
c 1700 4 1 6 181,268,867
c 1700 5 1 6 181,197,310
c 1700 6 1 6 181,567,697
c 1700 7 1 6 181,813,763
c 1700 8 1 6 182,360,391
Compression Compressed size Decompressor Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- ---
hook v0.2 c 7 2 6 23,628,061 208,211,084 2,556 s 208,213,640 772 779 1052 DMC
hook v0.3 c 9 2 6 23,548,017 202,024,740 3,567 s 202,028,307 849 864 1764 DMC
hook v0.3a c 9 2 6 23,499,700 201,934,976 3,555 s 201,938,531 862 832 1764 DMC
hook v0.4 c 9 2 6 23,349,695 199,829,234 4,112 s 199,833,346 934 959 1764 DMC
hook v0.5b c 1600000000 2 64 1 6 22,806,402 193,227,085 5,113 s 193,232,198 1084 1029 1764 LZP+DMC
hook v0.6 c 1600 4 1 6 22,472,884 191,733,561 5,112 s 191,738,673 1146 1034 1600 LZP+DMC
hook v0.6b c 1600 4 1 6 22,535,069 189,932,778 5,174 s 189,937,952 1040 1600 LZP+DMC
c 1600 6 1 6 22,776,927 188,384,238 5,174 s 188,389,412 1090 1026 1600
hook v0.6c c 1600 6 1 6 22,561,621 188,081,694 5,878 s 188,087,572 1131 1092 1600 LZP+DMC
hook v0.7 c 1000 6 1 6 22,410,669 191,516,313 6,195 s 191,522,508 1360 1353 1375 LZP+DMC
hook v0.7b c 1700 6 1 6 22,404,817 184,765,030 6,195 s 184,771,225 1516 1655 1794 LZP+DMC
hook v0.8 c 1700 5 1 6 22,290,033 181,197,310 6,686 s 181,203,996 1110 1118 1700 LZP+DMC
hook v0.8b c 1700 5 1 6 22,399,354 180,335,788 6,944 s 180,342,732 988 1033 1700 LZP+DMC
hook v0.8c c 1700 5 1 6 22,399,355 180,335,789 7,071 s 180,342,860 1043 1005 1700 LZP+DMC
hook v0.8d c 1700 5 1 6 22,399,027 180,319,203 7,037 s 180,326,240 928 915 1700 LZP+DMC
hook v0.8e c 1700 3 1 6 22,039,935 178,140,788 7,263 s 178,148,051 952 1009 1700 LZP+DMC
hook v0.9 c 1800 2 1 6 21,969,342 178,932,435 10,069 x 178,942,435 869 1860 LZP+DMC
c 1800 3 1 6 22,077,883 178,599,478 10,069 x 178,609,547 833 916 1860 LZP+DMC
freehook 0.2 c 1700 3 1 6 22,039,914 178,141,036 7,386 s 178,148,422 813 855 1860 LZP+DMC
hook v0.9b c 1700 3 1 6 22,496,910 180,582,601 9,278 x 180,591,879 810 810 1721 LZP+DMC
freehook 0.3 c 1600 3 1 6 22,039,914 178,619,149 7,352 s 178,626,501 789 818 1713 LZP+DMC
hook v0.9c c 1700 3 1 6 22,496,910 180,582,601 8,506 x 180,591,107 774 791 1721 LZP+DMC
hook v1.0 c 1700 22,122,484 177,843,658 11,163 x 177,854,821 865 879 1739 LZP+DMC
hook v1.1 c 1700 22,122,484 177,843,658 25,854 x 177,869,512 877 872 1739 LZP+DMC
hook v1.3 c 1700 22,030,108 178,216,980 13,870 x 178,230,850 825 835 1736 LZP+DMC
hook v1.4 c 1700 21,990,502 176,648,663 37,004 x 176,685,667 741 695 1777 LZP+DMC
.1789 7zip
Compression Compressed size Decompressor Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Alg Notes
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- -----
7zip 4.42 -m0=ppmd:mem=768:o=10 -sfx7xCon.sfx 21,375,060 185,043,783 0 xd 185,043,783 505 ~500 PPM
7zip 4.42 -m0=ppmd:mem=293m:o=7 21,791,628 647 655 PPM 6
7zip 4.42 -mx=9 -sfx7zCon.sfx 24,996,113 213,490,979 0 xd 213,490,979 2286 63 LZMA
7zip 4.42 -tbzip2 -mpass=2 29,003,844 1974 176 BWT 6
7zip 4.42 -tzip -mm=deflate64 -mfb=153 -mpass=8 33,727,442 2803 28 LZ77 6
7zip 4.42 -tzip -mm=deflate -mfb=171 -mpass=8 35,056,389 2672 27 LZ77 6
7zip 4.42 -tzip -mm=deflate -mfb=258 -mpass=8 35,057,040 2664 29 LZ77 6
7zip 4.42 Zip/Ultra (in GUI) 35,057,347 4307 LZ77 1
7zip 4.46a -m0=ppmd:mem=1630m:o=10 -sfx7xCon.sfx 21,197,559 178,965,454 0 xd 178,965,454 503 546 PPM
7zip 4.46a was announced May 21, 2007.
(The improved compression is due to testing with more memory).
.1789 M99
M99.exe e|d -switches blocksize input output
switches are:
-r = post BWT run length encoding
-a = arithmetic coding instead of M99 style bit packing
-f = fast mode
-m = max compression mode (implies -a).
The block size can be given in bytes (e.g. 10000) or with a k or m suffix (e.g. 100k or 100m).
Compression requires at most 6 times the block size in memory, although in most cases only
a little over 5 times the block size is used. A block size of 239m divides enwik9 into 4 approximately
equal parts and requires about 1500 MB of memory.
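The block count and memory figures above can be checked with a short calculation. The helper name `m99_plan` is hypothetical, and the "m" suffix is assumed to mean 2^20 bytes. With 10^6 bytes per "m", four 239m blocks would not quite cover the 10^9 bytes of enwik9.

```python
ENWIK9_BYTES = 10**9
MIB = 2**20  # assumption: "m" in the block size means 2^20 bytes


def m99_plan(file_bytes, block_mib):
    """Return (number of blocks, typical memory MiB, maximum memory MiB)."""
    block_bytes = block_mib * MIB
    blocks = -(-file_bytes // block_bytes)   # ceiling division
    typical = 5 * block_mib                  # "a little over 5 times"
    maximum = 6 * block_mib                  # "6 times the blocksize maximum"
    return blocks, typical, maximum


if __name__ == "__main__":
    print(m99_plan(ENWIK9_BYTES, 239))       # (4, 1195, 1434)
```

The 1434 MiB upper bound is consistent with the "about 1500 MB" quoted for the 239m setting.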
Compression Compressed size Decompressor Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- ---
M99 e -m 239m 21,431,211 180,477,144 67,697 x 180,544,841 674 496 1500 BWT
M99 v2.1 e -m 239m 21,251,170 178,910,174 68,052 x 178,978,226 713 535 1500 BWT
M99 v2.2.1 e -m 239m 21,251,171 178,910,175 72,245 x 178,982,420 704 520 1500 BWT
.1803 pimple2
Compression Compressed size Decompressor Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- --- ----
pimple 1.43 beta 512MB, order 8, match 32 20,992,830 181,998,817 353,472 x 182,352,259 9638 10112 512 CM 3
pimple2 (none) 20,871,457 180,251,530 78,642 x 180,330,172 18474 17992 128 CM
.1807 ash
ash04a options enwik9 Comp (ns/byte)
---------- ----------- ----
/m700 /o8 (order 7) 180,830,523 5883
/m700 /o10 (order 9) 180,735,542 6011
Note: the actual memory usage (commit charge) for enwik9 /m700 /o8
was 1910 MB at the end of compression, minus 257 MB for
other programs, according to Windows task manager.
This is generally not a problem if your
swap file is large enough. It appears to be a slow memory
leak (recovered when program exits) and does not cause thrashing.
.1823 ocamyd
ocamyd 1.65.final is a free, open source command line file compressor by
Frank Schwellinger, May 25, 2006. It uses DMC. The -s0 option selects the
slowest (maximum) compression. The -m8 option selects 800 MB memory
(maximum is -m9 = 900 MB).
Compression Compressed size Decompressor Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- --- ----
ocamyd 1.65.final -s0 -m8 21,456,536 185,727,437 20,618 x 185,748,055 50782 50935 800 DMC
ocamyd LTCB 1.0 -s0 -m3 21,285,121 182,359,986 21,030 x 182,381,016 108960 ~110000 300 DMC 6
ocamyd 1.66.final -s0 -m3 -f 21,123,280 182,410,035 20,636 x 182,430,561 59130 59637 300 DMC 6
Options enwik8 Comp Decomp Notes
------- ---------- ----- ----- -----
-s0 -m8 21,456,536 42030 42010
-s0 -m4 22,073,527 70482 70538 6 (400 MB) (~101015 ~92921 global time)
-s1 -m4 23,944,647 ~33535 6
-s2 -m4 26,345,297 ~1940 6
-s3 -m4 28,060,900 ~1826 6
-s0 -m3 22,296,826 ~70960 6 (300 MB)
-s1 -m3 24,114,574 ~33818 6
-s2 -m3 26,911,154 ~1603 6
-s3 -m3 28,278,662 ~1514 6
-s0 -m2 22,688,950 ~70172 6 (200 MB)
-s1 -m2 24,511,065 ~33771 6
-s2 -m2 27,614,083 ~1562 6
-s3 -m2 28,928,850 ~1448 6
-s0 -m1 23,487,047 ~68522 6 (100 MB)
-s1 -m1 25,280,406 ~33277 6
-s2 -m1 29,045,902 ~1509 6
-s3 -m1 30,080,719 ~1408 6
-s0 -m0 24,210,216 ~66463 6 (64 MB)
-s1 -m0 25,882,226 ~33121 6
-s2 -m0 30,591,255 ~1481 6
-s3 -m0 31,276,535 ~1377 6
.1824 bee
.1829 uhbc
uhbc 1.0 is
an experimental, closed source command line file compressor
by Uwe Herklotz, June 30, 2003. It uses BWT. The -b100m option
selects 100 MB block size, which requires 800 MB for compression
and 500 MB for decompression. -m3 selects maximum compression
for the entropy coding stage, which consists of run length coding
(RLE) + DWFC (double weighted frequency counting) + entropy coding.
WFC is described in
Deorowicz, S.,
Improvements to Burrows–Wheeler compression algorithm,
Software–Practice and Experience, 2000; 30(13):1465–1483.
Options                                    enwik8 size  Comp  Decomp (ns/byte)
-----------------------------------------  -----------  ----  ------
-m3 -b100m (one 100 MB block)              20,930,838   1145     858
-m3 (default block size is 5 MB)           24,296,345    914     733
-m2 (RLE + WFC + entropy coding, default)  24,411,843    806     644
-m2 -cp (prefix sort, default is suffix)   24,589,110    813     578
-m1 (RLE + MTF (move to front) + entropy)  25,021,683    680     547
-m0 (RLE + direct entropy coding)          25,341,274    603     500
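To make the stages named in the table concrete, here is a minimal sketch of a BWT followed by MTF (move to front), roughly the shape of the -m1 pipeline without its RLE and entropy coding stages. It is not uhbc's code: the sort is a naive rotation sort, the zero byte is an assumed end-of-block sentinel, and WFC (used by -m2 and -m3) is not shown.

```python
def bwt(data: bytes) -> bytes:
    """Naive BWT: sort all rotations of data + sentinel, keep the last column."""
    if 0 in data:
        raise ValueError("input must not contain the sentinel byte 0")
    s = data + b"\x00"                       # unique sentinel, smaller than any input byte
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return bytes(r[-1] for r in rotations)

def mtf(data: bytes) -> list:
    """Replace each byte by its position in a recency list, then move it to the front."""
    table = list(range(256))
    ranks = []
    for b in data:
        i = table.index(b)
        ranks.append(i)
        table.insert(0, table.pop(i))
    return ranks

def inverse_mtf(ranks) -> bytes:
    """Undo mtf() by replaying the same recency list updates."""
    table = list(range(256))
    out = bytearray()
    for i in ranks:
        b = table.pop(i)
        out.append(b)
        table.insert(0, b)
    return bytes(out)

transformed = bwt(b"banana")
print(transformed)        # equal symbols tend to end up adjacent
print(mtf(transformed))   # a repeated symbol becomes rank 0
```

The BWT groups symbols that share a following context, so the MTF output is dominated by small ranks, which the entropy coder can then code cheaply.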
.1839 ppmd
See ppmonstr (above).
.1849 tc
Compressed size Decompressor Total size Time (ns/byte)
Program enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note
------- ---------- ----------- ----------- ----------- ----- ----- --- --- ----
tc 5.0 dev 1 (May 26 2006) 33,774,535 295,836,604 23,681 x 295,860,285 236 204 LZP 3
tc 5.0 dev 2 (June 10 2006) 32,417,139 283,039,249 22,659 x 283,061,908 270 244 LZP 3
tc 5.0 dev 4 (June 21 2006) 32,417,139 283,039,249 22,496 x 283,061,745 224 206 LZP 3
tc 5.0 dev 6 (July 6 2006) 29,544,971 257,416,397 28,528 x 257,444,925 279 279 PPM 3
tc 5.0 dev 7 (July 9 2006) 28,111,955 250,077,573 30,058 x 250,107,631 285 325 20 PPM 3
tc 5.0 dev 9 (July 18 2006) 27,801,253 246,923,158 30,106 x 246,953,264 363 385 24 PPM 3
tc 5.0 dev 11 (July 24 2006) 27,293,396 242,199,762 31,074 x 242,230,836 446 393 56 PPM 3
tc 5.1 dev 1 (Oct. 1 2006) 31,708,176 280,007,538 26,578 x 280,034,116 289 154 25 LZ
tc 5.1 dev 2 (Oct. 2 2006) 31,155,963 274,831,393 24,620 x 274,856,013 344 147 25 LZ
tc 5.1 dev 5 (Oct. 13 2006) 28,567,681 247,853,181 26,659 x 247,879,840 951 439 148 CM
tc 5.1 dev 7 (Dec. 18 2006) 27,934,960 241,898,216 40,104 x 241,938,320 1864 639 148 CM
tc 5.1 dev 7x (Jan. 13 2007) 27,888,899 241,088,655 41,265 x 241,129,920 1974 638 609 CM
tc 5.2 dev 2 (Feb. 7 2007) 21,481,399 184,939,711 41,112 x 184,980,823 3637 3655 230 CM
.1862 ppmvc
.1869 chile
Compression Compressed size Decompressor Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- ---
chile 0.3d-1 -b40000 23,408,335 203,451,387 11,298 s 203,462,685 4957 435 785 BWT
chile 0.4 -b=244141 22,218,917 186,979,614 11,530 s 186,991,144 2513 512 1426 BWT
.1910 CTFx
.1911 rings
Compression Compressed size Decompressor Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem
------- ------- ---------- ----------- ----------- ----------- ----- ----- ---
rings 0.1 35,693,969 314,161,660 11,271 x 314,172,931 187 179 16 LZP
rings 0.2 35,693,969 314,161,660 25,832 x 314,187,492 192 167 16 LZP
rings 0.3 35,151,555 309,179,126 32,132 x 309,211,258 188 154 16 LZP
rings 1.0 26,384,013 235,897,616 25,585 x 235,923,201 221 321 50 CM
rings 1.1 26,793,247 238,353,988 27,513 x 238,381,501 151 255 50 CM
rings 1.2 25,873,235 229,695,548 30,484 x 229,726,032 120 175 50 CM
rings 1.3 25,873,235 229,695,548 43,329 x 229,738,877 104 163 54 CM
rings 1.4c 9 24,591,826 217,427,384 39,149 x 217,466,533 103 287 789 BWT
rings 1.5 9 21,848,093 191,067,972 44,565 x 191,112,537 172 189 426 BWT
.1912 M03exp
Block size  enwik8      Comp  Decomp (ns/byte approx)
----------  ----------  ----  ------
8 MB        23,461,984  3860    1840
32 MB       21,948,192  4800    2100
.1930 Stuffit
Compression Compressed size Decompressor Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Notes
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- --- -----
Stuffit 9.0.0.21 Method 4 (best text) 24,310,583 210,801,103 1,015,808 x 211,816,911 542 503 36 12
Method 6 (auto-pick best) 24,419,299 212,392,465 1,015,808 x 213,408,273 2149 68 12
Stuffit 12.0.0.17 -m=1 -l=16 -x=30 25,926,107 2540 420 298 LZ77
-m=2 -l=16 -x=27 24,874,987 3080 90 881 LZ77
-m=8 -l=16 -x=30 25,574,676 560 230 229 BWT
-m=4 -l=16 -x=28 23,482,855 730 694 274 PPM
-m=4 -l=16 -x=29 22,744,155 770 720 537 PPM
-m=4 -l=16 -x=30 22,105,654 190,372,707 2,658,122 xd 193,030,829 628 658 1062 PPM
Stuffit 13.0.0.19 -m=4 -l=16 -x=30 22,105,658 190,372,711 21,611,401 x 211,984,112 567 604 1060 PPM 26
.1936 ppmx
ppmx 0.01
is a free, experimental, closed source file compressor by Ilia Muraviev,
released Nov. 25, 2008. It uses PPM with no filters. It takes no options.
Compressed size Decompressor Total size Time (ns/byte)
Program enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note
------- ---------- ----------- ----------- ----------- ----- ----- --- --- ----
ppmx 0.01 24,369,312 213,206,926 51,454 x 213,258,380 557 515 550 PPM 26
ppmx 0.02 22,580,291 194,298,469 53,511 x 194,351,980 874 888 609 PPM 26
ppmxcore2duo 0.02 22,580,291 55,824 x 871 949 609 PPM 26
ppmx 0.03 22,572,808 193,643,464 54,964 x 193,698,428 777 784 609 PPM 26
ppmx 0.04 23,150,510 201,384,355 52,406 x 201,436,761 791 801 280 PPM 26
.1956 enc
enc 0.15 is an experimental,
closed source command line archiver by Serge Osnach, Feb. 14, 2003. It uses PPM and CM (in PaQ mode).
It tries up to 5 different compression
methods (depending on options) and chooses the best one. The methods are ("a" means "add to archive"):
Methods ae and ab with options -o8 -d256 were found to give the best compression on enwik7 (first 10^7
bytes). These methods discard the model when the memory limit is reached, and this was observed to happen
(in task manager), so these options should hold for larger files. However with -d127 (necessary to decompress),
method aq gives the best compression.
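The "try several methods and keep the best" strategy can be sketched as follows. This is only an illustration of the selection step: the three Python standard library codecs stand in for enc's PPM and CM methods, and the function names are hypothetical.

```python
import bz2
import lzma
import zlib

# Stand-in methods: name -> (compress, decompress). These are not enc's methods.
METHODS = {
    "zlib": (lambda d: zlib.compress(d, 9), zlib.decompress),
    "bz2": (lambda d: bz2.compress(d, 9), bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}

def pack_best(data: bytes):
    """Compress with every method and return (method name, smallest output)."""
    candidates = {name: comp(data) for name, (comp, _) in METHODS.items()}
    best = min(candidates, key=lambda name: len(candidates[name]))
    return best, candidates[best]

def unpack(method: str, blob: bytes) -> bytes:
    """Decompress with the method recorded at compression time."""
    return METHODS[method][1](blob)

method, blob = pack_best(b"recognize speech " * 1000)
print(method, len(blob))
```

The cost of this approach is that compression time is the sum over all methods tried, while decompression runs only the method that won.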
.1971 sbc
Compression Compressed size Decompressor Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp
------- ------- ---------- ----------- ----------- ----------- ----- -----
sbc 0.970r2 -ad -m3 -b63 22,470,539 197,066,203 99,094 xd 197,165,297 1733 313
sbc 0.970r2 -ad -m1 -b31 23,288,217 99,094 xd 620 230
sbc 0.970r2 -ad -m1 -b1 27,087,118 99,094 xd 300 180
.1984 WinRAR
WinRAR 3.60 beta 3 is a commercial (free trial)
Windows GUI and command line archiver by Eugene Roshal, May 8, 2006.
It produces rar and zip archives
and decompresses many other formats. It also encrypts and performs other functions.
The best compression mode uses PPM (actually ppmd var. I, an earlier version of ppmd J)
with optimizations for text and other
formats (exe, wav, bmp). The -mc7:128t+ option says to use PPM order 7,
128 MB memory (maximum) and force text preprocessing. The -sfxWinCon.sfx
option says to produce a self extracting console executable
(adding 79,360 bytes).
Compression Compressed size Decompressor Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp
------- -------------------------- ---------- ----------- ----------- ----------- ----- -----
WinRAR 3.60b3 -mc7:128t+ -sfxWinCon.sfx 22,713,569 198,454,545 0 xd 198,454,545 506 415
WinRAR 3.60b3 -mc10:128t+ -sfxWinCon.sfx 23,233,523 0 xd 770
WinRAR 3.60b3 -m5 -sfxWinCon.sfx 24,832,649 0 xd 680 520
WinRAR 3.60b3 -sfxWinCon.sfx 29,828,890 0 xd 780 40
WinRAR 3.60b3 29,749,530 98,888 xd 780 40
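The decompressor size of 0 for the self extracting rows follows from the archive already containing the extractor. A small sketch using the enwik8 figures from the last two table rows checks the stated 79,360 byte stub size and compares both ways of counting. The helper name is made up for illustration.

```python
def total_size(compressed: int, decompressor_zip: int = 0) -> int:
    """Benchmark total: compressed data plus zipped decompressor (0 for SFX)."""
    return compressed + decompressor_zip

enwik8_sfx = 29_828_890    # default options with -sfxWinCon.sfx
enwik8_rar = 29_749_530    # default options, plain archive
unrar_zip = 98_888         # decompressor size listed for the plain archive row

stub = enwik8_sfx - enwik8_rar
saving = total_size(enwik8_rar, unrar_zip) - total_size(enwik8_sfx)
print(stub, saving)
```

With these figures the self extracting archive gives the smaller total, since the SFX stub is smaller than the separately zipped decompressor.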
.1986 quark
quark v0.95r beta is a free,
closed source command line file compressor by Frederic Bautista, Mar. 10, 2006.
It uses LZ. It is characterized by high compression and fast decompression.
The -m1 option selects relative mode compression, which is normally best, but slowest. The
-d25 option selects a dictionary size of 2^25 which is the largest that will
run without thrashing with 1 GB RAM. The -l8 option selects the search depth.
Higher values normally improve compression (up to -l13, default -l4), but -l8 was the highest
practical value for reasonable compression speed (7.5 hours). Also, larger values were
found to hurt compression on enwik5.
Compression time increases approximately exponentially with the -l value.
The compression speed with -l13 is 6,100,000 ns/byte.
.2018 bssc
bssc 0.95a is a free command line file compressor
by Sergeo Sizikov, 2005.
It uses BWT. The -m16383 option selects the maximum block size of 16383 KB (uses 140 MB memory).
.2079 M1
Compression Compressed size Decompressor Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Notes
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- --- -----
M1 0.2a 24,656,008 219,115,069 25,336 s 219,140,405 452 447 33 CM 26
M1 0.3 24,004,989 215,101,056 24,596 s 215,125,652 395 404 33 CM 26
M1 0.3b text2.txt 23,506,215 209,057,165 23,150 s 209,080,315 377 403 33 CM 26
M1 0.3b text.txt 23,558,990 360 390 33 CM 26
M1 0.3b e8-m103b1-mh 23,456,037 207,931,967 23,150 s 207,955,117 383 412 33 CM 26
.2081 uharc
uharc 0.6b is a free (for noncommercial
use) closed source command line archiver by Uwe Herklotz, Oct. 1, 2005.
In maximum compression mode (-mx) it uses PPM. In modes -m1 (fastest) to
-m3 (best) it uses ALZ: LZ77 with arithmetic coding. -mz uses LZP.
-md32768 selects maximum dictionary size (uses 50 MB memory, default is -md4096).
Additional results for enwik8:
Options       enwik8      Comp  Decomp (ns/byte)
------------  ----------  ----  ------
-mx -md32768  23,911,123  1830    1510
-mx           23,952,039  1832    1546
-m3           27,957,245  1840     110
-m2           28,459,084  1726     110
-m1           29,660,279  1242     121
-mz           30,429,795   191     236
.2090 GRZipII
GRZipII
0.2.4 is a free, open source (LGPL)
command line file compressor by Grebnov Ilya, Feb. 12, 2004. It uses BWT.
The -b8m option selects the maximum block size of 8 MB.
.2091 4x4
4x4
0.2a
is a free, open source file compressor by Bulat Ziganshin,
June 2, 2008. It is a wrapper around GRZipII, tornado, and LZMA (7zip),
and a subset of the FreeARC archiver.
Source code is included in the FreeARC distribution. The program
allows arguments to be passed to each compressor, plus 16 preset
options. Only the fastest and slowest preset option for each compressor
was tested. Options 1-7 are tornado, 8-12 are LZMA, and 1t-4t are GRZipII.
Compression Compressed size Decompressor Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg
------- -------------------------- ---------- ----------- ----------- ----------- ----- ----- --- ---
4x4 0.2a 1 (tor:1:4m) 59,711,544 17 13 54 LZ77
7 (tor:7:64m) 32,433,532 197 24 230 LZ77
8 (lzma:fast:128m:ht4:mc8) 32,698,603 292 43 230 LZ77
12 (lzma:128m:ht4:mc128) 27,307,504 4354 43 230 LZ77
1t (grzip:m4) 26,576,294 167 232 128 BWT
4t (grzip:m1:h18) 23,833,244 208,787,642 317,097 x 209,104,739 386 240 269 BWT
.2101 rzm
rzm 0.06c
(mirror)
is a free file compressor by Christian Martelock, Mar. 4, 2008.
It uses order-1 ROLZ as discussed
here.
It takes no options.
Memory usage is advertised as 258 MB for compression and 130 MB for decompression.
Measured values (shown) are 180 MB for compression and 104 MB for decompression.
           Compression  Compressed size          Decompressor  Total size   Time (ns/byte)
Program    Options      enwik8      enwik9       size (zip)    enwik9+prog  Comp  Decomp  Mem  Alg
---------  -----------  ----------  -----------  ------------  -----------  ----  ------  ---  ----
rzm 0.06c               24,429,597  210,719,085  12,903 x      210,731,988  2216      92  180  ROLZ
rzm 0.07h               24,361,070  210,126,103  17,667 x      210,143,770  2336      81  160  ROLZ
.2104 pim
Compression Compressed size Decompressor Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- ---
pim 2.01 PPMd, no exe, no color 24,303,638 210,124,895 340,951 x 210,465,846 ~600 639 92 PPM
pim 2.04b PPMd 24,303,638 210,124,895 335,004 x 210,459,899 900 780 84 PPM
pim 2.10 PPMd 24,303,638 210,124,895 335,374 x 210,460,269 895 ~900 84 PPM
pim 2.50 best 24,303,638 210,124,895 330,901 x 210,455,796 764 ~764 88 PPM
.2120 CTW
Option  enwik7     enwik8      enwik9       Comp (ns/byte)
------  ---------  ----------  -----------  -----
-d5     2,490,460  24,174,511               11340
-d6     2,438,708  23,670,293  211,995,206  19221
-d7     2,455,765  23,689,423               24680
-d9     2,494,767
-d12    2,531,284
.2139 boa
.2153 TarsaLZP
Compressed size Decompressor Total size Time (ns/byte)
Program enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note
------- ---------- ----------- ----------- ----------- ----- ----- --- --- ----
TarsaLZP Jul 4 2006 35,745,297 334,661,013 2,255 sd 334,663,268 149 163 54 LZP
TarsaLZP Jul 30 2006 34,321,697 320,160,237 1,455 xd 320,161,692 110 117 54 LZP
TarsaLZP Aug 5 2006 32,270,002 295,312,202 1,579 xd 295,313,781 110 127 70 LZP
TarsaLZP May 6 2007 32,461,606 297,130,840 1,580 xd 297,132,420 97 121 71 LZP
TarsaLZP Jun 17 2007 31,233,381 283,895,945 1,604 xd 283,897,549 100 122 71 LZP
TarsaLZP Jul 18 2007 31,363,533 285,248,058 2,365 xd 285,250,423 88 105 71 LZP
TarsaLZP Jul 30 2007 26,664,933 233,613,937 2,472 xd 233,616,409 247 255 42 LZP
TarsaLZP Aug 8 2007 25,134,862 215,301,412 2,843 xd 215,304,255 249 287 341 LZP
TarsaLZP Aug 10 2007 25,135,357 215,301,079 3,546 xd 215,304,626 269 322 341 LZP
.2174 lzturbo
lzturbo 0.01 is
a free, experimental, closed source file compressor by Hamid Bouzidi, Aug. 15, 2007.
It uses LZ77 with arithmetic coding. The option -49 selects method 4 (1, 2, 4)
and level 9 (1..9) for best compression. Other combinations were not tested.
There is also a Linux version which was not tested.
Memory usage fluctuates but peaks at 654 MB for compression and 90 MB for decompression.
The Windows version produces read-only output files that must be set with
"attrib -r" before they can be modified or deleted.
Prog Opt enwik8 enwik9 prog Total Comp Deco Mem Alg Note
------------ --- ---------- ----------- ------ ----------- ---- ---- --- ---- ----
lzturbo 0.01 -49 26,678,709 233,322,999 68,561 x 233,391,560 1412 50 654 LZ77
lzturbo 0.1 -59 26,616,816 232,708,136 129,344 x 232,837,480 1385 49 248 LZ77
lzturbo 0.9 -59 26,616,278 232,701,587 116,508 x 232,818,095 1420 52 248 LZ77
lzturbo 0.94 -59 -b100 -p0 24,763,542 217,342,694 152,254 x 217,494,948 5196 20 1450 LZ77 26
-10 51,426,368 10 8 78 LZ77 26
-14 38,325,178 74 10 171 LZ77 26
-39 -b50 26,123,933 1290 16 1450 LZ77 26
-41 36,615,397 325,577,604 152,254 x 325,729,858 29 23 203 LZ77 26
.2178 LZPXj
LZPXj 1.1d
is an experimental open source (GPL) command line file compressor by
Ilia Muraviev and Jan Ondrus, May 21, 2006. The -m3 option selects maximum compression.
The -e0 option turns off the exe filter (has no effect on text). The -r3 and -a0 options
were tuned experimentally on enwik7. -r sets the rescale rate (range 1-5, default 3).
-a0 turns off the alternate one byte matcher (default -a1 = on).
Compression Compressed size Decompressor Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Notes
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- --- -----
LZPXj 1.1b -s (best, = -r4 in 1.1d) 28,387,611 674 LZP
LZPXj 1.1b (default) 28,440,958 677 LZP
LZPXj 1.1d -m3 -r4 -a0 -e0 28,386,512 246,468,866 6,534 s 246,475,400 362 402 216 LZP
LZPXj 1.2h 9 25,205,783 217,880,584 4,853 s 217,885,437 783 717 1316 PPM
.2179 scmppm
.2190 PX
PX v1.0 is a free command line
file compressor by Ilia Muraviev, Feb. 17, 2006. It is a context mixing
compressor based on PAQ1 with fixed weight models.
.2196 DGCA
DGCA v1.10 is a free, closed source
GUI archiver, Aug. 8, 2006. The installer is in Japanese but the program runs
in several languages including English. It was tested with default settings
except for producing a self extracting archive. This adds 189,936 bytes
to enwik8.
.2200 Squeez
Squeez 5.20.4600 is a commercial
(60 day trial) GUI archiver by SpeedProject, Apr. 11, 2006.
It supports 13 different formats, but only
the native .sqx (possibly LZ77) format was tested. The options used were 2.0 format (newest),
32 MB dictionary (largest, actually uses 365 MB memory), Ultra compression (best),
and all checkboxes off (including no exe or multimedia compression). There is a SFX
option but using UnSqueez to decompress instead gives a smaller size.
.2212 fpaq2
Program Opt enwik8 enwik9 prog (zip) enwik9+prog Comp Decomp Mem Alg
------- --- ---------- ----------- ----------- ----------- ----- ----- --- --
fpaq2 25,287,775 221,242,386 3,429 s 221,245,815 20183 20186 131 CM
fpaq3d 6 26,656,082 233,750,402 3,309 s 233,753,711 1922 1938 1050 o28b
fpaq3c 27,978,995 248,253,886 2,535 s 248,256,421 1446 1456 268 o28b
fpaq0s6 30,012,650 263,438,012 4,150 s 263,442,162 547 505 174 PPM
fpaq0s5 30,374,122 266,244,843 4,027 s 266,248,870 480 419 200 PPM
fpaq3b 29,992,583 270,804,549 2,926 s 270,807,475 1526 1517 256 o28b
fpaq3 31,176,104 282,922,749 8,820 x 282,931,569 1770 1807 250 o3
fpaq0x1b 30,860,828 283,001,299 2,727 s 283,004,026 1178 1180 1094 PPM
fpaq0s4 33,327,611 311,104,858 3,528 s 311,108,386 477 473 147 PPM
fpaq0x1a 36,186,433 339,131,763 2,561 s 339,134,324 621 623 1052 o3
fpaq0s2b 35,934,548 343,603,459 3,029 s 343,606,488 599 605 1052 o3
fastari 39,392,220 371,909,475 2,287 s 371,911,762 224 261 133 o2
fpaq0s2 38,812,873 375,050,952 2,982 s 375,053,934 591 595 131 o2
fpaq0x 38,845,305 375,276,899 2,482 s 375,279,381 631 631 263 o2
fpaq0s3 49,728,923 490,781,136 3,000 s 490,784,136 525 475 32 o2
.2226 dmc
dmc is the original DMC
compressor written by Gordon V. Cormack in 1987 and described in
"Data Compression using Dynamic Markov Modelling",
by Gordon Cormack and Nigel Horspool in Computer Journal 30:6 (December 1987).
The algorithm is the same as described in hook with the
last 2 arguments fixed at "2 2". The dmc argument "c 1800000000" means to
compress with 1.8 GB memory. The memory size must also be given for decompression.
Thus, 10 bytes (the size of the argument) were added to the decompressor size
(source zipped with Info-Zip 2.31 -9).
Because dmc compresses and decompresses
from stdin to stdout, it was tested in Linux (Ubuntu
2.6.15.27-amd64-generic), compiled in gcc 4.0.3 x86-64 as follows:
gcc -O -s -Dexp=expand dmc.c
and tested on a 2.2 GHz Athlon-64 with 2 GB memory. The compiler argument
"-Dexp=expand" removes a compiler error due to a K&R style redefinition of exp().
.2270 flashzip
flashzip 0.1
is a free, closed source file compressor by Nania Francesco Antonio, Jan. 10, 2008.
It uses LZP and arithmetic coding.
Compression Compressed size Decompressor Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- --- ----
flashzip 0.1 34,053,198 299,443,551 25,734 x 299,469,285 67 51 47 LZP
flashzip 0.2 34,053,198 299,443,551 25,257 x 299,468,808 62 52 47 LZP
flashzip 0.3 5 28,541,292 248,094,851 26,738 x 248,121,589 297 73 86 ROLZ
x 5 27,845,033 241,997,412 26,738 x 242,024,150 673 72 86 ROLZ
flashzip 0.9 (-m1 -s1 -b3) 31,856,012 141 124 83 ROLZ
-b1 32,088,940 148 125 70 ROLZ
-b5 31,764,213 143 119 132 ROLZ
-s4 29,235,064 269 99 83 ROLZ
-s7 28,370,670 928 87 83 ROLZ
-m2 31,641,305 188 121 83 ROLZ
-m2 -s7 27,665,526 2081 97 83 ROLZ
-m2 -s7 -b5 26,737,801 230,987,395 30,052 x 231,017,447 2476 75 132 ROLZ
flashzip 0.91 -m2 -s7 -b5 26,068,507 227,945,252 34,222 x 227,979,474 3560 112 198 ROLZ
-m1 -s7 -b5 26,851,582 1305 127 198 ROLZ
flashzip 0.93a -m2 -s7 -b5 26,243,745 227,048,196 36,367 x 227,084,563 1458 95 132 ROLZ
-m1 -s7 -b5 27,004,639 1030 140 198 ROLZ 26
flashzip 0.94 -m2 -s7 -b5 26,236,095 226,981,882 35,996 x 227,017,878 2451 87 132 ROLZ 26
-m1 -s7 -b5 26,662,405 230,985,291 35,996 x 231,021,287 1275 84 198 ROLZ 26
.2282 balz
Compressor Opt enwik8 enwik9 Prog Total Comp Decomp Mem Alg
--------- --- --------- ----------- ------- ----------- ---- ---- --- ----
balz 1.02 30,634,726 268,552,062 48,030 x 268,600,092 21804 58 346 LZ77
balz 1.06 e 28,674,640 1580 79 67 ROLZ
balz 1.06 ex 28,234,913 245,288,229 48,937 x 245,337,166 2440 75 67 ROLZ
balz 1.07 e 28,271,200 1060 96 132 ROLZ
balz 1.07 ex 27,416,245 237,492,151 49,082 x 237,541,233 2106 77 132 ROLZ
balz 1.08 ex 26,534,890 229,477,116 49,351 x 229,526,467 4431 126 200 ROLZ
balz 1.09 ex 26,534,257 229,476,459 49,928 x 229,526,387 4049 128 201 ROLZ
balz 1.12 e 27,522,348 1800 177 201 ROLZ
balz 1.12 ex 26,522,258 229,347,434 48,400 x 229,395,834 3989 148 201 ROLZ
balz 1.13 e 27,405,650 1670 221 206 ROLZ
balz 1.13 ex 26,421,416 228,337,644 49,024 x 228,286,668 3700 190 206 ROLZ
balz 1.15 ex 28,232,824 245,218,274 4,045 s 245,222,319 1064 95 67 ROLZ
.2291 lzpm
Compression Compressed size Decompressor Total size Time (ns/byte)
Program Opt enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg
------- --- ---------- ----------- ----------- ----------- ----- ----- --- ----
lzpm 0.02 29,274,461 254,596,796 26,078 x 254,622,874 612 59 83 LZ77
lzpm 0.03 29,248,641 254,378,973 26,089 x 254,405,062 749 59 181 LZ77
lzpm 0.04 29,297,905 254,793,933 25,333 x 254,819,266 665 60 83 ROLZ
lzpm 0.06 28,896,680 251,111,835 25,369 x 251,137,204 852 58 83 ROLZ
lzpm 0.07 28,385,939 246,426,198 46,692 x 246,472,890 2185 56 280 ROLZ
lzpm 0.08 28,259,984 245,221,254 48,122 x 245,269,376 2754 59 280 ROLZ
lzpm 0.09 27,986,111 242,929,442 46,933 x 242,976,375 2451 56 280 ROLZ
lzpm 0.10 27,849,915 241,719,857 46,871 x 241,766,728 2598 57 280 ROLZ
lzpm 0.11 1 29,728,112 1162 76 723 ROLZ
2 27,967,747 3746 66 723 ROLZ
3 27,424,937 5204 68 723 ROLZ
4 27,239,304 6488 66 723 ROLZ
5 27,134,495 7446 63 723 ROLZ
6 27,038,405 8143 64 723 ROLZ
7 26,962,337 8761 63 723 ROLZ
8 26,890,422 9330 62 723 ROLZ
lzpm 0.11 9 26,501,542 229,083,971 46,824 x 229,130,795 15395 57 723 ROLZ
lzpmlite 0.11 1 30,136,214 627 69 362 ROLZ
3 27,918,695 2620 64 362 ROLZ
lzpmlite 0.11 9 27,096,516 235,135,224 48,144 x 235,183,368 6235 59 362 ROLZ
lzpm 0.12 9 27,391,197 237,915,048 47,030 x 237,962,078 4501 57 280 ROLZ
lzpm 0.13 9 27,318,013 237,241,658 47,129 x 237,288,787 4543 59 280 ROLZ
lzpm 0.14 9 27,091,358 235,074,141 48,790 x 235,122,931 6467 73 428 ROLZ
lzpm 0.15 9 27,145,224 235,567,823 48,401 x 235,616,224 6557 62 427 ROLZ
.2299 qazar
qazar 0.0pre5 is a free, closed source
command line file compressor by
Denis Kyznetsov, Jan. 31, 2006. It uses LZP, an LZ77 variant where
the decompressor dynamically computes the same sequence of context
matches as the compressor. The compressor uses a single bit flag
to indicate if the pointer computed by the decompressor should be
followed. In qazar, the output symbols are arithmetic coded.
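The LZP idea described here can be sketched as follows. This is a toy illustration, not qazar's code: the context order (3 bytes), the dictionary used as the context table, and the (flag, value) token format are all assumptions, and the tokens are left uncoded whereas qazar arithmetic codes its output.

```python
ORDER = 3  # assumed context length in bytes

def lzp_encode(data: bytes):
    """Return tokens (1, match_length) or (0, literal_byte)."""
    table = {}          # context -> position that followed its last occurrence
    tokens = []
    i, n = 0, len(data)
    while i < n:
        length = 0
        if i >= ORDER:
            key = data[i - ORDER:i]
            p = table.get(key)      # predicted pointer, never transmitted
            table[key] = i
            if p is not None:
                while i + length < n and data[p + length] == data[i + length]:
                    length += 1
        if length > 0:
            tokens.append((1, length))   # flag 1: follow the predicted pointer
            i += length
        else:
            tokens.append((0, data[i]))  # flag 0: literal byte
            i += 1
    return tokens

def lzp_decode(tokens) -> bytes:
    """Rebuild the data by recomputing the same pointers as the encoder."""
    table = {}
    out = bytearray()
    for flag, value in tokens:
        i = len(out)
        p = None
        if i >= ORDER:
            key = bytes(out[i - ORDER:i])
            p = table.get(key)
            table[key] = i
        if flag:
            for k in range(value):       # byte by byte, so overlapping copies work
                out.append(out[p + k])
        else:
            out.append(value)
    return bytes(out)

tokens = lzp_encode(b"abcabcabcabc")
print(tokens)
```

Because both sides update the table at the same positions, no offset is ever stored. Only the flag, the match length, and the literals need to be coded.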
.2328 qc
qc 0.050 is a free, closed source,
command line file compressor by Denis Kyznetsov, Sept. 17, 2006.
The -8 option selects maximum compression (slowest and most memory).
.2334 ppms
.2453 turtle
Compressor Opt enwik8 enwik9 Prog Total Comp Decomp Mem Alg
--------- --- --------- ----------- ------- ----------- ---- ---- --- ----
turtle v0.01 31,314,961 274,696,820 5,079 x 274,701,899 187 178 122 PPM
turtle v0.02 31,314,961 274,696,820 4,637 x 274,701,457 196 175 122 PPM
turtle v0.03 31,287,161 274,649,069 7,111 x 274,656,180 142 129 122 PPM
turtle v0.04 31,137,531 273,100,225 7,808 x 273,108,033 141 128 122 PPM
turtle v0.05 28,860,689 251,626,176 9,779 x 251,635,955 242 203 174 PPM
turtle v0.07 28,669,320 250,600,644 10,625 x 250,611,269 217 175 206 PPM
WinTurtle 1.2 8MB 29,601,717 258,927,402 238,080 x 259,164,482 248 242 31 PPM
512MB 28,814,475 250,364,644 238,080 x 250,598,724 264 240 548 PPM
WinTurtle 1.21 512MB 28,814,475 250,364,644 225,123 x 250,589,767 255 219 548 PPM
WinTurtle 1.30 512MB 28,814,478 250,364,647 239,247 x 250,603,594 243 240 597 PPM
WinTurtle 1.60 512MB 28,379,612 245,217,944 160,090 x 245,378,034 273 237 583 PPM
.2508 cabarc
cabarc 1.00.0601
is a command line archiver available for free download from Microsoft, Mar. 18, 1997
(SDK released Jan. 8, 2002). It produces .cab files, which are often used to distribute Microsoft software.
It is designed for very fast decompression.
It uses LZX, a variant of LZ77 with fixed Huffman coding, but with shorter symbols reserved for the
three most recent matches. The option -m lzx:21 selects a window size of 2^21
(2 MB) for maximum compression.
There is a separate extraction program, "extract". The actual (global) decompression time of 32 sec. includes
15 sec. of CPU (process) time and the rest for disk I/O.
.2530 sr3
sr2 is a free,
open source (GPL) file compressor by Matt Mahoney, Aug. 3, 2007. It uses
symbol ranking. It takes no options. There are separate programs for
compression and decompression.
Program  enwik8      enwik9       prog      Total        Comp  Deco  Mem  Alg
-------  ----------  -----------  --------  -----------  ----  ----  ---  ---
sr2      30,432,506  273,906,319  2,831 sd  273,909,150    99   111    6  SR
sr3      28,926,691  253,031,980  5,611 x   253,037,591   130   146   68  SR
.2540 bzip2
             Compression  Compressed size          Decompressor  Total size   Time (ns/byte)
Program      Options      enwik8      enwik9       size (zip)    enwik9+prog  Comp  Decomp
-----------  -----------  ----------  -----------  ------------  -----------  ----  ------
bzip2 1.0.2  -9           29,008,736  253,977,839  30,036 x      254,007,875   379     129
bzip2 1.0.3  -9           29,008,758  253,977,891  56,082 xd     254,033,973   334     120
.2561 quad
Compression Compressed size Decompressor Total size Time (ns/byte)
Program enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg
------- ---------- ----------- ----------- ----------- ----- ----- --- ----
quad v1.01a 29,930,547 263,137,995 26,927 x 263,164,922 1281 168 33 LZ77
quad v1.04a 27,712,832 239,596,416 38,552 x 239,634,968 933 748 165 LZP
quad v1.07b x 29,360,404 258,361,092 61,067 x 258,422,159 1282 146 33 LZP
quad v1.08 x 29,171,593 256,664,803 13,042 s 256,677,845 1206 164 33 LZP
quad v1.10 -x 29,152,166 256,486,470 13,288 s 256,499,758 1007 117 34 LZP
quad v1.11 -x 29,110,579 256,145,858 13,387 s 256,159,245 956 116 34 ROLZ
quad v1.11HASH2 -x 29,110,519 256,145,858 30,129 x 256,175,987 705 117 42 ROLZ
quad v1.12 -x 29,110,519 256,145,858 13,516 s 256,159,334 527 120 34 ROLZ
.2572 WinACE
Compression Compressed size Decompressor Total size Time (ns/byte)
Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp
------- ---------- ----------- ----------- ----------- ----- -----
-sfx -m5 -d4096 29,481,470 257,237,710 0 xd 257,237,710 1080 77
-sfx -m5 30,919,182 270,578,538 0 xd 270,578,538 738 79
-sfx 30,937,342 ~770 ~40
.2588 tornado
Compression Compressed size Decompressor Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- ----
tornado 0.1 -9 34,491,218 303,034,530 20,336 s 303,054,866 204 25 210 LZ77
tornado 0.3 -1 59,790,826 18 LZ77
-2 44,570,662 22 LZ77
-3 40,173,986 28 LZ77
-4 37,849,654 60 LZ77
-5 34,206,892 81 LZ77
-6 33,319,753 130 LZ77
-7 32,346,652 195 96 LZ77
-8 31,659,225 304 192 LZ77
-9 30,967,871 506 384 LZ77
-10 30,614,648 802 768 LZ77
-11 30,274,896 259,412,590 45,833 s 259,458,423 1646 25 1510 LZ77
-12 30,057,549 3700 28 1768 LZ77
tornado 0.4a -11 30,157,610 258,761,459 42,516 s 258,803,975 783 25 1513 LZ77
-12 30,026,843 3200 29 >1800 LZ77
.2660 sr3c
sr3c 1.0 is a free,
open source (MIT license) file compressor and library by Kenneth Oksanen,
released Nov. 27, 2008. It uses symbol ranking, based on ideas from SR3, but
completely rewritten in C. The distribution contains a portable compression
engine and source code for drivers for UNIX/Linux. To test, I wrote a simple driver
for Windows (sr3cw) and compiled it using gcc 3.4.5 -O3 -fomit-frame-pointer -march=pentiumpro
-s and included sr3cw.exe in the distribution. The driver takes no options.
.2665 lzc
lzc v0.01
is a free, closed source file compressor by
Nania Francesco Antonio, May 8, 2007. It uses an LZ77 like algorithm.
The option 4 selects the maximum memory mode, 1 GB + 100 MB for compression and
16 MB + 100 MB for decompression. The actual memory usage indicated by Windows
Task Manager in this mode was 360 MB for compression and 107 MB for decompression.
Compressor Opt enwik8 enwik9 Prog Total Comp Decomp Mem Alg
--------- --- --------- ----------- ------- ----------- ---- ---- --- ----
lzc v0.01 4 40,312,925 363,504,638 7,656 x 363,512,294 238 61 360 LZ77
lzc v0.03 4 37,908,748 341,811,895 8,268 x 341,820,163 182 61 515 LZ77
lzc v0.04 4 37,779,426 340,628,765 8,869 x 340,637,634 142 59 540 LZ77
lzc v0.05b 1 44,893,624 117 54 LZ77
lzc v0.05b 16 30,611,315 267,784,591 9,158 x 267,793,749 365 82 771 LZ77
lzc v0.06b 16 30,611,315 267,784,590 12,170 x 267,796,760 347 68 790 LZ77
lzc v0.07 1 40,554,444 110 60 70 LZ77
lzc v0.07 10 30,611,315 266,565,255 28,997 x 266,594,252 309 67 584 LZ77
lzc v0.08 10 30,611,315 266,565,255 11,364 x 266,576,619 302 63 550 LZ77
.2732 packet
Compression Compressed size Decompressor Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg
------- ------- ---------- ----------- ----------- ----------- ----- ----- ---- ----
packet 0.01 37,637,275 334,473,465 30,508 x 334,503,973 50 43 4 LZP
packet 0.02 37,637,276 334,473,466 27,900 x 334,501,366 58 42 4 LZP
packet 0.03b 1 35,576,495 140 20 3 LZ77
x 1 34,792,199 170 20 3 LZ77
6 34,563,297 450 20 3 LZ77
x 6 33,752,502 297,266,174 26,435 x 297,292,609 594 18 3 LZ77
packet 0.90b -m1 -s0 35,426,140 199 28 10 LZ77
-m1 -s9 32,780,039 2887 26 10 LZ77
-m2 -s0 34,281,503 274 24 10 LZ77
-m2 -s9 31,968,711 4527 25 10 LZ77
-m3 -s0 34,966,621 236 56 10 LZ77
-m3 -s9 32,199,212 2965 51 10 LZ77
-m4 -s0 33,612,046 307 61 10 LZ77
-m4 -s3 32,033,412 861 57 10 LZ77
-m4 -s6 31,367,386 2411 57 10 LZ77
-m4 -s9 31,208,752 273,176,127 32,305 x 273,208,432 3871 48 10 LZ77
.2839 bzp
bzp 0.2 is a free file
archiver by Nania Francesco Antonio, Sept. 16, 2008. It uses LZP
and arithmetic coding. It takes no options. Earlier versions (0.0, 0.1)
were not tested.
.2857 ha
ha 0.98 is a free
command line archiver by Harry Hirvola, Jan. 7, 1993. A later version,
0.999b, is available for UNIX with source code and ports to DOS. It uses order-5 PPMC
(PPM with fixed escape probabilities for dropping to a lower order context;
newer PPM compressors such as PPMZ and PPMII use adaptive escape probabilities given a small context).
The command a2 selects compression method HSC (default is a1 = ASC). a21 automatically
chooses the best method. Time is ns/byte.
Version Options enwik8 Comp Decomp Notes
-------- ----- ---------- ---- ---- -----
ha 0.98 a1 36,379,137 873 257 ns/byte
ha 0.98 a2 31,250,524 2080 1850
ha 0.999b a21 31,250,523 2447 16 DOS compile, 1995
ha 0.9991a a21 31,250,524 1551 16 DOS (.com) compile, 1995
ha 0.999b a21 31,250,524 1290 16 Compiled for NT by Michael Markowsky at Apr 30 1997
lgha v1.1 a21 31,250,524 1110 16 ha v.0999c DOS compile by Lyapko George, 1999
lgha v1.1 31,250,524 1068 1114 16
.2961 lcssr
symbra 0.2 is a free, open source (GPL)
(mirror with .exe)
file compressor by Frank Schwellinger, Nov. 29, 2007. It uses symbol ranking.
Only source code (C++) is provided. For the test, the program was compiled
as indicated in the source comments and tested in Windows XP (32 bit).
The option -c4 or -c5 selects order 4 or 5 context. -m5 turns on suffix
matching with maximum buffer size, which greatly slows compression. -p2 selects
2 passes, which reorders the alphabet by descending frequency. The defaults
are -c4 -m0 -p1.
Compression Compressed size Decompressor Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- ---
symbra 0.2 -c4 -m0 -p1 38,308,164 352,524,859 11,299 s 352,536,158 245 282 68 SR
symbra 0.2 -c4 -m5 -p2 34,644,072 302,948,753 11,299 s 302,960,062 4669 4633 112 SR
symbra 0.2 -c5 -m5 -p2 34,683,661 302,656,095 11,299 s 302,667,394 4700 4622 112 SR
lcssr 0.2 -b7 -l9 34,549,048 296,160,661 8,802 x 296,169,463 8186 8281 1184 SR
.2983
.3092 slug
slug v1.1b
(mirror)
is a free, closed source file compressor by Christian Martelock,
Apr. 26, 2007. It uses an LZ type algorithm with
a 128K non-sliding window and Huffman coding.
It is designed for high speed and low memory usage.
System (wall) times for enwik9: 18 (51) seconds for compression,
14 (30) for decompression.
Compression Compressed size Decompressor Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- ---
slug 1.1b 45,274,048 404,250,979 5,836 x 404,256,815 18 14 1 LZ77
slug 1.27 35,093,954 309,201,454 6,809 x 309,208,263 32 28 14 ROLZ
.3102 kzip
kzip is a free, closed source
command line compressor by Ken Silverman, compiled May 13, 2006,
released May 18, 2006. It is an optimizing compressor producing
zip-compatible archives but with better compression. The option /b512 sets the
block splitting threshold. The default is /b256, but /b512 was found optimal
on enwik8. /s0 (default) selects maximum compression and ranges from /s0
to /s3. No decompressor is included, but archives can be read with any
program that reads zip files (pkzip, unzip, 7zip, WinRAR, WinACE, etc).
Options enwik8 Comp (ns/B) enwik9
------- ---------- ----------- ----------
/s0 /b0 35,029,924 2490 (one large block)
/s0 /b256 35,025,767 5220 310,281,906 (default, s0 = extreme mode)
/s0 /b512 35,012,219 5410 310,248,404 (best enwik8)
/s0 /b1024 35,016,649 4440 310,188,783 (best enwik9)
/s1 35,028,473 5240 (s1 = intense mode)
/s2 42,370,689 860 (s2 = longest run)
/s3 63,191,700 820 (s3 = Huffman code only)
pkzip 204 36,934,712 123 (for comparison)
.3128 uc2
.3141 thor
thor 0.9a is an experimental,
closed source, command line file compressor by Oscar Garcia, Mar. 19, 2006.
It is the fastest compressor on the maximumcompression
benchmark. It has 3 modes: ef (fastest), e (normal) and ex (best). However in this test it
appears speed may be limited by disk I/O.
Compression Compressed size Decompressor Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem
------- ------- ---------- ----------- ----------- ----------- ----- ----- ---
thor 0.9a ex 41,670,916 368,669,696 61,556 x 368,731,252 54 51 5.5
thor 0.9a e 45,842,692 412,096,696 61,556 x 412,157,852 44 50
thor 0.9a ef 55,063,944 490,400,720 61,556 x 490,461,876 45 53
thor 0.94a exx 35,696,028 315,611,168 68,922 x 315,680,090 82 32 2
thor 0.95 e1 55,138,792 21 27
thor 0.95 e2 45,714,740 21 23
thor 0.95 e3 41,528,948 29 29
thor 0.95 e4 35,795,184 314,092,324 49,925 x 314,142,249 64 34 16
thor 0.95 e5 35,696,032 315,611,172 49,925 x 315,661,097 80 22 2
thor 0.96a e1 54,915,456 488,397,982 50,071 x 488,448,053 17 20 1.6
thor 0.96a e2 45,714,724 411,416,252 50,071 x 411,466,323 23 19 1.5
thor 0.96a e3 41,531,628 367,671,220 50,071 x 367,721,291 27 24 6
thor 0.96a e4 35,795,184 314,092,324 50,071 x 314,142,395 62 30 16
thor 0.96a e5 35,696,032 315,611,172 50,071 x 315,661,243 80 18 2
.3211 gzip124hack
gzip124hack
is a modified version of gzip 1.2.4 by Ilia Muraviev, Aug. 13, 2007.
It uses LZ77.
It is a file compressor like gzip, except that it does not delete the input file.
It improves compression by using LZ77 lazy matching with 2 byte lookahead.
The compressed format is compatible with gzip. -9 selects maximum compression.
.3226 gzip
Compression Compressed size Decompressor Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp
------- ------- ---------- ----------- ----------- ----------- ----- -----
gzip 1.3.5 -9 36,445,248 322,591,995 38,801 x 322,630,796 101 17
gzip 1.3.5 36,518,329 323,742,882 38,801 x 323,781,683 85 19
.3226 Info-ZIP
Compression Compressed size Decompressor Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Notes
------- ------- ---------- ----------- ----------- ----------- ----- ----- ---- ---- --
Info-ZIP 2.31 (Linux) -9 36,445,373 322,592,120 57,583 x 322,649,703 104 35 0.1 LZ77
Info-ZIP 2.32 (DOS) -9 (unset TZ) 36,445,333 178 101 LZ77 16
Info-ZIP 2.32 (DOS) -9 36,445,351 179 LZ77 16
Info-ZIP 2.32 (Win32) -9 36,445,474 183 LZ77 16
Info-ZIP 2.32 (Win32) -9 36,445,443 322,592,190 75,806 xd 322,667,996 96 13 1.2 LZ77
.3234 pkzip
Compression Compressed size Decompressor Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg
------- ------- ---------- ----------- ----------- ----------- ----- ----- ---- ----
pkzip 2.0.4 36,934,712 327,607,376 29,184 xd 327,636,560 123 44 1.7 LZ77
pkzip 2.0.4 -ex 36,556,552 323,403,526 29,184 xd 323,432,710 171 50 2.5 LZ77
.3237 jar
.3244 PeaZip
.3344 lzgt3a
Program enwik8 enwik9 prog size Total Comp Decomp Mem Alg
------- ---------- ----------- ---------- ----------- ---- ----- --- ----
lzgt 47,560,234 1,989 sd 634 234 2 LZ77
lzgt1 43,928,072 403,385,292 2,025 sd 403,387,317 3390 865 2 LZ77
lzgt2 57,268,099 1,935 sd 982 274 1 LZ77
lzgt3 54,253,334 1,963 sd 889 280 1 LZ77
lzgt3a 37,444,440 334,405,713 4,387 xd 334,410,100 1581 2886 2 LZ77
.3375 lzss
Compression Compressed size Decompressor Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg
------- ------- ---------- ----------- ----------- ----------- ----- ----- ---- ----
lzss 0.01 e 48,615,051 426,009,994 44,555 x 426,054,549 193 15 625 LZ77
ex 38,254,303 337,565,308 337,609,863 9708 14 625 LZ77
.3388 lzuf
.3502 pucrunch
pucrunch is a free,
open source file compressor by Pasi Ojala, last updated Mar. 8, 2002.
It uses a combination of run length encoding (RLE) and LZ77 with Elias Gamma coding
of the offsets and run lengths.
The original version was written on Mar. 14, 1997 for the Commodore series
(Vic 20, Commodore 64, Commodore 128 and Commodore Plus 4/C16) in 6510
assembly language, with updates on Dec. 17, 1997 and Oct. 14, 1998.
The 6510 is a 1 MHz, 8 bit microprocessor with 3 registers,
16 bit (64K) address space, no cache, no pipelining, 8 bit ALU, no multiply or
floating point instructions, and no support for multitasking or virtual memory.
The decompressor was designed to execute quickly
in this environment with only a few hundred bytes of memory.
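As an illustration of the Elias Gamma coding mentioned above, here is a minimal sketch of the textbook form of the code: a positive integer is written as its binary representation preceded by one fewer zero bits than it has digits. The function names and the use of a bit string are assumptions made for readability; pucrunch's actual bit layout is not described here and may differ.

```python
def gamma_encode(n: int) -> str:
    """Textbook Elias gamma code of a positive integer, as a string of bits."""
    if n < 1:
        raise ValueError("Elias gamma codes are defined for n >= 1")
    binary = bin(n)[2:]                      # binary digits, leading digit is 1
    return "0" * (len(binary) - 1) + binary  # length prefix in unary, then value


def gamma_decode(bits: str, pos: int = 0):
    """Decode one value starting at bits[pos]; return (value, next position)."""
    zeros = 0
    while bits[pos + zeros] == "0":          # count the unary length prefix
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bits[pos + zeros:end], 2), end


if __name__ == "__main__":
    stream = "".join(gamma_encode(n) for n in (1, 5, 17))
    pos, values = 0, []
    while pos < len(stream):
        value, pos = gamma_decode(stream, pos)
        values.append(value)
    print(stream, values)
```

Small values get short codes (1 is a single bit), which suits LZ77 offsets and run lengths when short ones dominate.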
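#!/usr/bin/perl
# compress with pucrunch: perl p input output
# Reads the input in blocks of up to 64936 bytes, compresses each block
# with pucrunch, and writes it as a 2 byte size (high byte first) + data.
open(IN,"$ARGV[0]")||die "$!: $ARGV[0]";
open(OUT,">$ARGV[1]")||die "$!: $ARGV[1]";
binmode(IN);
binmode(OUT);
while ($n=read(IN, $s, 64936)) {
  # write the block to a temporary file and compress it
  open(TMP1,">tmp1")||die "$!: tmp1";
  binmode(TMP1);
  syswrite(TMP1, $s, $n);
  close(TMP1);
  `pucrunch -d -c0 tmp1 tmp2`;
  # append the compressed block to the output, prefixed by its size
  open(TMP2,"tmp2")||die "$!: tmp2";
  binmode(TMP2);
  $size=(stat(TMP2))[7];
  print("$n -> $size\n");
  $n=read(TMP2,$s,$size);
  printf(OUT "%c%c%s", $size/256, $size%256, $s);
  close(TMP2);
}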
#!/usr/bin/perl
# unpack with pucrunch: perl up input output
# Reads blocks written by the compression script (2 byte size, high byte
# first, then data), unpacks each with pucrunch, and appends the result.
open(IN,"$ARGV[0]")||die "$!: $ARGV[0]";
open(OUT,">$ARGV[1]")||die "$!: $ARGV[1]";
binmode(IN);
binmode(OUT);
while (($c1=getc(IN)) ne "") {
  # read the 2 byte block size, then the compressed block
  $c2=getc(IN);
  $size=unpack("C",$c1)*256+unpack("C",$c2);
  $n=read(IN, $s, $size);
  if ($size!=$n) {die "size=$size n=$n\n";}
  open(TMP1,">tmp1")||die "$!: tmp1";
  binmode(TMP1);
  syswrite(TMP1, $s, $n);
  close(TMP1);
  `pucrunch -u tmp1 tmp2`;
  # discard the first 2 bytes of the unpacked file, then copy the rest
  open(TMP2,"tmp2")||die "$!: tmp2";
  binmode(TMP2);
  read(TMP2,$s,2);
  read(TMP2,$s,64936);
  printf(OUT "%s", $s);
  close(TMP2);
}
.3663 lzop
.3676 lzw
lzw v0.1 is a free, experimental
file compressor by Ilia Muraviev, Jan. 30, 2008. It uses LZW with 16 bit
code words. It takes no options.
Compression Compressed size Decompressor Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- ---
lzw 0.1 42,554,530 380,782,976 42,215 x 380,825,191 1917 27 17 LZW
lzw 0.2 41,960,994 367,633,910 671 s 367,634,581 3597 31 18 LZW
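For readers unfamiliar with LZW using fixed 16 bit code words, the following is a minimal sketch of the general technique, not the lzw program's actual code or file format. The function names, the little-endian code layout, and the choice to simply stop adding dictionary entries once all 65,536 codes are used are assumptions made for illustration.

```python
import struct

MAX_CODES = 1 << 16  # a 16 bit code word can address at most 65,536 entries


def lzw_compress(data: bytes) -> bytes:
    """Emit one 16 bit code per longest dictionary match."""
    table = {bytes([i]): i for i in range(256)}
    codes = []
    w = b""
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc                          # keep extending the match
        else:
            codes.append(table[w])
            if len(table) < MAX_CODES:      # assumed policy: freeze when full
                table[wc] = len(table)
            w = bytes([byte])
    if w:
        codes.append(table[w])
    return struct.pack("<%dH" % len(codes), *codes)


def lzw_decompress(comp: bytes) -> bytes:
    """Rebuild the same dictionary while decoding the code stream."""
    codes = struct.unpack("<%dH" % (len(comp) // 2), comp)
    if not codes:
        return b""
    table = [bytes([i]) for i in range(256)]
    w = table[codes[0]]
    out = bytearray(w)
    for code in codes[1:]:
        # A code equal to len(table) refers to the entry being built now.
        entry = table[code] if code < len(table) else w + w[:1]
        out += entry
        if len(table) < MAX_CODES:
            table.append(w + entry[:1])
        w = entry
    return bytes(out)


if __name__ == "__main__":
    sample = b"TOBEORNOTTOBEORTOBEORNOT"
    packed = lzw_compress(sample)
    print(len(sample), "->", len(packed), lzw_decompress(packed) == sample)
```

Because every code costs 2 bytes, short inputs can expand; the gain comes once dictionary entries grow long.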
.3790 arbc2z
Compression Compressed size Decompressor Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- ---
arbc2z 38,756,037 379,054,068 6,255 sd 379,060,323 2659 2674 68 PPM2
arbc2 38,780,256 379,093,120 6,070 sd 379,099,190 2528 2646 67 PPM2
arbc1 48,586,591 486,892,000 6,047 sd 486,898,047 2439 2611 1.8 PPM1
arbc0 63,501,994 644,561,590 5,988 sd 644,567,578 2459 2606 1.5 o0
.3894 xdelta
xdelta 3.0u is a free, open source command line
file compressor by Joshua McDonald, Oct. 12, 2008. It uses LZ77. The program is a delta
coder, meaning it will output the compressed difference between two files, and then
decompress the second file when given the first file uncompressed. It allows the first
file to be omitted, in which case it simply compresses. This is how the test was done.
-9 specifies maximum compression.
.4092 srank
srank 1.1 is a free,
open source file compressor by P. M. Fenwick, originally written Sept. 5, 1996
and last updated Apr. 10, 1997. It uses symbol ranking, like MTF (move to front)
in BWT, but in order 3 contexts without a BWT transform. When a symbol is encountered
it is encoded with 1, 3, or 4 bits according to its position in a queue of length 3,
then moved to the front. Long runs of first place symbols are run length encoded
using 12 bits to encode the length of the run.
A miss is coded using pseudo-MTF in an order-0 context using 7 bits for
the first 32 symbols and 12 bits for the rest. It is pseudo-MTF because after a
symbol is found it is swapped with another symbol about half way to the front,
with some dithering. The algorithm is designed for speed rather than good compression.
gcc -O2 -march=pentium4 -fomit-frame-pointer -s srank.c -o srank.exe
.4106 QuickLZ
QuickLZ v0.1 is an open source (GPL)
compression library designed for high speed by Lasse Mikkel Reinhold,
Sept. 24, 2006. Tests were performed with demo.exe. Speed is I/O bound.
Times shown are process times, but wall times can be 2-4 times greater.
On enwik9 compression, the program reports "file too big".
Version enwik8 enwik9 prog size Total Comp Decomp Mem Alg
------- ---------- ----------- ---------- ----------- ---- ----- --- ----
QuickLZ 0.1 57,331,969 (fails) 45,361 x 19 21 154 LZ77
QuickLZ 0.9 56,900,177 507,806,141 45,086 x 507,851,227 11 11 10 LZ77
QuickLZ 1.20 57,147,067 510,018,447 43,501 x 510,061,948 17 12 2 LZ77
quick3 1.30b 46,378,438 410,633,262 44,202 x 410,677,464 48 12 3 LZ77
QuickLZ 1.30 -3 46,445,704 411,493,051 47,304 x 411,540,355 49 12 2 LZ77
-2 51,941,357 23 11
-1 57,153,015 12 11
-0 52,803,919 20 16
quickLZ 1.40 47,728,849 417,653,684 43,922 x 417,697,606 28 13 13 LZ77
.4246 compress
.4253 BriefLZ
.4382 lzrw3-a
lzrw3-a is one of a series
of public domain (open source) memory to memory compressors by
Ross Williams in 1991. The programs were implemented
as file compressors by Matt Mahoney on Feb. 14, 2008. The programs
are as follows:
Compressor enwik8 enwik9 prog Total Comp Deco Mem Alg
------- ---------- ----------- ------- ----------- ---- ---- --- ---
lzrw1 59,692,493 564,053,011 3,142 s 564,056,153 24 17 2 LZ77
lzrw1-a 59,471,657 560,457,545 4,328 x 560,461,873 23 15 2 LZ77
lzrw2 55,360,907 511,142,568 4,420 x 511,146,988 22 16 2 LZ77
lzrw3 52,616,827 483,918,830 4,622 x 483,923,452 21 17 2 LZ77
lzrw3-a 48,009,194 438,253,704 4,750 x 438,258,454 38 17 2 LZ77
lzrw5 (64K) 59,375,192 570,387,858 4,544 x 570,392,402 146 14 1 LZW
lzrw5 (192K) 50,721,610 479,044,732 174 14 1 LZW
.4473 fcm1
fcm1 is a free, open source file compressor
by Ilia Muraviev, May 23, 2008. It mixes order 0 and order 1 models and uses bitwise
arithmetic coding as in fpaq0 and paq. The bit predictions are combined by weighted averaging,
with the order 1 model weighted 15/16 unless the model is in its initial state, in which
case the order 0 model prediction is used. Each context is mapped to 2 16-bit counters
in initial state 1/2. One counter is updated by 1/8 of the prediction error and the
other by 1/32. The model prediction is the average of these two values.
The compressed file has a 4 byte header containing the file size.
Compressor enwik8 enwik9 prog Total Comp Deco Mem Alg
------- ---------- ----------- ------- ----------- ---- ---- --- ---
fcm1 45,402,225 447,305,681 1,116 s 447,306,797 228 261 1 CM1
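The fcm1 model described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the program's source: the class and function names are made up, the counters are assumed to hold the probability of a 1 bit scaled by 65536, "initial state" is tracked with a flag, and instead of driving an arithmetic coder the sketch only adds up the ideal code length (-log2 of the probability assigned to each bit).

```python
import math


class DualRateCounter:
    """Two 16 bit probability counters, one adapting fast and one slowly."""

    def __init__(self):
        self.fast = 32768        # initial state 1/2
        self.slow = 32768
        self.touched = False     # assumption: flag marks "still in initial state"

    def p1(self) -> int:
        return (self.fast + self.slow) >> 1      # average of the two counters

    def update(self, bit: int) -> None:
        target = 65535 if bit else 0
        self.fast += (target - self.fast) >> 3   # 1/8 of the prediction error
        self.slow += (target - self.slow) >> 5   # 1/32 of the prediction error
        self.touched = True


def estimate_bits(data: bytes) -> float:
    """Ideal compressed size in bits under the mixed order 0 / order 1 model."""
    order0 = [DualRateCounter() for _ in range(256)]
    order1 = {}                  # (previous byte, partial byte) -> counter
    prev, total = 0, 0.0
    for byte in data:
        ctx = 1                  # leading 1 followed by the bits seen so far
        for i in range(7, -1, -1):
            bit = (byte >> i) & 1
            c0 = order0[ctx]
            c1 = order1.setdefault((prev, ctx), DualRateCounter())
            if c1.touched:
                p = (c0.p1() + 15 * c1.p1()) >> 4    # order 1 weighted 15/16
            else:
                p = c0.p1()                          # fall back to order 0
            p = min(max(p, 1), 65535)                # keep both outcomes codable
            prob = p / 65536 if bit else 1 - p / 65536
            total -= math.log2(prob)
            c0.update(bit)
            c1.update(bit)
            ctx = (ctx << 1) | bit
        prev = byte
    return total


if __name__ == "__main__":
    text = b"abracadabra" * 200
    print("%.2f bits per byte" % (estimate_bits(text) / len(text)))
```

Averaging a fast and a slow counter lets the prediction react quickly to change while still settling close to the long run frequency.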
.4581 runcoder1
.4930 FastLZ
gcc -march=pentium -O3 -fomit-frame-pointer -mtune=pentium 6pack.c fastlz.c -o 6pack -s
gcc -march=pentium -O3 -fomit-frame-pointer -mtune=pentium 6unpack.c fastlz.c -o 6unpack -s
6pack and 6unpack are the compressor and decompressor, respectively. They take no
options. The compressed file name is stored without a path in the archive.
.4975 flzp
.5586 fpaq0f2
fpaq is
a free, experimental command line file compressor with source code
(in assembler) by Nikolay Petrov, Feb. 20, 2006. It is a faster
implementation of fpaq0 by Matt Mahoney (Sept. 3, 2004) maintaining
archive compatibility. fpaq is an order-0 arithmetic coder which
models independent, identically distributed (i.i.d.) characters, and is not
intended as a general purpose compressor. Its purpose is
to test the efficiency of different arithmetic coding algorithms.
There are several variants.
Compressor enwik8 enwik9 Comp Decomp Author Date
---------- ---------- ---------- ---- ---- -------------- ----
fpaq0 63,391,013 641,421,110 336 351 Matt Mahoney Sep 03 2004
fpaq1 63,502,003 477 489 Matt Mahoney Jan 10 2006
fpaq0b 63,375,460 457 437 Fabio Buffoni Jan 10 2006
fpaq0s 63,375,457 427 417 David A. Scott Jan 16 2006
fpaq 63,391,013 641,421,110 255 246 Nikolay Petrov Feb 20 2006
fpaq0p 61,457,810 622,237,009 131 131 Ilia Muraviev Apr 15 2007
fpaq02 63,501,997 644,561,596 1345 1325 David Anderson May 27 2007
fpaqa 61,340,408 620,681,885 262 237 Matt Mahoney Dec 15 2007
fpaqb 61,270,458 620,278,361 264 171 Matt Mahoney Dec 20 2007
fpaq0m 61,389,879 621,285,504 153 135 Ilia Muraviev Dec 20 2007
fpaq0mw 61,271,869 618,959,309 455 457 Eugene Shelwien Dec 21 2007
fpaqc 61,270,455 620,278,358 252 177 Matt Mahoney Dec 24 2007
fpaq0pv2 61,280,398 620,379,449 116 133 Ilia Muraviev Dec 26 2007
fpaq0r 61,234,684 620,169,855 129 142 Alexander Ratushnyak Jan 09 2008
fpaq0rs 61,202,171 619,839,546 139 138 Alexander Ratushnyak Jan 09 2008
fpaq0f 58,088,230 581,053,251 265 251 Matt Mahoney Jan 28 2008
fpaq0f2 56,916,872 558,645,708 222 207 Matt Mahoney Jan 30 2008
fpaq0pv3 61,457,810 622,237,009 103 119 Nania Francesco Antonio Apr 04 2008
fpaq0pv4 61,457,810 622,237,009 70 79 Eugene Shelwien Apr 06 2008
fpaq0pv4nc 61,350,834 621,169,159 64 69 Eugene Shelwien Apr 06 2008
fpaq0pv4nc0 61,287,662 620,506,072 68 74 Eugene Shelwien Apr 06 2008
fpaq0pv5 61,457,810 622,237,009 81 87 Nania Francesco Antonio Apr 06 2008
fpaq0pv4a 61,457,810 622,237,009 70 75 Eugene Shelwien Apr 07 2008
fpaq0pv4anc 61,323,986 621,169,159 64 65 Eugene Shelwien Apr 07 2008
fpaq0pv4anc0 61,287,662 620,506,072 66 66 Eugene Shelwien Apr 07 2008
fpaq0pv4b1 61,287,234 620,488,244 56 60 Eugene Shelwien Apr 18 2008
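Since all of these programs model characters as i.i.d., their output sizes are tied to the order-0 statistics of the input. As a rough reference point, the sketch below computes the order-0 entropy of a byte string from its static byte counts. The function name is an assumption, and an adaptive coder such as those in the table will not match this figure exactly, since it learns the statistics as it goes rather than using fixed counts.

```python
import math
from collections import Counter


def order0_entropy_bytes(data: bytes) -> float:
    """Order-0 entropy of data, in bytes, using its static byte frequencies."""
    n = len(data)
    if n == 0:
        return 0.0
    counts = Counter(data)
    # Each of the c occurrences of a symbol costs -log2(c / n) bits.
    bits = -sum(c * math.log2(c / n) for c in counts.values())
    return bits / 8


if __name__ == "__main__":
    sample = b"the quick brown fox jumps over the lazy dog"
    print(len(sample), "bytes, order-0 entropy %.1f bytes" %
          order0_entropy_bytes(sample))
```

For scale, the fpaq0 result of 63,391,013 bytes on the 10^8 byte enwik8 works out to about 5.07 bits per character.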
g++ -Wall %1.cpp -O2 -Os -march=pentiumpro -fomit-frame-pointer -s -o %1.exe
To encode bit d, given x and q (the probability that d = 1)
if d = 0 then x := ceil((x+1)/(1-q)) - 1
if d = 1 then x := floor(x/q)
To decode, given x and q
d = ceil((x+1)*q) - ceil(x*q) (1 if fract(x*q) >= 1-q, else 0)
if d = 0 then x := x - ceil(x*q)
if d = 1 then x := ceil(x*q)
x is maintained in the range 2^N to 2^(N+1)-1 by writing the
low bits of x prior to encoding d and reading into the low bits of x after
decoding. Because compression and decompression are reverse operations of
each other, they must be performed in reverse order.
The encoder divides the input into blocks of size B=500K bits, saves the predictions (q)
in a stack, then encodes the bits in reverse order to a second stack.
The block size and final state x are then written, followed by the compressed bits in
the second stack, in the reverse of the order in which they were
coded. The decompressor runs everything in the forward direction,
reading the saved x at the beginning of each block.
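The encode and decode steps above can be checked with a short sketch. It uses exact rational arithmetic and an unbounded integer state, so it leaves out the range normalization (writing and reading the low bits of x) and the block structure; the function names and the starting state x = 1 are assumptions for illustration. Bits are encoded in reverse order so that the decoder recovers them in forward order.

```python
from fractions import Fraction
from math import ceil, floor


def encode_bit(x: int, d: int, q: Fraction) -> int:
    """One encoding step: fold bit d (with P(d = 1) = q) into state x."""
    if d == 0:
        return ceil((x + 1) / (1 - q)) - 1
    return floor(x / q)


def decode_bit(x: int, q: Fraction):
    """Inverse step: return (d, previous state)."""
    xq = ceil(x * q)
    d = ceil((x + 1) * q) - xq
    return (d, x - xq) if d == 0 else (d, xq)


def encode(bits, qs, x: int = 1) -> int:
    """Encode bits in reverse order, using the saved predictions qs."""
    for d, q in reversed(list(zip(bits, qs))):
        x = encode_bit(x, d, q)
    return x


def decode(x: int, qs):
    """Decode in forward order; returns (bits, final state)."""
    bits = []
    for q in qs:
        d, x = decode_bit(x, q)
        bits.append(d)
    return bits, x


if __name__ == "__main__":
    bits = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]
    qs = [Fraction(3, 10)] * len(bits)
    state = encode(bits, qs)
    print(state, decode(state, qs))
```

Exact fractions avoid floating point rounding in the ceil and floor calls; a practical coder would use fixed precision probabilities and a bounded state instead.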
.5793 ppp
ppp is
the public domain file compressor specified in
RFC 1978 for
datagram compression using the Point-to-Point Protocol.
The RFC includes an implementation in C written by Dave Rand with modifications
by Ian Donaldson and Carsten Bormann, published in Aug. 1996.
The program uses order-4 symbol ranking with a queue length of 1
with a 64K hash table without collision detection. Match flags
are packed 8 to a byte, followed by up to 8 literals for each incorrect guess.
The 16 bit context hash is updated by shifting left 4 bits and XORing with the
current byte. The program reads from a file and outputs to stdout like this:
ppp enwik9 > enwik9.ppp (compress)
ppp -d enwik9.ppp > enwik9 (decompress)
The original code opens both files in text mode, which does not work in Windows.
For testing, I modified 3 lines of code to open the input and output files
in binary mode as follows:
#include <fcntl.h> // added
setmode(fileno(stdout), O_BINARY); // added
FILE *f = fopen(*p, "rb"); // changed "r" to "rb"
I compiled using gcc 3.4.2 -O3 -fomit-frame-pointer
-march=pentiumpro and packed with UPX (linked above, Feb. 11 2008).
Times are wall times. I did not use timer 3.01
because its output would be redirected to the output file. Process times
are about 50% of wall time based on watching Task Manager.
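The algorithm described above (a 64K guess table indexed by a 16 bit hash of recent bytes, with match flags packed 8 to a byte ahead of the literals) can be sketched as follows. This is an illustrative reimplementation, not the RFC's C code: the function names are made up, and the decompressor is passed the original length explicitly, which is an assumption made to keep the example short.

```python
def ppp_compress(data: bytes) -> bytes:
    """Predictor-style coder: one flag bit per byte, literals only on misses."""
    table = bytearray(65536)          # guessed next byte for each context hash
    h = 0
    out = bytearray()
    for i in range(0, len(data), 8):
        flags = 0
        literals = bytearray()
        for bit, c in enumerate(data[i:i + 8]):
            if table[h] == c:
                flags |= 1 << bit     # correct guess: flag only
            else:
                table[h] = c          # wrong guess: update table, emit literal
                literals.append(c)
            h = ((h << 4) ^ c) & 0xFFFF
        out.append(flags)
        out += literals
    return bytes(out)


def ppp_decompress(comp: bytes, n: int) -> bytes:
    """Inverse of ppp_compress; n is the original length (assumed known)."""
    table = bytearray(65536)
    h = 0
    pos = 0
    out = bytearray()
    while len(out) < n:
        flags = comp[pos]
        pos += 1
        for bit in range(8):
            if len(out) == n:
                break
            if (flags >> bit) & 1:
                c = table[h]          # flagged: take the table's guess
            else:
                c = comp[pos]         # not flagged: read the literal
                pos += 1
                table[h] = c
            out.append(c)
            h = ((h << 4) ^ c) & 0xFFFF
    return bytes(out)


if __name__ == "__main__":
    text = b"the quick brown fox jumps over the lazy dog. " * 50
    packed = ppp_compress(text)
    print(len(text), "->", len(packed), ppp_decompress(packed, len(text)) == text)
```

Each left shift by 4 pushes older bytes out of the 16 bit hash, so only the last 4 bytes influence the context, which is why the model behaves as order 4.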
.5902 lzbw1
lzbw1
0.8 is a free, command line file compressor by Bruno Wyttenbach, Apr. 26, 2009.
It uses LZP and is derived from LZP2. It takes no options.
.6368 NTFS
.6373 shindlet
shindlet is
a series of 3 free command line file compressors by Piotr Tarsa. All are
order-0 arithmetic coders with identical models written in assembler (included).
The three variants are fs (frequency sorting), bt (binary tree), and
sl (linear search). All three produce identical sized compressed files.
In addition, the compressed output of bt and sl are identical.
Results for all 3 variations are below. Comp and Decomp show global
times including disk I/O in ns/byte, with CPU (process) times in parentheses.
Date is the latest program timestamp in the distribution, not the release date.
Compressor Date enwik8 enwik9 prog Total size Comp Decomp
----------- ------------ ----------- ---------- ------- ----------- --------- ---------
shindlet_fs May 7, 2006 62,890,267 637,390,277 1,275 xd 637,391,552 185 (113) 123 (103)
shindlet_bt May 27, 2006 62,890,267 637,390,277 1,387 xd 637,391,664 163 (85) 118 (96)
shindlet_sl Apr 12, 2006 62,890,267 637,390,277 2,415 xd 637,392,692 166 (94) 121 (102)
.6445 arb255
arb255 is a
free, experimental command line file compressor with source code available
by David A. Scott, July 28, 2004.
It is a bijective order-0 arithmetic coder, best suited for i.i.d. bytes
(like fpaq). It takes no arguments
except the input and output filenames. The decompressor is unarb255.exe.
.6483 compact
.6557 lzp2
lzp2
is a free file compressor by Yann Collet, Apr. 17, 2009. It uses LZP. There are
no compression options. There is a smaller, separate program (unlzp2) that only
decompresses.
.7594 barf
barf is a free,
open source file compressor by Matt Mahoney, Sept. 21, 2003. It was written
as a joke to debunk claims of recursive compression. The algorithm is as
follows:
The main table shows the size and total process time after 2 compression passes.
Further passes will "compress" by one byte. The decompressor source code
size includes the Calgary corpus, which is needed to build the executable.
(barf.exe is 1,009,274 bytes after packing with UPX and zip). Results by pass
are shown below. Times are process times (Timer 3.01) with actual wall times
in parentheses.
Pass enwik8 enwik9 size (zip) enwik9+prog Comp (wall) Decomp Mem Alg Filename
---- ---------- ----------- ----------- ----------- ---------- ------- --- ---- --------
1 76,450,126 763,918,762 983,782 s 764,902,544 315 (330) 30 (73) 4 LZ77 enwik9.x
2 76,074,327 758,482,743 983,782 s 759,466,525 439 (462) 23 (60) 4 LZ77 enwik9.x.x
3 76,074,326 758,482,742 983,782 s 759,466,524 488 (551) 18 (44) 4 copy enwik9.x.x.x9v
.9956 arb2x
arb2x v20060602 is a
free, experimental command line file compressor with source code available
by David A. Scott, updated June 2, 2006.
It is a bitwise bijective order-0 arithmetic coder, best suited
for i.i.d. bits. It takes no arguments
except the input and output filenames. The decompressor is unarb2x.exe.
Failed and Pending Tests
hipp
hipp5819 enwik8 MB Mem Comp (ns/byte)
------- ---------- ------ ----
/o5 22,390,366 248.5 ~3710
/o8 20,555,951 719.5 ~4300
Zipped size: C++ source (commented in Russian) = 98,765, exe = 36,724.
XMill
XMill 0.8 is an open source
command line XML preprocessor/compressor by AT&T, written by Dan Suciu,
Hartmut Liefke, and Hedzer Westra in March, 2003.
It works by sorting by XML tags to bring similar content together, then
compressing with gzip, bzip2, or ppmd. Optionally it can (in theory) output the
preprocessed data as input to another compressor.
"</text></revision></page></mediawiki>
However, decompression succeeds for enwik8 but fails for enwik9. (Failed
values in parentheses, timed for enwik8). The decompressor (xdemill) reports "corrupt file".
Compression Compressed size Decompressor Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem
------- ------- ---------- ----------- ----------- ----------- ----- ----- ---
xcmill 0.8 -w -P -9 -m800 26,579,004 (230,934,622) 114,764 xd (231,049,386) 616 (530) 800
xcmill 0.9.1 -w -P -9 -m1700 26,579,004 (230,914,289) 108,845 xd (231,023,134) 711 984
The -w option preserves whitespace. Otherwise compression is lossy. -P selects ppmdi compression
(bzip2, gzip and no compression are also available). -9 selects maximum compression. -m800 allows
800 MB of memory.
lzp3o2
lzp3o2 (LZP 3 with order 2 literal
coding) is one of a family of open source file compressors by
Charles Bloom, originally written in 1995. The algorithm is described in
a paper submitted to DCC'96.
lzp3o2 uses LZP compression with order 2 modeling of literals and arithmetic
coding. The tested version of the source code
is dated Aug. 25, 1996 and compiled for Windows Oct. 10, 1998. The compiled
distribution was tested.
Program enwik8 Comp Deco Mem Alg
------- ---------- ---- ---- --- ---
lzp1 56,013,656 23 20 153 LZP
lzp2 40,350,594 80 280 LZP
lzp3o2 33,041,439 230 270 151 LZP
History
May 10 2006 - benchmark began with 1 month of testing about 2 compressors per day.
Jun 10 2006 - began test data analysis.
Jun 14 2006 - updated xml-wrt 2.0 14.06.06 | ppmonstr.
Jun 17 2006 - reorganized website from 1 big page to 4 smaller pages.
Jun 19 2006 - added xml-wrt 2.0 19.06.06 (standalone LZMA mode).
Jun 20 2006 - added ocamyd 1.65 LTCB 1.0.
Jun 21 2006 - updated TC 5.0 to dev 4 (compression unchanged but faster).
Jul 19 2006 - updated TC 5.0 to dev 9, added dark 0.32b.
Jul 20 2006 - added arbc2z.
Jul 21 2006 - added TarsaLZP (July 4 2006).
Jul 22 2006 - added uda 0.300.
Jul 23 2006 - verified uda 0.300 decompression.
Jul 24 2006 - updated TC 5.0 to dev 11.
Jul 29 2006 - added CTW 0.1.
Aug 01 2006 - updated TarsaLZP (July 30 2006), added ppmvc v1.1.
Aug 06 2006 - added the Hutter Prize, renamed Large Text Compression Benchmark to Human Knowledge Compression Contest,
added rules for the Hutter Prize, and updated rationale to add a section on AIXI.
Aug 07 2006 - added link to paq8f, updated prize formula (Z might not decrease), and that prize committee members
are not eligible for prize money. Added logo. Minor edit to rationale.
Aug 08 2006 - the prize fund (Z) does not decrease.
Aug 11 2006 - added a lexical and string repetition analysis to the data study.
Aug 13 2006 - typo in Rationale.
Aug 14 2006 - updated dark v0.40. Edited Rationale (AIXI, compression does not seem like AI, lossy compression).
Aug 16 2006 - raq8g and durilca 0.5(Hutter) submitted for Hutter prize, neither verified yet.
Aug 17 2006 - verified durilca 0.5(Hutter) claim. Posted raq8g.exe for Windows.
Aug 18 2006 - verified raq8h -7 on enwik8 under Windows. Tested paq8f -8 on enwik8 (not verified).
Reported raq8h -8 result (Linux).
Aug 19 2006 - updated ha, added Info-ZIP, ESP. Clarified rules 5 and 6.
Aug 20 2006 - Removed rules and results for the Hutter prize. These may be found on the Hutter Prize website.
Updated ha and Info-ZIP.
Aug 22 2006 - added paq8hp1. Updated Info-ZIP. Added submission times and unzipped .exe sizes for Hutter prize candidates.
Aug 23 2006 - updated paq8hp1 for enwik9 -8 (compress only). Tuned xml-wrt|ppmonstr for enwik8 at 2 GB. Added durilca4linux.
Aug 26 2006 - updated dark 0.46. Fixed link to durilca4linux. Posted enwik8.bz2 and enwik9.bz2 on the data page.
Aug 28 2006 - added paq8hp2 (enwik8, 1 GB, not checked). Updated ppmonstr, xmlwrt|ppmonstr, slim, and ash for 2 GB memory.
Aug 29 2006 - verified paq8hp2 for enwik8 (1 GB and 2 GB).
Aug 31 2006 - added bbb.
Sep 01 2006 - updated bbb, TarsaLZP, paq8hp2 (as a preprocessor).
Sep 02 2006 - corrected error in lexical analysis table on data page (found by Szymon Grabowski).
Sep 03 2006 - added paq8hp3 -7 for enwik8 (Hutter prize candidate, verified).
Sep 05 2006 - updated paq8hp3 (enwik9 -8, not verified).
Sep 10 2006 - updated paq8hp4 (verified for enwik8), fixed links to PX and pimple.
Sep 11 2006 - updated paq8hp4 for enwik9 (compression only), added paq1 and expanded PAQ series documentation.
Sep 12 2006 - minor edits in paq8hp1, raq8g descriptions.
Sep 13 2006 - updated paq8hp2 for enwik9.
Sep 14 2006 - updated xml-wrt 3.0.
Sep 15 2006 - updated xml-wrt 3.0|ppmonstr.
Sep 20 2006 - updated paq8hp5 -7 enwik8. Verified paq8hp4 -8 enwik9.
Sep 21 2006 - updated paq8hp5 -8 enwik8.
Sep 23 2006 - updated paq8hp5 -8 enwik9 (not verified).
Sep 24 2006 - added QuickLZ.
Sep 29 2006 - added fpaq0x, fpaq0s2.
Sep 30 2006 - clarified submission dates for paq8hp2 through paq8hp5. Posted paq8hp2 source code.
Oct 01 2006 - updated fpaq0x1a, fpaq0s2b, tc 5.1 dev 1.
Oct 02 2006 - updated tc 5.1 dev 2.
Oct 06 2006 - posted paq8hp3 source code (now top ranked). Added fpaq0x1b.
Oct 08 2006 - added fpaq0s3.
Oct 10 2006 - posted paq8hp4 source code (now top ranked).
Oct 12 2006 - added fpaq0s4.
Oct 13 2006 - added tc 5.1 dev 5.
Oct 15 2006 - verified paq8hp5 -8 enwik9 decompression. Added fpaq0s5.
Oct 16 2006 - added durilca4linux_2 (now top ranked, not yet verified for enwik9).
Oct 18 2006 - updated durilca4linux_2 (-t2(11) option).
Oct 21 2006 - added fpaq2.
Oct 22 2006 - updated QuickLZ 0.9.
Oct 27 2006 - posted paq8hp5 source code (now ranked #2).
Oct 30 2006 - updated fpaq0s6.
Nov 03 2006 - mirrored enwik8.bz2 and enwik9.bz2 to mattmahoney.net/text
Nov 05 2006 - updated paq8hp6. Linked to FV results on data page.
Nov 06 2006 - verified paq8hp6 -7 enwik9 decompression.
Nov 07 2006 - updated fastari.
Nov 10 2006 - added PeaZip.
Nov 15 2006 - added paq8j.
Nov 17 2006 - added paq8ja.
Nov 20 2006 - added fpaq3.
Nov 22 2006 - added paq8jb.
Nov 29 2006 - added paq8jc.
Dec 02 2006 - added fpaq3b.
Dec 08 2006 - added paq8hp7a (enwik8 only), posted paq8hp6 source.
Dec 10 2006 - updated paq8hp7a for enwik9 (not verified).
Dec 12 2006 - added paq8hp7.
Dec 13 2006 - updated paq8hp6 -8 enwik9.
Dec 17 2006 - posted enwik8.pmd and enwik9.pmd (PPMD var. J format).
Dec 21 2006 - added fpaq3c.
Dec 24 2006 - added quad v1.01a, tc 5.1 dev 7.
Dec 28 2006 - added fpaq3d.
Jan 01 2007 - added paq8jd (enwik8 -7).
Jan 02 2007 - updated paq8jd -8 enwik8 (not verified).
Jan 08 2007 - added hook v0.2.
Jan 11 2007 - added hook v0.3.
Jan 12 2007 - added hook v0.3a.
Jan 13 2007 - added tc 5.1dev7x. Fixed hook.zip archive.
Jan 15 2007 - posted paq8hp7 source code. Added hook v0.4.
Jan 17 2007 - completed dmc and Info-Zip 2.3.1.
Jan 19 2007 - added paq8hp8.
Jan 22 2007 - added hook v0.5b.
Jan 27 2007 - added chile 0.4.
Feb 03 2007 - added ocamyd-1.66.final (merged with ocamyd LTCB)
Feb 07 2007 - added hook v0.6.
Feb 08 2007 - added hook v0.6b, quad v1.04a, tc 5.2 dev 2.
Feb 09 2007 - corrected error in tc 5.2 dev 2.
Feb 12 2007 - added ccm_extra 1.03a.
Feb 14 2007 - added hook v0.6c.
Feb 15 2007 - added paq8k -8 enwik8 (not verified).
Feb 20 2007 - added paq8hp9 -7 enwik8 (verified).
Feb 22 2007 - updated paq8hp9 -7 enwik9.
Feb 23 2007 - added link to paq8hp9any (revised paq8hp9, not tested), added quad 1.07b, ccm 1.1.1a.
Mar 02 2007 - added ccm 1.1.2a.
Mar 06 2007 - added LZPXj 1.2h.
Mar 10 2007 - added paq8l enwik8.
Mar 11 2007 - added hook v0.7.
Mar 13 2007 - added hook v0.7b.
Mar 14 2007 - added quad 1.08.
Mar 17 2007 - added hook v0.8.
Mar 18 2007 - added hook v0.8b.
Mar 19 2007 - added hook v0.8c.
Mar 21 2007 - added hook v0.8d, FreeArc 0.36.
Mar 24 2007 - added quad 1.10.
Mar 27 2007 - added paq8hp10 -7 enwik8, posted paq8hp9 source code, added hook v0.8e, M99.
Mar 28 2007 - corrected M99 enwik8 result, updated FreeArc description, removed unsupported quad versions from main table.
Mar 31 2007 - added paq8hp10any -8 enwik8.
Apr 01 2007 - added dark 0.51, opendark.
Apr 02 2007 - updated paq8hp10any -8 enwik9 (decompression not verified), added DGCA 1.10.
Apr 05 2007 - added quad 1.11, quad 1.11HASH2, ccm 1.20a, updated FreeArc description.
Apr 06 2007 - added hook v0.9.
Apr 08 2007 - added freehook 0.2, ccm 1.20d.
Apr 09 2007 - added xmill 0.9.1 (fails), barf, quad 1.12.
Apr 10 2007 - added hook 0.9b, freehook 0.3.
Apr 19 2007 - added M99 v2.1, QuickLZ 1.20 and 1.30beta, lzpm 0.02, tornado 0.1.
Apr 22 2007 - added thor 0.94a.
Apr 23 2007 - added ccm (ccmx) 1.21.
Apr 27 2007 - added slug 1.1b.
Apr 30 2007 - added paq8hp11 -7 enwik8. Posted paq8hp10any source code.
May 03 2007 - added paq8hp11any -8 enwik8, fpaq0p.
May 05 2007 - added lzpm 0.03 and 0.04. Fixed misleading description of DMC algorithm in hook.
May 08 2007 - added lzc 0.01, hook0.9c.
May 09 2007 - added pucrunch, TarsaLZP May 6 2007, thor 0.95, srank 1.1.
May 10 2007 - added paq8hp11any -8 enwik9 (decompression not verified).
May 11 2007 - added lzc 0.03, updated table description (time, memory, algorithms).
May 14 2007 - added paq8hp12 -7 enwik8.
May 16 2007 - added uc2, lzc 0.04.
May 18 2007 - added BriefLZ 1.05.
May 20 2007 - added paq8hp12any -8 enwik8/9 (decompression not verified), lzpm 0.06. Updated times in main table to process times.
May 21 2007 - added paq8hp12any -7/-8 enwik8 (decompression verified), 7zip 4.46a.
May 26 2007 - added lzc 0.05b.
May 29 2007 - added fpaq02.
Jun 01 2007 - added turtle 0.01.
Jun 02 2007 - added turtle 0.02.
Jun 05 2007 - added turtle 0.03.
Jun 08 2007 - added turtle 0.04.
Jun 12 2007 - posted paq8hp11any source code, added turtle 0.05.
Jun 16 2007 - added TarsaLZP ver. Jun 17 2007, FastLZ ver. Jun 12 2007, pim 2.01.
Jun 23 2007 - added turtle 0.07.
Jul 24 2007 - added lpaq1, pim 2.04b, TarsaLZP Jul 18 2007, posted paq8hp12any source code.
Jul 30 2007 - added TarsaLZP Jul 30 2007. Updated rules to allow 1800 MB memory.
Jul 31 2007 - added pim 2.10.
Aug 03 2007 - added sr2.
Aug 07 2007 - added lzpm 0.07. Underlined times and memory to indicate records.
Aug 08 2007 - added pimple2.
Aug 09 2007 - added lzpm 0.08, TarsaLZP Aug 8 2007.
Aug 11 2007 - added TarsaLZP Aug 10 2007.
Aug 13 2007 - added gziphack, retested gzip 1.3.5, Info-ZIP 2.32 Win32.
Aug 14 2007 - added QuickLZ 1.30, compact.
Aug 15 2007 - added lzturbo 0.01, WinTurtle 1.2.
Aug 16 2007 - added paq8fthis2 -8 enwik8, WinTurtle 1.21, lzpm 0.09.
Aug 23 2007 - added paq8n -8 enwik8, paq8osse -8 enwik8, thor 0.96a, lzpm 0.10.
Aug 24 2007 - added paq8o -8 enwik8.
Aug 29 2007 - added lzc 0.06b.
Aug 30 2007 - added HKCC-2 enwik8 decompressor, added link to paq8o ver. 2, added WinTurtle 1.30, qazar 0.0pre5.
Aug 31 2007 - added qc 0.050.
Sep 02 2007 - added HKCC-2 Sep 01 2007 version, WinRK 3.03 SFX.
Sep 06 2007 - added lzpm 0.11.
Sep 13 2007 - added lzpmlite 0.11.
Sep 14 2007 - added paq8o3 -8 enwik8.
Sep 20 2007 - added lpaq2, hook 1.0.
Sep 22 2007 - added paq8o4 v1, rings 0.1.
Sep 29 2007 - added paq8o6 -8 enwik8.
Sep 30 2007 - added lpaq3, elpaq3, lprepaq 1.2.
Oct 01 2007 - added lpaq3a, lpaq3e.
Oct 04 2007 - added lpaq4, lpaq4e.
Oct 05 2007 - added lzturbo 0.1.
Oct 16 2007 - added lpaq5, lpaq5e, withdrew HKCC-2.
Oct 20 2007 - added paq8o7 -8 enwik8.
Oct 23 2007 - added lpaq6, lpaq6e.
Oct 24 2007 - added paq8o8 -8 enwik8.
Oct 25 2007 - added lzc 0.07.
Oct 28 2007 - added rule that benchmark results will be delayed 30 days after the latest version of the program is published.
Nov 09 2007 - added lpaq7, lpaq7e*, xwrt 3.2*, sr3*.
Nov 22 2007 - added QuickLZ 1.40, rings 0.2, hook 1.1, lzc 0.08*.
Nov 23 2007 - added lzpm 0.12.
Dec 03 2007 - ranked lpaq7e, xwrt 3.2, sr3, lzc 0.08.
Dec 04 2007 - added and ranked xwrt 3.2|ppmonstr J.
Dec 05 2007 - added symbra 0.2*.
Dec 11 2007 - added lpaq8*, lpaq8e*.
Dec 13 2007 - added lcssr 0.2*.
Dec 16 2007 - uploaded symbra 0.2, lcssr 0.2 mirrors, added fpaqa*, hook 1.3, lzpm 1.3, cmm1, cmm2.
Dec 17 2007 - corrected cmm1, cmm2, ranked cmm1.
Dec 18 2007 - added fpaqb*.
Dec 20 2007 - updated fpaqb v2*, added fpaq0m, bit 0.1*.
Dec 21 2007 - added lpaq1a.
Dec 24 2007 - added fpaqc*.
Dec 25 2007 - added lpq1, rings 0.3*.
Dec 26 2007 - added FreeArc 0.40-pre-4*.
Jan 09 2008 - added fpaq0r, fpaq0rs*, ranked lpaq8e, lcssr 0.2.
Jan 11 2008 - added flashzip 0.01, flashzip 0.02*, WinTurtle 1.60*, ccmx 1.30*.
Jan 13 2008 - added lzpm 0.14, cmm 080113*. Updated pkzip 2.04 -ex.
Jan 17 2008 - added lzpm 0.15.
Jan 25 2008 - added fpaq0pv2, ranked FreeArc 0.40-pre-4, bit 0.1, rings 0.3, fpaq0mw.
Jan 28 2008 - added fpaq0f*.
Jan 30 2008 - added fpaq0f2*.
Jan 31 2008 - added lzw 0.1, paq9a. Repealed 30 day wait rule and ranked pending compressors marked with *.
Feb 04 2008 - added flashzip 0.3.
Feb 08 2008 - added lzw 0.2, rings 1.0.
Feb 09 2008 - added cmm3 080207.
Feb 11 2008 - added ppp.
Feb 12 2008 - added lzp3o2, updated ppp description.
Feb 13 2008 - added rings 1.1, lzrw1.
Feb 14 2008 - added lzrw1-a, lzrw2, lzrw3, lzrw3-a, lzrw5, updated lzrw1.
Feb 17 2008 - updated lzrw1-a, lzrw2, lzrw3, lzrw3-a, lzrw5 (new .exe sizes).
Feb 21 2008 - added durilca4linux_3.
Feb 22 2008 - added drt|lpaq9e.
Feb 25 2008 - added lzturbo 0.9.
Mar 04 2008 - added rings 1.2.
Mar 09 2008 - added balz 1.02, rzm 0.06c, tornado 0.3.
Mar 13 2008 - added Stuffit 12.0.0.17.
Mar 14 2008 - added cmm4 v0.0.
Apr 02 2008 - added rings 1.3.
Apr 04 2008 - added fpaq0pv3.
Apr 06 2008 - added fpaq0pv5.
Apr 14 2008 - added rings 1.4c.
Apr 15 2008 - updated rings 1.4c description.
Apr 21 2008 - added rings 1.5.
Apr 22 2008 - added durilca4linux_3 v2 (new dictionary).
Apr 28 2008 - added lpaq9f.
May 09 2008 - added balz 1.06.
May 11 2008 - added packet 0.01, slug 1.27, rzm 0.07h.
May 14 2008 - added balz 1.07.
May 18 2008 - added packet 0.02.
May 19 2008 - added fpaq0pv4, fpaq0pv4nc, fpaq0pv4nc0, fpaq0pv4a, fpaq0pv4anc, fpaq0pv4and0.
May 20 2008 - added packet 0.03b, balz 1.08, fpaq0pv4b1.
May 21 2008 - added balz 1.09.
May 22 2008 - added durilca4linux_3 v3, cmm4 v0.1e.
May 23 2008 - updated cmm4 v0.1e description, added lpaq9g, fcm1.
Jun 03 2008 - added balz 1.12.
Jun 04 2008 - added lpaq9h.
Jun 10 2008 - added paq8o8-intel -1, paq8o8z-jun7 -1.
Jun 12 2008 - added paq8o10t (enwik8 only), balz 1.13.
Jun 13 2008 - added lpaq9i.
Jun 14 2008 - added drt|ppmonstr (under lpaq9i).
Jun 17 2008 - updated paq8o8z (note 25), durilca4linux_3 v3 (2 GB).
Jun 18 2008 - added flzp v1.
Jun 19 2008 - added packet 0.90b.
Jul 17 2008 - added lzgt, lzgt1, lzgt2, lzgt3.
Jul 19 2008 - added nanozip 0.01a, balz 1.15.
Jul 20 2008 - updated nanozip 0.01a -txt, clarified method of creating zip archive of decompressor.
Jul 22 2008 - added pim 2.50, tornado 0.4a, M99 v2.2.1.
Jul 24 2008 - added 4x4 0.2a, bit 0.2b.
Jul 25 2008 - added nanozipltcb.
Jul 26 2008 - added flashzip 0.9.
Jul 28 2008 - corrected Pareto frontier.
Aug 02 2008 - added nanozip 0.03a, lzss 0.01.
Aug 18 2008 - added flashzip 0.91, lpaq9j.
Sep 05 2008 - added size vs. speed and memory graphs.
Sep 26 2008 - added bzp 0.2, ppms J.
Oct 02 2008 - added lpaq9k.
Oct 27 2008 - added nanozip 0.05a.
Oct 28 2008 - added lzgt3a.
Nov 21 2008 - added bit 0.7. Updated test computer (note 26).
Nov 27 2008 - added ppmx 0.01, sr3c.
Nov 28 2008 - added mcomp 2.00.
Dec 02 2008 - added lpaq9l, ppmx 0.02.
Dec 22 2008 - added ppmx 0.03.
Dec 29 2008 - added M1 0.2a.
Jan 02 2009 - added M1 0.3.
Jan 05 2009 - added ppmx 0.04.
Jan 09 2009 - updated link to paq8hp12any.
Jan 28 2009 - added xdelta 3.0u.
Feb 09 2009 - added bcm 0.03.
Feb 11 2009 - added bcm 0.04.
Feb 21 2009 - added drt|lpaq9m.
Mar 02 2009 - added Stuffit 2009 13.0.0.19, nanozip 0.06a, NTFS (LZNT1).
Mar 05 2009 - added bcm 0.05.
Mar 06 2009 - updated bcm 0.05.
Mar 10 2009 - added flashzip 0.93a, fixed links to winturtle, flashzip, rings, hook, packet, bzp.
Mar 12 2009 - added bwmonstr 0.00.
Mar 15 2009 - added bcm 0.07.
Mar 20 2009 - added bwmonstr 0.01.
Mar 26 2009 - added flashzip 0.94, decomp8.
Apr 01 2009 - added runcoder1.
Apr 13 2009 - added lzturbo 0.94, M1 0.3b.
Apr 14 2009 - added lzuf.
Apr 16 2009 - added M1 0.3b parameter e8-m103b1-mh.
Apr 17 2009 - added lzp2.
Apr 18 2009 - added csc2.
Apr 21 2009 - added paq8p3, paq8p3 v2.
Apr 22 2009 - added decomp8b.
Apr 22 2009 - added lzbw1 0.8.
Apr 29 2009 - added hook 1.4.
May 08 2009 - updated opendark-A.
May 26 2009 - added decmprs8.
Jun 01 2009 - added bcm 0.08.
Jun 02 2009 - added reorder_v2|bcm 0.08.
Jun 05 2009 - updated reorder_v2|bcm 0.08 xlt.
Jul 14 2009 - added bwmonstr 0.02.
Jul 16 2009 - updated bwmonstr 0.02 comments.
Jul 21 2009 - added durilca'kingsize.