/* PAQ7 file compressor/archiver. (C) 2005, Matt Mahoney under GPL, http://www.gnu.org/licenses/gpl.txt PAQ7 is a file compressor and archiver. To use: paq7 archive files... If the archive does not exist, then it is created and the files are compresed and stored in the archive. File names are stored in the archive exactly as entered (with or without a path). If the archive already exists, then the files are extracted in the same order as they were stored. If the file names are omitted, then the stored names are used instead. PAQ7 never clobbers files. If the file to be extracted already exists, then PAQ7 compares it with the archived copy and reports whether they differ. Example: paq7 xyz foo bar Compresses foo and bar to archive xyz. paq7 xyz baz Extracts baz as a copy of foo, compares bar. If PAQ7 is run under UNIX or Linix, or compiled with g++ for Windows, then wildcards may be used. You can also enter the file names, one per line to standard input ending with a blank line or EOF. Example: dir/b *.txt | paq7 archive is equivalent to: paq7 archive *.txt The archive has a human-readable header. To view the stored file names and sizes: more < archive PAQ7 cannot update an archive. You must create a new one instead. You cannot extract a subset of the stored files, but you can interrupt it with ^C after the files you want are extracted. Neither the archive nor the total size of all files can be larger than 2 gigabytes, even if your file system supports such large files. File names (including path) cannot be longer than 511 characters or contain carriage returns, linefeeds, ^Z (ASCII 26) or NUL (ASCII 0). If any file names contain 8-bit (European) extended ASCII, then do not pipe from dir/b, use wildcards. File names with spaces on the command line should be quoted "like this". TO COMPILE MINGW g++: nasm paq7asm.asm -f win32 --prefix _ g++ paq7.cpp -O2 -s -o paq7.exe paq7asm.obj DJGPP g++: nasm paq7asm.asm -f coff --prefix _ gxx paq7.cpp -O2 -s -o paq7.exe paq7asm.o Borland: nasm paq7asm.asm -f obj --prefix _ bcc32 -O -w-8027 paq7.cpp paq7asm.obj Mars: nasm paq7asm.asm -f obj --prefix _ dmc paq7.cpp paq7asm.obj UNIX/Linux (PC): nasm -f elf paq7asm.asm g++ paq7.cpp -O2 -s -o paq7 paq7asm.o Non PC (UNIX): g++ paq7.cpp -O2 -DNOASM -s -o paq7 The program runs a little faster compiled under g++ than Borland or Mars. I did not test with Microsoft or Intel C++. Either nasm "-f obj" or "-f win32" should link. Always use "--prefix _" under Windows. Some older versions of Linux would use "-f aout" instead of "-f elf". You can compile without NASM in any of the above examples by adding -DNOASM, but it will run about 1/3 slower, so you really should get it. (http://nasm.sourceforge.net). It is easy to install, just one executable program put in your path. The NASM code requires a Pentium-MMX or higher, so it should work on any Windows or Linux PC made since about 1997. ARCHIVE FILE FORMAT An archive has the following format. It is intended to be both human and machine readable. The header ends with CTRL-Z (Windows EOF) so that the binary compressed data is not displayed on the screen. paq7 CR LF size TAB filename CR LF size TAB filename CR LF ... CTRL-Z compressed binary data Decompression will fail if the first 4 bytes are not "paq7". Sizes are stored as decimal numbers. CR, LF, TAB, CTRL-Z are ASCII codes 13, 10, 9, 26 respectively. The binary data is arithmetic coded as the shortest base 256 fixed point number x = SUM_i x_i 256^-1-i such that p(= 8. Furthermore there are no states where both n0 and n1 are large: n0 < 64. n1 < 64. If n0 = 1 then n1 < 32. If n1 = 1 then n0 < 32. If n0 = 2 then n1 < 16. If n1 = 2 then n0 < 16. If n0 = 3 then n1 < 8. If n1 = 3 then n0 < 8. The primaty output is t_i := stretch(sm(n0,n1,h)), where sm(.) is a stationary map with K = 1/256, initiaized to sm(n0,n1,h) = (n1+1)/(n+2). Four additional inputs may also be computed to improve compression slightly: p1_i = sm(n0,n1,h) p0_i = 1 - p1_i t_i := stretch(p_1) t_i+1 := K1 (p1_i - p0_i) t_i+2 := K2 (-p0_i if n1 = 0, p1_i if n0 = 0, else 0) t_i+3 := K3 (t_i if n1 = 0, -t_i if n0 = 0, else 0) t_i+4 := K4 (-p0_i if lb(h) = 0, p1_i if lb(h) = 1) where K1..K4 are ad-hoc constants, and lb(h) is the last bit as stored in h. The update rule discards counts from n1 when y_j = 0 and vice versa. If n1 > 2 then n1 := 2 log2(n1) and vice versa. This rule tends to favor newer data over old. If an increment leads to a nonexistent state, then go to the nearest state with about the same ratio (n0+1/4)/(n1+1/4). For example, state (7,3) on bit 0 would lead to nonexistent state (8,3), so go to (5,2) instead. State (31,1) on 0 would go to nonexistent state (32,1) so go to (8,0) instead. h is updated as follows: for n < 4, store y_j in h. For n < 8, set h := y_j. For n >= 8, h = 0. - Match Model. The state is (c,b), initially (0,0), where c is 1 if the context was previously seen, else 0, and b is the next bit in this context. The prediction is: t_i := (2b - 1)Kc log(m + 1) where m is the length of the context. The update rule is c := 1, b := y_j. A match model can be implemented efficiently by storing input in a buffer and storing pointers into the buffer into a hash table indexed by context. Then c is indicated by a hash table entry and b can be retrieved from the buffer. CONTEXTS High compression is achieved by combining a large number of contexts. Most (not all) contexts start on a byte boundary and end on the bit immediately preceding the predicted bit. The contexts below are modeled with both a run map and a nonstationary map unless indicated. - Order n. The last n bytes, up to about 16. For general purpose data. Most of the compression occurs here for orders up to about 6. An order 0 context includes only the 0-7 bits of the partially coded byte and the number of these bits (255 possible values). - Sparse. Usually 1 or 2 of the last 8 bytes preceding the byte containing the predicted bit, e.g (2), (3),..., (8), (1,3), (1,4), (1,5), (1,6), (2,3), (2,4), (3,6), (4,8). The ordinary order 1 and 2 context, (1) or (1,2) are included above. Useful for binary data. - Text. Contexts consists of whole words (a-z, converted to lower case and skipping other values). Contexts may be sparse, e.g (0,2) meaning the current (partially coded) word and the second word preceding the current one. Useful contexts are (0), (0,1), (0,1,2), (0,2), (0,3), (0,4). The preceding byte may or may not be included as context in the current word. - Formatted text. The column number (determined by the position of the last linefeed) is combined with other contexts: the charater to the left and the character above it. - Fixed record length. The record length is determined by searching for byte sequences with a uniform stride length. Once this is found, then the record length is combined with the context of the bytes immediately preceding it and the corresponding byte locations in the previous one or two records (as with formatted text). - Context gap. The distance to the previous occurrence of the order 1 or order 2 context is combined with other low order (1-2) contexts. - FAX. For 2-level bitmapped images. Contexts are the surrounding pixels already seen. Image width is assumed to be 1728 bits (as in calgary/pic). - Image. For uncompressed 24-bit color BMP and TIFF images. Contexts are the high order bits of the surrounding pixels and linear combinations of those pixels, including other color planes. The image width is detected from the file header. When an image is detected, other models are turned off to improve speed. - JPEG. Files are further compressed by partially uncompressing back to the DCT coefficients to provide context for the next Huffman code. Only baseline DCT-Huffman coded files are modeled. (This ia about 90% of images, the others are usually progresssive coded). JPEG images embedded in other files (quite common) are detected by headers. The baseline JPEG coding process is: - Convert to grayscale and 2 chroma colorspace. - Sometimes downsample the chroma images 2:1 or 4:1 in X and/or Y. - Divide each of the 3 images into 8x8 blocks. - Convert using 2-D discrete cosine transform (DCT) to 64 12-bit signed coefficients. - Quantize the coefficients by integer division (lossy). - Split the image into horizontal slices coded independently, separated by restart codes. - Scan each block starting with the DC (0,0) coefficient in zigzag order to the (7,7) coefficient, interleaving the 3 color components in order to scan the whole image left to right starting at the top. - Subtract the previous DC component from the current in each color. - Code the coefficients using RS codes, where R is a run of R zeros (0-15) and S indicates 0-11 bits of a signed value to follow. (There is a special RS code (EOB) to indicate the rest of the 64 coefficients are 0). - Huffman code the RS symbol, followed by S literal bits. The most useful contexts are the current partially coded Huffman code (including S following bits) combined with the coefficient position (0-63), color (0-2), and last few RS codes. - Match. When a context match of 400 bytes or longer is detected, the next bit of the match is predicted and other models are turned off to improve speed. ARCHITECTURE The context models are mixed by up to 3 of several hundred neural networks selected by a low-order context. The outputs of these networks are combined by weighted averaging with fixed weights, then fed through two stages of adaptive probability maps (APM) before arithmetic coding. The three neural networks are selected by the following contexts: - NN1: Order 0: the current (partially coded) byte, and the number of bits coded (0-7). - NN2: The previous whole byte (does not include current byte). - NN3: The second from last byte. - NN4: The file type (image, long match, or normal), order (0-7), 3 high bits of the previous byte, and whether the last 2 bytes are equal. The outputs are combined by another neural network with the 4 predicitons as inputs. For images and long matches, only one neural network is used and its context is fixed. An APM is a stationary map combining a context and an input probability. The input probability is stretched and divided into 32 segments to combine with other contexts. The output is interpolated between two adjacent quantized values of stretch(p1). There are 2 APM stages in series: p1 := (p1 + 3 APM(order 0, p1)) / 4. p1 := (APM(order 1, p1) + 2 APM(order 2, p1) + APM(order 3, p1)) / 4. IMPLEMENTATION Hash tables are designed to minimize cache misses, which consume most of the CPU time. Most of the memory is used by the nonstationary context models. Contexts are hashed to 32 bits. Hash tables use the low order bits to index a table and an 8 bit checksum of the hash to detect collisions, which have a 1/256 chance of a false match. A context hash is updated every 4 bits to index a table entry containing 15 states (1 byte each) for each of the possible combinations of 0-3 bits that follow. If a collision is detected, 4 adjacent elements within the same 64-byte block are tested (trying within a 32-byte block first). Hash elements are aligned on 64 byte address boundaries so as not to cross cache lines. If all 4 hash elements are full, then the state of the first context (the one on the 4 bit boundary) is used as a priority to select a replacement. The states are ordered by increasing n so that the least used state is replaced first. A zero state is replaced immediately. Run models use 3 byte hash elements consisting of a checksum, a repeat count (0-255) and the byte value. The count also serves as a priority. Stationary models are most appropriate for small contexts, so the context is used as a direct table lookup without hashing. The match model maintains a pointer to the last match until a mismatching bit is found. At the start of the next byte, the hash table is referenced to find another match. The hash table of pointers is updated after each whole byte. There is no checksum. Collisions are detected by comparing the current and matched context in a rotating buffer. The inner loops of the neural network prediction (1) and training (2) algorithms are implemented in MMX assembler, which computes 4 elements at a time. Using assembler is 8 times faster than C++ for this code and 1/3 faster overall. (However I found that SSE2 code on an AMD-64, which computes 8 elements at a time, is not any faster). */ #define PROGNAME "paq7" // Please change this if you change the program. #include #include #include #include #include #include #define NDEBUG // uncomment for debugging (turns on Array bound checks) #include namespace std {} using namespace std; // 8, 16, and 32 bit unsigned types (adjust as appropriate) typedef unsigned char U8; typedef unsigned short U16; typedef unsigned int U32; // min, max functions template inline T max(const T& a, const T& b) {return a inline T min(const T& a, const T& b) {return a clock_t start_time; // in ticks public: void alloc(int n) {memused+=n;} // report memory allocated ProgramChecker(): memused(0) { start_time=clock(); assert(sizeof(U8)==1); assert(sizeof(U16)==2); assert(sizeof(U32)==4); assert(sizeof(short)==2); assert(sizeof(int)==4); } ~ProgramChecker() { printf("Time %1.2f sec, memory %d bytes\n\n", double(clock()-start_time)/CLOCKS_PER_SEC, memused); } } programChecker; //////////////////////////// Array //////////////////////////// // Array a(n); creates n elements of T initialized to 0 bits. // Constructors for T are not called. // Indexing is bounds checked if assertions are on. // a.size() returns n. // Copy, assignment, resize are not supported. // Memory is aligned on a 64 byte boundary. template class Array { int n; // size char *ptr; // allocated memory, zeroed T* data; // start of n elements of aligned data public: explicit Array(int i); // ~Array() {free(ptr);} T& operator[](int i) { #ifndef NDEBUG if (i<0 || i>=n) fprintf(stderr, "%d out of bounds %d\n", i, n), exit(1); #endif return data[i]; } const T& operator[](int i) const { #ifndef NDEBUG if (i<0 || i>=n) fprintf(stderr, "%d out of bounds %d\n", i, n), exit(1); #endif return data[i]; } int size() const {return n;} private: Array(const Array&); // no copy or assignment Array& operator=(const Array&); }; template Array::Array(int i): n(i) { const int sz=64+n*sizeof(T); programChecker.alloc(sz); ptr = (char*)calloc(sz, 1); if (!ptr) fprintf(stderr, "Out of memory\n"), exit(1); data = (T*)(ptr+64-(((long)ptr)&63)); } ////////////////////////////// Buf ///////////////////////////// // Buf(n) buf; creates an array of n bytes (must be a power of 2). // buf[i] returns a reference to the i'th byte with wrap (no out of bounds). // buf(i) returns i'th byte back from pos (i > 0) // buf.size() returns n. int pos; // Number of input bytes in buf (not wrapped) class Buf { Array b; public: Buf(int i): b(i) {assert((i&(i-1))==0);} U8& operator[](int i) {return b[i & (b.size()-1)];} int operator()(int i) const { assert(i>0); return b[(pos-i) & (b.size()-1)]; } int size() const {return b.size();} }; //////////////////////////// context /////////////////////////// // Global context. This information is available to all models. int MEM=0x100000; // Memory usage scale (power of 2), set by command option int y=0; // Last bit, 0 or 1, set by encoder int c0=1; // Last 0-7 bits of the partial byte with a leading 1 bit (1-255) U32 c4=0; // Last 4 whole bytes, packed. Last byte is bits 0-7. int bpos=0; // bits in c0 (0 to 7) Buf buf(MEM*8); // Rotating input queue ///////////////////////////// ilog ////////////////////////////// // ilog(x) = round(log2(x) * 16), 0 <= x < 64K class Ilog { Array t; public: int operator()(U16 x) const {return t[x];} Ilog(); } ilog; // Compute lookup table by numerical integration of 1/x Ilog::Ilog(): t(65536) { U32 x=14155776; for (int i=2; i<65536; ++i) { x+=774541002/(i*2-1); // numerator is 2^29/ln 2 t[i]=x>>24; } } // llog(x) accepts 32 bits inline int llog(U32 x) { if (x>=0x1000000) return 256+ilog(x>>16); else if (x>=0x10000) return 128+ilog(x>>8); else return ilog(x); } ///////////////////////// state table //////////////////////// // State table: // nex(state, 0) = next state if y = 0, 0 <= state < 256 // nex(state, 1) = next state if y = 1 // nex(state, 2) = number of zeros in bit history represented by state // nex(state, 3) = number of ones represented // States represent a bit history within some context. // State 0 is the starting state (no bits seen). // States 1-30 represent all possible sequences of 1-4 bits. // States 31-255 represent a pair of counts, (n0,n1), the number // of 0 and 1 bits respectively. // For states 31-67 where n0+n1 < 8 and n0 > 0 and n1 > 0, there // are two states for each (n0,n1) pair depending on whether // the last bit was a 0 or 1. // There are no states where both n0 and n1 are large: // n0 < 64. n1 < 64. // If n0 = 1 then n1 < 32. If n1 = 1 then n0 < 32. // If n0 = 2 then n1 < 16. If n1 = 2 then n0 < 16. // If n0 = 3 then n1 < 8. If n1 = 3 then n0 < 8. // On bit 0, n0 is incremented and if n1 > 2 then n1 := 2 log2(n1). // (This rule tends to favor newer data over old). // On bit 1, then n1 is incremented and n0 is reduced as above. // If an increment leads to a nonexistent state, then go to the // nearest state with about the same ratio (n0+1/4)/(n1+1/4). // For example, state (7,3) on bit 0 would lead to nonexistent // state (8,3), so go to (5,2) instead. // State (31,1) on 0 would go to nonexistent state (32,1) so // go to (8,0) instead. // State -> last bit: 1 = last bit is 0, 2 = last bit is 1, 0=unknown static const U8 LB[256] = { 0, 1,2, 1,2,1,2, 1,1,1,2,1,2,2,2, 1,1,1,1,2,1,1,1,2,2,2,1,2,2,2,2, 1,1,2,1,2,1,2,1,2,2, 1,1,2,1,2,1,2,1,2,1,2,2, 1,1,2,1,2,1,2,1,2,1,2,1,2,2}; // Initialize nex() class StateTable { Array ns; // state*4 -> next state if 0, if 1, n0, n1 enum {B=5, N=64}; static const int b[B]; // n0 -> max n1, n1 -> max n0 static int t[N+1][N+1][2]; // n0,n1 -> state number, number of states int num_states(int x, int y); void discount(int& x); void next_state(int& x, int& y, int b); public: int operator()(int state, int sel) {return ns[state*4+sel];} StateTable(); } nex; const int StateTable::b[B]={N,32,16,8,5}; // n0 -> max n1, n1 -> max n0 int StateTable::t[N+1][N+1][2]; int StateTable::num_states(int x, int y) { if (x=N || y>=N || y>=B || x>=b[y]) return 0; // States 0-30 are a history of the last 0-4 bits if (x+y<=4) { // x+y choose x = (x+y)!/x!y! int r=1; for (int i=x+1; i<=x+y; ++i) r*=i; for (int i=2; i<=y; ++i) r/=i; return r; } // States 31-255 represent a 0,1 count and possibly the last bit // if the state is reachable by either a 0 or 1. else return 1+(y>0 && x+y<8); } // New value of count x if the opposite bit is observed void StateTable::discount(int& x) { if (x>2) x=ilog(x) >> 3; } // compute next x,y (0 to N) given input b (0 or 1) void StateTable::next_state(int& x, int& y, int b) { if (x0) ns1+=t[x-1][y+1][1]; ns[state*4]=ns0; ns[state*4+1]=ns1; ns[state*4+2]=x; ns[state*4+3]=y; } else if (t[x][y][1]) { next_state(x0, y0, 0); next_state(x1, y1, 1); ns[state*4]=ns0=t[x0][y0][0]; ns[state*4+1]=ns1=t[x1][y1][0]+(t[x1][y1][1]>1); ns[state*4+2]=x; ns[state*4+3]=y; } assert(state>=0 && state<256); assert(t[x][y][1]>0); assert(t[x][y][0]<=state); assert(t[x][y][0]+t[x][y][1]>state); assert(t[x][y][1]<=6); assert(t[x0][y0][1]>0); assert(t[x1][y1][1]>0); assert(ns0-t[x0][y0][0]=0); assert(ns1-t[x1][y1][0]=0); ++state; } } } } ///////////////////////////// Squash ////////////////////////////// // return p = 1/(1 + exp(-d)), d scaled by 8 bits, p scaled by 12 bits int squash(int d) { static const int t[33]={ 1,2,3,6,10,16,27,45,73,120,194,310,488,747,1101, 1546,2047,2549,2994,3348,3607,3785,3901,3975,4022, 4050,4068,4079,4085,4089,4092,4093,4094}; if (d>2047) return 4095; if (d<-2047) return 0; int w=d&127; d=(d>>7)+16; return (t[d]*(128-w)+t[(d+1)]*w+64) >> 7; } //////////////////////////// Stretch /////////////////////////////// // Inverse of squash. d = ln(p/(1-p)), d scaled by 8 bits, p by 12 bits. // d has range -2047 to 2047 representing -8 to 8. p has range 0 to 4095. class Stretch { Array t; public: Stretch(); int operator()(int p) const { assert(p>=0 && p<4096); return t[p]; } } stretch; Stretch::Stretch(): t(4096) { int pi=0; for (int x=-2047; x<=2047; ++x) { // invert squash() int i=squash(x); for (int j=pi; j<=i; ++j) t[j]=x; pi=i+1; } t[4095]=2047; } //////////////////////////// Mixer ///////////////////////////// // Mixer m(N, M, S) combines models using M neural networks with // N inputs each, of which up to S may be selected. If S > 1 then // the outputs of these neural networks are combined using another // neural network (with parameters S, 1, 1). If S = 1 then the // output is direct. It is used as follows: // m.update() trains the network where the expected output is the // last bit (in the global variable y). // m.add(stretch(p)) inputs prediction from one of N models. The // prediction should be positive to predict a 1 bit, negative for 0, // nominally +-256 to +-2K. The maximum allowed value is +-32K but // using such large values may cause overflow if N is large. // m.set(cxt, range) selects cxt as one of 'range' neural networks to // use. 0 <= cxt < range. Should be called up to S times such // that the total of the ranges is <= M. // m.p() returns the output prediction that the next bit is 1 as a // 12 bit number (0 to 4095). // dot_product returns dot product t*w of n elements. n is rounded // up to a multiple of 8. Result is scaled down by 8 bits. #ifdef NOASM // no assembly language int dot_product(short *t, short *w, int n) { int sum=0; n=(n+7)&-8; for (int i=0; i> 8; return sum; } #else // The NASM version uses MMX and is about 8 times faster. extern "C" int dot_product(short *t, short *w, int n); // in NASM #endif // Train neural network weights w[n] given inputs t[n] and err. // w[i] += t[i]*err, i=0..n-1. t, w, err are signed 16 bits (+- 32K). // err is scaled 16 bits (representing +- 1/2). w[i] is clamped to +- 32K // and rounded. n is rounded up to a multiple of 8. #ifdef NOASM void train(short *t, short *w, int n, int err) { n=(n+7)&-8; for (int i=0; i>16)+1>>1); if (wt<-32768) wt=-32768; if (wt>32767) wt=32767; w[i]=wt; } } #else extern "C" void train(short *t, short *w, int n, int err); // in NASM #endif class Mixer { const int N, M, S; // max inputs, max contexts, max context sets Array tx; // N inputs from add() Array wx; // N*M weights Array cxt; // S contexts int ncxt; // number of contexts (0 to S) int base; // offset of next context int nx; // Number of inputs in tx, 0 to N Array pr; // last result (scaled 12 bits) Mixer* mp; // points to a Mixer to combine results public: Mixer(int n, int m, int s=1); // Adjust weights to minimize coding cost of last prediction void update() { for (int i=0; i=-32768 && err<32768); train(&tx[0], &wx[cxt[i]*N], nx, err); } nx=base=ncxt=0; } // Input x (call up to N times) void add(int x) { assert(nx=0 && ncxt=0 && base+cxupdate(); for (int i=0; i>5); mp->add(stretch(pr[i])); } mp->set(0, 1); return mp->p(); } else { // S=1 context return pr[0]=squash(dot_product(&tx[0], &wx[0], nx)>>8); } } }; Mixer::Mixer(int n, int m, int s): N((n+7)&-8), M(m), S(s), tx(N), wx(N*M), cxt(S), ncxt(0), base(0), nx(0), pr(S), mp(0) { assert(N>0 && (N&7)==0 && M>0); for (int i=0; i1) mp=new Mixer(S, 1, 1); } //////////////////////////// APM ////////////////////////////// // A ProbMap map maps a probability and a context into a new probability // that bit y will next be 1. After each guess it updates // its state to improve future guesses. Methods: // // ProbMap pm(N) creates with N contexts, uses 66*N bytes memory. // pm.p(pr, cx, rate=8) returned adjusted probability in context cx (0 to // N-1). rate determines the learning rate (smaller = faster, default 8). // Probabilities are scaled 12 bits (0-4095). class ProbMap { private: int index; // last p, context protected: const int N; // number of contexts Array t; // [N][33]: p, context -> p public: ProbMap(int n): index(0), N(n), t(n*33) {} int p(int pr=2048, int cxt=0, int rate=8) { assert(pr>=0 && pr<4096 && cxt>=0 && cxt0 && rate<32); pr=stretch(pr); int g=(y<<16)+(y<> rate; t[index+1] += g-t[index+1] >> rate; const int w=pr&127; // interpolation weight (33 points) index=((pr+2048) >> 7)+cxt*33; return (t[index]*(128-w)+t[index+1]*w) >> 11; } }; // An APM is a ProbMap initialized such that p(pr, cxt) = pr for all cxt. class APM: public ProbMap { public: APM(int n); }; APM::APM(int n): ProbMap(n) { // initialize t[p] -> p for (int i=0; i probability * 256 class StateMap { int cxt; protected: Array t; // 256 states -> probability * 64K public: StateMap(): cxt(0), t(256) { for (int i=0; i<256; ++i) t[i] = 65536*(nex(i,3)+1)/(nex(i,2)+nex(i,3)+2); } int p(int cx) { assert(cx>=0 && cx<256); t[cxt]+=(y<<16)-t[cxt]+128 >> 8; return t[cxt=cx] >> 4; } }; //////////////////////////// hash ////////////////////////////// // Hash 2-5 ints. inline U32 hash(U32 a, U32 b, U32 c=~0, U32 d=~0, U32 e=~0) { U32 h=a*200002979u+b*30005491u+c*50004239u+d*70004807u+e*110002499u; return h^h>>9^a>>2^b>>3^c>>4^c>>5^e>>6; } ///////////////////////////// BH //////////////////////////////// // A BH maps a 32 bit hash to an array of B bytes (checksum and B-1 values) // // BH bh(N); creates N element table with B bytes each. // N must be a power of 2. The first byte of each element is // reserved for a checksum to detect collisions. The remaining // B-1 bytes are values, prioritized by the first value. This // byte is 0 to mark an unused element. // // bh[i] returns a pointer to the i'th element, such that // bh[i][0] is a checksum of i, bh[i][1] is the priority, and // bh[i][2..B-1] are other values (0-255). // The low lg(n) bits as an index into the table. // If a collision is detected, up to M nearby locations in the same // cache line are tested and the first matching checksum or // empty element is returned. // If no match or empty element is found, then the lowest priority // element is replaced. template class BH { enum {M=4}; // search limit Array t; // elements const U32 n; // size-1 int replaced; public: BH(int i): t(i*B), n(i-1), replaced(0) { assert(B>=2 && i>0 && (i&(i-1))==0); // size a power of 2? } U8* operator[](U32 i) { int chk=(i>>24^i>>12^i)&255; i&=n; int bi=i, b=1024; // best replacement so far U8 *p; for (int j=0; j BH::~BH() { /* int empty=0, once=0; for (int i=1; i %d empty, %d once, %d replaced of %d\n", B, empty, once, replaced, t.size()/B); */ } /////////////////////////// ContextMap ///////////////////////// // // A ContextMap maps contexts to a bit histories and makes predictions // to a Mixer. Methods common to all classes: // // ContextMap cm(M); creates using about M bytes (a power of 2). // cm.set(cx); sets the i'th whole-byte context to cx, // a 32 bit hash of a context ending on a byte boundary. // It should be called before predicting the first bit of each byte. // cm.mix(m) updates Mixer m with the next prediction. Returns 1 // if context cx is found, else 0. It should be called for every bit: // // if (bpos==0) cm.set(cxt). // cm.mix(m) // // The different types are as follows: // // - NonstationaryContextMap. Uses one byte per context (state map). // - SmallNonstationaryContextMap. The context is not a hash, but // must be 0 <= cx < M/256. // - RunContextMap. The bit history is a count of 0-255 consecutive // zeros or ones. Uses 3 bytes per whole byte context. // - ContextMap. Combines a NonstationaryContextMap and a RunContextMap. // - SmallContextMap. Combines a SmallNonstationaryContextMap and an // RunContextMap. // - SmallStationaryContextMap. The context is not hashed. 0 <= cx < M/512. // The state is a 16-bit probability that is adjusted after each // prediction. class NonstationaryContextMap { U32 cxt; // Whole byte context hash. U8* cp; // Pointer to bit-oriented context. BH<16> t; // Array of hash tables mapping context to bit history. StateMap sm; public: int bh() { // return bit history *cp=nex(*cp,y); cp=bpos&3?cp+(y+1<<((bpos&3)-1)):t[cxt^c0*9]+1; return *cp; } NonstationaryContextMap(int m): cxt(0), t(m/16) {cp=t[0]+1;} void set(U32 cx) {cxt=cx;} // Put prediction in m int mix(Mixer& m) { int s=bh(); int p1=sm.p(s); int n0=nex(s,2); int n1=nex(s,3); int st=stretch(p1)>>2; m.add(st); p1>>=4; int p0=255-p1; m.add(p1-p0); m.add(st*(!n1-!n0)); m.add((p0&-!n1)-(p1&-!n0)); m.add((-(LB[s]==1)&p1)-(-(LB[s]==2)&p0)); return s>0; } }; // Context is not hashed class SmallNonstationaryContextMap { U32 cxt; // Whole byte context. U8* cp; // Pointer to bit-oriented context. Array t; // Array of bit histories. StateMap sm; public: int bh() { // return bit history *cp=nex(*cp,y); cp=&t[cxt+c0]; return *cp; } SmallNonstationaryContextMap(int m): cxt(0), t(m/2) {cp=&t[0];} void set(U32 cx) {cxt=cx*256&t.size()-256;} // Put prediction in m int mix(Mixer& m) { int s=bh(); int p1=sm.p(s); int n0=nex(s,2); int n1=nex(s,3); int st=stretch(p1)>>2; m.add(st); p1>>=4; int p0=255-p1; m.add(p1-p0); m.add(st*(!n1-!n0)); m.add((p0&-!n1)-(p1&-!n0)); m.add((-(LB[s]==1)&p1)-(-(LB[s]==2)&p0)); return s>0; } }; // A RunContextMap maps a context into the next byte and a repeat // count up to M. Size should be a power of 2. Memory usage is 3M/4 class RunContextMap { BH<3> t; U8* cp; public: RunContextMap(int m): t(m/4) {cp=t[0]+1;} void set(U32 cx) { // update count if (cp[0]==0 || cp[1]!=buf(1)) cp[0]=1, cp[1]=buf(1); else if (cp[0]<255) ++cp[0]; cp=t[cx]+1; } int p() { // predict next bit if (cp[1]+256>>8-bpos==c0) return ((cp[1]>>7-bpos&1)*2-1)*ilog(cp[0]+1)*8; else return 0; } int mix(Mixer& m) { // return run length m.add(p()); return cp[0]!=0; } }; // ContextMap combines a NonstationaryContextMap and RunContextMap. // Memory usage is 19/16 M. class ContextMap { RunContextMap rcm; NonstationaryContextMap cm; public: ContextMap(int m): rcm(m/4), cm(m) {} void set(U32 cx) { rcm.set(cx); cm.set(cx); } int mix(Mixer& m) { rcm.mix(m); return cm.mix(m); } }; // Combines SmallNonstationaryMap and RunContextMap. // Suports M/512 byte contexts. Memory usage is M bytes. class SmallContextMap { RunContextMap rcm; SmallNonstationaryContextMap scm; public: SmallContextMap(int m): rcm(m/128), scm(m) {} void set(U32 cx) { rcm.set(cx); scm.set(cx); } int mix(Mixer& m) { scm.mix(m); return rcm.mix(m); } }; // Context is looked up directly. m=size is power of 2 in bytes. // Context should be < m/512. High bits are discarded. class SmallStationaryContextMap { Array t; int cxt; U16 *cp; public: SmallStationaryContextMap(int m): t(m/2), cxt(0) { assert((m/2&m/2-1)==0); // power of 2? for (int i=0; i> rate; cp=&t[cxt+c0]; m.add(stretch(*cp>>4)); } }; //////////////////////////// Models ////////////////////////////// // All of the models below take a Mixer as a parameter and write // predictions to it. //////////////////////////// matchModel /////////////////////////// // matchModel() finds the longest matching context and returns its length int matchModel(Mixer& m) { const int MAXLEN=2047; // longest allowed match + 1 static Array t(MEM); // hash table of pointers to contexts static int h=0; // hash of last 7 bytes static int ptr=0; // points to next byte of match if any static int len=0; // length of match, or 0 if no match static int result=0; if (!bpos) { h=h*997*8+buf(1)+1&t.size()-1; // update context hash if (len) ++len, ++ptr; else { // find match ptr=t[h]; if (ptr && pos-ptrMAXLEN) len=MAXLEN; int sgn; if (len && buf(1)==buf[ptr-1] && c0==buf[ptr]+256>>8-bpos) { if (buf[ptr]>>7-bpos&1) sgn=1; else sgn=-1; } else sgn=len=0; m.add(sgn*4*ilog(len)); m.add(sgn*64*min(len, 32)); return result; } //////////////////////////// picModel ////////////////////////// // Model a 1728 by 2376 2-color CCITT bitmap image, left to right scan, // MSB first (216 bytes per row, 513216 bytes total). Insert predictions // into m. void picModel(Mixer& m) { static U32 r0, r1, r2, r3; // last 5 rows, bit 8 is over current pixel static Array t(0x10200); // model: cxt -> state const int N=3; // number of contexts static int cxt[N]; // contexts static StateMap sm[N]; // update the model for (int i=0; i>(7-bpos))&1); r2+=r2+((buf(431)>>(7-bpos))&1); r3+=r3+((buf(647)>>(7-bpos))&1); cxt[0]=r0&0x7|r1>>4&0x38|r2>>3&0xc0; cxt[1]=0x100+(r0&1|r1>>4&0x3e|r2>>2&0x40|r3>>1&0x80); cxt[2]=0x200+(r0&0x3f^r1&0x3ffe^r2<<2&0x7f00^r3<<5&0xf800); // predict for (int i=0; i wpos(MEM); // last position of word static int nl1=-3, nl=-2; // previous, current newline position // Update word hashes if (bpos==0) { int c=c4&255; if (c>='A' && c<='Z') c+='a'-'A'; if (c>='a' && c<='z') word0^=hash(word0, c); else if (word0) { word4=word3; word3=word2; word2=word1; word1=word0; word0=0; } if (c==10) nl1=nl, nl=pos-1; int i=0; int col=min(255, pos-nl), above=buf[nl1+col]; // text column context U32 h=hash(++i, word0, buf(1)); int w=h&wpos.size()-1; cm1.set(h); cm2.set(word0); cm3.set(hash(++i, h, word1)); cm4.set(hash(++i, word0, word1)); cm5.set(hash(++i, h, word2)); cm6.set(hash(++i, h, word1, word2)); cm7.set(hash(++i, h, word3)); cm8.set(hash(++i, h, word4)); wcm.set(hash(++i, w, w>>8, llog(pos-wpos[w])>>4)); wpos[w]=pos; // Text column models lcm1.set(h=hash(++i, col, buf(1))); lcm2.set(h=hash(++i, buf(1), above)); lcm3.set(h=hash(++i, col, above)); } // Predict cm1.mix(m); cm2.mix(m); cm3.mix(m); cm4.mix(m); cm5.mix(m); cm6.mix(m); cm7.mix(m); cm8.mix(m); wcm.mix(m); lcm1.mix(m); lcm2.mix(m); lcm3.mix(m); } //////////////////////////// recordModel /////////////////////// // Model 2-D data with fixed record length. Also order 1-2 models // that include the distance to the last match. void recordModel(Mixer& m) { const int MAXRUN=1024; static Array run(MAXRUN); // candidate record lengths static Array cpos1(256), cpos2(256); // buf(1) -> last 2 positions static Array wpos1(0x10000); // last position of order 2 context static Array wpos3(MEM); // last position of order 3 hash static int rlen=2; // current, previous run length static ContextMap cm1(MEM), cm2(MEM), cm3(MEM), cm4(MEM), cm5(MEM), cm6(MEM); // Find record length if (!bpos) { int c=buf(1); int w=c4&0xffff; int r=pos-cpos1[c]; if (r1 && r==cpos1[c]-cpos2[c]) { if (++run[r] > 15) { rlen=r; memset(&run[0], 0, MAXRUN); } } // Set 2 dimensional contexts int col=pos%rlen; cm1.set(hash(1, rlen, buf(1), buf(rlen))); cm2.set(hash(2, rlen, buf(rlen), buf(rlen*2))); cm3.set(hash(3, rlen, buf(rlen), col)); cm4.set(hash(4, rlen, buf(1), col)); cm5.set(hash(5, buf(1), min(255, pos-cpos1[buf(1)]))); cm6.set(hash(6, buf(1), buf(2), llog(pos-wpos1[w])>>2)); // update last context positions cpos2[c]=cpos1[c]; cpos1[c]=pos; wpos1[w]=pos; } cm1.mix(m); cm2.mix(m); cm3.mix(m); cm4.mix(m); cm5.mix(m); cm6.mix(m); } //////////////////////////// sparseModel /////////////////////// // Model order 1-2 contexts with gaps. void sparseModel(Mixer& m) { static ContextMap cm13(MEM/4), cm14(MEM/4), cm15(MEM/4), cm16(MEM/4), cm23(MEM/4), cm24(MEM/4), cm36(MEM/4), cm48(MEM/4); static SmallContextMap scm1(0x20000), scm2(0x20000), scm3(0x20000), scm4(0x20000), scm5(0x20000), scm6(0x20000), scm7(0x20000), scm8(0x20000); if (bpos==0) { int i=0; cm13.set(hash(++i, buf(1), buf(3))); cm14.set(hash(++i, buf(1), buf(4))); cm15.set(hash(++i, buf(1), buf(5))); cm16.set(hash(++i, buf(1), buf(6))); cm23.set(hash(++i, buf(2), buf(3))); cm24.set(hash(++i, buf(2), buf(4))); cm36.set(hash(++i, buf(3), buf(6))); cm48.set(hash(++i, buf(4), buf(8))); scm1.set(buf(1)); scm2.set(buf(2)); scm3.set(buf(3)); scm4.set(buf(4)); scm5.set(buf(5)); scm6.set(buf(6)); scm7.set(buf(7)); scm8.set(buf(8)); } cm13.mix(m); cm14.mix(m); cm15.mix(m); cm16.mix(m); cm23.mix(m); cm24.mix(m); cm36.mix(m); cm48.mix(m); scm1.mix(m); scm2.mix(m); scm3.mix(m); scm4.mix(m); scm5.mix(m); scm6.mix(m); scm7.mix(m); scm8.mix(m); } //////////////////////////// bmpModel ///////////////////////////////// // Model a 24-bit color uncompressed .bmp or .tif file. Return // width in pixels if an image file is detected, else 0. // 32-bit little endian number at buf(i)..buf(i-3) inline U32 i4(int i) { return buf(i)+256*buf(i-1)+65536*buf(i-2)+16777216*buf(i-3); } // 16-bit inline int i2(int i) { return buf(i)+256*buf(i-1); } // Square buf(i) inline int sqrbuf(int i) {return buf(i)*buf(i);} int bmpModel(Mixer& m) { static int width=0, w=0; // width of image in pixels, bytes static int eoi=0; // end of image static U32 tiff=0; // offset of tif header const int SC=0x20000; static SmallStationaryContextMap scm1(SC), scm2(SC), scm3(SC), scm4(SC), scm5(SC), scm6(SC*2); static ContextMap cm1(MEM/2), cm2(MEM/2), cm3(MEM/2), cm4(MEM/2), cm5(MEM/2), cm6(0x40000), cm7(0x40000), cm8(MEM/2); // Detect .bmp file header (24 bit color, not compressed) if (!bpos && pos>=eoi && buf(54)=='B' && buf(53)=='M' && i4(44)==54 && i4(40)==40 && i4(24)==0) { const int iwidth=i4(36); // image width const int height=i4(32); eoi=pos+i4(20); // image size in bytes w=(eoi-pos)/height; // width in bytes width=w/3; // width includes pixels on right edge not displayed if (eoi-pos>=iwidth*height*3 && eoi-pos<=(iwidth+3)*height*3 && width>=1 && height>=1) { printf("BMP %dx%d ", width, height); } else eoi=pos; } // Detect .tif file header (24 bit color, not compressed). // Parsing is crude, won't work with weird formats. if (!bpos && pos>=eoi) { if (c4==0x49492a00) tiff=pos; // Intel format only if (pos-tiff==4 && c4!=0x08000000) tiff=0; // 8=normal offset to directory if (tiff && pos-tiff==200) { // most of directory should be read by now int dirsize=i2(pos-tiff-4); // number of 12-byte directory entries width=w=0; int bpp=0, compression=0, height=0; for (int i=tiff+6; i0; i+=12) { int tag=i2(pos-i); // 256=width, 257==height, 259: 1=no compression // 277=3 samples/pixel int tagfmt=i2(pos-i-2); // 3=short, 4=long int taglen=i4(pos-i-4); // number of elements in tagval int tagval=i4(pos-i-8); // 1 long, 1-2 short, or points to array if ((tagfmt==3||tagfmt==4) && taglen==1) { if (tag==256) width=tagval; if (tag==257) height=tagval; if (tag==259) compression=tagval; // 1 = no compression if (tag==277) bpp=tagval; // should be 3 } } if (width>0 && height>0 && width*height>50 && compression==1 && (bpp==1||bpp==3)) eoi=tiff+width*height*bpp, w=width*bpp; if (eoi>pos) printf("TIFF %dx%dx%d ", width, height, bpp); else width=tiff=w=0; } } if (pos>eoi) return w=width=0; // Select nearby pixels as context if (!bpos) { int color=pos%3; int mean=buf(3)+buf(w-3)+buf(w)+buf(w+3); const int var=sqrbuf(3)+sqrbuf(w-3)+sqrbuf(w)+sqrbuf(w+3)-mean*mean/4>>2; mean>>=2; const int logvar=ilog(var); int i=0; cm1.set(hash(++i, buf(3)>>2, buf(w)>>2, color)); cm2.set(hash(++i, buf(3)>>2, buf(1)>>2, color)); cm3.set(hash(++i, buf(3)>>2, buf(2)>>2, color)); cm4.set(hash(++i, buf(w)>>2, buf(1)>>2, color)); cm5.set(hash(++i, buf(w)>>2, buf(1)>>2, color)); cm6.set(hash(++i, buf(3)+buf(w)>>1, color)); cm7.set(hash(++i, buf(3)+buf(w)>>3, buf(1)>>5, buf(2)>>5, color)); cm8.set(hash(++i, mean, logvar>>5, color)); scm1.set(buf(3)+buf(w)>>1); scm2.set(buf(3)+buf(w)-buf(w+3)>>1); scm3.set(buf(3)*2-buf(6)>>1); scm4.set(buf(w)*2-buf(w*2)>>1); scm5.set(buf(3)+buf(w)-buf(w-3)>>1); scm6.set(mean>>1|logvar<<1&0x180); } // Predict next bit scm1.mix(m); scm2.mix(m); scm3.mix(m); scm4.mix(m); scm5.mix(m); scm6.mix(m); cm1.mix(m); cm2.mix(m); cm3.mix(m); cm4.mix(m); cm5.mix(m); cm6.mix(m); cm7.mix(m); cm8.mix(m); return w; } //////////////////////////// jpegModel ///////////////////////// // Model JPEG. Return 1 if a JPEG file is detected or else 0. // Only the baseline and 8 bit extended Huffman coded DCT modes are // supported. The model partially decodes the JPEG image to provide // context for the Huffman coded symbols. // Print a JPEG segment at buf[p...] for debugging void dump(const char* msg, int p) { printf("%s:", msg); int len=buf[p+2]*256+buf[p+3]; for (int i=0; i ht(8); // pointers to Huffman table headers static int htsize=0; // number of pointers in ht // Huffman decode state static U32 huffcode=0; // Current Huffman code including extra bits static int huffbits=0; // Number of valid bits in huffcode static int huffsize=0; // Number of bits without extra bits static int rs=-1; // Decoded huffcode without extra bits. It represents // 2 packed 4-bit numbers, r=run of zeros, s=number of extra bits for // first nonzero code. huffcode is complete when rs >= 0. // rs is -1 prior to decoding incomplete huffcode. static int mcupos=0; // position in MCU (0-639). The low 6 bits mark // the coefficient in zigzag scan order (0=DC, 1-63=AC). The high // bits mark the block within the MCU, used to select Huffman tables. // Decoding tables static Array huf(128); // Tc*64+Th*16+m -> min, max, val static int mcusize=0; // number of coefficients in an MCU static int linesize=0; // width of image in MCU static int hufsel[2][10]; // DC/AC, mcupos/64 -> huf decode table // Image state static Array color(10); // block -> component (0-3) static Array pred(4); // component -> last DC value static int dc=0; // DC value of the current block static int width=0; // Image width in MCU static int row=0, column=0; // in MCU (column 0 to width-1) static Buf cbuf(0x20000); // Rotating buffer of coefficients, coded as: // DC: level shifted absolute value, low 4 bits discarded, i.e. // [-1023...1024] -> [0...255]. // AC: as an RS code: a run of R (0-15) zeros followed by an S (0-15) // bit number, or 00 for end of block (in zigzag order). // However if R=0, then the format is ssss11xx where ssss is S, // xx is the first 2 extra bits, and the last 2 bits are 1 (since // this never occurs in a valid RS code). static int cpos=0; // position in cbuf static U32 huff1=0, huff2=0, huff3=0, huff4=0; // hashes of last codes static int rs1, rs2, rs3, rs4; // last 4 RS codes static int ssum=0, ssum1=0, ssum2=0, ssum3=0, ssum4=0; // sum of S in RS codes in block and last 4 values // Be sure to quit on a byte boundary if (!bpos) next_jpeg=jpeg; if (bpos && !jpeg) return next_jpeg; if (!bpos && app>0) --app; if (app>0) return next_jpeg; if (!bpos) { // Parse. Baseline DCT-Huffman JPEG syntax is: // SOI APPx... misc... SOF0 DHT... SOS data EOI // SOI (= FF D8) start of image. // APPx (= FF Ex) len ... where len is always a 2 byte big-endian length // including the length itself but not the 2 byte preceding code. // Application data is ignored. There may be more than one APPx. // misc codes are DQT, DNL, DRI, COM (ignored). // SOF0 (= FF C0) len 08 height width Nf [C HV Tq]... // where len, height, width (in pixels) are 2 bytes, Nf is the repeat // count (1 byte) of [C HV Tq], where C is a component identifier // (color, 0-3), HV is the horizontal and vertical dimensions // of the MCU (high, low bits, packed), and Tq is the quantization // table ID (not used). An MCU (minimum compressin unit) consists // of 64*H*V DCT coefficients for each color. // DHT (= FF C4) len [TcTh L1...L16 V1,1..V1,L1 ... V16,1..V16,L16]... // defines Huffman table Th (1-4) for Tc (0=DC (first coefficient) // 1=AC (next 63 coefficients)). L1..L16 are the number of codes // of length 1-16 (in ascending order) and Vx,y are the 8-bit values. // A V code of RS means a run of R (0-15) zeros followed by S (0-15) // additional bits to specify the next nonzero value, negative if // the first additional bit is 0 (e.g. code x63 followed by the // 3 bits 1,0,1 specify 7 coefficients: 0, 0, 0, 0, 0, 0, 5. // Code 00 means end of block (remainder of 63 AC coefficients is 0). // SOS (= FF DA) len Ns [Cs TdTa]... 0 3F 00 // Start of scan. TdTa specifies DC/AC Huffman tables (0-3, packed // into one byte) for component Cs matching C in SOF0, repeated // Ns (1-4) times. // EOI (= FF D9) is end of image. // Huffman coded data is between SOI and EOI. Codes may be embedded: // RST0-RST7 (= FF D0 to FF D7) mark the start of an independently // compressed region. // DNL (= FF DC) 04 00 height // might appear at the end of the scan (ignored). // FF 00 is interpreted as FF (to distinguish from RSTx, DNL, EOI). // Detect JPEG (SOI, APPx) if (!jpeg && buf(4)==FF && buf(3)==SOI && buf(2)==FF && buf(1)>>4==0xe) { jpeg=1; app=sos=sof=htsize=data=mcusize=linesize=0; huffcode=huffbits=huffsize=mcupos=cpos=0, rs=-1; memset(&huf[0], 0, huf.size()*sizeof(HUF)); memset(&pred[0], 0, pred.size()*sizeof(int)); } // Detect end of JPEG when data contains a marker other than RSTx // or byte stuff (00). if (jpeg && data && buf(2)==FF && buf(1) && (buf(1)&0xf8)!=RST0) { jassert(buf(1)==EOI); jpeg=0; } if (!jpeg) return next_jpeg; // Detect APPx or COM field if (!data && !app && buf(4)==FF && (buf(3)>>4==0xe || buf(3)==COM)) app=buf(2)*256+buf(1)+2; // Save pointers to sof, ht, sos, data, if (buf(5)==FF && buf(4)==SOS) { int len=buf(3)*256+buf(2); if (len==6+2*buf(1) && buf(1) && buf(1)<=4) // buf(1) is Ns sos=pos-5, data=sos+len+2; } if (buf(4)==FF && buf(3)==DHT && htsize<8) ht[htsize++]=pos-4; if (buf(4)==FF && buf(3)==SOF0) sof=pos-4; // Restart if (buf(2)==FF && (buf(1)&0xf8)==RST0) { huffcode=huffbits=huffsize=mcupos=0, rs=-1; memset(&pred[0], 0, pred.size()*sizeof(int)); } } { // Build Huffman tables // huf[Tc][Th][m] = min, max+1 codes of length m, pointer to byte values if (pos==data && bpos==1) { jassert(htsize>0); for (int i=0; i>4, th=buf[p]&15; if (tc>=2 || th>=4) break; jassert(tc>=0 && tc<2 && th>=0 && th<4); HUF* h=&huf[tc*64+th*16]; // [tc][th][0]; int val=p+17; // pointer to values int code=0; for (int j=0; j<16; ++j) { h[j].min=code; h[j].max=code+=buf[p+j+1]; h[j].val=val; val+=buf[p+j+1]; code*=2; } p=val; } jassert(p==end); } huffcode=huffbits=huffsize=0, rs=-1; // Build Huffman table selection table (indexed by mcupos). // Get image width. if (!sof && sos) return next_jpeg; int ns=buf[sos+4]; int nf=buf[sof+9]; jassert(ns<=4 && nf<=4); mcusize=0; // blocks per MCU int hmax=0; // MCU horizontal dimension for (int i=0; i>4>hmax) hmax=hv>>4; hv=(hv&15)*(hv>>4); // number of blocks in component C jassert(hv>=1 && hv+mcusize<=10); while (hv) { jassert(mcusize<10); hufsel[0][mcusize]=buf[sos+2*i+6]>>4&15; hufsel[1][mcusize]=buf[sos+2*i+6]&15; jassert (hufsel[0][mcusize]<4 && hufsel[1][mcusize]<4); color[mcusize]=i; --hv; ++mcusize; } } } } jassert(hmax>=1 && hmax<=10); width=buf[sof+7]*256+buf[sof+8]; // in pixels int height=buf[sof+5]*256+buf[sof+6]; printf("JPEG %dx%d ", width, height); width=(width-1)/(hmax*8)+1; // in MCU jassert(width>0); mcusize*=64; // coefficients per MCU row=column=0; } } // Decode Huffman { if (mcusize && buf(1+(!bpos))!=FF) { // skip stuffed byte jassert(huffbits<=32); huffcode+=huffcode+y; ++huffbits; if (rs<0) { jassert(huffbits>=1 && huffbits<=16); const int ac=(mcupos&63)>0; jassert(mcupos>=0 && (mcupos>>6)<10); jassert(ac==0 || ac==1); const int sel=hufsel[ac][mcupos>>6]; jassert(sel>=0 && sel<4); const int i=huffbits-1; jassert(i>=0 && i<16); const HUF *h=&huf[ac*64+sel*16]; // [ac][sel]; jassert(h[i].min<=h[i].max && h[i].val0); if (huffcode=h[i].min); rs=buf[h[i].val+huffcode-h[i].min]; huffsize=huffbits; } } if (rs>=0) { if (huffsize+(rs&15)==huffbits) { // done decoding huff4=huff3; huff3=huff2; huff2=huff1; huff1=hash(huffcode, huffbits); rs4=rs3; rs3=rs2; rs2=rs1; rs1=rs; int x=0; // decoded extra bits if (mcupos&63) { // AC if (rs==0) { // EOB mcupos=mcupos+63&-64; jassert(mcupos>=0 && mcupos<=mcusize && mcupos<=640); while (cpos&63) cbuf[cpos++]=0; } else { // rs = r zeros + s extra bits for the next nonzero value // If first extra bit is 0 then value is negative. jassert((rs&15)<=10); const int r=rs>>4; const int s=rs&15; jassert(mcupos>>6==mcupos+r>>6); mcupos+=r+1; x=huffcode&(1<>s-1)) x-=(1<=1; --i) cbuf[cpos++]=i<<4|s; cbuf[cpos++]=s<<4|huffcode<<2>>s&3|12; ssum+=s; } } else { // DC: rs = 0S, s<12 jassert(rs<12); ++mcupos; x=huffcode&(1<>rs-1)) x-=(1<=0 && mcupos>>6<10); const int comp=color[mcupos>>6]; jassert(comp>=0 && comp<4); dc=pred[comp]+=x; jassert((cpos&63)==0); cbuf[cpos++]=dc+1023>>3; ssum4=ssum3; ssum3=ssum2; ssum2=ssum1; ssum1=ssum; ssum=rs; } jassert(mcupos>=0 && mcupos<=mcusize); if (mcupos>=mcusize) { mcupos=0; if (++column==width) column=0, ++row; } huffcode=huffsize=huffbits=0, rs=-1; } } } } // Estimate next bit probability if (!jpeg || !data) return next_jpeg; // Context model const int N=15; // size of t, number of contexts static BH<8> t(MEM); // context hash -> bit history // As a cache optimization, the context does not include the last 1-2 // bits of huffcode if the length (huffbits) is not a multiple of 3. // The 7 mapped values are for context+{"", 0, 00, 01, 1, 10, 11}. static Array cxt(N); // context hashes static Array cp(N); // context pointers static StateMap sm[N]; static Mixer m1(N+1, 800, 4); static APM a1(1024), a2(0x10000); // Update model if (cp[N-1]) { for (int i=0; i2 || huffbits==0) hbcount=0; const int hc=huffcode|1<>6]<<6; jassert(coef>=0 && coef<256); if (hbcount==0) { const int mpos=mcupos>>4|!(mcupos&-64)<<7; int n=0; cxt[0]=hash(++n, hc, mcupos>>2, min(3, mcupos&63)); cxt[1]=hash(++n, hc, mpos>>4, cbuf[cpos-mcusize]); cxt[2]=hash(++n, hc, mpos>>4, cbuf[cpos-width*mcusize]); cxt[3]=hash(++n, hc, ilog(ssum3), mcupos); cxt[4]=hash(++n, hc, mcupos, column>>3); cxt[5]=hash(++n, hc, mcupos, column>>1); cxt[6]=hash(++n, hc, rs1, mpos); cxt[7]=hash(++n, hc, rs1, rs2); cxt[8]=hash(++n, hc, rs1, rs2, rs3); cxt[9]=hash(++n, hc, ssum>>4, mcupos); cxt[10]=hash(++n, hc, mpos, cbuf[cpos-1]); cxt[11]=hash(++n, hc, dc); cxt[12]=hash(++n, hc, rs1, coef); cxt[13]=hash(++n, hc, rs1, rs2, coef); cxt[14]=hash(++n, hc, mcupos>>3, ssum3>>3); } // Predict next bit m1.add(128); assert(hbcount<=2); for (int i=0; i400) { // Model long matches directly m.set(0, 8); return m.p(); } else if (isjpeg) { m.set(1, 8); m.set(c0, 256); m.set(buf(1), 256); return m.p(); } else if (isbmp) { static int col=0; if (++col>=24) col=0; m.set(2, 8); m.set(col, 24); m.set(buf(isbmp)+buf(3)>>4, 32); m.set(c0, 256); return m.p(); } // Normal model if (bpos==0) { U32 h; int i=0; cm0.set(0); cm1.set(buf(1)); cm2.set(hash(++i, buf(1), buf(2))); cm3.set(hash(++i, buf(1), buf(2), buf(3))); cm4.set(hash(++i, buf(1), buf(2), buf(3), buf(4))); cm5.set(h=hash(buf(1), buf(2), buf(3), buf(4), buf(5))); cm6.set(h^=hash(++i, h, buf(6))); rcm7.set(h^=hash(++i, h, buf(7))); cm7.set(h^=hash(++i, h, buf(8))); rcm9.set(h^=hash(++i, h, buf(9), buf(10))); rcm10.set(h^=hash(++i, h, buf(11), buf(12))); cm8.set(h^=hash(++i, h, buf(13), buf(14))); } int order=0; order+=cm0.mix(m); order+=cm1.mix(m); order+=cm2.mix(m); order+=cm3.mix(m); order+=cm4.mix(m); order+=cm5.mix(m); order+=cm6.mix(m); rcm7.mix(m); cm7.mix(m); rcm9.mix(m); rcm10.mix(m); cm8.mix(m); picModel(m); wordModel(m); recordModel(m); sparseModel(m); m.set(buf(1)+8, 264); m.set(c0, 256); m.set(order+8*(c4>>5&7)+64*(buf(1)==buf(2)), 128); m.set(buf(2), 256); int pr=m.p(); return pr; } //////////////////////////// Predictor ///////////////////////// // A Predictor estimates the probability that the next bit of // uncompressed data is 1. Methods: // p() returns P(1) as a 12 bit number (0-4095). // update(y) trains the predictor with the actual bit (0 or 1). class Predictor { int pr; // next prediction public: Predictor(); int p() const {assert(pr>=0 && pr<4096); return pr;} void update(); }; Predictor::Predictor(): pr(2048) {} void Predictor::update() { static APM a1(256), a2(0x10000), a3(0x10000), a4(0x10000); // Update global context: pos, bpos, c0, c4, buf, nibble c0+=c0+y; if (c0>=256) { buf[pos++]=c0; c4=(c4<<8)+c0-256; c0=1; } bpos=(bpos+1)&7; // Filter the context model with APMs pr=contextModel2(); pr=a1.p(pr, c0)*3+pr>>2; int pr2=a2.p(pr, c0+256*buf(1)); int pr3=a3.p(pr, c0^hash(buf(1), buf(2))&0xffff); int pr4=a4.p(pr, c0^hash(buf(1), buf(2), buf(3))&0xffff); pr=pr2+pr3*2+pr4+2>>2; } //////////////////////////// Encoder //////////////////////////// // An Encoder does arithmetic encoding. Methods: // Encoder(COMPRESS, f) creates encoder for compression to archive f, which // must be open past any header for writing in binary mode. // Encoder(DECOMPRESS, f) creates encoder for decompression from archive f, // which must be open past any header for reading in binary mode. // code(i) in COMPRESS mode compresses bit i (0 or 1) to file f. // code() in DECOMPRESS mode returns the next decompressed bit from file f. // Global y is set to the last bit coded or decoded by code(). // compress(c) in COMPRESS mode compresses one byte. // decompress() in DECOMPRESS mode decompresses and returns one byte. // flush() should be called exactly once after compression is done and // before closing f. It does nothing in DECOMPRESS mode. typedef enum {COMPRESS, DECOMPRESS} Mode; class Encoder { private: Predictor predictor; const Mode mode; // Compress or decompress? FILE* archive; // Compressed data file U32 x1, x2; // Range, initially [0, 1), scaled by 2^32 U32 x; // Decompress mode: last 4 input bytes of archive public: Encoder(Mode m, FILE* f); void flush(); int code(int i=0) { // Compress bit y or return decompressed bit int p=predictor.p(); p += p < 2048; U32 xmid = x1 + ((x2-x1) >> 6) * (p >> 6) + ((x2-x1) >> 12) * (p & 63); assert(xmid>=x1 && xmid>24, archive); x1<<=8; x2=(x2<<8)+255; if (mode==DECOMPRESS) x=(x<<8)+(getc(archive)&255); // EOF is OK } return y; } // Compress one byte void compress(int c) { assert(mode==COMPRESS); for (int i=7; i>=0; --i) code((c>>i)&1); } // Decompress and return one byte int decompress() { assert(mode==DECOMPRESS); int c=0; for (int i=0; i<8; ++i) c+=c+code(); return c; } }; Encoder::Encoder(Mode m, FILE* f): mode(m), archive(f), x1(0), x2(0xffffffff), x(0) { if (mode==DECOMPRESS) { // x = first 4 bytes of archive for (int i=0; i<4; ++i) x=(x<<8)+(getc(archive)&255); } } void Encoder::flush() { if (mode==COMPRESS) putc(x1>>24, archive); // Flush first unequal byte of range } //////////////////////////// main program //////////////////////////// // Read one line, return NULL at EOF or ^Z. f may be opened ascii or binary. // Trailing \r\n is dropped. Lines longer than MAXLINE-1=511 are truncated. char *getline(FILE *f=stdin) { const int MAXLINE=512; static char s[MAXLINE]; int len=0, c; while ((c=getc(f))!=EOF && c!=26 && c!='\n' && len0) printf("%10ld \b\b\b\b\b\b\b\b\b\b\b", i); } // Compress/decompress files. Usage: paq7 archive files... // If archive does not exist, it is created and the named files are // compressed. If there are no file name arguments after the archive, // then file names are read from input up to a blank line or EOF. // If archive already exists, then the files in the archive are either // extracted, or compared if the file already exists. The files // are extracted to the names listed on the command line in the // order they were stored, defaulting to the stored names. int main(int argc, char** argv) { clock_t start_time=clock(); // start timer long total_size=0; // uncompressed size of all files // Check for args if (argc<2) { printf(PROGNAME " compressor/archiver (C) 2005, Matt Mahoney.\n" "Free under GPL, http://www.gnu.org/licenses/gpl.txt\n\n" "To compress: " PROGNAME " [-option] archive files... (archive will be created)\n" "Or (Windows): dir/b | " PROGNAME " archive (file names read from input)\n" "To decompress/compare: " PROGNAME " archive [files...] (defaults to stored names)\n" "To view contents: more < archive\n\n" "Options are -1 to -5 (use 62, 96, 163, 296, 525 MB memory), default -3\n" "You can't add/extract single files. Max total file size is 2GB\n"); return 0; } // Get option int option='3'; if (argv[1][0]=='-') option=argv[1][1], ++argv, --argc; if (option<32) option=32; if (option<'1'||option>'5') fprintf(stderr, "Bad option -%c (use -1 to -5)\n", option), exit(1); // Test for archive. If none, create one and write a header. // The first line is PROGNAME. This is followed by a list of // file sizes (as decimal numbers) and names, separated by a tab // and ending with \r\n. The last entry is followed by ^Z Mode mode=DECOMPRESS; FILE *f=fopen(argv[1], "rb"); if (!f) { mode=COMPRESS; f=fopen(argv[1], "wb"); if (!f) perror(argv[1]), exit(1); fprintf(f, "%s -%c\r\n", PROGNAME, option); // Get filenames if (argc==2) printf("Enter names of files to compress, followed by blank line\n"); int i=2; while (true) { char *filename; if (argc==2) { filename=getline(); if (!filename || !filename[0]) break; } else { if (i==argc) break; filename=argv[i++]; } // Test if files exist and get their sizes, store in archive header FILE *fi=fopen(filename, "rb"); if (!fi) perror(filename); else { fseek(fi, 0, SEEK_END); // get size long size=ftell(fi); total_size+=size; if (size<-1 || total_size<0) { fprintf(stderr, "File sizes must total less than 2 gigabytes\n"); fprintf(f, "-1\tError: over 2 GB\r\n"); exit(1); } if (size==-1) perror(filename); else fprintf(f, "%ld\t%s\r\n", size, filename); fclose(fi); } } fputc(26, f); // EOF fclose(f); } // Read existing archive. Two pointers (header and body) track the // current filename and current position in the compressed data. if (mode==COMPRESS) f=fopen(argv[1], "r+b"); else f=fopen(argv[1], "rb"); if (!f) perror(argv[1]), exit(1); long header, body; // file positions in header, body char *filename=getline(f); // check header if (!filename || strncmp(filename, PROGNAME " -", strlen(PROGNAME)+2)) fprintf(stderr, "%s: not a " PROGNAME " file\n", argv[1]), exit(1); option=filename[strlen(filename)-1]; MEM=0x40000< %s", argv[filenum]); filename=argv[filenum]; } printf(": "); if (size<0 || total_size<0) break; header=ftell(f); fseek(f, body, SEEK_SET); // If file exists in COMPRESS mode, compare, else compress/decompress FILE *fi=fopen(filename, "rb"); if (mode==COMPRESS) { if (!fi) perror(filename), exit(1); for (long i=0; i %4ld \n", ftell(f)-body); } else { // DECOMPRESS if (fi) { // compare int c1, c2; bool diff=false; for (long i=0; i %ld", total_size, compressed_size); else printf("%ld -> %ld", compressed_size, total_size); start_time=clock()-start_time; if (compressed_size>0 && total_size>0 && start_time>0) { printf(" (%1.4f bpc) in %1.2f sec (%1.3f KB/sec)", 8.0*compressed_size/total_size, (double)start_time/CLOCKS_PER_SEC, 0.001*total_size*CLOCKS_PER_SEC/start_time); } printf("\n"); fclose(f); } return 0; }