/* PAQ4 v2 - File archiver and compressor. (C) 2003, Matt Mahoney, mmahoney@cs.fit.edu This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License version 2 as published by the Free Software Foundation at http://www.gnu.org/licenses/gpl.txt or (at your option) any later version. This program is distributed without any warranty. USAGE To compress: PAQ4 archive file file... (1 or more file names), or or (MSDOS): dir/b | PAQ4 archive (read file names from input) or (UNIX): ls | PAQ4 archive To decompress: PAQ4 archive To list contents: more < archive Compression: The files listed are compressed and stored in the archive, which is created. The archive must not already exist. File names may specify a path, which is stored. If there are no file names on the command line, then PAQ4 prompts for them, reading until the first blank line or end of file. Decompression: No file names are specified. The archive must exist. If a path is stored, the file is extracted to the appropriate directory, which must exist. PAQ4 does not create directories. If the file to be extracted already exists, it is not replaced; rather it is compared with the archived file, and the offset of the first difference is reported. It is not possible to add, remove, or update files in an existing archive. If you want to do this, extract the files, delete the archive, and create a new archive with just the files you want. TO COMPILE gxx -O paq4.cpp DJGPP 2.95.2 (gives fastest executable) bcc32 -O2 paq4.cpp Borland 5.5.1 sc -o paq4.cpp Digital Mars 8.35n FILE FORMAT A PAQ4 archive has the following format. The first 6 bytes are "PAQ4\r\n" followed by a list of file sizes and names in "%ld\t%s\r\n" format, e.g. 768771book1. The last filename is followed by "\x1A\f\0" () and the compressed data. The compressor uses predictive arithmetic coding. The compressed data is a base 256 represenation of a number x (e.g. the first byte has weight 1/256, second is 1/65536, etc.) such that x has as few digits as possible and satisfies P(s > y) <= x < P(s >= y), where y is the concatenated files, s is a random string with probability distribution P, and (s > y) means that s follows y in a lexicographical ordering over strings. As y is read in, the bounds of possible x narrow, allowing the leading digits to be written as they become known. Computation of x is approximated using integer arithmetic, which expands the data less than 0.001% over the Shannon limit of log2 1/P(y) bits. Prediction (estimation of P(y)) is independent of coding. The model by which P is estimated is built from the previous uncompressed data regardless of whether it is being input or output, so the model is identical for compression and decompression. The model consists of an adaptive weighting of submodels, followed by context sensitive probability adjustment using SSE (secondary symbol estimation). It differs from PAQ3 and PAQ3N that the weighting of models is adaptive rather than fixed. The cyclic model for fixed length records has also been improved. There are N = 19 submodels, which consist of counts of 0 and 1 bits that appear in different contexts. These are combined to a probability P(1) that the next bit of input will be a 1 as follows: P(1) = SUM(i=1..N) wi (n1i / ni) where wi is the i'th weight (determined adaptively), n1i is the number of 1 bits observed in the i'th context, and ni = n0i + n1i is the total number of times the i'th context was observed. The weights wi are adjusted after each bit of uncompressed data to reduce the cost (code length) of that bit. The adjustment is along the gradient of the cost in weight space, which is wi := wi + e[nyi/(SUM(j) (wj+wo) nyj) - ni/(SUM(j) (wj+wo) nj)] where e and wo are small constants, and nyi is the count for bit y (0 or 1) in the i'th context. The weight offset wo prevents the gradient from going to infinity as the weights go to 0. The weight vector is context sensitive. There are 8 vectors which are trained and used depending on the context of the 3 high order bits of the previous whole byte. In order to favor recent data over older data, counts are discarded in a way that gives higher weights to probabilities near 0 or 1. After bit y is observed, the count ny is incremented, but half of the opposite count over 2 is discarded. For example, if n0=3, n1=10, then after a 0 is observed, then half of the 8 bits over 2 in n1 are discarded, so n0=4, n1=6. The 19 submodels are as follows. The current 0-7 bits of the partially compressed byte is always included in the context except in the fixed model. - Fixed (n0 = n1 = 1) - Order 0. The context is the 0-7 bits of the current byte. - Order 1. The context is the last byte plus the current 0-7 bits. - Order 2. The context is the last 2 bytes plus the current 0-7 bits. - Order 3. - Order 4. - Order 5. - Order 6. - Order 7. - Word. The context is the last whole word, defined as the last 0 to 8 contiguous bytes with values of 33 or more, plus the current bits. - Sparse order 2, skipping the next to last byte (x.x) - Sparse order 2, skipping 2 bytes (x..x), the 4th to last and last bytes. - Sparse order 2, the 2nd and 4th last bytes (x.x.) - Sparse order 2, the 4th and 8th last bytes (x...x...) - Sparse order 2, the 2nd and 3rd last (xx.) - Sparse order 2, the 3rd and 4th last (xx..) - A long string model consisting of the last match of 8 or more bytes. - A fixed record model consisting of the two bytes above (as a table) - A fixed record model consisting of the byte above and to the left. Counts for the order 2-7 and word models share a 32M element hash table, where both counts are represented by an 8 bit state. Hash collisions are detected (usually) by using an 8 bit checksum of the context, with linear search over 3 buckets. If all 3 are full, then the element with the lowest priority is replaced, where priority = n0 + n1. The order 0 and 1 models use unhashed tables. The sparse order 2 models share a 4M hash table. The two fixed record models share a 2M hash table The long string model uses a 1M hash table to index a 4MB rotating buffer with 8 and 32 bute hashes to find the most recent context match of 8 bytes or more. The count is set to nu = the length of the match, where y is the bit that occured in that context, and the other count to 0. For example, given "abcdefghij1...abcdefghij" then n0=0, n1=10. The fixed record model determines the record size (table width) by searching for 4 consecutive appearances of the same character equally spaced with an interval of at least 3. For instance, if the string "a..a..a..a" is observed, then the record length is set to 3. It remains set until another sequence is observed. The probability P(1) calculated by the weighted sum of models is further adjusted by SSE in the context of the current 0-7 bits of the partially compressed byte and the 2 high order bits of the previous byte (1024 contexts). In each SSE context, P(1) is divided into 32 bins, and the actual counts of 0s and 1s observed in each bin are used to compute the new probability. When P(1) falls between bins, the counts in the bins on both sides are incremented, and the output probability is interpolated between the bins. The bin spacing is closer together for output probabilities near 0 or 1. The counts are not discounted to favor recent data as in the 19 submodels. Instead, when the counters (8 bits) overflow, both n0 and n1 are halved. The SSE mapping is initialized to SSE(P) = P. The final probability Pf is then averaged to Pf = (3 SSE(P) + P) / 4. Thanks to Serge Osnach for introducing me to SSE (in PAQ1SSE/PAQ2) and the sparse models (PAQ3N). Also, credit to Eugene Shelwein, Dmitry Shkarin for suggestions on using multiple character SSE contexts. Credit to Eugene, Serge, and Jason Schmidt for developing faster and smaller executables of previous versions. Credit to Werner Bergmans and Berto Destasio for testing and evaluating them, including modifications that improve compression at the cost of more memory. I expect there will be better versions in the future. If you make any changes, please change the name of the program (e.g. PAQ5), including the string in the archive header by redefining PROGNAME below. This will prevent any confusion about versions or archive compatibility. Also, give yourself credit in the help message. */ #define PROGNAME "PAQ4" // Please change this if you change the program #include #include #include #include #include #include #include #include #include #include const int PSCALE=2048; // Integer scale for representing probabilities const int MEM=8; // 6 = 88 MB, increase by 1 to double it template inline int size(const T& t) {return t.size();} // 8-32 bit unsigned types, adjust as appropriate typedef unsigned char U8; typedef unsigned short U16; typedef unsigned long U32; class U24 { // 24-bit unsigned int U8 b0, b1, b2; // Low, mid, high byte public: explicit U24(int x=0): b0(x), b1(x>>8), b2(x>>16) {} operator int() const {return (((b2<<8)|b1)<<8)|b0;} }; // 32-bit random number generator based on r(i) = r(i-24) ^ r(i-55) class Random { U32 table[55]; // Last 55 random values int i; // Index of current random value in table public: Random(); U32 operator()() { // Return 32-bit random number if (++i==55) i=0; if (i>=64) return table[i]^=table[i-16]; // emilcont else return table[i]^=table[i+63]; // emilcont } } rnd; Random::Random(): i(0) { // Seed the table table[0]=123456789; table[1]=987654321; for (int j=2; j<55; ++j) table[j]=table[j-1]*11+table[j-2]*0.1; // emilcont } /* 3 byte counter, shown for reference only. It implements a nonstationary pair of counters of 0s and 1s such that preference is given to recent history by discarding old bits. */ class Counter3 { U8 n[2]; // n[y] is the counts of ys (0s or 1s) public: Counter3() {n[0]=n[1]=0;} int get0() const {return n[0];} // Return count of 0s int get1() const {return n[1];} // Return count of 1s int priority() const {return get0()+get1();} // For hash replacement void add(int y) { // Add y (0 or 1) to n[y] and age the opposite count if (n[y]<255) ++n[y]; if (n[1-y]>2) n[1-y]/=2, n[1-y]+=1; } }; /* Approximately equivalent 2 byte counter implementing the above. The representable counts (n0, n1) are 0-10, 12, 14, 16, 20, 24, 28, 32, 48, 64, 128, 256, 512. Both counts are represented by a single 8-bit state. Counts larger than 10 are incremented probabilistically. Although it uses 1/3 less memory, it is 8% slower and gives 0.05% worse compression than the 3 byte counter. */ class Counter2 { U8 state; struct E { // State table entry U16 n0, n1; // Counts represented by state U8 s00, s01; // Next state on input 0 without/with probabilistic incr. U8 s10, s11; // Next state on input 1 U32 p0, p1; // Probability of increment x 2^32-1 on inputs 0, 1 }; static E table[150]; // State table public: Counter2(): state(0) {} int get0() const {return table[state].n0;} int get1() const {return table[state].n1;} int priority() const {return state;} void add(int y) { if (y) { if (state<75 || rnd() is a hash table of 2^N elements of type T with linear search of M=3 elements in case of collision. If all elements collide, then the one with the lowest T.priority() is replaced. Hashtable[i] returns a T& indexed by the lower N bits of i whose checksum matches bits 31-24 of i, creating or replacing if needed. The user must supply T::priority() and a 32-bit hash. */ template class Hashtable { private: struct HashElement { U8 checksum; T value; HashElement(int ch=0): checksum(ch), value() {} }; vector table; // Array of 2^N+M elements public: Hashtable(): table((1< T& Hashtable::operator[](U32 i) { const U32 lb=i&((1<>24; int bj=lb; int bget=1000000000; for (U32 j=lb; j= 8 bytes between the current context and previous input, and predicts the next bit in the previous context with weight n. If the next bit is 1, then (n0[0], n1[0]) is assigned (0, n), else (n, 0). Matchies are found in a 4MB rotating buffer using a 1M hash table of pointers. */ class MatchModel { enum {N=1}; // Number of submodels vector buf; // Input buffer, wraps at end vector ptr; // Hash table of pointers U32 hash[2]; // Hashes of current context up to pos-1 int pos; // Element of buf where next bit will be stored int bpos; // Number of bits (0-7) stored at buf[pos] int begin; // Points to first matching byte (does not wrap) int end; // Points to last matching byte + 1, 0 if no match public: MatchModel(): buf(0x10000<>(8-bpos))!=buf[pos]) // Does it match? begin=end=0; // no if (bpos==8) { // New byte bpos=0; hash[0]=hash[0]*(16*56797157)+buf[pos]+1; // Hash last 8 bytes hash[1]=hash[1]*(2*45684217)+buf[pos]+1; // Hash last 32 bytes U32 h=hash[0] >> (10); // emilcont if ((hash[0]>>28)==0) h=hash[1] >> (10); // 1/16 of 8-contexts are hashed to 32 bytes emilcont if (++pos==int(buf.size())) pos=0; if (end) ++end; else { // Search for a matching context end=ptr[h]; if (end>0) { begin=end; int p=pos; while (begin>0 && p>0 && begin!=p+1 && buf[begin-1]==buf[p-1]) { --begin; --p; } } if (end==begin) // No match found begin=end=0; } ptr[h]=U24(pos); } *n0=*n1=0; if (end) { int wt=end-begin; if (wt>255) wt=255; int y=(buf[end]>>(7-bpos))&1; if (y) *n1=wt; else *n0=wt; } } /* A RecordModel finds fixed length records and models bits in the context of the two bytes above (the same position in the two previous records) and in the context of the byte above and to the left (the previous byte). The record length is assumed to be the interval in the most recent occurrence of a byte occuring 4 times in a row equally spaced, e.g. "x..x..x..x" would imply a record size of 3 (the mimimum). */ class RecordModel { enum {N=2}; // Number of models static int lpos[256][4]; // Position of last 4 bytes vector buf; // Rotating history [4K] Hashtable t; // Model emilcont int pos; // Byte counter int c0; // Leading 1 and 0-7 bits int repeat; // Cycle length int hash[N]; // of c0, context and repeat Counter* cp[N]; // Pointer to current counter public: RecordModel(); void model(int y, int* n0, int* n1); // Update and predict } recordModel; int RecordModel::lpos[256][4]={{0}}; RecordModel::RecordModel(): buf(4096), pos(0), c0(1), repeat(1) { hash[0]=hash[1]=0; cp[0]=cp[1]=&t[0]; } // Update the model with bit y, then put predictions of the next update // as 0 counts in n0[0..N-1] and 1 counts in n1[0..N-1] void RecordModel::model(int y, int* n0, int* n1) { // Update the counters with bit y cp[0]->add(y); cp[1]->add(y); c0+=c0+y; if (c0>=256) { c0-=256; // Save positions of last 4 occurrences of current byte c0 int (&lp)[4]=lpos[c0]; for (int j=3; j>0; --j) lp[j]=lp[j-1]; lp[0]=pos; // Check for a repeating pattern of interval 3 or more if (lp[2]-lp[3]==lp[1]-lp[2] && lp[1]-lp[2]==lp[0]-lp[1] && lp[0]-lp[1]>2) repeat=lp[0]-lp[1]; // Save c0 buf[pos&(buf.size()-1)]=c0; ++pos; // Compute context hashes hash[0]=buf[(pos-repeat)&(buf.size()-1)]*578997481 +buf[(pos-repeat*2)&(buf.size()-1)]*878996291 +repeat*378997811; // Context is 2 bytes above hash[1]=buf[(pos-repeat)&(buf.size()-1)]*236399113 +buf[(pos-1)&(buf.size()-1)]*736390141 +repeat*984388129; // Context is the byte above and to the left c0=1; } // Set up counter pointers cp[0]=&t[hash[0]+(c0<<24)+((c0*3)&255)]; cp[1]=&t[hash[1]+(c0<<24)+((c0*3)&255)]; // Predict the next value of y n0[0]=cp[0]->get0(); n1[0]=cp[0]->get1(); n0[1]=cp[1]->get0(); n1[1]=cp[1]->get1(); } /* A CharModel contains a fixed model, n-gram models from 0 to 7, and several order-2 sparse models which skip over parts of the context. */ class CharModel { enum {N=16}; // Number of models vector t0; // Order 0 counts [256] vector t1; // Order 1 counts [64K] Hashtable t2; // Sparse models // emilcont Hashtable t; // Order 2-7 models // emilcont int c0; // Current partial byte with leading 1 bit int c1, c2, c3, c4, c5, c6, c7; // Previous bytes vector cxt; // Context hashes [N] vector cp; // Pointers to current counts [N] public: CharModel(); void model(const int y, int* n0, int* n1); // Update and predict int getc0() const {return c0;} int getc1() const {return c1;} int getc2() const {return c2;} } charModel; CharModel::CharModel(): t0(256), t1(65536), t(), c0(1), c1(0), c2(0), c3(0), c4(0), c5(0), c6(0), c7(0), cxt(N), cp(N) { for (int i=0; iadd(y); // Update context c0+=c0+y; if (c0>=256) { // Start new byte for (int i=8; i>1; --i) cxt[i]=(cxt[i-1]+c0)*987660757; c0-=256; if (c0>32) cxt[9]=(cxt[9]+c0)*34609027*4; // Last whole word (8 chars max) else cxt[9]=0; cxt[10]=c0*837503159+c2*537496093; // Sparse models cxt[11]=c0*840101893+c3*850090301; cxt[12]=c1*769377353+c3*653589317; cxt[13]=c3*368910977+c7*568909763; cxt[14]=c1*950087393+c2*970092001; cxt[15]=c2*629495183+c3*649492307; c7=c6; c6=c5; c5=c4; c4=c3; c3=c2; c2=c1; c1=c0; c0=1; } // Set up pointers to next counters cp[1]=&t0[c0]; cp[2]=&t1[c0+c1*256]; for (int i=3; i<10; ++i) cp[i]=&t[cxt[i]+(c0<<24)+((c0*3)&255)]; for (int i=10; iget0(); n1[i]=cp[i]->get1(); } } /* A MixModel mixes an array of counts to guess the next bit. The counts are adaptively weighted to minimize cost. model(y, pr) updates all the models with y, gets their bit counts, weights them, and returns the probability P(1) * PSCALE in pr. The weights are adjusted to minimize the cost of encoding y. There are C sets of weights, selected by a context (the 3 upper bits of the last whole byte). */ class MixModel { enum {N=19, C=8}; // Number of models, number of weight contexts vector wt; // Context weights [C, N] int bc0[N], bc1[N]; // Bit counts concatenated from various models int cxt2; // Secondary context to select a weight vector public: MixModel(); ~MixModel(); void model(int y, int& pr); // Update with bit y, then put next guess in pr } mixModel; MixModel::MixModel(): wt(C*N), cxt2(0) { for (int i=0; i0 && s1>0) { const int s=s0+s1; const int sy=y?s1:s0; const int sy1=0xffffffff/sy+(rnd()&1023) >> 10; const int s1 =0xffffffff/s +(rnd()&1023) >> 10; for (int i=0; i> 8; wt[cn+i]=min(65535, max(1, wt[cn+i]+dw)); } } } // Get counts charModel.model(y, bc0, bc1); recordModel.model(y, bc0+16, bc1+16); matchModel.model(y, bc0+18, bc1+18); // Update secondary context cxt2=charModel.getc1()/(256/C); // Predict next bit int n0=1, n1=n0; for (int j=0; j0); while (n>2000000000/PSCALE) n/=4, n1/=4; pr=int((PSCALE-1)*n1/n); } /* A Predictor adjusts the model probability using SSE and passes it to the encoder. An SSE model is a table of counters, sse[SSE1][SSE2] which maps a context and a probability into a new, more accurate probability. The context, SSE1, consists of the 0-7 bits of the current byte and the 2 leading bits of the previous byte. The probability to be mapped, SSE2 is first stretched near 0 and 1 using SSEMap, then quantized into SSE2=32 intervals. Each SSE element is a pair of 0 and 1 counters of the bits seen so far in the current context and probability range. Both the bin below and above the current probability is updated by adding 1 to the appropriate count (n0 or n1). The output probability for an SSE element is n1/(n0+n1) interpolated between the bins below and above the input probability. This is averaged with the original probability with 25% weight to give the final probability passed to the encoder. */ class Predictor { enum {SSE1=256*4, SSE2=32, // SSE dimensions (contexts, probability bins) SSESCALE=1024/SSE2}; // Number of mapped probabilities between bins // Scale probability p into a context in the range 0 to 1K-1 by // stretching the ends of the range. class SSEMap { U16 table[PSCALE]; public: int operator()(int p) const {return table[p];} SSEMap(); } ssemap; // functoid // Secondary source encoder element struct SSEContext { U8 c1, n; // Count of 1's, count of bits int p() const {return PSCALE*(c1*64+1)/(n*64+2);} void update(int y) { if (y) ++c1; if (++n>254) { // Roll over count overflows c1/=2; n/=2; } } SSEContext(): c1(0), n(0) {} }; vector > sse; // [SSE1][SSE2+1] context, mapped prob int nextp; // p() int ssep; // Output of sse int context; // SSE context public: Predictor(); int p() const {return nextp;} // Returns pr(y = 1) * PSCALE void update(int y); // Update model with bit y = 0 or 1 }; Predictor::SSEMap::SSEMap() { for (int i=0; i1023) p=1023; if (p<0) p=0; table[i]=p; } } Predictor::Predictor(): sse(SSE1), nextp(PSCALE/2), ssep(512), context(0) { // Initialize to sse[context][ssemap(p)] = p for (int i=0; i=0; --i) { int p=(ssemap(i*PSCALE/N)+SSESCALE/2)/SSESCALE; int n=1+N*N/((i+1)*(N-i)); if (n>254) n=254; int c1=(i*n+N/2)/N; for (int j=oldp-1; j>=p; --j) { for (int k=0; k=0x4000000) xmid+=(xdiff>>12)*p; else if (xdiff>=0x100000) xmid+=((xdiff>>6)*p)>>6; else xmid+=(xdiff*p)>>12; // Update the range if (y) x2=xmid; else x1=xmid+1; predictor.update(y); // Shift equal MSB's out while (((x1^x2)&0xff000000)==0) { putc(x2>>24, archive); x1<<=8; x2=(x2<<8)+255; } } /* Decode one bit from the archive, splitting [x1, x2] as in the encoder and returning 1 or 0 depending on which subrange the archive point x is in. */ inline int Encoder::decode() { // Split the range const U32 p=predictor.p()*(4096/PSCALE)+2048/PSCALE; // P(1) * 4K assert(p<4096); const U32 xdiff=x2-x1; U32 xmid=x1; // = x1+p*(x2-x1) multiply without overflow, round down if (xdiff>=0x4000000) xmid+=(xdiff>>12)*p; else if (xdiff>=0x100000) xmid+=((xdiff>>6)*p)>>6; else xmid+=(xdiff*p)>>12; // Update the range int y=0; if (x<=xmid) { y=1; x2=xmid; } else x1=xmid+1; predictor.update(y); // Shift equal MSB's out while (((x1^x2)&0xff000000)==0) { x1<<=8; x2=(x2<<8)+255; int c=getc(archive); if (c==EOF) c=0; // Fix for version 2 x=(x<<8)+c; } return y; } // Should be called when there is no more to compress void Encoder::flush() { // In COMPRESS mode, write out the remaining bytes of x, x1 < x < x2 if (mode==COMPRESS) { while (((x1^x2)&0xff000000)==0) { putc(x2>>24, archive); x1<<=8; x2=(x2<<8)+255; } putc(x2>>24, archive); // First unequal byte } } // Read one byte from encoder e int decompress(Encoder& e) { // Decompress 8 bits, MSB first int c=0; for (int i=0; i<8; ++i) c=c+c+e.decode(); return c; } // Write one byte c to encoder e void compress(Encoder& e, int c) { for (int i=7; i>=0; --i) e.encode((c>>i)&1); } // Fail if out of memory void handler() { printf("Out of memory\n"); exit(1); } // Read and return a line of input from FILE f (default stdin) up to // first control character except tab. Skips CR in CR LF. string getline(FILE* f=stdin) { int c; string result=""; while ((c=getc(f))!=EOF && (c>=32 || c=='\t')) result+=char(c); if (c=='\r') (void) getc(f); return result; } // User interface int main(int argc, char** argv) { set_new_handler(handler); // Test the compiler assert(sizeof(U8)==1); assert(sizeof(U16)==2); assert(sizeof(U24)==3); assert(sizeof(U32)==4); assert(sizeof(int)==4); // Check arguments if (argc<2) { printf( PROGNAME " v2 file compressor/archiver, (C) 2003, Matt Mahoney, mmahoney@cs.fit.edu\n" "This program is free software distributed without warranty under the terms\n" "of the GNU General Public License, see http://www.gnu.org/licenses/gpl.txt\n" "\n" "To compress: " PROGNAME " archive filenames... (archive will be created)\n" " or (MSDOS): dir/b | " PROGNAME " archive (reads file names from input)\n" "To extract/compare: " PROGNAME " archive (does not clobber existing files)\n" "To view contents: more < archive\n"); return 1; } // File names and sizes from input or archive vector filename; // List of names vector filesize; // Size or -1 if error int start_time=clock(); int uncompressed_bytes=0, compressed_bytes=0; // Input, output sizes // Extract files FILE* archive=fopen(argv[1], "rb"); if (archive) { if (argc>2) { printf("File %s already exists\n", argv[1]); return 1; } printf("Extracting archive %s ...\n", argv[1]); // Read PROGNAME "\r\n" at start of archive if (getline(archive) != PROGNAME) { printf("Archive file %s not in " PROGNAME " format\n", argv[1]); return 1; } // Read "size filename" in "%d\t%s\r\n" format while (true) { string s=getline(archive); if (s.size()>1) { filesize.push_back(atol(s.c_str())); string::iterator tab=find(s.begin(), s.end(), '\t'); if (tab!=s.end()) filename.push_back(string(tab+1, s.end())); else filename.push_back(""); } else break; } // Test end of header for "\f\0" { int c1=0, c2=0; if ((c1=getc(archive))!='\f' || (c2=getc(archive))!=0) { printf("%s: Bad " PROGNAME " header format %d %d\n", argv[1], c1, c2); return 1; } } // Extract files from archive data Encoder e(DECOMPRESS, archive); for (int i=0; i2) for (int i=2; i=0) fprintf(archive, "%ld\t%s\r\n", filesize[i], filename[i].c_str()); } putc(032, archive); // MSDOS EOF putc('\f', archive); putc(0, archive); // Write data Encoder e(COMPRESS, archive); long file_start=ftell(archive); for (int i=0; i=0) { uncompressed_bytes+=size; printf("%-23s %10ld -> ", filename[i].c_str(), size); FILE* f=fopen(filename[i].c_str(), "rb"); int c; for (long j=0; j0 && elapsed_time>0) { printf(" (%1.4f bpc, %1.2f%% at %1.0f KB/s)", compressed_bytes*8.0/uncompressed_bytes, compressed_bytes*100.0/uncompressed_bytes, uncompressed_bytes/(elapsed_time*1000.0)); } printf("\n"); return 0; }