[A83] Re: Groups
[Prev][Next][Index][Thread]
[A83] Re: Groups
I've thought of compression myself too. The best algorithm would probably be
LZ77. It's simple and effective, with fast decompression but slow
compression. Here's a description (by one "Joe D", describing the HPI archive
format used by Cavedog Entertainment's Total Annihilation):
The compression algorithm is a very basic sliding window compression scheme
from the LZ77 family using a 4095 byte history and matches from 2 to 17
bytes long.
The first byte is kind of a "tag" byte which determines if the next eight
pieces of data are literal bytes or history matches. Starting with the
least-significant bit, this tag byte is scanned to figure out what to do.
When the current bit is a zero, the next byte of the input is transfered
directly to the output and added to end of the history buffer.
When the current bit is a one, the next two bytes taken from the input are
used as a offset/length pair. The upper 12 bits are the offset into the
history buffer and the lower 4 bits are the length. [NOTE: Different numbers
of bits may be more effective.] If the offset is zero, the end of the input
data has been reached and the decompressor simply exits.
Since we can assume that there will be no matches with a length of zero
or only one byte, any match is a mimimum of two bytes so we just add two
to the length which extends our range from 0-15 to 2-17 bytes.
[NOTE: 2 byte matches make little sense either, because the match code is
itself 2 bytes. Therefore, it would work better to add 3, getting 3-18 byte
matches.]
The match is then copied from the history buffer to the output and also added
onto the end of the history buffer to keep it in sync with the output.
[NOTE: The history number can't go over 4095 bytes. When it reaches that
number, things are dumped from the beginning. It makes sense for the history
buffer to actually be a pointer to the output, 4095 bytes back from the
current position, or at the beginning if less than 4095 bytes have been
decompressed.]
When all eight bits of the tag byte have been used, the mask is reset and
the next tag byte is loaded.