A85: Re: Assembly-85 Digest V1 #218
[Prev][Next][Index][Thread]
A85: Re: Assembly-85 Digest V1 #218
>Date: Sat, 21 Mar 1998 12:26:18 -0500 (EST)
>From: Jon Olson <morph@jmss.com>
>Subject: Re: A85: Compress levels
>In case the guy doesn't release the source to hufflib - here's how
>huffman compression works.
>Basically, first it finds out what the most common bit combination >is.
For
>example, let's say we're working with an alphanumeric string.
>ABCDEFABCDEABCDABCABA
>Huffman would go through and find the most common letter, second,
>third,
>etc.
>In this string, A is the most common letter, so it would be assigned
>the
>bit pattern: 0
>that way it only takes up 1 bit.
>Next most comon is B, B would be assigned 10
>C 110
>D 1110
>E 11110
>F 111110
>This way, the most commonly used letters (or in your case, bytes) >will
>only take up 1 or 2 bits instead of the full 8. It's fairly efficient
>compression. (This is part of what pkZip uses).
>- -- Jon Olson
actually, that is not the way huffman works:
using the above ABCDEFABCDEABCDABCABA we get the following:
A was used 6 times
B:5
C:4
D:3
E:2
F:1
put them in order:
F:1
E:2
D:3
C:4
B:5
A:6
now start equating numbers (my method...huffman has it's own, but mine
is much faster and compresses better):
letter:times used:encoder bit pattern:tree code
F:1::1 FE:3:0,1:011 C:4::1
E:2::1 D:3::1 B:5::1
D:3::1 C:4::1 A:6::1
C:4::1 B:5::1 FED:6:00,01,1:00111
B:5::1 A:6::1
A:6::1
A:6::1 CB:9:0,1:011
FED:6:00,01,1:00111 AFED:12:0,100,101,11:0100111
CB:9:0,1:011
CBAFED:21:00,01,10,1100,1101,111:00110100111
C B A F E D
A=10
B=01
C=00
D=111
E=1101
F=1100
ok...what happens is huffman placed both the lowest numbers together,
added the times used, placed a 0 in front of all the encoded bit
patterns of the left equate and a 1 in front of all the ones on the
right equate, and then he put a 0+left tree equate+right tree equate.
this maybe difficult and people usually use the method the guy above
stated, but this is the way huffman got his coding. so, here's the bits
using the encoding process:
ABCDEFABCDEABCDABCABA
10 01 00 111 1101 1100 10 01 00 111 1101 10 01 00 111 10 01 00 10 01 10
place the tree in front of the code stripping the first 0, place the
bytes used afterward in order of the tree, and place the file size after
that:
tree: 0110100111
bytes:CBAFED
size:21
code:10010011 11101110 01001001 11110110 01001111 00100100 110
huffman needs these to do the decoding...i'll post the decoding later
unless someone else does it first (or you figure it out yourself).
-Rob
______________________________________________________
Get Your Private, Free Email at http://www.hotmail.com