
Huffman Encode
A Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression. David A. Huffman developed the algorithm while he was a Sc.D. student at MIT and published it in 1952.
The output from Huffman's algorithm can be viewed as a variable-length code table for encoding a source symbol. The algorithm derives this table from the estimated probability or frequency of occurrence (weight) for each possible value of the source symbol. As in other entropy encoding methods, more common symbols are generally represented using fewer bits than less common symbols.
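To make the notion of a weight concrete, here is a minimal sketch that counts how often each symbol occurs in a string. The helper name `symbol_weights` is an assumption made for this illustration and is not part of the task.

```python
from collections import Counter


def symbol_weights(text):
    """Return a dict mapping each symbol in text to its number of occurrences."""
    return dict(Counter(text))


weights = symbol_weights("BADABUM")
# 'A' and 'B' occur twice; 'D', 'U' and 'M' occur once each.
print(weights)
```

The symbols with the largest counts are the ones that should end up with the shortest codes.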
The simplest construction algorithm uses a priority queue, where the node with the lowest frequency is given the highest priority.
- Create a leaf node for each symbol and add it to the priority queue.
- While there is more than one node in the queue:
  - Remove the two nodes of highest priority (lowest frequency) from the queue.
  - Create a new internal node with these two nodes as children and with a frequency equal to the sum of the two nodes' frequencies.
  - Add the new node to the queue.
- The remaining node is the root node, and the tree is complete.
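The loop above can be sketched with Python's `heapq` module acting as the priority queue. This is only an illustration under stated assumptions: the function name `code_lengths` is hypothetical, ties are broken by insertion order (an arbitrary choice, not the task's rule), and instead of building an explicit tree it records how deep each leaf ends up, which equals the length of that symbol's code.

```python
import heapq
from collections import Counter
from itertools import count


def code_lengths(text):
    """Return a dict mapping each symbol to the depth of its leaf in a Huffman tree."""
    freqs = Counter(text)
    order = count()  # arbitrary tie-breaker (assumption), NOT the task's ASCII rule
    # Each queue entry is (frequency, tie-breaker, symbols below this node).
    queue = [(freq, next(order), [sym]) for sym, freq in freqs.items()]
    heapq.heapify(queue)
    depth = dict.fromkeys(freqs, 0)
    while len(queue) > 1:
        # Remove the two nodes with the lowest frequency.
        f1, _, syms1 = heapq.heappop(queue)
        f2, _, syms2 = heapq.heappop(queue)
        merged = syms1 + syms2
        # Every symbol below the new internal node moves one level deeper.
        for sym in merged:
            depth[sym] += 1
        heapq.heappush(queue, (f1 + f2, next(order), merged))
    return depth


lengths = code_lengths("BADABUM")
total_bits = sum(lengths[sym] for sym in "BADABUM")
print(lengths, total_bits)
```

For "BADABUM" the total comes to 16 bits. Tie-breaking can change individual code lengths, but not this total, because every Huffman tree for the same frequencies is optimal.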
For our task, nodes with the same frequency are still ordered: the node whose symbols have the lower ASCII codes has the higher priority. For example, 'A' has a higher priority than 'B', and the merged node 'DZ' has a higher priority than 'E'.
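One way to picture this ordering is to model each node as a `(frequency, label)` tuple, which Python compares element by element, with strings compared by character code. Modelling nodes this way is an assumption made for illustration; the task only gives the 'A' versus 'B' and 'DZ' versus 'E' examples and does not spell out how a merged node's label is formed.

```python
# Hypothetical nodes, written as (frequency, label) tuples.
nodes = [(1, "E"), (2, "A"), (1, "DZ"), (1, "B"), (1, "A")]

# Highest priority first: lowest frequency, then lowest label.
by_priority = sorted(nodes)
print(by_priority)
# [(1, 'A'), (1, 'B'), (1, 'DZ'), (1, 'E'), (2, 'A')]
```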
- Label the two connections below each internal node with 0 and 1: the connection to the higher-priority child gets 0, and the other gets 1.
- The digits along the path from the root node to a leaf form the code for that leaf's symbol.
- The result for our task is the source string with every symbol replaced by its code.
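Putting the steps together, a possible solution might look like the sketch below. It is not an official reference implementation. It assumes that a merged node is labelled by concatenating its children's labels, higher-priority child first, and that nodes are ordered as `(frequency, label)` tuples. Rather than storing a tree, it prepends a digit to the code of every symbol below a child each time two nodes are merged.

```python
import heapq
from collections import Counter


def huffman_encode(text):
    """Encode text with a Huffman code built from its own symbol frequencies."""
    freqs = Counter(text)
    # Queue entries are (frequency, label); lower tuples have higher priority.
    queue = [(freq, sym) for sym, freq in freqs.items()]
    heapq.heapify(queue)
    # Note: a text with a single distinct symbol keeps an empty code here;
    # the task does not specify that case.
    codes = dict.fromkeys(freqs, "")
    while len(queue) > 1:
        f0, high = heapq.heappop(queue)  # higher-priority child, labelled 0
        f1, low = heapq.heappop(queue)   # the other child, labelled 1
        for sym in high:
            codes[sym] = "0" + codes[sym]
        for sym in low:
            codes[sym] = "1" + codes[sym]
        # Assumption: the merged label is the children's labels concatenated.
        heapq.heappush(queue, (f0 + f1, high + low))
    return "".join(codes[sym] for sym in text)


print(huffman_encode("BADABUM"))
```

For "BADABUM" this assigns U=00, A=01, B=10, D=110 and M=111. Because the digits are prepended at merge time, the code for each symbol reads from the root down to the leaf once the loop finishes.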
Input: String (str).
Output: String (str).
Examples:
assert huffman_encode("BADABUM") == "1001110011000111"
assert (
    huffman_encode("A DEAD DAD CEDED A BAD BABE A BEADED ABACA BED")
    == "1000011101001000110010011101100111001001000111110010011111011111100010001111110100111001001011111011101000111111001"
)
assert (
    huffman_encode("no devil lived on")
    == "100101111000001110010011111011010110001000111101100"
)
assert huffman_encode("an assassin sins") == "110111100110001100010111110001011110"
How it's used: Huffman coding is commonly used for lossless data compression.
Precondition: The given string is at most 32000 characters long and contains only letters and spaces (a-z, A-Z, " ").
The idea for this mission was taken from a local school challenge for kids.
Screen by Cmglee for wiki.