Huffman And Matlab
ELEG 4003 Communications Theory
Fall 2014
Course Project #1
(Max. Score: 20)
Date Assigned: 09/16/2014
Assignment Due Date: October 02, 2014
Instruction:
Adherence to Prairie View A&M University's honor code is fully expected. Discussion with
other team members or getting external help is strictly prohibited (i.e., if you have any
doubts, consult the course instructor on the interpretation of problems). Each team of three
students needs to submit only one report. Clearly indicate the contributions of each group
members in the report. You should also write and sign the honor pledge on the project
report: “We have neither given nor received unauthorized assistance on this assignment.”
Late submissions (after 11:00 am on 10/02/2014) will not be accepted/graded.
Question 1 (10 points)
Consider a three-symbols alphabet with the specified probability of assignment shown below:
Listed with the input alphabet are six binary code assignments.
(a) Scan these codes and determine which codes are practical (can be used for data compression
application). Justify your answers.
(b) Design a Huffman code for the above three-symbols source alphabet shown above and find its
code efficiency (i.e., compression efficiency).
(c) Design a Shannon-Fano code for the above three-symbol source alphabet shown above and
find its code efficiency. Compare it with your answer in part (b).
(d) Can you suggest a technique to improve the code efficiency to achieve a greater compression
ratio? Determine the code efficiency for your improved source coding method.
Xi P(Xi)
a 0.70
b 0.25
c 0.05
Symbol Code 1 Code 2 Code 3 Code 4 Code 5 Code 6
a 00 00 0 1 1 1
b 11 01 1 10 01 00
c 11 10 11 100 11 01
Question 2 (10 points): Huffman Coding
In this problem, you will study the efficacy of Huffman source coding (data compression algorithm)
on two different data sources. Generate two test data files data1.txt and data2.txt. Create the first test
data file data1.txt such that it contains at least 20 characters (including spaces). Next create a second
test data file data2.txt consisting of about 20 binary digits (i.e., digits 0 and 1).
(a) Suppose you wish to compress the data1.txt and data2.txt using Huffman source coding method.
Find the compression efficiency and the average codeword length. Clearly show your final code
design (i.e., codeword for each source symbol).
(b) Repeat part (a) but now consider at least 5000 characters and 5000 digits for data1.txt and
data2.txt, respectively.
(c) Validate that your design and your program works fine (i.e., correct data encoding and decoding)
for both of your test data files in parts (a) and (b). Compare your results with the compression effi-
ciency attainable with “compress” routine in UNIX or with other data compression utilities (e.g.,
zip). Explain any interesting observations and/or trends from your results.
(d) Explain why Huffman coding is NOT used for data compression of practical data sources? In
other words, why practical source compression utilities (e.g., zip, gnuzip, etc.) do not use Huffman
coding method despite its optimality?
(e) Explain why it might be more advantageous to employ the Lempel-Ziv algorithm over Huffman
coding method for digital multimedia (image/audio/video) data compression applications?
Your report should not be longer than 12 pages (single-line spacing). Your write-up should include:
(a) any source code (e.g., MATLAB commands or m-files) that you have written/used;
(b) a brief description of the data compression algorithms that you have utilized or implemented
(highlighting their unique features and possible improvements);
(c) discussions on your results including a comparison with other standard source coding (data com-
pression) utilities used in UNIX or MS-DOS operating systems;
(d) sample listing of the encoded sequence and decoded sequences.