Just for fun: Implementing a simple Huffman-Encoder and Decoder

Ok. Today I acually wanted to finally finish my seminar-script about the Barr Theorem and learn a little Algebra, but since my dorsals, my head and my ears seem to be competing which one can hurt most, I decided to do something easier. Today I heard about the Huffman-Code in a lecture, so to minimize my remorse, I decided to implement it in Common Lisp. So well, here is the code – it unfortunaly doesnt work when the lists given content of less than 2 characters. And I only tested it using SBCL. And it should be very slow.

(defun create-prob-table (charlist)
  (let ((probs (make-hash-table :test 'equalp)))
    (dolist (char charlist)
      (if (gethash char probs nil)
	  (incf (gethash char probs))
	  (setf (gethash char probs) 1)))
    probs))

(defun create-huffman-tree (probs)
  (let ((smallest nil)
	(second-smallest nil))
    (maphash
     #'(lambda (k v)
	 (cond
	   ((or (not smallest) (< v (cdr smallest)))
	    (setf second-smallest smallest)
	    (setf smallest (cons k v)))
	   ((or (not second-smallest) (< v (cdr second-smallest)))
	    (setf second-smallest (cons k v)))))
     probs)
    (cond
      ((not second-smallest)
       (car smallest))
      (T
       (remhash (car smallest) probs)
       (remhash (car second-smallest) probs)
       (setf (gethash
	      (list (car smallest) (car second-smallest))
	      probs)
	     (+ (cdr smallest) (cdr second-smallest)))
       (create-huffman-tree probs)))))

(defun create-huffman-table (tree)
  (let ((table (make-hash-table)))
    (labels ((create-table (tree prefix)
	       (if (listp (car tree))
		   (create-table
		    (car tree)
		    (cons 0 prefix))
		   (setf (gethash (car tree) table)
			 (reverse (cons 0 prefix))))
	       (if (listp (cadr tree))
		   (create-table
		    (cadr tree)
		    (cons 1 prefix))
		   (setf (gethash (cadr tree) table)
			 (reverse (cons 1 prefix))))))
      (create-table tree nil)
      table)))

(defun huffman-encode (charlist table)
  (let ((ret nil))
    (dolist (char charlist)
      (setf ret
	    (concatenate 'list
			 ret
			 (gethash char table))))
    ret))

(defun huffman-compress (charlist)
  (let* ((prob-table (create-prob-table charlist))
	 (huffman-tree
	  (create-huffman-tree prob-table))
	 (huffman-table
	  (create-huffman-table huffman-tree))
	 (compressed
	  (huffman-encode charlist huffman-table)))
    (values compressed huffman-tree)))

(defun huffman-decompress
    (tree bitseq &optional (ret nil) (wholetree tree))
  (if (listp tree)
      (if (eql (car bitseq) 0)
	  (huffman-decompress (car tree)
			      (cdr bitseq) ret wholetree)
	  (huffman-decompress (cadr tree)
			      (cdr bitseq) ret wholetree))
      (if bitseq
	  (huffman-decompress wholetree bitseq
	   (concatenate 'list ret (list tree))
	   wholetree)
	  (concatenate 'list ret (list tree)))))

And here is how it works:

First, the function create-prob-table creates a hash-table mapping the characters in the list given to their numbers of occurence. Then the function create-huffman-tree creates a huffman-tree from this prob-table:

CL-USER> (create-prob-table (list 1 1 1 1 1 1 1 1 1 1 2 3 2 3 4 5))
#<HASH-TABLE :TEST EQUALP :COUNT 5 {AED1619}>
CL-USER> (create-huffman-tree *)
(((4 5) (2 3)) 1)

From this tree, create-huffman-table creates a dictionary, mapping the caracters to their desired bit-sequence. huffman-encode then takes this dictionary and the caracter-sequence to encode the whole thing into the bit-sequence:

CL-USER> (create-huffman-table *)
#<HASH-TABLE :TEST EQL :COUNT 5 {BC0E4C9}>
CL-USER> (huffman-encode (list 1 1 1 1 1 1 1 1 1 1 2 3 2 3 4 5) *)
(1 1 1 1 1 1 1 1 1 1 0 1 0 0 1 1 0 1 0 0 1 1 0 0 0 0 0 1)

huffman-compress just calls these functions at once and returns two values: the tree and the encoded bitlist:

CL-USER> (huffman-compress (list 1 1 1 1 1 1 1 1 1 1 2 3 2 3 4 5))
(1 1 1 1 1 1 1 1 1 1 0 1 0 0 1 1 0 1 0 0 1 1 0 0 0 0 0 1)
(((4 5) (2 3)) 1)

huffman-decompress takes these values and restores the given character list:

CL-USER> (multiple-value-bind (tree val)
(huffman-compress (list 1 1 1 1 1 1 1 1 1 1 2 3 2 3 4 5))
(huffman-decompress val tree))
(1 1 1 1 1 1 1 1 1 1 2 3 2 3 4 5)

Well, its a slow implementation, but at least now I know the Huffman-Encoding!

Schreibe einen Kommentar

Trage deine Daten unten ein oder klicke ein Icon um dich einzuloggen:

WordPress.com-Logo

Du kommentierst mit Deinem WordPress.com-Konto. Abmelden / Ändern )

Twitter-Bild

Du kommentierst mit Deinem Twitter-Konto. Abmelden / Ändern )

Facebook-Foto

Du kommentierst mit Deinem Facebook-Konto. Abmelden / Ändern )

Google+ Foto

Du kommentierst mit Deinem Google+-Konto. Abmelden / Ändern )

Verbinde mit %s

%d Bloggern gefällt das: