Skip to content

Data Compression

April 20, 2013

The importance of data compression stays relevant despite the computer power increases due to Moore’s law, but kept in check by Parkinson’s law which states that data will grow to fill the available space. IBM recently put out a paper to quantify how much data we currently produce: 90% of all data created in the last 2 years, and every day creates 2.5 quintillion bytes.

Interesting facts

  • Lossless compression and expansion for natural language can be expected to achieve 50-75% compression ratio.
  • No such thing as a universal data compression algorithm. Why? It’s undecidable, but easy to prove by contradiction, if that was possible, then you’d be able to continually pass the result of compression as input to that compression algorithm repeatedly until you achieved a zero byte result.
  • Natural language compresses with big ratios because of the redundancy in the language. We can read this, because there is enough information to decode the word.

Algorithms

Run-length encoding

Encodes long running sequences of data by contracting counts of the occurrences of that item.

Huffman compression

Provides an optimal prefix-tree symbol-for-symbol encoding. This algorithm uses variable length codes for fixed symbols. Compression is done by mapping from a code lookup table, while expansion is done by traversing a prefix trie. However, when sending compressed data to another source, the expansion prefix-tree must be serialized using tree pre-order traversal. Running time is O(N log N). Another way to divide up symbols is Shannon-Fano coding.

LZW compression

This is an adaptive compression model that changes as input is processed. It’s the other end of the spectrum from Huffman coding, because it uses fixed length codes to replace variable length symbols.

Burrows-Wheeler compression

Lossy compression – using FFTs, wavelets, and fractals to accomplish this for media data formats.

Shannon entropy – defines an absolute limit on the best possible lossless encoding achievable.

Advertisements

From → Algorithms

Leave a Comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: