There are basically two compression techniques I know of. One compresses each character based on its frequency - if 'a' is used a lot, it might be encoded as '01', while if 'x' is rarely used, it might get something long like '100101'. This is called Huffman coding.
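Here's a minimal toy sketch of the idea in Python (my own illustration, not a production codec): build the code table greedily by repeatedly merging the two least-frequent subtrees, so frequent symbols end up near the root with short bit strings.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman code table: frequent symbols get short bit strings."""
    freq = Counter(text)
    # Heap entries: (frequency, tiebreaker, node); a node is either a symbol
    # or a (left, right) pair. The tiebreaker keeps comparisons well-defined.
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (left, right)))
        count += 1
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"  # edge case: only one distinct symbol
    walk(heap[0][2], "")
    return codes

# 'a' dominates, 'x' appears once - so 'a' gets the shortest code.
codes = huffman_codes("aaaaaaaaaaaaaaaabbbbccx")
```

Running this, `codes["a"]` comes out one bit long while `codes["x"]` takes three - exactly the frequent-is-short trade described above.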
The other type is LZW encoding - it works the opposite way: the codes are fixed length, but what each code represents is variable length. So if we've seen "aaaaaaa" before, the whole run could be encoded as a single short code rather than seven separate ones.
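A toy LZW encoder, to make the fixed-code/variable-string idea concrete (again my own sketch, emitting integer codes rather than packing them into bits):

```python
def lzw_encode(data):
    """LZW: emit one code per longest-already-seen string, growing the table as we go."""
    table = {chr(i): i for i in range(256)}  # start with every single character
    next_code = 256
    current = ""
    out = []
    for ch in data:
        if current + ch in table:
            current += ch          # keep extending the match
        else:
            out.append(table[current])        # emit code for longest match
            table[current + ch] = next_code   # learn the new, longer string
            next_code += 1
            current = ch
    if current:
        out.append(table[current])
    return out

# A run of eight 'a's collapses to four codes, and longer runs do even better.
codes = lzw_encode("aaaaaaaa")
```

For `"aaaaaaaa"` this yields `[97, 256, 257, 256]`: the encoder learns "aa", then "aaa", and reuses them, which is why repetitive input compresses so well.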
So here's one for you - music encoding. Let's say we're encoding notes, so we might have "bs3q" or something like that: b sharp, 3rd octave above the lowest octave we'll encode, quarter note. We could use LZW for this - it might do well if we see a lot of notes in one octave. Huffman might be a little naive - if all the notes are in one octave, there would be plenty of overhead from repeatedly spending bits on the octave.
So think about music - sheet music is fairly sparse compared to the representation I described. At the beginning, it says which octave it's in, which notes are sharp, and the time signature. If we could encode that only once, it might be useful.
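A tiny sketch of that "encode it once" idea, in the spirit of a key signature (the note format and function names here are mine, just for illustration): if every note in a passage shares the same octave, hoist it into a header and strip it from each note.

```python
# Notes as pitch + accidental + octave + duration, e.g. "bs3q" from above.
notes = ["bs3q", "cs3q", "ds3q", "bs3q"]

def factor_header(notes):
    """If every note shares one octave, emit it once as a header line."""
    octaves = {n[2] for n in notes}   # the octave digit sits at index 2
    if len(octaves) == 1:
        header = "octave=" + octaves.pop()
        body = [n[:2] + n[3] for n in notes]  # drop the per-note octave digit
        return header, body
    return None, notes  # mixed octaves: leave the notes as they were

header, body = factor_header(notes)
```

Here `header` becomes `"octave=3"` and the notes shrink to `["bsq", "csq", "dsq", "bsq"]` - and the shortened body is still perfectly readable, which is the crux of the question below.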
But that's not the point - I could write the densest representation of music ever, and then run Huffman and/or LZW over it. But the beauty of XML is readable markup - so how do we compress this extremely densely while keeping the original uber-readable? How do we use smarter compression algorithms than Huffman and LZW?
I'd love to look into "smarter" compression techniques - not necessarily ones tuned for music or some other format, but ones that take into account how humans like to represent data (i.e. not referring to an external table, but having all the information available at their point of focus) - and see if there's a better way to compress data meant to be read by humans, based on how the mind really likes to work and how humans actually write naturally.
Just for fun.