My research this semester, under Prof. Christopher Olston is on search engine index freshness - particularly the discovery of new pages on the web, using the Web Information Collector (WIC) algorithm.
The general idea is that there is a fixed resource limitation while trying to index the web. For example, bandwidth is limited so you can only download 3 pages per site per week.
What would be the best algorithm to find all of these pages given a certain set of requirements? Timeliness and completeness are at odds, so you must optimize for one over the other.
The WIC algorithm is a greedy algorithm that improves on the naive algorithm - instead of only downloading the same best 3 pages per week, it balances what it has recently viewed to balance completeness and timliness and rotates which page it downloads.
I got results today on WIC vs the naive algorithm, and WIC kicks its ass for small resource limitation. It's about 70% better when the resource limitation is 1 page, and it's always at least 15% better from my estimation.