We have just experienced one hour of downtime. This was, sadly, unavoidable. Our primary database server reached melting point. Queries were taking a couple of seconds to run, and that's fine at low loads, but when millions of requests are being made, it causes some big problems. We want queries to run in a twentieth of a second, but they were running about 100 times slower than that.
We made some big changes, including moving the database to a separate SCSI drive, flushing most of our super old, irrelevant feed items (no 'live' feed items were erased), and rebuilding all our indexes. We've also stripped the SQL powering most of our digests down to the bare bones and improved the server's memory configuration.
When the server was started up again, it was serving rapidly and I was impressed with the increase. Then I noticed the cache wasn't running! It was serving every single FeedDigest request straight from the database and still had very good performance! I've fixed the cache now, and the load is ridiculously low (for the techs: we were averaging a load of 1.5 to 2.5 on a dual-CPU machine, and now we're at 0.1!)
There are still changes to be made, but we'll try to make sure they don't impact your service too much. We just want to roll out cool products rather than have to focus on the infrastructure too much, so it's great that things are now running smoothly. Just give things a few hours to fully sort themselves out on the feed crawling front as our crawlers catch up.