Fastolfe.Net

I'm David. This site is where I experiment. You could call it a blog, but it would kind of suck as a blog, so I'd rather you didn't.

I'm an engineer at Google, and I love it. If you're interested in learning more, you can start with the personal stuff or the more interesting professional stuff. Some more obscure content can be found through the topic index.

Recent Updates

So, to make it easier for you to find some things that I've added recently (like I said, as a blog, the site is kind of sucky), here's a list of the last three things I added:

2008-12-05

Friend connect

Google just announced their new Friend Connect feature, which lets people convert their web site into a social networking site pretty easily. Since I use this site to experiment, I decided to give it a try. The home page should give you the option to log in or sign up, and each blog post (like this one) should now accept comments on the right. Go wild.

2008-06-02

Review: Indiana Jones 4 Sucked

2 of 5 stars

As far as I'm concerned, there are only three Indiana Jones movies. The fourth simply does not count. Lucas took a great thing and tried to stretch it just a little too far and ended up with something horrifying.

2007-12-19

Risk: Who should pay for it?

America is widely regarded as an increasingly litigious society. We're sue-happy, and lots of people are discovering that the system can be (ab)used to win jackpots. While some consider this trend an unredeemable negative, I view it as a reflection of the community's desire to shift risk around. This may be stupid and short-sighted, but the market normally can correct for it.

Shared Items

These are other people's entries that I've marked as shared on Google Reader.

2009-09-04

Two-year-old as finite state machine

Some time ago I joined a family for dinner, and they had a two-year-old. During dinner, the two-year-old accidentally knocked over her glass, and liquid quickly spread across the table. The adults at the table sprang into action, containing the spill on the table, wiping it up, and checking for leakage onto the floor.

After all the excitement died down, the two-year-old looked down, saw the empty glass, and threw her hands up in the air, proudly announcing, "I drank it all!"

2009-08-15

Warehouse-bots


We want these for the Adafruit warehouse! via Bre

IEEE Spectrum (http://spectrum.ieee.org) takes you inside Kiva Systems’ robotic warehouse, where orange robots make inventory move instead of workers. Over time the system becomes increasingly efficient, with the robots learning from the wisdom of the crowd.

2009-07-21

Automated testing of production deployments

When you work as a systems engineer at a company that has a large-scale system infrastructure, sooner or later you realize that you need to automate pretty much everything you do. You can't afford not to, if you want to keep up with the ever-present demands of scaling the infrastructure up and down.

The main promise of cloud computing -- infinite elastic scaling based on demand -- is real, but you can only achieve it if you automate your deployments. It's fairly safe to say that most teams that are involved in such infrastructures have achieved high levels of automation. Some fearless teams practice continuous deployment, others do frequent dark launches. All these practices are great, but my thesis is that in order to achieve fearlessness you need automated tests of your production deployments.

Note the word 'production' -- I believe it is necessary to go one step beyond running automated tests in an isolated staging environment (although that is a very good thing to do, especially if staging mirrors production at a smaller scale). That next step is to run your test harness in production, every time you deploy. And deployment, at a fast moving Web company these days, can happen multiple times a day. Trust me, with no automated tests in place, you'll never get rid of that nagging feeling in the pit of your stomach that you might have broken things horribly, in production.

So how do you go about writing automated tests for your deployments? I wrote a while ago about automating and testing your system setup checklists. Even testing small things such as 'is httpd/mysqld/postfix set up to run at boot time' will go a long way toward achieving peace of mind.

Assuming you have a list of things to test (it can be just a couple of critical things for starters), how and when do you run the tests? Again, you can do the simplest thing that works -- a bash shell script that iterates through your production servers and runs the test scripts remotely on the servers via ssh. Some things I test this way these days are:

* do local MySQL databases on servers in a particular cluster contain the same data in certain tables? (this shows me that things are in sync across servers)
* is MySQL replication working as expected across the cluster of read-only slaves?
* are periodic operations happening as expected? (here I can do a simple tail of a log file to figure it out)
* are certain PHP modules correctly installed?
* is Apache serving a number of requests per second that is not too high, but not too low either? (where high and low are obviously highly dependent on your traffic and application)
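
As a rough illustration of the first check in the list above, here is a minimal Python sketch of the "loop over servers and run a command via ssh" idea (the post itself describes a bash script, so this is a swapped-in language, not the author's harness). The function names `ssh_run` and `find_outliers` are hypothetical, and the sketch assumes passwordless ssh access to every host. It runs the same command everywhere and reports the hosts whose output differs from the most common answer.

```python
import subprocess
from collections import Counter
from typing import Callable, Dict, List


def ssh_run(host: str, command: str) -> str:
    """Run one command on a remote host over ssh and return its stdout."""
    result = subprocess.run(
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        ["ssh", "-o", "BatchMode=yes", host, command],
        capture_output=True,
        text=True,
        timeout=30,
    )
    return result.stdout.strip()


def find_outliers(
    hosts: List[str],
    command: str,
    run: Callable[[str, str], str] = ssh_run,
) -> Dict[str, str]:
    """Return the hosts whose output differs from the most common output."""
    outputs = {host: run(host, command) for host in hosts}
    if not outputs:
        return {}
    # The answer given by the largest group of hosts is treated as "expected".
    majority, _count = Counter(outputs.values()).most_common(1)[0]
    return {host: out for host, out in outputs.items() if out != majority}
```

One possible use, with made-up host and table names: `find_outliers(["web1", "web2", "web3"], "mysql -N -e 'CHECKSUM TABLE mydb.users'")`. An empty result means every server returned the same checksum. A host that could not be reached returns empty output, so it would typically show up as an outlier as well.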

I run these tests (and many others) each time I push a change to production. No matter how small a change seems, it can have unanticipated side effects. I found that tests that probe the system from as many angles as possible are the most efficient -- the angles in my case being Apache, MySQL, PHP, and memcached, for example. I also found that this type of testing (push-based, if you want) is very good at showing discrepancies between servers. If you see a server that is out of whack this way, then you know you need to attempt to fix it, or even terminate it and deploy a new one.

Another approach in your automated testing strategy is to run your test harness periodically (via cron for example) and also to write the harness in a proper language (Python comes to mind), integrated into a test framework. You can have the results of the tests emailed to you in case of failure. The advantage of this approach is that you can have things run automatically without your intervention (in the first approach, you still have to remember to run the test suite!).

The ultimate in terms of automated testing is to integrate it with your monitoring infrastructure. If you use Nagios for example, you can easily write plugins that essentially probe for the same things that your tests probe for. The advantage of this approach is that the tests will run every time Nagios runs, and you can set up alerts easily. One disadvantage is that it can slow down your monitoring, depending on the number of tests you need to run on each server. Monitoring typically happens very often (every 5 minutes is a common practice), so it may be overkill to run all the tests every 5 minutes. Of course, this should be configurable in your monitoring tool, so you can have a separate class of checks that only happen every N hours for example.
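
To give a feel for what such a plugin involves, here is a small Python sketch. It relies on the standard Nagios plugin convention: print one line of status text and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). The check itself, the names `evaluate_rate` and `main`, and the choice to treat a low request rate as CRITICAL and a high one as WARNING are all assumptions made for this example.

```python
from typing import List, Tuple

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_rate(rps: float, low: float, high: float) -> Tuple[int, str]:
    """Map a requests-per-second reading onto a Nagios status and message."""
    if rps < low:
        return CRITICAL, f"CRITICAL - {rps:.1f} req/s is below {low:.1f}"
    if rps > high:
        return WARNING, f"WARNING - {rps:.1f} req/s is above {high:.1f}"
    return OK, f"OK - {rps:.1f} req/s"


def main(argv: List[str]) -> int:
    """Parse 'RPS LOW HIGH', print one status line, and return the exit code."""
    try:
        # A wrong argument count or a non-numeric value both raise ValueError.
        rps, low, high = (float(arg) for arg in argv)
    except ValueError:
        print("UNKNOWN - usage: check_rate RPS LOW HIGH")
        return UNKNOWN
    code, message = evaluate_rate(rps, low, high)
    print(message)
    return code
```

To run it as an actual plugin, the script would end with `sys.exit(main(sys.argv[1:]))` (after `import sys`), and the measured rate would come from a real probe instead of a command-line argument.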

In any case, let me assure you that even if you take the first approach I mentioned (ssh into all servers and run commands remotely that way), you'll reap the rewards very fast. In fact, you'll like it so much that you'll want to keep adding more tests, so you can achieve more inner peace. It's a sure way to become test infected, and also to achieve deployment nirvana.
