
How should I upgrade a cluster?

One at a time ;) Memcached is at least backwards compatible with itself. You should have enough memcached instances in production that clearing one of them at a time (maybe even two at a time) does not degrade your service.

A decent rule of thumb: the typical hit rate is often reached again within 10-15 minutes of a restart. So if you watch what's going on, you can run a rolling restart without sinking a lot of time.
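A rolling restart that waits for the hit rate to recover could be sketched roughly like this. Everything here is an assumption for illustration: `restart` and `get_hit_ratio` are hypothetical callables you would supply yourself, and the 90% recovery threshold, 30 second poll and 30 minute limit are arbitrary starting points, not recommendations from memcached.

```python
import time


def rolling_restart(servers, restart, get_hit_ratio, recovery_fraction=0.9,
                    poll_seconds=30, max_wait_seconds=1800, sleep=time.sleep):
    """Restart servers one at a time, waiting for each hit ratio to recover.

    restart(server) and get_hit_ratio(server) are hypothetical hooks:
    the first restarts one instance, the second returns its current
    hit ratio as a float (or None if there is no traffic to measure).
    """
    done = []
    for server in servers:
        baseline = get_hit_ratio(server)  # ratio before the restart
        restart(server)
        waited = 0
        # With no baseline there is nothing to compare against, so move on.
        recovered = baseline is None
        while not recovered and waited < max_wait_seconds:
            sleep(poll_seconds)
            waited += poll_seconds
            ratio = get_hit_ratio(server)
            recovered = (ratio is not None
                         and ratio >= recovery_fraction * baseline)
        if not recovered:
            # Stop the whole rollout rather than clearing more caches.
            raise RuntimeError(
                f"{server} did not recover within {max_wait_seconds}s; stopping")
        done.append((server, waited))
    return done
```

The function stops at the first instance that fails to recover, so a bad upgrade never clears more than one cache.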

How do you read the stats output?

protocol.txt has included descriptions of the 'stats' output since memcached 1.2.5. Keep in mind that the format of the 'stats' commands is subject to change.
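For reference, the text protocol answers 'stats' with a series of "STAT name value" lines followed by "END". A minimal sketch of reading that into a dictionary follows; the function names are made up for this example, and the host and port defaults assume a local instance on the standard port 11211.

```python
import socket


def parse_stats(raw):
    """Turn 'STAT <name> <value>' lines into a dict of strings."""
    stats = {}
    for line in raw.splitlines():
        line = line.strip()
        if line == "END":
            break
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def fetch_stats(host="127.0.0.1", port=11211, command="stats", timeout=2.0):
    """Send a stats command over the text protocol and parse the reply.

    command can also be 'stats items' or 'stats slabs'. If the server
    never sends END, the socket timeout raises instead of hanging.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii") + b"\r\n")
        buf = b""
        while not buf.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:  # server closed the connection
                break
            buf += chunk
    return parse_stats(buf.decode("ascii", errors="replace"))
```

Values are kept as strings because some stats (such as the version) are not numbers; convert the ones you graph.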

What should I monitor?

Graph your hit ratio. This is the number of get_hits divided by the total number of gets (get_hits plus get_misses). Any significant change in hit ratio warrants investigation. Did a developer push some awful code? Did something get flushed?
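As a rough sketch, the hit ratio can be computed from the get_hits and get_misses counters in the 'stats' output. The counters are cumulative since the instance started, so a graph is usually more useful when built from the difference between two samples. The function names are invented for this example, and the input is assumed to be a dict of stat name to value.

```python
def hit_ratio(stats):
    """Hits divided by total gets, or None if there were no gets."""
    hits = int(stats["get_hits"])
    misses = int(stats["get_misses"])
    total = hits + misses
    return hits / total if total else None


def interval_hit_ratio(prev, curr):
    """Hit ratio for the period between two samples of the same instance.

    Returns None if there was no traffic, or if the counters went
    backwards (which suggests the instance restarted in between).
    """
    hits = int(curr["get_hits"]) - int(prev["get_hits"])
    misses = int(curr["get_misses"]) - int(prev["get_misses"])
    if hits < 0 or misses < 0:
        return None
    total = hits + misses
    return hits / total if total else None
```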

Uptime - find out when instances restart. Try to not do that too often.

Graph hit rate and/or bytes in/out for each instance. Sometimes popular keys can end up on different instances, or some servers will not be able to contact all of the memcached instances. Comparing the graphs across instances will show you whether everything is happy and even.
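One simple way to check for unevenness, offered only as a sketch: compare each instance's hit ratio against the median of the group and flag the outliers. The function name and the 0.10 tolerance are assumptions made for this example; pick a threshold that suits your own traffic.

```python
from statistics import median


def uneven_instances(ratios, tolerance=0.10):
    """Return the instances whose hit ratio is far from the group median.

    ratios maps an instance name to its hit ratio (a float in 0..1).
    """
    if not ratios:
        return {}
    mid = median(ratios.values())
    return {name: r for name, r in ratios.items() if abs(r - mid) > tolerance}
```

The same comparison works for bytes in/out if you normalise each instance's value first.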

Graph eviction rate. 'evictions' counts items that were removed from the cache before they expired, in order to make room for new items.

Monitor 'stats items' as of memcached 1.2.5. Check to ensure you are not getting out of memory errors. You will also be able to find out what specific item size is causing evictions. Compare with 'stats slabs' to see if not enough slabs were assigned for a particularly hot slab class. This sometimes happens if the popular size of items changes over time. The easiest way to fix this is to restart memcached and let it re-assign slabs naturally.
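A sketch of that comparison, assuming both outputs have already been parsed into dicts of stat name to value. It relies on the "items:<slabclass>:<stat>" names in 'stats items' (evicted, outofmemory, number) and the "<slabclass>:<stat>" names in 'stats slabs' (chunk_size, total_pages); since the stats format is subject to change, check these against the protocol.txt for your version. The function name is made up for this example.

```python
def evictions_by_slab_class(items_stats, slabs_stats):
    """List slab classes that are evicting or hitting out of memory errors.

    items_stats: parsed 'stats items' output, e.g. {"items:5:evicted": "340"}
    slabs_stats: parsed 'stats slabs' output, e.g. {"5:chunk_size": "240"}
    """
    wanted = ("number", "evicted", "outofmemory")
    per_class = {}
    for name, value in items_stats.items():
        parts = name.split(":")
        if len(parts) == 3 and parts[0] == "items" and parts[2] in wanted:
            per_class.setdefault(int(parts[1]), {})[parts[2]] = int(value)

    report = []
    for class_id in sorted(per_class):
        fields = per_class[class_id]
        if fields.get("evicted", 0) or fields.get("outofmemory", 0):
            report.append({
                "slab_class": class_id,
                # The chunk size tells you which item size is affected.
                "chunk_size": slabs_stats.get(f"{class_id}:chunk_size"),
                "total_pages": slabs_stats.get(f"{class_id}:total_pages"),
                "number": fields.get("number", 0),
                "evicted": fields.get("evicted", 0),
                "outofmemory": fields.get("outofmemory", 0),
            })
    return report
```

A class with many evictions and few pages compared with the others is a candidate for the restart described above.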

Debugging

If you get errors from the server, find a crash, etc., the first thing to do is upgrade to the latest available release. If the problem still happens, make sure you gather all of the exact errors your client and server are reporting. If you are getting a segfault or crash, it is also possible to run memcached under gdb and get debugging information.

If you're having run-of-the-mill connection errors, you probably want to drill down on what your actual issue is. Write a script that stands alone from your main application. Use the same memcached client, same connection code, etc. Run it in a loop and try to reproduce the error. If you can, start adding debugging code to the test script, run tcpdump in between, etc. Being able to reproduce the issue outside of your application is a very important step in getting it all fixed.
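A standalone reproduction loop might look something like the sketch below. It deliberately does not pick a memcached client: `operation` is a hypothetical callable in which you would put the same client and connection code your application uses. The function name and defaults are likewise made up for this example.

```python
import time


def run_repro_loop(operation, iterations=1000, delay=0.01):
    """Call operation() repeatedly and record every failure.

    operation is a hypothetical zero-argument callable wrapping your
    own client code, for example a connect followed by a get and a set.
    Returns a list of (iteration, seconds_elapsed, error_text) tuples.
    """
    failures = []
    for i in range(iterations):
        started = time.time()
        try:
            operation()
        except Exception as exc:  # record the exact error, keep looping
            failures.append((i, time.time() - started, repr(exc)))
        time.sleep(delay)
    return failures
```

The elapsed time per failure helps tell timeouts apart from immediate connection refusals, and the iteration numbers can be lined up with a tcpdump capture taken while the loop runs.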

Reporting bugs

Ensure you have done all of what is described in the Debugging section above. Before reporting a bug you should be on the latest release of your memcached client and the memcached server. Your best bet for help will be the mailing list. Get exact error messages, exact code examples, and submit those together with a careful description of how to reproduce the problem. If you have an easily reproducible bug, the best possible way to report it is by writing a test. In memcached/t there is an array of Perl tests. Write a new test and submit the patch with your bug report if you can.

Page Last Updated: Mar 16 4:30pm by Dormando

