The next post in a series of technical ‘question and answer’ guest blogs written by one of the UKFast Linux aficionados.
Continuing from ‘what does the swappiness parameter actually do?’, I’ll try to explain how much swap space there should be on a server, if you stay with me.
I feel there is a common misconception these days about the value of swap and what its purpose is. Swap is often thought of as a ‘reserve bank’ for memory when you’re running low. This is partly true, but the kernel doesn’t want to use your swap as a reserve bank. More generally, the kernel never wants to go to disk to get the data you are after!
In application space, there are a number of things the kernel will be keeping in memory. For the purposes of memory management, every memory allocation is backed in some form or another by a supporting device, and the two kinds that matter here are file-backed and anonymously-backed memory.
File-backed memory is memory which has come from a file, and on a typical OS it makes up the vast majority of memory allocation on the system. It includes files such as shared libraries which have been loaded, files read from disk and stored in the page cache, and files mapped from disk (in fact, the kernel makes no distinction between pages of files in the page cache and pages of mapped files, as it’s essentially the same thing).
The great thing about this memory, from the kernel’s standpoint, is that it’s disposable. That is, it should be possible to dump these pages if you need the memory for something else, and this is exactly what the page cache does if memory is suddenly required.
Anonymously-backed memory is a different matter. Memory from this region is anonymous because, well, there is no file on disk which actually contains this data. It is normally made up of the application stack, the heap, anything in tmpfs, and mapped data which is private and has been modified (since the kernel can’t sync this stuff back to disk). Since there’s just no valid file on the filesystem to write these pages back to if they change, anonymously-backed memory is backed by the swap media.
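To see how this split looks on a real machine, here is a minimal Python sketch that divides memory into file-backed and anonymous pages using the LRU counters in `/proc/meminfo`. The function names and the sample figures are made up for illustration, and the split is only approximate (tmpfs pages are counted on the anonymous side, in line with the description above).

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: value in kB}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def memory_breakdown(fields):
    """Approximate split of reclaimable memory by what backs it, in kB."""
    # Pages on the file LRU lists: page cache and file mappings.
    file_backed = fields.get("Active(file)", 0) + fields.get("Inactive(file)", 0)
    # Pages on the anon LRU lists: stack, heap, tmpfs, modified private maps.
    anonymous = fields.get("Active(anon)", 0) + fields.get("Inactive(anon)", 0)
    return {"file_backed_kb": file_backed, "anonymous_kb": anonymous}


# Illustrative sample only; on a Linux box, pass in the contents of
# /proc/meminfo instead, e.g. open("/proc/meminfo").read().
SAMPLE = """\
MemTotal:        8000000 kB
Active(anon):       1000 kB
Inactive(anon):      500 kB
Active(file):       3000 kB
Inactive(file):     2000 kB
"""

if __name__ == "__main__":
    print(memory_breakdown(parse_meminfo(SAMPLE)))
```

On most servers the file-backed figure will be the larger of the two, which is what makes it such a convenient source of memory to reclaim.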
Now, the kernel knows that when memory is scarce, it’s far cheaper to ditch file-backed memory than anonymously-backed memory. That’s because anonymous data has a much higher chance of being “dirty” than file-backed data: a clean file-backed page can simply be dropped, while a dirty page has to be written out first. In fact by default the kernel rates anonymously-backed memory as being 80 times more valuable than file-backed memory and this is actually what the swappiness modifier does on Linux (see this post if you want to know what exactly the swappiness parameter is altering).
The worst-case scenario, a server going out of control and OOMing, comes about because the server spends far too much time handling I/O requests rather than honouring memory allocation requests.
Two conditions can trigger this.
The first one is the commonly thought-of problem. Since so much of the memory lives inside swap, anonymous memory has to be taken out of RAM and written to swap before something else can be read from swap and put into real RAM.
This operation is very expensive, slowing down the machine to the point where the situation can become unrecoverable (because more ‘stuff’ is queuing for page demands than can be served from I/O).
The second one is less often considered but just as important. If you allocate almost all of your memory to real application data, you won’t last long. Nearly every application relies on reading files from the filesystem to operate. This could be because some instructions live in a shared library, because you need to read /etc/resolv.conf for a library call, or for any other purpose. It’s entirely plausible to halt an operating system that has enough memory to fit all your applications, because you’re queuing up so many I/O requests that nothing has a chance to complete properly.
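The first condition, heavy swap traffic in both directions, can be spotted by watching the `pswpin` and `pswpout` counters in `/proc/vmstat`, which count pages swapped in and out since boot. The sketch below turns two snapshots of those counters into per-second rates. The function names are my own, and what counts as ‘too much’ is something you would have to judge for your own workload.

```python
def parse_vmstat(text):
    """Parse /proc/vmstat-style text ("name value" per line) into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rates(before, after, seconds):
    """Pages swapped in and out per second between two snapshots."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return {
        "swap_in_pages_per_s": (after["pswpin"] - before["pswpin"]) / seconds,
        "swap_out_pages_per_s": (after["pswpout"] - before["pswpout"]) / seconds,
    }


if __name__ == "__main__":
    # Illustrative snapshots; on Linux, read /proc/vmstat twice with a
    # sleep in between and pass the two texts to parse_vmstat().
    first = parse_vmstat("pswpin 100\npswpout 200\n")
    second = parse_vmstat("pswpin 150\npswpout 200\n")
    print(swap_rates(first, second, seconds=5))
```

A sustained, non-zero rate in both directions at once is the pattern described above: pages being pushed out only so that other pages can be pulled back in.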
The kernel wants to use your swap to get rid of pages that are wasting memory so it can use that memory for something else.
Basically, in normal operation, the kernel loves to aggressively fill up the page cache with data read from disk, which means it won’t have to read the disk again for the same data. This is good design and can massively reduce I/O. Now, it might be that you have some application sat in memory that sleeps for 3 days, wakes up, does a bunch of work, then sleeps another 3 days.
What the kernel would like to do with this application’s data is swap it out to make space for filesystem activity instead, since the pages holding file data have a much higher chance of being used often than the pages of a sleeping application. Swapping, in this sense, might be a 16KB transaction to your swap media which you should hardly feel, but in return you have freed up 16KB of memory that could be used to store four files’ worth of data.
The kernel definitely doesn’t want to use your swap to allocate more anonymous memory by swapping out some other anonymous memory. This is the situation people worry about the most, and rightly so.
However, I should point out that if you have allocated so much memory that the kernel has no choice but to do this, this is a configuration problem of the system administrator’s, not of the kernel itself. It’s just trying to do its best with the options you’ve given it!
No! If you have 1GB of RAM and 4GB of swap, there’s not an 80% chance your data gets swapped! The kernel wants to use the swap only when pages in memory can be better used for something else.
I would never run a server with no swap. Swapping allows the OS to get rid of memory you need to have but which is never in use. If you have no swap, you’re just swallowing up memory you’ll never get back, and you might see a significant performance improvement by allowing, say, the page cache to have it instead.
Theoretically speaking, find out how much resident memory your applications use, add 20% for safeties like re-entrant library calls that have to allocate memory from the heap, then set your swap to that amount. This would (theoretically anyway) permit the operating system to swap all anonymous memory out, if it had to, to make way for something more useful.
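As a rough worked example of that rule of thumb, here is a small Python sketch. It assumes you have already collected a per-process figure for resident anonymous memory in kB (for instance the `RssAnon` line in `/proc/<pid>/status` on kernels that expose it). The function name and the sample numbers are hypothetical.

```python
def suggested_swap_kb(resident_kb_per_process, headroom_percent=20):
    """Total resident memory plus a percentage of headroom, rounded up."""
    total = sum(resident_kb_per_process)
    # Integer arithmetic keeps the result exact; the +99 rounds up.
    return (total * (100 + headroom_percent) + 99) // 100


if __name__ == "__main__":
    # Three hypothetical processes using 1000 kB, 2000 kB and 512 kB.
    print(suggested_swap_kb([1000, 2000, 512]))
```

Treat the result as a starting point rather than a hard rule; it only covers the memory that was resident when you measured it.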
Remember, the kernel doesn’t want to swap to make way for more anonymous memory allocation here; it will only ever swap out pages not in use to favour something else that will make better use of the space instead.
If you’re swapping anonymous memory out only to allocate more anonymous memory, you’re doing something wrong and need more RAM or to retune your application stack anyway.
You need enough RAM to run all your applications of course, but you should probably allow an extra 2GB of RAM for pagecache to fill up, maybe more.
Page cache makes your computer much faster and your disks last longer. If you’re thinking of running a webserver, having even more for pagecache is a good idea due to the sheer amount of static content that can be retrieved and reused from pagecache (if your webserver throughput is 5MB/s, you really don’t want to be retrieving that 5MB/s of content from your disk every time, after all).
If you’re really concerned, you can allocate more memory than you have:
If your software vendor gives you a specific recommendation, listen to that recommendation instead. Some applications are written in such a way that they deliberately force the kernel to keep their pages active over all others. This is a nasty trick, to be honest, and it breaks the kernel’s ability to seamlessly manage memory when it happens.
If your vendor is giving you specifics, their application likely falls into this category, so listen to what they say.
The kernel will try its best to honour the settings you give it for the apps you run. But don’t be surprised if you OOM when you have 500 Apache children all taking up 32M of memory. That’s a fault of the configuration you chose, not of the memory management.
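The arithmetic behind that Apache example is worth doing before you set a worker limit. The sketch below multiplies it out and checks the result against the RAM you have, keeping back the 2GB suggested earlier for page cache. The function names and the RAM sizes in the example are assumptions for illustration only.

```python
def worst_case_mb(workers, per_worker_mb):
    """Memory needed if every worker is at full size at the same time."""
    return workers * per_worker_mb


def fits_in_ram(ram_mb, workers, per_worker_mb, page_cache_reserve_mb=2048):
    """True if the worst case plus a page cache reserve fits in RAM."""
    needed = worst_case_mb(workers, per_worker_mb) + page_cache_reserve_mb
    return needed <= ram_mb


if __name__ == "__main__":
    print(worst_case_mb(500, 32))      # 16000 (MB)
    print(fits_in_ram(8192, 500, 32))  # False on a hypothetical 8GB server
```

If the check fails, the fix is to lower the worker limit or add RAM, not to add swap.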
The kernel usually does a very good job of managing virtual memory properly. It’s almost always the case that your application is allocating more memory than you could possibly hope to work with, and that’s what is causing an OOM.
Swap used to be used as ‘spare memory’, but that’s not its primary purpose anymore, so don’t think of using it like that. Instead, appreciate that your kernel probably knows best what it wants to use your memory for. Give it space to make those decisions and you’ll benefit from an overall performance improvement.