Continuing from ‘what does the swappiness parameter actually do?’, I’ll try to explain how much swap space a server should have, if you stay with me.
A bit about virtual memory
I feel there is a common misconception these days about the value of swap and what its purpose is. Swap is often thought of as a ‘reserve bank’ of memory for when you’re running low. That’s partly true, but the kernel doesn’t want to use your swap as a reserve bank. In fact, the kernel never wants to have to go to disk at all to get the data you’re after!
In application space, there are a number of things the kernel will be keeping in memory:
- Mapped file data
- tmpfs filesystems
- Raw application memory allocated at runtime
- Application code (such as the text segment of an ELF binary)
- Privately mapped file data
For the purpose of memory management, memory allocation is backed in some form or another by a supporting device.
File-backed memory is memory which has come from a file; on a typical O/S it makes up the vast majority of memory allocated on the system. It includes shared libraries which have been loaded, files read from disk and stored in the page cache, and files mapped from disk (in fact, the kernel makes no distinction between pages of files in the page cache and pages of mapped files; it’s essentially the same thing).
The great thing about this memory, from the kernel’s standpoint, is that it’s disposable: these pages can simply be dropped if the memory is needed for something else, and this is exactly what the page cache does when memory is suddenly required.
Anonymously-backed memory is a different matter. Memory in this region is anonymous because there is no file on disk which actually contains its data. It is normally made up of the application stack, the heap, anything in tmpfs, and privately mapped data which has been modified (since that can no longer be synced back to its file). Because there is no valid file on the filesystem to write these pages back to when they change, anonymously-backed memory is backed by the swap media instead.
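The two kinds of backing can be seen directly from userspace. Here is a minimal Python sketch (my own illustration, not from the original post) creating one mapping of each kind with `mmap`:

```python
import mmap
import tempfile

# File-backed: the pages come from (and can be dropped back to) a real file,
# so the kernel can discard them cheaply and re-read them later.
with tempfile.NamedTemporaryFile() as f:
    f.write(b"hello from disk")
    f.flush()
    with mmap.mmap(f.fileno(), 0) as file_backed:
        data = file_backed[:5]      # reads go through the page cache

# Anonymous: fd of -1 means there is no backing file (MAP_ANONYMOUS),
# so once dirtied these pages can only ever be written out to swap.
anon = mmap.mmap(-1, 4096)
anon[:5] = b"dirty"                 # now a dirty anonymous page
dirty = anon[:5]
anon.close()
```

If the file-backed pages are dropped under memory pressure, nothing is lost; the dirty anonymous page, by contrast, has nowhere to go but swap.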
Now, the kernel knows that when memory is scarce it’s far, far cheaper to ditch file-backed memory than anonymously-mapped memory, because anonymous data has a much higher chance of being “dirty” than file-backed data. In fact, by default the kernel rates anonymously-backed memory as being 80 times more valuable than file-backed memory, and this is exactly what the swappiness modifier alters on Linux (see the previous post if you want to know precisely what the swappiness parameter changes).
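You can check the current swappiness value on any Linux box. A small sketch (the fallback default of 60 is an assumption for non-Linux systems where /proc doesn’t exist):

```python
def read_swappiness(path="/proc/sys/vm/swappiness"):
    """Read vm.swappiness, falling back to the common default of 60
    if /proc is unavailable (e.g. on a non-Linux machine)."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except OSError:
        return 60  # kernel default on most distributions

swappiness = read_swappiness()
```

Lower values bias the kernel towards dropping page cache; higher values bias it towards swapping anonymous pages out.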
The worst case scenario
The worst case scenario for a server spiralling out of control and OOMing is that it spends far too much time handling I/O requests rather than honouring memory allocation requests.
There are two conditions which can trigger this:
- Swapping anonymous memory in and out constantly, to fetch pages on demand or to allocate more pages for an application
- Spending more time fetching data from disk, because it’s not in memory, than you give to the processes actually wanting CPU time
The first one is the commonly considered problem: since so much of the memory lives in swap, anonymous memory must constantly be swapped out of RAM, with other pages pulled from swap back into real RAM to take its place.
This operation is very expensive, and it slows the machine down to the point where the situation can become unrecoverable (more ‘stuff’ is queuing up on page demands than the I/O subsystem can ever serve).
The second one is less often considered but just as important. If you allocate almost all of your memory to real application data, you won’t last long. Nearly every application relies on reading files from the filesystem to operate, whether because some instructions live in a shared library, because a library call needs to read /etc/resolv.conf, or for any other purpose. It’s entirely plausible to halt an operating system this way: you have enough memory to fit all your applications, but you’re queuing up so many I/O requests that nothing gets a chance to complete properly.
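/proc/meminfo shows the split between these two kinds of memory at any moment. A self-contained sketch parsing a captured sample (point `parse_meminfo` at the real file on a live system; the sample figures are invented for illustration):

```python
# Captured /proc/meminfo excerpt (values invented for the example, in kB).
SAMPLE = """\
MemTotal:        8000000 kB
Cached:          3000000 kB
AnonPages:       2500000 kB
SwapTotal:       2000000 kB
SwapFree:        1500000 kB
"""

def parse_meminfo(text):
    """Parse meminfo-style 'Key: value kB' lines into a dict of ints (kB)."""
    info = {}
    for line in text.splitlines():
        key, rest = line.split(":", 1)
        info[key] = int(rest.split()[0])
    return info

mem = parse_meminfo(SAMPLE)
anon_kb = mem["AnonPages"]                       # anonymous, swap-backed
file_kb = mem["Cached"]                          # file-backed page cache
swap_used_kb = mem["SwapTotal"] - mem["SwapFree"]
```

A box where `AnonPages` dwarfs `Cached` is at risk of the second failure mode above: plenty of application memory, but no room left for the filesystem.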
What the kernel wants to do with your swap
The kernel wants to use your swap to get rid of pages that are wasting memory so it can use that memory for something else.
Basically, in normal operation the kernel loves to aggressively fill up the page cache with data read from disk, so that it never has to read the disk twice for the same data. This is good design and can massively reduce I/O. Now, it might be that you have some application sitting in memory that sleeps for 3 days, wakes up, does a bunch of work, then sleeps for another 3 days.
What the kernel would like to do with that application’s data is swap it out and make space for filesystem activity instead, since you have a much higher chance of using those cache pages than the pages belonging to the sleeping application. Swapping, in this sense, might be a 16 KB transaction to your swap media which you should hardly feel, but in return you’ve freed up 16 KB of memory that could store four files’ worth of data.
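You can tell this benign, one-off swapping apart from pathological churn by watching the `pswpin`/`pswpout` counters in /proc/vmstat, which count pages swapped in and out since boot. A sketch using a captured sample (the numbers are invented; sample the real file twice over an interval in practice):

```python
# Captured /proc/vmstat excerpt (counter values invented for the example).
SAMPLE = "pswpin 120\npswpout 480\n"

def swap_counters(text):
    """Return (pages swapped in, pages swapped out) since boot."""
    counters = dict(line.split() for line in text.splitlines())
    return int(counters["pswpin"]), int(counters["pswpout"])

pswpin, pswpout = swap_counters(SAMPLE)
# Pages are 4 KB on most systems, so bytes swapped out so far:
swapped_out_bytes = pswpout * 4096
```

Counters that stay flat between samples mean swap is just holding idle pages, which is fine; counters that climb continuously mean anonymous memory is churning through swap.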
What the kernel doesn’t want to use swap for
The kernel definitely doesn’t want to use your swap to allocate more anonymous memory by swapping out other anonymous memory. This is the situation people worry about the most, and rightly so.
However, I should point out that if you have allocated so much memory that the kernel has no choice but to do this, that is a configuration problem of the system administrator, not of the kernel itself; it’s just trying to do the best with the options you’ve given it!
If you have a huge amount of swap do you increase the chance you use it?
No! If you have 1 GB of RAM and 4 GB of swap, there isn’t an 80% chance your data gets swapped! The kernel only wants to use swap when the pages in memory would be better used for something else.
Is it advantageous to not use swap at all?
I would never do this. Swapping allows the O/S to get rid of memory you must keep around but never actually use. With no swap, that memory is simply swallowed up and you never get it back, whereas you might see a significant performance improvement by letting, say, the page cache have it instead.
What’s the best swap to have?
Theoretically speaking: work out how much resident memory your applications use, add 20% for safety (things like re-entrant library calls that have to allocate from the heap), then set your swap to that amount. This would (theoretically, anyway) permit the operating system to swap all anonymous memory out, if it had to, to make way for something more useful.
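That sizing rule is trivial to express as a helper (the function name and 20% default margin are my own, matching the rule above):

```python
def suggested_swap_kb(resident_kb, margin=0.20):
    """Suggested swap size: total resident memory plus a safety margin."""
    return round(resident_kb * (1 + margin))

# e.g. 4 GB of resident application memory -> roughly 4.8 GB of swap
size = suggested_swap_kb(4 * 1024 * 1024)
```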
If I give the kernel the opportunity to swap everything out, that’s dangerous, right?
Remember, the kernel doesn’t want to swap to make way for more anonymous memory allocation here; it will only ever swap out pages not in use to favour something else that will make better use of the space instead.
If you’re swapping anonymous memory out only to allocate more anonymous memory, you’re doing something wrong: you need more RAM, or you need to retune your application stack.
How much RAM would you need?
You need enough RAM to run all your applications, of course, but you should probably allow an extra 2 GB of RAM for the page cache to fill up, and maybe more.
Page cache makes your computer much faster and your disks last longer. If you’re thinking of running a webserver, having even more room for page cache is a good idea, due to the sheer amount of static content that can be served straight out of it (if your webserver’s throughput is 5 MB/s, you really don’t want to be retrieving that 5 MB/s of content from your disk every time, after all).
What to do if you really don’t trust Linux to swap properly
If you’re really concerned, you can make sure the system will never allocate more memory than it actually has:
- Set your swap amount to be no more than your RAM
- Set /proc/sys/vm/overcommit_memory to 2
- Set /proc/sys/vm/overcommit_ratio to a value which ensures the commit limit can never exceed your physical RAM. See the kernel documentation to work out what that number would be for you.
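In overcommit mode 2 the kernel’s commit limit is `CommitLimit = swap + ram * overcommit_ratio / 100`, so you can solve for the largest ratio that keeps the limit at or below physical RAM. A hypothetical helper (the function name is mine):

```python
def max_overcommit_ratio(ram_kb, swap_kb):
    """Largest vm.overcommit_ratio keeping CommitLimit <= physical RAM,
    given CommitLimit = swap + ram * ratio / 100 in overcommit mode 2."""
    if swap_kb >= ram_kb:
        return 0  # swap alone already reaches the physical limit
    return (ram_kb - swap_kb) * 100 // ram_kb

# With 4 GB of RAM and 2 GB of swap, a ratio of 50 caps commits at exactly RAM:
ratio = max_overcommit_ratio(4 * 1024 * 1024, 2 * 1024 * 1024)
```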
What’s the best way to tune my memory for my application?
- Understand what swappiness actually does
- Use CGroups to allocate the correct resources per application
- Change the overcommit mode mentioned above to make the operating system enforce strict limits
- Use cgroups, as above, to set OOM priorities on the applications you really want to keep if you’re out of memory, and on the applications you really want to ditch
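Alongside cgroups, the simplest per-process OOM knob is /proc/&lt;pid&gt;/oom_score_adj, which takes a value from -1000 (never kill) to 1000 (kill first). A hedged sketch (writing the file needs a live Linux system, so only the clamping logic runs standalone here; the function names are my own):

```python
def clamp_oom_score_adj(value):
    """Clamp to the kernel's accepted oom_score_adj range of -1000..1000."""
    return max(-1000, min(1000, value))

def set_oom_score_adj(pid, value):
    """Bias the OOM killer for one process; positive values make it a
    more likely victim. Requires appropriate permissions on the pid."""
    with open(f"/proc/{pid}/oom_score_adj", "w") as f:
        f.write(str(clamp_oom_score_adj(value)))

# e.g. set_oom_score_adj(cache_pid, 800)  # sacrifice the cache first
adj = clamp_oom_score_adj(1500)           # clamped down to 1000
```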
What if the vendor recommends a different configuration?
Listen to that recommendation instead. Some applications are written in such a way that they deliberately ask the kernel to keep their pages active over all others. This is a nasty trick, to be honest, and it breaks the kernel’s ability to seamlessly manage memory.
If your vendor is giving you specifics, they likely fall into this category, so follow their advice.
To summarise:
- Swap is meant to be used as a place to put wasted memory, not as ‘spare memory’
- Having huge amounts of swap has no effect whatsoever on your chances of using it
- The kernel really wants to stop you from going to disk for your data. This applies just as much, if not more, to the page cache as it does to swap space
The kernel will try its best to honour the settings you give it and the applications you run, but don’t be surprised if you OOM with 500 Apache children each taking up 32 MB of memory. That’s a fault of the configuration you chose, not of the memory management.
The kernel usually does a very good job of managing virtual memory properly. It’s almost always the case that your application is allocating more memory than you could possibly hope to work with, and that’s what is causing an OOM.
Swap used to be treated as ‘spare memory’, but that’s not its purpose anymore, so don’t think of it like that. Instead, appreciate that your kernel probably knows best what it wants to use your memory for. Give it the space to make those decisions and you’ll benefit from an overall performance improvement.