ProsNcons of Linux: July 2018

Wednesday, July 4, 2018

What's going on?

Linux is borrowing unused memory for disk caching. This makes it look like you are low on memory, but you are not! Everything is fine!

Why is it doing this?

Disk caching makes the system much faster and more responsive! There are no downsides, except for confusing newbies. It does not take memory away from applications in any way, ever!

What if I want to run more applications?

If your applications want more memory, they just take back a chunk that the disk cache borrowed. Disk cache can always be given back to applications immediately! You are not low on ram!

Do I need more swap?

No, disk caching only borrows the ram that applications don't currently want. It will not use swap. If applications want more memory, they just take it back from the disk cache. They will not start swapping.

How do I stop Linux from doing this?

You can't disable disk caching. The only reason anyone ever wants to disable disk caching is because they think it takes memory away from their applications, which it doesn't! Disk cache makes applications load faster and run smoother, but it NEVER EVER takes memory away from them! Therefore, there's absolutely no reason to disable it!

Why do top and free say all my ram is used if it isn't?

This is just a difference in terminology. Both you and Linux agree that memory taken by applications is "used", while memory that isn't used for anything is "free".

But how do you count memory that is currently used for something, but can still be made available to applications?

You might count that memory as "free" and/or "available". Linux instead counts it as "used", but also "available":

Memory that is	You'd call it	Linux calls it
used by applications	Used	Used
used, but can be made available	Free (or Available)	Used (and Available)
not used for anything	Free	Free

This "something" is (roughly) what top and free call "buffers" and "cached". Since your and Linux's terminology differs, you might think you are low on ram when you're not.

How do I see how much free ram I really have?

To see how much ram your applications could use without swapping, run free -m and look at the "available" column:

$ free -m
              total        used        free      shared  buff/cache   available
Mem:           1504        1491          13           0         855         792
Swap:          2047           6        2041

(On installations from before 2016, look at the "free" column in the "-/+ buffers/cache" row instead.)

This is your answer in megabytes. If you just naively look at "used" and "free", you'll think your ram is 99% full when it's really just 47%!
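
The arithmetic behind those two percentages can be sketched in a few lines of Python. This is only an illustration: the function name memory_usage_percent is made up for this example, and it assumes the newer free layout with an "available" column, as in the sample above.

```python
def memory_usage_percent(free_output: str) -> dict:
    """Return naive and effective RAM usage (in percent) from `free -m` text.

    Assumes the newer `free` layout that includes an "available" column.
    """
    lines = free_output.strip().splitlines()
    header = lines[0].split()
    mem_row = next(line for line in lines if line.startswith("Mem:"))
    values = dict(zip(header, (int(v) for v in mem_row.split()[1:])))

    total = values["total"]
    # Naive view: everything Linux calls "used" counts against you.
    naive = 100 * values["used"] / total
    # Effective view: only memory that is not "available" counts.
    effective = 100 * (total - values["available"]) / total
    return {"naive": round(naive), "effective": round(effective)}


SAMPLE = """\
              total        used        free      shared  buff/cache   available
Mem:           1504        1491          13           0         855         792
Swap:          2047           6        2041
"""

print(memory_usage_percent(SAMPLE))  # {'naive': 99, 'effective': 47}
```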

For a more detailed and technical description of what Linux counts as "available", see the commit that added the field.

When should I start to worry?

A healthy Linux system with more than enough memory will, after running for a while, show the following expected and harmless behavior:

free memory is close to 0
used memory is close to total
available memory (or "free + buffers/cache") has enough room (let's say, 20%+ of total)
swap used does not change

Warning signs of a genuine low memory situation that you may want to look into:

available memory (or "free + buffers/cache") is close to zero
swap used increases or fluctuates
dmesg | grep oom-killer shows the OutOfMemory-killer at work
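
As a rough sketch, the first two warning signs could be checked like this. The function name and the 5% cutoff for "close to zero" are assumptions made for this example, not values from any tool; the swap readings are expected to be samples taken some time apart.

```python
def low_memory_warnings(total_mb, available_mb, swap_used_samples,
                        min_available_ratio=0.05):
    """Return a list of warning strings for a possible low-memory situation.

    min_available_ratio is an assumed cutoff for "close to zero".
    swap_used_samples is a list of "swap used" readings taken over time.
    """
    warnings = []
    if available_mb / total_mb < min_available_ratio:
        warnings.append("available memory is close to zero")
    if len(set(swap_used_samples)) > 1:
        warnings.append("swap usage is changing")
    return warnings


# The healthy sample from the free -m output above: no warnings.
print(low_memory_warnings(1504, 792, [6, 6, 6]))
# A made-up unhealthy system: both warnings fire.
print(low_memory_warnings(1504, 20, [6, 120, 90]))
```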

How can I verify these things?

See this page for more details and how you can experiment with disk cache to show the effects described here. Few things make you appreciate disk caching more than measuring an order-of-magnitude speedup on your own hardware!

Write Back vs. Write Through

Write back is a storage method in which data is written into the cache every time a change occurs, but is written into the corresponding location in main memory only at specified intervals or under certain conditions.

When a data location is updated in write back mode, the data in cache is called fresh, and the corresponding data in main memory, which no longer matches the data in cache, is called stale. If a request for stale data in main memory arrives from another application program, the cache controller updates the data in main memory before the application accesses it.

Write back optimizes the system speed because it takes less time to write data into cache alone, as compared with writing the same data into both cache and main memory. However, this speed comes with the risk of data loss in case of a crash or other adverse event.

Write back is the preferred method of data storage in applications where occasional data loss events can be tolerated. In more critical applications such as banking and medical device control, an alternative method called write through practically eliminates the risk of data loss because every update gets written into both the main memory and the cache. In write through mode, the main memory data always stays fresh.
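
The difference between the two modes can be illustrated with a toy model. This is a minimal sketch, not how any real cache controller is implemented: ToyCache and its methods are invented for this example, and a plain dict stands in for the slower main memory.

```python
class ToyCache:
    """Toy cache in front of a slower backing store (a plain dict here)."""

    def __init__(self, backing, write_back):
        self.backing = backing        # stands in for "main memory"
        self.write_back = write_back  # True: write back, False: write through
        self.cache = {}
        self.dirty = set()            # keys whose backing copy is stale

    def write(self, key, value):
        self.cache[key] = value
        if self.write_back:
            self.dirty.add(key)        # backing store is now stale
        else:
            self.backing[key] = value  # write through: update both at once

    def flush(self):
        """Copy all fresh (dirty) cache entries to the backing store."""
        for key in self.dirty:
            self.backing[key] = self.cache[key]
        self.dirty.clear()


wb_store, wt_store = {}, {}
wb = ToyCache(wb_store, write_back=True)
wt = ToyCache(wt_store, write_back=False)

wb.write("balance", 100)
wt.write("balance", 100)
print(wb_store, wt_store)  # {} {'balance': 100}

wb.flush()
print(wb_store)            # {'balance': 100}
```

If the program crashed before flush() was called, the write-back store would have lost the update, which is the data-loss risk described above.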

Tuesday, July 3, 2018

What is the meaning of %iowait as reported by utilities such as sar or top?

source

Environment

Red Hat Enterprise Linux 4
Red Hat Enterprise Linux 5
Red Hat Enterprise Linux 6
Red Hat Enterprise Linux 7

Issue

What is the meaning of %iowait as reported by utilities such as sar or top?

Resolution

The following definition is taken from the sar man page:

%iowait
Percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.

So, %iowait means that from the CPU point of view, no tasks were runnable, but at least one i/o was in progress. iowait is simply a form of idle time when nothing could be scheduled. The value may or may not be useful in indicating a performance problem, but it does tell us that the system is idle and could have taken more work.

Comments

A CPU can be in one of four states: user, sys, idle or iowait. Tools such as vmstat, iostat, and sar print the time spent in each of these states as a percentage. The kernel maintains this information using counters for each of these states (and a few more). On each clock interrupt, the kernel checks the CPU state and increments the appropriate counter. You can check the counters in /proc/stat.
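
Because the counters only ever increase, a percentage comes from the difference between two readings. The sketch below shows the idea in Python. The function names are made up, the two sample lines are invented numbers, and the field order assumed for the aggregate "cpu" line is user, nice, system, idle, iowait, irq, softirq, steal.

```python
# Assumed order of the first eight counters on the aggregate "cpu" line.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Turn the aggregate "cpu" line of /proc/stat into a dict of counters."""
    parts = line.split()
    if parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return dict(zip(FIELDS, map(int, parts[1:1 + len(FIELDS)])))


def cpu_percentages(before, after):
    """Percentage of time spent in each state between two samples."""
    deltas = {name: after[name] - before[name] for name in before}
    total = sum(deltas.values())
    if total == 0:
        raise ValueError("samples are identical; wait longer between reads")
    return {name: round(100 * delta / total, 1) for name, delta in deltas.items()}


# Invented sample lines, as if read from /proc/stat a few seconds apart.
first = parse_cpu_line("cpu  1000 0 500 8000 500 0 0 0 0 0")
second = parse_cpu_line("cpu  1100 0 550 8700 650 0 0 0 0 0")
print(cpu_percentages(first, second))
```

With these made-up numbers the interval works out to 10% user, 5% system, 70% idle and 15% iowait.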

Linux Performance Monitoring and Tuning

source
TOP Command
System Load

Linux system administrators should be proficient in Linux performance monitoring and tuning. This article gives a high-level overview of how to approach performance monitoring and tuning in Linux, and of the various subsystems (and performance metrics) that need to be monitored.

To identify system bottlenecks and come up with solutions to fix them, you should understand how the various components of Linux work: for example, how the kernel gives preference to one Linux process over others using nice values, how I/O interrupts are handled, how memory management works, how the Linux file system works, how the network layer is implemented, and so on.

Please note that understanding how the various components (or subsystems) work is not the same as knowing which command to execute to get a certain output. For example, you might know that the “uptime” or “top” command gives the “load average”. But if you don’t know what it means, and how the CPU (or process) subsystem works, you might not be able to interpret it properly. Understanding the subsystems is an ongoing task; you’ll constantly be learning.

At a very high level, the following are the four subsystems that need to be monitored.

CPU
Memory
I/O
Network

1. CPU

You should understand the four critical performance metrics for CPU — context switch, run queue, cpu utilization, and load average.

Context Switch

When the CPU switches from one process (or thread) to another, it is called a context switch.
When a context switch happens, the kernel stores the current CPU state (of a process or thread) in memory.
The kernel also retrieves the previously stored state (of a process or thread) from memory and loads it into the CPU.
Context switching is essential for multitasking on the CPU.
However, a high rate of context switching can cause performance issues.

Run Queue

The run queue holds the processes that are ready to run and are waiting for the CPU.
When the CPU is ready to execute a process, it picks one from the run queue based on the priority of the process.
Please note that processes that are in the sleep state or the I/O wait state are not in the run queue.
A long run queue means processes are waiting for CPU time, which can cause performance issues.

Cpu Utilization

This indicates how much of the CPU is currently being used.
This is fairly straightforward, and you can view the CPU utilization with the top command.
100% CPU utilization means the CPU is fully loaded.
So, sustained high CPU utilization can cause performance issues.

Load Average

This indicates the average system load over a specific time period.
On Linux, load average is displayed for the last 1 minute, 5 minutes, and 15 minutes. This is helpful to see whether the overall load on the system is going up or down.
For example, a load average of “0.75 1.70 2.10” indicates that the load on the system is coming down. 0.75 is the load average in the last 1 minute. 1.70 is the load average in the last 5 minutes. 2.10 is the load average in the last 15 minutes.
Please note that this load average is calculated by combining the number of processes in the run queue and the number of processes in the uninterruptible state.
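
Reading the trend from the three numbers can be sketched as a small Python helper. The function name load_trend and its three labels are made up for this illustration.

```python
def load_trend(one, five, fifteen):
    """Classify the load trend from the 1, 5 and 15 minute load averages."""
    if one < five < fifteen:
        return "falling"   # recent load is lower than older load
    if one > five > fifteen:
        return "rising"    # recent load is higher than older load
    return "mixed"


print(load_trend(0.75, 1.70, 2.10))  # falling
```

On a live system the same helper could be fed from the standard library, for example load_trend(*os.getloadavg()) after importing os.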

2. Network

A good understanding of TCP/IP concepts is helpful while analyzing any network issues. We’ll discuss more about this in future articles.
For network interfaces, you should monitor the total number of packets (and bytes) received and sent through the interface, the number of packets dropped, etc.

3. I/O

I/O wait is the amount of time the CPU is idle while waiting for I/O. If you see consistently high I/O wait on your system, it may indicate a problem in the disk subsystem.
You should also monitor reads/second and writes/second. These are measured in blocks, i.e. the number of blocks read or written per second. They are also referred to as bi and bo (blocks in and blocks out).
tps indicates total transactions per second, which is the sum of rtps (read transactions per second) and wtps (write transactions per second).

4. Memory

As you know, RAM is your physical memory. If you have 4GB RAM installed on your system, you have 4GB of physical memory.
Virtual memory = Swap space available on the disk + Physical memory. The virtual memory contains both user space and kernel space.
Using either 32-bit or 64-bit system makes a big difference in determining how much memory a process can utilize.
On a 32-bit system a process can only address a maximum of 4GB of virtual memory. On a 64-bit system the limit is so much larger that it is not a practical constraint.
The unused RAM will be used as file system cache by the kernel.
The Linux system will swap when it needs more memory than is physically available. When it swaps, it writes the least used memory pages from physical memory to the swap space on disk.
A lot of swapping can cause performance issues, because the disk is much slower than physical memory and it takes time to move memory pages between RAM and disk.

All of the above 4 subsystems are interrelated. Just because you see a high reads/second, or writes/second, or I/O wait doesn’t mean the issue is there with the I/O sub-system. It also depends on what the application is doing. In most cases, the performance issue might be caused by the application that is running on the Linux system.

Remember the 80/20 rule: 80% of the performance improvement comes from tuning the application, and the remaining 20% comes from tuning the infrastructure components.

There are various tools available to monitor Linux system performance, for example: top, free, ps, iostat, vmstat, mpstat, sar, tcpdump, netstat, iozone, etc. We’ll discuss these tools and how to use them in the upcoming articles in this series.

The following is a 4-step approach to identifying and solving a performance issue.

Step 1 – Understand (and reproduce) the problem: Half of the problem is solved when you clearly understand what the problem is. Before trying to solve the performance issue, first work on clearly defining it. The more time you spend understanding and defining the problem, the more details you will have to look for answers in the right place. If possible, try to reproduce the problem, or at least simulate a situation that you think closely resembles it. This will later help you validate the solution you come up with.
Step 2 – Monitor and collect data: After defining the problem clearly, monitor the system and collect as much data as possible on the various subsystems. Based on this data, come up with a list of potential issues.
Step 3 – Eliminate and narrow down issues: After having a list of potential issues, dive into each one of them and eliminate any non issues. Narrow it down further to see whether it is an application issue, or an infrastructure issue. Drill down further and narrow it down to a specific component. For example, if it is an infrastructure issue, narrow it down and identify the subsystem that is causing the issue. If it is an I/O subsystem issue, narrow it down to a specific partition, or raid group, or LUN, or disk. Basically, keep drilling down until you put your finger on the root cause of the issue.
Step 4 – One change at a time: Once you’ve narrowed down to a small list of potential issues, don’t try to make multiple changes at one time. If you make multiple changes, you wouldn’t know which one fixed the original issue. Multiple changes at one time might also cause new issues, which you’ll be chasing after instead of fixing the original issue. So, make one change at a time, and see if it fixes the original problem.

In the upcoming articles of the performance series, we’ll discuss more about how to monitor and address performance issues on CPU, Memory, I/O and Network subsystem using various Linux performance monitoring tools.

Note: the dstat command is useful for viewing overall performance at a glance.

What is Swappiness?

source

What is Swappiness?

Most Linux users who have installed a distribution will have noticed the “swap space” during the partitioning phase (it often appears as a partition such as /dev/sda5). This is dedicated space on your hard drive that was traditionally sized at around twice the capacity of your RAM; together with RAM it constitutes the total virtual memory of your system. From time to time, the Linux kernel uses this swap space by copying chunks of RAM to it, allowing active processes that require more memory than is physically available to run.

Swappiness is the kernel parameter that defines how much (and how often) your Linux kernel will copy RAM contents to swap. This parameter's default value is “60” and it can take anything from “0” to “100”. The higher the value of the swappiness parameter, the more aggressively your kernel will swap.

Why change it?

The default value is a one-size-fits-all setting that can't possibly be equally efficient for every use case, hardware specification, and user need. Moreover, swappiness is an important factor in the overall responsiveness and speed of the OS. It is therefore worth understanding how swappiness works and how different settings could improve the operation of your system and thus your everyday experience.

As RAM is so much larger and cheaper than it used to be, many users nowadays have enough memory to almost never need swap. The obvious benefit is that no system resources are spent on swapping, and that memory pages are not moved back and forth between RAM and swap for no reason.

How to change it?

The swappiness value is exposed as a file named “swappiness” under /proc/sys/vm (part of the kernel's virtual /proc filesystem rather than a file on disk). If you navigate there through the file manager, you can open the file to check your system's swappiness. You can also check or change it from the terminal (which is faster). To change it, type: “sudo sysctl vm.swappiness=10”, replacing “10” with any value between “0” and “100”. To confirm that the value was changed, type: “cat /proc/sys/vm/swappiness” and the active value will be printed.

This change takes effect immediately, so no reboot is required. In fact, rebooting will revert the swappiness to its default value (60). Once you have thoroughly tested your desired value and found that it works reliably, you can make the change permanent by editing /etc/sysctl.conf, another text configuration file. Open it as root (administrator) and add the following line at the bottom: vm.swappiness=<your desired value>, for example vm.swappiness=10. Then save the file and you're done!
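
The edit to /etc/sysctl.conf can also be scripted. The following Python sketch only transforms the text of the file and leaves the reading and writing (which needs root) to you; the function name set_swappiness is invented for this example, and the 0 to 100 range check follows the range given above.

```python
def set_swappiness(conf_text, value):
    """Return sysctl.conf text with exactly one vm.swappiness line at the end.

    Existing uncommented vm.swappiness lines are dropped so the new value
    is not overridden; comments and other settings are kept as they are.
    """
    if not 0 <= value <= 100:
        raise ValueError("swappiness must be between 0 and 100")

    kept = []
    for line in conf_text.splitlines():
        compact = "".join(line.split())  # ignore spaces around '='
        if compact.startswith("vm.swappiness="):
            continue
        kept.append(line)
    kept.append(f"vm.swappiness={value}")
    return "\n".join(kept) + "\n"


example = "# tuning\nvm.swappiness = 60\nkernel.sysrq=0\n"
print(set_swappiness(example, 10), end="")
```

After writing the result back to /etc/sysctl.conf, the value is applied on the next boot, or immediately with the sysctl command shown earlier.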

Factors for consideration

Some care is needed in interpreting the swappiness value. It is often said that a value of “60” means the kernel starts swapping when RAM reaches 40% capacity, and that “10” means swap is used once RAM is 90% full. Treat this as a rough rule of thumb at best: the kernel documentation describes swappiness as a relative preference for swapping out application memory versus dropping page cache, not as a fixed RAM threshold. Setting it to “100” makes the kernel swap aggressively, while a low value such as 10 (as in this tutorial) makes it prefer reclaiming page cache. If you have plenty of RAM, a low value is usually a safe option that may improve the responsiveness of your system.

Some users, though, want to have their cake and eat it too, and set swappiness to “1” or even “0”. “1” is the minimum setting that still allows regular swapping, while “0” tells the kernel to avoid swapping until memory is almost completely exhausted. While these settings can work, using them on low-spec systems with 2GB of RAM or less may cause freezes and make the OS completely unresponsive. Generally, finding the golden mean between overall system performance and response latency requires quite some experimentation (as always).

Monday, July 2, 2018

Interpreting /proc/meminfo !!!

Environment

Red Hat Enterprise Linux (RHEL) 5
Red Hat Enterprise Linux (RHEL) 6
Red Hat Enterprise Linux (RHEL) 7

Issue

I need an interpretation of /proc/meminfo output.
I want to compare the output of free -k to cat /proc/meminfo.

Resolution

For definition of /proc/meminfo fields in Red Hat Enterprise Linux (RHEL) releases prior to RHEL 5, please look at What is indicated by each value in /proc/meminfo?
Each field of cat /proc/meminfo will be discussed in the Diagnostic Steps.
The RHEL 5 output differs in some settings. This is also marked in the Diagnostic Steps.
RHEL 5 also has some fields no longer present in RHEL 6. For explanation on this issue have a look at Why are LowTotal, LowFree, HighTotal, and HighFree missing from /proc/meminfo on x86_64 RHEL 6?
For more information on the output of the free command see How do I view system memory utilization in Red Hat Enterprise Linux?
RHEL 7 has an additional field called MemAvailable in /proc/meminfo
RHEL 7 has a slightly changed output of the free command

Comparing the output

free -k output (RHEL 5 and RHEL 6):

             total       used       free     shared    buffers     cached
Mem:       7778104    2971960    4806144          0     211756    1071092
-/+ buffers/cache:    1689112    6088992
Swap:      4194296          0    4194296

free -k output (RHEL 7):

              total        used        free      shared  buff/cache   available
Mem:        1012952      252740      158732       11108      601480      543584
Swap:       1048572        5380     1043192

Relevant fields from /proc/meminfo to match them against the output of free -k:

MemTotal:        7778104 kB
MemFree:         4806144 kB
Buffers:          211756 kB
Cached:          1071092 kB
SwapTotal:       4194296 kB
SwapFree:        4194296 kB

For RHEL 7 there is an additional field available, which is used instead of the calculation for -/+ buffers/cache line:

MemAvailable:     543584 kB

Matching output of `free -k` to `/proc/meminfo`

The following table shows how to get the free output matched to the /proc/meminfo fields.

`free` output	corresponding `/proc/meminfo` fields
`Mem: total`	`MemTotal`
`Mem: used`	`MemTotal - MemFree`
`Mem: free`	`MemFree`
`Mem: shared` (can be ignored nowadays. It has no meaning.)	N/A
`Mem: buffers`	`Buffers`
`Mem: cached`	`Cached`
`-/+ buffers/cache: used`	`MemTotal - (MemFree + Buffers + Cached)`
`-/+ buffers/cache: free`	`MemFree + Buffers + Cached`
`Swap: total`	`SwapTotal`
`Swap: used`	`SwapTotal - SwapFree`
`Swap: free`	`SwapFree`
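
The mapping in the table can be checked with a short Python sketch that derives the RHEL 5 and RHEL 6 style free columns from /proc/meminfo text. The function names and the dictionary keys are made up for this example; the sample values are the ones shown above.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into a dict of field name -> integer value."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if rest.strip():
            fields[name.strip()] = int(rest.split()[0])
    return fields


def free_columns(m):
    """Derive the `free -k` columns (RHEL 5/6 layout) from meminfo fields."""
    cache = m["Buffers"] + m["Cached"]
    return {
        "mem_total": m["MemTotal"],
        "mem_used": m["MemTotal"] - m["MemFree"],
        "mem_free": m["MemFree"],
        "adjusted_used": m["MemTotal"] - (m["MemFree"] + cache),  # -/+ buffers/cache: used
        "adjusted_free": m["MemFree"] + cache,                    # -/+ buffers/cache: free
        "swap_total": m["SwapTotal"],
        "swap_used": m["SwapTotal"] - m["SwapFree"],
        "swap_free": m["SwapFree"],
    }


MEMINFO = """\
MemTotal:        7778104 kB
MemFree:         4806144 kB
Buffers:          211756 kB
Cached:          1071092 kB
SwapTotal:       4194296 kB
SwapFree:        4194296 kB
"""

for name, value in free_columns(parse_meminfo(MEMINFO)).items():
    print(f"{name:>14} {value:>10}")
```

The derived values match the free -k output for RHEL 5 and RHEL 6 shown earlier (for example 2971960 used, and 1689112 and 6088992 on the -/+ buffers/cache line).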

Root Cause

Analyzing memory consumption

Diagnostic Steps

Most of this is taken from the kernel documentation (Documentation/filesystems/proc.txt and Documentation/vm/hugetlbpage.txt)

High Level statistics

RHEL 5, RHEL 6 and RHEL 7

MemTotal: Total usable memory
MemFree: The amount of physical memory not used by the system
Buffers: Memory in the buffer cache, that is, relatively temporary storage for raw disk blocks. This shouldn't get very large.
Cached: Memory in the page cache (disk cache and shared memory)
SwapCached: Memory that is present within main memory, but also in the swapfile. (If memory is needed this area does not need to be swapped out AGAIN because it is already in the swapfile. This saves I/O and increases performance if machine runs short on memory.)

RHEL 7 only

MemAvailable: An estimate of how much memory is available for starting new applications, without swapping.

Detailed Level statistics

RHEL 5, RHEL 6 and RHEL 7

Active: Memory that has been used more recently and usually not swapped out or reclaimed
Inactive: Memory that has not been used recently and can be swapped out or reclaimed

RHEL 6 and RHEL 7 only

Active(anon): Anonymous memory that has been used more recently and usually not swapped out
Inactive(anon): Anonymous memory that has not been used recently and can be swapped out
Active(file): Pagecache memory that has been used more recently and usually not reclaimed until needed
Inactive(file): Pagecache memory that can be reclaimed without huge performance impact
Unevictable: Unevictable pages can't be swapped out for a variety of reasons
Mlocked: Pages locked to memory using the mlock() system call. Mlocked pages are also Unevictable.

Memory statistics

RHEL 5, RHEL 6 and RHEL 7

SwapTotal: Total swap space available
SwapFree: The remaining swap space available
Dirty: Memory waiting to be written back to disk
Writeback: Memory which is actively being written back to disk
AnonPages: Non-file backed pages mapped into userspace page tables
Mapped: Files which have been mmapped, such as libraries
Slab: In-kernel data structures cache
PageTables: Amount of memory dedicated to the lowest level of page tables. This can increase to a high value if a lot of processes are attached to the same shared memory segment.
NFS_Unstable: NFS pages sent to the server, but not yet committed to the storage
Bounce: Memory used for block device bounce buffers
CommitLimit: Based on the overcommit ratio (vm.overcommit_ratio), this is the total amount of memory currently available to be allocated on the system. This limit is only adhered to if strict overcommit accounting is enabled (mode 2 in vm.overcommit_memory).
Committed_AS: The amount of memory presently allocated on the system. The committed memory is a sum of all of the memory which has been allocated by processes, even if it has not been "used" by them as of yet.
VmallocTotal: total size of vmalloc memory area
VmallocUsed: amount of vmalloc area which is used
VmallocChunk: largest contiguous block of vmalloc area which is free
HugePages_Total: Number of hugepages being allocated by the kernel (Defined with vm.nr_hugepages)
HugePages_Free: The number of hugepages not being allocated by a process
HugePages_Rsvd: The number of hugepages for which a commitment to allocate from the pool has been made, but no allocation has yet been made.
Hugepagesize: The size of a hugepage (usually 2MB on an Intel based system)
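
For CommitLimit, the kernel documentation (proc.txt) gives the formula as (total RAM - huge page memory) * overcommit_ratio / 100 + total swap. A small Python sketch of that calculation follows; the function name is invented, values are in kB, and the default ratio of 50 is the usual kernel default for vm.overcommit_ratio, which your system may override.

```python
def commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio=50,
                    hugetlb_kb=0):
    """Estimate CommitLimit in kB from the formula in the kernel's proc.txt.

    overcommit_ratio mirrors vm.overcommit_ratio (a percentage);
    hugetlb_kb is the memory reserved for huge pages, if any.
    """
    return (mem_total_kb - hugetlb_kb) * overcommit_ratio // 100 + swap_total_kb


# MemTotal and SwapTotal from the sample output above, default ratio of 50.
print(commit_limit_kb(7778104, 4194296))       # 8083348
# 1 GiB of RAM, 7 GiB of swap and a ratio of 30: roughly 7.3 GiB.
print(commit_limit_kb(1048576, 7340032, 30))   # 7654604
```

Remember that this limit is only enforced when vm.overcommit_memory is set to mode 2.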

RHEL 6 and RHEL 7 only

Shmem: Total used shared memory (shared between several processes, thus including RAM disks, SYS-V-IPC and BSD like SHMEM)
SReclaimable: The part of the Slab that might be reclaimed (such as caches)
SUnreclaim: The part of the Slab that can't be reclaimed under memory pressure
KernelStack: The memory the kernel stack uses. This is not reclaimable.
WritebackTmp: Memory used by FUSE for temporary writeback buffers
HardwareCorrupted: The amount of RAM the kernel identified as corrupted / not working
AnonHugePages: Non-file backed huge pages mapped into userspace page tables
HugePages_Surp: The number of hugepages in the pool above the value in vm.nr_hugepages. The maximum number of surplus hugepages is controlled by vm.nr_overcommit_hugepages.
DirectMap4k: The amount of memory being mapped to standard 4k pages
DirectMap2M: The amount of memory being mapped to hugepages (usually 2MB in size)
