Monday, January 24, 2022

RAID 0, RAID 1, RAID 5, RAID 10 Explained with Diagrams

 source : https://rickardnobel.se/how-raid5-works/

How RAID 5 actually works

In this article we will look in some detail at how the RAID 5 parity is created and how it is possible to actually “read” from a destroyed disk in a RAID 5 set.

There are many sources on the web about the general principle of RAID 5, so we will not be covering that part here. In short, and as you might know, RAID level 5 works with any number of disks equal to or greater than 3 and stores parity information, consuming the capacity of one disk in the set, to be able to recover from a disk failure (striped blocks with distributed parity). We can for example combine eight physical disks into a RAID 5 set while consuming only the size of one disk for parity information. If any single drive breaks down we would still have full access to the data that was on the destroyed disk.

To understand how this is possible we have to look at the smallest unit, the binary bit, which can be 1 or 0. When doing mathematical calculations in binary we have several so-called Boolean algebra operations, for example the AND operation and the OR operation.

One of these low-level logical operations is used heavily in RAID 5: the XOR (“exclusive or”). XOR takes two binary digits and produces a true result if exactly one digit is true (i.e., the other digit must be false).

Value A   Value B   XOR result
0         0         0
0         1         1
1         0         1
1         1         0

This means that, for example, 1 XOR 0 = 1 and 1 XOR 1 = 0. Exactly one of the binary digits must be 1 for the result to be “true”, that is, 1.
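XOR is a built-in operator in most programming languages, so the truth table above is easy to verify yourself. A minimal Python sketch (^ is Python's bitwise XOR operator):

for a in (0, 1):
    for b in (0, 1):
        print(a, "XOR", b, "=", a ^ b)  # prints the four rows of the table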

Let us now see how the parity calculations are done in a RAID 5 set using XOR. Assume we have a small RAID 5 set of four disks and some data is written to it. For simplicity we look at only half a byte (4 bits), but the principle holds regardless of the stripe size or the number of disks.

On the first three disks we have the binary information 1010, 1100 and 0011, here representing some data, and we now have to calculate the parity information for the fourth disk.

Looking at the first “column” of the disks, from the left, we have 1, 1 and 0. If we use XOR to calculate the result that would be:

1 XOR 1 XOR 0 = Parity bit

This could be written as: (1 XOR 1) XOR 0 = Parity bit

This means first 1 XOR 1 = 0 for the first two disks, and then that result, the zero, is XORed with the bit on the third disk: 0 XOR 0 = 0, which gives a final result of 0.

For the next “column” we have 0, 1 and 0. We do first 0 XOR 1 = 1 and then this result with the third disk: 1 XOR 0 = 1. The parity bit here will be 1.

For the third column we would have:

1 XOR 0 XOR 1 = Parity

Broken down: 1 XOR 0 = 1 and then 1 XOR 1 = 0

And finally the fourth column:

0 XOR 0 XOR 1 = 1

For all four columns this ends up with the parity sum of 0101.
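As a quick sanity check, here is a minimal Python sketch of the same parity calculation, using the 4-bit disk contents from the example (the variable names are my own, not part of any RAID implementation):

from functools import reduce

data_disks = [0b1010, 0b1100, 0b0011]            # disks 1, 2 and 3
parity = reduce(lambda a, b: a ^ b, data_disks)  # XOR all data blocks together
print(f"parity: {parity:04b}")                   # prints: parity: 0101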

If any of disks 1, 2 or 3 breaks, the parity information on disk 4 can be used to recreate the missing data. Let us look at how this is done. If we assume that disk 2 unexpectedly goes down, we have lost all read and write access to the real disk 2; however, with the help of the already recorded parity we can calculate the information that is missing.

The primary feature of a RAID 5 disk set is the ability to “access” the data on a missing disk. This is done by running the exact same XOR operation over the remaining disks and the parity information. Let us look at the first column again: 1 XOR 0 = 1 (for disk 1 and disk 3) and then 1 XOR 0 (the parity) = 1. This means that there must have been a binary digit of 1 on the missing disk. If we do the same operation on the other columns we end up with 1100, which is exactly the data that was on the failed drive.
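The same idea in Python: XOR the surviving disks together with the parity, and the lost contents of disk 2 fall out (a sketch under the same 4-bit example as above):

disk1, disk3, parity = 0b1010, 0b0011, 0b0101  # the survivors plus parity
lost_disk2 = disk1 ^ disk3 ^ parity            # XOR everything that remains
print(f"recovered: {lost_disk2:04b}")          # prints: recovered: 1100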

The XOR operation itself is extremely quick and easily handled by the CPU or RAID controller, but the big downside is that we have to read from ALL the other disks to recreate the data on the missing one. If we have, for example, eight disks in the set with one broken, then a single read I/O against the missing disk will generate seven more disk I/Os to calculate the lost data on the fly.

The XOR operation works perfectly, mathematically, with one disk missing, but the moment a second disk is lost we no longer have enough information to make the calculations. While it is possible to keep using the RAID 5 set with one disk missing for some time, with degraded performance, it is naturally very wise to replace the damaged disk and begin the full re-creation as soon as possible (a hot spare is quite handy here).

See also this blog post on the RAID 5 write penalty.


Saturday, September 18, 2021

Difference between Thick and Thin Provisioning

 source : 

https://forum.huawei.com/enterprise/en/difference-between-thick-and-thin-provisioning/thread/523761-893

 

Thick Provisioning

Thick provisioning is a type of storage pre-allocation. With thick provisioning, the complete amount of virtual disk storage capacity is pre-allocated on the physical storage when the virtual disk is created. A thick-provisioned virtual disk consumes all the space allocated to it in the datastore right from the start, so the space is unavailable for use by other virtual machines.

There are two sub-types of thick-provisioned virtual disks:

A lazy zeroed disk is a disk that takes all of its space at the time of its creation, but this space may contain some old data on the physical media. This old data is not erased or written over, so it needs to be "zeroed out" before new data can be written to the blocks. This type of disk can be created more quickly, but its performance will be lower for the first writes because of the extra I/O needed to zero out each new block;

An eager zeroed disk also gets all of the required space at the time of its creation, but the space is wiped clean of any previous data on the physical media. Creating eager zeroed disks takes longer, because zeroes are written to the entire disk, but their performance is faster during the first writes. This sub-type of thick-provisioned virtual disk supports clustering features, such as fault tolerance.


 

For data security reasons, eager zeroing is more common than lazy zeroing with thick-provisioned virtual disks. Why? When you delete a VM disk, the data on the datastore is not totally erased; the blocks are simply marked as available, until the operating system overwrites them. If you create an eager zeroed virtual disk on this datastore, the disk area will be totally erased (i.e., zeroed), thus preventing anyone with bad intentions from being able to recover the previous data – even if they use specialized third-party software.

Thin Provisioning

Thin provisioning is another type of storage pre-allocation. A thin-provisioned virtual disk consumes only the space that it needs initially and grows with time according to demand.

For example, if you create a new thin-provisioned 30GB virtual disk and copy 10 GB of files to it, the size of the resulting VM disk file will be 10 GB, whereas you would have a 30GB VM disk file if you had chosen to use a thick-provisioned disk.
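A thin-provisioned disk behaves much like a sparse file on an ordinary filesystem, which makes the idea easy to demonstrate. The following Python sketch is only a loose analogy (the file name is made up, and the exact allocation depends on your filesystem, though most Linux filesystems support sparse files):

import os

# Create a file with a 30 MB apparent size but almost no allocated space,
# loosely analogous to a freshly created thin-provisioned virtual disk.
with open("thin.img", "wb") as f:
    f.seek(30 * 1024 * 1024 - 1)  # jump to the last byte without writing
    f.write(b"\0")                # only this single byte is actually stored

st = os.stat("thin.img")
print("apparent size:", st.st_size, "bytes")               # 31457280
print("actually allocated:", st.st_blocks * 512, "bytes")  # far less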


 

Thin-provisioned virtual disks are quick to create and useful for saving storage space. The performance of a thin-provisioned disk is not higher than that of a lazy zeroed thick-provisioned disk, because for both of these disk types zeroes have to be written before data can be written to a new block. Note that when you delete data from a thin-provisioned virtual disk, the disk size is not reduced automatically. This is because the operating system deletes only the indexes from the file table that refer to the file body in the file system; it marks the blocks that belonged to "deleted" files as free and accessible for new data to be written onto. This is why file removal appears instant. If it were a full deletion, where zeroes were written over the blocks that the deleted files occupied, it would take about the same amount of time as copying the files in question.




Wednesday, July 4, 2018

NO RAM AVAILABLE IN LINUX !!

source

What's going on?

Linux is borrowing unused memory for disk caching. This makes it look like you are low on memory, but you are not! Everything is fine!

Why is it doing this?

Disk caching makes the system much faster and more responsive! There are no downsides, except for confusing newbies. It does not take memory away from applications in any way, ever!

What if I want to run more applications?

If your applications want more memory, they just take back a chunk that the disk cache borrowed. Disk cache can always be given back to applications immediately! You are not low on ram!

Do I need more swap?

No, disk caching only borrows the ram that applications don't currently want. It will not use swap. If applications want more memory, they just take it back from the disk cache. They will not start swapping.

How do I stop Linux from doing this?

You can't disable disk caching. The only reason anyone ever wants to disable disk caching is because they think it takes memory away from their applications, which it doesn't! Disk cache makes applications load faster and run smoother, but it NEVER EVER takes memory away from them! Therefore, there's absolutely no reason to disable it!

Why do top and free say all my ram is used if it isn't?

This is just a difference in terminology. Both you and Linux agree that memory taken by applications is "used", while memory that isn't used for anything is "free".
But how do you count memory that is currently used for something, but can still be made available to applications?
You might count that memory as "free" and/or "available". Linux instead counts it as "used", but also "available":
Memory that is                    You'd call it         Linux calls it
used by applications              Used                  Used
used, but can be made available   Free (or Available)   Used (and Available)
not used for anything             Free                  Free
This "something" is (roughly) what top and free call "buffers" and "cached". Since your terminology and Linux's differ, you might think you are low on ram when you're not.

How do I see how much free ram I really have?

To see how much ram your applications could use without swapping, run free -m and look at the "available" column:
$ free -m
              total        used        free      shared  buff/cache   available
Mem:           1504        1491          13           0         855         792
Swap:          2047           6        2041

(On installations from before 2016, look at the "free" column in the "-/+ buffers/cache" row instead.)
This is your answer in megabytes. If you just naively look at "used" and "free", you'll think your ram is 99% full when it's really just 47%!
For a more detailed and technical description of what Linux counts as "available", see the commit that added the field.
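If you want the same number programmatically, it can be read straight out of /proc/meminfo, which is where free gets it. A small Python sketch (the MemAvailable field exists in kernels from 3.14 on):

def mem_available_mb():
    # Parse the MemAvailable line from /proc/meminfo (the value is in kB).
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1]) // 1024
    return None  # kernels older than 3.14 lack this field

print(mem_available_mb(), "MB available")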

When should I start to worry?

A healthy Linux system with more than enough memory will, after running for a while, show the following expected and harmless behavior:
  • free memory is close to 0
  • used memory is close to total
  • available memory (or "free + buffers/cache") has enough room (let's say, 20%+ of total)
  • swap used does not change
Warning signs of a genuine low memory situation that you may want to look into:
  • available memory (or "free + buffers/cache") is close to zero
  • swap used increases or fluctuates
  • dmesg | grep oom-killer shows the OutOfMemory-killer at work

How can I verify these things?

See this page for more details and how you can experiment with disk cache to show the effects described here. Few things make you appreciate disk caching more than measuring an order-of-magnitude speedup on your own hardware!
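One simple experiment along those lines: time two consecutive reads of the same large file. Assuming the file was not already cached, the second pass is served from the page cache and is typically dramatically faster. A Python sketch (the path is a placeholder; substitute any large file on your system):

import time

def timed_read(path):
    # Read the file in 1 MB chunks and return the elapsed seconds.
    start = time.time()
    with open(path, "rb") as f:
        while f.read(1 << 20):
            pass
    return time.time() - start

path = "/path/to/some/large/file"  # placeholder: pick a big file
print("first read (disk): ", timed_read(path))
print("second read (cache):", timed_read(path))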

Write Back vs. Write Through


Write back is a storage method in which data is written into the cache every time a change occurs, but is written into the corresponding location in main memory only at specified intervals or under certain conditions.
When a data location is updated in write back mode, the data in cache is called fresh, and the corresponding data in main memory, which no longer matches the data in cache, is called stale. If a request for stale data in main memory arrives from another application program, the cache controller updates the data in main memory before the application accesses it.
Write back optimizes the system speed because it takes less time to write data into cache alone, as compared with writing the same data into both cache and main memory. However, this speed comes with the risk of data loss in case of a crash or other adverse event.
Write back is the preferred method of data storage in applications where occasional data loss events can be tolerated. In more critical applications such as banking and medical device control, an alternative method called write through practically eliminates the risk of data loss because every update gets written into both the main memory and the cache. In write through mode, the main memory data always stays fresh.
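A toy sketch may make the difference concrete. Everything below is illustrative only, not any real cache controller's API: a Python dictionary stands in for slow main memory, and the dirty set tracks locations that are fresh in cache but stale in memory:

class ToyCache:
    def __init__(self, write_back=True):
        self.write_back = write_back
        self.cache = {}     # fast storage
        self.memory = {}    # slow "main memory"
        self.dirty = set()  # fresh in cache, stale in memory

    def write(self, addr, value):
        self.cache[addr] = value
        if self.write_back:
            self.dirty.add(addr)       # defer the slow memory write
        else:
            self.memory[addr] = value  # write through: memory stays fresh

    def flush(self):
        # The "specified intervals or certain conditions" from the text.
        for addr in self.dirty:
            self.memory[addr] = self.cache[addr]
        self.dirty.clear()

c = ToyCache(write_back=True)
c.write(0, "balance=100")
# A crash before c.flush() loses the update; with write_back=False it would not.
c.flush()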

Tuesday, July 3, 2018

What is the meaning of %iowait as reported by utilities such as sar or top ?

source

Environment

  • Red Hat Enterprise Linux 4
  • Red Hat Enterprise Linux 5
  • Red Hat Enterprise Linux 6
  • Red Hat Enterprise Linux 7

Issue

  • What is the meaning of %iowait as reported by utilities such as sar or top ?

Resolution

  • Following is the definition taken from the sar manpage :
%iowait
           Percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.
  • So, %iowait means that from the CPU point of view, no tasks were runnable, but at least one i/o was in progress. iowait is simply a form of idle time when nothing could be scheduled. The value may or may not be useful in indicating a performance problem, but it does tell us that the system is idle and could have taken more work.

Comments

  • A CPU can be in one of four states: user, sys, idle or iowait. Tools such as vmstat, iostat, sar, etc. print out these four states as a percentage. The kernel maintains this information using counters for each of the states and more. On each clock interrupt, the kernel checks the CPU state and increments the appropriate counter. You can check the counters in /proc/stat, as the sketch below shows.
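A hedged Python sketch that derives %iowait over a one-second interval from the aggregate cpu line of /proc/stat (its fields are user, nice, system, idle, iowait, irq, softirq, ...):

import time

def cpu_counters():
    # The first line of /proc/stat is the aggregate "cpu" line.
    with open("/proc/stat") as f:
        return [int(x) for x in f.readline().split()[1:]]

before = cpu_counters()
time.sleep(1)
after = cpu_counters()
delta = [a - b for a, b in zip(after, before)]
print("%iowait: {:.1f}".format(100.0 * delta[4] / sum(delta)))  # 5th field is iowait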

Linux Performance Monitoring and Tuning


source

Linux system administrators should be proficient in Linux performance monitoring and tuning. This article gives a high-level overview of how we should approach performance monitoring and tuning in Linux, and the various subsystems (and performance metrics) that need to be monitored.
To identify system bottlenecks and come up with solutions to fix them, you should understand how the various components of Linux work. For example, how the kernel gives preference to one Linux process over others using nice values, how I/O interrupts are handled, how memory management works, how the Linux file system works, how the network layer is implemented in Linux, etc.
Please note that understanding how various components (or subsystems) work is not the same as knowing what command to execute to get certain output. For example, you might know that the “uptime” or “top” command gives the “load average”. But if you don’t know what it means, and how the CPU (or process) subsystem works, you might not be able to interpret it properly. Understanding the subsystems is an ongoing task; you’ll be constantly learning.
On a very high level, the following are the four subsystems that need to be monitored.
  • CPU
  • Memory
  • I/O
  • Network

1. CPU

You should understand the four critical performance metrics for CPU — context switch, run queue, cpu utilization, and load average.

Context Switch

  • When the CPU switches from one process (or thread) to another, it is called a context switch.
  • When a process switch happens, the kernel stores the current state of the CPU (of a process or thread) in memory.
  • The kernel also retrieves the previously stored state (of a process or thread) from memory and puts it in the CPU.
  • Context switching is essential for multitasking of the CPU.
  • However, a high rate of context switching can cause performance issues.
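The kernel's running total of context switches is exposed as the ctxt line in /proc/stat, so the rate is easy to sample yourself. A minimal Python sketch:

import time

def total_ctxt():
    # The "ctxt" line of /proc/stat is a running count of context switches.
    with open("/proc/stat") as f:
        for line in f:
            if line.startswith("ctxt"):
                return int(line.split()[1])

c0 = total_ctxt()
time.sleep(1)
print("context switches/sec:", total_ctxt() - c0)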

Run Queue

  • Run queue indicates the total number of active processes in the current queue for CPU.
  • When CPU is ready to execute a process, it picks it up from the run queue based on the priority of the process.
  • Please note that processes that are in sleep state, or i/o wait state are not in the run queue.
  • So, a higher number of processes in the run queue can cause performance issues.

Cpu Utilization

  • This indicates how much of the CPU is currently getting used.
  • This is fairly straight forward, and you can view the CPU utilization from the top command.
  • 100% CPU utilization means the system is fully loaded.
  • So, a higher percentage of CPU utilization will cause performance issues.

Load Average

  • This indicates the average CPU load over a specific time period.
  • On Linux, load average is displayed for the last 1 minute, 5 minutes, and 15 minutes. This is helpful to see whether the overall load on the system is going up or down.
  • For example, a load average of “0.75 1.70 2.10” indicates that the load on the system is coming down. 0.75 is the load average in the last 1 minute. 1.70 is the load average in the last 5 minutes. 2.10 is the load average in the last 15 minutes.
  • Please note that this load average is calculated by combining both the total number of processes in the run queue and the total number of processes in the uninterruptible task state.
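Both of the metrics above are exposed in /proc/loadavg: the three load averages, followed by a runnable/total process count that corresponds to the run queue discussed earlier. A minimal Python sketch:

# /proc/loadavg looks like: "0.75 1.70 2.10 1/180 4212"
with open("/proc/loadavg") as f:
    one, five, fifteen, procs, last_pid = f.read().split()
runnable, total = procs.split("/")
print("load averages:", one, five, fifteen)
print("runnable processes:", runnable, "of", total)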

2. Network

  • A good understanding of TCP/IP concepts is helpful while analyzing any network issues. We’ll discuss more about this in future articles.
  • For network interfaces, you should monitor total number of packets (and bytes) received/sent through the interface, number of packets dropped, etc.,

3. I/O

  • I/O wait is the amount of time the CPU spends waiting for I/O. If you see consistently high I/O wait on your system, it indicates a problem in the disk subsystem.
  • You should also monitor reads/second and writes/second. These are measured in blocks, i.e. the number of blocks read/written per second. They are also referred to as bi and bo (blocks in and blocks out).
  • tps indicates total transactions per second, which is the sum of rtps (read transactions per second) and wtps (write transactions per second).

4. Memory

  • As you know, RAM is your physical memory. If you have 4GB RAM installed on your system, you have 4GB of physical memory.
  • Virtual memory = Swap space available on the disk + Physical memory. The virtual memory contains both user space and kernel space.
  • Using either 32-bit or 64-bit system makes a big difference in determining how much memory a process can utilize.
  • On a 32-bit system a process can only access a maximum of 4GB virtual memory. On a 64-bit system there is no such limitation.
  • The unused RAM will be used as file system cache by the kernel.
  • The Linux system will swap when it needs more memory than the physical memory available. When it swaps, it writes the least used memory pages from physical memory to the swap space on the disk.
  • A lot of swapping can cause performance issues, as the disk is much slower than physical memory, and it takes time to swap memory pages from RAM to disk.
All of the above four subsystems are interrelated. Just because you see high reads/second, writes/second, or I/O wait doesn’t mean the issue is with the I/O subsystem. It also depends on what the application is doing. In most cases, the performance issue might be caused by the application that is running on the Linux system.
Remember the 80/20 rule — 80% of the performance improvement comes from tuning the application, and the rest 20% comes from tuning the infrastructure components.
There are various tools available to monitor Linux system performance, for example: top, free, ps, iostat, vmstat, mpstat, sar, tcpdump, netstat, iozone, etc. We’ll discuss more about these tools and how to use them in the upcoming articles in this series.
Following is the 4 step approach to identify and solve a performance issue.
  • Step 1 – Understand (and reproduce) the problem: Half of the problem is solved when you clearly understand what the problem is. Before trying to solve the performance issue, first work on clearly defining the problem. The more time you spend understanding and defining the problem, the more detail you will have to look for answers in the right place. If possible, try to reproduce the problem, or at least simulate a situation that you think closely resembles it. This will later help you validate the solution you come up with to fix the performance issue.
  • Step 2 – Monitor and collect data: After defining the problem clearly, monitor the system and try to collect as much data as possible on the various subsystems. Based on this data, come up with a list of potential issues.
  • Step 3 – Eliminate and narrow down issues: After having a list of potential issues, dive into each one of them and eliminate any non issues. Narrow it down further to see whether it is an application issue, or an infrastructure issue. Drill down further and narrow it down to a specific component. For example, if it is an infrastructure issue, narrow it down and identify the subsystem that is causing the issue. If it is an I/O subsystem issue, narrow it down to a specific partition, or raid group, or LUN, or disk. Basically, keep drilling down until you put your finger on the root cause of the issue.
  • Step 4 – One change at a time: Once you’ve narrowed down to a small list of potential issues, don’t try to make multiple changes at one time. If you make multiple changes, you wouldn’t know which one fixed the original issue. Multiple changes at one time might also cause new issues, which you’ll be chasing after instead of fixing the original issue. So, make one change at a time, and see if it fixes the original problem.
In the upcoming articles of the performance series, we’ll discuss more about how to monitor and address performance issues on CPU, Memory, I/O and Network subsystem using various Linux performance monitoring tools.

Note: the dstat command is useful for an at-a-glance view of overall performance.
YouTube: uptime, top, mpstat, iostat, vmstat, free, ping, dstat  >>  mpstat, iostat

What is Swappiness?

source

What is Swappiness?
Most Linux users that have installed a distribution before must have noticed the existence of the “swap space” during the partitioning phase (it is usually found as /dev/sda5). This is a dedicated space on your hard drive that is usually set to at least twice the capacity of your RAM, and along with it constitutes the total virtual memory of your system. From time to time, the Linux kernel utilizes this swap space by copying chunks from your RAM to the swap, allowing active processes that require more memory than is physically available to run.
Swappiness is the kernel parameter that defines how much (and how often) your Linux kernel will copy RAM contents to swap. This parameter's default value is “60” and it can take anything from “0” to “100”. The higher the value of the swappiness parameter, the more aggressively your kernel will swap.

Why change it?

The default value is a one-size-fits-all solution that can’t possibly be equally efficient in all of the individual use cases, hardware specifications and user needs. Moreover, the swappiness of a system is a primary factor that determines the overall functionality and speed performance of an OS. That said, it is very important to understand how swappiness works and how the various configurations of this element could improve the operation of your system and thus your everyday usage experience.
As RAM is so much larger and cheaper than it used to be in the past, many users nowadays have enough memory to almost never need to use the swap file. The obvious benefit that derives from this is that no system resources are ever occupied by the swapping process and that cached files are not moved back and forth between the RAM and the swap, and vice versa, for no reason.

How to change it?

The swappiness parameter value is stored in a simple configuration text file located in /proc/sys/vm and named “swappiness”. If you navigate there through the file manager, you will be able to locate the file and open it to check your system’s swappiness. You can also check it through the terminal (which is faster) with “cat /proc/sys/vm/swappiness”, or change it by typing the following command: “sudo sysctl vm.swappiness=10”, using anything between “0” and “100” instead of the value “10” that I used. To ensure that the swappiness value was correctly changed to the desired one, simply type “cat /proc/sys/vm/swappiness” in the terminal again and the active value will be output.
This change has an immediate effect on your system’s operation and thus no rebooting is required. In fact, rebooting will revert the swappiness back to its default value (60). If you have thoroughly tested your desired swappiness value and found that it works reliably, you can make the change permanent by navigating to /etc/sysctl.conf, which is yet another text configuration file. Open it as root (administrator) and add the following line at the bottom to set the swappiness: vm.swappiness=“your desired value here”. Then save the text file and you’re done!

Factors for consideration

There is some math involved in swappiness that should be considered when changing your settings. A parameter value of “60” means that your kernel will start to swap when RAM usage reaches 40%. Setting it to “100” means that your kernel will try to swap everything. Setting it to 10 (like I did in this tutorial) means that swap will be used when RAM is 90% full, so if you have enough RAM, this could be a safe option that would easily improve the performance of your system.
Some users, though, want the full cake, and that means they set swappiness to “1” or even “0”. “1” is the minimum possible “active swapping” setting, while “0” means disabling swapping completely, reverting to it only when RAM is completely filled. While these settings can still theoretically work, testing them on low-spec systems with 2 GB of RAM or less may cause freezes and make the OS completely unresponsive. Generally, finding the golden mean between overall system performance and response latency requires quite some experimentation (as always).