Thursday, November 27, 2014

List the contents of a tar or tar.gz file


Q. How do I list the contents of a tar or tar.gz file?
A. GNU tar is an archiving program designed to store and extract files from an archive file known as a tarfile. With tar you can create a plain tar file or a compressed tar file. However, sometimes you need to list the contents of a tar or tar.gz file on screen before extracting all the files.

Task: List the contents of a tar file

Use the following command:
$ tar -tvf file.tar

Task: List the contents of a tar.gz file

Use the following command:
$ tar -ztvf file.tar.gz

Task: List the contents of a tar.bz2 file

Use the following command:
$ tar -jtvf file.tar.bz2
Where,
  • t: List the contents of an archive
  • v: Verbosely list the files processed (display detailed information)
  • z: Filter the archive through gzip, so that compressed .gz tar files can be read (decompressed)
  • j: Filter the archive through bzip2, used to decompress .bz2 files
  • f filename: Use the archive file called filename
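For reference, the listing produced by tar -tvf looks much like ls -l output: permissions, owner/group, size, modification time and file name. A made-up example (the file names and dates below are purely illustrative):
$ tar -tvf file.tar
-rw-r--r-- james/james      1024 2014-11-27 10:15 docs/readme.txt
drwxr-xr-x james/james         0 2014-11-27 10:15 docs/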
To see the compression ratio of a gzipped file, use the -l option of gzip:
 gzip -l all.gz
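gzip -l prints the compressed size, the uncompressed size, the compression ratio and the original file name. A made-up example (the figures are purely illustrative):
$ gzip -l all.gz
         compressed        uncompressed  ratio uncompressed_name
            1843237             5242880  64.8% all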

To view the decompressed contents of a .gz file on screen, without extracting it to disk:
zcat myfiles.tar.gz | more


How To Gzip A Folder:

tar -cvzf filename.tar.gz folder

tar -cvjf filename.tar.bz2 folder  # even more compression

tar -zcvf compressFileName.tar.gz folderToCompress
zip -r file.zip folder_to_zip
unzip filename.zip

Sunday, November 23, 2014

Configuring Huge Pages in Red Hat Enterprise Linux 4 or 5


source


Before configuring Huge Pages, make sure you have read Section 14.3, “Sizing Big Pages and Huge Pages”.
In Red Hat Enterprise Linux 4 or 5 the size of the Huge Pages pool is specified by the desired number of Huge Pages. To calculate the number of Huge Pages you first need to know the Huge Page size. To obtain the size of Huge Pages, execute the following command:

$ grep Hugepagesize /proc/meminfo
Hugepagesize:     2048 kB
$

The output shows that the size of a Huge Page on this system is 2 MB. This means that if a 1 GB Huge Pages pool should be allocated, then 512 Huge Pages are needed (1 GB / 2 MB = 512). The number of Huge Pages can be configured and activated by setting nr_hugepages in the proc file system. For example, to allocate 512 Huge Pages, execute:
# echo 512 > /proc/sys/vm/nr_hugepages
Alternatively, you can use sysctl(8) to change it:
# sysctl -w vm.nr_hugepages=512
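The 512 above is simply the desired pool size divided by the Huge Page size. If you want the shell to do that arithmetic for you, here is a small sketch (assuming a 1 GB pool and the Hugepagesize value shown earlier):
$ POOL_KB=$(( 1 * 1024 * 1024 ))                            # desired pool size: 1 GB expressed in KB
$ HPG_SZ=$(awk '/Hugepagesize/ {print $2}' /proc/meminfo)   # Huge Page size in KB
$ echo $(( POOL_KB / HPG_SZ ))
512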
To make the change permanent, add the following line to the file /etc/sysctl.conf. This file is used during the boot process. The Huge Pages pool is usually guaranteed if requested at boot time:
# echo "vm.nr_hugepages=512" >> /etc/sysctl.conf
If you allocate a large number of Huge Pages, the execution of the above commands can take a while. To verify whether the kernel was able to allocate the requested number of Huge Pages, run:
$ grep HugePages_Total /proc/meminfo
HugePages_Total:   512
$
The output shows that 512 Huge Pages have been allocated. Since the size of Huge Pages is 2048 KB, a Huge Page pool of 1GB has been allocated and pinned in physical memory.
If HugePages_Total is lower than what was requested with nr_hugepages, then the system either does not have enough memory or there are not enough physically contiguous free pages. In the latter case the system needs to be rebooted, which should give you a better chance of getting the memory.
To get the number of free Huge Pages on the system, execute:
$ grep HugePages_Free /proc/meminfo
Free system memory will automatically be decreased by the size of the Huge Pages pool allocation, regardless of whether the pool is being used by an application such as Oracle DB or not:
$ grep MemFree /proc/meminfo
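As a convenience, all of the Huge Pages counters can be checked in one go. The exact fields vary with the kernel version, and the figures below are just illustrative for the 512-page pool configured above:
$ grep -i huge /proc/meminfo
HugePages_Total:   512
HugePages_Free:    512
Hugepagesize:     2048 kB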
Note
In order for an Oracle database to use Huge Pages in Red Hat Enterprise Linux 4 or 5, you also need to increase the ulimit parameter "memlock" for the oracle user in /etc/security/limits.conf if "max locked memory" is not unlimited or is too small; see ulimit -a or ulimit -l. An example can be seen below.
oracle           soft    memlock         1048576
oracle           hard    memlock         1048576
The memlock parameter specifies how much memory the oracle user can lock into its address space. Note that Huge Pages are locked in physical memory. The memlock setting is specified in KB and must cover the memory occupied by the number of Huge Pages that Oracle should be able to allocate. So if the Oracle database should be able to use 512 Huge Pages, then memlock must be set to at least 512 * Hugepagesize, which on this system is 1048576 KB (512 * 2048 KB). If memlock is too small, then not a single Huge Page will be allocated when the Oracle database starts. For more information on setting shell limits, see Chapter 22, Setting Shell Limits for Your Oracle User.
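As a quick sanity check, the required memlock value (in KB) can be computed from /proc/meminfo. This is just a sketch, with the number of Huge Pages (512) hard-coded:
$ HPG_SZ=$(awk '/Hugepagesize/ {print $2}' /proc/meminfo)   # Huge Page size in KB
$ NUM_PG=512                                                # number of Huge Pages Oracle should be able to use
$ echo "memlock must be at least $(( HPG_SZ * NUM_PG )) KB"
memlock must be at least 1048576 KB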
Log in as the oracle user again and verify the new memlock setting by executing ulimit -l before starting the database.
After an Oracle DB startup you can verify the usage of Huge Pages by checking whether the number of free Huge Pages has decreased:
$ grep HugePages_Free /proc/meminfo
To free the Huge Pages pool, you can execute:
# echo 0 > /proc/sys/vm/nr_hugepages
This command usually takes a while to finish.

Monday, November 10, 2014

Different options of 'vi' command



On a long command line (outside vi, at the shell prompt):
Use Ctrl+A to jump to the beginning of the line.
Use Ctrl+E to jump to the end of the line.

Motion

h    Move left
j    Move down
k    Move up
l    Move right
w    Move to next word
W    Move to next blank-delimited word
b    Move to the beginning of the word
B    Move to the beginning of blank-delimited word
e    Move to the end of the word
E    Move to the end of blank-delimited word
(    Move a sentence back
)    Move a sentence forward
{    Move a paragraph back
}    Move a paragraph forward
0    Move to the beginning of the line
$    Move to the end of the line
1G   Move to the first line of the file
G    Move to the last line of the file
nG   Move to the nth line of the file
:n   Move to the nth line of the file
fc   Move forward to character c
Fc   Move back to character c
H    Move to top of screen
M    Move to middle of screen
L    Move to bottom of screen
%    Move to associated ( ), { }, [ ]




Deleting Text


Almost all deletion commands are performed by typing d followed by a motion. For example, dw deletes a word. A few other deletes are:
x    Delete character to the right of the cursor
X    Delete character to the left of the cursor
D    Delete to the end of the line
dd   Delete current line
:d   Delete current line


Yanking Text


Like deletion, almost all yank commands are performed by typing y followed by a motion. For example, y$ yanks to the end of the line. Two other yank commands are:
yy   Yank the current line
:y   Yank the current line




Changing text


The change command is a deletion command that leaves the editor in insert mode. It is performed by typing c followed by a motion. For example, cw changes a word. A few other change commands are:
C    Change to the end of the line
cc   Change the whole line




Putting text


p    Put after the position or after the line
P    Put before the position or before the line




Buffers


Named buffers may be specified before any deletion, change, yank or put command. The general prefix has the form "c where c is any lowercase character. For example, "adw deletes a word into buffer a. It may thereafter be put back into the text with an appropriate "ap.




Markers


Named markers may be set on any line in a file. Any lower case letter may be a marker name. Markers may also be used as limits for ranges.
mc   Set marker c on this line
`c   Go to the beginning of the line marked c
'c   Go to the first non-blank character of the line marked c
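As a small illustration of markers used as range limits (the marker names a and b are arbitrary):
ma        Set marker a on the current line
mb        Set marker b on another line further down
:'a,'bd   Delete every line from the line marked a to the line marked b
:'a,'by   Yank the same range instead of deleting it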

Sunday, November 2, 2014

An Example of Parallel Processing

This post shows how to use parallel processing to get a CPU intensive job done faster in Unix/Linux. By splitting a large task into several parts, it is quite easy to give each part to a separate CPU, and complete the task many times faster than it would on a single processor.
These days, even small PCs and other devices often come equipped with several CPU cores. But some tasks will use only one core, sometimes using 100% of it, while other cores stand by idle. Sometimes this is a waste of resources.

Look at those CPUs

I am writing this on a Linux laptop with 8 CPUs. Actually it is a quad-core Haswell system with 2 hardware threads per core, but Linux sees 8 entirely separate processors. Look:
bash-4.2$ grep proc /proc/cpuinfo
processor       : 0
processor       : 1
processor       : 2
processor       : 3
processor       : 4
processor       : 5
processor       : 6
processor       : 7
You would get similar output on any multi-processor system, for example an x86 Xeon system, a Sun SPARC “T” box or a virtual machine. However the “CPUs” are provisioned, the OS just sees them as regular CPUs.
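Rather than counting the processor lines by eye, you can let the system count them. nproc is part of GNU coreutils, so it may not be present on older or non-Linux systems:
bash-4.2$ grep -c ^processor /proc/cpuinfo
8
bash-4.2$ nproc
8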

Give the System Some Work to do

To get those processors working, give them some hard work to do. Compressing data is CPU intensive.
For example, compress (gzip) a large text file called big.txt (3.2 GB).
bash-4.2$ ls -lh big.txt
-rw-rw-r--. 1 james james 3.2G Feb  5 21:46 big.txt
bash-4.2$ time gzip big.txt

real    0m49.897s
user    0m48.857s
sys     0m1.016s
It took about 50 seconds to compress big.txt. Repeating the test gave similar results: 50.7 s, 50.6 s and 51.2 s.

Gzip Uses Only One CPU

Running ‘top‘ in another window during the above test, it can be seen that CPU usage is about 12.5%. Top shows CPU stats across all processors, so 12.5% is one eighth of the 8 CPUs in the system, or one whole CPU. The gzip task completely occupied a single core, taking 100% of its resources. Meanwhile the other 7 processors did pretty much nothing, as shown by the 87.4% idle time.
top - 21:53:09 up 37 min,  2 users,  load average: 0.24, 0.20, 0.15
Tasks: 183 total,   2 running, 181 sleeping,   0 stopped,   0 zombie
%Cpu(s): 12.3 us,  0.2 sy,  0.0 ni, 87.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:   8096132 total,  5764756 used,  2331376 free,    45284 buffers
KiB Swap:  8388604 total,        0 used,  8388604 free,  4942988 cached
Also, running ps -eLF during the gzip shows it running on a single processor (processor number 1), as you would expect. And there is only 1 thread.
bash-4.2$ ps -eLF  | egrep 'UID|zip' | grep -v grep
UID        PID  PPID   LWP  C NLWP    SZ   RSS PSR STIME TTY          TIME CMD
james     2376  1750  2376 90    1  1143   504   1 22:05 pts/1    00:00:04 gzip big.txt

Example 1 – Run Small Jobs in Parallel

If you have many files to compress, and many CPUs, it is quicker to zip them in parallel than one at a time. Putting jobs into the background with an ampersand (&) will make them run at the same time. A “wait” command will make the shell wait until the last one completes. I had a directory containing hundreds of files, each 3 GB in size, that needed to be compressed. It was easier to write a script than to do it by hand:
PARALLEL=4        # maximum number of gzip jobs to run at once
jobcount=0

# loop over the files directly (avoids a pipeline subshell,
# so the final "wait" can see the background jobs)
for file in *.dbf
do
   jobcount=$(( jobcount + 1 ))
   echo "$jobcount gzip $file &"
   gzip "$file" &
   if [[ $jobcount -eq $PARALLEL ]]; then
      echo "waiting for $jobcount jobs to complete"
      wait
      echo "$jobcount jobs completed"
      jobcount=0
   fi
done

echo "waiting for remaining jobs to complete"
wait
The above piece of script will steadily compress all *.dbf files in the current directory, running up to 4 “gzip” commands, and therefore using up to 4 CPUs, at the same time.
The PARALLEL variable (4) is the maximum number of jobs that will run at once. As the system contained 10 CPUs, and was not busy, it was reasonable to commandeer 4 of them. Note that the wait in the inner block is executed each time 4 jobs have been launched; the wait after the loop catches the final batch, when fewer than 4 files remain. I actually have the above code in a cron job, housekeeping that folder.
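As an aside, GNU xargs can do much the same job with its -P (max processes) option. A one-liner sketch, assuming GNU xargs and file names without spaces:
ls *.dbf | xargs -n 1 -P 4 gzip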

Example 2 – Split a Big Task into Smaller Tasks

Try the same test again, but this time split the big file up into 8 separate parts.
First, take a checksum of the file with good old cksum:
bash-4.2$ cksum big.txt
1754962233 3345539400 big.txt
Now use the split command with “-n” to chop the file up:
bash-4.2$ split -n 8 big.txt
bash-4.2$ ls -lh
total 6.3G
-rw-rw-r--. 1 james james 3.2G Feb  5 21:46 big.txt
-rw-rw-r--. 1 james james 399M Feb  5 22:22 xaa
-rw-rw-r--. 1 james james 399M Feb  5 22:22 xab
-rw-rw-r--. 1 james james 399M Feb  5 22:22 xac
-rw-rw-r--. 1 james james 399M Feb  5 22:22 xad
-rw-rw-r--. 1 james james 399M Feb  5 22:22 xae
-rw-rw-r--. 1 james james 399M Feb  5 22:22 xaf
-rw-rw-r--. 1 james james 399M Feb  5 22:22 xag
-rw-rw-r--. 1 james james 399M Feb  5 22:22 xah
…giving 8 file pieces of 399 MB each, called xaa, xab, …, xah.
(split on some older versions of Solaris and Linux doesn’t have the -n switch, but you can use -b instead to get pieces of a given size.)
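For example, a rough equivalent using -b (the chunk size here is arbitrary):
bash-4.2$ split -b 400M big.txt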

Run the Compress Again, Parallelized

This time, 8 gzip tasks will be run in parallel. This takes a bit of thought. In order to run the jobs in parallel (i.e. at the same time), they need to be launched in the background of the shell. But the time command only works on jobs running in the foreground.
One solution is to use the wait command, which simply waits for all background jobs to finish. Then the commands needed would be:
gzip xaa &
gzip xab &
gzip xac &
gzip xad &
gzip xae &
gzip xaf &
gzip xag &
gzip xah &
wait
I could put that in a script and invoke it with time, but it is quicker to just use awk.
bash-4.2$ time ls x* | awk '{print "gzip "$1" &"} END {print "wait"}' | sh -x
+ gzip xaa
+ gzip xab
+ gzip xac
+ gzip xag
+ wait
+ gzip xah
+ gzip xae
+ gzip xaf
+ gzip xad

real    0m10.048s
user    1m18.151s
sys     0m1.564s
bash-4.2$ ls
big.txt  xaa.gz  xab.gz  xac.gz  xad.gz  xae.gz  xaf.gz  xag.gz  xah.gz
All of the data has been gzipped in about 10 seconds, one fifth of the time it took before. Notice that the “user” time was recorded as 1 minute 18 seconds, even though only 10 seconds of real time actually passed. That is because “user” time adds up the CPU time consumed by all eight gzip processes, across all 8 CPUs.
Looking at top during the parallelized gzip operation, it is pretty obvious that all 8 CPUs were being hammered by 8 gzip commands:
top - 23:03:44 up  1:48,  2 users,  load average: 0.61, 0.29, 0.23
Tasks: 193 total,   8 running, 185 sleeping,   0 stopped,   0 zombie
%Cpu(s): 85.1 us,  1.5 sy,  0.0 ni, 11.6 id,  1.5 wa,  0.2 hi,  0.1 si,  0.0 st
KiB Mem:   8096132 total,  7938740 used,   157392 free,     3020 buffers
KiB Swap:  8388604 total,     6512 used,  8382092 free,  7096640 cached

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2733 james     20   0    4572    504    352 R  99.8  0.0   0:05.40 gzip
 2735 james     20   0    4572    504    352 R  99.8  0.0   0:05.39 gzip
 2736 james     20   0    4572    504    352 R  99.4  0.0   0:05.39 gzip
 2738 james     20   0    4572    500    352 R  99.1  0.0   0:05.39 gzip
 2734 james     20   0    4572    504    348 R  98.8  0.0   0:05.37 gzip
 2737 james     20   0    4572    504    352 R  97.1  0.0   0:05.33 gzip
 2731 james     20   0    4572    500    352 R  49.2  0.0   0:02.71 gzip
 2732 james     20   0    4572    504    348 D  46.2  0.0   0:02.60 gzip
and ps agrees, showing one gzip job on each processor:
bash-4.2$ ps -eLF  | egrep 'UID|zip' | grep -v grep
UID        PID  PPID   LWP  C NLWP    SZ   RSS PSR STIME TTY          TIME CMD
james     2764  2763  2764 99    1  1143   504   5 23:05 pts/1    00:00:09 gzip xaa
james     2765  2763  2765 99    1  1143   504   7 23:05 pts/1    00:00:09 gzip xab
james     2766  2763  2766 99    1  1143   504   2 23:05 pts/1    00:00:09 gzip xac
james     2767  2763  2767 99    1  1143   500   3 23:05 pts/1    00:00:09 gzip xad
james     2768  2763  2768 99    1  1143   500   1 23:05 pts/1    00:00:09 gzip xae
james     2769  2763  2769 99    1  1143   500   4 23:05 pts/1    00:00:09 gzip xaf
james     2770  2763  2770 99    1  1143   504   6 23:05 pts/1    00:00:09 gzip xag
james     2771  2763  2771 99    1  1143   504   0 23:05 pts/1    00:00:09 gzip xah

Glue it Back Together

Okay, our compressed data is still in 8 parts. Putting it back together is easy:
bash-4.2$ ls
big.txt  xaa.gz  xab.gz  xac.gz  xad.gz  xae.gz  xaf.gz  xag.gz  xah.gz
bash-4.2$
bash-4.2$ cat x* > all.gz
Now there is a file all.gz. Because of the way gzip works, this file won’t be byte-for-byte identical to the original big.txt.gz, but the decompressed data will be identical. To prove that, just gunzip it and do another checksum:
bash-4.2$ gunzip all.gz
bash-4.2$ cksum all big.txt
1754962233 3345539400 all
1754962233 3345539400 big.txt
Uncompressed, the all file gives the same checksum as big.txt, also matching the original checksum taken at the outset of the test (1754962233).

Conclusion

This is just a rough example in the shell to show the effect of multi-processing. Real applications do it differently: by dividing a process into many threads and running those threads across different CPUs. For example, Firefox will run perhaps 40 threads and spread them over all available CPUs.
Note that the gzip test is definitely CPU bound. Repeating the test gives a similar time. Disk/cache effects can be discounted because CPU is the bottleneck in this case. Even if the whole file were cached in memory, it would not improve execution speed.
Obviously, this technique applies only where the main operation is “associative”. That is, where concatenating the processed pieces gives the same result as processing the whole; for gzip, the concatenated .gz members decompress back to the original data. This multi-member behaviour is part of the design of several compression tools and is allowed by the gzip file format (RFC 1952).
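A tiny demonstration of that property, using throwaway files (the names a.gz and b.gz are arbitrary):
bash-4.2$ echo hello | gzip > a.gz
bash-4.2$ echo world | gzip > b.gz
bash-4.2$ cat a.gz b.gz | zcat
hello
world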