Thursday, August 10, 2006

[Some Thinks about everything] Linux:: AWK

Imagine that you have a huge, huge file, organized in columns, like this:


Class Top_Searches Num_Search

1. Heathrow 34521
2. Meebo 34478
3. Buffett and Hezbollah 23442
4. London 21354
5. Lieberman 16532
6. Lebanon 15342
7. Icalendar 15234
8. Israel 12345
9. Video 9877
10. Terror 8532
11. Deceit Beyond 5679
12. Adnan Hajj 4568
13. Apple 3467
14. Terrorism 2345
15. Terror Plot 2123

If you only want to display the first and third columns, it seems impossible... Actually, only for a Windows user, because Linux and Unix
have a powerful tool to manipulate such files. One year ago, my girlfriend had a huge, huge file, a bit like this one, for
a study about obesity. She tried to open it with Excel... but Excel froze because the file was too big. So she asked me
for a solution: I did everything she wanted using only command-line tools, and especially awk! So now, some examples to measure the
power of awk. As you have probably already understood, awk is a powerful pattern-matching language that lets you process input lines by manipulating the fields they contain.
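
To make the opening question concrete, here is a minimal sketch of printing only some columns. The temporary file and the helper name first_and_last are made up for illustration; the three lines are copied from the table above. Because some search terms contain spaces, the count is not always field 3, so the sketch uses $NF (the last field) instead:

```shell
# Hypothetical sample: three lines taken from the table above.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
1. Heathrow 34521
2. Meebo 34478
3. Buffett and Hezbollah 23442
EOF

# Print the first field and the last field ($NF) of every line.
first_and_last() {
    awk '{ print $1, $NF }' "$1"
}

first_and_last "$sample"
```

On a file where every search term is a single word, $3 would work just as well as $NF.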



$ awk '{print}' file.txt

This has the same result as $ cat file.txt: it displays the whole content of the file.


$ awk '/Heathrow/' file.txt

This has the same result as $ grep Heathrow file.txt: it displays only the lines which contain the word Heathrow.


$ awk '/Lieberman/ {print $5,$7,$12}' file.txt

It only displays columns 5, 7 and 12 of the lines containing "Lieberman".
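
The sample table has only a few fields per line, so here is a small sketch of the same pattern-plus-action idea using fields that exist in it. The temporary file and the helper name rank_and_count are hypothetical:

```shell
# Hypothetical sample: three lines taken from the table above.
sample="$(mktemp)"
printf '%s\n' '4. London 21354' '5. Lieberman 16532' '6. Lebanon 15342' > "$sample"

# The action in braces runs only for lines matching the regular expression.
rank_and_count() {
    awk '/Lieberman/ { print $1, $3 }' "$1"
}

rank_and_count "$sample"
```

The comma in print inserts the output field separator, which is a single space by default.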


$ awk '{if ($3 < 2000) print $3, $7}' file.txt

It displays columns 3 and 7 if column 3 is less than 2000.
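
As a rough sketch of a numeric condition on data shaped like the table above: the threshold 10000, the temporary file and the helper names with_if and with_pattern are all assumptions chosen for the example. The second helper shows that the same test can also be written as a pattern in front of the action:

```shell
# Hypothetical sample: three lines taken from the table above.
sample="$(mktemp)"
printf '%s\n' '8. Israel 12345' '9. Video 9877' '10. Terror 8532' > "$sample"

# Explicit if statement inside the action.
with_if() {
    awk '{ if ($3 < 10000) print $2, $3 }' "$1"
}

# The same condition used as a pattern in front of the action.
with_pattern() {
    awk '$3 < 10000 { print $2, $3 }' "$1"
}

with_if "$sample"
```

Both forms compare $3 as a number here, because the field looks numeric and the constant is a number.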


$ awk -F":" '{ print $3 "\t" $1 }' /etc/passwd |sort -g

It displays the users by increasing user ID. Notice that if you want to print a tab you have to use \t; likewise, if you want to print a new line, use \n.
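
To try this without depending on the real /etc/passwd, here is a sketch on made-up data. The three accounts and the helper name users_by_uid are invented for the example, and it uses sort -n (the POSIX numeric sort) in place of the GNU-specific sort -g:

```shell
# Hypothetical passwd-style data; these accounts are invented.
users="$(mktemp)"
cat > "$users" <<'EOF'
root:x:0:0:root:/root:/bin/sh
alice:x:1001:1001::/home/alice:/bin/sh
daemon:x:1:1::/:/usr/sbin/nologin
EOF

# Put the user ID (field 3) first, then sort numerically on it.
users_by_uid() {
    awk -F: '{ print $3 "\t" $1 }' "$1" | sort -n
}

users_by_uid "$users"
```

Printing the user ID first is what lets sort order the lines without any extra options.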


$ awk -F"\t" '{ print $4 "\t" $10 }' file.txt

You can specify the field separator with the option -F"\t" (notice that tabs and spaces are the default separators)
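
A tab separator is handy when a field itself contains spaces. The sketch below assumes a tab-separated version of the search table, which is not the file shown earlier; the helper name term_and_count is hypothetical:

```shell
# Hypothetical tab-separated sample; printf turns each \t into a real tab.
sample="$(mktemp)"
printf '1.\tHeathrow\t34521\n3.\tBuffett and Hezbollah\t23442\n' > "$sample"

# With -F'\t', a multi-word search term stays in one field ($2).
term_and_count() {
    awk -F'\t' '{ print $2 ": " $3 }' "$1"
}

term_and_count "$sample"
```

With the default separator, "Buffett and Hezbollah" would be split into three fields instead of one.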

And this is only the beginning. With some files, where a line break separates the data items and a sign like ^ separates
groups of data,
you can use awk to retrieve some information:

$ awk ' BEGIN { RS="^"; FS="\n" } /London/ ' file.txt

This kind of command lets you process files like this one:


^London
Airport: Heathrow
Bus: First

^Paris
Airport: Charles de Gaulle
Bus: RATP
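
Here is a small sketch that goes one step further on that same data and pulls a single line out of the matching group. The helper name airport_of and the variable city are assumptions for the example; the record layout is the one shown above:

```shell
# The sample records from above, written to a temporary file.
records="$(mktemp)"
cat > "$records" <<'EOF'
^London
Airport: Heathrow
Bus: First

^Paris
Airport: Charles de Gaulle
Bus: RATP
EOF

# RS="^" makes each ^-delimited group one record; FS="\n" makes each line a field.
# $1 is then the city name and $2 the line that follows it.
airport_of() {
    awk -v city="$1" 'BEGIN { RS = "^"; FS = "\n" } $1 == city { print $2 }' "$2"
}

airport_of Paris "$records"
```

Comparing $1 with the city name, rather than matching /Paris/ anywhere in the record, avoids selecting a group that only mentions the city on a later line.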

Incredible, isn't it? Some other links to improve your "awk" skills:

A Guided Tour Of Awk

The GNU Awk User's Guide

Getting started with awk

How to Use AWK

UNIX Utilities - awk

Awk Tutorial

Awk and shell

DMOZ

Introduction to awk

Awk et bash

IBM developerWorks

String manipulations

AWK: The Linux Administrators' Wisdom Kit

Introduction to (g)awk

Gawk Chapter 1




--
Posted by ServalX02 to Some Thinks about everything at 8/10/2006 09:23:00 PM
