grep/sed/gawk/awk/tr/cut


Good utilities for extracting stuff from text

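Filter out non-printable characters (newline and tab are not in [:print:], so they are removed too; use '[:print:]\n' to keep newlines)
tr -dc '[:print:]'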
Join each line ending in a backslash (a continuation line) with the line that follows it
sed ':a; /\\$/N; s/\\\n//; ta'
Replace every newline with a space using tr
tr '\n' ' '
To remove a control character from the input stream, give tr its escape sequence in single quotes, e.g. carriage return
tr -d '\r'
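Join each pair of lines using sed (removes the newline between lines 1 and 2, 3 and 4, and so on)
sed 'N;s/\n//'
Remove all newlines using awk (the output has no trailing newline; "%s" stops a % in the data from being read as a format)
awk '{ printf "%s", $0 }'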
Extract part of a line, e.g. the text between xx and yy (exclusive)
sed 's/.*xx\(.*\)yy.*/\1/'
Extract part of a line, e.g. the text between [ and ]
sed 's/^.*\[\(.*\)\].*$/\1/'
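An easy way to check the newline and control-character one-liners above is to feed them a small sample with printf. This is a sketch assuming a POSIX shell with GNU sed (the ':a; ...' label syntax is GNU-specific); the helper function names and the sample text are made up for illustration.

```shell
# Hypothetical helper names wrapping the one-liners above.
printable_only() { tr -dc '[:print:]'; }                # also drops \n and \t
join_continued() { sed ':a; /\\$/N; s/\\\n//; ta'; }    # GNU sed label syntax
join_pairs()     { sed 'N;s/\n//'; }                    # lines 1+2, 3+4, ...
strip_newlines() { awk '{ printf "%s", $0 }'; }         # no trailing newline
strip_cr()       { tr -d '\r'; }                        # DOS -> Unix endings

printf 'a\tb\001c\n'  | printable_only; echo   # abc
printf 'a \\\nb\nc\n' | join_continued         # lines "a b" and "c"
printf '1\n2\n3\n4\n' | join_pairs             # lines "12" and "34"
printf '50%%\nsure\n' | strip_newlines; echo   # 50%sure
printf 'dos line\r\n' | strip_cr               # dos line
```

The echo after printable_only and strip_newlines only adds the newline those filters remove, so the prompt lands on a fresh line.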

This boils down to "substitute everything with the back reference", i.e. 's/everything/\1/'. Read left to right, the pattern is: the beginning of the line (^), then any characters (.*), then the opening bracket (escaped as \[), then the characters we want printed, captured in escaped parentheses \( .* \), then the closing bracket (escaped as \]), then everything after it (.*) up to the end of the line ($). \1 is the first back reference: whatever matched inside the first (and here the only) pair of parentheses. Lines without brackets do not match and are printed unchanged.
Here it is with spaces added for clarity (they must be removed before use):
'   s   /   ^   .*   \[   \(   .*   \)   \]   .*   $   /   \1   /   '
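If you only want the lines that actually contain brackets, add -n and the p flag so non-matching lines are dropped instead of printed unchanged. A minimal sketch; the function name is hypothetical and it assumes at most one [ ... ] pair per line.

```shell
# Hypothetical helper: print only the text inside [ ... ].
# -n suppresses automatic printing; the p flag prints a line only when the
# substitution succeeded, so lines without brackets produce no output.
extract_bracketed() {
    sed -n 's/^.*\[\(.*\)\].*$/\1/p'
}

printf '%s\n' 'level [warning] disk full' 'no brackets here' | extract_bracketed
# prints: warning
```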

Example: Extract the WAN IP address from a Belkin F7D7302 V1 Share 300 Wireless Router

#!/bin/sh -
# Get WAN IP from a Belkin F7D7302 Share N300 wireless router
# The target page is status.stm
# We need the line containing <space>wan_ip="
# From that line extract everything between the first quote and the space before the last quote
wanip=$(curl -s 192.168.1.1/status.stm | sed -n '/wan_ip="/p' | sed 's/.*"\(.*\) ".*/\1/')
echo "$wanip"
exit 0
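To try the two sed steps without the router, you can run them on a saved copy of the page. The sketch below reads stdin instead of calling curl; the function name is hypothetical, and the sample lines are an assumption about what status.stm looks like (a wan_ip="..." value with a space before the closing quote, as the script's comments describe).

```shell
# Hypothetical helper: the same two sed steps as the script, reading stdin.
extract_wan_ip() {
    sed -n '/wan_ip="/p' | sed 's/.*"\(.*\) ".*/\1/'
}

# Hypothetical excerpt of status.stm; the real page layout is assumed.
sample='var lan_ip="192.168.1.1 ";
var wan_ip="203.0.113.7 ";'

printf '%s\n' "$sample" | extract_wan_ip
# prints: 203.0.113.7
```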
Print up to and including the first blank line (deletes everything after it)
sed '/^$/q'
Delete everything up to and including the first blank line
sed '1,/^$/d'
Delete leading angle bracket and space from each line
sed 's/^> //'
Add a leading angle bracket and space to each line
sed 's/^/> /'
Insert double quote marks around each line in a file
gawk "{ print \"\042\" \$0 \"\042\" }" file
Print lines containing pattern
sed -n '/pattern/p'
Print line lineno
sed -n 'lineno p'
Do both of the above
sed -ne 'lineno p' -e '/pattern/ p'
Print two lines lineno1 and lineno2
sed -ne 'lineno1 p' -e 'lineno2 p'
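Print each line matching pattern and the line after it (addr,+N is a GNU sed extension)
sed -ne '/pattern/,+1p'
Print only the line after the last match (sed '$!d' deletes every line except the last)
sed -ne '/pattern/,+1p' | sed '$!d'
Print from a line matching begin to the end of the file
sed -n -e '/begin/,$p'
Print from a line matching begin to the next blank line
sed -n -e '/begin/,/^$/p'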
Print a file with line numbers
awk '{ print FNR, $0 }'
Print columns 1, 2, and 7
awk '{ print $1, $2, $7 }'
Print columns 2 and 3 from lines containing 'hal'
awk '/hal/ { print $2, $3 }'
Substitute an underscore for any non-alphanumeric character
a="Harry's hat"
echo "$a" | awk '{ gsub(/[^[:alnum:]]/, "_"); print }'    # --> Harry_s_hat
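Here are several of the line-selection commands above run against tiny inputs, so you can see exactly which lines come out. This assumes GNU sed for the ,+1 form; the sample functions and their text are invented.

```shell
# Hypothetical sample inputs.
sample() { printf 'one\ntwo\nthree\nfour\nfive\n'; }
msg()    { printf 'Subject: hi\n\nbody\n'; }

sample | sed -n -e '2p' -e '/four/p'   # two, four
sample | sed -n '/two/,+1p'            # two, three (GNU sed)
sample | sed -n '/two/{n;p;}'          # three (portable: line after each match)
msg | sed '/^$/q'                      # Subject: hi, then the blank line
msg | sed '1,/^$/d'                    # body
printf 'x hal 9000\ny dave 42\n' | awk '/hal/ { print $2, $3 }'   # hal 9000
```

The '/two/{n;p;}' form is a portable alternative to piping through sed '$!d': it prints the line after every match, not just the last one.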

A range is two addresses separated by a comma, e.g. /pattern1/,/pattern2/, which selects from a line matching pattern1 through the next line matching pattern2.
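A small sketch of ranges on invented input: the second address can be a pattern or $ (the last line), and awk accepts the same pattern1,pattern2 form.

```shell
# Hypothetical sample: a block that starts at "begin" and ends at a blank line.
text() { printf 'intro\nbegin\nbody\n\ntail\n'; }

text | sed -n '/begin/,/^$/p'   # begin, body, blank line
text | sed -n '/begin/,$p'      # begin, body, blank line, tail
text | awk '/begin/,/^$/'       # same as the first sed command
```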

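Basic regular expressions, which sed uses by default, have no portable alternation operator. GNU sed accepts \| as an extension, and sed -E (GNU and BSD sed) enables |; grep -E supports | directly.

Print lines containing either of two patterns (egrep is the obsolescent spelling of grep -E; quote the pattern so the shell does not read | as a pipe)
grep -E 'pattern1|pattern2'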
Ignore comment lines
grep -v "^#"
Ignore blank lines
sed '/^$/d'
Remove all newlines and carriage returns from a line of text
tr -d '\r\n'
Replace all newlines and carriage returns with a space
tr '\r\n' '   '
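The alternation and CR/LF commands above, run on a made-up log. The \| and -E sed forms assume GNU sed; the sample function is hypothetical.

```shell
# Hypothetical sample log.
sample_log() { printf 'INFO start\nWARN disk\nERROR fan\n'; }

sample_log | grep -E 'WARN|ERROR'      # WARN disk, ERROR fan
sample_log | sed -n '/WARN\|ERROR/p'   # same; \| is a GNU sed extension
sample_log | sed -nE '/WARN|ERROR/p'   # same; -E selects extended regexes

printf 'a\r\nb\r\n' | tr -d '\r\n'; echo     # ab
printf 'a\r\nb\r\n' | tr '\r\n' '  '; echo   # "a  b  " (each CRLF becomes two spaces)
```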

awk processes var=value assignments given as command-line operands only when it reaches them in the argument list, which is after the BEGIN block has run, so they are not visible inside BEGIN (assignments made with -v are). If you want to use awk as a calculator with no input file, you can build the command string in advance and then run it with eval:
a=79
b="awk 'BEGIN {print $a / 3}'"
c=$(eval "$b")
echo "$c"    # --> 26.3333
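The same calculation without eval, passing the shell variable in with -v so it is already set when BEGIN runs. A minimal sketch; the awk variable reuses the name a only for readability.

```shell
a=79
# -v assignments are made before BEGIN runs, so no input file and no eval.
c=$(awk -v a="$a" 'BEGIN { print a / 3 }')
echo "$c"    # 26.3333 (awk's default OFMT is %.6g)
```

One difference to keep in mind: awk interprets backslash escapes in -v values, so a value containing \n arrives as a real newline.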


Bash string extractions and other manipulations

Test whether $string matches the extended regular expression stored in $substring
if [[ "$string" =~ $substring ]]; then echo "match"; fi

String length

${#string}
or
expr length "$string"    (GNU expr; the POSIX form is expr "$string" : '.*')

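Length of matching substring at beginning of string

expr match "$string" "$substring"    (GNU expr)
expr "$string" : "$substring"        (POSIX form)
$substring is a basic regular expression, implicitly anchored at the start of $string.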

Index: the position (counting from 1) in $string of the first character that appears anywhere in $substring, or 0 if there is none (GNU expr)

expr index "$string" "$substring"

Substring extraction: extracts the substring of $string starting at $position (offsets count from 0)

${string:position}
If the string parameter is "*" or "@", this extracts the positional parameters starting at $position.

Extracts $length characters of substring from $string at $position.

${string:position:length}
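Here are the bash string operations above applied to one example string. This needs bash, not plain sh, because ${var:pos:len}, [[ =~ ]] and BASH_REMATCH are bash features; the regex in re is an invented example with one capture group.

```shell
# Requires bash: ${var:pos:len}, [[ =~ ]] and BASH_REMATCH are not POSIX sh.
string="hello world"
re='w(or)ld'                     # hypothetical regex with one capture group

echo "${#string}"                # 11
echo "${string:6}"               # world (offset counts from 0)
echo "${string:0:5}"             # hello (5 characters from offset 0)
expr "$string" : 'hel*'          # 4     (length of the match at the start)
if [[ $string =~ $re ]]; then
    echo "${BASH_REMATCH[1]}"    # or    (first capture group)
fi
set -- alpha beta gamma
echo "${@:2}"                    # beta gamma (positional parameters from $2)
```

Keeping the regex in a variable, as in the [[ ]] test near the top of this section, avoids having to quote special characters inside [[ ]].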


Other neat stuff

List all files in /etc that start with 'p' and end with 'd'
run-parts --list --regex '^p.*d$' /etc

Warning: This is a Debian centric site and MAY contain peanuts.
Many thanks to Debra Lynn and Ian Murdock for making Debian possible
First created Apr 22, 2008 ~ Last revised April 10, 2011
