QMVIEWS: April 2011

Wednesday, 13 April 2011

Shell 005: String processing

A directory contains the following files:


~/shell $ ls -1
00_01_boxpdb.py
00_01_realign.py
00_02_callPDB2PQR.py
00_03_getPointCharges.py
00_04_getCharge.py
01_01_connect.py

How can the leading numeric prefix (e.g. 00_01_) be removed from each file name?


~/shell $ for i in 0*
> do
> mv $i ${i:6}
> done

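The syntax is ${i:OFFSET:LENGTH}, where LENGTH is optional. Indexing starts at 0, so ${i:6} drops the first six characters (here 00_01_). The following comments, which assume stringZ=abcABC123ABCabc, show how to index from the right end: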


# Is it possible to index from the right end of the string?
    
echo ${stringZ:-4}               # abcABC123ABCabc
# Defaults to full string, as in ${parameter:-default}.
# However . . .

echo ${stringZ:(-4)}             # Cabc 
echo ${stringZ: -4}              # Cabc
# Now, it works.
# Parentheses or added space "escape" the position parameter.
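The variants above can be tried out with a short Bash sketch. The helper name strip_prefix is made up for this illustration; the string is the stringZ from the comments above.

```shell
# Substring expansion in Bash: ${var:offset} and ${var:offset:length}.
# Offsets are 0-based; a negative offset needs a space or parentheses.
stringZ=abcABC123ABCabc

# Hypothetical helper: print "$1" without its first $2 characters.
strip_prefix() {
    local name=$1 n=$2
    printf '%s\n' "${name:n}"
}

echo "${stringZ:3}"       # from offset 3 to the end: ABC123ABCabc
echo "${stringZ:3:6}"     # 6 characters starting at offset 3: ABC123
echo "${stringZ: -4:2}"   # 2 characters, starting 4 from the end: Ca
strip_prefix 00_01_boxpdb.py 6   # boxpdb.py
```

Using "mv" with quoted arguments, as in mv "$i" "${i:6}", also keeps the rename working for file names that contain spaces.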

Suppose a directory contains a number of files, e.g. file-test.py, alongside subdirectories such as file-test. How can only the directories be listed? This is useful when the contents of each directory need to be processed.


ls -l|grep ^d|awk '{print $8}'

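grep keeps only those lines of the 'ls -l' output that start with 'd', which marks a directory. awk '{print $8}' then prints the 8th column of each line, i.e. the directory name. Which column holds the name depends on the date format that ls uses; with the common three-field date (month, day, time) it is column 9.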
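An alternative that does not depend on the column layout of ls is to let the shell glob for directories. The following is only a sketch, and the function name list_dirs is made up here.

```shell
# Hypothetical helper: print the names of the subdirectories of a directory
# (default: the current one), one per line, without parsing ls output.
list_dirs() {
    local parent=${1:-.} d
    for d in "$parent"/*/; do
        [ -d "$d" ] || continue     # skips the unexpanded pattern if nothing matches
        d=${d%/}                    # drop the trailing slash
        printf '%s\n' "${d##*/}"    # drop the leading path
    done
}
```

For processing the directories directly, a plain loop of the form for d in */; do ...; done is usually enough. Hidden directories (names starting with a dot) are not matched by the glob.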

Tuesday, 12 April 2011

Vim 005: Efficient formatting

The following example demonstrates how to format a run control file efficiently. The initial PYTHONPATH in the file is


 export PYTHONPATH=$HOME/shellscripts:$HOME/tree/lib64/python2.4/site-packages:$HOME/kemibin/2010/woche_03/wave/

and it is difficult to read. It would be nice to have each directory added to the path on a separate line. In Vim, this can be done by issuing:


:s/:\$/:\rexport PYTHONPATH=\$PYTHONPATH:\$/g

In the replacement part of a substitution, \r inserts a line break. (\n, the usual Unix line feed, matches a line break in the search pattern, but inserts a NUL character when used in the replacement.) \$ escapes the dollar sign so that it is matched literally; an unescaped $ at the end of the pattern would mean end of line. Running the command gives


export PYTHONPATH=$HOME/shellscripts:
export PYTHONPATH=$PYTHONPATH:$HOME/tree/lib64/python2.4/site-packages:
export PYTHONPATH=$PYTHONPATH:$HOME/kemibin/2010/woche_03/wave/

A colon remains at the end of all but the last line, because the replacement string itself starts with ':'. Starting the replacement with \r instead (i.e. leaving out that first colon) avoids this, so no special treatment of the last line is needed.
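The same reformatting can also be done outside of Vim. The following Bash sketch is my own illustration, not part of the Vim recipe; the function name split_path_export is made up. It takes a variable name and a literal, colon-separated value and prints one export line per entry.

```shell
# Hypothetical helper (Bash): split a colon-separated path value into
# one export line per entry. The value is treated as literal text,
# so pass it in single quotes to keep $HOME etc. unexpanded.
split_path_export() {
    local var=$1 value=$2 first=1 entry
    local -a entries
    IFS=: read -r -a entries <<< "$value"
    for entry in "${entries[@]}"; do
        if (( first )); then
            printf 'export %s=%s\n' "$var" "$entry"
            first=0
        else
            printf 'export %s=$%s:%s\n' "$var" "$var" "$entry"
        fi
    done
}

split_path_export PYTHONPATH '$HOME/shellscripts:$HOME/tree/lib64/python2.4/site-packages'
```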

Vim 004: Pasting into Vim

If text is copied to the clipboard and then pasted into Vim (in insert mode), it often ends up badly formatted, "drifting" further to the right with every line. This typically happens because Vim auto-indents the pasted text as if it had been typed, and it is particularly unwelcome when pasting code.
In Vim, use


:set paste

before pasting from the clipboard; this switches off auto-indenting. Afterwards, :set nopaste restores the normal behaviour.

Vim 003: Deleting all Lines of given Kind

In a file, all lines containing a given identifier 'IDENT' can be deleted by calling (within Vim):


:g/IDENT/d
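Outside of Vim, lines matching a pattern can be removed with sed, which is handy for a batch of files. This is a sketch using a different tool, not the Vim command itself; the function name delete_matching_lines is made up.

```shell
# Hypothetical helper: print a file without the lines that match a
# basic regular expression. The pattern must not contain a '/'.
delete_matching_lines() {
    local pattern=$1 file=$2
    sed "/$pattern/d" "$file"
}
```

A call such as delete_matching_lines IDENT input.dat > output.dat (file names chosen for illustration) writes the cleaned copy to a new file and leaves the original untouched.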

Sunday, 10 April 2011

Shell 005: Redirect Part of a File

Suppose the relevant data in a file starts after a given keyword, and everything above the keyword is irrelevant. Writing only the content after the keyword to a new file, for a whole batch of files, can be done with sed.


for i in file-*.log
do
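    sed -n '0,/DONE/!p' "$i" > "${i/file/new}"
done

The address range 0,/DONE/ covers everything from the start of the file up to and including the first line that contains the keyword DONE. "!" negates the range and p prints the remaining lines (-n suppresses sed's default output). The output is redirected to a new file whose name is derived by replacing "file" with "new". Note that the address 0 is a GNU sed extension.
Thanks to David the H. at LinuxQuestions for pointing this out to me.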
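On systems without GNU sed, awk can be used for the same job. The following is a sketch with a made-up function name, after_keyword; it prints every line strictly after the first line that matches the keyword.

```shell
# Hypothetical helper: print all lines after the first line matching
# a keyword (an awk regular expression).
after_keyword() {
    local keyword=$1 file=$2
    awk -v kw="$keyword" '
        found                { print }      # already past the keyword line
        !found && $0 ~ kw    { found = 1 }  # first match: start printing from the next line
    ' "$file"
}
```

In the loop above, the sed call could then be replaced by after_keyword DONE "$i" > "${i/file/new}".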

Friday, 8 April 2011

Vim 002: Submitting Vim Command from the shell

A Vim command can be submitted to Vim directly from the shell (without entering Vim) by calling:

vi "+:%s/+0/+1/g" "+wq" $i

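Notice the '+' in front of each command: every argument that starts with '+' is executed as an Ex command after the file has been loaded. The final '+wq' saves the file and quits Vim once the substitution is done. Inside a loop over file names (here $i), this is an excellent way of processing a batch of files.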

Shell 003: Tail

All but the first 40 lines of a file can be written to a new file by calling:


count=`cat file.dat|wc -l`
tail -n -$((count-40)) file.dat > newfile.dat

Notice the backticks, which substitute the output of the cat and wc calls in the first line. The '-' sign in front of the numeric argument to tail is optional: tail -n -N and tail -n N both print the last N lines.
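tail can also start its output at a given line number, which avoids counting the lines first: tail -n +K prints from line K onwards. A small sketch, with a made-up function name skip_head:

```shell
# Hypothetical helper: print a file without its first N lines.
# tail -n +K starts at line K, so skipping N lines means K = N + 1.
skip_head() {
    local n=$1 file=$2
    tail -n "+$(( n + 1 ))" "$file"
}
```

For the case above, skip_head 40 file.dat > newfile.dat is equivalent to tail -n +41 file.dat > newfile.dat.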

Tuesday, 5 April 2011

Papers: Collection

Description of Chemoinformatics tools.

Monday, 4 April 2011

Shell 004: Arithmetic Expansion

Sometimes it is useful to do integer arithmetic directly in Bash. Arithmetic expansion, $((...)), makes this possible. In its most primitive form this could look something like


mzh @ ~/shell $ for i in {1,2,3}
> do
> tmp=$((i*i))
> echo $tmp
> done
1
4
9

I found it here.
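A few more operators, as a small Bash sketch (the function name square is made up). Arithmetic expansion works on integers only, so division truncates.

```shell
# Arithmetic expansion $(( ... )) in Bash: integers only.
square() {
    echo $(( $1 * $1 ))
}

for (( i = 1; i <= 3; i++ )); do
    square "$i"          # prints 1, 4, 9 on separate lines
done

echo $(( 7 / 2 ))        # integer division: 3
echo $(( 7 % 2 ))        # remainder: 1
echo $(( 2 ** 10 ))      # exponentiation: 1024
```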

PyMOL 003: Displaying Atom Coordinates

In PyMOL, the coordinates of atoms of a selection can be printed by first selecting the atoms of interest and then calling

PyMOL>iterate_state 1, sele, print name, x, y, z

where 'sele' is just the selection created when clicking on the atoms of interest.

Sunday, 3 April 2011

GAMESS 002: Copying to a new job type

When a batch of optimizations has finished, the final coordinates need to be transferred to, for example, a HESSIAN job. At the same time, the file extension should change from .log to .inp and the job type identifier from opt to hes. On the command line, this can be done in a for loop:

mzh @ ~/shell $ for i in a-{1,2,3}-opt.log
do
OPTLOG=$i
HESLOG=${OPTLOG/opt/hes}           # Replace job type: opt -> hes
HESINP=${HESLOG/log/inp}           # Replace extension: log -> inp
grep -i -A 29 "equilibrium" "$OPTLOG"|tail -n 26 >> "$HESINP"
done

I don't know how to rename two parts of a file name in the same step, i.e. how to get from 'a-{1,2,3}-job-opt.log' to 'a-{1,2,3}-job-hes.inp' in one command, so I rename the parts one after the other. Once the file name is ready in a variable, the relevant content is written to this file. grep -A 29 prints the matching line plus the 29 lines after it; the molecule has 26 atoms, so tail keeps the last 26 lines, i.e. the coordinate block. (I don't know how to detect the empty line that terminates the coordinate block; there is no identifier for the end of the block.) In general, use -A N+3 and tail -n N, where N is the number of atoms in the file. IMPORTANT: remember the '>>' append operator when transferring to the input file.
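One possible way around both open points, as a Bash sketch. The function names are made up, and the second function assumes that the coordinate block starts a fixed number of header lines after the keyword line and ends at the first empty line; I have only checked this against small test input, not against real GAMESS output.

```shell
# Hypothetical helper: turn "<name>-opt.log" into "<name>-hes.inp" in a
# single expansion by stripping the suffix and appending the new one.
hes_input_name() {
    printf '%s\n' "${1%-opt.log}-hes.inp"
}

# Hypothetical helper: print the block that follows a keyword line.
# $1: keyword in lower case (matched case-insensitively)
# $2: number of header lines to skip after the keyword line
# $3: file
# Printing stops at the first empty line after the header (assumption).
coords_after() {
    awk -v kw="$1" -v skip="$2" '
        !found && tolower($0) ~ kw { found = 1; n = 0; next }
        found && n < skip          { n++; next }   # skip the header lines
        found && NF == 0           { exit }        # empty line ends the block
        found                      { print }
    ' "$3"
}

for i in a-{1,2,3}-opt.log; do
    echo "$i -> $(hes_input_name "$i")"
done
```

With these, the loop body could read coords_after equilibrium 3 "$i" >> "$(hes_input_name "$i")", independent of the number of atoms.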

Now a trickier part: transferring the Hessian matrix from the GAMESS output .dat file to the SADPOINT job file. How can this be automated in a smart way?
This line of code is the solution:

mzh ~/shell $ awk '/ \$HESS/,/^ \$END/' hess.dat >> sad.inp

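The ',' forms a range pattern: awk prints every line from the first one matching / \$HESS/ through the next one matching /^ \$END/, inclusive. Note the single quotes, which keep the shell from interpreting the dollar signs.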

There also appear to be other ways of doing this:


mzh ~/shell $ ruby -ne 'print if /\$HESS/../\$END/' file


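mzh ~/shell $ f=0;
while IFS= read -r l;                  # IFS= and -r keep whitespace and backslashes intact
    do [[ $l =~ \$HESS ]] && f=1;      # start of the group: switch printing on
        [[ $f -eq 1 ]] && echo "$l";
        [[ $l =~ \$END ]] && f=0;      # end of the group: switch printing off
    done < file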

These other options are less readable to me, but seem to work nicely.
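sed understands the same kind of address range, so yet another variant is possible. This is a sketch with a made-up function name, extract_group; it assumes, as in the awk call above, that the group starts with a space followed by $NAME and ends at a line beginning with a space and $END.

```shell
# Hypothetical helper: print one $NAME ... $END group from a file.
# $1: group name without the dollar sign (e.g. HESS), $2: file
extract_group() {
    local name=$1 file=$2
    # -n suppresses default output; p prints only the lines in the range.
    sed -n '/ \$'"$name"'/,/^ \$END/p' "$file"
}
```

The call extract_group HESS hess.dat >> sad.inp would then do the same as the awk line.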

Friday, 1 April 2011

Fails.

Fail-001: Compilation of OpenBabel fails.