2008-12-29

Submitting jobs to a cluster in Sun Grid Engine

by Forrest Sheng Bao http://fsbao.net

I am only talking about SERIAL jobs! Parallel jobs may not work!

Texas Tech deployed a new cluster, Grendel, the 289th fastest computer in the world, this October. Recently I have been doing research on boosting algorithms with SVMs, so I need to do a lot of cross-validation computing. I plan to train the weak classifiers independently on different cores, which means a dozen serial jobs. Jobs on Grendel are managed by Sun Grid Engine, which distributes them to the nodes in the cluster. After reading the Sun Grid Engine documentation, and with the help of some friends, I figured out how to submit jobs.

The command that accepts jobs is called qsub. You need to write a few lines like this (I prefer putting them in a script, whose convenience will show later):

#!/bin/sh
#$ -V
#$ -cwd
#$ -S /bin/bash
#$ -N nwchem
#$ -o $JOB_NAME.o$JOB_ID
#$ -e $JOB_NAME.e$JOB_ID
#$ -q normal
#$ -pe fill 8
ssh $HOSTNAME "cd $PWD;/home/bao/libsvm/svm-train -v 14872 -c 2048 -g 0.5 1eq.scale.txt 1>1eq.out 2>&1" &
ssh $HOSTNAME "cd $PWD;/home/bao/libsvm/svm-train -v 14872 -c 128 -g 2 2eq.scale.txt 1>2eq.out 2>&1" &
ssh $HOSTNAME "cd $PWD;/home/bao/libsvm/svm-train -v 14872 -c 32 -g 2 3eq.scale.txt 1>3eq.out 2>&1" &
ssh $HOSTNAME "cd $PWD;/home/bao/libsvm/svm-train -v 14872 -c 32 -g 2 4eq.scale.txt 1>4eq.out 2>&1" &
ssh $HOSTNAME "cd $PWD;/home/bao/libsvm/svm-train -v 14872 -c 128 -g 2 5eq.scale.txt 1>5eq.out 2>&1" &
ssh $HOSTNAME "cd $PWD;/home/bao/libsvm/svm-train -v 14872 -c 2048 -g 0.5 6eq.scale.txt 1>6eq.out 2>&1" &
ssh $HOSTNAME "cd $PWD;/home/bao/libsvm/svm-train -v 14872 -c 32 -g 2 7eq.scale.txt 1>7eq.out 2>&1" &
ssh $HOSTNAME "cd $PWD;/home/bao/libsvm/svm-train -v 14872 -c 32 -g 2 8eq.scale.txt 1>8eq.out 2>&1" &
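# Poll once a minute until no process of mine mentioning $HOSTNAME is left,
# i.e. until all eight ssh commands above have finished.
run=8
while [ "$run" -ge 1 ]
do
    sleep 60
    run=`ps ux | grep $HOSTNAME | grep -v grep | wc -l`
done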

The eight ssh lines are the jobs to run, and the loop at the end keeps the script waiting until they have all finished. Use qsub your_script to submit the jobs. Then you can see 8 processes running on that node:

top - 04:10:38 up 11 days, 12:39,  1 user,  load average: 8.13, 5.50, 2.47
Tasks: 256 total, 9 running, 247 sleeping, 0 stopped, 0 zombie
Cpu(s): 99.8%us, 0.2%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 16431924k total, 2307148k used, 14124776k free, 219332k buffers
Swap: 1020116k total, 0k used, 1020116k free, 971428k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11210 bao 25 0 120m 107m 1044 R 100 0.7 0:37.68 svm-train
11218 bao 25 0 120m 108m 1044 R 100 0.7 0:37.68 svm-train
11245 bao 25 0 120m 108m 1044 R 100 0.7 0:37.46 svm-train
11255 bao 25 0 122m 110m 1044 R 100 0.7 0:37.52 svm-train
11221 bao 25 0 121m 108m 1044 R 100 0.7 0:37.68 svm-train
11237 bao 25 0 120m 108m 1044 R 100 0.7 0:37.60 svm-train
11240 bao 25 0 120m 108m 1044 R 100 0.7 0:37.35 svm-train
11217 bao 25 0 121m 109m 1044 R 98 0.7 0:37.17 svm-train
5603 sge 15 0 14608 2140 1544 S 2 0.0 0:10.77 sge_execd
11444 bao 15 0 12716 1176 792 R 0 0.0 0:00.15 top
1 root 15 0 10312 628 532 S 0 0.0 0:01.91 init

There are some other necessary options, like -q (which queue to submit your job to). You can refer to Sun's documentation: http://wikis.sun.com/display/GridEngine/Grid+Engine

To see the status of your jobs, simply type qstat. If it is running, you will see something like this:

grendel:bao:/libsvm/10fold$ qstat
job-ID  prior    name     user  state  submit/start at      queue                          slots  ja-task-ID
-----------------------------------------------------------------------------------------------------------------
508     0.56000  ten.1-8  bao   r      12/29/2008 17:37:12  development@compute-7-3.local  8

By default you will see two files, like ten.1-8.o493 and ten.1-8.e493, which are the standard output and standard error files. If you open the *.o* file, you will see whatever would normally be printed in the shell. Of course, clusters use something like NFS, so don't worry about the files generated by your own program.

To kill a job, just type qdel followed by your job ID.

There is a GUI called qmon, which lets you monitor all jobs in a graphical window. Awesome! I feel like the commander of an army. And if a job stays pending for too long, you can click the "Why?" button and check what the problem is. You can even submit jobs from this GUI and store all the parameters in a file for later use.

2008-12-26

Dumping Cross-Validation Result out from libsvm

by Forrest Sheng Bao http://fsbao.net

Libsvm (http://www.csie.ntu.edu.tw/~cjlin/libsvm/) is a great SVM implementation, made by a group at National Taiwan University, Taiwan.

But by default you can't see the individual cross-validation results, only the overall accuracy. Now I am going to do multi-source boosting and compute the sensitivity and specificity, so I need the result of each weak classifier.

What to do? Well, libsvm is open source! You simply need to modify the svm-train.c file. The following discussion is based on libsvm v. 2.88, released on Oct. 30, 2008.

If you can read diff output, here is the result of running

diff svm-train.c svm-train.origin.c

My modifications are marked "modified by Forrest".

43c43
< void do_cross_validation(char *filename);
---
> void do_cross_validation();
70c70
< do_cross_validation(input_file_name);
---
> do_cross_validation();
86c86
< void do_cross_validation(char *filename)
---
> void do_cross_validation()
117,119d116
< //modified by Forrest < dumpfile=" fopen(filename," accuracy =" %g%%\n"> printf("Cross Validation Accuracy = %g%%\n",100.0*total_correct/prob.l);

The predicted result is stored in the array target, while the actual label is stored in the array prob.y. So I dump these two arrays as two columns into a file. The dump file shares the same prefix as the input file (for cross-validation there is only one input file, in svmlight format) but has a ".dump" extension. So do_cross_validation should have an input argument, char *filename, this time. At line 70, you can pass in the file name through the global variable input_file_name, which is set by the function parse_command_line.
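The dumping step described above could look roughly like the stand-alone sketch below. It is not the actual patch: the function names make_dump_name and dump_cross_validation are made up for illustration, and the arrays are passed in as parameters instead of being read from libsvm's own variables. Two further assumptions are that ".dump" is appended to the full input file name and that the actual label is written before the predicted one.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build "<input file name>.dump".
   Returns a malloc'ed string the caller must free, or NULL on failure. */
char *make_dump_name(const char *input_file_name)
{
    const char *suffix = ".dump";
    size_t len = strlen(input_file_name) + strlen(suffix) + 1;
    char *name = malloc(len);

    if (name == NULL)
        return NULL;
    snprintf(name, len, "%s%s", input_file_name, suffix);
    return name;
}

/* Hypothetical helper: write one line per sample, actual label first,
   predicted label second. Inside svm-train.c, `actual` would be prob.y,
   `predicted` would be target, and `n` would be prob.l.
   Returns 0 on success, -1 on failure. */
int dump_cross_validation(const char *input_file_name,
                          const double *actual,
                          const double *predicted,
                          int n)
{
    char *dump_name = make_dump_name(input_file_name);
    FILE *dumpfile;
    int i;

    if (dump_name == NULL)
        return -1;
    dumpfile = fopen(dump_name, "w");
    free(dump_name);
    if (dumpfile == NULL)
        return -1;

    for (i = 0; i < n; i++)
        fprintf(dumpfile, "%g %g\n", actual[i], predicted[i]);

    return fclose(dumpfile) == 0 ? 0 : -1;
}
```

With this sketch, calling dump_cross_validation("1eq.scale.txt", prob.y, target, prob.l) right after the cross-validation call would produce a file named 1eq.scale.txt.dump.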

That's why we should open source software for research - you can always make tools work the way you want. Now I am going to write another program to read the results from the dump file and do classifier boosting.
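As a rough idea of what reading the dump file might involve, here is a minimal sketch that counts the four outcomes and computes sensitivity and specificity. It assumes each line of the dump file holds two numbers, the actual label followed by the predicted label, and that the positive class has a label greater than zero (for example +1 and -1). The struct and function names are hypothetical, and the boosting itself is not covered.

```c
#include <stdio.h>

/* Counts of the four outcomes of a two-class problem. */
struct confusion {
    int tp; /* actual positive, predicted positive */
    int fn; /* actual positive, predicted negative */
    int tn; /* actual negative, predicted negative */
    int fp; /* actual negative, predicted positive */
};

/* Read "actual predicted" pairs from the dump file and count outcomes.
   Assumption: a label > 0 means the positive class.
   Returns 0 on success, -1 if the file cannot be opened. */
int read_dump(const char *dump_name, struct confusion *c)
{
    FILE *f = fopen(dump_name, "r");
    double actual, predicted;

    if (f == NULL)
        return -1;
    c->tp = c->fn = c->tn = c->fp = 0;
    while (fscanf(f, "%lf %lf", &actual, &predicted) == 2) {
        if (actual > 0) {
            if (predicted > 0) c->tp++; else c->fn++;
        } else {
            if (predicted > 0) c->fp++; else c->tn++;
        }
    }
    fclose(f);
    return 0;
}

/* Sensitivity = TP / (TP + FN); returns 0 if there are no positives. */
double sensitivity(const struct confusion *c)
{
    int positives = c->tp + c->fn;
    return positives > 0 ? (double)c->tp / positives : 0.0;
}

/* Specificity = TN / (TN + FP); returns 0 if there are no negatives. */
double specificity(const struct confusion *c)
{
    int negatives = c->tn + c->fp;
    return negatives > 0 ? (double)c->tn / negatives : 0.0;
}
```

Keeping the counting separate from the two ratios means the same counts can later feed other measures, such as the weighted error a boosting round would need.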

2008-12-14

good products vs. consumers who do not know how to use them

by Forrest Sheng Bao http://fsbao.net

"Where does the pipe come from? " - Giselle in Enchanted, 2007

"UNIX is user-friendly. It's just picky who there friends are." - Anonymous

I bought an Apple Mighty Mouse a few days ago. After bringing it home, I plugged it into my Linux desktop. But I soon found a problem. The right-click was recognized as left-click and the menu popped up. The "error rate" was around 30%, so it was very annoying. I had to click the right side of the mouse shell one or two more times whenever this happened.

I was very angry. Why would Apple build such an "ambiguous" product to torture Mac users? I complained to my roommate: this mouse has a design flaw. I thought there were two pressure sensors under the mouse shell, one on each side. I tried to press the mouse toward the right-click direction, but the problem was still there.

In despair, I did some Google searching and was led to Apple's official page about the design of this mouse. http://www.apple.com/mightymouse/design.html Then I found this sentence: "Capacitive sensors under Mighty Mouse’s seamless top shell detect where your fingers are and predict your clicking intentions, so you don’t need two buttons — just two fingers."

Bingo! The trouble I had came from my bad habit in using mice - I like to rest both my index and middle fingers on the mouse buttons, which are replaced by a single shell on the Apple mouse. The Mighty Mouse does not determine the clicking side by pressure but by the location of my fingers. Since I keep two fingers on the mouse (which is no problem with traditional two-button mice), the Mighty Mouse sensors cannot tell which side I am pressing. So if I keep only one finger on the shell, the problem should be solved. So I did a test. Aw! No problem any more.

So, I would suggest Apple put this sentence in their mouse manual: Keep only one finger on the mouse shell when you left- or right-click, because the mouse is touch-sensitive rather than pressure-sensitive.

OK, let us dig a bit deeper. In this incident, was the problem a design defect in the Apple Mighty Mouse? Was it me? Of course not. Apple's stuff is so revolutionary that I didn't know how to use it - shame on me, a computer science and electrical engineering dual-degree major. In industry, we have had many cases where a good idea could not conquer the market because consumers lacked the knowledge of how to use it. I think even Apple should look back and ask themselves why they can't beat Microsoft.

Let us take a look at our star, Linux. For quite a long time, geeks cared only about how to get things done in an efficient or powerful way. But that requires users to have some knowledge. For instance, geeks like to use the shell, the command line interface, or, put more simply, an interface like the old DOS. Let's make the case more complex: geeks like to use pipes and output redirection in the shell. Wow! My grandma would ask me "where does the pipe come from?" In this case, Linux is facing a consumer who does not know how to use it.

So we cannot always *boast* about how stable, fast, powerful, efficient, fancy, cool, sci-fi (and blah blah blah) Linux is. The Windows user will just ask you one question: "How can I use it in five minutes?" A mouse click on an English menu, without beating around the bush, is much easier than a line of commands in the shell, though the shell is very powerful. I agree with the current philosophy of the Ubuntu Linux development team. Ubuntu is a "Linux for human beings". So we should make it possible for grandmas to use it without going to college for a bachelor's degree in engineering. If a user does not know how to use your products, all your lofty ideas are useless.

2008-12-12

Ubuntu Linux installation CD being sold in BestBuy store

by Forrest Sheng Bao http://fsbao.net

I went to a BestBuy store today (BestBuy is the largest consumer electronics and computer retailer in the US and Canada). I had heard that the Ubuntu Linux installation CD with a technical support subscription is sold at BestBuy, so I checked the operating system shelf. Next to dozens of boxes of Windows XP/Vista were the Ubuntu disks.

I couldn't find the version information, so I asked a salesman in the store. He said the version should be Hardy Heron, which is the code name of Ubuntu 8.04 LTS. I was surprised to hear that he knew Ubuntu and the version number. He even told me that since Ubuntu Linux is open source, you can upgrade it to the latest version for free from the Internet. Wow! Linux has at least made salesmen aware of it.

I was so excited to see Ubuntu being sold in the number one computer retail store and placed alongside Microsoft Windows. This is a great victory for us, the open source community, which includes many industry leaders, such as IBM, HP and Google. You could not have imagined this 10 years ago.

I really have to say that since Ubuntu was born, the position of Linux in the desktop market has changed a lot. Before that, the open source community didn't care much about common users. They just cared about making things work in their own geeky way. But Ubuntu even provides a better experience than Mac and many more convenient features.

So I bought the pack, though I know I can download the ISO and burn the disk for free, legally. It's so exciting. After coming back, I took some pictures of the box. Here they are:

The cover and the bar code:

The inner page:

The back side:

The technical support registration code

See the $19.99 item: