2010-04-03

Compiling Numpy and Scipy on CentOS Linux 5.4

by Forrest Sheng Bao http://fbao.net

The following instructions for compiling Numpy and Scipy work on 64-bit CentOS 5.4 with a precompiled ATLAS library. I am not sure whether they work with the ATLAS provided by your Linux distribution, as I compiled ATLAS from source myself. My CPUs (thousands of them, actually) are Intel Xeons. Some of the configuration below exists because my system administrator set the Intel C compiler and its libraries as the default, whereas I insist on using the GNU toolchain. You may not need it, or you may run into other trouble if you use the Intel tools all the time.

If you don't know how Numpy, ATLAS and Scipy relate to each other, here is the short version. Numpy defines data structures for matrices and the operations on them. To make those operations fast, Numpy wraps the ATLAS library (written in C and Fortran) for Python. Scipy builds on Numpy and adds more functions for scientific computing. That's why you need ATLAS first, then Numpy, and finally Scipy.

The default Python version on CentOS 5.4 is 2.4 - which sucks. Red Hat Enterprise Linux 5.5 has been released but CentOS 5.5 is not ready yet.

If you do not know what the PYTHONPATH environment variable is, please see http://docs.python.org/using/cmdline.html#envvar-PYTHONPATH

Table of Contents:
Part 1: Numpy
Part 2: Scipy
Part 3: Troubleshooting (e.g., undefined reference to `PyErr_Format')

Part 1: Compiling and installing NumPy


1. Download the numpy source package from its official website.

2. Extract the numpy source package. My result is a folder called numpy-1.3.0. Enter this folder.

3. Edit site.cfg. If you don't have one, create a blank file or copy site.cfg.example. Make sure the [DEFAULT] section is configured as follows:
[DEFAULT]
libraries = gfortran, gfortranbegin
library_dirs = /usr/local/lib:/usr/lib/gcc/x86_64-redhat-linux/4.1.2
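
A malformed section header in site.cfg is easy to miss. Here is a minimal sketch for checking the file before building, assuming a modern Python 3 with the standard configparser module rather than the 2.4 interpreter used in this post. The helper names read_defaults and parses_cleanly are made up for illustration, and the colon-separated library_dirs format is simply taken from the example above.

```python
import configparser

# The [DEFAULT] section from step 3, as a string for illustration.
SITE_CFG = """\
[DEFAULT]
libraries = gfortran, gfortranbegin
library_dirs = /usr/local/lib:/usr/lib/gcc/x86_64-redhat-linux/4.1.2
"""


def read_defaults(text):
    """Return (libraries, library_dirs) from the [DEFAULT] section."""
    parser = configparser.RawConfigParser()
    parser.read_string(text)  # raises configparser.Error on a bad header
    defaults = parser.defaults()
    libraries = [name.strip() for name in defaults["libraries"].split(",")]
    library_dirs = defaults["library_dirs"].split(":")
    return libraries, library_dirs


def parses_cleanly(text):
    """Return True if the text parses and has both expected keys."""
    try:
        read_defaults(text)
    except (configparser.Error, KeyError):
        return False
    return True


libraries, library_dirs = read_defaults(SITE_CFG)
print(libraries)
print(library_dirs)
print(parses_cleanly(SITE_CFG.replace("[DEFAULT]", "[DEFAULT}")))
```

To check a real file, read it with open("site.cfg").read() and pass the text to parses_cleanly.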

4. Build numpy:
python setup.py build

You should see output like this:

creating build/temp.linux-x86_64-2.4/numpy/linalg
compile options: '-DATLAS_INFO="\"3.9.23\"" -Inumpy/core/include -Ibuild/src.linux-x86_64-2.4/numpy/core/include/numpy -Inumpy/core/src -Inumpy/core/include -I/usr/include/python2.4 -c'
gcc: numpy/linalg/lapack_litemodule.c
gcc: numpy/linalg/python_xerbla.c
/usr/bin/gfortran -Wall -Wall -shared build/temp.linux-x86_64-2.4/numpy/linalg/lapack_litemodule.o build/temp.linux-x86_64-2.4/numpy/linalg/python_xerbla.o -L/home/bao/installtest/ATLAS/DONE/lib -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -Lbuild/temp.linux-x86_64-2.4 -llapack -lptf77blas -lptcblas -latlas -lgfortran -lgfortranbegin -lgfortran -lgfortranbegin -lgfortran -o build/lib.linux-x86_64-2.4/numpy/linalg/lapack_lite.so
building 'numpy.random.mtrand' extension
compiling C sources
C compiler: gcc -fno-strict-aliasing -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fPIC

As you can see from the -L/home/bao/installtest/ATLAS/DONE/lib and -latlas flags in the gfortran command, the ATLAS library I compiled is used when numpy is linked.
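
If the build log is long, the same check can be done mechanically. The following is only a sketch, assuming a modern Python 3; link_flags is a hypothetical helper, and LINK_LINE is an abbreviated copy of the gfortran command from the log above (object files and repeated flags are left out).

```python
import shlex


def link_flags(command):
    """Split a linker command line into its -L directories and -l libraries."""
    dirs, libs = [], []
    for token in shlex.split(command):
        if token.startswith("-L") and len(token) > 2:
            dirs.append(token[2:])
        elif token.startswith("-l") and len(token) > 2:
            libs.append(token[2:])
    return dirs, libs


# Abbreviated copy of the gfortran link line from the build log above.
LINK_LINE = (
    "/usr/bin/gfortran -Wall -Wall -shared "
    "-L/home/bao/installtest/ATLAS/DONE/lib "
    "-L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 "
    "-llapack -lptf77blas -lptcblas -latlas -lgfortran -lgfortranbegin "
    "-o build/lib.linux-x86_64-2.4/numpy/linalg/lapack_lite.so"
)

dirs, libs = link_flags(LINK_LINE)
print("atlas linked:", "atlas" in libs)
print("library search dirs:", dirs)
```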

5. Test (optional). You can try any numpy function. Enter the build output directory:
cd build/lib.linux-x86_64-2.4
Then start your Python interpreter and run something like this:
>>> from numpy import *
>>> linalg.svd(array([[1,2],[3,4]]))
(array([[-0.40455358, -0.9145143 ],
       [-0.9145143 ,  0.40455358]]), array([ 5.4649857 ,  0.36596619]), array([[-0.57604844, -0.81741556],
       [ 0.81741556, -0.57604844]]))
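
One way to convince yourself that the result is right is to multiply the three factors back together and compare with the input matrix. The sketch below does this in plain Python (assuming Python 3, no numpy needed) with the values copied from the linalg.svd output above; matmul is a small helper written just for this check.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


# Values copied from the linalg.svd output above.
U = [[-0.40455358, -0.9145143], [-0.9145143, 0.40455358]]
s = [5.4649857, 0.36596619]
Vt = [[-0.57604844, -0.81741556], [0.81741556, -0.57604844]]

# Build diag(s) and reconstruct A = U * diag(s) * Vt.
S = [[s[0], 0.0], [0.0, s[1]]]
A = matmul(matmul(U, S), Vt)

# Compare with the matrix that was passed to svd.
expected = [[1, 2], [3, 4]]
max_error = max(abs(A[i][j] - expected[i][j])
                for i in range(2) for j in range(2))
print(A)
print("max error:", max_error)
```

The error should be tiny but not exactly zero, because the printed values are rounded to about eight digits.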

6. Install numpy into a target directory.
Go back to your numpy compilation directory if you did step 5.

I do not like installing anything into system-level directories, which requires root privileges. I prefer to install into a directory of my choice and set the proper environment variable to point to it, thanks to the great "everything is a file" idea of UNIX. The target directory will contain two folders, bin and lib64. The directory you should add to your PYTHONPATH environment variable is TARGET_DIRECTORY/lib64/python2.4/site-packages

Please use --prefix to specify your target directory. This is what I did:
python setup.py install --prefix=/home/bao/installtest

Then I set PYTHONPATH to /home/bao/installtest/lib64/python2.4/site-packages and exported it in my ~/.bashrc

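If you do not specify --prefix, numpy is installed under the prefix of the Python interpreter you ran (sys.prefix, for example /usr for the system Python, or /usr/local for a Python built from source with default settings). That usually requires root privileges, but you don't need to set PYTHONPATH because that location is already on Python's search path.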

You can check the search path later in your Python interpreter:
>>> import sys
>>> sys.path
['', '/home/bao', '/home/bao/installtest/lib64/python2.4/site-packages', '/usr/lib64/python24.zip', '/usr/lib64/python2.4', '/usr/lib64/python2.4/plat-linux2', '/usr/lib64/python2.4/lib-tk', '/usr/lib64/python2.4/lib-dynload', '/usr/lib64/python2.4/site-packages', '/usr/lib64/python2.4/site-packages/Numeric', '/usr/lib64/python2.4/site-packages/gtk-2.0', '/usr/lib/python2.4/site-packages']
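
The path arithmetic above can be written down as a small check. This is a sketch only, assuming a modern Python 3; site_packages_dir and is_on_search_path are hypothetical helper names, and the lib64 and python2.4 parts are the values from my machine, so adjust them for yours.

```python
import posixpath
import sys


def site_packages_dir(prefix, python_version="2.4", libdir="lib64"):
    """Directory that 'setup.py install --prefix=PREFIX' filled on my system."""
    return posixpath.join(prefix, libdir, "python" + python_version,
                          "site-packages")


def is_on_search_path(directory, search_path=None):
    """Return True if directory is one of the entries of the search path."""
    if search_path is None:
        search_path = sys.path
    wanted = posixpath.normpath(directory)
    return any(posixpath.normpath(entry) == wanted
               for entry in search_path if entry)


target = site_packages_dir("/home/bao/installtest")
print(target)
print("on sys.path:", is_on_search_path(target))
```

If the last line prints False in the interpreter you care about, PYTHONPATH was not exported in the shell that started it.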



Part 2: Compiling and installing Scipy


You need numpy and ATLAS to compile Scipy.

1. Download the Scipy source package from its official website.

2. Extract the source package. The extracted folder on my computer is scipy-0.7.2. Enter this folder.

3. Create site.cfg file if it doesn't exist and make sure these three lines are in it.

[DEFAULT]
libraries = gfortran, gfortranbegin
library_dirs = /usr/local/lib:/usr/lib/gcc/x86_64-redhat-linux/4.1.2

4. Build Scipy:
python setup.py build

5. Test (optional). Actually, there isn't much to test, so I skipped it.

6. Install scipy into a target directory.
This step is very similar to the one in the Numpy part. I did:
python setup.py install --prefix=/home/bao/installtest

If you already set PYTHONPATH in the Numpy part, you are done. Without --prefix, Scipy goes to the same default location as Numpy.
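
To see whether both packages are actually picked up from the target directory, you can ask Python where it would load them from. This is a minimal sketch assuming a modern Python 3 (importlib.util does not exist in 2.4; there you would just import numpy and print numpy.__file__). module_origin is a hypothetical helper name.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would load from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


for name in ("numpy", "scipy"):
    origin = module_origin(name)
    print(name, "->", origin if origin else "not found on sys.path")
```

The printed paths should start with your --prefix directory if PYTHONPATH is set as described.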

Part 3: Troubleshooting

If you see error messages like
/home/bao/apps/numpy-1.4.1/numpy/linalg/lapack_litemodule.c:103: undefined reference to `PyType_IsSubtype'
/home/bao/apps/numpy-1.4.1/numpy/linalg/lapack_litemodule.c:114: undefined reference to `PyErr_Format'
/home/bao/apps/numpy-1.4.1/numpy/linalg/lapack_litemodule.c:109: undefined reference to `PyErr_Format'
/home/bao/apps/numpy-1.4.1/numpy/linalg/lapack_litemodule.c:104: undefined reference to `PyErr_Format'
/home/bao/apps/numpy-1.4.1/numpy/linalg/lapack_litemodule.c:109: undefined reference to `PyErr_Format'
/home/bao/apps/numpy-1.4.1/numpy/linalg/lapack_litemodule.c:104: undefined reference to `PyErr_Format'
build/temp.linux-x86_64-2.6/numpy/linalg/lapack_litemodule.o: In function `lapack_lite_dgeev':
/home/bao/apps/numpy-1.4.1/numpy/linalg/lapack_litemodule.c:165: undefined reference to `Py_BuildValue'
build/temp.linux-x86_64-2.6/numpy/linalg/lapack_litemodule.o: In function `initlapack_lite':
/home/bao/apps/numpy-1.4.1/numpy/linalg/lapack_litemodule.c:830: undefined reference to `PyErr_SetString'
/home/bao/apps/numpy-1.4.1/numpy/linalg/lapack_litemodule.c:833: undefined reference to `PyDict_SetItemString'
build/temp.linux-x86_64-2.6/numpy/linalg/python_xerbla.o: In function `xerbla_':
/home/bao/apps/numpy-1.4.1/numpy/linalg/python_xerbla.c:35: undefined reference to `PyExc_ValueError'
/home/bao/apps/numpy-1.4.1/numpy/linalg/python_xerbla.c:35: undefined reference to `PyErr_SetString'
/usr/lib/gcc/x86_64-redhat-linux/4.1.2/libgfortranbegin.a(fmain.o): In function `main':
(.text+0xa): undefined reference to `MAIN__'
collect2: ld returned 1 exit status
error: Command "/usr/bin/gfortran -Wall build/temp.linux-x86_64-2.6/numpy/linalg/lapack_litemodule.o build/temp.linux-x86_64-2.6/numpy/linalg/python_xerbla.o -L/home/bao/apps/ATLAS/ATLAS_LINUX/lib -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -Lbuild/temp.linux-x86_64-2.6 -llapack -lptf77blas -lptcblas -latlas -lgfortran -lgfortranbegin -lgfortran -lgfortranbegin -lgfortran -o build/lib.linux-x86_64-2.6/numpy/linalg/lapack_lite.so" failed with exit status 1


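Try unsetting the LDFLAGS and/or CFLAGS environment variables. Notice that the failing gfortran command above has no -shared flag, unlike the successful link line in Part 1, so the linker tries to build a standalone program instead of a shared library. That is consistent with preset flags replacing numpy's defaults rather than being added to them. I had this problem once and found the solution here.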
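
If you would rather not touch your shell configuration, the build can be started from a small script that drops those variables for that one command. This is a sketch under assumptions: a modern Python 3, the helper names env_without_flags and build are made up, and the example path is the source directory from the error log above.

```python
import os
import subprocess
import sys


def env_without_flags(env, names=("LDFLAGS", "CFLAGS")):
    """Return a copy of env with the given variables removed."""
    return {key: value for key, value in env.items() if key not in names}


def build(source_dir):
    """Run 'python setup.py build' in source_dir without LDFLAGS/CFLAGS set."""
    clean_env = env_without_flags(os.environ)
    return subprocess.call([sys.executable, "setup.py", "build"],
                           cwd=source_dir, env=clean_env)


# Example (not run here): build("/home/bao/apps/numpy-1.4.1")
```

The rest of the environment, including PATH and PYTHONPATH, is passed through unchanged.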

Comments are welcome. Please feel free to email me and help me polish this doc. I haven't checked the grammar and typos here yet and will get back to it later.

2 comments:

Anonymous said...

tnx dude, this was very useful to me!

Cheers

doschman said...

Forrest and I exchanged a handful of e-mails about this.

If you're having issues trying to build RPMs or install Python into an INSTALLROOT (like I was, for building an RPM or installing into a SYSTEMROOT for cross-compiling, etc.), it is true that whatever you preset with 'env' will trump the defaults set in Numpy's distutils/fcompiler/*.py files for the respective compiler.

My only solution when having to set extra library paths (e.g. -L/path/to/library/lib , etc) is to hack my respective compiler environment python file ( numpy/distutils/fcompiler/gnu.py ) and append my extra library path to the 'linker_so' list data structure.

...Then before building, go back and run 'python setup.py' and use the 'f' interactive option to list the compiler settings and you should see whatever you added to the 'linker_so' is there, then go ahead and exit and build.

If anyone knows how to set that during run-time without hacking the .py files, do tell.