Sunday 4 March 2018

Faster Python for data science and scientific computing



Scientific computing and HPC developers will probably be familiar with Intel's compiler suite, which can be used to compile C, C++ and Fortran code in place of the free GCC compilers and can often deliver significant performance improvements without changing a single line. Further improvements can come from swapping out (generally fantastic) open source maths libraries such as ATLAS or BLAS for the equivalent functionality in Intel's MKL (Math Kernel Library). Again, this is usually just a matter of linking your existing code against Intel's library, and it can yield very impressive speed gains for very little work.
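If you want to know which BLAS/LAPACK library your current NumPy was built against (and so whether you are already getting MKL), you can inspect its build configuration. This is a minimal sketch. The helper name `numpy_build_info` is hypothetical, and the format of what `np.show_config()` prints differs between NumPy versions, so the "mkl" substring check is only a rough heuristic.

```python
import contextlib
import io

import numpy as np


def numpy_build_info() -> str:
    """Return the text that np.show_config() prints about NumPy's build.

    The BLAS/LAPACK sections name the library NumPy was linked against,
    e.g. MKL or an open source implementation such as OpenBLAS.
    """
    buffer = io.StringIO()
    # show_config() prints to stdout, so capture it into a string instead
    with contextlib.redirect_stdout(buffer):
        np.show_config()
    return buffer.getvalue()


if __name__ == "__main__":
    info = numpy_build_info()
    # Crude heuristic (an assumption): does the build information mention MKL anywhere?
    print("MKL mentioned" if "mkl" in info.lower() else "MKL not mentioned")
```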

What has this to do with Python? Most of Python's best-known data science and scientific computing libraries are written largely in C/C++ (with some Fortran and Cython), with a thin wrapper that lets them be called easily from Python. If you've ever wondered why NumPy, SciPy, scikit-learn and pandas are so much faster than writing the same code yourself in native Python, it's because all of the work in a function like np.multiply() is actually carried out in compiled C "under the hood".
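To see the "under the hood" effect for yourself, you can compare an element-wise product written as a plain Python loop with the same operation done by np.multiply(). This is a minimal sketch: the function names and the array size are arbitrary choices, and the timings you get will depend on your machine and NumPy build.

```python
import timeit

import numpy as np


def multiply_python(a, b):
    """Element-wise product computed one element at a time by the interpreter."""
    return [x * y for x, y in zip(a, b)]


def multiply_numpy(a, b):
    """Element-wise product where the loop runs inside NumPy's compiled C code."""
    return np.multiply(a, b)


if __name__ == "__main__":
    n = 500_000
    a_list = [float(i) for i in range(n)]
    b_list = [2.0] * n
    a_arr = np.array(a_list)
    b_arr = np.array(b_list)

    # Both versions must agree before comparing their speed
    assert np.allclose(multiply_python(a_list, b_list), multiply_numpy(a_arr, b_arr))

    t_py = timeit.timeit(lambda: multiply_python(a_list, b_list), number=5)
    t_np = timeit.timeit(lambda: multiply_numpy(a_arr, b_arr), number=5)
    print(f"pure Python: {t_py:.3f}s  NumPy: {t_np:.3f}s  ratio: {t_py / t_np:.0f}x")
```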

Previously, if you had a licence for Intel's compiler suite, you could compile these Python libraries yourself and take advantage of Intel's speed boost in your Python applications, but this required both familiarity with compiling C code and an expensive licence. However, Intel have now made available a free, pre-compiled Python distribution with all the major packages (NumPy, SciPy, pandas etc.), based on the popular Anaconda distribution. According to KDnuggets, Intel have also rewritten some common functions entirely for further optimization; in particular, NumPy and SciPy's FFT (Fast Fourier Transform) functions appear to have been enhanced significantly. Depending on your workload, using this distribution could boost the execution speed of these libraries by 10-50% without any code changes.
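Because the gain depends on your workload, it is worth measuring on your own machine: run the same script once under your existing Python and once under Intel's distribution, then compare the numbers. This is a minimal sketch. The function name `benchmark`, the array sizes and the three operations timed are arbitrary assumptions, so substitute the calls your own code actually spends its time in.

```python
import timeit

import numpy as np


def benchmark(size: int = 2**20, repeats: int = 5) -> dict:
    """Time a few NumPy operations; run this same script under each Python distribution.

    Returns the best (minimum) wall-clock time in seconds for each operation.
    """
    rng = np.random.default_rng(0)
    signal = rng.standard_normal(size)
    side = int(np.sqrt(size) // 2)  # keep the matrix product reasonably quick
    mat = rng.standard_normal((side, side))

    workloads = {
        "fft": lambda: np.fft.fft(signal),
        "matmul": lambda: mat @ mat,
        "multiply": lambda: np.multiply(signal, signal),
    }
    # timeit.repeat returns one total per repeat; the minimum is least affected by noise
    return {name: min(timeit.repeat(fn, number=1, repeat=repeats))
            for name, fn in workloads.items()}


if __name__ == "__main__":
    for name, seconds in benchmark().items():
        print(f"{name:>8}: {seconds * 1000:.1f} ms")
```

Taking the minimum of several repeats, rather than the mean, reduces the influence of other processes on the machine, which matters when the differences you are looking for are tens of percent.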

If you're interested in optimizing Python code that you wrote yourself, and that isn't available in any existing (C-implemented) library, check out Cython as a way of implementing the most performance-sensitive parts of your code in C. Unlike switching to the Intel distribution mentioned above, converting part of your code to Cython takes some development work; however, even with the free GCC compilers you can often see a significant increase in speed over native Python code.
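Before moving anything into Cython, it helps to confirm which functions really are the most performance-sensitive. This is a minimal sketch using the standard-library profiler; `slow_pairwise_distance` and `cheap_setup` are hypothetical stand-ins for your own code.

```python
import cProfile
import io
import pstats


def slow_pairwise_distance(points):
    """Deliberately naive O(n^2) pure-Python loop: a typical Cython candidate."""
    total = 0.0
    for x1, y1 in points:
        for x2, y2 in points:
            total += ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
    return total


def cheap_setup(n):
    """Build the input points; cheap compared with the distance loop."""
    return [(float(i), float(i % 7)) for i in range(n)]


def profile_report(n: int = 300, top: int = 5) -> str:
    """Profile a small workload and return the entries with the highest cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    slow_pairwise_distance(cheap_setup(n))
    profiler.disable()

    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(top)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_report())
```

In this toy example the report should list slow_pairwise_distance with by far the largest cumulative time, which marks it as the function worth rewriting in Cython.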
