Sunday, 4 March 2018

Faster python for data science and scientific computing



Scientific computing and HPC developers will probably be familiar with  Intel's C/C++ compiler suite, which can be used to compile your C, C++ and Fortran code instead of the free GCC compilers and can often result in significant performance improvements without changing a single line. Further improvements can be made by swapping out (generally fantastic) open source C maths libraries such as ATLAS or BLAS for equivalent functionality in Intels MKL (Math Kernal Language). Again - this is usually simply a matter of compiling your existing code against Intel's library and can result in very impressive speed gains for very little work.

What has this to do with Python? Most of Python's most famous data science and scientific computing libraries are written in C/C++, with a simple wrapper allowing them to be called easily from python. If you've ever wondered why Numpy, SciPy, scikit-learn and pandas are so much faster than trying to write the same code yourself in native Python, it's because all of the work in a function like np.multiply() is actually carried out in C "under the hood".

Previously, if you had a licence for Intel's  compiler suite you could compile these python libraries yourself and take advantage of Intel's speed boost in your python applications, but this required both familiarly with C code compilation, as well as an expensive licence. However Intel have now made available a free pre-compiled Python distribution with all the major packages (numpy, scipy, pandas etc.) based on the popular Anaconda distribution.  According to kdnuggets Intel have also re-written some common functions entirely for further optimization - in particular it looks like numpy and scipy's FFT (Fast Fourier Transform) functions have been enhanced significantly. Depending on your workload, using this distribution could boost the execution speed of these libraries by 10-50% without the need for any code change.

If you're interested in optimizing Python code that you wrote yourself and isn't available in any existing (C-implemented) library check out Cython as a way of implementing the most performance sensitive parts of your code in C. Unlike using the Intel distribution linked above, converting part of your code to use Cython can take some development work, however even when using the free GCC compilers you'll see a significant increase in speed over native python code.

8 comments:

  1. This comment has been removed by the author.

    ReplyDelete
  2. Online casino for everyone, come in and win now only we have the best online slots The best online slots we have.

    ReplyDelete
  3. I definitely enjoying every little bit of it. It is a great website and nice share. I want to thank you. Good job! You guys do a great blog, and have some great contents. Keep up the good work. space

    ReplyDelete
  4. I really enjoy simply reading all of your weblogs. Simply wanted to inform you that you have people like me who appreciate your work. Definitely a great post I would like to read this
    python training in chennai

    ReplyDelete
  5. Thanks for such a great article here. I was searching for something like this for quite a long time and at last I’ve found it on your blog. It was definitely interesting for me to read  about their market situation nowadays.
    Python Online training
    Python Course institute in Bangalore

    ReplyDelete
  6. Thank you because you have been willing to share information with us. we will always appreciate all you have done here because I know you are very concerned with our. gravity

    ReplyDelete

Sending an SMS via Intent in Android

Alchemy allows users to donate to their chose charities via text message, which it used to do automatically using Android's SmsManager ...