Sunday, 4 March 2018

Faster Python for data science and scientific computing



Scientific computing and HPC developers will probably be familiar with Intel's compiler suite, which can be used to compile your C, C++ and Fortran code instead of the free GCC compilers and can often deliver significant performance improvements without changing a single line of source. Further improvements can be made by swapping out (generally fantastic) open source maths libraries such as ATLAS or BLAS for the equivalent functionality in Intel's MKL (Math Kernel Library). Again, this is usually just a matter of building your existing code against Intel's library, and it can result in very impressive speed gains for very little work.

What has this to do with Python? Many of Python's best-known data science and scientific computing libraries are implemented largely in compiled languages such as C and C++, with a thin wrapper that lets them be called easily from Python. If you've ever wondered why NumPy, SciPy, scikit-learn and pandas are so much faster than the same code written yourself in native Python, it's because the work in a function like np.multiply() is actually carried out in C "under the hood".
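To make that difference concrete, here is a minimal sketch that times an element-wise multiply written as a native Python loop against np.multiply(). The helper names and the array size are my own choices for illustration, and the actual speed-up will depend on your machine and NumPy build.

```python
import timeit

import numpy as np


def multiply_pure_python(a, b):
    """Element-wise product computed by the Python interpreter, one item at a time."""
    return [x * y for x, y in zip(a, b)]


def multiply_numpy(a, b):
    """Element-wise product computed inside NumPy's compiled code."""
    return np.multiply(a, b)


n = 100_000  # arbitrary size, large enough for the gap to show
a_list = [float(i) for i in range(n)]
b_list = [0.5 * i for i in range(n)]
a_arr = np.array(a_list)
b_arr = np.array(b_list)

# Both approaches should produce the same numbers.
assert np.allclose(multiply_pure_python(a_list, b_list), multiply_numpy(a_arr, b_arr))

loop_time = timeit.timeit(lambda: multiply_pure_python(a_list, b_list), number=10)
numpy_time = timeit.timeit(lambda: multiply_numpy(a_arr, b_arr), number=10)
print(f"pure Python: {loop_time:.4f} s, NumPy: {numpy_time:.4f} s")
```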

Previously, if you had a licence for Intel's compiler suite you could compile these Python libraries yourself and take advantage of Intel's speed boost in your Python applications, but this required both familiarity with compiling C code and an expensive licence. However, Intel have now made available a free pre-compiled Python distribution, based on the popular Anaconda distribution, with all the major packages (NumPy, SciPy, pandas etc.). According to KDnuggets, Intel have also rewritten some common functions entirely for further optimization; in particular, it looks like the FFT (Fast Fourier Transform) functions in NumPy and SciPy have been enhanced significantly. Depending on your workload, using this distribution could boost the execution speed of these libraries by 10-50% without the need for any code change.
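If you want to check what a given installation is actually doing, a rough sketch like the one below can help. np.show_config() prints the BLAS/LAPACK libraries NumPy was built against (an MKL-based build should mention MKL there), and the small helper, whose name and sizes are just my own for illustration, times an FFT so you can compare the same script across two distributions.

```python
import timeit

import numpy as np


def best_fft_time(n, repeats=5):
    """Return the best wall-clock time in seconds for one length-n complex FFT."""
    rng = np.random.default_rng(0)  # fixed seed so every run uses the same input
    signal = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    return min(timeit.repeat(lambda: np.fft.fft(signal), number=1, repeat=repeats))


# Print the build configuration, including the linked BLAS/LAPACK libraries.
np.show_config()

for n in (2**12, 2**16):  # arbitrary example sizes
    print(f"FFT of length {n}: {best_fft_time(n):.6f} s")
```

Run the same file under your stock Python and under the Intel distribution and compare the timings; numbers from a single machine are only indicative.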

If you're interested in optimizing Python code that you wrote yourself, and that isn't available in any existing (C-implemented) library, check out Cython as a way of implementing the most performance-sensitive parts of your code in C. Unlike switching to the Intel distribution described above, converting part of your code to use Cython can take some development work. However, even with the free GCC compilers you'll typically see a significant increase in speed over native Python code.
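As a taste of what that development work looks like, here is a small sketch using Cython's "pure Python" mode, where C types are given as ordinary annotations. It assumes the Cython package is installed; the function is a made-up example, not taken from any library. The file runs unchanged under the normal interpreter, and the type declarations only take effect once it has been compiled.

```python
import cython  # needs the Cython package installed, even when run uncompiled


def sum_of_squares(n: cython.int) -> cython.double:
    """Return 0^2 + 1^2 + ... + (n-1)^2 as a float."""
    # Once compiled by Cython these are plain C variables, so the loop
    # runs without creating Python objects on every iteration.
    total: cython.double = 0.0
    x: cython.double
    i: cython.int
    for i in range(n):
        x = float(i)  # square in floating point to avoid C integer overflow
        total += x * x
    return total


# cython.compiled is False under the plain interpreter, True in a compiled module.
print("running compiled:", cython.compiled)
print(sum_of_squares(10))
```

Saved as, say, fastmath.py (a hypothetical name), this can be built in place with `cythonize -i fastmath.py`, after which `import fastmath` picks up the compiled version.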

