Thursday, 19 November 2015

Negative diagonal elements in the covariance matrix returned by numpy.polyfit

I ran into a strange issue fitting a line to a small number of data points using numpy.polyfit that I thought was worth documenting.

I ran a command of the form:

p, cov = np.polyfit(x, y, 1, w=w, cov=True)

where x, y and w were arrays of length 3.

The command returned the correct slope and y-intercept values. However, the covariance matrix, cov, had strictly negative diagonal terms. This is apparently because numpy scales the covariance matrix as described here.

The scaling applied is a factor such that

factor = resids / (len(x) - order - 2.0)

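Here order is the number of fitted coefficients (the polynomial degree plus one). If, like me, you are making a first order polynomial fit to a dataset of 3 values, the denominator is 3 - 2 - 2 = -1, which has the effect of multiplying the expected matrix by -1. If I had been unlucky enough to have 4 points, the denominator would have been zero and the scaling would have divided by zero.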
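The arithmetic is easy to check by hand. The snippet below is a minimal sketch of the denominator only. The helper name scale_denominator is made up for illustration, and it assumes that order means the number of fitted coefficients (degree + 1), which is what makes the 3-point case come out as -1.

```python
def scale_denominator(n_points, deg):
    """Denominator of the scaling factor quoted above.

    Assumption: `order` is the number of fitted coefficients, deg + 1.
    """
    order = deg + 1
    return n_points - order - 2.0


# First order (straight line) fits to datasets of increasing size.
for n_points in (3, 4, 5, 10):
    print(n_points, scale_denominator(n_points, deg=1))
```

For a straight line the denominator is negative for 3 points, zero for 4 points, and only becomes positive from 5 points onwards.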

In my case, looking at the results here, I could recover the correct values just by multiplying the matrix by -1. This is a strange weighting to apply to a small dataset. I assume it makes sense if you have many points, where subtracting an extra 2 from the denominator barely changes the result, and the developers wanted to keep numpy.polyfit consistent.
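If you would rather not depend on how polyfit scales the matrix, the covariance can be computed directly from the weighted least-squares problem. This is only a sketch, not what numpy does internally. The function name linear_fit_cov and the sample data are invented for illustration, and it uses the conventional scaling of the residual sum of squares divided by (number of points - number of coefficients).

```python
import numpy as np


def linear_fit_cov(x, y, w):
    """Weighted straight-line fit with a conventionally scaled covariance.

    Hypothetical helper: scales by chi2 / (n_points - n_coeffs) rather
    than the factor quoted in the post.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)

    # Weighted design matrix with columns [x, 1] and weighted targets.
    A = np.vander(x, 2) * w[:, None]
    b = y * w

    coeffs, _, _, _ = np.linalg.lstsq(A, b, rcond=None)

    # Weighted residual sum of squares, computed explicitly.
    r = b - A @ coeffs
    chi2 = r @ r

    dof = len(x) - A.shape[1]  # 3 points, 2 coefficients -> 1
    cov = np.linalg.inv(A.T @ A) * chi2 / dof
    return coeffs, cov


# Made-up data: three points that are not exactly collinear.
x = [0.0, 1.0, 2.0]
y = [0.1, 0.9, 2.2]
w = [1.0, 1.0, 1.0]

coeffs, cov = linear_fit_cov(x, y, w)
print(coeffs)
print(cov)
```

With 3 points and 2 coefficients there is one degree of freedom, so this scaling divides by 1 and the diagonal terms are positive. That matches the sign flip described above.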
