Thursday, 19 November 2015

Negative diagonal elements in the covariance matrix returned by numpy.polyfit

I ran into a strange issue fitting a line to a small number of data points using numpy.polyfit that I thought was worth documenting.

I ran a command of the form:

p, cov = np.polyfit(x, y, 1, w=w, cov=True)

where x, y and w were arrays of length 3.

The command returned the correct slope and y-intercept values; however, the covariance matrix, cov, had strictly negative diagonal terms. This is apparently because numpy scales the covariance matrix as described here.

The scaling applied is a factor such that

factor = resids / (len(x) - order - 2.0)

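If, like me, you are making a first order polynomial fit to a dataset of 3 values, the denominator comes out as -1 (taking order to be the number of fitted coefficients, 2 for a straight line, gives 3 - 2 - 2.0 = -1), which has the effect of multiplying the expected matrix by -1. If I had been unlucky enough to have 4 points, the denominator would have been zero and the result would have been even worse.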
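To make the arithmetic concrete, here is a small sketch of the denominator from the formula above for a few dataset sizes. The function name is my own, and it assumes that order means the number of fitted coefficients (deg + 1), which is the reading that reproduces the -1 seen with 3 points.

```python
def legacy_scale_denominator(n_points, deg):
    """Denominator of the scaling factor quoted above.

    Assumption: order = deg + 1, i.e. the number of fitted coefficients.
    """
    order = deg + 1
    return n_points - order - 2.0


# Straight-line fit (deg=1) for a few dataset sizes
for n in (3, 4, 5, 10):
    print(n, legacy_scale_denominator(n, deg=1))
```

With this reading, the denominator is negative for 3 points, zero for 4 points, and only becomes positive from 5 points upwards.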

In my case, looking at the results here, I could recover the correct values just by multiplying the matrix by minus one. This is a strange weighting to apply to a small dataset - I assume it makes sense if you have many points and the developers wanted to keep numpy.polyfit consistent.
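An alternative to flipping the sign by hand is to compute the covariance directly from the weighted design matrix. This is not what I did above, just a minimal sketch of the standard weighted least squares result. The function name and the sample data are made up for illustration, and it assumes the weights are 1/sigma for each point (the convention numpy.polyfit documents for w) and that the conventional scaling uses n - 2 degrees of freedom for a straight line.

```python
import numpy as np


def weighted_linefit_cov(x, y, w):
    """Fit y = m*x + c by weighted least squares.

    Returns (coeffs, cov_unscaled, cov_scaled).
    Assumption: w holds 1/sigma for each point.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)

    # Design matrix with columns [x, 1], each row multiplied by its weight
    A = np.vander(x, 2) * w[:, None]
    b = y * w

    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)

    # Unscaled covariance: (A^T A)^-1
    cov_unscaled = np.linalg.inv(A.T @ A)

    # Conventional scaling by chi^2 / dof, with dof = n - 2 for a line
    chi2 = np.sum((A @ coeffs - b) ** 2)
    dof = len(x) - 2
    cov_scaled = cov_unscaled * chi2 / dof
    return coeffs, cov_unscaled, cov_scaled


# Hypothetical 3-point dataset with equal weights
x = [0.0, 1.0, 2.0]
y = [1.1, 2.9, 5.2]
w = [1.0, 1.0, 1.0]

coeffs, cov_unscaled, cov_scaled = weighted_linefit_cov(x, y, w)
print(coeffs)
print(np.diag(cov_unscaled))
print(np.diag(cov_scaled))
```

Both matrices have positive diagonal terms for any dataset with at least 3 distinct x values that is not fitted exactly. With exactly 3 points there is only one degree of freedom, so the scaled errors should still be treated with caution.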

2 comments:

  1. I have to say this. How f****** ridiculous is that. Not only there is no explanation of this in the scipy webpage, but from what you are writing, estimating errors is borderline miracle. Just write a function which will use chi^2 for fitting. At least we will have errors. What's the use of a value with no error???!!!

  2. This is so fun! What a great idea. Also I love how authentic you seem to be.
    1337x

