More TFractionFitter March 30, 2008Posted by gordonwatts in physics, statistics.
A very technical post on the performance of TFractionFitter.
Looking at the original Barlow and Beeston paper we found that the approximation breaks down in the case of weighted events. If you want to have some fun, change the initial normalization of the templates that you pass to TFF.
As I mentioned, we used this fit method in some of our current work. For my Friday night entertainment I decided to see if I could test it out a bit.
Constructing a few test cases, and then running them 1000′s of times (ensemble tests) isn’t all that hard (well, see below). Recall TFractionFitter takes several Monte Carlo templates and tries to find components of them in a data histogram. I constructed the templates out of two Gaussians for this test. The data histogram I built was 30% of one of the two Gaussians, and 70% of the other.
In a simple test where I create the data histogram once, with 10,000 entries, and then re-create the template histograms over and over (about 150 times) and perform the fit each time, I found that the fitter got the fractions correct – it correctly identified the 30% and the 70%. Further, as is mentioned in the paper, if I looked at how much it got the fraction incorrect each time, it does indeed look like the errors are underestimated (the exact amount depends on how I arrange the two Gaussians).
To test the weighting as described by Mike, I altered each template’s normalization to be 50% of its initial value, but left the data histogram the same. I got the same result as before. I tried altering the number of entries in one or the other and still the same. So, thankfully, I can’t reproduce this (I was using 5.18 to do this testing).
One thing I discovered — my initial choice of Gaussians had the second Gaussian about 20% off the end of the histogram I was using. The result? TFractionFitter got the answer wrong consistently by about 5%. 5% isn’t that big a deal if you get your errors right — but it didn’t. It was almost always more than one sigma low or high (depending on which fraction you looked at) – a clear bias. The distributions we are using in our work look more like falling exponential – so I’ll have to test that out and see if that shows the same thing. This is another buyer-be-ware comment similar to Mike’s.
I used this weekend project as an excuse to put my money where my mouth is… sort of (meaning I spent time, not money!). I’ve long talked about, on this blog and other places, that if we are to really take advantage of multi-core processors we are going to have to change the way we write code. So for this little project I thought was a perfect excuse to use a new programming language — a functional programming language. I used F#, which is based on ML (OCaml is the current standard implementation of ML). If anyone is interested, I can write some (also technical like this post) posts on what it was like. I used F# instead of OCaml, in part because it has access to all the ROOT libraries due to some other work I’ve done.