jump to navigation

More TFractionFitter March 30, 2008

Posted by gordonwatts in physics, statistics.
trackback

A very technical post on the performance of TFractionFitter.

My post the other day about TFractionFitter being written up in a arXiv paper generated a few commentsone in particular that got me a little worried:

Looking at the original Barlow and Beeston paper we found that the approximation breaks down in the case of weighted events. If you want to have some fun, change the initial normalization of the templates that you pass to TFF.

As I mentioned, we used this fit method in some of our current work. For my Friday night entertainment I decided to see if I could test it out a bit.

Constructing a few test cases, and then running them 1000′s of times (ensemble tests) isn’t all that hard (well, see below). Recall TFractionFitter takes several Monte Carlo templates and tries to find components of them in a data histogram. I constructed the templates out of two Gaussians for this test. The data histogram I built was 30% of one of the two Gaussians, and 70% of the other.

In a simple test where I create the data histogram once, with 10,000 entries, and then re-create the template histograms over and over (about 150 times) and perform the fit each time, I found that the fitter got the fractions correct – it correctly identified the 30% and the 70%. Further, as is mentioned in the paper, if I looked at how much it got the fraction incorrect each time, it does indeed look like the errors are underestimated (the exact amount depends on how I arrange the two Gaussians).

To test the weighting as described by Mike, I altered each template’s normalization to be 50% of its initial value, but left the data histogram the same. I got the same result as before. I tried altering the number of entries in one or the other and still the same. So, thankfully, I can’t reproduce this (I was using 5.18 to do this testing).

One thing I discovered — my initial choice of Gaussians had the second Gaussian about 20% off the end of the histogram I was using. The result? TFractionFitter got the answer wrong consistently by about 5%. 5% isn’t that big a deal if you get your errors right — but it didn’t. It was almost always more than one sigma low or high (depending on which fraction you looked at) – a clear bias. The distributions we are using in our work look more like falling exponential – so I’ll have to test that out and see if that shows the same thing. This is another buyer-be-ware comment similar to Mike’s.

I used this weekend project as an excuse to put my money where my mouth is… sort of (meaning I spent time, not money!). I’ve long talked about, on this blog and other places, that if we are to really take advantage of multi-core processors we are going to have to change the way we write code. So for this little project I thought was a perfect excuse to use a new programming language — a functional programming language. I used F#, which is based on ML (OCaml is the current standard implementation of ML). If anyone is interested, I can write some (also technical like this post) posts on what it was like. I used F# instead of OCaml, in part because it has access to all the ROOT libraries due to some other work I’ve done.

About these ads

Comments»

1. superweak - March 30, 2008

Well, now you’ve teased us, you have to write a post on F# :-)

2. asub - March 31, 2008

I don’t understand the existence of other languages when there is the great old FORTRAN. Certainly, for us in the condensed matter it has proved more than adequate. I never understood why particle physicists use C++ instead of FORTRAN.

3. Gordon Watts - March 31, 2008

True, true, superweak.

asub — it is becasue FORTRAN fell over under its own weight. In particular, common blocks! In the end, everything is compiled down to assembly language — so it is clear there is a way to do it in any langage – FORTRAN, C++, etc. But C++ makes it simpler for all of us to work together. And objects do have advantages. Most CM experiments are single PI or a small group – FORTRAN is pretty good in that setting. But when you get to 100′s of active programmers on one project FORTRAN isn’t nearly as good. You have to build a framework, and you find yourself building things like indirect arrays (arrays that point to entries in other arrays) – things that are baked into C++ from the get-go.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 44 other followers

%d bloggers like this: