## A Paper on Fitting

*March 27, 2008*

*Posted by gordonwatts in physics.*


An arXiv paper went by the other day that I found interesting for several reasons: “A pitfall in the use of extended likelihood for fitting fractions of pure samples in mixed samples”.

The paper discusses a technique of fitting templates of various sources to data in order to extract constituent fractions. For example, let's say you have a series of particle jets. The jets come from light quarks (up, down, strange), charm quarks, or bottom quarks. You want to know the fraction of each of those types that makes up your series of jets. You are clever: you have a variable that behaves differently for each of the quark types. So now you can make a template for each jet type in this variable, add them together, vary the fractional size of each template until the sum fits the variable in your data, and “boom” you have the fractional makeup of your series of jets.
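To make the idea concrete, here is a minimal sketch of such a template fit in plain Python. It is only a toy: it uses two jet types instead of three, made-up template shapes, and a brute-force scan of a binned Poisson likelihood. The function name `fit_fraction` and all the numbers are my own assumptions for illustration, not anything taken from the paper or from ROOT.

```python
import math

def fit_fraction(data, template_a, template_b, steps=1000):
    """Scan for the fraction f of template_a that best describes data.

    Both templates are normalized to unit area. The prediction in bin i is
    n_total * (f * a_i + (1 - f) * b_i), and the best f minimizes the binned
    Poisson negative log-likelihood (constant terms dropped).
    """
    n_total = sum(data)
    norm_a = [x / sum(template_a) for x in template_a]
    norm_b = [x / sum(template_b) for x in template_b]

    def nll(f):
        total = 0.0
        for d, a, b in zip(data, norm_a, norm_b):
            mu = n_total * (f * a + (1.0 - f) * b)
            total += mu - d * math.log(mu)
        return total

    # Brute-force scan over f in [0, 1]; a real fit would use a minimizer.
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=nll)

# Toy shapes (made up): "light" peaks at low values, "bottom" at high values.
light = [50, 30, 15, 5]
bottom = [5, 15, 30, 50]

# Pseudo-data built as exactly 70% light + 30% bottom, 1000 jets in total.
data = [1000 * (0.7 * l / 100 + 0.3 * b / 100) for l, b in zip(light, bottom)]

best = fit_fraction(data, light, bottom)
print(best)
```

Because the pseudo-data has no statistical fluctuations, the scan should land on the fraction that was put in. The sketch only finds the central value; it says nothing about the errors, which is where the trouble discussed below starts.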

There are a number of tools around that will help you do this so you don’t have to write them yourself. In particular, in ROOT, there is TFractionFitter. This paper calls out an error in TFractionFitter by name. This is the first reason I find this article interesting. This is appearing in a physics paper archive and it specifically targets a tool. It isn’t about physics in particular, but the improper application of a statistical technique. I’m fairly sure (though not 100% positive) that no one has written a paper about TFractionFitter’s operation. Sometimes you’ll see papers written to point out errors in other papers, and then rebuttals. But this is a paper to point out an error in C++ code – ROOT. And, ha, it doesn’t even mention a version number for ROOT – which may be a problem if the error is corrected!

The error is subtle. TFractionFitter will get the fit correct, so the raw numbers that come out will be correct. The problem is when it tries to calculate the errors. Apparently the method it uses to calculate the errors is not valid unless one evaluates things at the central value, which ruins the point, because the whole idea of an error is to understand how much things change as you move away from the central value.

Which brings me to the second reason I am interested in this article. A group I am part of in ATLAS is using TFractionFitter! Oops!

Use the extended ML fitter in RooFit:

http://roofit.sourceforge.net/docs/tutorial/intro/show/slide33.html

I don’t think that has the bug.

Interesting — I didn’t realize RooFit did those fits as well. It does the extended likelihood too — so it could have the same bug — but you are right, in general RooFit does a better job of this sort of thing (since it is focused on doing this sort of thing).

Just a clarification about the main point of the paper.

The problem lies in the transformation from event numbers to event fractions. In the extended likelihood approach, the most natural variables are the numbers of events of the different species. TFractionFitter, instead, chooses to formulate the problem in terms of fractions, using this (subtly wrong) definition:

estimated number of events of species i = measured total number of events × estimated fraction of events of species i

A consequence of this is that, if you use the results of TFractionFitter to compute numbers of events, by multiplying the fitted fractions and their errors by the measured total number of events, you get the right results! I know all this is very confusing, but this is exactly what motivated me to publish my notes.
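A rough way to see why an error on a number of events and an error on a fraction are different things is the simplest possible case, where each event can be assigned to its species without ambiguity. This is only a counting illustration under that assumption, not the derivation in the paper; the function names and the numbers are made up.

```python
import math

def count_error(n_i):
    """Poisson error on the number of events of one species."""
    return math.sqrt(n_i)

def fraction_error(n_i, n_total):
    """Binomial error on the fraction n_i / n_total at fixed n_total."""
    f = n_i / n_total
    return math.sqrt(f * (1.0 - f) / n_total)

n_total, n_i = 1000, 300

# Error on the count, simply divided by the measured total.
scaled_count_error = count_error(n_i) / n_total

# Error on the fraction itself, which does not include the Poisson
# fluctuation of the total number of events.
true_fraction_error = fraction_error(n_i, n_total)

print(scaled_count_error, true_fraction_error)
```

In this toy case the count error divided by the total is larger than the binomial error on the fraction, so a quantity defined as "count divided by measured total" carries a different error than the fraction of that species in the underlying population.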

I read the paper with great interest: Jocelyn Monroe and I were puzzling over the errors returned by TFractionFitter last December. Looking at the original Barlow and Beeston paper, we found that the approximation breaks down in the case of weighted events. If you want to have some fun, change the initial normalization of the templates that you pass to TFF. You’ll see that not only do the errors change, but the central values returned from the fit move around in a scary way! Long story short: buyer beware!

The good thing about TFractionFitter is that it deals with the statistical fluctuations of the templates, i.e. the case when they are produced by a Monte Carlo. The templates can move, within their errors, to improve the fit. So I am not surprised (and I don’t think it is wrong, if that is what you mean) that the central values depend on the normalization of the templates.


nello – Actually, TFF only handles MC statistics if it is given extra information, isn’t that right? We often fill histograms with weighted events and so one can’t always tell what the statistics really are by looking at the weights. At any rate, I thought I saw in its interface something that made use of the MC statistics.

I’m trying to teach myself roofit right now.

gordonwatts – I am not really an expert on TFractionFitter. I only got interested in this problem because a student using HMCMLL was having a hard time understanding the errors, and I tried to look into that. So take my answer for what it is worth.

As far as I understand from Barlow’s paper [Comp.Phys.Comm 77(1993)219] and from the ROOT reference manual, they do not really handle weights in the sense you mean. For proper statistical treatment the input should be event numbers. They do allow you to provide some sort of weights in the following sense: if you know that the histograms generated by the Monte Carlo are not correct, but they can be patched by multiplying the average bin content by a constant factor which is only a function of the bin and of the event type, you can handle that as a factor in the “theoretical prediction”. That’s what they allow you to do, and that is probably OK from the statistical point of view.

The trouble is that in real life weights are not a function only of the variable that is being fitted, but also of other event variables, so each bin gets contributions from events with different weights. That means that weights cannot be attached to the theoretical prediction, because they also contribute additional fluctuations. A rule of thumb is that, when you have weights, the statistical error is no longer the square root of the bin content, but rather the square root of the sum of the squares of the weights. This is not handled by TFF. Errors would very likely be wrong, unless the spread in weights in a given bin is small.
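The rule of thumb is easy to check on a single bin. The sketch below uses made-up weights and a hypothetical helper `weighted_bin`; it compares a bin filled with identical weights to a bin with the same content where one event carries most of the weight.

```python
import math

def weighted_bin(weights):
    """Return (content, error) for one histogram bin filled with weights.

    content = sum of weights
    error   = sqrt(sum of squared weights)
    """
    content = sum(weights)
    error = math.sqrt(sum(w * w for w in weights))
    return content, error

# 100 events, all with weight 1: the error reduces to sqrt(content).
uniform = [1.0] * 100
content_u, error_u = weighted_bin(uniform)

# Same bin content, but one event has a huge weight (made-up numbers).
spiky = [0.5] * 99 + [50.5]
content_s, error_s = weighted_bin(spiky)

print(content_u, error_u)
print(content_s, error_s)
```

Both bins have the same content, yet the second one has a much larger statistical error than the square root of its content would suggest. That is the spread in weights that a fixed per-bin weight cannot describe.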

Hi Nello, thanks. You know more about its internal workings than I do! Your analysis of the weight vs. number of events problem is correct. I’ve seen it in other places in particle physics as well. It is especially a problem when one event has a huge weight and the others have very tiny ones (in one bin). Thanks for following up!

Come to think of it, there is a possible fudge. TFF takes, for each MC sample, two pieces of information per bin (event number and weight). One might take advantage of this to create fake data that reproduce both the expected value and the variance.

Say in a given bin of the MC histogram you have n events, the average weight is w and the sum of the squares of the weights is s. The theoretical prediction for that bin in the data is the sum over the MC samples of

n w p

(where p is a weight related to the fraction of the particular sample in the data). The variance of this prediction is, in reality,

s p^2

Instead TFF assumes that w is a fixed number and will think that the variance is

n w^2 p^2 .

So you want to invent fake values for event numbers (n’) and weights (w’ ) with the following properties

n’w’ = n w

n’ w’^2 = s

giving the solution w’ = s/(n w) ; n’ = (n w)^2 / s

which can be computed on the basis of the weighted histogram (nw) and of a second histogram weighted with the square of the weights (s) .

This might work properly as long as the variance is all you need to specify the distribution, in other words only if the number of events in each bin is large enough that the Poisson distribution can be approximated by a Gaussian.
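For anyone who wants to try it, the fudge can be sketched for a single bin as follows. The helper name `effective_bin` and the example weights are made up, and the sketch only computes n’ and w’; how those numbers would actually be handed to TFF is not shown here.

```python
def effective_bin(weights):
    """Return (n_eff, w_eff) for one bin of a weighted MC histogram.

    With sum_w = n * w (the weighted bin content) and sum_w2 = s (the sum
    of squared weights), the fake values satisfy
        n_eff * w_eff      = n * w
        n_eff * w_eff ** 2 = s
    """
    sum_w = sum(weights)                    # n w
    sum_w2 = sum(w * w for w in weights)    # s
    w_eff = sum_w2 / sum_w                  # w' = s / (n w)
    n_eff = sum_w ** 2 / sum_w2             # n' = (n w)^2 / s
    return n_eff, w_eff

# Made-up weights for the events falling into one bin.
weights = [0.5, 1.5, 2.0, 1.0]
n_eff, w_eff = effective_bin(weights)

print(n_eff, w_eff)
print(n_eff * w_eff, n_eff * w_eff ** 2)
```

Note that n’ is generally not an integer, and the bin must have a nonzero weighted content for the division to make sense.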

that is clever. I’ll have to try that out on my little test sample.