
Reinventing the wheel September 10, 2011

Posted by gordonwatts in Analysis, computers, LINQToTTree, ROOT.

Last October (2010) my term running the ATLAS flavor-tagging group came to an end. It was time to get back to being a plot-making member of ATLAS. I don’t know how most people feel when they run a large group like this, but I start to feel separated from actually doing physics. You know a lot more about the physics, and your input affects a lot of people, but you are actually doing very little yourself.

But I had a problem. By the time I stepped down, in order to even show a plot in ATLAS you had to apply multiple corrections: the z distribution of the vertex was incorrect, the transverse momentum spectrum of the jets in the Monte Carlo didn’t match, etc. Each of these corrections had to first be derived, and then applied, before anyone would believe your plot.

To make your one really great plot, then, let’s look at what you have to do:

  1. Run over the data to get the distributions of each thing you will be reweighting (jet pT, vertex z position, etc.).
  2. Run over the Monte Carlo samples to get the same thing
  3. Calculate the reweighting factors
  4. Apply the reweighting factors
  5. Make the plot you’d like to make.
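
Sketching the recipe in code may make it clearer. Here is a minimal, ROOT-free illustration in plain C++ of steps 1 through 4: fill a histogram of some variable in data and in Monte Carlo, divide bin by bin to get the reweighting factors, then look up the weight for each MC event. All the types and names here are made up for this sketch; the real jobs use ROOT histograms and run on the GRID or PROOF.

```cpp
#include <cassert>
#include <map>
#include <vector>

// A toy "histogram": bin index -> count. Stands in for a real TH1.
using Hist = std::map<int, double>;

// Steps 1 and 2: histogram the variable in a sample.
Hist fill(const std::vector<int>& bins) {
    Hist h;
    for (int b : bins) h[b] += 1.0;
    return h;
}

// Step 3: reweighting factor = data / MC, bin by bin (often literally
// one histogram divided by another).
Hist reweightFactors(const Hist& data, const Hist& mc) {
    Hist w;
    for (const auto& kv : mc) {
        auto d = data.find(kv.first);
        w[kv.first] = (kv.second > 0 && d != data.end())
                          ? d->second / kv.second
                          : 0.0;
    }
    return w;
}

// Step 4: the weight applied to a single MC event that lands in `bin`.
double eventWeight(const Hist& w, int bin) {
    auto it = w.find(bin);
    return it == w.end() ? 0.0 : it->second;
}
```

Step 5 is then just filling the plot of your variable of interest with eventWeight as the per-event weight.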

If you are lucky then the various items you need to reweight are not correlated – so you can just run the one job on the Data and the one job on the Monte Carlo in steps one and two. Otherwise you’ll have to run multiple times. These jobs are either batch jobs that run on the GRID, or a local ROOT job you run on PROOF or something similar. The results of these jobs are typically small ROOT files.

In step three you have to author a small script that extracts the results from the jobs in steps 1 and 2 and creates the reweighting function. This is often no more difficult than dividing one histogram by another. One can do this at the start of the plotting job (the job you create for steps 4 and 5) or do it at the command line and save the result in another ROOT file that serves as one of the inputs to the next step.

Steps 4 and 5 can normally be combined into one job. Take the result of step 3, apply it as a weight to each event, and then plot whatever your variable of interest is using that weight. Save the result to another ROOT file and you are done!!

Whew!

I don’t know about you, but this looked scary to me. I had several big issues with it. First, the LHC has been running gang-busters. This means having to constantly re-run all these steps. I’d better not be doing that by hand, especially as things get more complex, because I’m going to forget a step, or accidentally reuse an old result. Next, I was going back to teaching a pretty difficult course – which means I was going to be distracted. So whatever I did was going to have to survive me not looking at it for a week and then coming back to it… with me still able to understand what I did! Most of all, the way I normally approach something like the above was going to lead to a mess of scripts and programs floating around.

It took me three tries to come up with something that seems to work. It has some difficulties, and isn’t perfect in a number of respects, but it feels a lot better than what I’ve had to do in the past. Next post I’ll talk about my two failed attempts (it will be a week, but I promise it will be there!). After that I’ll discuss my 2011 Christmas project, which led to what I’m using this year.

I’m curious – what do others do to solve this? Mess of scripts and programs? Some sort of work flow? Makefiles?? What?? What I’ve outlined above doesn’t seem scalable!

Source Code In ATLAS June 11, 2011

Posted by gordonwatts in ATLAS, computers.

I got asked in a comment what, really, was the size in lines of the source code that ATLAS uses. I have an imperfect answer. About 7 million total. This excludes comments in the code and blank lines in the code.

The breakdown is a bit under 4 million lines of C++ and almost 1.5 million lines of Python – the two major programming languages used by ATLAS. Additionally, in those same C++ source files there are about another million blank lines and almost a million lines of comments. The Python files contain similar fractions.

There are 7 lines of LISP, which were probably an accidental check-in. Once the build runs, the number of lines of source code balloons by almost a factor of 10 – but that is all generated code (and HTML documentation, actually) – so it shouldn’t count in the official numbers.
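For the curious, the counting convention used above (skip blank lines and comment lines) can be sketched like this. This is purely an illustration, not the tool actually run on the ATLAS tree, and it only recognizes `//` comments, not `/* */` blocks:

```cpp
#include <cassert>
#include <string>
#include <vector>

// A line "counts" as source only if it is neither blank nor a pure
// line comment.
bool isCode(const std::string& line) {
    size_t i = line.find_first_not_of(" \t");
    if (i == std::string::npos) return false;         // blank line
    if (line.compare(i, 2, "//") == 0) return false;  // comment-only line
    return true;
}

// Count the "official" source lines in a file's worth of lines.
size_t countSLOC(const std::vector<std::string>& lines) {
    size_t n = 0;
    for (const auto& l : lines)
        if (isCode(l)) ++n;
    return n;
}
```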

This is imperfect because these are just the files that are built for the reconstruction program. This is the main program that takes the raw detector signals and converts them into high-level objects (electrons, muons, jets, etc.). There is another large body of code – the physics analysis code. That is the code that takes those high-level objects and converts them into actual interesting measurements – like a cross section, or a top quark mass, or a limit on your favorite SUSY model. That is not always in a source code repository, and is almost impossible to get an accounting of – but I would guess that it is about another factor of 10 in size, based on experience in previous experiments.

So, umm… wow. That is big. But it isn’t quite as big as I thought! I mentioned in the last post talking about source control that I was worried about the size of the source and checking it out. However, Linux is apparently about 13.5 million lines of code, and uses one of these modern source control systems. So, I guess these things are up to the job…

Can’t It Be Easy? June 8, 2011

Posted by gordonwatts in ROOT.

Friday night. A truly spectacular day in Seattle. I had to take half of it off and was stuck outdoors hanging out with Julia. Paula is on a plane to Finland. I’ve got a beer by my side. A youtube video of a fire in a fireplace. Hey. I’m up for anything.

So, let’s tackle a ROOT problem.

ROOT is weird. It has made it very easy to do very simple things. For example, want to draw a previously made histogram? Just double click and you’re done. Want to see what the data in one of your TTrees looks like? Just double click on the leaf and it pops up! But the second you want to do something harder… well, it is much harder. I’d say it is as hard to do something intermediate in ROOT as it is to do something advanced.

Plotting is an example.

Stacking the Plots

I have four plots, and I want to plot them on top of each other so I can compare them. If I do exactly what I learned how to do when I learned to plot one thing, I end up with the following:

[image: the four histograms drawn on top of each other with the default settings]

Note that the lines are all black, thin, and on top of each other. No legend. And that “stats” box in the upper right contains data relevant only to the first plot. The title strip is also only for the first plot. Grey background. Lousy font. It should probably have error bars, but that is for a later time.

h1->Draw();
h2->Draw("SAME");
h3->Draw("SAME");
h4->Draw("SAME");


So, everyone has to make plots like this. It should be “easy” to make them look good! I suspect a simple solution here would make 90% of the folks who use ROOT very happy!

So, someone must have thought of this, right? Turns out… yes. It is called THStack. Its interface is dirt simple:

THStack *s = new THStack();
s->Add(h1);
s->Add(h2);
s->Add(h3);
s->Add(h4);
s->Draw("nostack");

and we end up with the following:

[image: the same four histograms drawn with THStack]

THStack actually took care of a lot of stuff behind our backs. It matched up the axes, made sure the max and min of the plot were correct, removed the stats box, and killed off the title. So this is a big win for us! Thanks to the ROOT team. But we are not done. I don’t know about you, but I can’t tell what is what on there!

Color

There are two options for telling the plots apart: color the lines or make them different patterns (dots, dashes, etc.). I am, fortunately, not color blind, and tend to choose color as my primary differentiator. ROOT defines a number of nice colors for you in the EColor enumeration… but you can’t really use it out of the box. Charitably, I would say the colors were designed to look good on the printed page – some of them are a disaster on a CRT, LCD, or beamer.

First, under no circumstances, under no situation, never. EVER. use the color kYellow. It is almost like using White on a White background. Just never do it. If you want a yellowish color, use kOrange as the color. At least, it looks yellow to me.

Second, try to avoid the default kGreen color. It is a fluorescent green. On a white or grey background it tends to bleed into the surrounding colors or backgrounds. Instead, use a dark green color.

Do not use both kPink and kRed on the same plot – they are too close together. kCyan suffers the same problem as kGreen, so don’t use it. kSpring (yes, that is the name) is another color that is too bright a green to be useful – stay away if you can.

After playing around a bit I settled on these colors for my automatic color assignment: kBlack, kBlue, TColor::GetColorDark(kGreen), kRed, kViolet, kOrange, kMagenta. The TColor class has some nice palettes (right there in the docs, even). But one thing it doesn’t have, that it really should, is a display of what the constituents of EColor look like. Those are the colors you are most likely to use.

Colors are tricky things. The thickness of the line can make a big difference, for example. The default 1 pixel line width isn’t enough in my opinion to really show off these colors (more on fixing that below).


After applying the colors I end up with a plot that looks like the following:

[image: the overlaid histograms with the new colors applied]

A Legend and Title

So the plot is starting to look ok… at least, I can tell the difference between the various things. But darned if I can tell what each one is! We need a legend. Now, ROOT comes with the TLegend object. So, we could do all the work of cycling through the histograms and putting up the proper titles, etc. However, it turns out there is a very nice short-cut provided by the ROOT folks: TPad::BuildLegend. So, just using the code:

c1->BuildLegend();

where c1 is the pointer to the current TCanvas (the one most often used when you are running from the command line). See below for its effect. The automatic legend has some problems – mainly that it doesn’t automatically detect the best placement (left, right, up, or down) when drawing for a stack of histograms. One can think of a simple algorithm that would get this right most of the time. But that is for another day.

Next, I’d like to have a decent title up there, similar to what was there previously. This is also easy – we just pass it in when we create the stack of histograms.

THStack *s = new THStack("histstack", "WeightSV0");


And we now have something that is at least scientifically serviceable:

[image: the stacked plot, now with a legend and title]

One thing to note here – there are no x-axis labels. If you add an x-axis title to your histograms, THStack doesn’t copy it over. I’d call that a bug, I suppose.

Background And Lines And Fonts

We are getting close to what I think the plot should look like out of the box. The final bit is basically pretty-printing. Note the very ugly white-on-grey around the lines in the Legend box. Or the font (it is pixelated, even when the plot is blown up). Or (to me, at least) the lines are too thin, etc. This plot wouldn’t even make it past first-base if you tried to submit it to a journal.

ROOT has a fairly nice system for dealing with this. All plots and other graphing functions tend to take their cues from a TStyle object. This defines the background, etc. The default set in ROOT is what you get above. HOWEVER… it looks like that is about to change with the new version of ROOT.

Now, a TStyle is funny. A style is applied when you draw a histogram… but it is also applied when the histogram is created. So to really get it right you have to have the proper style in place both when you create and when you draw the histogram. In short: I have an awful time with TStyle! I’m left with the choice of either setting everything in code when I do the drawing, or applying a TStyle everywhere. I’ve gone with the latter. Here is my rootlogon.C file, which contains the TStyle definition. But even this isn’t perfect. After a bunch of work I basically gave up, I’m afraid, and ended up with this (note the #@*@ title box still has that funny background):
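A stripped-down rootlogon.C along these lines might look like the following. This is only a sketch, not my actual file; every call is a standard TStyle setter, though SetLegendFillColor only exists in more recent ROOT versions:

```cpp
// rootlogon.C - a minimal style attacking the problems above.
{
    TStyle* myStyle = new TStyle("myStyle", "Plain, readable plots");
    myStyle->SetCanvasColor(kWhite);      // no grey background
    myStyle->SetFrameFillColor(kWhite);
    myStyle->SetTitleFillColor(kWhite);   // the funny title-box background
    myStyle->SetLegendFillColor(kWhite);  // newer ROOT versions only
    myStyle->SetOptStat(0);               // drop the stats box entirely
    myStyle->SetHistLineWidth(2);         // thicker lines show off colors
    myStyle->SetTextFont(42);             // scalable (non-pixelated) font
    myStyle->SetLabelFont(42, "xyz");
    gROOT->SetStyle("myStyle");
}
```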

[image: the final plot with the cleaned-up style applied]

Conclusion

So, if you’ve made it this far, I’m impressed. As you can tell, getting ROOT to draw nice plots is not trivial. This should work out of the box: using the “SAME” option as in the very first snippet, we should get behavior that looks a lot like this last plot.

Finally, a word on object ownership. ROOT is written in C++, which means it is very easy to delete an object that is being referenced by some other bit of the system. As a result, code has to carefully keep track of who owns what and when. For example, if I don’t write out the canvas that I’ve generated right away, sometimes my canvases come out blank. This is because something has deleted the objects out from under me (it was my program, obviously, but I have no idea what did it). Reference counting would have been the right way to go, but ROOT was started too long ago. Perhaps it is time for someone to start again?

The code I used to make the above appears below. My actual code does more (for example, it will take the legend entries and automatically shorten them to “lightJets”, “charmJets”, etc., instead of the full-blown titles you see there). It is, obviously, not in C++, but the algorithm should be clear!

        public static ROOTNET.Interface.NTCanvas PlotStacked(this ROOTNET.Interface.NTH1F[] histos, string canvasName, string canvasTitle,
            bool logy = false,
            bool normalize = false,
            bool colorize = true)
        {
            if (histos == null || histos.Length == 0)
                return null;

            var hToPlot = histos;

            ///
            /// If we have to normalize first, we need to normalize first!
            /// 

            if (normalize)
            {
                hToPlot = (from h in hToPlot
                           let clone = h.Clone() as ROOTNET.Interface.NTH1F
                           select clone.Normalize()).ToArray();
            }

            ///
            /// Reset the colors on these guys
            /// 

            if (colorize)
            {
                var cloop = new ColorLoop();
                foreach (var h in hToPlot)
                {
                    h.LineColor = cloop.NextColor();
                }
            }

            ///
            /// Use the nice ROOT utility THStack to make the plot
            /// 

            var stack = new ROOTNET.NTHStack(canvasName + "StacK", canvasTitle);
            foreach (var h in hToPlot)
            {
                stack.Add(h);
            }

            ///
            /// Now do the plotting. Use the THStack to get all the axis stuff correct.
            /// If we are plotting a log plot, then make sure to set that first before
            /// calling it as it will use that information during its painting.
            /// 

            var result = new ROOTNET.NTCanvas(canvasName, canvasTitle);
            result.FillColor = ROOTNET.NTStyle.gStyle.FrameFillColor; // This is not a sticky setting!
            if (logy)
                result.Logy = 1;
            stack.Draw("nostack");

            ///
            /// And a legend!
            /// 

            result.BuildLegend();

            ///
            /// Return the canvas so it can be saved to the file (or whatever).
            /// 

            return result;
        }

        /// <summary>
        /// Normalize this histo and return it.
        /// </summary>
        /// <param name="histo"></param>
        /// <returns></returns>
        public static ROOTNET.Interface.NTH1F Normalize(this ROOTNET.Interface.NTH1F histo, double toArea = 1.0)
        {
            histo.Scale(toArea / histo.Integral());
            return histo;
        }


Yes, We May Have Made a Mistake. June 3, 2011

Posted by gordonwatts in ATLAS, computers.

No, no. I’m not talking about this. A few months ago I wondered if, short of generating our own reality, ATLAS made a mistake. The discussion was over source control systems:

Subversion, Mercurial, and Git are all source code version control systems. When an experiment says we have 10 million lines of code – all that code is kept in one of these systems. The systems are fantastic – they can track exactly who made what modifications to any file under their control. It is how we keep anarchy from breaking out as >1000 people develop the source code that makes ATLAS (or any other large experiment) go.

Yes, another geeky post. Skip over it if you can’t stand this stuff.

ATLAS switched some time ago from a system called cvs to svn. The two systems are very much alike: centralized, top-down control. Old school. However, the internet happened. And, more to the point, the Cathedral and the Bazaar happened. New source control systems have sprung up – in particular, Mercurial and git. These systems are distributed. Rather than asking for permission to make modifications to the software, you just point your source control client at the main source and hit copy. Then you can start making modifications to your heart’s content. When you are done you let the owner of the repository know and tell them where your repository is – and they then copy your changes back! The key here is that you have your own copy of the repository – so you can make multiple modifications without asking the owner. Heck, you could even send your modifications to your friends for testing before asking the owner to copy them back.

That is why it is called distributed source control. Heck, you can even make modifications to the source at 30,000 feet (when no wifi is available).

When I wrote that first blog post I’d never tried anything but the old-school source controls. I’ve now spent the last 5 months using Mercurial – one of the new-style systems. And I’m sold. Frankly, I have no idea how you’d convert the 10 million+ lines of code in ATLAS to something like this, but if there is a sensible way to convert to git or Mercurial then I’m completely in favor. Just about everything is easier with these tools… I’d never done branch development in SVN, for example. But in Mercurial I use it all the time… because it just works. And I’m constantly flipping my development directory from one branch to another because it takes seconds – not minutes. And despite all of this I’ve only once had to deal with merge conflicts. If you look at SVN the wrong way it will give you merge conflicts.

All this said, I have no idea how git or Mercurial would scale. Clearly it isn’t reasonable to copy the repository for 10+ million lines of code onto your portable to develop one small package. But if we could figure that out, and if it integrated well into the ATLAS production builds, well, that would be fantastic.

If you are starting a small stand alone project and you can choose your source control system, I’d definitely recommend trying one of these two modern tools.

The Ethics and Public Relations Implications of asking for help April 25, 2011

Posted by gordonwatts in Large Collaborations, physics life.

I’ve been having a debate with a few friends of mine. I have definite opinions. First, I’ll lay out the questions. They span ethics and also potential PR backlash. These conversations, btw, are all with friends – no one important, so don’t read anything into this! This is long, and my answers are even longer, but I hope a few of you will read and post (yes, everyone is busy)!

Let’s take a purely hypothetical situation. A person has joined a large scientific collaboration like CDF, DZERO, ATLAS, or CMS. As part of joining they agree to abide by a set of rules – for example, not discussing an analysis publicly before it has been approved by the experiment.

I apologize in advance to those who are not part of this life, or who don’t care. This blog posting will be even less interesting than normal!

Here are the questions. I’m curious about the answers from both an ethics point of view and a political point of view. Or any other point of view you care to bring to bear. I’ve put my answers below. The setup below is hypothetical! And I have some personal issues with #7! #8 is the one I’ve gotten most push back on when talking with people.

  1. You are a member of said collaboration and you anonymously post all or part of an internal document to a blog.
  2. You are a member of said collaboration and you post such a document non-anonymously to a blog.
  3. The blog owner(s) are unaffiliated with any experiment. Are they obligated to take it down?
  4. The blog owner is affiliated with the experiment (e.g. say someone posted an internal DZERO or ATLAS abstract to my blog). Are they obligated to take it down?
  5. Is it ok for the experiment to ask the blogger to reveal the poster’s information? For example, the wordpress blogging platform, which I use, keeps internally a record, visible to me, of the poster’s IP address, which might be able to identify the poster. Is the answer any different if the blog owner is a member of the same experiment? How about a member of a competing/different experiment?
  6. Does the blog owner have to respond with the information to the experiment?
  7. What if the blog owner is a member of the same experiment? Do they have to respond then?
  8. Does the experiment have to ask the blog owner for help?

Ok. So, here are my answers. These aren’t completely thought out, so feel free to call me out if I’m not being consistent. And these are my opinions below, no matter how strongly I state them.

  1. This is clearly unethical. You are violating something that you agreed to in the first place, voluntarily. Further, by doing this anonymously you are basically trying to get away without being accountable – so you are taking no responsibility for your actions – which is also unethical. The PR result depends, obviously, on what is posted. If the topic is interesting enough to the mainstream, articles will end up on the mainstream news sites. If this damages the credibility of an actual result when it is released then real harm has been done. It is not likely that it will damage the credibility within the field, however.
  2. For me this is more murky. You clearly have violated the agreement that you signed initially. But you have also made it clear who you were when you posted it – so you are taking responsibility and accepting the consequences for your actions. In the first half you are not behaving ethically; in the second half you are. It seems the PR consequences are similar, except they will be much more personal, because the press will be able to get in touch with you. A large faceless experiment, like DZERO or ATLAS, will have a much harder time countering this (people make better stories!).
  3. Ethically, I don’t think you are obligated to take it down if you are not affiliated with any experiment. That was someone else’s agreement, and not one that you signed up for. I follow the thinking of various places that deal with whistleblowers. Now, the blog owner may have their own set of ethical guidelines for the blog, for example, “I will not traffic in rumors,” and then ethically they should not make an exception for a particular post. But that is strictly up to them – they could just as easily say that “this blog traffics in rumors!” The PR aspect of this really depends, if the blog is up front about what it is, then the PR won’t reflect on it as much as it will reflect on the rumor. If the blog does something that violates its own guidelines – like normally it ignores rumors except in this particular one because it is a big one – then part of the PR will be focused back on them. This is a wash, in my opinion.
  4. If the blog was owned by a member of the same experiment then I do think they would be obligated to take it down. The blog owner, upon joining the experiment, agreed not to reveal secrets, and the blog is an extension of the person who made the agreement. From a PR perspective, this would put the blog owner in a fairly difficult position! First, most small-time blogs like mine allow comments without waiting for approval, so the post could be up for several hours before it gets taken down. Any of the RSS comment aggregators would easily have time to grab it before it disappeared. So, it would be out there for anyone with a bit of skill even if it had already been taken down. So the PR would, basically, be the same as the other case. But, if any press came to call, the blog owner would have to say “No Comment.” Ha!
  5. So, it is fine for the experiment to ask the blog owner for any identifiable information about the poster. They are not violating any of their ethics. The PR response, however, can vary dramatically. After the experiment asks, the blogger could respond “Yes” or “No”, and then everyone moves on. But the blogger could also post a copy of the request and say something like “This 3000-person scientific organization is putting pressure on me to reveal my sources. This is a clear suppression of free speech, etc. etc.” What happens next is anybody’s guess and really depends on the blogger’s reputation, their popularity, who picks it up and runs with it, etc. So, anything from forgotten to a PR nightmare for the experiment. For a blogger who wants to prove that they will keep their rumor sources confidential – and thus get more rumors – this could be a big plus. Add this to the likelihood that there is no identifiable information, and I conclude it isn’t worth it. Now, if the blogger is a member of the experiment, or the blogger is well known to individuals on the experiment, a small conversation can happen over the phone or in person to see if the blogger might be willing to help out.
  6. First, if the blogger is not a member of the experiment. In this case, I do not think there is any ethical reason for the blogger to respond. By the same token, I do not think the experiment can get bent-out-of-shape if the blogger declines to help. I don’t think there is any real PR aspect to this question (other than what was above). Something to keep in mind: depending on the severity of the leak, you may be ending or seriously affecting someone’s career (judge/jury/etc.) by giving up that technical information – which could be spoofed.
  7. Now, if the blogger was on the same experiment, then things get more tricky. Ethically, you agreed to keep your experiment’s secrets, but you didn’t agree to tattle on a fellow collaboration member. I feel like I’m on thin ice here, so any comments yes or no to this would be helpful – especially because I could see myself in this position! While that may be the case, the experiment could bring a huge amount of peer pressure to bear on the blog author if they are a member. This effect should not be underestimated.
  8. This may seem like an odd question. Think of it from this point of view. An internal document has just been leaked. You are one of 3000 people working hard on this experiment. Something that you’ve had no input into, and perhaps seriously disagree with, has been put out on the web. You are still bound by the agreement with the collaboration, so you can’t counter why you think it is bad. You have to stand by, frustrated, as this document is discussed by everyone except the people it should be discussed by. Worse, what if the person who did the posting gets away with it!? There are no consequences to what they did? Worse still, what if the collaboration changes the way it does internal reviews and physics in order to keep things more secret, even from its own members, to lessen the chances of another leak? Now the person doing the leak has seriously impacted your ability to work, and nothing has happened to them. So, should the collaboration do all it can to track this leaker down? Whew. Yes. But what if tracking this person down causes more damage (like the free speech PR nightmare I mentioned above)? I have a lot of trouble answering this question. In isolation the answer is clearly yes. However, when the various possible outcomes are considered, it feels to me like it isn’t worth it.

One final thing. As far as I can see, no actual laws have been broken by any of the proposed actions. That is, you couldn’t sue in a court of law over any of them. There is no publicly recognized contract, for example. Do people agree with that? Any key questions I missed that should be in the above list?

Scientific Integrity April 22, 2011

Posted by gordonwatts in physics, physics life, politics, press, science.
17 comments

… means not telling only half the result

… means not mis-crediting a result

… means an obligation to society to not falsify results

… means not making false claims to gain exposure

… means respecting your fellow scientist and their results

… means not talking about things that aren’t public (or, say, that haven’t undergone an internal review)

… means playing by the rules you agreed to when you enter into a collaboration

It means being a scientist!

Integrity is more important than ever, given how much the public eye is focused on us in particle physics.

Update: I should mention that this post was authored with Alison Lister.

Global Entry–Just Get It April 20, 2011

Posted by gordonwatts in travel.
add a comment

A month or two ago I was traveling back from Geneva with a friend of mine, Kaori. We were on a flight that was late – about an hour late. We landed at IAD and really had to race to make our connections (we had less than an hour). We hurried to immigration and I got in line. Looking around, I couldn’t find her… then, looking over to the side, I saw her at some kiosk. In about a minute she was racing through to baggage pickup. Me… I hung out in the line for about 5 minutes.

She was using the Global Entry program. Having signed up and used it for my most recent flight… I’m a fan. It is fairly cheap – $100 for 5 years. You do have to give your fingerprints and picture to the US government – as far as I know that is the first set of fingerprints any government agency has on record for me – so that was a little weird. As an example, on my last flight into IAD the plane doors were opened at 4:10 pm. At 4:22 pm I was in the X-Ray line. That included more than 5 minutes of walking, since our plane was waaaay down the terminal. You use a kiosk instead of a person in the immigration area. I’d say it took about the same amount of time as dealing with an officer who decided not to ask any questions, assuming no lines – about 90 seconds or so. Extra bonus: no filling out those @*#&@ blue customs forms (there is an abbreviated version on the kiosk). And when you go through customs, there is a separate line that lets you cut to the front (at least at IAD). You just hand them a bit of paper that the immigration kiosk printed out and you are done.

I can imagine a number of circumstances where this isn’t worth it. If you always travel with kids under 14 you can’t use it (well, the kids can’t), if you always check baggage the time saved will be a small fraction of your total, and I think only about 20 airports support it (your major international ports of entry). Oh, and if you like watching people while standing in lines to relax after being cooped up on that long flight… then this isn’t for you either.

My flight into IAD earlier this week was over an hour late, and I had less than an hour to connect. A student of mine and I were both on the plane and both on the connecting flight to Seattle. Neither of us had checked bags. The Seattle flight left from gate D29 at IAD (which means a long walk). I did a brisk walk and made it before boarding started. He had to sprint some of the way and made it after everyone had already boarded – but he still made it. BTW – I was also able to skip to the front of the X-Ray line, which can be killer at IAD, because I’d been upgraded on that last leg. That probably saved me an additional 10 minutes or so on this trip.

So… I’d recommend getting this if you fly internationally with any frequency. It definitely made that part of my trip quicker and, thus, more enjoyable!

As a side note… WHY don’t they design airports so that if you don’t have to pick up your luggage you don’t have to go through security again?

Cherry Blossoms April 7, 2011

Posted by gordonwatts in life, University of Washington.
1 comment so far

It happens once a year, of course: cherry blossom season. You can find it all over – Japan is famous for it. But back at the University of Washington we have our own little grove of Yoshino cherry trees on the Quad. For the two weeks or so that they bloom, the place becomes a bit of a tourist destination – it is packed with people, some just sitting and reading, but most walking around and snapping pictures. I went a little crazy this year. If you love this stuff, you can find it all over the web. Here are links to some of the stuff I’ve taken:

  • Pictures from a cloudy day on flickr.
  • A large panorama view. This is probably the easiest one to get an understanding of what the square looks like.
  • A giant 451 photo 3D reconstruction (a photosynth). I’m really looking forward to the technology (recently previewed) where you can walk around with a video camera and that is enough to build one of these!
  • A desktop theme pack for Windows 7. If you like having your background image change every 30 minutes to a different view of cherry trees, well, this is for you!

Enough till next year!

Jumping the Gun April 4, 2011

Posted by gordonwatts in Uncategorized.
16 comments

The internet has come to physics. Well, I guess CERN invented the web, but when it comes to science, our field usually moves at a reasonable pace – not too fast, but not (I hope) too slow. That is changing, however, and I fear some of the reactions in the field.

The first I heard about this phenomenon was some results presented by the PAMELA experiment. The results were very interesting – perhaps indicating dark matter. The scientists showed a plot at a conference, but deliberately didn’t put it in any public web page or paper, to signal that they weren’t done analyzing the results or understanding their systematic errors. A few days later a paper showed up on arXiv (which I cannot locate) using a picture taken during the conference while the plot was being shown. Of course, the obvious lesson here is: don’t talk about results before they are ready. I and most other people in the field looked at that and thought those guys were getting a crash course in how to release results. The rule is: you don’t show anything until you are ready. You keep it hidden. You don’t talk about it. You don’t even acknowledge the existence of an analysis until you are releasing results you are ready for the world to get its hands on and play with as it may.

I’m sure something like that has happened since, but I’ve not really noticed it. But a paper out on the archive on April 1 (yes) seems to have done it again. This is a paper on a set of Z’ models that might explain a number of the small discrepancies at the Tevatron. Most of the results they reference are released and endorsed by the collaborations. But one source isn’t – it is a thesis: Measurement of WW+WZ Production Cross Section and Study of the Dijet Mass Spectrum in the l-nu + Jets Final State at CDF (really big download). So here are a group of theorists, basically, announcing a CDF result to the world. That makes me a bit uncomfortable. What is worse, however, is how they reference it:

In particular, the CDF collaboration has very recently reported the observation of a 3.3σ excess in their distribution of events with a leptonically decaying W± and a pair of jets [12].

I’ve not seen any paper released by the CDF collaboration yet – so the statement above is definitely not true. I’ve heard rumors that the result will soon be released, but they are rumors. And I have no idea what the actual plot will look like once it has gone through the full CDF review process. And neither do the theorists.

Large experiments like CDF, D0, ATLAS, CMS, etc. all have strict rules on what you are allowed to show. If I’m working on a new result and it hasn’t been approved, I am not allowed to show my work even to others in my department, except under a very constrained set of circumstances*. The point is to prevent this sort of paper from happening. But a thesis, which was the source here, is a different matter. All universities that I know of demand that a thesis be public (as they should). And frequently a thesis will show work that is still in progress from the experiment’s point of view – so theses are a great way to see what is going on inside an experiment. And now, with search engines, one can do exactly that with relative ease.

There is plenty of potential for over-reaction here.

On the experiment’s side, they may want to put restrictions on what can be written in a thesis. That would be punishing the student for someone else’s actions, which we can’t allow.

On the other hand, there has to be a code of standards followed by people writing papers based on experimental results. If you can’t find a plot on the experiment’s public results pages, you can’t claim that the collaboration backs it. People scouring theses for results (and you can bet there will be more of that now) should get a better understanding of the quality level of those results: sometimes they are exactly the plots that will show up in a paper; other times they are an early version of the result.

Personally, I’d be quite happy for results found in theses to stimulate conversation and models – and those could be published or submitted to the archive – but one would then hold off making experimental comparisons until the results were made public by the collaboration.

The internet is here – and this information is now available much more quickly than before. There is much less hiding-through-obscurity than there has been in the past, so we all have to adjust.

* Exceptions are made for things like job interviews, students presenting at national conventions, etc.

Update: CDF has released the paper.

Digitize the world of books March 26, 2011

Posted by gordonwatts in Books, physics life.
4 comments

Those of you watching will have noticed that a judge threw a spanner in Google’s plans to digitize the world’s book collection:

The company’s plan to digitize every book ever published and make them widely available was derailed on Tuesday when a federal judge in New York rejected a sweeping $125 million legal settlement the company had worked out with groups representing authors and publishers.

I am a huge fan of the basic idea: every book online, digital, and accessible from your computer. I’m already almost living that life professionally: all the journal articles I use are online. The physics preprint archive, arxiv.org, started this model and as a result has spawned new types of conversation – papers that are never submitted to journals. Pretty much the only time I walk over to the library is to look at some textbook up there. The idea of doing the same thing for all books – well, I’m a huge fan.

However, I do not like the idea of one company being the gateway to something like that. Most of the world’s knowledge is written down in one form or another – it should not be locked away behind some wall that is controlled by one company.

I’d rather see a model where we expect that, in the long term, all books and copyrighted materials will eventually enter the public domain. At that point they should be easily accessible online. When you think of the problem this way, there seems to be an obvious answer: the Library of Congress.

Copyrighted books are a tougher nut to crack. The publishers and authors presumably will still want to make money from them. And making out-of-print books available will offer some income (though not much – there is usually a reason those books are out of print). In this case the Google plan isn’t too bad – but having watched journals price-gouge because they can, I’m very leery of seeing that happen again here. I’d rather see an independent entity set up to act as a clearing house. Perhaps it wouldn’t be consumer-facing – rather, it would sell access and charge for books to various companies that then make the material available to us end users. This model is similar to what is done in the music business: I purchase (or rent) my music through Zune – I don’t deal directly with any of the record labels. The only problem is that this model has no competition to keep prices down (i.e. nothing stops the one entity from price gouging).

Lastly, I think having all this data available will open up a number of opportunities we can’t even think of now. But we need to make sure the data is also available in a raw form so that people can innovate.

Print books are dying. Some forms will take longer than others – I would expect the coffee-table picture book to take longer to convert to all-digital than a paperback novel. But I’m pretty confident the switch is well underway. What we do with all the print books is a crucial question. I do think we should be spending money on moving these books into the digital age. Not only are they the sum of our knowledge, they are also a record of our society.
