
16,000 Physics Plots January 12, 2011

Posted by gordonwatts in ATLAS, CDF, CMS, computers, D0, DeepTalk, physics life, Pivot Physics Plots.

Google has 20% time. I have Christmas break. If you work at Google, you are supposed to spend 20% of your time on your own little side project rather than on the work you are nominally supposed to be doing. Lots of little projects get started this way (I think Gmail, for example, started this way).

Each Christmas break I tend to hack on some project that interests me but is often not directly related to what I’m working on. Usually by the end of the break the project is useful enough that I can start to get something out of it. I then steadily improve it over the next months as I figure out what I really wanted. Sometimes a project never gets used again after that initial hacking time (you know: fail often, and fail early). My DeepTalk project came out of this, as did my ROOT.NET libraries. I’m not sure others have gotten a lot of use out of these projects, but I certainly have. The one I tackled this year has turned out to be a total disaster. Interesting, but still a disaster. This post is about the project I started a year ago. This was a fun one. Check this out:


Each of those little rectangles represents a plot released last year as a preliminary result by DZERO, CDF, ATLAS, or CMS (the Tevatron and LHC general-purpose collider experiments). That huge spike is July, with 3600 plots: that is everyone preparing for the ICHEP conference. In all, the four experiments put out about 6000 preliminary plots last year.

I don’t know about you, but there is no way I can keep up with what the four experiments are doing, let alone the two I’m a member of! That is an awful lot of web pages to check, especially since the experiments, though modern, aren’t modern enough to be using something like an Atom/RSS feed! So my hack project was to write a massive web scraper and a Silverlight front-end to display the results. The front-end is based on the Pivot project, originally from Microsoft Research (MSR), which means you can really dig into the data.

For example, I can explode December by clicking on “December”:


That brings up the two halves of December. Clicking the same way on the second half of December, I see:


From that it looks like four notes were released, so we can organize the plots by the note they were released in:


Note the two funny icons: they let you switch between a grid layout and a histogram layout of the plots. After making that selection, we see that there were actually six notes:



The note on the left is titled “Z+Jets Inclusive Cross Section”, something I want to see more of, so I can select it to see all the plots for that note at once:


And say I want to look at one plot – I just click on it (or use my mouse scroll wheel) and I see:


I can actually zoom way into the plot if I wish using my mouse scroll wheel (or typical touch-screen gestures, or the usual zoom gesture on the Mac). Note the info bar that shows up on the right-hand side. It includes information about the plot (a caption, for example) as well as a link to the web page it was pulled from. You can click on that link (see caveat below!) to bring up the web page. There is even a link to the PDF note if the web scraper could find one.
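For the curious, the drill-down above (December, then its second half, then grouping by note) boils down to filtering and counting on facet values. Here is a minimal Python sketch of that idea; it is not the Pivot control’s actual code, and the `Plot` record, its fields, and the sample entries are all made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Plot:
    # Hypothetical per-plot metadata; the real scraper's fields are not shown in the post.
    experiment: str
    note: str
    released: date


def in_month(plots, year, month):
    """Keep only the plots released in the given month ("explode December")."""
    return [p for p in plots if (p.released.year, p.released.month) == (year, month)]


def month_half(plot):
    """Bucket a plot into the first (days 1-15) or second half of its month."""
    return "first half" if plot.released.day <= 15 else "second half"


def facet_counts(plots, key):
    """Count plots per facet value; this is what a histogram layout displays."""
    return Counter(key(p) for p in plots)


# Made-up sample data, for illustration only.
plots = [
    Plot("ExpA", "note-1", date(2010, 12, 20)),
    Plot("ExpA", "note-1", date(2010, 12, 21)),
    Plot("ExpB", "note-2", date(2010, 12, 28)),
    Plot("ExpB", "note-3", date(2010, 12, 3)),
    Plot("ExpC", "note-4", date(2010, 11, 30)),
]

december = in_month(plots, 2010, 12)
late_december = [p for p in december if month_half(p) == "second half"]
by_note = facet_counts(late_december, lambda p: p.note)
```

Each click in the viewer amounts to one more filter, or to a different `key` function for the counting step.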

Along the left hand side you’ll see a vertical bar (which I’ve rotated for display purposes here):


You can click on any of the years to get the plots from that year. “Recent” gives you the last 4 months of plots. By default, this is where the viewer starts up; it seems like a nice compromise between speed and breadth when you want to quickly check what has happened recently. The “FS” button (yeah, I’m not a user-interface guy) is short for “Full Screen”. I definitely recommend viewing this on a large monitor! “BK” and “FW” are like the back and forward buttons on your browser and let you undo a selection. The info bar on the left lets you do some of this too, if you want.
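The “BK”/“FW” behaviour described above is the classic two-stack browser history. Here is a minimal Python sketch of that pattern; the class name and the choice to store each selection as a tuple of facet labels are illustrative assumptions, not the viewer’s real code.

```python
class SelectionHistory:
    """Browser-style back/forward history over viewer selections (illustrative sketch)."""

    def __init__(self, initial):
        self.current = initial
        self._back = []     # selections we can return to with "BK"
        self._forward = []  # selections we can redo with "FW"

    def select(self, selection):
        # A brand-new selection makes the old "forward" path meaningless, just like a browser.
        self._back.append(self.current)
        self.current = selection
        self._forward.clear()

    def back(self):
        if self._back:
            self._forward.append(self.current)
            self.current = self._back.pop()
        return self.current

    def forward(self):
        if self._forward:
            self._back.append(self.current)
            self.current = self._forward.pop()
        return self.current


history = SelectionHistory(("Recent",))
history.select(("Recent", "December"))
history.select(("Recent", "December", "second half"))
```

Calling `history.back()` twice returns to `("Recent",)`; a fresh `select` after that throws away whatever “FW” could have redone.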

Want to play? Go to http://deeptalk.phys.washington.edu/ColliderPlots/… but first read the following. And feel free to leave suggestions! Let me know what you think about the idea behind this (and perhaps a better way to do it).

  • Currently works only on Windows and the Mac. Linux will happen when Moonlight supports v4.0 of Silverlight. On Windows and the Mac you will need the Silverlight plug-in installed (if you are on Windows you almost certainly already have it).
  • This thing needs a good network connection and a good CPU/GPU. There is some heavy graphics lifting that goes on (wait till you see the graphics animations – very cool). I can run it on my netbook, but it isn’t that great. And loading when my DSL line is not doing well can take upwards of a minute (when loading from a decent connection it takes about 10 seconds for the first load).
  • You can’t open a link to a physics note or webpage unless you install this so it is running locally. This is a security feature (cross site scripting). The install is lightweight – just right click and select install (control-click on the Mac, if I remember correctly). And I’ve signed it with a certificate, so it won’t get messed up behind your back.
  • The data is only as good as its source. Free-form web pages are a mess. I’ve done my best without investing an inordinate amount of time on the project. Keep that in mind when you find some data that makes no sense. Heck, this is open source, so feel free to contribute! Updating happens about once a day. If an experiment removes a plot from their web pages, then it will disappear from here as well at the next update.
  • Only public web pages are scanned!!
  • The biggest hole is the lack of published papers/plots. This is intentional, because I would like to get them from arXiv. The problem is that my scraper isn’t intelligent enough when it hits a website: it grabs everything it needs all at once (don’t worry, the second time through it asks only for headers to see if anything has changed). As a result, it is bound to set off arXiv’s robot detection. And the thought of parsing TeX files for captions is just… not appealing. But this is the most obvious big hole, and I would like to fix it at some point soon.
  • This depends on public web pages. That means if an experiment changes its web pages or where they are located, all the plots will disappear from the display! I do my best to fix this as soon as I notice it. Fortunately, these are public facing web pages so this doesn’t happen very often!
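Asking “only for headers” the second time through is a standard HTTP trick. Below is a minimal Python sketch of one way to do it: a conditional GET using the `ETag`/`Last-Modified` validators, plus a fixed pause between requests so the scraper is gentler on sites like arXiv. This is an illustration, not the scraper’s actual code; the real one may well send a HEAD request instead, and the function names, the cache layout, and the one-second delay are all assumptions.

```python
import time
import urllib.error
import urllib.request


def conditional_headers(cached):
    """Turn the validators saved from the last fetch into conditional-request headers."""
    headers = {}
    if cached.get("etag"):
        headers["If-None-Match"] = cached["etag"]
    if cached.get("last_modified"):
        headers["If-Modified-Since"] = cached["last_modified"]
    return headers


def fetch_if_changed(url, cache, delay_s=1.0, opener=urllib.request.urlopen):
    """Return the page body if it changed since the last fetch, or None if it did not.

    `cache` maps url -> {"etag": ..., "last_modified": ...} and is updated in place.
    `opener` is injectable so the logic can be exercised without touching the network.
    """
    time.sleep(delay_s)  # crude politeness: never hammer a site with back-to-back requests
    request = urllib.request.Request(url, headers=conditional_headers(cache.get(url, {})))
    try:
        with opener(request) as response:
            body = response.read()
            cache[url] = {
                "etag": response.headers.get("ETag"),
                "last_modified": response.headers.get("Last-Modified"),
            }
            return body
    except urllib.error.HTTPError as err:
        if err.code == 304:  # 304 Not Modified: keep the copy we already have
            return None
        raise
```

With the default `urlopen`, a 304 reply surfaces as an `HTTPError`, which is why the “nothing changed” case is handled in the `except` branch.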

Ok, now for some fun. Who has the most broken links on their public pages? CDF, by a long shot. Who has the most machine-readable pages? CMS and DZERO. But although their pages are machine readable, their images have no captions (which makes searching the image database for words less useful than it should be). ATLAS is a happy medium: their preliminary results are in a nice automatically produced grid that includes captions.


Maps! Maps! Maps! November 27, 2009

Posted by gordonwatts in computers, DeepTalk, Maps.

I have become a big fan of the DeepZoom technology, as anyone who has been reading these posts for a while knows. I’m also a big fan of maps, especially old ones. I’ve never been brave enough to purchase any on eBay or anything like that, but I’d love to eventually own a few and hang them on my wall.


In the meantime I make use of the fantastic resources of the web. UW recently put up a small collection of old maps, from the 16th to the 19th century. Some of them are stunning. I definitely recommend spending some quality time exploring them.

The default interface that is presented to you, however, is a bit of a pain. For each map, scroll down to the “detailed view” entry below the map picture and click on that. They used Zoomify to encode the images in a nice zoom-able interface.

Sweet. I wish they had done one or two things a little differently:

  • Higher resolution images so you can zoom in even further
  • Put all the maps on a single page, with perhaps some information (and a search tool) on the right hand side. Check out Hard Rock’s example.
  • A full-screen mode (right now you can’t make it full screen). 🙂

I wish more people would do this for collections of images like these maps. It makes navigating them a lot of fun, and it is still possible to display the metadata.

Zoomify September 22, 2009

Posted by gordonwatts in computers, DeepTalk.

A bit of a technical post.

One of the biggest criticisms I get about DeepTalk (besides the fact that you can’t navigate using the arrow keys) is that it requires Microsoft’s Silverlight. There are two other options I’m aware of. But first, to understand the problem I’m working with, check out this simple conference that I’ve DeepTalk’ed. Use the mouse wheel to zoom in/out and see how the display works.

For this discussion it is important to keep in mind the steps that a conference goes through on its way to becoming a DeepTalk:

  1. All the slides are sucked down from the internet, turned into jpgs, and then programmatically laid out.
  2. A rendering program reads the layout and all the images in and slices and dices the images into layers. These slices are stored on a web server with a decent internet connection.
  3. Code is downloaded to the browser that reads the layout and the slices and renders them just like any mapping website with zoom capabilities does.
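Step 2 is the same image-pyramid idea that Deep Zoom and the online mapping sites use: keep halving the image until it is about a single pixel, and cut every level into fixed-size tiles so the viewer only downloads the tiles actually on screen. A minimal Python sketch of the bookkeeping follows; the 256-pixel tile size is an assumption, and real Deep Zoom output also adds a small tile overlap that this sketch leaves out.

```python
import math


def pyramid_levels(width, height, tile_size=256):
    """Return (level, width, height, tiles_x, tiles_y) for each pyramid level.

    Level 0 is (about) 1x1 pixel; the top level is the full-resolution image.
    Each level is half the size of the one above it, rounded up.
    """
    max_level = math.ceil(math.log2(max(width, height)))
    levels = []
    for level in range(max_level + 1):
        scale = 2 ** (max_level - level)
        w = math.ceil(width / scale)
        h = math.ceil(height / scale)
        levels.append((level, w, h, math.ceil(w / tile_size), math.ceil(h / tile_size)))
    return levels
```

For a single 1000×800 slide this gives 11 levels (0 to 10) and 29 tiles in total: 16 at full resolution, 4 at half size, and one each for the nine smaller levels.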

First, raw JavaScript. This is an ideal solution. Every browser already has it installed, and most modern browsers run it pretty efficiently. Indeed, all the mapping programs I use, like Live Maps and Google Maps, use this solution for terabytes of data. So why not me!? Well, the first requirement is that I’m not willing to rewrite the code, so I have to find it on the web. I did find one (are there others?), from Microsoft, and it can replace the Silverlight code. OK! Then I’m all set, right? Well, no. The code isn’t as capable as I need. For example, it can render only a single image at a time. For DeepTalk a single image is roughly equivalent to a single talk. I could render the whole conference as a single “image”; however, I do not have the memory on any machine I own to do that.
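To get a feel for why a whole conference will not fit in memory as one image, here is a back-of-the-envelope estimate. The conference size (200 talks of 20 slides each), the 1024×768 slide resolution, and 4 bytes per pixel (uncompressed RGBA) are all assumptions chosen for illustration, not numbers from an actual DeepTalk render:

```latex
\begin{align*}
N_{\text{slides}} &= 200 \times 20 = 4000 \\
\text{memory} &\approx N_{\text{slides}} \times (1024 \times 768)\ \text{px} \times 4\ \tfrac{\text{bytes}}{\text{px}} \\
&= 4000 \times 786{,}432 \times 4 \approx 1.26 \times 10^{10}\ \text{bytes} \approx 12.6\ \text{GB}
\end{align*}
```

That is for the full-resolution layer alone; the smaller pyramid levels add roughly another third on top (1 + 1/4 + 1/16 + … = 4/3).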

Second is a commercial Adobe Flash library called Zoomify. Check out their web page; it is very cool. It does exactly what I need. It requires Flash, which pretty much everyone has (even if they have to update: please do it, old software == hacker target!!!). Further, unlike Silverlight, Flash works on Linux, so this would be a big plus. Unfortunately, there are two problems. First, to automate the rendering you need the Enterprise version ($800 US, more than was spent on the server that is currently serving the DeepTalk content). Second, the project is tightly integrated with Adobe Flash. That is all fine for people who are used to Flash, but the rest of us would need to learn a new programming language (ActionScript).

And finally there was the Silverlight version. This had the zooming built-in and the tools, including a rendering library I could link against, were all free. Further, the programming model for Silverlight is any .NET language – which includes C#, which looks a lot like C/C++ – something I can immediately start writing code in without having to buy a reference book.

So. That is why I’m using Silverlight for this project, and why, for the moment at least, it still remains the best choice for me for this project.

Now, as for the most popular criticism I’ve gotten about the project: I now have a version working on my desktop that lets you use the arrow keys to move around. Sadly, bugs still crash it on about 1 in 3 conferences, which means it isn’t good enough to go on the web backend. You all will have to wait a little while longer: classes start next week, so a lot of my summer spare time is going to disappear!! Happy end of the summer!

Timeshifting A Conference: Can we all agree? Please? August 21, 2009

Posted by gordonwatts in computers, Conference, DeepTalk, Video.

A video feed or recording of a big physics conference is a mixed blessing.


If there were a video recording of a huge conference like DPF, it would be hundreds of hours long. Many of the parallel sessions describe work that is constantly being updated, so it isn’t clear how long a posted video would stay relevant. I’ve seen conferences post video of just the plenary sessions and skip the parallel sessions, I imagine for this very reason.

I definitely appreciate it when one of the big conferences does furnish video or streaming. But I have a major problem: time shifting. Even if I’m awake during the conference, it is rare that I can devote real time to watching it. Or, if there is a special talk, I have to try to arrange my schedule around it. But, come on folks, we’ve solved this problem, right? TiVo!?!? Or, for us old folks, it is called a VCR!!!

Which brings me to the second issue with conference video: formats. For whatever reason the particle physics world has mostly stuck with RealMedia of one form or another. Ugh. I was badly burned back in the day by the extra crap that RealMedia installed on my machine, so I’m gun shy now. But the format is also hard to manipulate. I tried a recent version of their player (maybe about 6 months ago) and it has a nice recording feature, exactly what I need here. But I couldn’t figure out how to convert its stored format to mp4, or to anything else I could download to my mp3 player! There are some open source implementations out there, but I’ve never encountered one that is good enough to reliably parse these streams.

This year’s Lepton-Photon is trying something new. They are streaming in RealMedia, but they also have an mp4 stream. And the free VLC player can play it. Even better, the free VLC player can record it! And convert it! Hooray!!! I can now download and convert these talks and listen to or watch them on my commute to work and back, which is perfect for me (the picture above is a screen capture of the stream in VLC). The picture isn’t totally rosy, however. VLC seems to lose the stream every now and then, so when I’m recording I have to watch the player like a hawk and restart it. Sometimes it will go two hours between drops, and other times just 10 minutes. It would be nice if it would auto-restart.
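The auto-restart wish is easy to sketch outside the player. Below is a minimal Python watchdog that relaunches a recorder whenever it exits with an error, writing each attempt to a new numbered segment so a restart never overwrites earlier footage. It assumes the recorder process actually exits with a non-zero status when the stream drops (whether VLC does that depends on how it is launched, which I have not checked), and the recorder command itself is a hypothetical placeholder supplied by the caller.

```python
import subprocess
import time


def record_with_restarts(make_cmd, max_restarts=20, pause_s=5.0, run=subprocess.run):
    """Relaunch a recorder until it exits cleanly or max_restarts is used up.

    make_cmd(segment) returns the argv list for one recording attempt; giving each
    attempt its own segment number keeps earlier output from being overwritten.
    Returns the segment numbers that were started.
    """
    started = []
    for segment in range(max_restarts + 1):
        started.append(segment)
        result = run(make_cmd(segment))
        if result.returncode == 0:
            break  # clean exit: assume the stream genuinely ended
        time.sleep(pause_s)  # dropped stream: wait a moment, then start a new segment
    return started


# Hypothetical usage; "my-recorder" and STREAM_URL stand in for a real command line:
# record_with_restarts(lambda seg: ["my-recorder", STREAM_URL, f"lepton-photon-{seg:02d}.mp4"])
```

The segments would still need to be stitched together afterwards, but nobody has to babysit the player.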

Which brings me to the last problem. Discoverability. I really like the way my DeepTalk project puts up a conference as a series of talks. But the only reason it works is because the conference is backed by a standard agenda/conference tool, Indico. My DeepTalk tools can interface with that, grab the agenda in a known format, and render it. We have no such standard for video.

Wouldn’t it be great if everyone did it the same way? You could point your iTunes/Zune/RealMedia/Whatever tool at a conference; it would figure out the times the conference ran and schedule a recording of the streams, or, if the video was attached, download the data… You’d come back after the conference was over, click “put conference on my mp3 player”, jump on that long plane flight to Europe, and drift off to sleep to the dulcet sounds of someone describing the latest update to the W mass and how it has moved the most probable Higgs mass a few GeV lower.
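The scheduling half of that wish is mostly interval bookkeeping. Here is a minimal Python sketch that turns an agenda’s session times into recording windows, padding each session a little and merging sessions that overlap or run back to back. The agenda is modelled as a plain list of (start, end) pairs, which is an assumption for illustration; it is not Indico’s actual export format, and the five-minute padding is arbitrary.

```python
from datetime import datetime, timedelta


def recording_windows(sessions, padding=timedelta(minutes=5)):
    """Merge (start, end) session times into padded, non-overlapping recording windows."""
    windows = []
    for start, end in sorted(sessions):
        start, end = start - padding, end + padding
        if windows and start <= windows[-1][1]:
            # Overlaps or touches the previous window: extend it rather than start a new one.
            windows[-1] = (windows[-1][0], max(windows[-1][1], end))
        else:
            windows.append((start, end))
    return windows


day = datetime(2009, 8, 20)  # arbitrary example day
agenda = [
    (day.replace(hour=14), day.replace(hour=15)),
    (day.replace(hour=9), day.replace(hour=10, minute=30)),
    (day.replace(hour=10, minute=30), day.replace(hour=12)),
]
windows = recording_windows(agenda)
```

Here the two morning sessions collapse into a single 8:55 to 12:05 window, and the afternoon session gets its own 13:55 to 15:05 window.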

Would that be bliss, or what!?

DPF & Lepton-Photon August 20, 2009

Posted by gordonwatts in Conference, DeepTalk, physics.

It is conference season! Whee!

A few weeks ago the main American particle physics conference, DPF, took place. This is a big conference with lots of plenary and parallel sessions:


At the time I was a short distance away from Detroit, in Ann Arbor, being a Dad. It was a bummer not to be able to attend. I made sure the conference was rendered on my DeepTalk site (the picture above is a screen grab). I spent a few lunches recently browsing it; there are some excellent talks, and I definitely recommend checking it out!

This week it is the big Lepton-Photon conference here in Europe. They are simul-casting it as well, so I’m doing my best to watch and record bits of it (more on that in another post). I see someone already submitted that to DeepTalk, so it is partly rendered already. Unfortunately, DeepTalk can’t yet tell that the conference is still ongoing, so it doesn’t automatically update itself. I’ll make sure that happens over the weekend.

DeepTalk on your Desktop May 4, 2009

Posted by gordonwatts in DeepTalk.

After getting back from CERN on Friday I spent a few hours on Saturday night fixing a few bugs (you know… relaxing!). The result is a new version of the DeepTalk desktop application. This version has numerous fixes since the version that was released to the web and at CHEP. Frankly, it is the version that should have been released at CHEP! Among other things, it does the layout for a very large conference correctly, and it also knows what to do with pptx files (Office 2007 PowerPoint).
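Telling a .pptx apart from an older .ppt does not have to rely on the file extension. A minimal Python sketch of one common approach, sniffing the first bytes of the file, is below; it is an illustration rather than what the DeepTalk application actually does. It relies on two well-known signatures: Office 2007 formats like .pptx are ZIP archives, and legacy .ppt files are OLE2 compound documents. The PDF check is an extra assumption about what else might show up in an agenda.

```python
ZIP_MAGIC = b"PK\x03\x04"                        # .pptx (Office Open XML) files are ZIP archives
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")   # legacy .ppt files are OLE2 compound documents
PDF_MAGIC = b"%PDF"


def slide_format(path):
    """Guess a slide file's format from its first bytes rather than its extension."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(ZIP_MAGIC):
        # Any ZIP passes this check (a .docx would too); a stricter test would look inside.
        return "pptx"
    if head.startswith(OLE2_MAGIC):
        return "ppt"
    if head.startswith(PDF_MAGIC):
        return "pdf"
    return "unknown"
```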

Eventually I want to be able to handle password-protected web sites this way (ATLAS talks, for example). At least, that is the reason I’ve also got a non-web version of this.

The website remains unchanged for now. I want to make a few very minor updates to it, then roll in the big changes above and re-image all the conferences. There should be a noticeable improvement when I get to that. As always, this is a hobby, so there’s no telling how long it will take!

DeepTalk a Conference April 17, 2009

Posted by gordonwatts in DeepTalk.

Ever wanted to view all the slides from one conference at once, on a really big (or small) screen? And zoom in on just the talks that looked interesting? Well, now you can: DeepTalk. 🙂

This was one of my hobby projects. Most of the work was done in the evenings while I was on sabbatical. Since I’ve returned to UW, development has slowed way down, but I managed to finish up a web site version of this for the CHEP 2009 conference (and presented a poster on it).

The idea is pretty simple. Download all the slides from a conference in an Indico web site, lay them out on a very large gym floor. Then zoom the camera way way out. That is the initial view when you are looking at a conference (for example, the Chamonix workshop discussing the future of the LHC). You can then zoom in using your mouse scroll wheel (or just doing a single click with your left mouse button), pan around by click-and-dragging, etc. If you have a conference you want rendered, there is a small text box at the bottom of the page – just put in an indico web site URL for the conference main agenda page and it will get queued for rendering (or take you to the correct web page if it has already been rendered).
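The “gym floor” layout can be sketched in a few lines. The version below is an illustrative guess at one reasonable scheme, not DeepTalk’s actual layout code: each talk becomes its own block of slides, `per_row` slides wide, and the blocks are stacked top to bottom with a larger gap between talks. The slide size, gaps, and row width are arbitrary assumptions in abstract floor units.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    talk: int   # index of the talk in the agenda
    slide: int  # slide number within the talk
    x: float    # left edge on the "gym floor"
    y: float    # top edge on the "gym floor"


def layout_conference(slides_per_talk, slide_w=4.0, slide_h=3.0,
                      per_row=10, gap=0.5, talk_gap=2.0):
    """Place every slide of every talk; each talk gets its own block of rows."""
    placements = []
    y0 = 0.0
    for talk, n_slides in enumerate(slides_per_talk):
        for i in range(n_slides):
            row, col = divmod(i, per_row)
            placements.append(
                Placement(talk, i, col * (slide_w + gap), y0 + row * (slide_h + gap)))
        rows = max(1, -(-n_slides // per_row))  # ceiling division; an empty talk still takes a row
        y0 += rows * (slide_h + gap) + talk_gap
    return placements
```

Zooming the camera “way way out” then just means choosing a view that contains the bounding box of all the placements.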

It is based on Silverlight (which runs on Windows and the Mac; Linux support is coming when Moonlight makes its 2.0 release). There are some known bugs, but if you see other problems, or additions you think would be cool, definitely send a comment! I’ve been having a lot of fun using it to browse conferences I’ve missed (which is most of them, obviously).