
Data mining November 7, 2009

Posted by gordonwatts in Health, computers.
1 comment so far

In particle physics this is what we do. We have petabyte-scale datasets (a petabyte is 1000 terabytes!) consisting of billions of physics interactions. For the particularly rare ones we need to pick out several hundred or several thousand and study them in detail. As you might expect, we are drowning in data and have developed many tools to help us. Computers are central – without them we would not be able to do the science we currently do!

The most common public example of data mining I’ve heard about is looking at all the receipts from Walmart purchases. This is why grocery stores like you to sign up for their frequent-use cards – they can track everything you buy, sell that data, and, more importantly, send you ads that are likely to get you in and get you to buy other things. It is an amazingly powerful tool. In business it has been getting a bit of a bad name recently because it has been connected to some fairly serious invasion-of-privacy issues (i.e. creepy things – like knowing what hour of the day you check your email, how much the average person in your zip code makes, etc.).

But one place that it could obviously be applied for the greater good that I’d never really given much thought to is medicine. Check out this long article from the NYT on the topic – Making Health Care Better. It starts with some history – and the bromide “The amount of death and disease would be less if all disease were left to itself.” from 1835… to the present day:

“Medicine adopted the scientific method,” James said… “It transformed medicine, and it’s easy to make the case.”

He talks about the testing and science applied to any new method, drug, procedure before it is allowed to be used by mainstream doctors. But…

But there is one important way in which medicine never quite adopted the scientific method… …once a treatment enters the mainstream — once we know whether it works in certain situations — science is largely left behind. The next questions — when to use it and on which patients — become matters of judgment, not measurement.

The article describes a dizzying array of treatments available to a doctor who is trying to treat heart disease. And which treatment to use is left up to the doctor’s judgment.

Clearly some hospitals and doctors have better average outcomes than others – so some doctors must have better judgment than others. Wouldn’t it be great if every doctor could start with a default procedure that has been shown to work for a patient that looks like the one the doctor is trying to treat and then modify it to fit the patient’s specifics?

Well…

“I thought there wasn’t anybody better in the world at twiddling the knobs than I was,” Jim Orme, a critical-care doctor, told me later, “so I was skeptical that any protocol generated by a group of people could do better.”

And that is just it – there are so many variations in treatments. Today’s instruments are quite complex and have many settings – so how do you know what works correctly? When the procedure is approved there is a fair amount of science recorded for each setting, and presumably most doctors follow it. But not all of them!

And this is where I think data mining could come in. What if every single modern instrument was hooked into the network, and each adjustment was recorded? And linked to a patient’s medical file (so you could see history). Each time a nurse or doctor did something it was recorded. All of that in some standard format – and then shared across hospitals and doctors the country or world over?

This has, of course, been a dream for a while of electronic health care records. It always struck me as obvious that you would attach x-rays, CT scans, descriptions of medicine given, etc., but it never occurred to me the level of detail you could go into! From a technical point of view this is hard – the data is so non-uniform, unlike the particle physics experiments I work on, but the long term benefits could be quite good. The article describes when this data mining technique was applied ad-hoc in just a single hospital:

One widely circulated national study overseen by doctors at Massachusetts General Hospital had found an ARDS [Acute respiratory distress syndrome] survival rate of about 10 percent. For those in Intermountain’s study, the rate was 40 percent.

At any rate, this tickled my fancy, which is why I wrote about it. I found it ironic that on the Health home page yesterday there was also the following article:

Five years later, Medicare underwrites more than half of the $4 billion the nation now spends annually on defibrillators, but the agency is no closer to knowing how many lives that big investment is saving.

My impression of the health care bills working their way through Congress right now is that none of them really go after cost savings. Science can help*.

* Ok – making devices that can spit out data in a common format will add to their cost. But you can do simple things like the Intermountain study to start as better devices come online!

Hybrid cars… Hybrid accelerators November 3, 2009

Posted by gordonwatts in History, accelerator, physics.
4 comments

By now I think most people know how the Prius and other hybrid cars operate. Most cars’ brakes are just like a bicycle brake: a clamp that generates a large amount of friction and slows the car down. This is a terrible waste of energy: the car’s motion is converted into heat and damage (to the brake pads) and can never be reclaimed. Think of it as wasted gas, excess pollution, etc.

Hybrids are much more clever. They attach an electric motor/generator directly to the wheel, and when you want to brake they use the wheel’s motion to run the generator. This requires work – which slows down the car. Instead of the energy being lost, however, it is poured into a battery. The energy can then be reused to get the car started again. Huge savings in gas! This is also why hybrids tend to be amazing at city driving, but not long distance driving (where this doesn’t help much because you aren’t stopping and starting).

Before we got sophisticated with generators and batteries we did something much more mechanical. At least for public transportation: the gyrobus:

[Image: Gyrobus G3]

Instead of a battery, however, a giant flywheel was used to store the energy. These things were built back in the 1950’s.

Guess what… the same technology has been used for particle accelerators – specifically the Bevatron!

[Image: Bevatron]

[Image: the flywheels]

These are 65 ton flywheels, and there are two of them. Here is an abstract from a paper that describes the control system that ran these puppies:

The Bevatron/Bevalac main guide field power supply stores 680 MJ in the flywheel-shaft systems of two independent motor-generator sets. During the normal acceleration cycle of various heavy-ion beams, the energies of the rotating shafts are converted to energy stored in the main magnet guide field. At the end of the acceleration cycle, the magnet energy is inverted back to the shafts. Generally, this takes place from 10 to 15 times per minute. The rapid switching of ions, energy, and beam lines at the Bevalac has required various control techniques for fast switching between all operational Bevalac fields within 1 min. The power supply control systems and operating parameters are described.

The principle is the same as with the hybrid car, or the gyrobus, but all the sizes and power are extreme (as usual for the field of particle physics). Imagine spinning those flywheels down and back up once every 4 to 6 seconds (that is what 10 to 15 times per minute works out to)! Of course, that system would never have fit in a car!
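
To put rough numbers on this, here is a small Python sketch. Only the 680 MJ and the 10 to 15 cycles per minute come from the abstract above; the car mass and speed are assumptions picked purely for illustration, and the comparison ignores all losses.

```python
# Rough arithmetic on the Bevatron flywheel numbers quoted in the abstract.
STORED_ENERGY_J = 680e6          # 680 MJ, from the abstract
CYCLES_PER_MINUTE = (10, 15)     # range given in the abstract


def cycle_period_s(cycles_per_minute):
    """Seconds per acceleration cycle."""
    return 60.0 / cycles_per_minute


def joules_to_kwh(energy_j):
    """Convert joules to kilowatt-hours (1 kWh = 3.6e6 J)."""
    return energy_j / 3.6e6


def kinetic_energy_j(mass_kg, speed_m_per_s):
    """Kinetic energy of a moving mass, E = 1/2 m v^2."""
    return 0.5 * mass_kg * speed_m_per_s ** 2


if __name__ == "__main__":
    for rate in CYCLES_PER_MINUTE:
        print(f"{rate} cycles/min -> one cycle every {cycle_period_s(rate):.0f} s")
    print(f"Stored energy: {joules_to_kwh(STORED_ENERGY_J):.0f} kWh")

    # ASSUMPTION: a 1500 kg car braking from 50 km/h (not from the post).
    car_j = kinetic_energy_j(1500.0, 50.0 / 3.6)
    print(f"Energy in one car stop: {car_j / 1e6:.2f} MJ")
    print(f"Flywheels hold roughly {STORED_ENERGY_J / car_j:.0f} such stops")
```

With those assumed car numbers, the two flywheels together hold the energy of a few thousand city-speed stops, which gives a feel for why this system would not fit under a hood.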

While I don’t know the answer to this, I suspect that flywheels are still one of the best ways to store energy that has to be quickly extracted over the timescale of seconds. Batteries probably can’t do it without costing a huge amount, and capacitors probably have a much lower energy density – though they are ideal for other stored energy applications that require much faster discharge times!

Tuition Rates Going Up == Evil Universities October 29, 2009

Posted by gordonwatts in university.
4 comments

The CollegeBoard recently did a study for college tuition prices with the sub-title Public Four-Year Tuition Continues to Rise at Faster Rate than Private Four-Year Tuition. The report actually isn’t that bad:

The College Board announced today that college prices for the 2009-10 academic year continue to rise as state funding and endowment values decline. The financial difficulties facing households across the nation are putting increased pressure on financial aid budgets.

This was picked up by lots of newspaper articles – for example this one from the AP:

With the economy struggling, parents and students dared to hope this year might offer a break from rising college costs. Instead, they got another sharp increase.

Average tuition at four-year public colleges in the U.S. climbed 6.5 percent, or $429, to $7,020 this fall as schools apologetically passed on much of their own financial problems, according to an annual report from the College Board, released Tuesday. At private colleges, tuition rose 4.4 percent, or $1,096, to $26,273.

From there it turned into articles talking about how universities were taking advantage of the students and families. At least the article that appeared in the New York Times got the real reason right – here is paragraph 2:

Hit hard by state budget cuts, four-year public colleges raised tuition and fees by an average of 6.5 percent last year. Prices at private colleges rose 4.4 percent, according to a report issued Tuesday by the College Board.

The next quote in that article takes a sharp left turn into… well:

Patrick Callan, president of the National Center for Public Policy and Higher Education, called the increases “hugely disappointing.”

“Given the financial hardship of the country, it’s simply astonishing that colleges and universities would have this kind of increases,” Mr. Callan said. “It tells you that higher education is still a seller’s market. The level of debt we’re asking people to undertake is unsustainable.”

I’m sorry, but give me a break. I totally understand the tuition problem. My university is going to raise tuition by 30% over the course of two years. Ouch. That will certainly strain students that don’t have financial aid. But what exactly were people expecting?

The state of Washington cut almost 30% of the UW budget. The voters in Washington made it clear that there were other priorities. So UW had two choices: shrink by 30% in 6 months (about the length of time we had known what was going to happen), or raise tuition. Shrinking by 30% is certainly possible – but it would be huge. We’d have to take about 30% fewer students than we do now – that probably would mean no incoming students this year at all (or we would have to kick out students that were already here), fire 30% of the faculty, close lots of departments. Probably have to completely kill off research. Actually, that would help with firing 30% of the faculty – most of us would just leave as fast as we could. Students who came to a major research university for learning would now be at what was basically a teaching college full of very pissed off professors – not what they signed up for. So Seattle raised tuition by 30% and took a 6% overall cut to the operating budget. All signs point to the same thing happening in the next two-year budget as stimulus money disappears.

So look – we like to call these things public universities – but that implies public support. Frankly, the more the state backs out of its implied contract with the university, the more like a private university these institutions will look. At some point the state support will be small enough that the universities will want to change their relationship with the state. Heck, why deal with the oversight if they aren’t getting anything in return for it!?

Somewhere out there there is a year-by-year trend plot of state support of universities. It has been steadily falling for over 20 years. This last year was particularly bad, but not really that different from the trend overall. California is at risk of destroying one of the best university systems in the country over this very same issue.

Want to keep tuition down? Keep public universities accessible? Don’t just yell “cut costs, get rid of waste” at the universities. Make sure your state legislature continues to support the university as well. The budget has to balance. If the state gives less, then that extra money has to come from somewhere!

Ah, the soap box. How I have missed thee.

Dark Matter Discovered – Losing Control Of Your Data October 26, 2009

Posted by gordonwatts in GLAST, physics, physics life, science.
2 comments

Ok, so it is a sensationalist title. But it was triggered by an arXiv submission with the following title: Possible Evidence For Dark Matter Annihilation In The Inner Milky Way From The Fermi Gamma Ray Space Telescope. Wow! That is quite a title!

First, a bit of background on this paper. This is authored by two theorists who analyzed publicly released Fermi LAT/GLAST data. Fermi is a NASA-funded project and one of its stipulations is that all data it collects must be made publicly available 6 months after it has been collected. The authors of the paper downloaded the data, used a simple background model, added in their dark matter theory, and did a fit. And pow:

In the plot, the red points are the data from Fermi, the dash-dot line and the dotted line are backgrounds (galactic diffuse, and a single TeV source), and the dashed line is their model. Nice fit, eh? Yep – looking at this my first reaction is “Wow – is this right? This is big – how did Fermi miss this?” and then I run across the hall to find someone that actually knows this data well.

It turns out the basic problem with this analysis is that not all sources of background are included. This is the galactic center, and, as one would imagine, there are lots of sources there. Not just the one TeV source modeled above. My impression from hallway conversations is that when you take into account all of these sources there is much less (if any) room left for the dark matter model. I don’t think that Fermi has published a paper on this yet, but I suspect they will try at some point soon.

Ok, so all’s well. Fermi will publish the paper and everyone will know the right way to do this non-trivial analysis. Except that things got away from them. Nature News picked it up and wrote a short update. This is pretty widely read. Now Fermi has a PR problem on its hands – people are running around talking about their data and they’ve not really had a voice yet (the science coordinator for Fermi was interviewed for this bit, but her comments were relegated to the end of the post). Fermi is a big collaboration (yes, not the size of the LHC); even if their paper is close to publication, it would probably be at least a month before the collaboration could agree on a response. So what to do?

There are a lot of issues surrounding making data public. To first order, it is the taxpayers that are paying for these experiments, so the data should be public. On the other hand, you can already see that besides the work and infrastructure of making the data public (which costs real $$ – especially for a big experiment like Fermi or one of the LHC experiments), you have to respond to other folks that analyze your data – basically pointing out their mistakes and trying to help them along, even when they might be in competition with some of your internal analyses. In NASA’s case all the data has to be made public – it is written into every grant submission and NASA even provides money for it. This is not currently the case for particle physics. In many of these advanced experiments the data is quite complex – and someone that can’t depend on the large infrastructure of the experiment to help interpret it is bound to have some difficulties.

One only wishes that the authors had gotten in contact with some Fermi folks before submitting their note to the arXiv…

Units, Units, Units October 23, 2009

Posted by gordonwatts in physics, physics life.
4 comments

Undergraduates know that Physics Professors get all wound up about units. We can’t help ourselves.

But in reading a nytimes article this morning I couldn’t help myself:

In addition, Mr. Holder said, the authorities have seized more than $32 million in American currency, 2,700 pounds of methamphetamine, 4,400 pounds of cocaine, 16,000 pounds of marijuana and 29 pounds of heroin. More arrests are expected.

Well… this is what happens when you wait until the evening to write a blog post about something you spotted in the morning – they change the article. That 2700 pounds? It was 2700 kilograms (which is significantly more). In short – they had mixed kilograms and pounds. I was going to get on my high horse and… well, seems someone at the Times is as sensitive about this as we physicists are.

But it also occurred to me that the notion of units is rather flexible. For example, when we do particle physics calculations we often set the speed of light to 1. Normally it is about 300,000,000 meters/second (really fast!). Seriously. We just set it to 1. We are so annoyed by having to carry around that number in our calculations that we just up and set it to one. We do that with another constant as well (called h-bar). Your unit system ends up being very weird when you do that:

Normal everyday units | Units with h-bar = c = 1
Energy | Energy
Time | 1/Energy
Mass | Energy
Length | 1/Energy
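
For anyone who wants to get back to everyday units, the trick is to put the right powers of h-bar and c back in. Here is a minimal Python sketch of that bookkeeping. The function names are made up for illustration; the constants are the standard values of h-bar times c, h-bar, the electron-volt, and the speed of light, and the proton mass of about 938.272 MeV is used only as a familiar example input.

```python
# Converting between natural units (h-bar = c = 1) and everyday units.
HBAR_C_MEV_FM = 197.3269804        # h-bar * c in MeV * femtometers
HBAR_MEV_S = 6.582119569e-22       # h-bar in MeV * seconds
MEV_IN_JOULES = 1.602176634e-13    # 1 MeV expressed in joules
C_M_PER_S = 299_792_458.0          # speed of light in meters/second


def inverse_energy_to_length_fm(energy_mev):
    """A length of 1/energy, restored to femtometers with h-bar * c."""
    return HBAR_C_MEV_FM / energy_mev


def inverse_energy_to_time_s(energy_mev):
    """A time of 1/energy, restored to seconds with h-bar."""
    return HBAR_MEV_S / energy_mev


def mass_mev_to_kg(mass_mev):
    """A mass quoted as an energy, restored to kilograms with m = E / c^2."""
    return mass_mev * MEV_IN_JOULES / C_M_PER_S ** 2


if __name__ == "__main__":
    proton_mev = 938.272  # proton mass as an energy (example input)
    print(f"length scale: {inverse_energy_to_length_fm(proton_mev):.4f} fm")
    print(f"time scale:   {inverse_energy_to_time_s(proton_mev):.3e} s")
    print(f"mass:         {mass_mev_to_kg(proton_mev):.4e} kg")
```

Every row of the table is one of these functions: a single number in MeV stands in for an energy, a mass, an inverse length, or an inverse time, depending on which constants you multiply back in.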

I know this seems weird – but you see it all the time. This is just like making the following unit conversion in the list of drugs: instead of telling us the number of pounds or kilograms, tell us how much pot they got in terms of its street value. And to tell the truth, that would have been a very useful number to have in that article.

Heck, in the old days, the unit of measure in the market was the length of the king’s forearm. When the king changed, the whole country would change its unit system…

Physics professors getting wound up about units is ironic – we don’t really use them that heavily when we get to more advanced calculations. On the other hand, we can only drop them because we have already learned how to use them. At least, that is what we tell ourselves and everyone else! ;-)

Zoomify September 22, 2009

Posted by gordonwatts in DeepTalk, computers.
3 comments

A bit of a technical post.

One of the biggest criticisms I get about DeepTalk (besides the fact that you can’t navigate using the arrow keys) is that it requires Microsoft’s Silverlight. There are two other options I’m aware of. First, to understand the problem that I’m working with, check out this simple conference that I’ve deeptalk’ed. Use the mouse wheel to zoom in/out and see how the display works.

For this discussion it is important to keep in mind the steps that a conference goes through on its way to becoming a DeepTalk:

  1. All the slides are sucked down from the internet, turned into jpgs, and then programmatically laid out.
  2. A rendering program reads the layout and all the images in and slices and dices the images into layers. These slices are stored on a web server with a decent internet connection.
  3. Code is downloaded to the browser that reads the layout and the slices and renders them just like any mapping website with zoom capabilities does.
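
Step 2 is the part that makes zooming cheap, so here is a rough Python sketch of how such a layered tile pyramid can be counted out. This is not the DeepTalk or Silverlight code: the function name, the 256-pixel tile size, and the halve-until-it-fits-one-tile rule are all assumptions chosen to illustrate the general technique that mapping sites use.

```python
import math


def pyramid_levels(width, height, tile_size=256):
    """List (width, height, tile columns, tile rows) for each zoom level.

    Starts at full resolution and halves the image until it fits in a
    single tile. ASSUMPTION: square tiles of tile_size pixels, no overlap.
    """
    levels = []
    w, h = width, height
    while True:
        cols = math.ceil(w / tile_size)
        rows = math.ceil(h / tile_size)
        levels.append((w, h, cols, rows))
        if w <= tile_size and h <= tile_size:
            break
        w = max(1, math.ceil(w / 2))
        h = max(1, math.ceil(h / 2))
    return levels


if __name__ == "__main__":
    # Hypothetical layout size, just to show the shape of the result.
    for w, h, cols, rows in pyramid_levels(4096, 2048):
        print(f"{w}x{h}: {cols} x {rows} = {cols * rows} tiles")
```

The browser only ever fetches the handful of tiles that cover the current view at the current zoom level, which is why the server does not need to hand over the whole layout at once.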

First, raw JavaScript. This is an ideal solution. Every browser already has it installed and most modern browsers are pretty efficient. Indeed, all the mapping programs I use, like Live Maps and Google Maps, use this solution for terabytes of data. So why not me!? Well, the first requirement is I’m not willing to re-write the code, so I have to find it on the web. Actually, I did find one (are there others?) – from Microsoft, and it can replace the Silverlight code. Ok! Then I’m all set, right? Well, no. The code isn’t as capable as I need. For example, it can render only a single image at a time. For DeepTalk a single image is roughly equivalent to a single talk. I could render the whole conference as a single “image”; however, I do not have the memory on any machine I own to do that.

Second is a commercial Adobe Flash library called Zoomify. Check out their web page – very cool. It does exactly what I need. It requires Flash, which pretty much everyone has (even if they have to update – please do it – old software == hacker target!!!). Further, unlike Silverlight, Flash works on Linux, so this would be a big plus. Unfortunately, there are two problems. First, in order to automate the rendering you need the Enterprise version ($800 US – more than was spent on the server that is currently serving the DeepTalk content). Second, the project is well integrated with Adobe Flash – which is all great and fine for people who are used to Flash. But the rest of us would need to learn a new programming language.

And finally there was the Silverlight version. This had the zooming built-in and the tools, including a rendering library I could link against, were all free. Further, the programming model for Silverlight is any .NET language – which includes C#, which looks a lot like C/C++ – something I can immediately start writing code in without having to buy a reference book.

So. That is why I’m using Silverlight, and why, for the moment at least, it remains the best choice for me for this project.

Now, as for the most popular criticism I’ve gotten about the project. I now have working on my desktop a version that allows you to use arrow keys to move around. Sadly, it still crashes due to bugs on about 1 in 3 conferences – which means it isn’t good enough to go on the web backend. You all will have to wait, sadly, for a little while longer: classes start next week, so a lot of my summer spare time is going to disappear!! Happy end of the summer!

Presentation September 20, 2009

Posted by gordonwatts in Conference, computers.
2 comments

Ok. Really. This is my last post on video for a while. Ever since I started the DeepTalk project I’ve started to be much more aware of how conference data is put out on the web. So it has become a bit of a soap-box for me. :-) But this is the last one for a while, I promise.

During my last several posts on this there have been a bunch of comments on how other conferences have presented their video online. I thought I’d give you my opinion. :-)

  • Pycon 2009 – the annual Python conference. At first I was hopeful about this – the web page is quite nice and you’ll notice right at the top there is a nice iCal link so you can download the schedule. However, the schedule is just that – a schedule. You can’t get access to the links to the talks or video from there. Associated with the web page is an RSS feed too – which is excellent – I could now use my podcast software (any software should be able to read it) and I could download the audio of the whole event. Sweet. However, there is no way to connect the slides and the video or audio together via a program (as far as I can tell). The video looks like it is all archived on blip.tv. The beauty of this system is that it makes files available in lots of formats (see this talk, click on the “files and links” to see). AND there is a small little RSS link at the bottom – so I can get all the talks down as video to my podcast software (the default seems to be the MP4 format, which satisfies most of my requirements as a good video format). So this conference has made its schedule available in a standard format (iCal) and made all of its videos available in a standard format (blip.tv). I’d like to see some integration between the two so that one could find the slides, abstract, and video together, using a program. :-)
  • Strings ‘07 Conference – a conference on strings. The conference website is basically a series of static web pages – including the schedule (I’ve extracted that page – but you can get to it by looking at the home page –> Scientific Program –> Speakers&Titles). There are links to the slides and video. The video is in MP4 format (fantastic!). None of this is discoverable, unfortunately, by a program – you would have to scrape the web page in order to find it. Chimpanzee, who has left a lot of comments on these video postings, has done some work with this conference, putting it in iTunes as a show. Unfortunately, unless you have iTunes installed, this is not very useful as it brings you to an Apple page that asks you to download and install iTunes. However, Chimpanzee did put this on blip.tv as several shows (one show per day – I think from the point of view of subscribing I’d have preferred a single show for the whole conference). Also, the nice RSS feeds to blip.tv are well hidden. So, well done with mp4 and PDF files up there. The blip.tv solution is quite nice, again. The static web page that links them together isn’t so good – it isn’t very discoverable, unfortunately.
  • Lepton-Photon 2009 – The agenda is posted in the standard agenda software in use in HEP, Indico, which makes it easily exportable. Each talk has a link to the PDF as well as a Video link. Unfortunately, the Video leads to a RealMedia file – which my open source tools cannot play. So the video format doesn’t pass muster.
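
When I say “discoverable by a program”, this is the kind of thing I have in mind. The Python sketch below pulls the media links out of an RSS 2.0 feed using only the standard library. The feed text is a made-up sample (the titles and example.org URLs are hypothetical, not from any of these conferences), and the function name is mine; only the item, title, and enclosure elements are standard RSS.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: a made-up feed for illustration; titles and URLs are hypothetical.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example conference talks</title>
    <item>
      <title>Opening talk</title>
      <enclosure url="http://example.org/talks/opening.mp4"
                 type="video/mp4" length="0"/>
    </item>
    <item>
      <title>Announcement without media</title>
    </item>
    <item>
      <title>Second talk</title>
      <enclosure url="http://example.org/talks/second.mp4"
                 type="video/mp4" length="0"/>
    </item>
  </channel>
</rss>"""


def list_enclosures(feed_xml):
    """Return (title, media URL, MIME type) for every item with an enclosure."""
    root = ET.fromstring(feed_xml)
    talks = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None:
            continue  # item has no attached media file
        talks.append((item.findtext("title", default=""),
                      enclosure.get("url"),
                      enclosure.get("type")))
    return talks


if __name__ == "__main__":
    for title, url, mime in list_enclosures(SAMPLE_FEED):
        print(f"{title}: {url} ({mime})")
```

A feed like this gets you the video. What is still missing is a standard way to tie each item back to its slides and abstract, and a static web page forces you to scrape for that.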

I am pleasantly surprised by blip.tv. It looks like a very nice service. I have no idea what their business model is. The good news is that people won’t watch most talks from a physics conference very much – so they will require very little bandwidth.

No conference gets it quite right (IMHO), but they all come close. From my point-of-view, combining Indico with blip.tv seems like a fairly ideal solution given current technology constraints.

Two quick notes. First, there has been a hope that perhaps HTML5 would standardize a single video format – and we could all just depend on all browsers running it without having to install plugins like the security-hole-ridden Flash or RealMedia. This is not to be, however. There is an excellent blog series that I stumbled on for those of you who want to know what is happening to HTML5. This posting makes it clear that a preferred video format no longer exists in the standard (for details, see the change log for the standard).

Second, I keep holding up Indico as a nice way to post meeting agendas. But perhaps there is a standard for this sort of thing? A microformat or perhaps something from the Semantic Web? Then Indico (and everyone else) could produce that for various tools to parse. I only did a brief search, but didn’t find anything.

Bjarne Stroustrup September 8, 2009

Posted by gordonwatts in CERN, ROOT, computers.
3 comments

If you are even semi-conscious of the computing world you know this name: Bjarne Stroustrup. He is the father of C++. He started designing the language sometime in the very late 1970’s and continues to this day trying to keep it from getting too “weird” (his words).

He visited CERN this last week, invited by the ROOT team (I took a few pictures). I couldn’t see his big plenary talk due to a meeting conflict, but my friend Axel, on the ROOT team, was nice enough to invite me along to a smaller discussion. Presentations made at this discussion should be posted soon here. The big lecture is posted here, along with video (sadly, in flash and wmv format – not quite mp4 as I’ve been discussing!!)! I see that Axel also has a blog and he is posting a summary there too – in more detail than I am.

The C++ standard – which defines the language – is currently overseen by an ISO Standards Committee. Collectively they decide on the features and changes to the language. The members are made up of compiler vendors, library vendors, library authors, large banking organizations, Intel, Microsoft, etc. – people who have a little $$ and make heavy use of C++. Even high energy physics is represented – Walter Brown from Fermilab. Apparently the committee membership is basically open – it costs about $10K/year to send someone to all the meetings. That is it. Not very expensive. The committee is currently finishing off a new version of the C++ language, commonly referred to as C++0x.

The visit was fascinating. I’ve always known there was plenty of politics when a group of people get together and try to decide things. Heck, I’m in High Energy Physics! But I guess I’d never given much thought to a programming language! Part of the reason it was so fascinating was that several additions to the language that folks in HEP were interested in were taken out at the last minute – for a variety of reasons – so we were all curious as to what happened.

I learned a whole bunch of things during this discussion (sorry for going technical on everyone here!):

  • Bjarne yelled at us multiple times: people like HEP are not well represented on the committee. So join the thing and get views like ours better represented (though he worried if all 150 labs joined at once that might cause a problem).
  • In many ways HEP is now pushing several multi-core computing boundaries. Both in numbers of cores we wish to run on and how we use memory. Memory is, in particular, becoming an acute problem. Some support in the standard would be very helpful.  Minimal support is going in to the new standard, but Bjarne said, amazingly enough, there are very few people on the committee who are willing to work on these aspects. Many have the attitude that one core is really all that is needed!!! Crazy!
  • In particle physics we leak memory like a sieve. Many times our jobs crash because of it. Most of the leaks are pretty simple and a decent garbage collector could efficiently pick up everything and allow our programs to run longer. Apparently this almost made it into the standard until a coalition of the authors of the boost library killed it: if you need a garbage collector then you have a bug; just fix it. Which is all good and glorious in an ideal world, but give me a break! In a 50 million line code base!? One thing Bjarne pointed out was it takes 40 people to get something done on the committee, but it takes only 10 to stop it. Sort of like health insurance. :-)
  • Built in support for memory pools would probably be quite helpful here too. The idea is that when you read in a particle physics event you allocated all the data for that event in a special memory pool. The data from an event is pretty self-contained – you don’t need it once you have done processing that event and move onto the next one. If it is all in its own memory pool, then you can just wipe it out all at once – who cares about actually carefully deleting each object. As part of the discussion of why something like this wasn’t in there (scoped allocators sounds like it might be partway there) he mentioned that HP was “on our side”, Intel was “not”, and Microsoft was one of the most aggressive when it came to adding new features to the language.
  • I started a discussion of how the STL is used in HEP – pointing out that we make very heavy use of vector and map, and then very little else. Bjarne expressed the general frustration that no one was really writing their own containers. In the ensuing discussion he dissed something that I often make use of – the for_each loop algorithm. His biggest complaint was who much stuff it added – you had to create a whole new class – which involves lots of extra lines of code – and that the code is no longer near where it is being used (non-locality can make source code hard to read). He is right both are problems, but to him they are big enough to nix its used except in rare circumstances. Perhaps I’ll have to re-look at the way I use them.
  • He is not a fan of OpenMP. I don’t like it either, but sometimes people trot it out as the only game in town. Surely we know enough to do better now. Tasked based parallelism? By slots?
  • Bjarne is very uncomfortable with lambda functions – a shorthand way to write one-off functions. To me this is the single best thing being added to the language – it will now be possible to totally avoid having to write another mem_fun or bind2nd template. That is huge, because those things never worked anyway – you could spend hours trying to make the code build, and they added so much cruft to your code you could never understand what you were trying to do in the first place! He is nervous that people will start adding large amounts of code directly into lambda functions – as he said “if it is more than one line, it is important enough to be given a name!!” We’ll have to see how use develops.
  • He was pretty dismissive of proprietary languages. Java and C# both were put in this category (both have international standards behind them, just like C++, however) – citing vendor lock-in. But the most venom I detected was when he was discussing the LLVM open source project. This is a compiler infrastructure that includes a JIT and a C++ front end. This project was loosely run but has now been taken over by Apple – presumably to be, among other things, packaged with their machines. His comment was basically “I used to think that was very good, but now that it has been taken over by Apple I’d have to take a close look at it and see what direction they were taking it.”
  • Run Time Type Information. C++ came into its own around 1983 or so. No modern language is without the ability to inspect itself. Given an object, you can usually determine what methods are on the object, what the arguments of those methods are, etc. – and most importantly, build a call to that method without having ever seen the code in source form. C++ does not have it beyond the basics of typeid and dynamic_cast. We all thought there was a big reason this was the case. The real reason: no one on the committee has pushed hard enough or is interested enough. For folks doing dynamic coding or writing interpreters this is crucial. We have to do that in our code, and adding the information in after the fact is cumbersome and causes code bloat. Apparently we just need to pack the C++ committee!
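Two of the points above, the per-event memory pool and a lambda standing in for a hand-written functor class in for_each, can be sketched together. This is only an illustration under stated assumptions: it uses std::pmr from C++17, which arrived long after this discussion, and the Track struct and count_high_pt_tracks function are made-up names, not part of any real HEP framework.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <vector>

// Hypothetical stand-in for a reconstructed particle (illustration only).
struct Track {
    double pt;   // transverse momentum
    int charge;
};

// Process one "event": all Track storage comes from a pool that is released
// in a single step when the function returns.
// Returns the number of tracks with pt above min_pt.
std::size_t count_high_pt_tracks(const std::vector<double>& raw_pts, double min_pt) {
    assert(min_pt >= 0.0 && "pt threshold should not be negative");

    // Stack buffer backing the pool. If an event outgrows it, the pool falls
    // back to its upstream resource (by default, the ordinary heap).
    std::array<std::byte, 4096> buffer;
    std::pmr::monotonic_buffer_resource pool(buffer.data(), buffer.size());

    // The vector takes its memory from the pool instead of calling new directly.
    std::pmr::vector<Track> tracks(&pool);
    for (double pt : raw_pts) {
        tracks.push_back(Track{pt, +1});
    }

    // The lambda keeps the loop body next to the loop; no separate functor
    // class, mem_fun, or bind2nd is needed.
    std::size_t n_high = 0;
    std::for_each(tracks.begin(), tracks.end(),
                  [&n_high, min_pt](const Track& t) {
                      if (t.pt > min_pt) ++n_high;
                  });
    return n_high;
    // tracks is destroyed first, then pool hands back everything at once.
}
```

Calling count_high_pt_tracks({0.5, 12.0, 30.0, 3.0}, 10.0) counts the two tracks above threshold. A monotonic pool never reuses memory within an event, which is fine when the whole event is thrown away at the end, but it is a poor fit for long-lived objects.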

Usually as someone rises in importance in their field they get more and more diplomatic – it is almost a necessity. If that is the case, Bjarne must have been pretty rough when he was younger! It was great to see someone who was attempting to steer-by-committee something he invented vent his frustrations, show his passion, name names, and at one point threaten to give out phone numbers (well, not really, but he almost gave out phone numbers). He can no longer steer the language exactly as he wants it, but he is clearly still very much guiding it.

You can find slides that were used to guide the informal discussion here. I think archived video from the plenary presentation will appear linked to here eventually if you are curious.

Congratulations to Sasha Rozanov! September 8, 2009

Posted by gordonwatts in Marseille, physics life.

IMG_2378

Each year the French science agency – CNRS – awards silver and gold medals to its researchers. Sasha got the silver one this year. This is a big deal – people were coming from all over France (and from CERN) to take part in the party and short symposium held in his honor. It couldn’t have happened to a nicer guy. If you know him, definitely send him congrats!

BTW, that is a picture of Sasha killing his cell phone during the ceremony. I’ve got more dignified pictures in the usual spot.

Time shifting Video: Recording September 6, 2009

Posted by gordonwatts in Conference, computers.

In my first post on video there were a few comments on the effort required to record the video in the room. The basic question from Chip was the following:

The question I have is the on-site effort and expense. Take the PyCon setup: any clue what synch software they used? Because of the zooming, they had a person with a camera. Maybe I’ve not noticed, but having the slides small and the person large is an interesting idea. With the slides separately available in full-resolution, one could use the on-screen slide images as just a key to tell you when to actually click on the full size ones. Usually, it’s the other way with the person being very small and the slides larger. In fact, pedagogically, having the viewer then have to manipulate something during the talk would keep them in the game, so to speak.

Ok, there are several questions. First point: I want to be able to view this stuff on my MP3 player – so “keeping someone in the game” is not what I have in mind for that sort of viewing. :-)

Now, the more important thing: cost of recording. There was a reply to this from Tim:

Why don’t you just record the video from the camera and the input to the projector? This would seem like an easy way to get synchronized slides.

For some dumb reason that hadn’t occurred to me – get a VGA splitter and hook one of its outputs up to your computer. The Lepton-Photon folks seem to have basically done that:

image

Judging from the quality of the slides (which is worse here because this was a low resolution image), I’d guess they had a dedicated camera recording the slides rather than actually looking at the computer output. A second stream focused on the presenter, and they can use common post-processing tools to combine the two streams as they have above. In fact, the above is from a real-time stream. I don’t know what tool they used, but I can think of a few open-source ones that wouldn’t have too much difficulty as long as you had a decent CPU behind you. One caveat here: in a production environment I have no idea how hard it is to capture two streams and keep them in sync. If they are on two computers then you need software to make sure they start at the same time, and to cope if there is a glitch and you lose one of them, etc.

Chip also asks the key question:

what did it cost?

I’m not sure what the biggest expense for these things is – but the usual culprit is the person doing the work – so I’ll go with that. To record a conference I assume you need to set up the video, run the system while it is recording, and then post-process the video to make it available on the web. The post-processing could be fairly time consuming: you have to find where each talk ends and the next one begins, cut the talks, render the final video, etc.

Thinking about this, it seems like one could invest a little money up front and perhaps drop the price quite a bit. First, software to record the two streams and keep track of the sync can’t be too hard to write. On the Windows platform I’ve seen plenty of samples using video and doing real-time interpretation. Basically, at the end of the day you would want two files with synchronization information: one with video focused on the slides, and the other on the person (with a decent audio pickup!).

If one wants to stream the conference live – that is harder. I don’t know enough about streaming technology to know how it would fit in above without impacting the timing – which is fairly important for the next step.

A human could probably recognize almost the complete structure of the conference from the slide stream alone. I suspect we could write a computer program to do something similar. Especially if we also handed the computer program the PDFs of all the talks. Image comparison is probably decent enough that it could match almost every slide to the slide stream. As a result you’d get a complete set of the timings for the conference – when the title slide went up, when the last slide was reached, when the next talk started. Heck, even when every single slide was transitioned. You could then use these timings to automatically split the video into talk-by-talk video files. Or generate a timing file with detailed information (I’d love slide-by-slide timing for my deeptalk project). During this step you could also combine the two streams, much as is done in the above live stream I recorded. You could even discard the slide stream and put high quality images from the PDF in its place.
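The timing step can be sketched in a few lines, assuming the hard part (matching each sampled video frame against the talk PDFs) has already been done by some other tool. Everything here is hypothetical: FrameMatch, TalkSegment, and find_talk_segments are names invented for the illustration, and the input is simply one match result per sampled frame.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One entry per sampled frame of the slide stream (hypothetical format).
// talk < 0 means the frame did not match any slide in any PDF.
struct FrameMatch {
    double time_s;  // time of the frame in the recording, in seconds
    int talk;       // index of the talk whose PDF matched
    int slide;      // index of the matched slide within that talk
};

// Start and end time of one talk within the recording.
struct TalkSegment {
    int talk;
    double start_s;
    double end_s;
};

// Collapse per-frame matches into one segment per run of frames from the
// same talk. Unmatched frames are skipped, so a brief glitch in the middle
// of a talk does not split it in two.
std::vector<TalkSegment> find_talk_segments(const std::vector<FrameMatch>& frames) {
    std::vector<TalkSegment> segments;
    for (const FrameMatch& f : frames) {
        if (f.talk < 0) continue;  // no slide recognized in this frame
        // Frames are assumed to arrive in time order.
        assert(segments.empty() || f.time_s >= segments.back().end_s);
        if (!segments.empty() && segments.back().talk == f.talk) {
            segments.back().end_s = f.time_s;  // same talk: extend it
        } else {
            segments.push_back(TalkSegment{f.talk, f.time_s, f.time_s});
        }
    }
    return segments;
}
```

The resulting segments are the cut points for talk-by-talk video files. Slide-by-slide timing would follow the same pattern with the slide index added to the comparison. A talk with no slides never shows up in the output, which is one of the edge cases a real tool would have to handle separately.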

I doubt this would be perfect, but I bet it would get you 90% of the way there. It would have trouble at the edges – before the conference started, for example. Or if someone gives a talk with no slides or slides that are very different from the ones it is given to parse. But, heck, that is to be fixed in Version 2.0. I do not know if 90% is good enough for a project like this.

Seems like a perfect small inter-disciplinary project between CS and physics (with a small grant for one year of work). :-) I wonder how far-fetched this is?