Time shifting Video: Recording
September 6, 2009. Posted by gordonwatts in computers, Conference.
The question I have is about the on-site effort and expense. Take the PyCon setup: any clue what sync software they used? Because of the zooming, they had a person operating a camera. Maybe I’ve just not noticed it before, but having the slides small and the person large is an interesting idea. With the slides separately available at full resolution, the small on-screen slide images could serve as a cue telling you when to click through to the next full-size one. Usually it’s the other way around, with the person very small and the slides larger. In fact, pedagogically, having the viewer manipulate something during the talk would keep them in the game, so to speak.
Ok, there are several questions. First point: I want to be able to view this stuff on my MP3 player – so “keeping someone in the game” is not what I have in mind for that sort of viewing.
Now, the more important thing: cost of recording. There was a reply to this from Tim:
Why don’t you just record the video from the camera and the input to the projector? This would seem like an easy way to get synchronized slides.
For some dumb reason that hadn’t occurred to me: get a VGA splitter and hook one of its outputs up to a capture input on your computer. The Lepton-Photon folks seem to have done basically that:
Judging from the quality of the slides (which is worse here because this was a low-resolution image), I’d guess they had a dedicated camera recording the slides rather than capturing the computer output directly. A second stream was focused on the presenter, and common post-processing tools can combine the two streams as they have above. In fact, the above is from a real-time stream. I don’t know what tool they used, but I can think of a few open-source ones that wouldn’t have much difficulty as long as you had a decent CPU behind you. One caveat here: in a production environment I have no idea how hard it is to capture two streams and keep them in sync. If they are on two computers, then you need software to make sure they start at the same time. Or what happens if there is a glitch and you lose one, etc.
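To make the sync worry concrete, here is a minimal sketch of the kind of bookkeeping such software would do. Everything here is hypothetical – real capture software would get timestamps from the capture hardware, not from a hand-written list – but it shows how you could detect a glitch (a dropped or duplicated frame) and estimate the offset between two streams:

```python
# Sketch: sanity-checking two independently captured streams.
# Timestamps are seconds since some common clock; values are made up.

def find_sync_glitches(timestamps, expected_interval, tolerance=0.5):
    """Return indices where the gap between consecutive frame timestamps
    deviates from the expected interval by more than `tolerance` (as a
    fraction of the interval) -- i.e. likely dropped/duplicated frames."""
    glitches = []
    for i in range(1, len(timestamps)):
        gap = timestamps[i] - timestamps[i - 1]
        if abs(gap - expected_interval) > tolerance * expected_interval:
            glitches.append(i)
    return glitches

def stream_offset(slide_timestamps, speaker_timestamps):
    """Estimate the constant offset between two streams that were started
    at (nominally) the same wall-clock time."""
    return speaker_timestamps[0] - slide_timestamps[0]

# A ~30 fps stream with one dropped frame around index 3:
ts = [0.000, 0.033, 0.066, 0.166, 0.200]
print(find_sync_glitches(ts, expected_interval=0.033))  # -> [3]
```

If the two streams run on separate computers, `stream_offset` is where clock disagreement would show up, which is exactly the hard part in production.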
Chip also asks the key question:
what did it cost?
I’m not sure what the biggest expense for these things is, but the usual culprit is the person doing the work, so I’ll go with that. To record a conference I assume you need to set up the video, run the system while it is recording, and then post-process the video to make it available on the web. The post-processing could be fairly time consuming: you have to find where each talk ends and the next one begins, cut the talks apart, render the final video, etc.
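Once the talk boundaries are known, the "cut the talks apart" step is mechanical. As a sketch (talk times and file names are invented), one could just generate ffmpeg command lines – ffmpeg’s `-ss`/`-to` options select a time range and `-c copy` cuts without re-encoding:

```python
# Sketch: turn a list of talk boundaries into ffmpeg cut commands.
# The source file name and talk times here are made up for illustration.

def cut_commands(source, talks):
    """talks: list of (start_seconds, end_seconds, output_name)."""
    cmds = []
    for start, end, name in talks:
        cmds.append(
            f"ffmpeg -ss {start} -to {end} -i {source} -c copy {name}.mp4"
        )
    return cmds

for cmd in cut_commands("day1.mp4", [(0, 1800, "talk01"), (1800, 3600, "talk02")]):
    print(cmd)
```

The expensive human step is finding the boundaries in the first place, which is what the automation below is about.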
Thinking about this, it seems like one could invest a little money up front and perhaps drop the price quite a bit. First, software to record the two streams while keeping track of the sync can’t be too hard to write. On the Windows platform I’ve seen plenty of samples that use video and do real-time interpretation. Basically, at the end of the day you would want two files with synchronization information: one with video focused on the slides, and the other on the person (with a decent audio pickup!).
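The "two files with synchronization information" could be as simple as a small sidecar manifest written next to the two video files. This is only a sketch – the file names and fields are invented – but something like:

```python
# Sketch: a JSON sidecar recording what the two video files are and when
# each started, so a post-processing tool can line them up later.
import json

def make_manifest(slide_file, speaker_file, start_epoch, fps):
    """start_epoch: shared wall-clock start time (seconds since epoch)."""
    return json.dumps({
        "slides":  {"file": slide_file,   "start": start_epoch, "fps": fps},
        "speaker": {"file": speaker_file, "start": start_epoch, "fps": fps},
    }, indent=2)

manifest = make_manifest("slides.avi", "speaker.avi", 1252252800.0, 30)
print(manifest)
```

With per-stream start times (rather than one shared one) the same format would also handle the two-computer case.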
If one wants to stream the conference live, that is harder. I don’t know enough about streaming technology to know how it would fit in without impacting the timing – which is fairly important for the next step.
A human could probably recognize almost the complete structure of the conference from the slide stream alone. I suspect we could write a computer program to do something similar, especially if we also handed it the PDFs of all the talks. Image comparison is probably good enough that it could match almost every slide to the slide stream. As a result you’d get a complete set of timings for the conference: when the title slide went up, when the last slide was reached, when the next talk started – heck, even when every single slide was transitioned. You could then use these timings to automatically split the video into talk-by-talk files, or generate a timing file with detailed information (I’d love slide-by-slide timing for my deeptalk project). During this step you could also combine the two streams, much as is done in the live stream above. You could even discard the slide stream entirely and put high-quality images from the PDF in its place.
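The image-comparison step could be sketched with something as crude as a difference hash ("dHash"): hash each rendered PDF page and each slide-stream frame by which neighboring pixel is brighter, then match by Hamming distance. This is only a toy – real code would render the PDF pages and decode video frames, and the tiny grayscale grids below are invented so the idea stays self-contained:

```python
# Sketch: match slide-stream frames to PDF pages with a difference hash.
# "Images" here are tiny grids of grayscale values, purely for illustration.

def dhash(image):
    """image: rows of grayscale values. One bit per horizontal neighbor
    pair: 1 if the left pixel is brighter than the right."""
    bits = []
    for row in image:
        for a, b in zip(row, row[1:]):
            bits.append(1 if a > b else 0)
    return tuple(bits)

def hamming(h1, h2):
    return sum(a != b for a, b in zip(h1, h2))

def match_frames_to_slides(frames, slides):
    """frames: list of (timestamp, image); slides: images from the PDFs.
    Returns (timestamp, best_matching_slide_index) for each frame."""
    slide_hashes = [dhash(s) for s in slides]
    timings = []
    for t, frame in frames:
        fh = dhash(frame)
        best = min(range(len(slides)),
                   key=lambda i: hamming(fh, slide_hashes[i]))
        timings.append((t, best))
    return timings

slides = [[[9, 1, 1], [9, 1, 1]],    # slide 0: bright left edge
          [[1, 1, 9], [1, 1, 9]]]    # slide 1: bright right edge
frames = [(0.0,  [[8, 2, 1], [9, 1, 2]]),   # noisy camera view of slide 0
          (30.0, [[1, 1, 8], [2, 2, 9]])]   # noisy camera view of slide 1
print(match_frames_to_slides(frames, slides))  # -> [(0.0, 0), (30.0, 1)]
```

The resulting (timestamp, slide) list is exactly the timing file described above; runs of frames matching a talk’s title slide would mark the talk boundaries for the automatic split.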
I doubt this would be perfect, but I bet it would get you 90% of the way there. It would have trouble at the edges – before the conference started, for example. Or if someone gives a talk with no slides or slides that are very different from the ones it is given to parse. But, heck, that is to be fixed in Version 2.0. I do not know if 90% is good enough for a project like this.
Seems like a perfect small interdisciplinary project between CS and physics (with a small grant for one year of work). I wonder how far-fetched this is?