
Research notes in the Internet Archive December 18, 2007

Posted by gordonwatts in archive, computers, physics life.

Articles like this one in Ars Technica always fascinate me: Internet Archive to store researchers’ notes, raw data. The idea is to collect everything an academic does (web pages, notes, local files, etc.) and upload it all to the Internet Archive. Check out the article for more details.

They fascinate me because there is a good part of me that is a***-retentive: I want to keep track of everything I do. For example, I use OneNote as my electronic logbook (along with a Tablet PC so I can write in it). I keep a huge amount in there, gigabytes of it: plots, notes I’ve proofread, etc. All of it is searchable (it is pretty cool: it will OCR any PDFs I put in there, so they become searchable, and it reads my handwriting too, something my second grade teacher could never do).

Would it be useful? Timmer, who wrote the Ars article, took a stab:

Will the material that’s uploaded be of any value? Based on my personal experience, the answer here will be mixed. I’ve taken notes and made annotations for everything from peer-reviewed publications to articles for Ars, but only a fraction of the ideas ever make it into the publication. Within the remainder, there are some genuine insights that don’t make the cut due to a lack of direct relevance or space constraints. But there are also a lot of spur-of-the-moment thoughts that I later reject due to further reading or analysis. Unless all contributors are careful about what they upload, this effort may produce a storehouse of bad ideas.

Let me go further: No. It will not be very useful. At least, not my research notes. I make so many mistakes, try out so many ideas that are just plain dumb in retrospect (and many that were dumb in the first place – I just missed that fact). And I can’t spell.

There is also the issue of unpublished data. My logbook contains a great deal of D0 (and hopefully soon, ATLAS) data that has never been published. Plots with unrefined selection cuts (with interesting bumps!). Items that have not been vetted by the collaboration for public release. Perhaps after some statute of limitations one could release this to the public, but certainly not immediately.

On the other hand, the logbook is incredibly useful to me personally. Further, it is available on all the computers I use (well, the Windows ones), synced across the net automatically, and so much lighter to carry around than a real logbook. So, while I’m 100% behind logbooks, mine will remain private for the near future.

I guess the biggest question I have is: what would this be useful for? I don’t see a use other than as a way to save things for posterity (can you say “information overload”?). What am I missing here?



1. Robert cudmore - December 18, 2007

Hi Gordon,

I find this interesting as well; if there were an elegant solution it would blow us out of the water. I have been searching for such a solution to the electronic lab notebook but have not found a suitable fix. I have been trying to use TiddlyWiki (http://www.tiddlywiki.com/) for some time but have found it more useful to just make meaningful folder hierarchies on my hard drive, which I then mirror to the different computers I use. The problem with the second technique is lame searching (Spotlight and Quicksilver on the Mac may be a future fix to this?), and that is why I am always drawn back to TiddlyWiki. If it were easy to make hyperlinks to different word (processing) documents, code, and images, then the lab notebook in TiddlyWiki would work. Dammit, I am going to go back to try again. Darn you Gordon… I have to work.

2. gordonwatts - December 19, 2007

Ha! “Elegant and blow us out of the water!” Let me talk to you about compact cameras… 🙂

The features you list are what caused me to settle on OneNote (yes, a Microsoft, Windows-only product currently; I’m pretty sure it isn’t in the upcoming Mac Office 2008 release). I can paste in pictures and links to other sections, and it is all totally searchable (even from the Windows equivalent of Spotlight and Quicksilver). And it OCRs my documents. And it can deal with a really large bunch of data (I’m above 2 gigs and I’ve not seen signs of strain).

The big thing missing is I would like to make a plot on my Linux machine (where I make most of my plots), click or run something trivial over there, and have it show up as a picture in the current open page of my notebook. There are APIs to do all of that, but I’ve been too lazy…

Many of my friends seem to have just stopped keeping a paper logbook; the most ambitious do something like what you do. Some keep an internal web page of what they do (basically, a blog). Others just use text files, and others keep records in email. I’ve not seen one solution take our branch of science over (for example, the way Linux has taken over CPU-heavy tasks). Part of the problem is that HEP is not willing to spend too much money on software. It is against our religion (but you should see what we pay in GRID development costs! ;-)).
