
ArXiv on the iPhone – Time to get rid of PDF May 28, 2009

Posted by gordonwatts in archive, computers.

A friend of mine, Ann Heinson (thanks!), sent me a link to a UK Telegraph blog posting: ArXiv on the iPhone: a pocketful of Science. ArXiv is, of course, the main paper repository for physics papers of all sorts. The iPhone applications allow you to browse the latest papers or search for a particular paper. It was particularly gratifying to note that the best-written one was done by David Bacon, a fellow faculty member at the University of Washington and also a great blogger. Included in his app is the ability to store papers for later offline reading.

Which brings me to my own personal soapbox: offline reading. Here is an excerpt from the original text in the UK Telegraph:

As the full texts of the papers are not available and the PDFs are often divided into columns, reading on the iPhone’s small screen is not always a very pleasurable experience. The only way to change the font size for easier reading is to zoom in on the text, but as the columns do not change you will probably find yourself scrolling horizontally as well as vertically, which can be wearing.

Before I rant about PDF, let me say that when it arrived, and mostly to this day, there hasn't been much better than PDF. But it's time for it to go. Its biggest sin, in my mind: static formatting. I've been known to read ArXiv papers on my portable 1440×1050 screen, my 1080p desktop screen, my 48-inch TV screen, an old 1280×1024 screen, and a tiny 800×600 screen. And that doesn't include a Kindle or an iPhone. PDF's problem: it displays exactly the same way on all those screens! More next time.
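To make "re-flow" concrete, here is a minimal Python sketch (an illustration only, not code from arXiv, Adobe, or any reader app) of the alternative to static formatting: the same paragraph is re-broken to whatever line width the screen offers, where a PDF keeps one frozen layout. The widths are arbitrary character counts standing in for a phone, a laptop, and a large monitor.

```python
import textwrap

# A sample paragraph standing in for the body text of a paper.
PARAGRAPH = (
    "ArXiv is the main paper repository for physics papers of all sorts. "
    "A reflowable format lets the reader decide how wide a line should be, "
    "instead of freezing the layout the author happened to print."
)


def reflow(text: str, width: int) -> list[str]:
    """Re-break `text` into lines of at most `width` characters."""
    return textwrap.wrap(text, width=width)


if __name__ == "__main__":
    # Hypothetical line widths (in characters) for a phone, a laptop, and a big monitor.
    for width in (30, 60, 100):
        print(f"--- width {width} ---")
        print("\n".join(reflow(PARAGRAPH, width)))
```

The words never change; only the line breaks do. That is exactly the information a fixed-layout PDF has already thrown away by the time it reaches the reader.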

Disclaimer: I do not own an iPhone myself, so I’ve not actually tested any of these apps.

Comments

1. Anonymous - May 29, 2009

Wouldn't it be possible to write a PDF reader for the iPhone that would dynamically reformat the text to fit the screen? Is it the fault of PDF, or of the Acrobat reader?

2. gordonwatts - May 29, 2009

Right – parse the PDF. Adobe and others even release tools that let you inspect and extract information from a PDF. Hey – my Windows OS automatically indexes the 2 GB of PDF papers I keep in my own personal archive.

The problem is the flow. When you have multi-column text, the PDF doesn't care about the order it should be read in. PDF is a layout language, not a text-flow language. So I suspect you could write code that would extract the text and get it right for some class of documents. But why do that when you have access to the TeX source, which is much closer to what the author meant?

A way to test this: open up Adobe Reader (or any other reader), turn on the text selection tool, select every bit of text on a two-column page, and then paste it into a formatted editor document (or even just a text file).
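The layout-versus-flow problem can be shown with a toy model. The sketch below is hypothetical: it does not parse a real PDF. It just represents a two-column page the way a layout format effectively does, as text fragments pinned at (x, y) positions. Reading them strictly top-to-bottom interleaves the two columns, the kind of scrambling the copy-and-paste test can expose; grouping by column first recovers the intended order, but only because the sketch assumes the column boundary (`split_x`, an arbitrary value here) is already known.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """A piece of text placed at a fixed position on the page (hypothetical model)."""
    x: float  # distance from the left edge
    y: float  # distance from the top edge
    text: str


# A toy two-column page: each column holds one sentence spread over three lines.
PAGE = [
    Fragment(50, 100, "Left column,"),
    Fragment(50, 112, "first sentence"),
    Fragment(50, 124, "ends here."),
    Fragment(320, 100, "Right column,"),
    Fragment(320, 112, "second sentence"),
    Fragment(320, 124, "ends here."),
]


def naive_order(page):
    """Read strictly top-to-bottom, then left-to-right: the columns get interleaved."""
    return [f.text for f in sorted(page, key=lambda f: (f.y, f.x))]


def column_order(page, split_x=300):
    """Read the whole left column, then the right one (assumes a known column split)."""
    return [f.text for f in sorted(page, key=lambda f: (f.x >= split_x, f.y))]


if __name__ == "__main__":
    print(" ".join(naive_order(PAGE)))
    print(" ".join(column_order(PAGE)))
```

In a real PDF nothing records where the columns are, so a heuristic has to guess `split_x` (and cope with figures, footnotes, and equations). That is why this only works "for some class of documents."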

So, sure, you could do it, but I don’t think people would consider it a viable solution for most PDF documents.

3. rahul m - May 29, 2009

I wonder, what do you think of the DjVu format? It seems to be a lot better for scanned old papers. The problem of it displaying identically on all kinds of screens remains, but the benefits (no bloatware needed, drastically smaller file sizes) look to be worth it.

4. gordonwatts - May 29, 2009

I've not tried it out. I've had plenty of problems with the PDF format and bloatware (though Acrobat has gotten _a lot_ better since version 6, I still turn off most of what it wants to install to keep my system clean). But for what I'm talking about here, I really want a technology that will re-flow the text. For that you need the source, or the PDF needs to be encoded with enough information that you can extract the original intent (see the first commenter). You want the mark-up (the TeX, or the HTML, or something).

As far as a portable format, however, PDF is here to stay, I’m afraid. It is the layout format.

5. Ian Douglas - May 29, 2009

Gordon

Thanks for the link. I agree absolutely that PDF is a terrible format for web documents. Any static layout is going to be a disaster as soon as you put it onto a variety of different displays. Easy extraction of the text for re-flowing is the dream, of course, but I'm not sure how keen Adobe would be on that, and they still have the printing industry sewn up. What I spend an increasing amount of my time wondering is why academic journals want to print on paper at all. Just what does it get them that electronic publishing doesn't?

6. gordonwatts - May 29, 2009

I agree! But it is history. Actually, it still is. In HEP the big journals in the USA are PRL and PRD, and those are still paper. The archives are all set up so you can submit the same material there that you submit to the journals. Because, of course, when they started, the journals were the only thing in town.

As long as journals remain powerful in that sense, I think we will be stuck with the basic page format. I just don't see it changing. Just about every theorist and experimentalist I know has a few TeX template files they just copy out of a directory and start filling in. That is then output to PDF. That cycle will be hard to break. The best hope is if someone can figure out how to output TeX to a new re-flow format. My next post has a few comments on some of the stuff I know about (I know there is probably more out there).

There are other formats around now. There is an open eBook format, for example. I'm pretty sure it can't do math yet, but that is just a matter of time, right? (right!? Please!?! :-)).

In some sense we have the capability to reflow on the fly with arXiv. Almost all the documents are in TeX (but Word and ODF work too). In TeX you could respecify the page size and re-run. An iPhone probably doesn’t have the power to do that, and you have to deal with shrinking the pictures, which TeX won’t do for you automatically, but… you’d be close.
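The "respecify the page size and re-run" idea might look something like the Python sketch below. It is a hypothetical helper, not an arXiv tool: it inserts a `geometry` package setting (the `paperwidth`, `paperheight`, and `margin` options are standard `geometry` keys) just before `\begin{document}`, after which you would run LaTeX on the result as usual. It assumes the paper does not already load `geometry` and that its document class tolerates a page-size change; many real submissions (REVTeX two-column papers, for instance) would need more work, and figures still would not shrink on their own.

```python
def resize_page(tex_source: str, width_mm: float, height_mm: float,
                margin_mm: float = 5) -> str:
    """Return a copy of `tex_source` with a new page size (hypothetical helper).

    Assumes the document does not already load the geometry package.
    """
    geometry = (
        f"\\usepackage[paperwidth={width_mm}mm,paperheight={height_mm}mm,"
        f"margin={margin_mm}mm]{{geometry}}\n"
    )
    marker = "\\begin{document}"
    if marker not in tex_source:
        raise ValueError("no \\begin{document} found in the source")
    # Put the package line at the very end of the preamble.
    return tex_source.replace(marker, geometry + marker, 1)


# A minimal stand-in for an arXiv source file.
SOURCE = r"""\documentclass{article}
\begin{document}
Hello, arXiv.
\end{document}
"""

if __name__ == "__main__":
    # A roughly phone-sized page, in millimetres (arbitrary numbers).
    print(resize_page(SOURCE, 50, 75))
```

The heavy part, actually running TeX, is left out on purpose; as the comment says, that may be more than a phone can reasonably do.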

P.S. Thank Ann – she is the one who forwarded your link to me and got me started on this two-part riff.

7. Ian Douglas - May 29, 2009

Thank you Ann.

Some UK scientists (none of whom wanted to be named) have told me that funding bodies look down upon electronic publishing and that attempts they’ve made to start journals that would live entirely online have been stymied by fears of missing their publishing targets. Do you think the same might be true of US money men?

8. Ian Douglas - May 29, 2009

Thanks addressed to Ann, comment to you Gordon, by the way.

9. Moving Beyond PDF « Life as a Physicist - May 30, 2009

[…] Beyond PDF May 30, 2009 Posted by gordonwatts in Uncategorized. trackback Last time I used iPhone app’s as a lead in to writing about why PDF is no longer the best format to get […]

10. gordonwatts - June 1, 2009

So, I think in the USA there is definitely that view. But it goes deeper than just the funding agencies.

I’m not sure it is print vs non-print – even if that is effectively what it ends up being. It is prestige. And at the moment the big ones happen to be all print ones (with electronic editions, of course).

The grant agencies also rely on peer reviews, and if your peers think the same way, the grant agencies get that feedback in the peer review reports.

So, it will be a difficult cycle to break, I’m afraid. But it is like arXiv – it will start slowly and gain traction.

11. Dave Bacon - June 1, 2009

Hey Gordon,

Thanks for the kind words. It was fun writing arXiview.

I totally agree that PDF is really, really annoying on the iPhone. I'm actually surprised by how much time I do spend reading papers using arXiview on the bus, considering the print size.

I’ve thought a bit about how one might get around this for the arXiv. I don’t think it’s viable to parse the tex from the arXiv on the iPhone because of the computing power needed, but I need to run some tests to confirm this suspicion. A more viable architecture, I think, would be to do this parsing on my own server and then feed this to the app.

On the other hand, Stanza seems to be able to take PDFs and rescale their text on the iPhone. So maybe I need to think a bit harder about that path.

12. Dave Bacon - June 1, 2009

Ah, actually it seems Stanza does the conversion on a desktop and not on the iPhone. My bad.

13. gordonwatts - June 1, 2009

Dave! Really? Fun!? I've looked at Objective-C and just about ran screaming from the room. It is such an old language. Or were you able to use something more modern?

Yeah – I think the reformatting would have to be done on the fly, or you would have to translate it into HTML (see the post after this one). But I think it should be solved in general. I've got a 24-inch monitor at home. It should fit 6-8 columns when I read, not just one or two pages facing. I think with some thought it should be possible to do both with the same solution. The problem is the math. 🙂

14. Dave Bacon - June 2, 2009

Hey, Smalltalk is fun! I guess I've spent too much time here in the CS department: now I think learning any new language is fun. For Objective-C the barrier is so low (assuming you know any modern OO language) that it wasn't that bad. The iPhone SDK is fairly well thought out, with only one or two annoying issues that I ran into. Plus there is now a HUGE amount of online problems and solutions available for programming the iPhone, which is a fantastic resource.

15. gordonwatts - June 2, 2009

Smalltalk! 🙂 Well, I guess. I agree, new languages are fun. But OC is an old, old one! But you are right, those online resources are key. And if you have a good library behind you, it almost doesn't matter what language you are writing in.

16. PDFs are as popular as James Purnell at a Gordon Brown appreciation society meeting | News in brief - June 5, 2009

[…] are rumbling. My post last week reviewing academic paper readers for the iPhone sparked a blog post from Gordon Watts, professor of physics at the university of Washington. PDFs are pretty poor for screen reading, he […]

17. ScienceBlogs Channel : Physical Science | BlogCABLE.COM - July 3, 2009

[…] a similar note, I highly recommend Life as a Physicist who discusses issues with reading pdfs on small mobile displays. I’ve been playing around with some ideas for how […]

18. ScienceBlogs Channel : Technology | BlogCABLE.COM - July 3, 2009

[…] a similar note, I highly recommend Life as a Physicist who discusses issues with reading pdfs on small mobile displays. I’ve been playing around with some ideas for how […]

19. Jon - July 3, 2009
20. ken - September 15, 2009

I'm using the Stanza iPhone / iPod Touch app. When is it going to let me change the font size of a PDF document to make it more readable?

21. Gordon Watts - September 15, 2009

According to their web page you can already do that. I’ve not tried it before – but sounds interesting!

22. ArXiview 1.2 for iPhone OS 3.0 Out - The Quantum Pontiff - July 15, 2010

[…] a similar note, I highly recommend Life as a Physicist who discusses issues with reading pdfs on small mobile displays. I’ve been playing around with some ideas for how […]

23. George Alverson - October 4, 2010

Hi Gordon,

Those interested in presenting arXiv documents on the web may find this 2006 article on the preservation of TeX/LaTeX documents useful. LaTeX was designed to separate content from presentation, and the author, Ian Barnes of the Australian National University, discusses transforming TeX documents into other formats for web display.

This wouldn't help for those documents submitted to the arXiv without TeX source, nor would it settle the question of who would actually generate the web format. You wouldn't want to have the entire TeX Live distro in your iPod app, after all, plus you've still got all the figures…

Cheers,
George

24. George Alverson - October 4, 2010
25. Gordon Watts - October 7, 2010

George, thanks. I've seen that – there is a blogger out there at UT Austin who uses this LaTeX -> MathML conversion to do this inline in his blog.

If I were to do this, I'd put a front-end in front of arXiv that would download the TeX files, process them, and then cache them locally (100 bucks for a 2TB disk right now).
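The download, process, and cache front-end could be sketched in Python roughly as follows. Everything here is hypothetical: `ConvertedPaperCache`, the `fetch_source` and `convert` callables, and the `.html` file naming are illustrative assumptions, and the actual TeX-to-web conversion (the hard part) is left as a pluggable function. The point is the shape of the design: each paper is converted at most once, and later requests are served straight from local disk.

```python
import tempfile
from pathlib import Path
from typing import Callable


class ConvertedPaperCache:
    """Convert each paper once, then serve it from local disk (hypothetical design)."""

    def __init__(self, cache_dir: Path,
                 fetch_source: Callable[[str], str],
                 convert: Callable[[str], str]):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.fetch_source = fetch_source  # assumed: returns the TeX source for an arXiv id
        self.convert = convert            # assumed: TeX -> a re-flowable format (e.g. HTML + MathML)

    def _path_for(self, arxiv_id: str) -> Path:
        # Old-style ids such as "hep-ex/0101001" contain a slash; make them filename-safe.
        return self.cache_dir / (arxiv_id.replace("/", "_") + ".html")

    def get(self, arxiv_id: str) -> str:
        path = self._path_for(arxiv_id)
        if path.exists():  # cache hit: no download, no conversion
            return path.read_text(encoding="utf-8")
        converted = self.convert(self.fetch_source(arxiv_id))
        path.write_text(converted, encoding="utf-8")
        return converted


if __name__ == "__main__":
    # Stand-ins for the real download and conversion steps.
    def fake_fetch(arxiv_id: str) -> str:
        print("downloading", arxiv_id)
        return r"\section{Introduction} Some text."

    def fake_convert(tex: str) -> str:
        return "<h2>Introduction</h2> Some text."

    with tempfile.TemporaryDirectory() as tmp:
        cache = ConvertedPaperCache(Path(tmp), fake_fetch, fake_convert)
        cache.get("hep-ex/0101001")  # downloads and converts
        cache.get("hep-ex/0101001")  # served from the cache, no second download
```

Doing the conversion on a server like this also matches Dave's point above: the phone only ever receives the already re-flowable output.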

I really like the idea of DocBook and MathML, especially if some way can be found to preserve metadata (section headings, title, abstract, etc.). One could do some pretty amazing display things with something like that, I would think.

