arXiv Accepts OOXML!? February 28, 2008

Posted by gordonwatts in computers.

This just floored me. Dave mentioned this in a comment in one of my last posts. It looks like the major pre-print archive accepts the OOXML format now. I thought they only accepted PDF and .tex submissions.

This raises the question (though perhaps not for you): is there a tex2ooXML converter? (Some are found here, but the one or two I tried didn’t seem to work.) Hear me out before you write me off as crazy (as I think some of you already have). The reason this interests me is that PDF is a static page format. I now read almost all my papers on a screen. The size of the screen and its resolution rarely match letter-size paper. For example, my last laptop purchase was delayed 6 months because I needed a high-resolution screen so I could read my PDFs full screen when it was rotated on its side – in tablet mode. The increased screen resolution, btw, makes PDFs look a lot better – especially when the Computer Modern font is used (sorry, couldn’t resist that dig).

What I’d love is to be able to re-flow the documents on the fly to adjust to the screen size. Now, what I could do is run latex in the background. The upside is that no format translation is required and TeX is certainly up to the task. The downside is that automating this isn’t trivial – some programming work would be required (column sizes, screen sizes and resolutions, font rasterization?). On the other hand, Word will do re-flow and columns automatically, as well as resizing fonts. The downside there is that you would lose fidelity in the translation.
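To make the "run latex in the background" idea a bit more concrete, here is a minimal sketch of a paper whose page size is set to the screen rather than to letter paper. Everything specific in it is an assumption for illustration: the file name paper.tex, the macro names \screenw and \screenh, and the 6.0in by 3.5in default (roughly a 1024x600 pixel screen at about 170 dpi). It only uses the standard geometry package; it is not how arXiv or any viewer actually does this.

```latex
% paper.tex (hypothetical file name)
\documentclass[10pt]{article}

% Hypothetical macros holding the target screen size. \providecommand
% only defines them if the caller has not already done so, which lets a
% wrapper script supply the real values for each device.
\providecommand{\screenw}{6.0in}
\providecommand{\screenh}{3.5in}

% Make the "page" exactly the size of the screen, with thin margins.
\usepackage[paperwidth=\screenw, paperheight=\screenh, margin=0.15in]{geometry}

\begin{document}
The text is re-broken into lines and pages for whatever size was
requested, so a 20 page letter-size paper simply becomes many more
small pages.
\end{document}
```

A background process could then rerun something like pdflatex "\def\screenw{8in}\def\screenh{5in}\input{paper}" whenever the window size or orientation changes. Figures, two-column layouts, and font choices would still need handling, which is the programming work mentioned above.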

Where I’m particularly interested in this is the new MID devices – relatively weak CPUs, relatively low resolution, and small screens. I’ve seen Word reflow documents on these devices – it performs ok on a 100 page document (not ideal). I have no idea how long it would take to regenerate a 20 page latex paper (which would then be 100 pages or something). Would it be fast enough to be usable?



1. Joseph - February 28, 2008

So, if they’re doing OOXML, where’s the ODF support? It’s only fair IMHO.

2. gordonwatts - February 28, 2008

I would have thought they were doing that already. Does everyone release templates in ODF now? I’ve tried OpenOffice but never been very satisfied with it. I suppose it won’t matter much longer as there are all sorts of converters being written now that both specs are public. Soon both programs should be able to save and load things in either format.

3. gordonwatts - February 28, 2008

Huh. ODF is not listed as supported. Here is the list of formats: http://arxiv.org/help/submit – they accept HTML. Interestingly, they prefer PDF over OOXML, which I think is crazy, as OOXML and latex are flexible formats and PDF isn’t (i.e. if you have to do some reformatting or extract a figure, you can; in PDF it is much more difficult).

They must have had that question. If they haven’t, then I’m sure they are being pounded with it now. Sort of like what happens when someone insults the Mac on the net. 🙂

4. Joseph - February 28, 2008

No, very few people have ODF templates for their stuff. As a Linux user, I find it incredibly annoying, particularly when secret formats (e.g. MS Office pre-OOXML) are required (the reasons behind the annoyance with secret formats will have to wait if you’re interested, because I have to collect my thoughts and put together a fully coherent post, which is more involved than I can be atm due to the looming March Meeting). At least arxiv still does LaTeX, though. Better than many things I’ve heard of (personally, doing a paper in a word processor seems absurd after learning LaTeX, but that’s a subject for another post 😉).

Regardless, at least it’s a (semi-) open format, although I sincerely hope that more scientific ventures standardize on ODF. With ODF, you can join the committee and at least have a reasonable chance of changing the format to be truly interoperable and useful for our purposes, as opposed to being limited to querying Microsoft for more information to flesh out their spec (which you have minimal control over). If more scientific organizations were to join the ODF committee, we might finally have a shot at having a format which isn’t annoying for doing scientific work.

5. gordonwatts - February 28, 2008

Joseph – interesting. I learned LaTeX years – 10 years? – before I learned anything else (like Word). If I could figure out how to make Word work on figures and equations I’d probably never go back.

BTW, MS has now published the binary specification – so it is no longer secret. But this just happened a few weeks ago (trying to head off the EU fine… too bad!).

You can join the committee, huh? 🙂 Good luck with that. Also, there is quite a bit of stuff out on the web talking about which format is better specified (ODF: “OOXML is so complex you can’t use it and, to boot, you’ve stacked all the standards bodies!”, OOXML: “Uh, did you guys actually specify how you write formulas in your spreadsheet? And did you actually have *any* scrutiny when you were standardized!?”). I have no idea how it will turn out (i.e. will OOXML get standardized by ISO), but certainly the mess should improve both specifications. Frankly, as long as the formats are publicly specified, I don’t care too much.

6. Joseph - February 28, 2008

Regarding “Sort of like what happens when someone insults the Mac on the net.” I have to take issue with this. Certainly, if you criticize a Mac, the RDF kicks in and the Mac zealots start a-flamin’. However, as a veteran of many flamefests, I can tell you that this behavior is not unique to Apple. Indeed, every platform has its zealots, including Linux and “even” Windows. Criticizing any platform will potentially result in flamefests; which conversation will devolve into a flamefest is entirely dependent upon the specific people involved and not their platform of choice, despite prevalent stereotypes to the contrary. That said, I have noticed that those who are more intimately involved with the software itself can be some of the most rational about it (although there are always going to be exceptions), likely owing to the fact that they are all too painfully aware of the warts in it. I particularly notice it with the hardcore Linux coders (and have gone through the progression myself, although I’m still far from being able to be called nonpartisan) as they tend to be much harsher with themselves than I’ve seen proprietary coders being. I don’t know whether this is due to my more extensive experience with Linux coders (specifically GNOME) than with proprietary vendor coders or whether it is quite true in general.

7. gordonwatts - February 28, 2008

Joseph – what you say is correct – but the Apple folks have a reputation, and thus it is easy to pick on them. I’ve used Apple extensively in my past (was all I used for about 5 years of my life), and I’m a heavy Linux and Windows user now. I don’t code at the OS level when I can help it (the best I’ve done is some driver debugging in Linux). The only community I’ve ever really been a member of is the Apple one, and that was before these flamefests started. So now I tend to stay out of it – I know what I like, and I post about it sometimes. 😉 But the FUD on all sides over this format war is just out of hand. I don’t understand how people on both sides have so much time on their hands to post this stuff. Then again, here I am maintaining a blog…

8. Joseph - February 28, 2008

Certainly LaTeX and word processors have their warts, both in the particular implementation as well as in the general system, but I personally find that having the instructions inline with the content is of significant benefit.

“so it is no longer secret.” Yes. Sort of. And its value may be limited by the restrictions placed on it (particularly for GPL software, which their biggest current competitor is).

“trying to head of the EU fine… too bad!” IIRC, the EU fine was regarding things outside of the MS Office formats, specifically secret communications protocols between the Microsoft client and server operating systems. I wish I could get Microsoft's breaks–roughly 1/12 of a yearly *profit* in exchange for not complying with a government order for about 4 years? Dang.

Regarding joining the OASIS ODF committee–anyone is free to join. Why the sarcastic “good luck with that”?

Regarding zealots’ posturing on the intarwebs: I agree that there’s too much posturing and too little factual content.

9. Joseph - February 28, 2008

“I don’t understand how people on both sides have so much time on their hands to post this stuff.” Well, certainly some are white propaganda (e.g. Rob Weir and Brian Jones). There is certainly black propaganda going on as well (see for example Microsoft's internal documentation that came out in the Comes lawsuit about buying yourself journalists/experts to give biased, but not overtly so, opinions and stacking a panel). And then there are the zealots who are either there because they're emotionally committed (me 😉) or who are just spending their free time.

10. Joseph - February 28, 2008

As a FOSS developer, I’d like to latch on to “I know what I like and I post about it sometimes.”

What do you like about the software you use? Please be as specific as possible about features, behaviors, etc. that you find useful (or annoying 😉).

11. gordonwatts - February 28, 2008

So, RW and BJ are two I don’t get – where do they have the time to write the posts they write? RW, in particular, writes these long detailed posts! I guess when you are part of the standards body you can write that stuff without having to think about it. Also, frankly, from the outside, I just don’t get what the war is over – or at least, I don’t see that it is worth a fight of the size it has been made into by MS, IBM, and probably others I’m not very aware of (even Google has now sent around letters).

So – the difference between Word and latex is easy: a good UI is like a manual. In latex if I can’t remember how to do something I have no recourse but to try and search the local tex macro code or search the web. It took me forever, for example, to find out how to make a paragraph disappear in one version of my document or appear in another (the ifthen package’s \ifthenelse or the comment package will do it) – mostly because I didn’t know how to tell the search engines what to look for. With a good UI you hunt around a while and you find what you need. I’m not holding up Word as the best UI ever, but it is quite good.
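For anyone hunting for the same thing, here is a minimal sketch of the paragraph-that-appears-or-disappears trick, showing both routes. The switch name longversion and the environment name internalnote are made up for this example; \newboolean, \setboolean, and \ifthenelse come from the ifthen package, and \excludecomment and \includecomment from the comment package.

```latex
\documentclass{article}
\usepackage{ifthen}   % provides \newboolean, \setboolean, \ifthenelse
\usepackage{comment}  % provides \includecomment, \excludecomment

% Route 1: a boolean switch (hypothetical name) flipped once per version.
\newboolean{longversion}
\setboolean{longversion}{false}  % set to true for the long version

% Route 2: a named environment (hypothetical name) that is dropped.
% Use \includecomment{internalnote} instead to keep its contents.
\excludecomment{internalnote}

\begin{document}
This paragraph is in every version.

\ifthenelse{\boolean{longversion}}{%
  This paragraph is typeset only when longversion is true.
}{}

% With the comment package, \begin and \end must each sit on their own line.
\begin{internalnote}
This paragraph is skipped while internalnote is excluded.
\end{internalnote}
\end{document}
```

The boolean route is handy when one switch controls many scattered pieces; the comment route is simpler when whole blocks come and go.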

One thing I don’t like about UI programs is that they all tend to grab more resources. I use Outlook as my email/calendar/task/computer sync program (I have about 4 gigs of stuff in it right now), and it can really be quite slow, especially when search decides to index all of my email.

I think a good UI is a more natural way to interact with a computer. However, it is much harder to design a good UI than it is to design a good command line interface. The other downside to a UI is that it isn’t composable the way a good set of commands is (i.e. pipes in most *nix shells). I wonder if anyone has solved that problem yet?
