
ROOT, Python, and .NET February 25, 2007

Posted by gordonwatts in computers.

Nothing is ever fast enough, except when it takes too long to code…

I’ve long been a fan of the python language. As I’ve said previously, I think C++ is a dead language: there is very little research going on and proposed language improvements are quite limited. Languages like Python, C#, and Java (along with others) are all adding features at an impressive rate that, I think, make them much more productive languages.

I’ve used python for years. And I’ve used ROOT for years. Finally, Wim, down at LBNL, married the two. Previously, the only way to use ROOT was the C++ interpreter, called CINT, bundled with ROOT. With pyROOT, however, one could use python. It was fantastic. I wrote lots of analysis code in python. And I got it done so much faster.

Only one problem — it is slow! The interface between python and ROOT has a fair amount of work to do. Further, it is quite flexible and must make a number of decisions at runtime. The result is that each time you call into ROOT from python some extra processing must occur.

I’m also a big fan of C#. However, calling into real C++ code (i.e. ROOT) from C# is difficult. You have to leave the managed world (i.e. the CLR) and go into the native C++ world. Other than writing managed C++, this is a big pain. I’ve often generated small stubs of code to enable some small project. It occurred to me just before going on vacation to Vancouver that this process could be automated: ROOT includes full class meta-data.

Add to this that there is now an implementation of Python 2.4 on the CLR: IronPython. I could do an apples-to-apples comparison: normal python vs CLR python!

So, that is what I did on my short vacation trip to Vancouver, while Paula was asleep. I wrote some code that would automatically produce wrapper files for a small set of ROOT classes. Enough to create a file, fill a histogram, and close the file. And then I timed them with ROOT’s TStopwatch class. Here are the results:

Raw C++: 7.2 seconds

C#: 11.3 seconds

Raw Python: 143 seconds

IronPython: 64 seconds
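
To put these totals on a per-call footing, here is a small back-of-the-envelope sketch. It assumes all four runs executed the same 100,000,000-iteration loop shown in the IronPython listing further down, and it is written for a current Python 3 interpreter rather than the Python 2.4 used for the measurements. The names `N_CALLS`, `timings_s`, and `per_call_ns` are made up for this illustration.

```python
# Totals quoted in the post, in seconds.
# Assumption: every run performed the same 100,000,000 Fill calls.
N_CALLS = 100_000_000
timings_s = {
    "Raw C++": 7.2,
    "C#": 11.3,
    "Raw Python": 143.0,
    "IronPython": 64.0,
}


def per_call_ns(total_s, n_calls=N_CALLS):
    """Average wall time per loop iteration, in nanoseconds."""
    return total_s / n_calls * 1e9


baseline_s = timings_s["Raw C++"]
for name, total_s in timings_s.items():
    # Print the per-iteration cost and the slowdown relative to raw C++.
    print("%-11s %8.1f ns/call  %5.1fx C++"
          % (name, per_call_ns(total_s), total_s / baseline_s))
```

Under that assumption the numbers work out to roughly 72 ns per iteration for raw C++, 113 ns for C#, 640 ns for IronPython, and 1430 ns for regular Python, so the interesting quantity is the extra cost per crossing into C++ rather than the totals themselves.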

Wow, IronPython is about 2x faster. Note this is a statement about how one moves from the python world to the C++ world, not about the overall speed of IronPython. But it is interesting. Here we are going from Python -> .NET CLR -> C++ and it is 2x faster than going from Python -> C++. Now, I’m willing to bet you good money the reason for this difference is that the Python implementation of the ROOT interface is much more flexible than the .NET one I’ve created: the regular python interface has so-called late binding. That is, it can handle any object you give to it, while my .NET translation can only handle objects that were previously converted. If pyROOT used the same fixed-binding approach then I suspect it would be hugely faster than it is now. Hmmm, one can build .NET objects on the fly. I wonder what it would be like if one automated that translation? That is a kind-a cool idea, isn’t it? 🙂
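
The late-binding guess above can be pictured with a toy model. This is only a sketch of the hypothesis, not how pyROOT or the generated .NET wrappers are actually implemented: `FakeHist`, `late_bound_fill`, and `pre_bound_fill` are invented stand-ins, and the code targets a current Python 3 interpreter.

```python
class FakeHist:
    """Stand-in for a histogram with two Fill overloads; not the ROOT class."""

    def __init__(self):
        self.entries = 0.0

    def _fill_x(self, x):
        # Overload 1: Fill(x) adds one entry.
        self.entries += 1.0

    def _fill_x_w(self, x, w):
        # Overload 2: Fill(x, w) adds an entry with weight w.
        self.entries += w


def late_bound_fill(hist, *args):
    """Flexible path: inspect the arguments on every call to pick an overload."""
    if len(args) == 1 and isinstance(args[0], (int, float)):
        return hist._fill_x(float(args[0]))
    if len(args) == 2 and all(isinstance(a, (int, float)) for a in args):
        return hist._fill_x_w(float(args[0]), float(args[1]))
    raise TypeError("no matching Fill overload for %r" % (args,))


def pre_bound_fill(hist, x):
    """Fixed path: the overload was chosen when the wrapper was written."""
    return hist._fill_x(x)


h = FakeHist()
late_bound_fill(h, 9)        # resolved at call time
late_bound_fill(h, 9, 2.0)   # a different overload, also resolved at call time
pre_bound_fill(h, 9.0)       # no per-call decision at all
print(h.entries)
```

The flexible path pays for its argument checks on every iteration of a tight loop, while the fixed path does not. Whether that accounts for the measured factor of two is exactly the open question.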

If you are curious about the IronPython source code, here it is:

import clr
clr.AddReferenceToFileAndPath("G:\\users\\gwatts\\Documents\\Visual Studio 2005\\Projects\\ROOT Class Interface Maker\\Test - TH1F\\bin\\Release\\ROOTDotNet.dll")

from ROOT import TFile, TH1F, TStopwatch

# Time everything from file creation to close.
sw = TStopwatch()
sw.Start(1)

f = TFile("bogus.root", "RECREATE", "", 1)
h = TH1F("hi", "there", 10, 0.0, 10.0)

# Fill the histogram 100 million times; each call crosses into C++.
for i in xrange(100000000):
    h.Fill(9)

f.Write("0", 0, 0)
f.Close("")

sw.Stop()
sw.Print("")

Those of you who know pyROOT will note that, other than the first two lines, the code is basically the same (note: I’ve not implemented default arguments, which is why every single argument is spelled out).

Now, the only thing left in this proof-of-principle is that I can open the file I wrote out, grab the TH1F object, and see if I can get the contents of bin #9. This is complex because TFile::Get returns an anonymous TObject and it has to be turned into a TH1F interface. This is tough in .NET because it doesn’t support multiple inheritance, something that ROOT takes full advantage of.
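
One possible way to picture the cast problem is a lookup table from the object’s runtime class name to a typed wrapper. The sketch below is purely illustrative and makes several assumptions: `TObjectProxy`, `TH1FWrapper`, `WRAPPERS`, and `downcast` are hypothetical names, not part of ROOT or of the wrapper generator described here, and the code targets a current Python 3 interpreter.

```python
class TObjectProxy:
    """Stand-in for an untyped handle, like what a generic Get call hands back."""

    def __init__(self, class_name, payload):
        self._class_name = class_name
        self.payload = payload  # fake bin contents, keyed by bin index

    def class_name(self):
        # The runtime class name is the only type information we assume.
        return self._class_name


class TH1FWrapper:
    """Hypothetical typed wrapper exposing histogram-style methods."""

    def __init__(self, proxy):
        self._proxy = proxy

    def get_bin_content(self, bin_index):
        return self._proxy.payload[bin_index]


# Registry of known wrappers, keyed by runtime class name.
WRAPPERS = {"TH1F": TH1FWrapper}


def downcast(proxy):
    """Return a typed wrapper if one is registered, else the untyped handle."""
    wrapper_cls = WRAPPERS.get(proxy.class_name())
    if wrapper_cls is None:
        return proxy
    return wrapper_cls(proxy)


hist = downcast(TObjectProxy("TH1F", {9: 100.0}))
print(type(hist).__name__, hist.get_bin_content(9))
```

A registry like this sidesteps inheritance entirely, at the cost of one lookup per retrieved object. It does nothing about the multiple-inheritance mismatch itself.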

Comments»

1. Cheat Code » ROOT, Python, and .NET - February 25, 2007

[…] post by gordonwatts and software by […]

2. superweak - February 25, 2007

I’m a great fan of Python and PyROOT, and know very little about .NET or IronPython (though I should learn sometime) — so pardon my dumb questions —

1. I assume the “normal Python” results don’t feature something like psyco?
2. Is it typical for IronPython to be 6 times slower than C# even though they’re calling the same wrapper code?

3. gordonwatts - February 25, 2007

@superweak:

1. Yes — nothing like psyco. What is that? I’m not familiar with it. This is a build of python 2.4.2 downloaded from the python web site. Nothing extra was downloaded (other than the ROOT build).
2. So we have to be careful here: this isn’t a comparison of the speed of python to IronPython or to C# or C++. Python and IronPython are dynamic languages. C# and C++ are both statically typed languages. This means that a lot of the work that the C# and C++ compilers do at compile time has to be done at run time in python and IronPython. That said, I don’t think it is very surprising. The canonical wisdom around here is that you use Python to stitch together compute-intensive tasks and write those tasks in C++ (because it is fast). But I bet there is a speed comparison between C++ and python somewhere on the net. Finally, I can’t find it now, but I remember the guy writing IronPython posting some speed tests and comparisons to python. IronPython was faster in some and slower in others, but never by a lot. So I suspect the speeds are, for the most part, the same. Finally, you can see the “hit” that moving from the .NET world to C++ takes: that is the difference (mostly, I suspect) between the C# and C++ programs I have listed up there. So it is about 4 seconds or so (once compiled, .NET and C++ are very close for something like this). Again, I don’t know the benchmarks, but they must be out there.

4. superweak - February 25, 2007

Psyco is something like a JIT for Python. One of its great features is it figures out what types you’re actually using at runtime and emits specialized code for those cases, so you don’t take that hit every time through the loop. I routinely get 2-10x speedups with it, at the expense of a lot of memory use. I find it brings PyROOT up to CINT speeds or better. (Only runs on x86 though.)
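
As a rough analogy for the specialization idea (and only an analogy: Psyco emits machine code, whereas this sketch merely caches a dispatch decision), the pattern looks something like the following. `SpecializingCaller`, `HANDLERS`, and the two fill functions are invented for illustration, and the code assumes a current Python 3 interpreter.

```python
def fill_float(state, x):
    # Fast path for values that are already floats.
    state["sum"] += x


def fill_int(state, x):
    # Integers are converted before being accumulated.
    state["sum"] += float(x)


HANDLERS = {float: fill_float, int: fill_int}


class SpecializingCaller:
    """Resolve a handler for each argument type once, then reuse it."""

    def __init__(self):
        self._cache = {}
        self.lookups = 0  # how many times the slow path ran

    def _resolve(self, arg_type):
        # Slow path: search the handler table for a compatible type.
        self.lookups += 1
        for known_type, handler in HANDLERS.items():
            if issubclass(arg_type, known_type):
                return handler
        raise TypeError("no handler for %s" % arg_type.__name__)

    def __call__(self, state, x):
        arg_type = type(x)
        handler = self._cache.get(arg_type)
        if handler is None:
            handler = self._resolve(arg_type)
            self._cache[arg_type] = handler
        return handler(state, x)


call = SpecializingCaller()
state = {"sum": 0.0}
for _ in range(1000):
    call(state, 9)  # the type is resolved on the first pass only
print(call.lookups, state["sum"])
```

The type decision is made once per distinct argument type instead of once per loop iteration, which is the property that matters in a loop that always passes the same types.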

My understanding is that the CLR does JIT compilation of IronPython’s bytecode?

I agree with the canonical wisdom but I place the bar for “compute intensive” a lot higher — I happily analyze 10 million event ntuples using PyROOT, since my jobs tend to be I/O bound anyway. Track finding would of course be a different story.

Anyway have fun with IronPython! (Now I just have to find me one of those mysterious Ruby interface users. Maybe they’re all at KEK.)

5. gordonwatts - February 25, 2007

Ah, nice! I’ll give that a try and see how that compares to the current numbers I have. I was thinking most of the CPU time was spent in the pyROOT interface, so I wouldn’t have thought you’d get much speed up. But I’ll test it out.

Yes, the CLR does JIT on IP’s bytecode. But if psyco emits special code that is customized by type, it may be able to do a better job than what I’ve read of IP’s method. I wonder if there are speed comparisons out there?

Yeah, that sounds pretty nice. Have you ever tried out PROOF? Now that I have some dual core machines I should see if that improves the speed (that was the recommended way to handle dual core machines from several people in the ROOT community).

6. Gordon Chalmers - February 26, 2007

Check this article out Gordon: you got yourself supersymmetry.

6. physics/0605114 [abs, ps, pdf, other] :
Title: Supersymmetry and B_s, DO, and Aleph i
Authors: Gordon Chalmers
Comments: 14 pages, LaTeX
Subj-class: General Physics

7. superweak - February 27, 2007

Never tried proof (though the idea sounds nice) … we were all excited by our local computers reporting dual cores, then sighed when we realized it was HyperThreading, not an actual second core.

8. Jeff - March 5, 2007

Hi Gordon and superweak,

There is a bench marking paper comparing heavily templated c++ with generics in both java and c# .net 2.0. Using the scimark 4.0 bench marking code (with the usual caveats); java did a little better than c++. I don’t want to quote the c# comparison because it used the *beta* .net 2.0 and was therefore not optimized.

Also, you might want to mention the .net framework tool ngen.exe. It compiles CIL .net byte code down to native code…ANY .net byte code whether it originated from VB, C#, C++, Ironpython or fortran .net for that matter (many languages have been ported to .net including Eiffel). Thus you can avoid the JIT delay in your programs.
See http://msdn2.microsoft.com/en-us/library/6t9t5wcf(VS.80).aspx

For a discussion of Ironpython’s architecture see,
http://msdn.microsoft.com//msdnmag/issues/06/10/clrinsideout/default.aspx

A similar facility was available for python called py2exe (I don’t know the present status of it though).

So, I would think an accurate flat-out flop comparison of Ironpython to python would be ngen.exe compiled ironpython to py2exe compiled python.

As far as c++ vs. c# or c++ vs. MANAGED c++ the same arguments apply. So, you can remove the JIT delays however you still have some delays that are present in all garbage-collected run-times.

This would be very interesting as .net will optimize across files AND produces processor-specific code.

Finally, as a .net fan you may be interested in
cs-script: http://csscript.net/

Since hep phys is linux dominated, you may wish to look at Grasshopper: An asp.net cross-compiler to java byte code, i.e., run asp.net 2.0 on tomcat or geronimo or any j2ee compliant server.
See http://www.mainsoft.com/index.aspx

Cheers and keep having fun…Jeff

9. gordonwatts - March 5, 2007

Jeff — wow — thanks for all your pointers! I have heard of ngen and always ignored it — I have always thought the JIT overhead was quite small. However you are correct — I should test it.

On the other hand, this test was meant to test how fast one could call a C++ program from python, C#, and IronPython. The code I wrote is, as you can see, very simple, so I doubt there was much garbage collecting going on in the .NET code (though perhaps in the IronPython code). I’m also curious to know how well mono does in something like this. I’m willing to bet, however, that the translation between the managed world and the C++ world is totally different (but I have no idea).

Can you point me to this paper that does these speed comparisons? I’m a bit surprised that Java did better than compiled C++.

I’m also glad to hear about the java byte code translators. I’ve been wondering if it was because of the web sites I looked at or something else that made it seem like a lot of the language development was going on for the .NET CLR rather than the java engine. Good times.

P.S. Sorry your comment didn’t appear right away — it got labeled as spam.

10. JefferyB - March 6, 2007

Hi Gordon,

I figured the links in my previous post might get it sent to the spam can.

Here is the ref to the paper:
http://www.csd.uwo.ca/~watt/pub/reprints/2005-synasc-scigmark.pdf

I too was a bit surprised by the results. Note that c++ w/templates beat the generic version of java. However, “specialized” java beat “specialized” c++.

I think the progenitor paper that launched this study can be found here:
http://msdn.microsoft.com/msdnmag/issues/04/03/ScientificC/

Note the Dragan and Watt paper wanted to compare *compilers*, NOT languages. Therefore, they coded c# exactly as if it were java which, although useful for their intended study, is nonsensical for real world c# apps. C# structs, including generic structs, can go on the stack: no boxing/unboxing. C# has “unsafe” code blocks where you can “pin” objects in memory and do pointer arithmetic etc.

In any case, neither the Dragan paper nor Gilani in the microsoft column employed the above, but both did mention these coding techniques.

As far as I’m aware, C# (in theory) should be able to outperform java. I’m not sure why it is not pursued more rigorously.

Java generics are all objects because the design team didn’t want to break/alter the JVM. .net simply releases a new run time .net 2.0, 3.0 (now 3.5).

If interested, a nice (new) description of generics (from Essential C# 2.0) can be found here:
http://mark.michaelis.net/EssentialCSharp/Generics_ch11.pdf

See p 39. in above chapter and
http://www.artima.com/intv/generics.htm
for a c# vs. java generics comparison.

One more package you may be interested in:

ZedGraph. It’s on sourceforge. It’s a graphing/charting package in c# (winforms and asp.net) and it has a nice strip chart component as well.

Take care…Jeff

11. superweak - March 14, 2007

Just to clear something up on py2exe: that’s really just a bundling of the Python interpreter with Python bytecode for distribution on Windows systems (where the typical user doesn’t have Python installed). As far as I know the only way to take CPython down to native code is psyco, where you will incur JIT overhead.

12. Gordon Watts - March 15, 2007

Thanks, there is an awful lot of information here and it will take me some time to digest. Term is over now, and I have some “spare” time… I hope to get to it.

13. A nice article on ROOT, Python, PyROOT, C++, C# and IronPython « Riccardo-Maria Bianchi - April 28, 2007

14. Wim Lavrijsen - May 25, 2007

Hi,

just for the record:

o) late-binding or providing a fixed set of bindings makes, by the very nature of python classes (which are always dynamic) no difference

o) TH1F::Fill() is heavily overloaded, so the actual method dispatching implementation makes all the difference (really, nothing else counts)

o) Psyco does not deal with C/C++ extension calls (as all of PyROOT is), so it won’t optimize it, although it could be “taught” to understand PyROOT dispatching

o) Getting a factor of 2 extra out of PyROOT on your example above is rather easily accomplished by adding these lines at the top:

import ROOT
ROOT.SetSignalPolicy( ROOT.kSignalFast )

which removes a safety net that on an average program is

15. Wim Lavrijsen - May 25, 2007

Hi again,

my last comment got cut off (apparently it doesn’t like the sign for less than). Anyway, I intended to say that on an average program the safety net has an overhead of less than 10 per cent, whereas it is a complete killer in tight loops like above.

Cheers,
Wim

