VMs, architectures and snakes eating their tails
TL;DR: PyPy is insanely cool and you should go check it out
When this mad idea first came to mind I thought it would be cool to write a compiler (which would have been awesome for some obvious reasons), but I’ve decided to stick with an interpreter this time, mainly because I like them more and they’re more flexible (everything exists at runtime).
The next step was to decide how to structure inori. At first I kinda wanted to follow the path Lua chose a while ago (basically a really, really small C VM that’s easy to embed), but after some digging I discovered PyPy.
So what is it exactly? First of all, it’s a Python implementation written in RPython, and it’s also a translation toolchain, which makes it an ideal platform for implementing VMs (fast ones, too).
It basically works this way: RPython is a restricted subset of Python amenable to static analysis, so the toolchain can translate it to multiple backends (currently C, JVM and CLI), and on top of that it automatically adds a meta-tracing JIT compiler and a garbage collector (all for free, no strings attached!). As some of you may know, a tracing JIT is a compiler that traces the path your code follows, finds hot loops (code that executes frequently) and compiles them to native code for later runs. But a meta-tracing JIT? Yeah, that’s one of my favourite parts.
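To give a taste of what “translation” looks like, here’s a minimal sketch of an RPython-translatable program. The `target` hook is the entry point the translation toolchain looks for; the message and argument handling are just my own placeholders:

```python
import os

def entry_point(argv):
    # RPython forbids most dynamic Python features, but plain os.write
    # on file descriptor 1 (stdout) translates cleanly to C
    os.write(1, b"hello from RPython\n")
    return 0

def target(driver, args):
    # the RPython translation toolchain calls this hook to discover
    # which function is the program's entry point
    return entry_point, None
```

The same file also runs as ordinary Python, which is how RPython development usually works: debug under CPython, then translate to a native binary once it behaves.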
When you have an interpreter and you want to add a JIT compiler, you make a lot of assumptions based on the object model of your language (among other things) in order to make it blazing fast. Those choices make it really hard to build a platform that is general-purpose and language-independent, so how did these guys do it?
The trick is to consider how an interpreter works… it’s basically a GIGANTIC switch that loops as long as the program runs. This means that, given the same input, your interpreter will follow the same path. This is exactly what PyPy’s toolchain exploits: it doesn’t care how your programming language is actually structured, because all it traces and optimizes (given a few hints) is the code the interpreter executes while running your program (hence the name meta-tracing)! And since the interpreter is written in a language the toolchain already knows (RPython), all sorts of optimizations are possible, and they’re totally transparent to the implementation of the language :D
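That “gigantic switch plus a few hints” idea can be sketched concretely. Below is a toy two-instruction interpreter with PyPy-style JIT hints; the opcodes and the language are made up, and the `JitDriver` import falls back to a no-op stub so the sketch also runs under plain CPython:

```python
try:
    # available when translating with the RPython toolchain
    from rpython.rlib.jit import JitDriver
except ImportError:
    class JitDriver(object):
        # no-op stand-in so the sketch runs under plain CPython
        def __init__(self, greens=None, reds=None):
            pass
        def jit_merge_point(self, **kwargs):
            pass

# "greens" identify a position in the *user's* program; "reds" are the
# mutable interpreter state -- these hints are roughly what the
# toolchain needs to spot hot loops in the interpreted code
driver = JitDriver(greens=['pc', 'program'], reds=['acc'])

ADD, JNZ = 'ADD', 'JNZ'   # hypothetical opcodes: add to acc, jump if nonzero

def interpret(program):
    pc = 0
    acc = 0
    while pc < len(program):
        # the meta-tracing JIT hooks in here, tracing the interpreter's
        # own execution path through the switch below
        driver.jit_merge_point(pc=pc, program=program, acc=acc)
        opcode, arg = program[pc]
        if opcode == ADD:                      # one arm of the big switch
            acc += arg
            pc += 1
        elif opcode == JNZ:                    # jump to arg if acc != 0
            pc = arg if acc != 0 else pc + 1
        else:
            raise ValueError('unknown opcode')
    return acc

# count down from 5: the ADD/JNZ pair at pc 1-2 is the hot loop a
# tracing JIT would compile to native code
print(interpret([(ADD, 5), (ADD, -1), (JNZ, 1)]))  # -> 0
```

In a real RPython interpreter the greens must be immutable and there are more hints available (`can_enter_jit`, promotion, eliding), but the shape is the same: the dispatch loop is ordinary code, and the JIT comes from the toolchain.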
If you’re interested you should go check out topaz, an awesomely written implementation of Ruby in RPython, compiled to native code (it outperforms MRI by ~4x).
Some articles I found really enlightening:
The Impact of Meta-Tracing on VM Design and Implementation
Fast Enough VMs in Fast Enough Time