How Java 8 Handles JavaScript: A Look at the New Nashorn Compiler

In Compiling Lambda Expressions: Scala vs. Java 8, I looked at how Java 8 and Scala implement lambda expressions. As we know, Java 8 not only improves the javac compiler, it also introduces a brand-new one: Nashorn.

This new engine is meant to replace Rhino, Java's existing JavaScript interpreter. It should bring the JVM to the forefront when it comes to executing JavaScript at speed, right up there with the V8s of the world (hopefully we'll finally get past that "car is to carpet" thing :)). So I thought this would be a good time to bring Nashorn into the mix by looking under the hood and seeing how it compiles lambda expressions (especially compared to Java and Scala).

The lambda expression we'll look at is similar to the one we tested with Java and Scala.

Here's the code:

import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

ScriptEngineManager manager = new ScriptEngineManager();
ScriptEngine engine = manager.getEngineByName("nashorn");

String js;

js = "var map = Array.prototype.map \n";
js += "var names = [\"john\", \"jerry\", \"bob\"]\n";
js += "var a = map.call(names, function(name) { return name.length() })\n";
js += "print(a)";

engine.eval(js);

Looks innocent enough, right? But just wait and see…

Getting to the bytecode

Our first challenge is to get hold of the actual bytecode that the JVM sees. Unlike Java and Scala, whose compilers are persistent (i.e. they generate .class / jar files on disk), Nashorn compiles everything in memory and passes the bytecode directly to the JVM. Luckily, we have Java agents to come to the rescue. I wrote a simple Java agent to capture and persist the resulting bytecode. From there it's a simple javap to print the code.
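For readers who want to try this, here is a minimal sketch of what such an agent could look like. It is not the agent I used; the class name BytecodeDumpAgent, the output directory and the package prefix used to spot Nashorn's generated classes are all assumptions made for illustration.

```java
import java.io.IOException;
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.ProtectionDomain;

// Hypothetical agent: writes the bytes of matching classes to disk as they are defined.
public class BytecodeDumpAgent {

    // The JVM calls this before main() when started with -javaagent:<agent jar>=<output dir>
    public static void premain(String agentArgs, Instrumentation inst) {
        Path outDir = Paths.get(agentArgs == null ? "dumped-classes" : agentArgs);
        inst.addTransformer(new DumpingTransformer(outDir));
    }

    static final class DumpingTransformer implements ClassFileTransformer {
        private final Path outDir;

        DumpingTransformer(Path outDir) {
            this.outDir = outDir;
        }

        @Override
        public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain, byte[] classfileBuffer) {
            // Assumption: Nashorn's generated script classes live under this package.
            if (className != null && className.startsWith("jdk/nashorn/internal/scripts/")) {
                try {
                    // Generated names can contain characters that are awkward in file names.
                    String fileName = className.replaceAll("[^A-Za-z0-9_$.]", "_") + ".class";
                    Files.createDirectories(outDir);
                    Files.write(outDir.resolve(fileName), classfileBuffer);
                } catch (IOException | RuntimeException e) {
                    // A failed dump should never break class loading.
                    System.err.println("Could not dump " + className + ": " + e);
                }
            }
            return null; // null tells the JVM to keep the class bytes unchanged
        }
    }
}
```

The agent jar needs a Premain-Class entry in its manifest pointing at the agent class, and the dumped files can then be inspected with javap -c -p. Whether every generated class reaches a transformer depends on how the engine defines it, so treat this as a starting point.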

If you remember, I was pretty happy to see how the new Java 8 compiler uses the invokeDynamic instruction introduced in Java 7 to link to the Lambda function code. Well, with Nashorn they really went to the races with it. Everything now is completely based on it. Take a look below.

Reading the bytecode

invokeDynamic. Just so we’re all on the same page, the invokeDynamic instruction was added in Java 7 to allow folks writing their own dynamic languages to decide at runtime how to link code.

For static languages like Java and Scala, the compiler decides at compile time which method will be invoked (with some help from the JVM runtime for polymorphism). The runtime linking is done via standard ClassLoaders to look up the class. Even things like method overload resolution are done at compile time.
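A small Java example makes the compile-time part concrete. The class and method names here are made up for illustration; the point is that the overload is picked from the static type of the argument, not from the object it holds at runtime.

```java
public class StaticLinkDemo {

    static String describe(Object o) {
        return "Object overload";
    }

    static String describe(String s) {
        return "String overload";
    }

    public static void main(String[] args) {
        Object value = "john"; // static type Object, runtime type String

        // javac binds this call to describe(Object) because of the declared type.
        System.out.println(describe(value));          // prints "Object overload"

        // Only a cast, visible at compile time, selects the other overload.
        System.out.println(describe((String) value)); // prints "String overload"
    }
}
```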

Dynamic vs. static linkage. Unfortunately, for languages which are more dynamic in nature (and JS is a good example) static resolution may not be possible. When we say obj.foo() in Java, either the class of obj has a foo() method or it doesn’t. In a language like JS that will depend on the actual object referenced by obj at runtime — a nightmare scenario for a static compiler. A compile-time approach to linking in this case just doesn’t work. But invokeDynamic does.

InvokeDynamic enables deferring of linkage back to the writers of the language at run-time, so they can guide the JVM as to which method they would like to call, based on their own language semantics. This is a win-win situation. The JVM gets an actual method to link to, optimize and execute against, and the language makers control its resolution. Dynamic linking is something we’ve had to work hard to support in Takipi.
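To get a feel for what "deciding at runtime" means, here is a rough sketch that uses the java.lang.invoke API directly from Java. The helper callIntMethod is hypothetical and far simpler than anything a real language runtime does: it looks up a method by name on whatever class the receiver turns out to have, then calls it through a method handle.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.ArrayList;
import java.util.Arrays;

public class LateBindingDemo {

    // Hypothetical helper: resolve a public no-arg method returning int
    // against the receiver's runtime class, then invoke it.
    static int callIntMethod(Object receiver, String name) throws Throwable {
        MethodHandle handle = MethodHandles.publicLookup()
                .findVirtual(receiver.getClass(), name, MethodType.methodType(int.class));
        return (int) handle.invoke(receiver);
    }

    public static void main(String[] args) throws Throwable {
        // The same helper links to String.length() in one case...
        System.out.println(callIntMethod("john", "length"));
        // ...and to ArrayList.size() in the other, decided only at runtime.
        System.out.println(callIntMethod(new ArrayList<>(Arrays.asList("a", "b")), "size"));
    }
}
```

A real invokeDynamic call site does this kind of lookup once, at link time, and then lets the JVM optimize the resulting handle; the sketch repeats the lookup on every call to keep things short.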

How Nashorn links

Nashorn really makes effective use of this. Let's look at our example to understand how this works. Here's one of the first invokeDynamic instructions in the script's code, used to retrieve the prototype property of the JS Array object:

invokedynamic 0 "dyn:getProp|getElem|getMethod:prototype":(Ljava/lang/Object;)Ljava/lang/Object;

Nashorn is asking the JVM to pass it this string at runtime, and in exchange it will return a handle to a method which accepts an Object and returns one. As long as the JVM gets a handle to such a method, it can link.

The method responsible for returning this handle (also known as a bootstrap method) is specified in a special section in the .class file which holds a list of available bootstrap methods. The 0 value you see is the index within that table of the method which the JVM will invoke to get the method handle to which it will link.
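Java source can't emit an invokeDynamic instruction directly, but the contract of a bootstrap method can be imitated by hand. The sketch below is an illustration only: BootstrapSketch and describe are invented names, and the target method just echoes its inputs instead of doing a real property lookup. What it does show is the shape of the exchange: the name string and the descriptor go in, and a call site holding a handle of the requested type comes out.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class BootstrapSketch {

    // Stand-in for the runtime's lookup logic (hypothetical).
    public static Object describe(String name, Object receiver) {
        return name + " of " + receiver;
    }

    // Same parameter shape as a bootstrap method: the caller's lookup,
    // the string from the instruction, and its descriptor as a MethodType.
    public static CallSite bootstrap(MethodHandles.Lookup lookup, String name, MethodType type)
            throws ReflectiveOperationException {
        MethodHandle target = lookup.findStatic(BootstrapSketch.class, "describe",
                MethodType.methodType(Object.class, String.class, Object.class));
        // Bind the name so the remaining handle takes an Object and returns an Object.
        MethodHandle bound = MethodHandles.insertArguments(target, 0, name);
        return new ConstantCallSite(bound.asType(type));
    }

    public static void main(String[] args) throws Throwable {
        // (Ljava/lang/Object;)Ljava/lang/Object; expressed as a MethodType
        MethodType descriptor = MethodType.methodType(Object.class, Object.class);
        CallSite site = bootstrap(MethodHandles.lookup(),
                "dyn:getProp|getElem|getMethod:prototype", descriptor);
        // In real bytecode the JVM performs this step once and links the instruction to the handle.
        Object result = site.dynamicInvoker().invoke((Object) "Array");
        System.out.println(result);
    }
}
```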

The Nashorn folks did a very cool thing in my opinion, and instead of writing their own library for resolving and linking code, they went ahead and integrated dynalink, an open source project aimed at helping dynamic languages link code based on a unified platform. That’s also why you see that “dyn:” prefix at the beginning of each string.
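The strings themselves are easy to take apart. Going only by the examples in this post, they seem to follow a prefix:operations:name layout, with alternative operations separated by a pipe. The sketch below splits a string along those lines; it is my own reading of the format, not dynalink's parser, and the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class DynOperationSketch {
    final String prefix;           // e.g. "dyn"
    final List<String> operations; // e.g. [getProp, getElem, getMethod]
    final String name;             // e.g. "prototype", or null when absent (as in "dyn:call")

    private DynOperationSketch(String prefix, List<String> operations, String name) {
        this.prefix = prefix;
        this.operations = operations;
        this.name = name;
    }

    // Assumption: at most three colon-separated parts, operations separated by '|'.
    static DynOperationSketch parse(String callSiteName) {
        String[] parts = callSiteName.split(":", 3);
        if (parts.length < 2) {
            throw new IllegalArgumentException("Not a prefix:operation string: " + callSiteName);
        }
        List<String> ops = Arrays.asList(parts[1].split("\\|"));
        String name = parts.length == 3 ? parts[2] : null;
        return new DynOperationSketch(parts[0], ops, name);
    }

    public static void main(String[] args) {
        DynOperationSketch op = parse("dyn:getProp|getElem|getMethod:prototype");
        System.out.println(op.prefix + " " + op.operations + " " + op.name);
    }
}
```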

The actual flow

Now that we’ve gotten the hang of the approach used by Nashorn, let’s look at the actual flow. I’ve removed some of the instructions for brevity. The full code can be found here.

1. This first group of instructions loads the array map function into the script.

//load JS array
invokedynamic 0 "dyn:getProp|getElem|getMethod:Array":(Ljava/lang/Object;)Ljava/lang/Object;

//load its prototype element
invokedynamic 0 "dyn:getProp|getElem|getMethod:prototype":(Ljava/lang/Object;)Ljava/lang/Object;

//load the map method
invokedynamic 0 "dyn:getProp|getElem|getMethod:map":(Ljava/lang/Object;)Ljava/lang/Object;

//set it to the map local
invokedynamic 0 #0:"dyn:setProp|setElem:map":(Ljava/lang/Object;Ljava/lang/Object;)V

2. Next we allocate the names array

//allocate the names array as a JS object
invokestatic jdk/nashorn/internal/objects/Global.allocate:([Ljava/lang/Object;)Ljdk/nashorn/internal/objects/NativeArray;

//places it into names
invokedynamic 0 #0:"dyn:setProp|setElem:names":(Ljava/lang/Object;Ljava/lang/Object;)V

//load names back onto the stack for the call to map
invokedynamic 0 #0:"dyn:getProp|getElem|getMethod:names":(Ljava/lang/Object;)Ljava/lang/Object;

3. Find and load the Lambda function

//load this script's constants field, which Nashorn compiles and fills at runtime
getstatic constants

//push index 2 of that array, where Nashorn will place a handle to the lambda code
iconst_2

//get it from the constants array
aaload

//ensure it’s a JS function object
checkcast class jdk/nashorn/internal/runtime/RecompilableScriptFunctionData

4. Call map with names and the Lambda, and place the result in a

//call the map function, passing it names and the Lambda function from the stack

invokedynamic 0 #1:"dyn:call":(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljdk/nashorn/internal/runtime/ScriptFunction;)Ljava/lang/Object;

//put the result in a
invokedynamic 0 #0:"dyn:setProp|setElem:a":(Ljava/lang/Object;Ljava/lang/Object;)V

5. Find the print function and call it on a

//load the print function
invokedynamic 0 #0:"dyn:getMethod|getProp|getElem:print":(Ljava/lang/Object;)Ljava/lang/Object;

//load a
invokedynamic 0 #0:"dyn:getProp|getElem|getMethod:a":(Ljava/lang/Object;)Ljava/lang/Object;

// call print on it
invokedynamic 0 #2:"dyn:call":(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;

The lambda function itself is compiled and placed in the same class as the script, as a private function. This is very similar to what we’ve seen with Java 8 lambdas. The code itself is straightforward. We load the string, find its length function and call it.

//Load the name argument (var #1)
aload_1

//find its length() function
invokedynamic 0 "dyn:getMethod|getProp|getElem:length":(Ljava/lang/Object;)Ljava/lang/Object;

//call length
invokedynamic 0 "dyn:call":(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;

//return the result
areturn
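For comparison, the Java 8 side of that similarity can be checked with a few lines of reflection. The sketch below assumes javac's current behavior of compiling a lambda body into a private synthetic method on the enclosing class; the exact method name is a compiler implementation detail, and the class name here is made up.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class LambdaMethodDemo {

    // The Java counterpart of function(name) { return name.length() }
    static final Function<String, Integer> LENGTH = name -> name.length();

    // Collect the private synthetic methods the compiler added to this class.
    static List<String> syntheticLambdaMethods() {
        List<String> found = new ArrayList<>();
        for (Method m : LambdaMethodDemo.class.getDeclaredMethods()) {
            if (m.isSynthetic() && Modifier.isPrivate(m.getModifiers())) {
                found.add(m.getName());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(LENGTH.apply("john"));
        // With javac this typically lists a name starting with "lambda$".
        System.out.println(syntheticLambdaMethods());
    }
}
```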

Bonus round — the final bytecode

The code we’ve been dealing with so far isn’t really what the JVM will execute at run-time. Remember, each invokeDynamic instruction will be resolved to a physical bytecode method which the JVM will then compile into machine code and execute.

To see the actual bytecode which the JVM runs I used a simple trick. I wrapped the call to length() with a simple Java method call in my class. This enabled me to place a breakpoint and see the final call stack which the JVM executes to get into the Lambda.

Here’s the code —

js += "var a = map.call(names, function(name) {\n";
js += "  return Java.type(\"LambdaTest\").wrap(name.length())\n";
js += "})\n";

Here’s wrap —

public static int wrap(String s)
{
    return s.length();
}
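If you would rather not use a debugger, a variation on wrap can report the depth by itself. This is a sketch, not the code I ran: StackDepthProbe and lastDepth are hypothetical names, and a stack trace captured this way can leave out frames that the JVM hides, so the number may be lower than what a breakpoint shows.

```java
public class StackDepthProbe {

    // Depth recorded by the most recent call to wrap(), including wrap() itself.
    static volatile int lastDepth;

    public static int wrap(String s) {
        // Capturing a Throwable records the frames that led to this call.
        StackTraceElement[] frames = new Throwable().getStackTrace();
        lastDepth = frames.length;
        System.out.println("frames on the stack: " + frames.length);
        return s.length();
    }

    public static void main(String[] args) {
        // Called straight from main the stack is shallow; called from the script it is not.
        wrap("john");
    }
}
```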

Now let’s play a game. How many frames will be in that stack? Think about it for a second. If you guessed fewer than 100, you owe me a beer. The full call stack can be found here.

The reason for that is very interesting as well, and that’s a story for a whole new post coming down the road.

Originally posted on the Takipi blog