Byte Code

This blog entry is the first in a multi-part series of articles discussing the merits of byte code engineering and its applications. Byte code engineering encompasses the creation of new byte code in the form of classes and the modification of existing byte code. It has many applications: it is used in tools for compilers, class reloading, memory leak detection, and performance monitoring. Also, most application servers use byte code libraries to generate classes at run-time. Byte code engineering is used more often than you might think. In fact, you can find popular byte code engineering libraries bundled in the JRE, including BCEL and ASM. Despite its widespread use, there appear to be very few university or college courses that teach byte code engineering. It is an aspect of programming that developers must learn on their own, and for those who don't, it remains a mysterious black art. The truth is that byte code engineering libraries make this field easier to learn and are a gateway to a deeper understanding of JVM internals. The intention of these articles is to provide a starting point and then document some advanced concepts, which will hopefully inspire readers to develop their own skills.

Documentation

There are a few resources that anyone learning byte code engineering should have handy at all times. The first is the Java Virtual Machine Specification (FYI, this page has links to both the language and JVM specifications). Chapter 4, The class File Format, is indispensable. A second resource, which is useful for quick reference, is the Wikipedia page entitled "Java bytecode instruction listings". In terms of byte code instructions, it is more concise and informative than the JVM specification itself. Another resource that comes in handy for the beginner is a table of the internal descriptor format for field types. This table is taken directly from the JVM specification.

BaseType Character   Type        Interpretation
B                    byte        signed byte
C                    char        Unicode character code point in the Basic Multilingual Plane, encoded with UTF-16
D                    double      double-precision floating-point value
F                    float       single-precision floating-point value
I                    int         integer
J                    long        long integer
L<ClassName>;        reference   an instance of class <ClassName>
S                    short       signed short
Z                    boolean     true or false
[                    reference   one array dimension

Most primitive field types simply use the first letter of the type name to represent the type internally (i.e. I for int, F for float, etc.); however, a long is J and a boolean is Z. Object types are not intuitive. An object type begins with the letter L and ends with a semicolon. Between these characters is the fully qualified class name, with each part of the name separated by a forward slash. For instance, the internal descriptor for the field type java.lang.Integer is Ljava/lang/Integer;. Lastly, array dimensions are indicated by the '[' character. For each dimension, insert a '[' character. For instance, a two-dimensional int array would be [[I, whereas a two-dimensional java.lang.Integer array would be [[Ljava/lang/Integer;
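
As a rough illustration of the rules above, the following sketch builds a field descriptor from a java.lang.Class object. The FieldDescriptors class and its descriptor() method are hypothetical names made up for this example and are not part of any library; the mapping itself follows the table above.

```java
// Hypothetical helper for illustration; not part of the JDK or any library.
class FieldDescriptors {

    /** Returns the internal field type descriptor for the given type. */
    static String descriptor(Class<?> type) {
        if (type.isArray()) {
            // One '[' per array dimension, followed by the component type.
            return "[" + descriptor(type.getComponentType());
        }
        if (type.isPrimitive()) {
            if (type == byte.class)    return "B";
            if (type == char.class)    return "C";
            if (type == double.class)  return "D";
            if (type == float.class)   return "F";
            if (type == int.class)     return "I";
            if (type == long.class)    return "J";
            if (type == short.class)   return "S";
            if (type == boolean.class) return "Z";
            return "V"; // void is the only primitive left; it is valid only as a return type
        }
        // Object types: 'L', the fully qualified name with '/' separators, then ';'.
        return "L" + type.getName().replace('.', '/') + ";";
    }

    public static void main(String[] args) {
        System.out.println(descriptor(long.class));        // J
        System.out.println(descriptor(Integer.class));     // Ljava/lang/Integer;
        System.out.println(descriptor(int[][].class));     // [[I
        System.out.println(descriptor(Integer[][].class)); // [[Ljava/lang/Integer;
    }
}
```

Writing the mapping out by hand like this is mostly a learning exercise; byte code engineering libraries generally ship their own type helpers.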

Methods also have an internal descriptor format. The format is (<parameter types>)<return type>. All types use the field type descriptor format described above. A void return type is represented by the letter V. There is no separator between parameter types. Here are some examples:

  • The program entry point method public static final void main(String args[]) would be ([Ljava/lang/String;)V
  • A constructor of the form public Info(int index, java.lang.Object types[], byte bytes[]) would be (I[Ljava/lang/Object;[B)V
  • A method with the signature int getCount() would be ()I
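
The JDK can produce these descriptors itself, which is handy for checking your own reading of a signature. The sketch below uses java.lang.invoke.MethodType (available since Java 7); the MethodDescriptors wrapper class is a hypothetical name used only for this example. Constructors are described with a void return type.

```java
import java.lang.invoke.MethodType;

// Hypothetical wrapper for illustration; MethodType itself is a standard JDK class.
class MethodDescriptors {

    /** Builds the internal method descriptor for a return type and parameter types. */
    static String descriptor(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // public static final void main(String args[])
        System.out.println(descriptor(void.class, String[].class));
        // public Info(int index, java.lang.Object types[], byte bytes[])
        System.out.println(descriptor(void.class, int.class, Object[].class, byte[].class));
        // int getCount()
        System.out.println(descriptor(int.class));
    }
}
```

Note that modifiers such as public, static, and final, as well as the method name, are not part of the descriptor.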

Speaking of constructors, I should also mention that all constructors have the internal method name <init>. Also, all static initializers in the source code are placed into a single static initializer method with the internal method name <clinit>.
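
These internal names are visible even without a disassembler, because stack traces report them. Here is a minimal sketch, assuming the JVM records the frame of the method that creates the Throwable (the usual behavior); InternalNames is a hypothetical class name used only for this example.

```java
// Hypothetical class for illustration only.
class InternalNames {
    static String staticInitializerName;
    final String constructorName;

    static {
        // The top stack frame belongs to the code that created the Throwable.
        staticInitializerName = new Throwable().getStackTrace()[0].getMethodName();
    }

    InternalNames() {
        constructorName = new Throwable().getStackTrace()[0].getMethodName();
    }

    public static void main(String[] args) {
        System.out.println(staticInitializerName);               // <clinit>
        System.out.println(new InternalNames().constructorName); // <init>
    }
}
```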

Software

Before discussing byte code engineering libraries, I should mention an essential learning tool bundled in the JDK bin directory called javap. javap is a program that disassembles byte code and provides a textual representation. Let's examine what it can do with the compiled version of the following code:

package ca.discotek.helloworld;

public class HelloWorld {

    static String message =
            "Hello World!";

    public static void main(String[] args) {
        try {
            System.out.println(message);
        }
        catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Here is the output of the javap -help command:

Usage: javap <options> <classes>...

where options include:
   -c                        Disassemble the code
   -classpath <pathlist>     Specify where to find user class files
   -extdirs <dirs>           Override location of installed extensions
   -help                     Print this usage message
   -J<flag>                  Pass <flag> directly to the runtime system
   -l                        Print line number and local variable tables
   -public                   Show only public classes and members
   -protected                Show protected/public classes and members
   -package                  Show package/protected/public classes
                             and members (default)
   -private                  Show all classes and members
   -s                        Print internal type signatures
   -bootclasspath <pathlist> Override location of class files loaded
                             by the bootstrap class loader
   -verbose                  Print stack size, number of locals and args for methods
                             If verifying, print reasons for failure

Here is the output when we use javap to disassemble the HelloWorld program:


javap.exe -classpath "C:\projects\sandbox2\bin" -c -private -s -verbose ca.discotek.helloworld.HelloWorld
Compiled from "HelloWorld.java"
public class ca.discotek.helloworld.HelloWorld extends java.lang.Object
  SourceFile: "HelloWorld.java"
  minor version: 0
  major version: 50
  Constant pool:
const #1 = class        #2;     //  ca/discotek/helloworld/HelloWorld
const #2 = Asciz        ca/discotek/helloworld/HelloWorld;
const #3 = class        #4;     //  java/lang/Object
const #4 = Asciz        java/lang/Object;
const #5 = Asciz        message;
const #6 = Asciz        Ljava/lang/String;;
const #7 = Asciz        <clinit>;
const #8 = Asciz        ()V;
const #9 = Asciz        Code;
const #10 = String      #11;    //  Hello World!
const #11 = Asciz       Hello World!;
const #12 = Field       #1.#13; //  ca/discotek/helloworld/HelloWorld.message:Ljava/lang/String;
const #13 = NameAndType #5:#6;//  message:Ljava/lang/String;
const #14 = Asciz       LineNumberTable;
const #15 = Asciz       LocalVariableTable;
const #16 = Asciz       <init>;
const #17 = Method      #3.#18; //  java/lang/Object."<init>":()V
const #18 = NameAndType #16:#8;//  "<init>":()V
const #19 = Asciz       this;
const #20 = Asciz       Lca/discotek/helloworld/HelloWorld;;
const #21 = Asciz       main;
const #22 = Asciz       ([Ljava/lang/String;)V;
const #23 = Field       #24.#26;        //  java/lang/System.out:Ljava/io/PrintStream;
const #24 = class       #25;    //  java/lang/System
const #25 = Asciz       java/lang/System;
const #26 = NameAndType #27:#28;//  out:Ljava/io/PrintStream;
const #27 = Asciz       out;
const #28 = Asciz       Ljava/io/PrintStream;;
const #29 = Method      #30.#32;        //  java/io/PrintStream.println:(Ljava/lang/String;)V
const #30 = class       #31;    //  java/io/PrintStream
const #31 = Asciz       java/io/PrintStream;
const #32 = NameAndType #33:#34;//  println:(Ljava/lang/String;)V
const #33 = Asciz       println;
const #34 = Asciz       (Ljava/lang/String;)V;
const #35 = Method      #36.#38;        //  java/lang/Exception.printStackTrace:()V
const #36 = class       #37;    //  java/lang/Exception
const #37 = Asciz       java/lang/Exception;
const #38 = NameAndType #39:#8;//  printStackTrace:()V
const #39 = Asciz       printStackTrace;
const #40 = Asciz       args;
const #41 = Asciz       [Ljava/lang/String;;
const #42 = Asciz       e;
const #43 = Asciz       Ljava/lang/Exception;;
const #44 = Asciz       StackMapTable;
const #45 = Asciz       SourceFile;
const #46 = Asciz       HelloWorld.java;

{
static java.lang.String message;
  Signature: Ljava/lang/String;

static {};
  Signature: ()V
  Code:
   Stack=1, Locals=0, Args_size=0
   0:   ldc     #10; //String Hello World!
   2:   putstatic       #12; //Field message:Ljava/lang/String;
   5:   return
  LineNumberTable:
   line 6: 0
   line 5: 2
   line 6: 5

public ca.discotek.helloworld.HelloWorld();
  Signature: ()V
  Code:
   Stack=1, Locals=1, Args_size=1
   0:   aload_0
   1:   invokespecial   #17; //Method java/lang/Object."<init>":()V
   4:   return
  LineNumberTable:
   line 3: 0

  LocalVariableTable:
   Start  Length  Slot  Name   Signature
   0      5      0    this       Lca/discotek/helloworld/HelloWorld;

public static void main(java.lang.String[]);
  Signature: ([Ljava/lang/String;)V
  Code:
   Stack=2, Locals=2, Args_size=1
   0:   getstatic       #23; //Field java/lang/System.out:Ljava/io/PrintStream;
   3:   getstatic       #12; //Field message:Ljava/lang/String;
   6:   invokevirtual   #29; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
   9:   goto    17
   12:  astore_1
   13:  aload_1
   14:  invokevirtual   #35; //Method java/lang/Exception.printStackTrace:()V
   17:  return
  Exception table:
   from   to  target type
     0     9    12   Class java/lang/Exception

  LineNumberTable:
   line 10: 0
   line 11: 9
   line 12: 12
   line 13: 13
   line 15: 17

  LocalVariableTable:
   Start  Length  Slot  Name   Signature
   0      18      0    args       [Ljava/lang/String;
   13      4      1    e       Ljava/lang/Exception;

  StackMapTable: number_of_entries = 2
   frame_type = 76 /* same_locals_1_stack_item */
     stack = [ class java/lang/Exception ]
   frame_type = 4 /* same */

}

You should note that the -l flag to output line number information was purposely omitted. The -verbose flag outputs other relevant information including line numbers. If both are used the line number information will be printed twice.

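Here is an overview of the output. Line numbers count from the top of the listing above, starting at the blank line before the javap command, so the command itself is line 2: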

Line Numbers  Description
2             Command line to invoke javap. See the javap -help output above for an explanation of the parameters.
3             Source code file name, provided by debug information included in the byte code.
4             Class signature.
5             Source code file name, provided by debug information included in the byte code.
6-7           Minor and major versions. 50.0 indicates the class was compiled for Java 6.
8-54          The class constant pool.
57-58         Declaration of the message field.
60            Declaration of the static initializer method.
61            Internal method descriptor for the method.
63            Stack=1 indicates 1 slot is required on the operand stack. Locals=0 indicates no local variables are required. Args_size=0 is the number of arguments to the method.
64-66         The byte code instructions to assign the String value Hello World! to the message field.
67-70         If compiled with debug information, each method will have a LineNumberTable. The format of each entry is <line number of source code>: <starting instruction offset in byte code>. You'll notice that the LineNumberTable has duplicate entries that are seemingly out of order (i.e. 6, 5, 6). It may not seem intuitive, but the compiler assembles byte code instructions to target the stack-based JVM, which means it will often have to re-arrange instructions.
72            Default constructor signature.
73            Default constructor internal method descriptor.
75            Stack=1 indicates 1 slot is required on the operand stack. Locals=1 indicates there is one local variable. Method parameters are treated as local variables, and in a non-static method the implicit "this" reference occupies the first one, which is the case here. Args_size=1 is the number of arguments to the method, which also counts "this".
76-78         Default constructor code. It simply invokes the default constructor of the super class, java.lang.Object.
79-80         Although the default constructor is not explicitly defined, the LineNumberTable indicates that the default constructor is associated with line 3, where the class signature resides.
82-84         You might be surprised to see an entry in a LocalVariableTable because the default constructor defines no local variables and has no parameters. However, all non-static methods define the "this" local variable, which is what is seen here. The start and length values indicate the scope of the local variable within the method. The start value indicates the index in the method's byte code array where the scope begins and the length value indicates where in the array the scope ends (i.e. start + length = end). In the constructor, "this" starts at index 0, which corresponds to the aload_0 instruction at line 76. The length is 5, which covers the entire method as the last instruction is at index 4. The slot value is the variable's index in the method's local variable array. The name attribute is the variable name as defined in the source code. The Signature attribute represents the type of the variable. You should note that local variable table information is added for debugging purposes. Assigning identifiers to chunks of memory is entirely to help humans understand programs better. This information can be excluded from byte code.
86            Main method declaration.
87            Main method internal descriptor.
89            Stack=2 indicates 2 slots are required on the operand stack. Locals=2 indicates two local variables are required (args and the exception e from the catch block). Args_size=1 is the number of arguments to the method (args).
90-97         Byte code associated with printing the message and catching any exceptions.
98-100        Byte code does not have try/catch constructs, but it does have exception handling, which is implemented in the Exception table. Each row in the table is an exception handler. The from and to values indicate the range of instructions to which the exception handling applies. If an exception of the given type occurs between the from instruction (inclusive) and the to instruction (exclusive), execution will skip to the target instruction index. The value 12 represents the start of the catch block. You'll also notice the goto instruction after the invokevirtual instruction, which causes execution to skip to the end of the method if no exception occurs.
102-107       Main method's LineNumberTable, which matches source code with byte code instructions.
109-112       Main method's LocalVariableTable, which defines the scope of the args parameter and the e exception variable.
114-117       The JVM uses StackMapTable entries to verify type safety for each code block defined within a method. This information can be ignored for now. It is most likely that your compiler or byte code engineering library will generate these entries for you.
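
The version numbers reported near the top of the javap output can also be read directly from a class file. Per Chapter 4 of the JVM specification, a class file begins with the 4-byte magic number 0xCAFEBABE, followed by the 2-byte minor version, the 2-byte major version, and the 2-byte constant pool count (the number of constant pool entries plus one). The sketch below reads just those fields; ClassFileHeader and its methods are hypothetical names for this example, and the "major version minus 44" mapping is an assumption that holds for Java 5 and later (50 is Java 6, 52 is Java 8).

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration; not part of the JDK or any library.
class ClassFileHeader {
    final int magic;
    final int minorVersion;
    final int majorVersion;
    final int constantPoolCount;

    ClassFileHeader(int magic, int minorVersion, int majorVersion, int constantPoolCount) {
        this.magic = magic;
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
        this.constantPoolCount = constantPoolCount;
    }

    /** Reads the first ten bytes of the named class, located via the system class loader. */
    static ClassFileHeader read(String className) {
        String resource = className.replace('.', '/') + ".class";
        try (InputStream in = ClassLoader.getSystemResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalArgumentException("Class file not found: " + resource);
            }
            DataInputStream data = new DataInputStream(in);
            int magic = data.readInt();             // u4 magic
            int minor = data.readUnsignedShort();   // u2 minor_version
            int major = data.readUnsignedShort();   // u2 major_version
            int cpCount = data.readUnsignedShort(); // u2 constant_pool_count
            return new ClassFileHeader(magic, minor, major, cpCount);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Maps a major version to a Java release number (valid for Java 5 and later). */
    static int javaRelease(int majorVersion) {
        return majorVersion - 44;
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "java.lang.Object";
        ClassFileHeader h = read(name);
        System.out.printf("magic=0x%08X version=%d.%d (Java %d) constant_pool_count=%d%n",
                h.magic, h.majorVersion, h.minorVersion,
                javaRelease(h.majorVersion), h.constantPoolCount);
    }
}
```

Run without arguments, it inspects java.lang.Object from the running JDK; pass a fully qualified class name that is on the class path to inspect your own class.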

Byte Code Engineering Libraries

The most popular byte code engineering libraries are BCEL, SERP, Javassist, and ASM. All of these libraries have their own merits, but overall, ASM is far superior for its speed and versatility. There are plenty of articles and blog entries discussing these libraries in addition to the documentation on their web sites. Instead of duplicating these efforts, the following will provide links and hopefully other useful information.

BCEL

The most obvious detractor for BCEL (Byte Code Engineering Library) has been its inconsistent support. If you look at the BCEL News and Status page, there have been releases in 2001, 2003, 2006, and 2011. Four releases spread over 10 years is not confidence inspiring. However, it should be noted that there appears to be a version 6 release candidate, which can be downloaded from GitHub, but not Apache. Additionally, the enhancements and bug fixes discussed in the download’s RELEASE-NOTES.txt file are substantial, including support for the language features of Java 6, 7, and 8.

BCEL is a natural starting place for the uninitiated byte code developer because it has the prestige of the Apache Software Foundation. Often, it may serve the developer's purpose. One of BCEL's benefits is that it has an API for both the SAX and DOM approaches to parsing byte code. However, when byte code manipulation is more complex, BCEL will likely end in frustration due to the limits of its API documentation and community support. It should be noted that BCEL is bundled with a BCELifier utility, which parses byte code and outputs the BCEL API Java code that produces the parsed byte code. If you choose BCEL as your byte code engineering library, this utility will be invaluable (but note that ASM has an equivalent ASMifier).

SERP

SERP is a lesser known library. My experience with it is limited, but I did find it useful for building a Javadoc-style tool for byte code. SERP was the only API that could give me program counter information so I could hyperlink branching instructions to their targets. Although the SERP release documentation indicates there is support for Java 8’s invokedynamic instruction, it is not clear to me that it receives continuous support from the author and there is very little community support. The author also discusses its limitations which include issues with speed, memory consumption, and thread safety.

Javassist

Javassist is the only library that provides some functionality not supported by ASM... and it's pretty awesome. Javassist allows you to insert Java source code into existing byte code. You can insert Java code before a method body or append it after the method body. You can also wrap a method body in a try-block and add your own catch-block (of Java code). You can also substitute an entire method body or other smaller constructs with your own Java source code. Lastly, you can add methods to a class which contain your own Java source code. This feature is extremely powerful as it allows a Java developer to manipulate byte code without requiring an in-depth understanding of the underlying byte code. However, this feature does have its limitations. For instance, if you introduce variables in an insertBefore() block of code, they cannot be referenced later in an insertAfter() block of code. Additionally, ASM is generally faster than Javassist, but the benefits of Javassist's simplicity may outweigh the gains from ASM's performance. Javassist is continually supported by the authors at JBoss and receives much community support.

ASM

ASM has it all. It is well supported, it is fast, and it can do just about anything. ASM has both SAX and DOM style APIs for parsing byte code. ASM also has an ASMifier which can parse byte code and generate the corresponding Java source code, which when run will produce the parsed byte code. This is an invaluable tool. It is expected that the developer has some knowledge of byte code, but ASM can update frame information for you if you add local variables etc. It also has many utility classes for common tasks in its commons package. Further, common byte code transformations are documented in exceptional detail. You can also get help from the ASM mailing list. Lastly, forums like StackOverflow provide additional support. Almost certainly any problem you have has already been discussed in the ASM documentation or in a StackOverflow thread.

Summary

Admittedly, this blog entry has not been particularly instructional. The intention is to give the beginner a place to start. In my experience, the best way to learn is to have a project in mind to which you'll apply what you are learning. Documenting a few basic byte code engineering tasks would only duplicate others' efforts. I developed my byte code skills from an interest in reverse engineering. I would prefer not to document those skills as it would be counter-productive to my other efforts (I built a commercial byte code obfuscator called Modifly, which can perform obfuscation transformations at run-time). However, I am willing to share what I have learned by demonstrating how to apply byte code engineering to class reloading and memory leak detection (and perhaps other areas if there is interest).

Next Blog in the Series Teaser

Even if you don't use JRebel, you probably haven't escaped their ads. JRebel's home page claims "Reload Code Changes Instantly. Skip the build and redeploy process. JRebel reloads changes to Java classes, resources, and over 90 frameworks." Have you ever wondered how they do it? I'll show you exactly how they do it with working code in my next blog in this series.

If you enjoyed this blog, you may wish to follow discotek.ca on twitter.