This blog entry is the first of a multi-part series of articles discussing the merits of byte code engineering and its application. Byte code engineering encompasses the creation of new byte code in the form of classes and the modification of existing byte code. Byte code engineering has many applications. It is used in tools for compilers, class reloading, memory leak detection, and performance monitoring. Also, most application servers use byte code libraries to generate classes at run time. Byte code engineering is used more often than you think. In fact, you can find popular byte code engineering libraries bundled in the JRE, including BCEL and ASM. Despite its widespread use, there appear to be very few university or college courses that teach byte code engineering. It is an aspect of programming that developers must learn on their own, and for those who don't, it remains a mysterious black art. The truth is that byte code engineering libraries make learning this field easier and are a gateway to a deeper understanding of JVM internals. The intention of these articles is to provide a starting point and then document some advanced concepts, which will hopefully inspire readers to develop their own skills.
Documentation
There are a few resources that anyone learning byte code engineering should have handy at all times. The first is the Java Virtual Machine Specification (FYI, this page has links to both the language and JVM specifications). Chapter 4, The class File Format, is indispensable. A second resource, which is useful for quick reference, is the Wikipedia page entitled Java bytecode instruction listings. In terms of byte code instructions, it is more concise and informative than the JVM specification itself. Another resource to have handy for the beginner is a table of the internal descriptor format for field types. This table is taken directly from the JVM specification.
BaseType Character | Type | Interpretation |
---|---|---|
B | byte | signed byte |
C | char | Unicode character code point in the Basic Multilingual Plane, encoded with UTF-16 |
D | double | double-precision floating-point value |
F | float | single-precision floating-point value |
I | int | integer |
J | long | long integer |
L<ClassName>; | reference | an instance of class <ClassName> |
S | short | signed short |
Z | boolean | true or false |
[ | reference | one array dimension |
Most primitive field types simply use the first initial of the type name to represent the type internally (i.e. I for int, F for float, etc.); however, long is J and boolean is Z (B is already taken by byte). Object types are not intuitive. An object type begins with the letter L and ends with a semi-colon. Between these characters is the fully qualified class name, with each part of the name separated by a forward slash. For example, the internal descriptor for the field type java.lang.Integer is Ljava/lang/Integer;. Lastly, array dimensions are indicated using the '[' character. For each dimension, insert a '[' character. For example, a two-dimensional int array would be [[I, whereas a two-dimensional java.lang.Integer array would be [[Ljava/lang/Integer;.
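As a rough illustration of these rules, the following sketch derives a field descriptor from a Class object by hand. The names FieldDescriptors and descriptorOf are made up for this example and are not part of the JDK or of any byte code library.

```java
// Hypothetical helper for illustration only; not part of the JDK or any byte code library.
class FieldDescriptors {

    // Builds the internal descriptor for a type by following the descriptor table.
    static String descriptorOf(Class<?> type) {
        if (type.isArray()) {
            // One '[' per dimension, followed by the descriptor of the component type.
            return "[" + descriptorOf(type.getComponentType());
        }
        if (type == byte.class)    return "B";
        if (type == char.class)    return "C";
        if (type == double.class)  return "D";
        if (type == float.class)   return "F";
        if (type == int.class)     return "I";
        if (type == long.class)    return "J";
        if (type == short.class)   return "S";
        if (type == boolean.class) return "Z";
        if (type == void.class)    return "V"; // only meaningful as a method return type
        // Object types: 'L', the fully qualified name with slashes, then a semi-colon.
        return "L" + type.getName().replace('.', '/') + ";";
    }

    public static void main(String[] args) {
        System.out.println(descriptorOf(long.class));        // J
        System.out.println(descriptorOf(Integer.class));     // Ljava/lang/Integer;
        System.out.println(descriptorOf(int[][].class));     // [[I
        System.out.println(descriptorOf(Integer[][].class)); // [[Ljava/lang/Integer;
    }
}
```

Recursing on the component type handles any number of array dimensions without special cases.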
Methods also have an internal descriptor format. The format is (<parameter types>)<return type>. All types use the field type descriptor format described above. A void return type is represented by the letter V. There is no separator between parameter types. Here are some examples:
- The program entry point method of public static final void main(String args[]) would be ([Ljava/lang/String;)V
- A constructor of the form public Info(int index, java.lang.Object types[], byte bytes[]) would be (I[Ljava/lang/Object;[B)V
- A method with signature int getCount() would be ()I
Speaking of constructors, I should also mention that all constructors have an internal method name of <init>. Also, all static initializers in source code are placed into a single static initializer method with the internal method name <clinit>.
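If you want to check a method descriptor without writing it out by hand, the JDK can produce one for you. The sketch below uses java.lang.invoke.MethodType (available since Java 7) to rebuild the three examples. The class name MethodDescriptors and the helper descriptor are hypothetical names chosen for this illustration.

```java
import java.lang.invoke.MethodType;

// Hypothetical helper for illustration only.
class MethodDescriptors {

    // Returns the internal descriptor for a method with the given return and parameter types.
    static String descriptor(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // void main(String[] args)
        System.out.println(descriptor(void.class, String[].class));
        // Info(int index, Object[] types, byte[] bytes); constructors are described as returning void
        System.out.println(descriptor(void.class, int.class, Object[].class, byte[].class));
        // int getCount()
        System.out.println(descriptor(int.class));
    }
}
```

Modifiers such as public, static, and final, along with the method name, are not part of the descriptor, which is why only the return type and parameter types are passed in.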
Software
Before discussing byte code engineering libraries, there is an essential learning tool bundled in the JDK bin directory called javap. Javap is a program which disassembles byte code and provides a textual representation. Let's examine what it can do with the compiled version of the following code:
package ca.discotek.helloworld;

public class HelloWorld {

    static String message =
        "Hello World!";

    public static void main(String[] args) {
        try {
            System.out.println(message);
        }
        catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Here is the output from the javap -help command:
Usage: javap ...
where options include:
   -c                         Disassemble the code
   -classpath <pathlist>      Specify where to find user class files
   -extdirs <dirs>            Override location of installed extensions
   -help                      Print this usage message
   -J<flag>                   Pass directly to the runtime system
   -l                         Print line number and local variable tables
   -public                    Show only public classes and members
   -protected                 Show protected/public classes and members
   -package                   Show package/protected/public classes
                              and members (default)
   -private                   Show all classes and members
   -s                         Print internal type signatures
   -bootclasspath <pathlist>  Override location of class files loaded
                              by the bootstrap class loader
   -verbose                   Print stack size, number of locals and args for methods
                              If verifying, print reasons for failure
Here is the output when we use javap to disassemble the HelloWorld program:
javap.exe -classpath "C:\projects\sandbox2\bin" -c -private -s -verbose ca.discotek.helloworld.HelloWorld
Compiled from "HelloWorld.java"
public class ca.discotek.helloworld.HelloWorld extends java.lang.Object
SourceFile: "HelloWorld.java"
minor version: 0
major version: 50
Constant pool:
const #1 = class #2; // ca/discotek/helloworld/HelloWorld
const #2 = Asciz ca/discotek/helloworld/HelloWorld;
const #3 = class #4; // java/lang/Object
const #4 = Asciz java/lang/Object;
const #5 = Asciz message;
const #6 = Asciz Ljava/lang/String;;
const #7 = Asciz <clinit>;
const #8 = Asciz ()V;
const #9 = Asciz Code;
const #10 = String #11; // Hello World!
const #11 = Asciz Hello World!;
const #12 = Field #1.#13; // ca/discotek/helloworld/HelloWorld.message:Ljava/lang/String;
const #13 = NameAndType #5:#6;// message:Ljava/lang/String;
const #14 = Asciz LineNumberTable;
const #15 = Asciz LocalVariableTable;
const #16 = Asciz <init>;
const #17 = Method #3.#18; // java/lang/Object."<init>":()V
const #18 = NameAndType #16:#8;// "<init>":()V
const #19 = Asciz this;
const #20 = Asciz Lca/discotek/helloworld/HelloWorld;;
const #21 = Asciz main;
const #22 = Asciz ([Ljava/lang/String;)V;
const #23 = Field #24.#26; // java/lang/System.out:Ljava/io/PrintStream;
const #24 = class #25; // java/lang/System
const #25 = Asciz java/lang/System;
const #26 = NameAndType #27:#28;// out:Ljava/io/PrintStream;
const #27 = Asciz out;
const #28 = Asciz Ljava/io/PrintStream;;
const #29 = Method #30.#32; // java/io/PrintStream.println:(Ljava/lang/String;)V
const #30 = class #31; // java/io/PrintStream
const #31 = Asciz java/io/PrintStream;
const #32 = NameAndType #33:#34;// println:(Ljava/lang/String;)V
const #33 = Asciz println;
const #34 = Asciz (Ljava/lang/String;)V;
const #35 = Method #36.#38; // java/lang/Exception.printStackTrace:()V
const #36 = class #37; // java/lang/Exception
const #37 = Asciz java/lang/Exception;
const #38 = NameAndType #39:#8;// printStackTrace:()V
const #39 = Asciz printStackTrace;
const #40 = Asciz args;
const #41 = Asciz [Ljava/lang/String;;
const #42 = Asciz e;
const #43 = Asciz Ljava/lang/Exception;;
const #44 = Asciz StackMapTable;
const #45 = Asciz SourceFile;
const #46 = Asciz HelloWorld.java;

{
static java.lang.String message;
  Signature: Ljava/lang/String;

static {};
  Signature: ()V
  Code:
   Stack=1, Locals=0, Args_size=0
   0: ldc #10; //String Hello World!
   2: putstatic #12; //Field message:Ljava/lang/String;
   5: return
  LineNumberTable:
   line 6: 0
   line 5: 2
   line 6: 5

public ca.discotek.helloworld.HelloWorld();
  Signature: ()V
  Code:
   Stack=1, Locals=1, Args_size=1
   0: aload_0
   1: invokespecial #17; //Method java/lang/Object."<init>":()V
   4: return
  LineNumberTable:
   line 3: 0

  LocalVariableTable:
   Start Length Slot Name Signature
   0 5 0 this Lca/discotek/helloworld/HelloWorld;

public static void main(java.lang.String[]);
  Signature: ([Ljava/lang/String;)V
  Code:
   Stack=2, Locals=2, Args_size=1
   0: getstatic #23; //Field java/lang/System.out:Ljava/io/PrintStream;
   3: getstatic #12; //Field message:Ljava/lang/String;
   6: invokevirtual #29; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
   9: goto 17
   12: astore_1
   13: aload_1
   14: invokevirtual #35; //Method java/lang/Exception.printStackTrace:()V
   17: return
  Exception table:
   from to target type
   0 9 12 Class java/lang/Exception

  LineNumberTable:
   line 10: 0
   line 11: 9
   line 12: 12
   line 13: 13
   line 15: 17

  LocalVariableTable:
   Start Length Slot Name Signature
   0 18 0 args [Ljava/lang/String;
   13 4 1 e Ljava/lang/Exception;

  StackMapTable: number_of_entries = 2
   frame_type = 76 /* same_locals_1_stack_item */
     stack = [ class java/lang/Exception ]
   frame_type = 4 /* same */
}
You should note that the -l flag to output line number information was purposely omitted. The -verbose flag outputs other relevant information including line numbers. If both are used the line number information will be printed twice.
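Here is an overview of the output. Line numbers are counted from the top of the listing, with the javap command on line 2 and blank lines included: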
Line Numbers | Description |
---|---|
2 | Command line to invoke javap. See javap -help output above for explanation of parameters. |
3 | Source code file provided by debug information included in byte code. |
4 | Class signature |
5 | Source code file provided by debug information included in byte code. |
6-7 | Major and Minor versions. 50.0 indicates the class was compiled with Java 6. |
8-54 | The class constant pool. |
57-58 | Declaration of the message field. |
60 | Declaration of the static initializer method. |
61 | Internal method descriptor for method. |
63 | Stack=1 indicates 1 slot is required on the operand stack. Locals=0 indicates no local variables are required. Args_size=0 is the number of arguments to the method. |
64-66 | The byte code instructions to assign the String value Hello World! to the message field. |
67-70 | If compiled with debug information, each method will have a LineNumberTable. The format of each entry is <line number of source code>: <starting instruction offset in byte code>. You’ll notice that the LineNumberTable has duplicate entries that are seemingly out of order (i.e. 6, 5, 6). It may not seem intuitive, but the compiler assembles byte code instructions that target the stack-based JVM, which means it will often have to rearrange instructions. |
72 | Default constructor signature |
73 | Default constructor internal method descriptor |
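75 | Stack=1 indicates 1 slot is required on the operand stack. Locals=1 indicates there is one local variable. Method parameters are treated as local variables, and every constructor and non-static method receives the this reference as an implicit first parameter. In this case, this is the only local variable. For the same reason, Args_size=1 even though the constructor declares no parameters. |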
76-78 | Default constructor code. Simply invokes the default constructor of the super class, java.lang.Object. |
79-80 | Although the default constructor is not explicitly defined, the LineNumberTable indicates that the default constructor is associated with line 3, where the class signature resides. |
82-84 | You might be surprised to see an entry in a LocalVariableTable because the default constructor defines no local variables and has no parameters. However, all non-static methods will define the “this” local variable, which is what is seen here. The start and length values indicate the scope of the local variable within the method. The start value indicates the index in the method’s byte code array where the scope begins, and the length value determines where the scope ends (i.e. start + length = end). In the constructor, “this” starts at index 0. This corresponds to the aload_0 instruction at line 76. The length is 5, which covers the entire method as the last instruction is at index 4. The slot value is the variable’s index in the method’s local variable array. The name attribute is the variable name as defined in the source code. The Signature attribute represents the type of the variable. You should note that local variable table information is added for debugging purposes. Assigning identifiers to chunks of memory is entirely to help humans understand programs better. This information can be excluded from byte code. |
86 | Main method declaration |
87 | Main method internal descriptor. |
89 | Stack=2 indicates 2 slots are required on the operand stack. Locals=2 indicates two local variables are required (The args and exception e from the catch block). Args_size=1 is the number of arguments to the method (args). |
90-97 | Byte code associated with printing the message and catching any exceptions. |
98-100 | Byte code does not have try/catch constructs, but it does have exception handling, which is implemented in the Exception table. Each row in the table is an exception handling instruction. The from and to values indicate the range of instructions to which the exception handling applies. If an exception of the given type occurs at an instruction in that range (from is inclusive, to is exclusive), execution will skip to the target instruction index. The value 12 represents the start of the catch block. You’ll also notice the goto instruction after the invokevirtual instruction, which causes execution to skip to the end of the method if no exception occurs. |
102-107 | Main method’s line number table which matches source code with byte code instructions. |
109-112 | Main method’s LocalVariableTable, which defines the scope of the args parameter and the e exception variable. |
114-117 | The JVM uses StackMapTable entries to verify type safety for each code block defined within a method. This information can be ignored for now. It is most likely that your compiler or byte code engineering library will generate this information for you. |
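To connect the version and constant pool rows of this overview to the class file format in Chapter 4 of the JVM specification, here is a minimal sketch that reads the start of a class file using only the JDK. ClassFileSummary and its members are hypothetical names. The sketch assumes the class file can be found through the system class loader, and it stops reading once the constant pool ends.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; reads just the header and constant pool.
class ClassFileSummary {
    int magic;
    int minorVersion;
    int majorVersion;
    List<String> utf8Constants = new ArrayList<>();

    // internalName uses slashes, e.g. "java/lang/Object".
    static ClassFileSummary of(String internalName) {
        InputStream raw = ClassLoader.getSystemResourceAsStream(internalName + ".class");
        if (raw == null) {
            throw new IllegalArgumentException("Class file not found: " + internalName);
        }
        try (DataInputStream in = new DataInputStream(raw)) {
            ClassFileSummary s = new ClassFileSummary();
            s.magic = in.readInt();                 // 0xCAFEBABE for every class file
            s.minorVersion = in.readUnsignedShort();
            s.majorVersion = in.readUnsignedShort();
            int count = in.readUnsignedShort();
            // Constant pool entries are numbered from 1 to count - 1.
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1: // Utf8 (shown as Asciz by older javap versions)
                        s.utf8Constants.add(in.readUTF());
                        break;
                    case 5: case 6: // Long and Double occupy two pool entries
                        in.readFully(new byte[8]);
                        i++;
                        break;
                    case 15: // MethodHandle
                        in.readFully(new byte[3]);
                        break;
                    case 7: case 8: case 16: case 19: case 20:
                        in.readFully(new byte[2]);
                        break;
                    case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                        in.readFully(new byte[4]);
                        break;
                    default:
                        throw new IOException("Unknown constant pool tag " + tag);
                }
            }
            return s;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ClassFileSummary s = of("java/lang/Object");
        System.out.printf("magic: 0x%08X%n", s.magic);
        System.out.println("version: " + s.majorVersion + "." + s.minorVersion);
        System.out.println("has <init>: " + s.utf8Constants.contains("<init>"));
    }
}
```

The version printed depends on the JDK you run it with, so it will usually differ from the 50.0 shown in the listing. Parsing fields, methods, and attributes by hand gets tedious quickly, which is one reason the libraries discussed next exist.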
Byte Code Engineering Libraries
The most popular byte code engineering libraries are BCEL, SERP, Javassist, and ASM. All of these libraries have their own merits, but overall, ASM is far superior for its speed and versatility. There are plenty of articles and blog entries discussing these libraries in addition to the documentation on their web sites. Instead of duplicating these efforts, the following will provide links and hopefully other useful information.
BCEL
The most obvious detractor for BCEL (Byte Code Engineering Library) has been its inconsistent support. If you look at the BCEL News and Status page, there have been releases in 2001, 2003, 2006, and 2011. Four releases spread over 10 years is not confidence inspiring. However, it should be noted that there appears to be a version 6 release candidate, which can be downloaded from GitHub, but not Apache. Additionally, the enhancements and bug fixes discussed in the download’s RELEASE-NOTES.txt file are substantial, including support for the language features of Java 6, 7, and 8.
BCEL is a natural starting place for the uninitiated byte code developer because it has the prestige of the Apache Software Foundation. Often, it may serve the developer’s purpose. One of BCEL’s benefits is that it has an API for both the SAX and DOM approaches to parsing byte code. However, when byte code manipulation is more complex, BCEL will likely end in frustration due to its sparse API documentation and limited community support. It should be noted that BCEL is bundled with a BCELifier utility, which parses byte code and outputs the BCEL API Java code needed to produce the parsed byte code. If you choose BCEL as your byte code engineering library, this utility will be invaluable (but note that ASM has an equivalent ASMifier).
SERP
SERP is a lesser known library. My experience with it is limited, but I did find it useful for building a Javadoc-style tool for byte code. SERP was the only API that could give me program counter information so I could hyperlink branching instructions to their targets. Although the SERP release documentation indicates there is support for Java 8’s invokedynamic instruction, it is not clear to me that it receives continuous support from the author and there is very little community support. The author also discusses its limitations which include issues with speed, memory consumption, and thread safety.
Javassist
Javassist is the only library that provides some functionality not supported by ASM… and it’s pretty awesome. Javassist allows you to insert Java source code into existing byte code. You can insert Java code before a method body or append it after the method body. You can also wrap a method body in a try-block and add your own catch-block (of Java code). You can also substitute an entire method body or other smaller constructs with your own Java source code. Lastly, you can add methods to a class which contain your own Java source code. This feature is extremely powerful as it allows a Java developer to manipulate byte code without requiring an in-depth understanding of the underlying byte code. However, this feature does have its limitations. For instance, if you introduce variables in an insertBefore() block of code, they cannot be referenced later in an insertAfter() block of code. Additionally, ASM is generally faster than Javassist, but the benefits of Javassist’s simplicity may outweigh the gains from ASM’s performance. Javassist is continually supported by the authors at JBoss and receives much community support.
ASM
ASM has it all. It is well supported, it is fast, and it can do just about anything. ASM has both SAX and DOM style APIs for parsing byte code. ASM also has an ASMifier which can parse byte code and generate the corresponding Java source code, which when run will produce the parsed byte code. This is an invaluable tool. It is expected that the developer has some knowledge of byte code, but ASM can update frame information for you if you add local variables etc. It also has many utility classes for common tasks in its commons package. Further, common byte code transformations are documented in exceptional detail. You can also get help from the ASM mailing list. Lastly, forums like StackOverflow provide additional support. Almost certainly any problem you have has already been discussed in the ASM documentation or in a StackOverflow thread.
Useful Links
- Understanding Byte Code
- BCEL
- SERP
- Javassist
- ASM
Summary
Admittedly, this blog entry has not been particularly instructional. The intention is to give the beginner a place to start. In my experience, the best way to learn is to have a project in mind to which you’ll apply what you are learning. Documenting a few basic byte code engineering tasks will only duplicate others’ efforts. I developed my byte code skills from an interest in reverse engineering. I would prefer not to document those skills as it would be counter-productive to my other efforts (I built a commercial byte code obfuscator called Modifly, which can perform obfuscation transformations at run-time). However, I am willing to share what I have learned by demonstrating how to apply byte code engineering to class reloading and memory leak detection (and perhaps other areas if there is interest).
Next Blog in the Series Teaser
Even if you don’t use JRebel, you probably haven’t escaped their ads. JRebel’s home page claims “Reload Code Changes Instantly. Skip the build and redeploy process. JRebel reloads changes to Java classes, resources, and over 90 frameworks.” Have you ever wondered how they do it? I’ll show you exactly how they do it with working code in my next blog in this series.
If you enjoyed this blog, you may wish to follow discotek.ca on twitter.