Abstract
The Java language's safety features often make it difficult to get to information about a Java class other than what the Java virtual machine wants you to know. This month I'll take a look at class files with an eye toward understanding what is in them and how you can use that knowledge in your Java applications. (3,300 words)
By Chuck McManis
Welcome to this month's installment of "Java In Depth."
One of the earliest challenges for Java was whether or not it could stand as a capable "systems" language. The root of the question involved Java's safety features, which prevent a Java class from knowing about other classes that are running alongside it in the virtual machine. The ability to "look inside" other classes is called introspection. In the first public Java release, known as Alpha3, the strict language rules regarding visibility of the internal components of a class could be circumvented through the use of the ObjectScope class. Then, during beta, when ObjectScope was removed from the run time because of security concerns, many people declared Java to be unfit for "serious" development.
Why is introspection necessary in order for a language to be considered a "systems" language? One part of the answer is fairly mundane: Getting from "nothing" (that is, an uninitialized VM) to "something" (that is, a running Java class) requires that some part of the system be able to inspect the classes to be run so as to figure out just what to do with them. The canonical example of this problem is simply the following: "How does a program, written in a language that cannot look 'inside' another language component, begin executing the first language component, which is the starting point of execution for all other components?"
There are two ways to deal with introspection in Java: class file inspection and the new reflection API that is part of Java 1.1.x. I'll cover both techniques, but in this column I'll focus on the first -- class file inspection. In a future column I will look at how the reflection API solves this problem. (Links to complete source code for this column are available in the Resources section.)
Look deeply into my files...
In the 1.0.x releases of Java, one of the biggest warts on the Java run time is the way in which the Java executable starts a program. What is the problem? Execution is transitioning from the domain of the host operating system (Win 95, SunOS, and so on) into the domain of the Java virtual machine. Typing the line "java MyClass arg1 arg2" sets in motion a series of events that are completely hard-coded by the Java interpreter.
As the first event, the operating system command shell loads the Java interpreter and passes it the string "MyClass arg1 arg2" as its argument. Next, the Java interpreter attempts to locate a class named MyClass in one of the directories identified in the class path. If the class is found, the third event is to locate a method inside the class named main, which has the modifiers "public" and "static", returns void, and takes an array of String objects as its argument. If this method is found, the Java interpreter converts "arg1 arg2" into an array of strings, constructs a primordial thread, and invokes the method with that array. Once this method is invoked, everything else is pure Java.
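The interpreter's steps can be approximated in Java itself with the java.lang.reflect API from Java 1.1. The 1.0 interpreter hard-codes these steps instead, and reflection is the subject of a future column. The following is only a minimal sketch under those assumptions. LaunchSketch, findMain, and launch are hypothetical names. Unlike the real interpreter, the sketch runs main on the calling thread rather than constructing a primordial thread.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical sketch of the launcher's steps using java.lang.reflect (Java 1.1+).
// Not the interpreter's real code: it runs main on the calling thread.
public class LaunchSketch {

    /** Returns c's public static void main(String[]) method, or null if it has none. */
    public static Method findMain(Class<?> c) {
        try {
            Method m = c.getMethod("main", String[].class);
            int mods = m.getModifiers();
            boolean ok = Modifier.isPublic(mods) && Modifier.isStatic(mods)
                    && m.getReturnType() == void.class;
            return ok ? m : null;
        } catch (NoSuchMethodException e) {
            return null; // no public main(String[]) at all
        }
    }

    /** commandLine is, for example, {"MyClass", "arg1", "arg2"}. */
    public static void launch(String[] commandLine) throws Exception {
        Class<?> c = Class.forName(commandLine[0]);          // locate the class
        Method main = findMain(c);                             // locate main
        if (main == null) {
            throw new NoSuchMethodException("no public static void main(String[]) in " + c.getName());
        }
        String[] args = new String[commandLine.length - 1];   // "arg1 arg2" -> String[]
        System.arraycopy(commandLine, 1, args, 0, args.length);
        main.invoke(null, (Object) args);                      // null receiver: main is static
    }

    public static void main(String[] args) {
        System.out.println(findMain(LaunchSketch.class) != null); // true
        System.out.println(findMain(Object.class) != null);       // false
    }
}
```

The cast to Object in the invoke call matters: without it, the String array would be spread into separate arguments.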
This is all well and good except that the main method has to be static, because the run time must invoke it before any Java environment (such as an instance of the class) exists. Further, the first method has to be named main because there isn't any way to tell the interpreter the method's name on the command line. Even if you did tell the interpreter the name of the method, there isn't any general way in which to find out whether it was in the class you had named in the first place. Finally, because the main method is static, you can't declare it in an interface, and that means you can't specify an interface like this:
public interface Application {
    public void main(String args[]);
}
If the above interface were defined, and classes implemented it, then at least you could use the instanceof operator in Java to determine whether you had an application, and thus whether it was suitable for invoking from the command line. The bottom line is that you can't (define the interface), it wasn't (built into the Java interpreter), and so you can't (easily determine whether a class file is an application). So what can you do?
Actually, you can do quite a bit if you know what to look for and how to use it.
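The following sketch shows how a launcher could have used the Application interface above if it had existed. Because main would be an instance method, the launcher must create an object first. The sketch assumes the class has a public no-argument constructor. InterfaceLaunchSketch, Hello, isApplication, and run are illustrative names, not part of any Java release.

```java
// Hypothetical sketch: how a launcher could use an Application interface
// like the one above, if the interpreter had been built around it.
public class InterfaceLaunchSketch {

    /** The hypothetical interface: main is an instance method here. */
    public interface Application {
        void main(String[] args);
    }

    /** A hypothetical application class. */
    public static class Hello implements Application {
        public void main(String[] args) {
            System.out.println("Hello, " + (args.length > 0 ? args[0] : "world"));
        }
    }

    /** The check the text describes: a plain instanceof test. */
    public static boolean isApplication(Object candidate) {
        return candidate instanceof Application;
    }

    /** Instantiates the named class (assumes a public no-arg constructor) and runs it only if it is an Application. */
    public static boolean run(String className, String[] args) throws Exception {
        Object o = Class.forName(className).getDeclaredConstructor().newInstance();
        if (!isApplication(o)) {
            return false;
        }
        ((Application) o).main(args);
        return true;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(Hello.class.getName(), new String[] {"Java"})); // "Hello, Java", then "true"
        System.out.println(run("java.lang.Object", new String[0]));          // "false"
    }
}
```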
Decompiling class files
The Java class file is architecture-neutral, which means it is the same set of bits whether it is loaded on a Windows 95 machine or a Sun Solaris machine. It is also very well documented in the book The Java Virtual Machine Specification by Lindholm and Yellin. The class file structure was designed, in part, to be easily loaded into the SPARC address space. Basically, the class file could be mapped into the virtual address space, then the relative pointers inside the class fixed up, and presto! You had instant class structure. This was less useful on Intel architecture machines, but the heritage left the class file format easy to comprehend, and even easier to break down.
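Byte order is one concrete part of that architecture neutrality. Multi-byte values in a class file are stored big-endian (high byte first), as on SPARC. That is likely one reason the mapping trick suited SPARC better than little-endian Intel machines. java.io.DataInputStream always reads big-endian, so it decodes class file values the same way on any host. The sketch below uses hypothetical names and java.nio.ByteBuffer, which is newer than the Java versions discussed here. It contrasts a big-endian read of the magic number with a little-endian read of the same four bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: class file values are big-endian, which DataInputStream
// always assumes, so the same bytes decode identically on every host.
public class ByteOrderSketch {

    public static final byte[] MAGIC = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};

    /** Reads the first four bytes the way a class file reader must: big-endian. */
    public static int readMagic(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        return in.readInt();
    }

    /** For contrast: a little-endian (Intel-native) interpretation of the same bytes. */
    public static int readLittleEndian(byte[] bytes) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(Integer.toHexString(readMagic(MAGIC)));        // cafebabe
        System.out.println(Integer.toHexString(readLittleEndian(MAGIC))); // bebafeca
    }
}
```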
In the summer of 1994, I was working in the Java group and building what is known as a "least privilege" security model for Java. I had just finished figuring out that what I really wanted to do was to look inside a Java class, excise those pieces that were not allowed by the current privilege level, and then load the result through a custom class loader. It was then that I discovered there weren't any classes in the main run time that knew about the construction of class files. There were versions in the compiler class tree (which had to generate class files from the compiled code), but I was more interested in building something for manipulating pre-existing class files.
I started by building a Java class that could decompose a Java class file that was presented to it on an input stream. I gave it the less-than-original name ClassFile. The beginning of this class is shown below.
public class ClassFile {
    int                 magic;
    short               majorVersion;
    short               minorVersion;
    ConstantPoolInfo    constantPool[];
    short               accessFlags;
    ConstantPoolInfo    thisClass;
    ConstantPoolInfo    superClass;
    ConstantPoolInfo    interfaces[];
    FieldInfo           fields[];
    MethodInfo          methods[];
    AttributeInfo       attributes[];
    boolean             isValidClass = false;

    // Access flags, taken from the class file specification
    public static final int ACC_PUBLIC       = 0x1;
    public static final int ACC_PRIVATE      = 0x2;
    public static final int ACC_PROTECTED    = 0x4;
    public static final int ACC_STATIC       = 0x8;
    public static final int ACC_FINAL        = 0x10;
    public static final int ACC_SYNCHRONIZED = 0x20;  // on a class, 0x20 is ACC_SUPER
    public static final int ACC_THREADSAFE   = 0x40;  // named ACC_VOLATILE in the published spec
    public static final int ACC_TRANSIENT    = 0x80;
    public static final int ACC_NATIVE       = 0x100;
    public static final int ACC_INTERFACE    = 0x200;
    public static final int ACC_ABSTRACT     = 0x400;
    // ... (the rest of the class is not shown)
As you can see, the instance variables for class ClassFile define the major components of a Java class file. In particular, the central data structure for a Java class file is known as the constant pool. Other interesting chunks of the class file get classes of their own: MethodInfo for methods, FieldInfo for fields (which are the variable declarations in the class), and AttributeInfo to hold class file attributes. A set of constants, taken directly from the specification on class files, decodes the various modifiers that apply to field, method, and class declarations.
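Decoding an access-flags word means testing each bit against those constants. The following minimal sketch uses hypothetical class and method names. It uses the keyword names from the published specification, where 0x40 on a field is ACC_VOLATILE; the ACC_THREADSAFE constant above has the same value. It decodes field and method flags. On a class, 0x20 means ACC_SUPER rather than synchronized.

```java
// Hypothetical sketch: decoding field or method access_flags into keywords.
// Bit values match the ACC_ constants; 0x40 is ACC_VOLATILE in the published spec.
public class AccessFlagsSketch {

    private static final int[] BITS = {
        0x0001, 0x0002, 0x0004, 0x0008, 0x0010, 0x0020,
        0x0040, 0x0080, 0x0100, 0x0200, 0x0400
    };
    private static final String[] NAMES = {
        "public", "private", "protected", "static", "final", "synchronized",
        "volatile", "transient", "native", "interface", "abstract"
    };

    /** 0x200 (interface) appears only on classes, where 0x20 means ACC_SUPER instead. */
    public static String decode(int flags) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < BITS.length; i++) {
            if ((flags & BITS[i]) != 0) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(NAMES[i]);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode(0x0009)); // public static
        System.out.println(decode(0x0411)); // public final abstract
    }
}
```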
The primary method of this class is read, which is used to read a class file from an input stream (typically a file on disk) and create a new ClassFile instance from the data. The code for the read method is shown below. I've interspersed the description with the code since the method tends to be pretty long.
 1  public boolean read(InputStream in)
 2          throws IOException {
 3      DataInputStream di = new DataInputStream(in);
 4      int count;
 5
 6      magic = di.readInt();
 7      if (magic != (int) 0xCAFEBABE) {
 8          return (false);
 9      }
10
11      minorVersion = di.readShort();   // minor_version precedes major_version
12      majorVersion = di.readShort();
13      count = di.readUnsignedShort();  // constant_pool_count is an unsigned u2
14      constantPool = new ConstantPoolInfo[count];
15      if (debug)
16          System.out.println("read(): Read header...");
17      constantPool[0] = new ConstantPoolInfo();
18      for (int i = 1; i < constantPool.length; i++) {
19          constantPool[i] = new ConstantPoolInfo();
20          if (! constantPool[i].read(di)) {
21              return (false);
22          }
23          // These two types take up "two" spots in the table
24          if ((constantPool[i].type == ConstantPoolInfo.LONG) ||
25              (constantPool[i].type == ConstantPoolInfo.DOUBLE))
26              i++;
27      }
As you can see, the code above begins by wrapping a DataInputStream around the input stream referenced by the variable in. Lines 6 through 12 then read the information needed to determine that the code is indeed looking at a valid class file. That information is the magic "cookie" 0xCAFEBABE and the version numbers, which are 45 and 3 for the major and minor values respectively. In the file, the minor version comes first. Only the magic number is actually checked here. Next, in lines 13 through 27, the constant pool is read into an array of ConstantPoolInfo objects. The count stored in the file is one more than the number of entries, because index 0 is never used. The source code to ConstantPoolInfo is unremarkable -- it simply reads in data and identifies it based on its type. Later, elements from the constant pool are used to display information about the class.
Following the above code, the read method re-scans the constant pool and "fixes up" entries that refer to other items in the constant pool. The fix-up code is shown below. This fix-up is necessary because such references are stored as indexes into the constant pool, and it is useful to have those indexes already resolved. It also serves as a rough consistency check on the constant pool. An index that points outside the pool makes the fix-up throw an exception rather than silently producing a bad class structure.
28      for (int i = 1; i < constantPool.length; i++) {
29          if (constantPool[i] == null)
30              continue;
31          if (constantPool[i].index1 > 0)
32              constantPool[i].arg1 = constantPool[constantPool[i].index1];
33          if (constantPool[i].index2 > 0)
34              constantPool[i].arg2 = constantPool[constantPool[i].index2];
35      }
36
37      if (dumpConstants) {
38          for (int i = 1; i < constantPool.length; i++) {
39              System.out.println("C"+i+" - "+constantPool[i]);
40          }
41      }
In the above code, each constant pool entry uses its index values to look up the other constant pool entries it refers to. After the loop ends at line 35, the entire pool is optionally dumped out.
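The constant pool reading and fix-up steps can be sketched end to end. This minimal version uses hypothetical names and is not the author's ConstantPoolInfo. It handles only the constant types defined for JDK 1.0/1.1 class files (tags 1 and 3 through 12), so it rejects newer class files that use later tags. Unlike the fix-up loop above, it checks every index explicitly and reports a corrupt pool by returning false instead of throwing an exception.

```java
import java.io.*;

// Hypothetical sketch: read a constant pool by tag, then resolve index
// references with explicit range checks. Handles only the JDK 1.0/1.1 tags.
public class ConstantPoolSketch {

    static final int UTF8 = 1, INTEGER = 3, FLOAT = 4, LONG = 5, DOUBLE = 6, CLASS = 7,
            STRING = 8, FIELDREF = 9, METHODREF = 10, INTERFACE_METHODREF = 11, NAME_AND_TYPE = 12;

    /** One entry: raw indexes as read, plus the entries they resolve to. */
    public static class Entry {
        public int tag;
        public Object value;          // text or number for Utf8/Integer/Float/Long/Double
        public int index1, index2;    // 0 means "no reference"
        public Entry arg1, arg2;
    }

    /** Reads entries 1..count-1; slot 0 and the slot after each LONG/DOUBLE stay null. */
    public static Entry[] read(DataInputStream in, int count) throws IOException {
        Entry[] pool = new Entry[count];
        for (int i = 1; i < count; i++) {
            Entry e = new Entry();
            e.tag = in.readUnsignedByte();
            switch (e.tag) {
                case UTF8:    e.value = in.readUTF(); break; // u2 length + modified UTF-8
                case INTEGER: e.value = in.readInt(); break;
                case FLOAT:   e.value = in.readFloat(); break;
                case LONG:    e.value = in.readLong(); break;
                case DOUBLE:  e.value = in.readDouble(); break;
                case CLASS:
                case STRING:
                    e.index1 = in.readUnsignedShort();
                    break;
                case FIELDREF:
                case METHODREF:
                case INTERFACE_METHODREF:
                case NAME_AND_TYPE:
                    e.index1 = in.readUnsignedShort();
                    e.index2 = in.readUnsignedShort();
                    break;
                default:
                    throw new IOException("unknown constant pool tag " + e.tag + " at index " + i);
            }
            pool[i] = e;
            if (e.tag == LONG || e.tag == DOUBLE) {
                i++; // eight-byte constants occupy two slots; the second is unusable
            }
        }
        return pool;
    }

    /** Resolves index1/index2; returns false if an index is out of range or names an empty slot. */
    public static boolean fixUp(Entry[] pool) {
        for (int i = 1; i < pool.length; i++) {
            Entry e = pool[i];
            if (e == null) {
                continue;
            }
            if (e.index1 > 0) {
                if (e.index1 >= pool.length || pool[e.index1] == null) return false;
                e.arg1 = pool[e.index1];
            }
            if (e.index2 > 0) {
                if (e.index2 >= pool.length || pool[e.index2] == null) return false;
                e.arg2 = pool[e.index2];
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // #1 Utf8 "Hello", #2 Class -> #1, #3 Long (also occupies #4), #5 Class -> #4 (bad)
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeByte(UTF8);  out.writeUTF("Hello");
        out.writeByte(CLASS); out.writeShort(1);
        out.writeByte(LONG);  out.writeLong(42L);
        out.writeByte(CLASS); out.writeShort(4);
        Entry[] pool = read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())), 6);
        System.out.println(pool[2].index1); // 1
        System.out.println(fixUp(pool));    // false: #5 points at the unusable slot #4
    }
}
```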
Once the code has scanned past the constant pool, the class file defines the primary class information: its access flags, its class name, its superclass name, and the interfaces it implements. The read code scans for these values as shown below.
    accessFlags = di.readShort();

    thisClass = constantPool[di.readUnsignedShort()];
    superClass = constantPool[di.readUnsignedShort()];
    if (debug)
        System.out.println("read(): Read class info...");

    /*
     * Identify all of the interfaces implemented by this class
     */
    count = di.readUnsignedShort();
    if (count != 0) {
        if (debug)
            System.out.println("Class implements "+count+" interfaces.");
        interfaces = new ConstantPoolInfo[count];
        for (int i = 0; i < count; i++) {
            int iindex = di.readUnsignedShort();
            if ((iindex < 1) || (iindex > constantPool.length - 1))
                return (false);
            interfaces[i] = constantPool[iindex];
            if (debug)
                System.out.println("I"+i+": "+interfaces[i]);
        }
    }
    if (debug)
        System.out.println("read(): Read interface info...");
Once this code is complete, the read method has built up a pretty good idea of the structure of the class. All that remains is to collect the field definitions, the method definitions, and, perhaps most importantly, the class file attributes.
The class file format breaks each of these three groups into a section consisting of a number, followed by that number of instances of the thing you are looking for. So, for fields, the class file has the number of defined fields, and then that many field definitions. The code to scan in the fields is shown below.
48      count = di.readUnsignedShort();
49      if (debug)
50          System.out.println("This class has "+count+" fields.");
51      if (count != 0) {
52          fields = new FieldInfo[count];
53          for (int i = 0; i < count; i++) {
54              fields[i] = new FieldInfo();
55              if (! fields[i].read(di, constantPool)) {
56                  return (false);
57              }
58              if (debug)
59                  System.out.println("F"+i+": "+
60                      fields[i].toString(constantPool));
61          }
62      }
63      if (debug)
64          System.out.println("read(): Read field info...");
The above code starts by reading a count in line 48. If the count is non-zero, it reads that many fields using the FieldInfo class. The FieldInfo class simply fills out the data that define a field to the Java virtual machine. The code to read methods and attributes is the same, simply replacing the references to FieldInfo with references to MethodInfo or AttributeInfo as appropriate. That source is not included here; however, you can look at it using the links in the Resources section below.
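The layout FieldInfo reads is small. According to the class file specification, each field_info entry has four parts: a u2 access_flags, a u2 name_index, a u2 descriptor_index, and a u2 attributes_count followed by that many attributes. method_info has exactly the same layout. Each attribute is a u2 name index, a u4 length, and that many bytes. The sketch below uses hypothetical names and is not the author's FieldInfo. It reads that layout and treats every attribute body as opaque.

```java
import java.io.*;

// Hypothetical sketch of the field_info / method_info layout and the
// "count, then that many items" pattern. Attribute bodies are kept opaque.
public class MemberInfoSketch {

    public static class Attribute {
        public int nameIndex;   // index of a CONSTANT_Utf8 entry naming the attribute
        public byte[] data;     // attribute body, not interpreted here
    }

    public static class Member {
        public int accessFlags, nameIndex, descriptorIndex;
        public Attribute[] attributes;
    }

    /** attribute_info: u2 name_index, u4 length, then length bytes. */
    public static Attribute readAttribute(DataInputStream in) throws IOException {
        Attribute a = new Attribute();
        a.nameIndex = in.readUnsignedShort();
        int length = in.readInt(); // u4; lengths of 2^31 or more are rejected below
        if (length < 0) {
            throw new IOException("attribute too large for this sketch");
        }
        a.data = new byte[length];
        in.readFully(a.data);
        return a;
    }

    /** field_info and method_info share this layout. */
    public static Member readMember(DataInputStream in) throws IOException {
        Member m = new Member();
        m.accessFlags = in.readUnsignedShort();
        m.nameIndex = in.readUnsignedShort();
        m.descriptorIndex = in.readUnsignedShort();
        m.attributes = new Attribute[in.readUnsignedShort()]; // attributes_count
        for (int i = 0; i < m.attributes.length; i++) {
            m.attributes[i] = readAttribute(in);
        }
        return m;
    }

    /** A counted section: u2 count, then that many members. */
    public static Member[] readMembers(DataInputStream in) throws IOException {
        Member[] members = new Member[in.readUnsignedShort()];
        for (int i = 0; i < members.length; i++) {
            members[i] = readMember(in);
        }
        return members;
    }

    public static void main(String[] args) throws IOException {
        // One private field with one 2-byte attribute (for example, a ConstantValue index).
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeShort(1);                                              // fields_count
        out.writeShort(0x0002); out.writeShort(5); out.writeShort(6);   // flags, name, descriptor
        out.writeShort(1);                                              // attributes_count
        out.writeShort(7); out.writeInt(2); out.writeShort(8);          // name, length, body
        Member[] fields = readMembers(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        System.out.println(fields.length + " field, " + fields[0].attributes[0].data.length + "-byte attribute");
    }
}
```

Because method_info has the same shape, the same readMembers call would serve for the methods section.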
Ok, so now what?
At this point you might be asking, "What good does this do me?"
The answer is "Quite a bit."
If you've compiled up these classes and have them in your class path, the simplest thing you can do is to print them out and have a look.
The ClassFile class defines a method named display for dumping the structure of the class file out to the terminal. I wrote a simple program named dumpclass to show how it is used. The source code to dumpclass is shown below.
import java.io.*;
import java.util.*;
import util.*;

public class dumpclass {
    public static void main(String args[]) {
        try {
            FileInputStream fi = new FileInputStream(args[0]);
            util.ClassFile cf = new util.ClassFile();
            // cf.debug = true;
            // cf.dumpConstants = true;
            if (! cf.read(fi)) {
                System.out.println("Unable to read class file.");
                System.exit(1);
            }
            cf.display(System.out);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
The code above shows how dumpclass easily reads in a named class file and then displays it using the display method.
The output of the display is shown below. If you look at the output, you will see that generic imports in the source, such as import java.io.*;, are regenerated as the specific classes that the dumpclass code actually uses. If nothing else, running dumpclass on your class files and cutting and pasting the specific imports in place of your generic imports will save compile time on some compilers. The other interesting thing is that the source code looks like, well, source code. This is because the class file structure contains structural as well as implementation information. Two quirks of display show up here. First, the class is listed as synchronized because bit 0x20 in a class's access flags is ACC_SUPER, which compilers set on every class, and display decodes that bit with its method meaning. Second, the compiler-generated default constructor appears as public void dumpclass(). You should not use such information to illegally decompile other people's class files.
import java.io.FileInputStream;
import java.io.PrintStream;
import java.lang.Exception;
import java.lang.System;
import java.lang.Throwable;
import util.ClassFile;

/*
 * This class has 1 optional class attributes.
 * These attributes are:
 * Attribute 1 is of type SourceFile
 *      SourceFile : dumpclass.java
 */
public synchronized class dumpclass extends java.lang.Object
{
    /* Methods */
    public static void main(java.lang.String a[]);
    public void dumpclass();
}
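A specific import list like this one can plausibly be built from the constant pool's CONSTANT_Class entries, whose names use slashes (java/io/FileInputStream). That is an assumption: display's exact rules are not shown here, and they evidently differ in detail. The following hypothetical sketch shows only the core transformation. The exclusion set stands in for whatever rules display uses to leave out the class itself and java.lang.Object.

```java
import java.util.*;

// Hypothetical sketch: turning class names taken from CONSTANT_Class entries
// into specific import lines. The exclusion rules are an assumption.
public class ImportListSketch {

    /** classNames use internal form (java/io/FileInputStream); exclude uses dotted names. */
    public static List<String> imports(Collection<String> classNames, Set<String> exclude) {
        SortedSet<String> lines = new TreeSet<>();
        for (String internal : classNames) {
            if (internal.startsWith("[")) {
                continue; // array class such as [Ljava/lang/String;
            }
            String dotted = internal.replace('/', '.');
            if (!exclude.contains(dotted)) {
                lines.add("import " + dotted + ";");
            }
        }
        return new ArrayList<>(lines); // sorted, duplicates removed
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("dumpclass", "java/lang/Object",
                "java/io/FileInputStream", "util/ClassFile", "[Ljava/lang/String;");
        Set<String> exclude = new HashSet<>(Arrays.asList("dumpclass", "java.lang.Object"));
        for (String line : imports(names, exclude)) {
            System.out.println(line);
        }
        // import java.io.FileInputStream;
        // import util.ClassFile;
    }
}
```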
More interesting to me when I wrote these classes was the optional class file attribute. Since the ClassFile class can write as well as read class files, it is ideal for "adding on" an optional class file attribute.
For those of you who haven't seen the specification on class files, an optional class file attribute is simply a string type name paired with a chunk of opaque binary data. Sun defines a few well-known attributes (the "SourceFile" attribute shown above is one such attribute), but you can use attributes to store arbitrarily interesting data. In my secure system prototype I had space reserved in an optional class attribute for a public key signature and a capabilities certificate.
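Adding an attribute means appending one attribute_info structure. It consists of a u2 index of the CONSTANT_Utf8 entry holding the attribute's name, a u4 byte length, and the bytes themselves. The specification requires virtual machines to silently ignore attributes they do not recognize, which is what makes custom attributes safe to add. The sketch below uses hypothetical names and only serializes the structure. It assumes the caller has already added the name to the constant pool and will increment the enclosing attributes_count.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: serializing one attribute_info. Assumes nameIndex already
// refers to a CONSTANT_Utf8 entry holding the attribute's name, and that the
// caller bumps the enclosing attributes_count.
public class AttributeWriterSketch {

    public static void writeAttribute(DataOutputStream out, int nameIndex, byte[] data) throws IOException {
        out.writeShort(nameIndex);  // u2 attribute_name_index
        out.writeInt(data.length);  // u4 attribute_length
        out.write(data);            // opaque body
    }

    public static byte[] toBytes(int nameIndex, byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        writeAttribute(out, nameIndex, data);
        out.flush();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] body = "hypothetical certificate".getBytes(StandardCharsets.UTF_8);
        byte[] attr = toBytes(17, body);
        System.out.println(attr.length == 6 + body.length); // true
    }
}
```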
Another interesting application of class file attributes is demonstrated by the SBKTech application Jinstall, which uses an attribute to store the compressed data for its self-extracting archive process. Using these classes and the new ZIP file routines in 1.1 makes it pretty easy to generate this type of application.
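The compression half of such an application is a short round trip with the java.util.zip stream classes. This sketch uses hypothetical names and is not Jinstall's actual format, which is not described here. It deflates a payload that could then be stored as attribute data and inflates it again.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical sketch: compress a payload before storing it as attribute data,
// then inflate it again. Not Jinstall's actual format.
public class CompressedPayloadSketch {

    public static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(bytes)) {
            out.write(raw);
        } // close() finishes the deflate stream
        return bytes.toByteArray();
    }

    public static byte[] decompress(byte[] packed) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (InflaterInputStream in = new InflaterInputStream(new ByteArrayInputStream(packed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                bytes.write(buf, 0, n);
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("class data ");
        }
        byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(raw);
        System.out.println(raw.length + " -> " + packed.length + " bytes");
        System.out.println(new String(decompress(packed), StandardCharsets.UTF_8).equals(sb.toString())); // true
    }
}
```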
Finally, perhaps the most intriguing application of reading and rewriting class files uses attributes and class loaders. Recall my earlier article on writing class loaders. Attributes can be associated with individual methods as well as with the class as a whole. In fact, a method's code is itself stored in a Code attribute, and an Exceptions attribute lists the exceptions the method throws. With that in mind, consider the following application.
Let's say you have a Java class whose method code is stored in an attribute associated with that method and encrypted with a key known only to the author's server. The actual code associated with the method is some Java code that simply throws an UnlicensedUsageException. (Note that this is a fictional exception used to illustrate the design.) Now bundle with the application a custom class loader designed to load such a class. This class loader would work in the following way.
First, the code for the class would be read. Then the class would be decomposed into a ClassFile structure. After this, the methods in the class would be checked for encryption. The class loader, once satisfied that such a thing was allowed, would contact the author's server via the Internet and request a decryption key. That key would be applied to the encrypted code, and the decrypted code would be substituted for the placeholder code. The class would then be rewritten into a byte stream and fed into the class loader for loading and execution.
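Only the last of those steps is sketched here: feeding a class file's bytes to a class loader. There is no encryption, key exchange, or rewriting. A hypothetical ByteArrayClassLoader simply calls ClassLoader.defineClass on the bytes of an ordinary class file read from the class path. Because a different loader defines it, the result is a distinct class with the same name. The sketch assumes the class file is available as a resource next to the compiled classes.

```java
import java.io.*;

// Hypothetical sketch of only the final step: defining a class from bytes.
// No decryption or rewriting happens here; the bytes are an ordinary class file.
public class ByteArrayLoaderSketch {

    /** Hypothetical loader that defines a class from bytes it is handed. */
    static class ByteArrayClassLoader extends ClassLoader {
        ByteArrayClassLoader(ClassLoader parent) {
            super(parent);
        }

        Class<?> define(String name, byte[] classBytes) {
            return defineClass(name, classBytes, 0, classBytes.length);
        }
    }

    /** A small class to load a second copy of. */
    public static class Payload {
        @Override
        public String toString() {
            return "payload";
        }
    }

    /** Reads c's class file from the class path (assumes it is available as a resource). */
    static byte[] classFileBytes(Class<?> c) throws IOException {
        String name = c.getName();
        String resource = name.substring(name.lastIndexOf('.') + 1) + ".class";
        try (InputStream in = c.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("class file not found for " + name);
            }
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                bytes.write(buf, 0, n);
            }
            return bytes.toByteArray();
        }
    }

    public static Class<?> loadCopy(Class<?> c) throws IOException {
        byte[] bytes = classFileBytes(c); // in the scheme above: decrypt and rewrite here
        return new ByteArrayClassLoader(c.getClassLoader()).define(c.getName(), bytes);
    }

    public static void main(String[] args) throws IOException {
        Class<?> copy = loadCopy(Payload.class);
        System.out.println(copy.getName());        // same name as Payload
        System.out.println(copy == Payload.class); // false: a different loader defined it
    }
}
```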
The result of these steps would be a Java class file that was much more difficult to decompile than a "normal" Java class. Further, since the decryption happens on the fly, only a modified virtual machine could be used to extract the running code (assuming a secure decrypting key exchange).
I had thought about coding an example but realized that such a class loader would no doubt be declared to be a munition and I would be branded an arms dealer. So this description will have to suffice!
Wrapping up and further thoughts
Being able to see inside a Java class can enable a Java application to manipulate that class in useful ways. I've looked at reading and writing class files directly, and then at importing the class into the Java run time through a custom class loader. Being able to write classes enables such applications as "self-extracting" classes. These are meta classes around a distribution of classes. Another interesting application is the notion of an encrypted class whose contents are self-decrypted just prior to running by accessing a remote key. It all goes to show that we can learn new skills by looking inside ourselves!
Next month we will look at the Reflection API and how it achieves
introspection while keeping a rein on security, and I'll show you how
I'd write the initial code of the Java interpreter if I had an
opportunity to update that code.
About the author
Chuck McManis currently is the director of system software at FreeGate
Corp., a venture-funded start-up that is exploring opportunities in the
Internet marketplace. Before joining FreeGate, Chuck was a member of
the Java Group. He joined the Java Group just after the formation of
FirstPerson Inc. and was a member of the portable OS group (the group
responsible for the OS portion of Java). Later, when FirstPerson was
dissolved, he stayed with the group through the development of the
alpha and beta versions of the Java platform. He created the first
"all Java" home page on the Internet when he did the
programming for the Java version of the Sun home page in May 1995. He
also developed a cryptographic library for Java and versions of the
Java class loader that could screen classes based on digital
signatures. Before joining FirstPerson, Chuck worked in the operating
systems area of SunSoft, developing networking applications, where he
did the initial design of NIS+.
Reach Chuck at chuck.mcmanis@javaworld.com.
URL: http://www.javaworld.com/javaworld/jw-08-1997/jw-08-indepth.html