Wednesday, October 18, 2006

How to fix the dreaded "java.lang.OutOfMemoryError: PermGen space" exception (classloader leaks)

In the previous blog entry Classloader leaks: the dreaded "java.lang.OutOfMemoryError: PermGen space" exception  I explained how this type of problem can originate in the application code that you deploy to an application server.  In this post I'll explain how to track down the leak so that you can fix it.

Profilers

Memory leak? Use a profiler. Right? Well... generally speaking the answer is yes, but classloader leaks are a bit special...

To refresh your memory (pardon the pun), a memory leak is an object that the system unintentionally hangs on to, thereby making it impossible for the garbage collector to remove this object. The way that profilers find memory leaks is to trace references to a leaked object.

What do I mean by "tracing"? A leaked object can be referenced by another object which itself is a leak. In turn, this object may also be a leak, and so on. This repeats until an object is found that references a leaked object by mistake. This reference is where the problem is, and what you need to fix. Let me try to clarify this by illustrating this with a picture from my previous blog:

In this picture the AppClassloader, LeakServlet.class, STATICNAME, CUSTOMLEVEL, LeakServlet$1.class are all leaked objects. Due to static objects (e.g. STATICNAME) in the picture, that may in turn reference other objects, the number of leaked objects may be in the thousands. Going over each leaked object manually to check if there are any incidental references to it (the red reference in the picture) until you find the troublesome object (CUSTOMLEVEL) is laborious. You would rather have a program find the violating reference for you.

A profiler doesn't tell you which leaked object is interesting to look at (CUSTOMLEVEL). Instead it gives you all leaked objects. Let's say that you would look at STATICNAME. The profiler now should find the route STATICNAME to LEAKSERVLET.class to AppClassloader to LeakServlet1$1.class, to CUSTOMLEVEL to Level.class. In this route, the red line in the picture is the reference that actually causes the leak. I said the profiler should find this route. However, all the profilers that we tried, stop tracing as soon as they reach a class object or classloader. There's a good reason for that: the number of traces grows enormous if it follows through the references through classes. And in most cases, these traces are not very useful.

So no luck with profilers! We need to try something else.

JDK 6.0 to the rescue

When Edward Chou and I worked on tracking down classloader leaks last year, we tried to run the JVM with HPROF and tried to trigger a memory dump; we looked at using Hat to interpret the dump. Hat stands for Heap Analysis Tool, and was developed to read dump files generated with HPROF. Unfortunately, the hat tool blew up reading our dump files. Because we didn't think it was difficult to parse the dump file, we wrote a utility to read the file and track the memory leak.

That was last year. This year we have JDK 6.0; this new JDK comes with a few tools that make looking at the VM's memory a lot simpler. First of all, there's a tool called jmap. This command line tool allows you to trigger a dump file without HPROF. It is as simple as typing something like:
    jmap -dump:format=b,file=leak 3144
Here leak is the filename of the dump, and 3144 is the PID of the process. To find the PID, you can use jps.

Secondly, Hat is now part of the JDK. It is now called jhat. You can run it using a command line like:
    jhat -J-Xmx512m leak
Here leak is the name of the dump file, and as you may have guessed, -J-Xmx512m is a parameter to specify how much memory jhat is allowed to allocate.

When you start jhat it reads the dump file and then listens on an HTTP port. You point your browser to that port (7000 by default) and through that you can browse the memory heap dump. It's a very convenient way of looking at what objects are in memory and how they are connected.

So, it seemed like a good idea to check out what can be done with these new tools to find classloader leaks.

... or not?

Unfortunately, jhat, just like the profilers we tried, also stops tracing when it encounters a class. Now what? I decided to download the JDK source code and find out what the problem is.  Building the whole JDK is a difficult task from what I gather from the documentation. Fortunately, jhat is a nicely modularized program; I could just take the com.sun.tools.hat-packages out of the source tree, load them in my favorite editor and compile the code. The patched code was easily packaged and run: I just jar-ed it and added it to the lib/ext directory of the JDK:
    jar -cf C:\\apps\\Java\\jdk1.6.0\\jre\\lib\\ext\\ahat.jar -C hat\\bin .
    jhat leak

This was really as easy as pie. So after running the program in the debugger for some time, I figured out how it works and what changes I wanted to make. The change is that when you follow the references from a classloader, the modified jhat will follow through all traces from all the instances of the classes that it loaded. With that change, finding the cause of a classloader leak is simple.

An example

Let's look at the example from my previous blog as depicted in the picture above. Using NetBeans I created the following servlet and deployed it to Glassfish:

 1 package com.stc.test;
 2
 3 import java.io.\*;
 4 import java.net.\*;
 5 import java.util.logging.Level;
 6 import java.util.logging.Logger;
 7 import javax.servlet.\*;
 8 import javax.servlet.http.\*;
 9
10 public class Leak extends HttpServlet {
11
12     protected void processRequest(HttpServletRequest request, HttpServletResponse response)
13     throws ServletException, IOException {
14         response.setContentType("text/html;charset=UTF-8");
15         PrintWriter out = response.getWriter();
16         out.println("<html><body><pre>");
17         Level custom = new Level("LEAK", 950) {};
18         Logger.getLogger(this.getClass().getName()).log(custom, "New level created");
19         out.println("</pre></body></html>");
20         out.close();
21     }
22+    HTTPServlet methods. Click on the + sign on the left to edit the code
48 }
49

I invoked the servlet to cause the leak. Next I undeployed the servlet. Then I triggered a heap dump:
    jmap -dump:format=b,file=leak 3144
and fired up the modified jhat:
    jhat -J-Xmx512m leak
and brought up the browser. The opening screen shows amongst other things, all classes that are found in the dump:

Finding objects that were leaked is easy since I know that I shouldn't see any objects of the classes that I deployed. Recall that I deployed a class com.stc.test.Leak; so I searched in the browser for the com.stc.test package, and found these classes (never mind the NoLeak class: I used it for testing).

Clicking on the link class com.stc.test.Leak brings up the following screen:


Clicking on the classloader link brings up the following screen:


Scrolling down, I see Reference Chains from Rootset / Exclude weak refs . Clicking on this link invokes the code that I modified; the following screen comes up:


And there's the link to java.util.Logging.Level that we've been looking for!

Easy as pie!

Summarizing, the steps are:

  1. undeploy the application that is leaking
  2. trigger a memory dump
  3. run jhat (with modification)
  4. find a leaked class
  5. locate the classloader
  6. find the "Reference chains from root set"
  7. inspect the chains, locate the accidental reference, and fix the code

I'll contact the JDK team to see if they are willing to accept the changes I made to jhat. If you cannot wait, send me an email or leave a comment.

Update (April 2007): Java SE SDK 6.0 update 1 has the updated code.

Other Permgen space tidbits

After fixing the classloader leak, you of course want to test to see if the memory leak has disappeared. You could again trigger a memory dump and run jhat. What you also could try is to see if the amount of used permgen space memory goes up continuously after each deployment/undeployment of your application.

You can monitor permgen space usage using jconsole. You can see the memory usage go up when you repeatedly deploy and undeploy an application. However, this may not be a classloader / memory leak. As it turns out, it's difficult to predict when the garbage collector cleans up permgen space. Pressing the button in Run GC in jconsole does not do the trick. Only when you encounter a java.lang.OutOfMemoryError: PermGen space exception can you be sure that there really was no memory. This is a bit more involved than it should be!

How can we force the garbage collector to kick in? We can force a java.lang.OutOfMemoryError: PermGen space and then releasing the memory after which we force the garbage collector to kick in. I wrote the following servlet to do that:

package com.stc.test;

import java.io.\*; import java.util.ArrayList; import javax.servlet.\*; import javax.servlet.http.\*;
public class RunGC extends HttpServlet {
private static class XClassloader extends ClassLoader { private byte[] data; private int len;
public XClassloader(byte[] data, int len) { super(RunGC.class.getClassLoader()); this.data = data; this.len = len; }
public Class findClass(String name) { return defineClass(name, data, 0, len); } }
protected void processRequest(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException { response.setContentType("text/html;charset=UTF-8"); PrintWriter out = response.getWriter(); out.println("<html><body><pre>");
try { // Load class data byte[] buf = new byte[1000000]; InputStream inp = this.getClass().getClassLoader() .getResourceAsStream("com/stc/test/BigFatClass.class"); int n = inp.read(buf); inp.close(); out.println(n + " bytes read of class data");
// Exhaust permgen ArrayList keep = new ArrayList(); int nLoadedAtError = 0; try { for (int i = 0; i < Integer.MAX_VALUE; i++) { XClassloader loader = new XClassloader(buf, n); Class c = loader.findClass("com.stc.test.BigFatClass"); keep.add(c); } } catch (Error e) { nLoadedAtError = keep.size(); }
// Release memory keep = null; out.println("Error at " + nLoadedAtError);
// Load one more; this should trigger GC XClassloader loader = new XClassloader(buf, n); Class c = loader.findClass("com.stc.test.BigFatClass"); out.println("Loaded one more"); } catch (Exception e) { e.printStackTrace(out); }
out.println("</pre></body></html>"); out.close(); }
In this servlet a custom classloader is instantiated which loads a class in that classloader. That class is really present in the web classloader, but the custom classloader is tricked by not delegating to the parent classloader; instead the classloader is instantiating the class using the bytes of the class obtained through getResourceAsStream().

In the servlet it tries to allocate as many of these custom classes as possible, i.e. until the memory exception occurs. Next, the memory is made eligible for garbage collection, and one more classloader is allocated thereby forcing garbage collection.

The number of custom classes that can be loaded until a memory exception occurs, is a good measure of how much permgen space memory is available. As it turns out, this metric is a much more reliable than the one that you get from jconsole.

And more

Edward Chou is thinking of some other ideas to further automate the process of determining exactly where the cause of a classloader leak is. E.g. it should be possible to identifiy the erroneous reference (the red line in the picture) automatically, since this reference is from one classloader to another. Check his blog in the coming days.

Update (April 2007): You can find an interesting usage of jhat's Object Query Language on Sundarajan's blog to compute histograms of reference chains.

Post a Comment