|
Another way to improve performance is with performance analysis.
Performance analysis is looking at program execution to pinpoint
where bottlenecks or other performance problems such as memory
leaks might occur. Once you know where potential touble spots are,
you can change your code to remove or reduce their impact.
Profiling
The JavaTM Virtual Machines (VMs) have had
the ability to provide simple profile reports since Java Development Kit
(JDKTM) 1.0.2. However, the information they
provided was limited to a sorted list of method calls a program had called.
The Java® 2 platform software provides much better profiling capabilities than
previously available and analysis of this generated data has been made easier
by the emergence of a Heap Analysis Tool (HAT). The heap analysis tool,
as its name implies, lets you analyze profile reports of the heap.
The heap is a block of memory the Java VM uses when it is running.
The heap analysis tool lets you generate reports on objects that were
used to run your application. Not only can you get a listing of the
most frequently called methods and the memory used in calling those
methods, but you can also track down memory leaks. Memory leaks can
have a significant impact on performance.
Download the Heap Analysis tool:
Analyze a Program
To analyze the TableExample3 program included in the
demo/jfc/Table directory in the Java 2 platform download,
you need to generate a profile report. The simplest report to generate is
a text profile. To generate a text profile, run the application with the
-Xhprof parameter. In the final release of the Java 2 platform
software, this option was renamed -Xrunhprof. To see a list of the
currently available options run the command
java -Xrunhprof:help
Hprof usage: -Xrunhprof[:help]|[<option>=<value>, ...]
Option Name and Value |
Description |
Default |
--------------------- |
----------- |
------- |
heap=dump|sites|all |
heap profiling |
all |
cpu=samples|times|old |
CPU usage |
off |
monitor=y|n |
monitor contention |
n |
format=a|b |
ascii or binary output |
a |
file=<file> |
write data to file |
java.hprof(.txt for ascii) |
net=<host>:<port> |
send data over a socket |
write to file |
depth=<size> |
stack trace depth |
4 |
cutoff=<value> |
output cutoff point |
0.0001 |
lineno=y|n |
line number in traces |
y |
thread=y|n |
thread in traces? |
n |
doe=y|n |
dump on exit? |
y |
Example: java -Xrunhprof:cpu=samples,file=log.txt,
depth=3 FooClass
The following invocation creates a text output file that you can view
without the heap analysis tool called java.hprof.txt
when the program generates a stack trace or exits. A different invocation
is used to create a binary file to use with the heap analysis tool.
java -Xrunhprof TableExample3
d:\jdk12\demo\jfc\Table> java -Xrunhprof TableExample3
Dumping Java heap ... allocation sites ... done.
The profile option literally logs every object created on the heap, so
even just starting and stopping the small TableExample3
program results in a four megabyte report file. Although the heap analysis
tool uses a binary version of this file and provides a summary, there are
some quick and easy things you can learn from the text file without using
the heap analysis tool.
Note: To list all available options, use
java -Xrunhprof:help
View the Text File
Choose an editor that can handle large files and go to the end of this
file. There could be hundreds of thousands of lines, so use a shortcut
instead of scrolling, or search for the words SITES BEGIN.
You should see a list of lines that start with an increasing rank number
followed by two percentage numbers. The first entry in this list, should
look similar to the example below:
SITES BEGIN (ordered by live bytes)
Sun Dec 20 16:33:28 1998
|
percent |
live |
alloc'ed |
stack |
class |
rank |
self |
accum |
bytes |
objs |
bytes |
objs |
trace |
name |
1 |
55.86% |
55.86% |
826516 |
5 |
826516 |
5 |
3981 |
[S |
The [S notation at the end of the last line above indicates the
first entry is an array of the short, a primitive type. This is expected
with Swing or Abstract Window Toolkit (AWT) applications. The five count under
the objs header mean there are currently five of these arrays,
there have only been five in the lifetime of this application, and they take
up 826516 bytes. The reference key to this object is the value listed under
stack trace. To find where this object was created in this
example, search for TRACE 3981. You will see the following:
TRACE 3981:
java/awt/image/DataBufferUShort.<init>(
DataBufferUShort.java:50)
java/awt/image/Raster.createPackedRaster(
Raster.java:400)
java/awt/image/DirectColorModel.
createCompatibleWritableRaster(
DirectColorModel.java:641)
sun/awt/windows/WComponentPeer.createImage(
WComponentPeer.java:186)
The TableExample3 code sets a scrollpane that is
700 by 300. When you look at the source of Raster.java, which
is in the src.jar file, you find these statements at line 400:
case DataBuffer.TYPE_USHORT:
d = new DataBufferUShort(w*h);
break;
The values w and h are the width and height from
the createImage call at the start of TRACE 3981.
The DataBufferUShort constructor creates and array of
shorts as follows:
data = new short[size];
where size is w*h. So, in theory there should be an
entry for an array of 210000 elements. You look for each instantiation of this
class by searching for trace=3981. One of the five entries will
look like this:
OBJ 5ca1fc0 (sz=28, trace=3979,
class=java/awt/image/DataBufferUShort@9a2570)
data 5ca1670
bankdata 5ca1f90
offsets 5ca1340
ARR 5ca1340 (sz=4, trace=3980, nelems=1,
elem type=int)
ARR 5ca1670 (sz=420004, trace=3981, nelems=210000,
elem type=short)
ARR 5ca1f90 (sz=12, trace=3982, nelems=1,
elem type=[S@9a2d90)
[0] 5ca1670
You can see that the data value of this raster image references an array
5ca1670 which in turns lists 210000 elements of a
short of size 2. This means 420004 bytes of memory are used
in this array.
From this data you can conclude that the TableExample3 program
uses nearly 0.5Mb to map each table. If the example application is running
on a small memory machine, you should make sure you do not keep unnecessary
references to large tables or images that are built by the
createImage method.
The Heap Analysis Tool
The Heap Analysis tool can analyze the same data for you, but requires a
binary report file as input. You can generate a binary report file as
follows:
java -Xrunhprof:file=TableExample3.hprof,format=b
TableExample3
To generate the binary report, close the TableExample3 window.
The binary report file TableExample3.hprof is created when the
program exits. The Heap Analysis tool starts an HTTP Server that analyzes
the binary profile file and displays the results in HTML that you can view
with a browser.
You can get a copy of the Heap Analysis Tool from the
java.sun.com site.
Once you install it, you run shell and batch scripts in the installed
bin directory so you can start the Heap Analysis Tool
server as follows:
>hat TableExample3.hprof
Started HCODEP server on port 7000
Reading from /tmp/TableExample3.hprof...
Dump file created Tue Jan 05 13:28:59 PST 1999
Snapshot read, resolving...
Resolving 17854 objects...
Chasing references,
expect 35 dots.......................
Eliminating duplicate
references.........................
Snapshot resolved.
Server is ready.
The above output tells you an HTTP server is started on port 7000
by default. To view this report enter the url
http://localhost:7000 or http://your_machine_name:7000 in your web
browser. If you have problems starting the server using the scripts, you
can alternatively run the application by including the hat.zip
classes file on your CLASSPATH and use the following command:
java hat.Main TableExample3.hprof
The default report view contains a list of all the classes. At the bottom
of this initial page are the following two key report options:
Show all members of the rootset
Show instance counts for all classes
If you select the Show all members of the rootset link, you see
a list of the following references because these reference are likely targets
for potential memory leaks.
Java Static References
Busy Monitor References
JNI Global References
JNI Local References
System Class References
What youn look for here are instances in the application that have references
to objects that have a risk of not being garbage collected. This can sometimes
occur in the case of JNI if memory is allocated for an object, the memory is
left to the garbage collector to free up, and the garbage collector does not
have the information it needs to do it. In this list of references, you are
mainly interested in a large number of references to objects or objects of a
large size.
The other key report is the Show instance counts for all classes.
This lists the number of calls to a particular method. The String
and Character array objects, [S and [C,
are always going to be high on this list, but some objects are a bit more
intriguing. Why are there 323 instances of
java.util.SimpleTimeZone for example?
5109 instances of class java.lang.String
5095 instances of class [C
2210 instances of class java.util.Hashtable$Entry
968 instances of class java.lang.Class
407 instances of class [Ljava.lang.String;
323 instances of class java.util.SimpleTimeZone
305 instances of class
sun.java2d.loops.GraphicsPrimitiveProxy
304 instances of class java.util.HashMap$Entry
269 instances of class [I
182 instances of class [Ljava.util.Hashtable$Entry;
170 instances of class java.util.Hashtable
138 instances of class java.util.jar.Attributes$Name
131 instances of class java.util.HashMap
131 instances of class [Ljava.util.HashMap$Entry;
130 instances of class [Ljava.lang.Object;
105 instances of class java.util.jar.Attributes
To get more information on the SimpleTimeZone instances, click on
the link (the line beginning with 323). This will list all 323 references and
calculate how much memory has been used. In this example, 21964 bytes have
been used.
Instances of java.util.SimpleTimeZone
class java.util.SimpleTimeZone
java.util.SimpleTimeZone@0x004f48c0 (68 bytes)
java.util.SimpleTimeZone@0x003d5ad8 (68 bytes)
java.util.SimpleTimeZone@0x004fae88 (68 bytes)
.....
Total of 323 instances occupying 21964 bytes.
If you click on one of these SimpleTimeZone instances, you see
where this object was allocated.
Object allocated from:
java.util.TimeZoneData.<clinit>(()V) :
TimeZone.java line 1222
java.util.TimeZone.getTimeZone((Ljava/lang/String;)
Ljava/util/TimeZone;) :
TimeZone.java line (compiled method)
java.util.TimeZone.getDefault(
()Ljava/util/TimeZone;) :
TimeZone.java line (compiled method)
java.text.SimpleDateFormat.initialize(
(Ljava/util/Locale;)V) :
SimpleDateFormat.java line (compiled method)
In this example the object was allocated from TimeZone.java.
The source to this file is in the standard src.jar file, and
on examining this file, you can see that indeed there are nearly 300 of
these objects in memory.
static SimpleTimeZone zones[] = {
// The following data is current as of 1998.
// Total Unix zones: 343
// Total Java zones: 289
// Not all Unix zones become Java zones due to
// duplication and overlap.
//-------------------------------------------
new SimpleTimeZone(-11*ONE_HOUR,
"Pacific/Niue" /*NUT*/),
Unfortunately, you have no control over the memory used in this example
because it is allocated when the program first requests a default timezone.
However, this same technique can be applied to analyzing your own application
where you may be able to make some improvements
Where the Application Spends its Time
Again, you can use the -Xrunhprof parameter to get information
about the time the application spent processing a particular method.
You can use one of two CPU profiling options to achieve this. The first option
is cpu=samples. This option reports the result of a sampling of
the running threads of the Java VM to which a statistical count of the
frequency of the occurrence of a particular method is used to find busy
sections of the applications. The second option is cpu=times,
which measures the time taken by individual methods and generates a sorted
list ranked as a total percentage of the CPU time taken by the application.
By using the cpu=times option, you should see something similiar
to this at the end of the output file
CPU TIME (ms) BEGIN (total = 11080)
Fri Jan 8 16:40:59 1999
rank self accum count trace method
1 13.81% 13.81% 1 437 sun/
awt/X11GraphicsEnvironment.initDisplay
2 2.35% 16.16% 4 456 java/
lang/ClassLoader$NativeLibrary.load
3 0.99% 17.15% 46 401 java/
lang/ClassLoader.findBootstrapClass
If you contrast this with the cpu=samples output, you see the
difference between how often a method appears during the runtime of the
application in the samples output compared to how long that method took in
the times output.
CPU SAMPLES BEGIN (total = 14520)
Sat Jan 09 17:14:47 1999
rank self accum count trace method
1 2.93% 2.93% 425 2532 sun/
awt/windows/WGraphics.W32LockViewResources
2 1.63% 4.56% 237 763 sun/
awt/windows/WToolkit.eventLoop
3 1.35% 5.91% 196 1347 java/
text/DecimalFormat.<init>
The W32LockView method, which calls a native windows lock
routine, is called 425 times. So when it is sampled it appears in the
active runnings threads because it also takes time to complete. In contrast,
the initDisplay method is only called once, but it is the method
that takes the longest time to complete in real time.
Operating System Performance Tools
Sometimes the performance bottleneck occurs at the system or operating system
level. This is because Java VM depends on many operating system libraries for
functionality such as disk access or networking. However, what occurs in these
libraries after the Java VM calls them is beyond the reach of most
profiling tools for the Java platform.
Here is a list of tools you can use to analyze performance problems on some
common operating systems.
Solaris Platform
System accounting reports, sar, reports the activity of the system
in terms of disk IO, user program activity, and system level activity. If your
application is using excessive amounts of memory, it may require disk swap
space, which will show up as high percentages in the WIO column. User programs
that get stuck in a busy loop show a high percentage in the user column:
developer$ sar 1 10
SunOS developer 5.6 Generic_105181-09 sun4u
02/05/99
11:20:29 %usr %sys %wio %idle
11:20:30 30 6 9 55
11:20:31 27 0 3 70
11:20:32 25 1 1 73
11:20:33 25 1 0 74
11:20:34 27 0 1 72
The truss command traces and records the details of every system
call called by the Java VM to the Solaris kernel. A common way to run
truss is:
truss -f -o /tmp/output -p <process id>
The -f parameter follows any child processes that are created,
the -o parameter writes the output to the named file, and the
-p parameter traces an already running program from its
process ID. Alternately, you can replace -p <process id>
with the Java VM, for example:
truss -f -o /tmp/output java MyDaemon
The /tmp/output is used to store the truss output,
which should look similiar to the following:
15573: execve("/usr/local/java/jdk1.2/solaris/
bin/java", 0xEFFFF2DC,
0xEFFFF2E8) argc = 4
15573: open("/dev/zero", O_RDONLY) = 3
15573: mmap(0x00000000, 8192,
PROT_READ|PROT_WRITE|PROT_EXEC,
MAP_PRIVATE, 3, 0) = 0xEF7C0000
15573: open("/home/calvin/java/native4/libsocket.so.1",
O_RDONLY) Err#2 ENOENT
15573: open("/usr/lib/libsocket.so.1",
O_RDONLY) = 4
15573: fstat(4, 0xEFFFEF6C) = 0
15573: mmap(0x00000000, 8192, PROT_READ|PROT_EXEC,
MAP_SHARED, 4, 0) = 0xEF7B00 00
15573: mmap(0x00000000, 122880, PROT_READ|PROT_EXEC,
MAP_PRIVATE, 4, 0) = 0xEF7 80000
15573: munmap(0xEF78E000, 57344) = 0
15573: mmap(0xEF79C000, 5393,
PROT_READ|PROT_WRITE|PROT_EXEC,
MAP_PRIVATE|MAP_FIXED, 4, 49152)
= 0xEF79C000
15573: close(4) = 0
In the truss output, look for files that fail when opened due to
access problems, such as error ENOPERM, or a missing file error
ENOENT. You can also track data read or written with the
truss parameters -rall to log all data read
or -wall to log all data written by the program. With these
parameters, it is possible to anaylze data sent over a network or to a
local disk.
Linux Platform
Linux has a trace command called strace. It traces
systems calls to the underlying Linux kernel. This example traces the
SpreadSheet example in the JDK demo directory.
$ strace -f -o /tmp/output
java sun.applet.AppletViewer
example1.html
$ cat /tmp/output
639 execve("/root/java/jdk117_v1at/java/
jdk117_v1a/bin/java", ["java",
"sun.applet.AppletViewer ",
"example1.html"], [/* 21 vars */]) = 0
639 brk(0) = 0x809355c
639 open("/etc/ld.so.preload", O_RDONLY) = -1
ENOENT (No such file or directory)
639 open("/etc/ld.so.cache", O_RDONLY) = 4
639 fstat(4, {st_mode=0, st_size=0, ...}) = 0
639 mmap(0, 14773, PROT_READ, MAP_PRIVATE,
4, 0) = 0x4000b000
639 close(4) = 0
639 open("/lib/libtermcap.so.2", O_RDONLY) = 4
639 mmap(0, 4096, PROT_READ, MAP_PRIVATE,
4, 0) = 0x4000f000
To obtain system information similar to the Solaris sar
command, read the contents of the file /proc/stat. The format
of this file is described in the proc man page. Look at the
cpu line to get the user and system time.
cpu 4827 4 1636 168329
In the above example, the cpu line indicates 48.27 seconds
in user space, 0.04 at nice priority, 16.36 seconds processing system
calls, and 168 seconds idle. This is
a running total; individual entries for each process are available
in /proc/<process_id>/stat.
Windows95/98/NT Platforms
There are no standard performance analysis tools included on this platform,
but the following tools are available by way of freeware or shareware
resources such as http://www.download.com .
Runtime memory analysis: Memory meter
Network analysis: Traceplus
[TOP]
|