| benf.org : other : cfr : Finally |
(standard disclaimer - nothing new here, just collecting useful snippets I find / rediscover).
I'm not going to bother describing what finally does - what's interesting is how it does it, and how that's evolved over time.
Originally - finally was implemented by having the 'finally' code block stuck somewhere in the bytecode - and then at each point that needed to call it, the compiler would insert a JSR/JSR_W. (Jump subroutine - for the truly ancient equivalent to GOSUB ;) ) - at the point JSR is called, the return address is pushed on the stack, and at the end of the finally block, RET returns to where execution left off.
This produced nice compact bytecode - anything that needs to leave a block contained in a finally just has a JSR call. However, it has downsides.
The problem with JSR (spoiler - and the reason it was removed...) is that it makes provable verification extremely tricky. Because multiple places can legitimately jump into a finally with different stack states, it means that the verifier can't guarantee what the content of the stack is easily
The problem (and several solutions) are beautifully described in this paper - Java bytecode verification: algorithms and formalizations by Xavier Leroy - see section 5.1 - I encourage you to read it - it's great.
(As an aside - ) the StackMapTable Attribute was added in 1.6, in order to improve the verification process. In the absence of JSR, I'm not convinced is strictly speaking neccessary - please note, CFR doesn't use it currently, as it looks like it's possible to add valid but incorrect data to it... (I need to branch out into writing an obfuscator!)
So, given the fact that it's hard, the approach that was used from JVM/Java 1.6 onwards was.... to give up. (Ok, that's a bit harsh, but!). Instead, every possible site that might jump to a finally block has its own copy of the entire finally block.
Indeed. verifier release info mentions this - to quote:
"
The new verifier does not allow instructions jsr and ret. These instructions are used to make subroutines for generating try/finally blocks. Instead the compiler will inline subroutine code which means the byte code in subroutines will be inserted in places where the subroutines are called. As a consequence, the compiler will sometimes generate more byte codes (sic) than with jsr/ret. Since the class file format limits the size of methods, some degenerated methods with excessively large or nested finally blocks might exhaust this limit and fail to compile."
By 'sometimes' here, I think we mean 'always, guaranteed.' ;)
It's quite interesting to see what this means in practice. Here's some code....
public void ft1(int x) {
try {
if (x == 0) {
System.out.println("0");
return;
}
if (x < 10) {
System.out.println("10");
return;
}
if (x < 100) {
System.out.println("100");
return;
}
if (x < 1000) {
System.out.println("1000");
return;
}
} catch (Exception e) {
System.out.println("Exception");
} finally {
System.out.println("HERE");
}
}
And here's the bytecode... (more commentary afterwards, honest!)
public void ft1(int);
Code:
0: iload_1
1: ifne 21
4: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
7: ldc #3 // String 0
9: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
12: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
15: ldc #5 // String HERE
17: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
20: return
21: iload_1
22: bipush 10
24: if_icmpge 44
27: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
30: ldc #6 // String 10
32: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
35: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
38: ldc #5 // String HERE
40: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
43: return
44: iload_1
45: bipush 100
47: if_icmpge 67
50: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
53: ldc #7 // String 100
55: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
58: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
61: ldc #5 // String HERE
63: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
66: return
67: iload_1
68: sipush 1000
71: if_icmpge 91
74: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
77: ldc #8 // String 1000
79: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
82: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
85: ldc #5 // String HERE
87: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
90: return
91: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
94: ldc #5 // String HERE
96: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
99: goto 133
102: astore_2
103: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
106: ldc #10 // String Exception
108: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
111: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
114: ldc #5 // String HERE
116: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
119: goto 133
122: astore_3
123: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
126: ldc #5 // String HERE
128: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
131: aload_3
132: athrow
133: return
Exception table:
from to target type
0 12 102 Class java/lang/Exception
21 35 102 Class java/lang/Exception
44 58 102 Class java/lang/Exception
67 82 102 Class java/lang/Exception
0 12 122 any
21 35 122 any
44 58 122 any
67 82 122 any
102 111 122 any
In this case (and pretty much only in this case) because many of the exits from the try block return the same value (a no-value return) it would be nice to use common subexpression elimination, and have a GOTO to a common instance for all legit exits (we'd still need a clone for the exception case.). However, this is probably rare enough that it's not worth doing. It's also the sort of optimisation that Javac likes to leave to the JIT (but I've not checked if duplicate copies of the finally statement end up in JIT output yet!).
The try block is split up into 4 distinct regions - as the finally handler has to run outside of try block cover. At every exit point from the try block (so, the returns here), the contents of the finally block are repeated. The finally block also exists inside the catch handler for the exception, and after the legitimate fall through of the try block.
It's awkward to take the above code literally, because we end up with seperate try blocks that all have the same target - however a (semi) reasonable approximation (with too few tries!) is obtained by using --decodefinally false.
Here's the result of using CFR 110 with the --decodefinally false flag. To quote Whitney - it's NOT right, but it's ok.. (kind of). We're missing exception coverage on some of the cases.
public void ft1(int n) {
block10 : {
block9 : {
block8 : {
if (n != 0) break block8;
System.out.println("0");
System.out.println("HERE");
return;
}
if (n >= 10) break block9;
System.out.println("10");
System.out.println("HERE");
return;
}
if (n >= 100) break block10;
System.out.println("100");
System.out.println("HERE");
return;
}
try {
if (n < 1000) {
System.out.println("1000");
System.out.println("HERE");
return;
}
System.out.println("HERE");
}
catch (Exception var2_2) {
try {
System.out.println("Exception");
System.out.println("HERE");
}
catch (Throwable var3_3) {
System.out.println("HERE");
throw var3_3;
}
}
}
To handle this - I search for 'any' handlers which have multiple vectors to them - then find if every possible exit (via legitimate return, goto or otherwise) from all try blocks which share this handler also go through an identical code block. If that's the case, it can be considered to be (or appear to be!) a finally statement, and these try blocks can be merged.
It should be noted that this is (while reasonably good) a brittle test - so it would be quite possible for fairly simple obfuscation to cause problems...
At least normally we get the right answer!
/*
* Decompiled with CFR 0_110.
*/
package org.benf.cfr.tests;
import java.io.PrintStream;
public class FinallyTest4 {
public void ft1(int n) {
try {
if (n == 0) {
System.out.println("0");
return;
}
if (n < 10) {
System.out.println("10");
return;
}
if (n < 100) {
System.out.println("100");
return;
}
if (n < 1000) {
System.out.println("1000");
return;
}
}
catch (Exception var2_2) {
System.out.println("Exception");
}
finally {
System.out.println("HERE");
}
}
}
I should mention - CFR attempts where possible to handle JSR/JSR_W also.
| Last updated 11/2015 |