benf.org :  other :  cfr :  Finally

Finally - and why sometimes it's a pain.

(standard disclaimer - nothing new here, just collecting useful snippets I find / rediscover).

I'm not going to bother describing what finally does - what's interesting is how it does it, and how that's evolved over time.

Originally - finally was implemented by having the 'finally' code block stuck somewhere in the bytecode - and then at each point that needed to call it, the compiler would insert a JSR/JSR_W. (Jump subroutine - for the truly ancient equivalent to GOSUB ;) ) - at the point JSR is called, the return address is pushed on the stack, and at the end of the finally block, RET returns to where execution left off.

This produced nice compact bytecode - anything that needs to leave a block contained in a finally just has a JSR call. However, it has downsides.

Verification

The problem with JSR (spoiler - and the reason it was removed...) is that it makes provable verification extremely tricky. Because multiple places can legitimately jump into a finally with different stack states, it means that the verifier can't guarantee what the content of the stack is easily

The problem (and several solutions) are beautifully described in this paper - Java bytecode verification: algorithms and formalizations by Xavier Leroy - see section 5.1 - I encourage you to read it - it's great.

(As an aside - ) the StackMapTable Attribute was added in 1.6, in order to improve the verification process. In the absence of JSR, I'm not convinced is strictly speaking neccessary - please note, CFR doesn't use it currently, as it looks like it's possible to add valid but incorrect data to it... (I need to branch out into writing an obfuscator!)

Ok, I give up

So, given the fact that it's hard, the approach that was used from JVM/Java 1.6 onwards was.... to give up. (Ok, that's a bit harsh, but!). Instead, every possible site that might jump to a finally block has its own copy of the entire finally block.

Really? But that would mean massive duplication!

Indeed. verifier release info mentions this - to quote:

" The new verifier does not allow instructions jsr and ret. These instructions are used to make subroutines for generating try/finally blocks. Instead the compiler will inline subroutine code which means the byte code in subroutines will be inserted in places where the subroutines are called. As a consequence, the compiler will sometimes generate more byte codes (sic) than with jsr/ret. Since the class file format limits the size of methods, some degenerated methods with excessively large or nested finally blocks might exhaust this limit and fail to compile."

By 'sometimes' here, I think we mean 'always, guaranteed.' ;)

It's quite interesting to see what this means in practice. Here's some code....

    public void ft1(int x) {
        try {
            if (x == 0) {
                System.out.println("0");
                return;
            }
            if (x < 10) {
                System.out.println("10");
                return;
            }
            if (x < 100) {
                System.out.println("100");
                return;
            }
            if (x < 1000) {
                System.out.println("1000");
                return;
            }
        } catch (Exception e) {
            System.out.println("Exception");
        } finally {
            System.out.println("HERE");
        }
    }

And here's the bytecode... (more commentary afterwards, honest!)

  public void ft1(int);
    Code:
       0: iload_1       
       1: ifne          21
       4: getstatic     #2                  // Field java/lang/System.out:Ljava/io/PrintStream;
       7: ldc           #3                  // String 0
       9: invokevirtual #4                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      12: getstatic     #2                  // Field java/lang/System.out:Ljava/io/PrintStream;
      15: ldc           #5                  // String HERE
      17: invokevirtual #4                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      20: return        
      21: iload_1       
      22: bipush        10
      24: if_icmpge     44
      27: getstatic     #2                  // Field java/lang/System.out:Ljava/io/PrintStream;
      30: ldc           #6                  // String 10
      32: invokevirtual #4                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      35: getstatic     #2                  // Field java/lang/System.out:Ljava/io/PrintStream;
      38: ldc           #5                  // String HERE
      40: invokevirtual #4                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      43: return        
      44: iload_1       
      45: bipush        100
      47: if_icmpge     67
      50: getstatic     #2                  // Field java/lang/System.out:Ljava/io/PrintStream;
      53: ldc           #7                  // String 100
      55: invokevirtual #4                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      58: getstatic     #2                  // Field java/lang/System.out:Ljava/io/PrintStream;
      61: ldc           #5                  // String HERE
      63: invokevirtual #4                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      66: return        
      67: iload_1       
      68: sipush        1000
      71: if_icmpge     91
      74: getstatic     #2                  // Field java/lang/System.out:Ljava/io/PrintStream;
      77: ldc           #8                  // String 1000
      79: invokevirtual #4                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      82: getstatic     #2                  // Field java/lang/System.out:Ljava/io/PrintStream;
      85: ldc           #5                  // String HERE
      87: invokevirtual #4                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      90: return        
      91: getstatic     #2                  // Field java/lang/System.out:Ljava/io/PrintStream;
      94: ldc           #5                  // String HERE
      96: invokevirtual #4                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      99: goto          133
     102: astore_2      
     103: getstatic     #2                  // Field java/lang/System.out:Ljava/io/PrintStream;
     106: ldc           #10                 // String Exception
     108: invokevirtual #4                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
     111: getstatic     #2                  // Field java/lang/System.out:Ljava/io/PrintStream;
     114: ldc           #5                  // String HERE
     116: invokevirtual #4                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
     119: goto          133
     122: astore_3      
     123: getstatic     #2                  // Field java/lang/System.out:Ljava/io/PrintStream;
     126: ldc           #5                  // String HERE
     128: invokevirtual #4                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
     131: aload_3       
     132: athrow        
     133: return        
    Exception table:
       from    to  target type
           0    12   102   Class java/lang/Exception
          21    35   102   Class java/lang/Exception
          44    58   102   Class java/lang/Exception
          67    82   102   Class java/lang/Exception
           0    12   122   any
          21    35   122   any
          44    58   122   any
          67    82   122   any
         102   111   122   any

In this case (and pretty much only in this case) because many of the exits from the try block return the same value (a no-value return) it would be nice to use common subexpression elimination, and have a GOTO to a common instance for all legit exits (we'd still need a clone for the exception case.). However, this is probably rare enough that it's not worth doing. It's also the sort of optimisation that Javac likes to leave to the JIT (but I've not checked if duplicate copies of the finally statement end up in JIT output yet!).

So... decompiling?

The try block is split up into 4 distinct regions - as the finally handler has to run outside of try block cover. At every exit point from the try block (so, the returns here), the contents of the finally block are repeated. The finally block also exists inside the catch handler for the exception, and after the legitimate fall through of the try block.

It's awkward to take the above code literally, because we end up with seperate try blocks that all have the same target - however a (semi) reasonable approximation (with too few tries!) is obtained by using --decodefinally false.

Here's the result of using CFR 110 with the --decodefinally false flag. To quote Whitney - it's NOT right, but it's ok.. (kind of). We're missing exception coverage on some of the cases.

    public void ft1(int n) {
        block10 : {
            block9 : {
                block8 : {
                    if (n != 0) break block8;
                    System.out.println("0");
                    System.out.println("HERE");
                    return;
                }
                if (n >= 10) break block9;
                System.out.println("10");
                System.out.println("HERE");
                return;
            }
            if (n >= 100) break block10;
            System.out.println("100");
            System.out.println("HERE");
            return;
        }
        try {
            if (n < 1000) {
                System.out.println("1000");
                System.out.println("HERE");
                return;
            }
            System.out.println("HERE");
        }
        catch (Exception var2_2) {
            try {
                System.out.println("Exception");
                System.out.println("HERE");
            }
            catch (Throwable var3_3) {
                System.out.println("HERE");
                throw var3_3;
            }
        }
    }

To handle this - I search for 'any' handlers which have multiple vectors to them - then find if every possible exit (via legitimate return, goto or otherwise) from all try blocks which share this handler also go through an identical code block. If that's the case, it can be considered to be (or appear to be!) a finally statement, and these try blocks can be merged.

It should be noted that this is (while reasonably good) a brittle test - so it would be quite possible for fairly simple obfuscation to cause problems...

At least normally we get the right answer!

/*
 * Decompiled with CFR 0_110.
 */
package org.benf.cfr.tests;

import java.io.PrintStream;

public class FinallyTest4 {
    public void ft1(int n) {
        try {
            if (n == 0) {
                System.out.println("0");
                return;
            }
            if (n < 10) {
                System.out.println("10");
                return;
            }
            if (n < 100) {
                System.out.println("100");
                return;
            }
            if (n < 1000) {
                System.out.println("1000");
                return;
            }
        }
        catch (Exception var2_2) {
            System.out.println("Exception");
        }
        finally {
            System.out.println("HERE");
        }
    }
}

An afterthought.

I should mention - CFR attempts where possible to handle JSR/JSR_W also.


Last updated 11/2015