DANNY YANG

about me · blog · projects · hire me

Reflections on Cornell's Undergrad Compilers Practicum (pt 2)

26 May 2019 - 1333 words - 6 minute read - RSS

Part 1 of this blog post series

Now that the last compilers project is finished, it’s appropriate for me to write the second part of my reflection on this course. It took was 24,000 lines of Java and an entire semester, but we now have a working compiler that compiles Xi++ (an object-oriented extension of Xi) into x64 assembly.

The last three assignments required much more work than the first four, and were a lot more conceptually challenging to understand and implement. Since these assignments dealt with the compiler backend, I had not a lot of relevant experience going in (3110 and PL only really cover concepts relevant to the compiler frontend, and my experience with assembly was limited).

Since we used a tree-based intermediate representation, for assembly code generation we had to recursively tile the IR subtrees into sequences of assembly instructions. As you might imagine, this involved different cases for different substructures of the IR tree, and the lack of pattern matching in Java made this extremely painful to implement.

To reduce the use of nested if statements, one of my teammates introduced a way of pseudo pattern matching, that I thought would be interesting to discuss here. To match the different types of expressions in our intermediate representation, we added an abstract match method with the following signature:

abstract class IRExpr {
    abstract <T> T match(Function<IRBinOp, T> a,
                    Function<IRCall, T> b,
                    Function<IRConst, T> c,
                    Function<IRMem, T> d,
                    Function<IRName, T> e,
                    Function<IRTemp, T> f);
}

Each child class would override the method with an implementation that would call the correct lambda. For example:

class IRCall {
    @Override
    public <T> T matchLow(Function<IRBinOp, T> a,
                          Function<IRCall, T> b,
                          Function<IRConst, T> c,
                          Function<IRMem, T> d,
                          Function<IRName, T> e,
                          Function<IRTemp, T> f) {
        return b.apply(this);
    }
}

If we wanted to match on some IR expression expr, we would call the match method with 6 appropriate lambdas.

expr.match(
    (a) -> { /* do this if expr is IRBinOp */},
    (b) -> { /* do this if expr is IRCall */},
    (c) -> { /* do this if expr is IRConst */},
    (d) -> { /* do this if expr is IRMem */},
    (e) -> { /* do this if expr is IRName */},
    (f) -> { /* do this if expr is IRTemp */}
)

This system left a lot to be desired when compared to real pattern matching. Our “pattern matching” didn’t have wildcard cases, so a lambda for each case must be explicitly passed in the correct order as arguments to the match method. With real pattern matching in languages like OCaml we would be able to write a case like| IRBinOp(IRConst c, _) where c > 5 -> something. Here, we could only match a single layer at a time, and could not match on the contents of the object at all. In spite of all its drawbacks, in the end it was still better than the alternative, which would have been a bunch of if/else statements with instanceof checks.

For implementing register allocation, Appel’s Modern Compiler Implementation in Java proved to be extremely helpful, because it contained an outline for a graph-coloring register allocation algorithm. That said, getting the implementation right was still tricky - the pseudocode itself was 7 pages long (!) and we wrote a lot of tests to make sure the invariants were properly maintained. In the end we made some modifications to the algorithm to make it compatible with our existing code for spilling variables.

During the final assignment of implementing object-oriented features, figuring out how to add compiler-generated functions to initialize classes and global variables was pretty challenging. In the end we reverse-engineered some example assembly code generated by other compilers, which involved reading through assembly and figuring out how to translate it back into IR.

Because I had no final exams during finals period, I left campus five days before the deadline for this project and worked remotely during that time. I can see now why some people dislike working remotely. Communication and coordination became much more difficult - even though we stayed in contact via messaging and calling, it’s lot easier to communicate and whiteboard ideas when you’re in the same room.

Yet, despite all the challenges we faced, we worked hard and prevailed. While writing this reflection, I look back on last semester with a sense of relief and pride. Before I conclude, I’d like to leave a few tips and best practices for anyone who is thinking of taking compilers and wants the best chance of success. Think of this as a supplement to How to lose in 4120. Listed in no particular order:

  • Test-driven development is a good approach for these projects. Good test coverage is vital to ensure backwards-compatibility and can help uncover bugs. If you do not write tests until the end then you will not have time to fix any bugs that are exposed.

  • Debugging will take much longer than expected; the late days that my team used were all because we couldn’t finish debugging in time.

  • Features should be planned out as a group that way everyone is on the same page. I suggest keeping a planning and TODO document that team members can use to write down their design decisions. It’s very important to constantly communicate and keep everyone in the loop.

  • I highly recommend purchasing a copy of Appel’s Modern Compiler Implementation in Java. The concepts and implementations in the book are a good guide on how to approach things. As mentioned above, I found it especially useful for register allocation. You might be surprised at the depth of this course - Appel mentions that the book is meant to cover two semesters of material, yet a lot of our lectures cover material from the second half of the book.

  • On the positive side, you get to bond with your teammates through shared suffering. I recommend frequent in-person meetings/work sessions since communication is a lot easier face-to-face.

This course was a grueling challenge for both the hard and soft skills needed for software development. Thinking back on all the late nights I spent working on this, of all the weeks where it felt like I was thinking about nothing but compilers, I regret none of it. Compilers has given me a lot more confidence in my software engineering skills and solidified my interest in PL. For anyone who is thinking of taking it, I would describe compilers like this: it will probably be the hardest class you’ll ever take at Cornell, but it will also one of the most rewarding.

Bonus: screenshot of a GUI application written in Xi++ that our compiler compiled

GUI screenshot



github · linkedin · email · rss