Programming languages is a field that I’ve been interested in ever since I took my first PL course (CS 4110) last semester. Over winter break, I started work in a relatively ambitious (for me) side project of writing a new language, and submitted a pull request to a lambda-calculus tool that my professor created. As a natural consequence of my newfound enthusiasm as well as my desire to challenge myself, I decided to take one of the hardest CS courses at Cornell - compilers (CS 4120/4121).
The main project for the course consists of working with a team to build a compiler for the strongly-typed, imperative language Xi (and its object-oriented extension oXi). The scope of the project was bigger than any school project I had worked on in the past, and past reviews of the class warned that taking it was a massive time commitment. Nonetheless, armed with knowledge from PL and having formed a team with my former colleagues from the CS 3110 course staff, I felt ready to take on the challenges that this course had to offer.
Four assignments and 12,500 significant lines of code later, our compiler is about halfway complete. We have written the lexer, parser, typechecker, and intermediate-representation translation/lowering, and will go on to tackle code generation and various optimizations/extensions. Before starting work on the remaining parts of the compiler, I thought that it would be nice to reflect a bit on the journey so far.
Although 4110 and 4120 are in the same category of courses (courses with material related to PL are numbered x1xx), there actually wasn’t a lot of overlap between the material covered in 4110 and 4120; there were about 2 lectures in 4120 on type systems and type checking, while that subject was the focus for more than half of 4110. Nonetheless, my experience with implementing interpreters and typecheckers in both 3110 and 4110 was extremely helpful for both understanding the Xi type system and implementing the typechecker for this project.
A big difference between my past PL-related experience and this current project is that my past projects had all been implemented in OCaml, while this compiler is written entirely in Java. For me, this was a big change because I was used to thinking in a functional-programming mindset and had more experience with OCaml than I had with Java. I had originally strongly advocated for using OCaml, but in the end we decided to use Java because we felt that it would be better to implement the project in a language more commonly used in industry. I had some reservations about the language choice, but I felt that it was worthwhile to improve my object-oriented programming skills so I went along with the decision.
While the basic steps of writing a compiler do not change, how each stage is implemented can vary significantly depending on the implementation language - this meant that I needed to approach problems from a different perspective than I was used to. For instance, AST traversal in my past projects made heavy use of OCaml’s pattern-matching, a feature that is not found in Java. Using the visitor pattern mitigated some of the drawbacks of not having pattern matching, but we were still forced to use a lot of clunky conditional statements which I felt was not nearly as nice to look at or debug. Ultimately, I felt that the visitor pattern helped me think about things from a more object-oriented mindset (such as using dynamic dispatch to my advantage to create more efficient programs).
Another major change moving from a functional language to an object-oriented one was the lack of variant types. In OCaml, if I wanted to define Integer and Array types in a type system I could probably do something like type T = Int | Array of T | ...
but in Java I would probably have to make an abstract Type
class and have TypeInteger
and TypeArray
inherit from that class. For simple things like this writing a class feels a bit excessive, and leads to odd and unintuitive syntax in some cases. For example, when annotating an AST node with its type, we would need to create an instance of the appropriate type class (new TypeInteger()
). The idea of having to create instances of type objects feels pretty inelegant to me, and someone who just treats the newly created annotation like a string or enum would be tempted to reuse the same object in multiple places (which is potentially dangerous if the AST is mutable).
One of my biggest worries about using Java was the possibility of null references, due to lack of anything like nullable types or options in the language. This fear turned out to be relatively unfounded. As long as we were careful there were very few null-pointer exceptions and those that did occur were relatively easy to debug due to Java’s informative stack trace printouts.
Workload-wise the course hasn’t been too bad. My teammates are all very competent and we work well together, and I’m taking a really light course load this semester, so we’ve been able to manage the assignments pretty well. This can potentially change as the assignments are supposed to get more difficult towards the end of the course, but I’m relatively confident that we can handle it. Working on a team for such a large project has given me some insights into efficiency and communication in a team setting, but I’ll save those conclusions for the next post.
To be continued.