This post is intended to be a supplement for the 2014 Polyglot tutorial. First, I’ll briefly go over the structure and naming conventions for Polyglot, focusing on clarifications and corrections from the tutorial. Then, I’ll discuss at a high level several ways we might want to extend Polyglot, and what we need to do to make it happen.
The rest of this post assumes you’ve at least skimmed the Polyglot tutorial; I will frequently refer back to the
Carray extension from that tutorial.
Polyglot has several naming conventions, which are not all listed in the same place in the tutorial. They are as follows:
- File names for an extension are prefixed with the name of the extension.
- Most classes have their own separate interface for extensibility. Concrete classes that implement a particular interface use the same name suffixed with
_c. For example, the type system class for the Java 7 extension is named
JL7TypeSystem_c, and its interface is
JL7TypeSystem. Thus, creating a
JL7TypeSystem_calso requires creating a corresponding
CarrayTypeSysteminterface that extends
These conventions will be important because writing a Polyglot extension doesn’t just involve writing new, self-contained methods; extension code will import, implement, extend, or call on many classes from the base Polyglot compiler.
AST & Extension Objects
Abstract Extension Factory:
carray.ast package, the Extension Factory is actually split between
CarrayExtFactory, with the latter extending the former. It’s not listed on the extension code structure overview, but shows up in a later section. The Abstract Extension Factory is a boilerplate class that adds support for attaching extension objects to AST nodes added in this extension.
TypeNodes are AST nodes which represent user-written type annotations which are parsed. For example, formal parameters and local declarations each contain a
TypeNode. During the Type Building, Disambiguation, and Type Checking passes, the
TypeNode is resolved to an actual Java type (a
Type object), which can be accessed using the
type() method of the
TypeNode. After the typechecking pass, each
Expr node will also hold a
These classes define extension objects, which are all subclasses of the default extension object for that extension (in our case, they all extend
CarrayExt). Extension objects do not follow the typical
_c naming convention. For example, if we wanted to create an extension object that holds extra information specifically for the
LocalDecl node, we would just create a class called
While the lack of extension object interfaces appears to conflict with the goal of extensibility, this is actually an intentional design choice. Extension objects from different extensions should not be part of the same typing hierarchy. If our Polyglot extension builds on top of another another extension (for example, the Java 7 extension), then there there’s already a
JL7Ext attached to each AST node. If
JL7Ext, then there will be 2
JL7Exts attached to each node, which is not desirable.
In general, our extension object should only contain the implementation for functionality exclusive to our extension, since it can call the “parent” extension’s implementation through the language dispatcher. Every extension object has a pointer to the current extension’s language dispatcher (through the
lang() function) and the parent extension’s language dispatcher (through the
superLang() function). To invoke the parent extension’s operations for a particular node, we should call
superLang().typeCheck(...) as opposed to the usual
These interfaces define operations supported by AST nodes. In Polyglot, a few examples include
In an extension, there should be a default ops class which defines operations added to all nodes by the extension (in the case of
Carray, it would be
CarrayOps). If our extension doesn’t add any operations, then the interface can be empty. Additional interfaces can be added to define operations which are not supported by all nodes. For example, if we add an operation that only applies to expressions, we can put it in
Polyglot’s Type System isn’t designed to be as extensible as its AST nodes (there’s no concept of extension objects in the type system), so you’ll have to extend the classes directly if we want to modify or add functionality.
These classes represent typing information for local variables, methods, constructors and more. They are created by factories in the Type System class.
There is a single
MethodInstance for each method that is declared in a class, and after method call nodes are resolved the
Call node itself will contain a pointer to that
MethodInstance. Similarly, each local variable declaration is associated with a
LocalInstance, and all further usages of that local variable contain pointers to that
Creation and resolution of
Instances happens during the Type Building, Disambiguation, and Type Checking passes.
How To Extend Polyglot
Adding new syntax:
- Create classes/interfaces for the new AST nodes.
- Update the Abstract Extension Factory to handle the new nodes.
- If our extension added new operations for AST nodes, see “Adding fields/operations to AST nodes” and treat the new nodes the same as existing nodes.
- Update the Node Factory to add methods to construct the new nodes.
- Modify the
flexlexer file if necessary (if new tokens or keywords are needed).
- Modify the
ppgparser file to account for any changes to the lexer, and add constructions for the new nodes by calling the new methods in the Node Factory.
Adding fields/operations to AST nodes:
If the field/operation should be supported by all AST nodes:
- Just add it to
CarrayExtand (if it is a method)
If the field/operation isn’t supported by all AST nodes:
- Create a new extension class. For example, if we’re adding a new operation to all statements, we would create a
- Add the field/operation to the new class.
CarrayExtFactoryand override the corresponding factory method to return the new extension class. In our example, we would override
- To avoid casting a lot when we use extension objects, I’d recommend writing 2 getter methods. The first method gets the extension object from the node (in our example, a static method with type signature
Stmt -> CarrayStmtExt). The second method overrides the
Ext.nodemethod to return a
Stmtinstead of a
Extending the Type System
In general, adding new types or type system operations is similar to adding new AST nodes / operations, except the factory methods for types are in the Type System class instead of the Node Factory. New operations should be added to both the Type System class and the type classes themselves; oftentimes (but not always), the former simply delegates to the latter. If operations are being added to existing types, we will have to add the new operation by subclassing each existing type class, since extension objects do not exist for types.
Adding new types presents additional challenges if you’re building on top of Polyglot’s Java 5 or Java 7 extensions, because you’ll need to make sure that the new type interacts correctly with generics and the type inference mechanism.
The Type System class has a lot of methods that do things like checking if method calls are valid, finding the correct version of an overloaded method, or checking if a class implements a particular interface. In a larger extension, many of these operations will need to be overridden, which may require a lot of engineering effort. Larger extensions that I’ve seen have Type System classes that are 1000-3000 loc, which is similar in length to Polyglot’s base
TypeSystem_c class. This suggests that many operations cannot simply be delegated to the parent class’s implementation and must be re-implemented from scratch.
Extending the Type System: Alternative Approach
In some cases, we can bypass the engineering complexity of rewriting/extending the numerous type system operations by storing additional typing information in the AST nodes themselves. This approach leverages the AST’s superior extensibility, and is the approach I used in my compiler for research. It is appropriate if our additional type system mechanics do not interact much with Java’s type system - as in, if it could be viewed as a new type system layered on top of Java’s.
In this case, you’d need to create extension objects that contain extra fields for this typing information, and override the
typeCheck methods to check this information before delegating to the parent extension’s implementation.
If we need to store extra information about each method or variable or class, then we have to extend the type system’s
Instance classes and their factory methods. For this, we can consult the source code of Polyglot’s Java 5 extension for an example of extending
MethodInstance to support throw types and generics.
Generating Additional Java classes:
The Polyglot tutorial discusses how to generate translations of added AST nodes relatively well. In particular, the tutorial covers the QuasiQuoter feature, which is very important for conciseness, as nesting AST node constructors gets very verbose.
I have two additional tips for codegen:
- When generating a new AST node that doesn’t cleanly map to the original source, we can set its position with
- To generate multiple Java source files from a single extended language source file, we can create an extension for the
SourceFilenode and override the
extRewritemethod to return a
The overview I gave here represents the guidance that I wish I had when I started working with Polyglot. I hope that it’s useful for anyone that’s planning to write meaningful extensions with Polyglot in the future.
For those that want to learn more, I’ve written a post about my experience writing an extension with Polyglot.