DANNY YANG

about me · blog · projects · hire me

Compiler Hacking Part 4: Building a WASM Backend

11 Oct 2022 - 2378 words - 11 minute read - RSS

In the fourth part of my Chocopy Compiler Hacking series, I will be discussing how I built the backend to compile Chocopy to WebAssembly.

For those unfamiliar with the topic, Chocopy is a subset of Python 3 with static type annotations. WebAssembly is a binary instruction format designed for high performance web applications. WebAssembly can run in most browsers and other JavaScript environments such as NodeJS.

In this post, I’ll discuss the overall approach and highlight some interesting details of the WebAssembly backend concerning memory management and runtime.

For reference, the source code for the compiler is available on Github, and past projects in this series are documented on this blog:

Overview #

As with previous projects in this series, I defined a set of goals beforehand to give myself the best possible shot at completing the project.

This time, I had three main goals. The first was a learning goal: I wanted to learn more about WASM as a compilation target and learn how it integrates with JavaScript applications. Although it was tempting to rely on imported JavaScript to do the heavy lifting, I really wanted to avoid this - so my second goal was to keep the runtime minimal. Since I knew very little about WASM going into the project, I was worried about how much work it would take - to reduce the risk of failure, my final goal was to reuse existing compiler passes as much as possible and avoid overthinking the memory management aspect.

Following the same pattern as the JVM and CIL backends, I decided to emit a plaintext format (WAT) from the compiler and use a tool (wat2wasm) to convert them into WASM binaries that could be loaded by a JavaScript runtime. Like CIL (and unlike JVM), both the plaintext format and assembler were officially supported parts of the WASM toolchain. I set up the test suite to actually run the compiled WASM programs, allowing me to be confident that this backend matched the behavior of the other backends.

Learning Resources and References #

I found that the WebAssembly docs on MDN were really helpful both as high-level overview of WASM/WAT as well as a technical reference for implementing the JavaScript integration/runtime.

I also referenced the online notes from UC San Diego’s CSE231, a course that implements a Chocopy-to-WASM compiler/REPL. The notes on memory layouts and classes were a useful sanity check to make sure I was on the right track, and I reused their simple memory allocation strategy for this project.

When I implemented the frontend of this compiler a few years ago, someone told me about the existence of PPCI. Prior to starting this project, I revisited it and found that it also had a WASM backend implemented in Python. However, in the end this was not a particularly useful reference for the project since the subset of Python that PPCI’s WASM backend supported was much smaller than Chocopy’s feature set.

Implementation #

Compared to CIL and JVM, WASM is at a lower level of abstraction - it has manual memory management and no built-in concept of classes. This meant that I had to decide memory layouts for each data type, and implement various memory operations and dynamic dispatch from scratch.

The other features in Chocopy - expressions, function calls, control flow, etc - were pretty straightforward to implement, so I won’t discuss them in this blog post. Nested functions and captured nonlocal variables were hoisted using the same passes as the CIL/JVM backends; refer here for more details.

Data Types #

Integers are represented as i64, while booleans and pointers for reference types (strings, lists, objects, nonlocals) are represented as i32. The null pointer (None) is represented by the address 0, and checking if an object is null is the same as checking if its address is 0.

Memory Management #

Memory is allocated using a simple linear allocator that tracks the position of the next unallocated address using a global variable. Allocated memory is ever reclaimed or freed and there is no garbage collection.

Each program generated by the compiler includes a set of memory-related utilities, implemented in around 300 lines of hand-written WASM.

Edit 6/2023 - added raw WASM for reference.

Allocation #

;; allocate and return addr of memory
;; based on https://github.com/ucsd-cse231-s22/chocopy-wasm-compiler-A/blob/2022/stdlib/memory.wat
(func $alloc (param $bytes i32) (result i32)
    (local $addr i32)
    global.get $heap
    local.set $addr
    local.get $bytes
    global.get $heap
    i32.add
    global.set $heap
    local.get $addr
)

Copying Memory #

;; copy $size bytes from $src to $dest
;; this just blindly copies memory and does not do any sort of validation/checks
(func $mem_cpy (param $src i32) (param $dest i32) (param $size i32)
    (local $idx i32)
    (local $temp i32)
    i32.const 0
    local.set $idx
    (block $block
        (loop $loop
            local.get $idx
            local.get $size
            i32.lt_s
            i32.eqz
            br_if $block
            ;; read byte from $src + offset
            local.get $idx
            local.get $src
            i32.add
            i32.load8_u
            local.set $temp
            ;; write byte to $dest + offset
            local.get $idx
            local.get $dest
            i32.add
            local.get $temp
            i32.store8
            ;; increment offset
            local.get $idx
            i32.const 1
            i32.add
            local.set $idx
            br $loop
        )
    )
)

String/List Length #

;; return the length of a string or list as i32
(func $len (param $addr i32) (result i32)
    local.get $addr
    call $nullthrow
    i32.load
)

Null Checks #

;; throw if $addr is null, otherwise return $addr
(func $nullthrow (param $addr i32) (result i32)
    local.get $addr
    i32.eqz
    ;; throw if $addr == 0
    (if
        (then
            unreachable
        )
    )
    local.get $addr
)

Bound Checks #

;; check the bounds of a string or list access, throwing if illegal
(func $check_bounds (param $addr i32) (param $idx i32)
    local.get $addr
    call $len
    local.get $idx
    i32.gt_s
    i32.eqz
    ;; throw if !($len > $idx)
    (if
        (then
            unreachable
        )
    )
    i32.const 0
    local.get $idx
    i32.gt_s
    ;; throw if 0 > $idx
    (if
        (then
            unreachable
        )
    )
)

String Indexing #

;; index a string, returning the character as an i32
(func $get_char (param $addr i32) (param $idx i32) (result i32)
    local.get $addr
    i32.const 4
    i32.add
    local.get $idx
    i32.add
    i32.load8_u
)
;; index a string, allocating a new string for the single character and returning the address
(func $str_idx (param $addr i32) (param $idx i32) (result i32)
    (local $new i32)
    i32.const 8
    call $alloc
    local.set $new
    local.get $new
    i32.const 1
    i32.store
    local.get $new
    i32.const 4
    i32.add
    local.get $addr
    local.get $idx
    call $get_char
    i32.store8
    local.get $new
)

String Concatenation #

;; concatenate two strings, returning the address of the new string
(func $str_concat (param $s1 i32) (param $s2 i32) (result i32)
    (local $len1 i32)
    (local $len2 i32)
    (local $addr i32)
    ;; allocate memory
    local.get $s1
    call $len
    local.tee $len1
    local.get $s2
    call $len
    local.tee $len2
    i32.add
    i32.const 4
    i32.add
    i32.const 8
    i32.div_u
    i32.const 8
    i32.mul
    i32.const 8
    i32.add
    call $alloc
    local.tee $addr
    ;; store length
    local.get $len1
    local.get $len2
    i32.add
    i32.store
    ;; copy string 1
    local.get $s1
    i32.const 4
    i32.add
    local.get $addr
    i32.const 4
    i32.add
    local.get $len1
    call $mem_cpy
    ;; copy string 2
    local.get $s2
    i32.const 4
    i32.add
    local.get $addr
    i32.const 4
    i32.add
    local.get $len1
    i32.add
    local.get $len2
    call $mem_cpy
    local.get $addr
)

String Equality #

;; compare two strings, returning true if the two strings have the same contents
(func $str_cmp (param $left i32) (param $right i32) (result i32)
    (local $result i32)
    (local $length i32)
    (local $idx i32)
    i32.const 1
    local.set $result
    local.get $left
    i32.load
    local.tee $length
    ;; compare $length with len of $right
    local.get $right
    i32.load
    i32.eq
    ;; only compare contents if lengths are equal
    (if
        (then
            i32.const 0
            local.set $idx
            (block $block
                (loop $loop
                    local.get $idx
                    local.get $length
                    i32.lt_s
                    i32.eqz
                    ;; get left char
                    br_if $block
                    local.get $left
                    local.get $idx
                    call $get_char
                    ;; get right char
                    local.get $right
                    local.get $idx
                    call $get_char
                    ;; $result = $result && left char == right char
                    i32.eq
                    local.get $result
                    i32.and
                    local.set $result
                    ;; if !$result then break
                    local.get $result
                    i32.eqz
                    br_if $block
                    local.get $idx
                    i32.const 1
                    i32.add
                    local.set $idx
                    br $loop
                )
            )
        )
        (else
            i32.const 0
            local.set $result
        )
    )
    local.get $result
)

List Concatenation #

;; concatenate two lists, returning the address of the new list
(func $list_concat (param $l1 i32) (param $l2 i32) (result i32)
    (local $len1 i32)
    (local $len2 i32)
    (local $addr i32)
    ;; allocate 8 * (len1 + len2 + 1) bytes
    local.get $l1
    call $len
    local.tee $len1
    local.get $l2
    call $len
    local.tee $len2
    i32.add
    i32.const 1
    i32.add
    i32.const 8
    i32.mul
    call $alloc
    local.tee $addr
    ;; store length
    local.get $len1
    local.get $len2
    i32.add
    i32.store
    ;; copy list 1
    local.get $l1
    i32.const 4
    i32.add
    local.get $addr
    i32.const 4
    i32.add
    local.get $len1
    i32.const 8
    i32.mul
    call $mem_cpy
    ;; copy list 2
    local.get $l2
    i32.const 4
    i32.add
    local.get $addr
    i32.const 4
    i32.add
    local.get $len1
    i32.const 8
    i32.mul
    i32.add
    local.get $len2
    i32.const 8
    i32.mul
    call $mem_cpy
    local.get $addr
)

Strings and Lists #

Strings are represented in utf-8 format, and store their length as an i32 followed by 1 byte for each character. Lists also store their length as an i32, followed by 8 bytes for each element. The length of strings and lists are always stored in the first four bytes, allowing similar implementations for equality checks, iteration, indexing, and concatenation.

For memory safety, indexing operations have bounds checks. List operations also have a null check, since None is a legal initial value for a list-typed variable. These checks are implemented as simple control flow leading to unreachable instructions.

Operators - Added 6/2023 #

The non-straightforward operators are implemented the same way as the CIL backend: modulo is ((a % b) + b) % b), and short-circuiting operators are implemented as ternaries.

Classes #

The vtables for every class are stored at the beginning of the memory buffer, and the allocator is initialized past this value so that the vtable memory is never overwritten by later allocations.

The memory layout for classes and objects is as follows:

  • first four bytes are the offset of the object’s vtable
  • 8 bytes for each attribute

The ordering of attributes and methods in the vtable is consistent between parent classes and their children.

Runtime Support and Imported Functions #

The JavaScript runtime for the generated WASM code is extremely minimal. It defines the memory buffer and four functions that are imported by the WASM code:

  1. print integers
  2. print booleans
  3. print strings
  4. assertions

The following NodeJS script is all the runtime needed for the generated WASM:

const wasm_path = process.argv[2];

function logString(offset) {
    // first 4 bytes is the length
    const length = new Uint32Array(memory.buffer, offset, 1)[0];
    // next [length] bytes is the string contents, encoded as utf-8
    const bytes = new Uint8Array(memory.buffer, offset + 4, length);
    const string = new TextDecoder('utf8').decode(bytes);
    console.log(string);
}

function logInt(val) {
    // cast BigInt to a number for pretty-printing w/o the "n"
    // this may truncate or break for values that take >53 bits
    console.log(Number(val));
}

function logBool(val) {
    // this is a 32 bit number, either 1 or 0
    console.log(val !== 0);
}

function assert(val, line) {
    if (!val) {
        throw new Error("Assertion failed on line " + line.toString());
    }
}

const memory = new WebAssembly.Memory({
    initial: 10,
    maximum: 100
});

const importObject = {
    imports: {
        logString: x => logString(x),
        logInt: x => logInt(x),
        logBool: x => logBool(x),
        assert: (x, y) => assert(x, y)
    },
    js: {
        mem: memory
    },
};

const fs = require('fs');

const wasmBuffer = fs.readFileSync(wasm_path);
WebAssembly.instantiate(wasmBuffer, importObject);

Final Thoughts #

Overall, I’m pretty satisfied with this project. I succeeded in compiling Chocopy programs into WebAssembly with a minimal runtime, and I learned a lot about WASM and how it integrates with JavaScript applications.

I was initially hesitant to start to this project since I wasn’t sure how much work it would take. In the end, adding WASM support to this compiler was less work than expected - around 800 lines of Python, 300 lines of hand-written WASM, and 50 lines of JavaScript for the runtime. In total, it took me around 2-3 weekends to complete.

The process of building this compiler over the last couple years has been pretty enjoyable, and I’ve gotten into a good rhythm of planning and completing reasonable and interesting extensions. However, I get the sense that I’m approaching the limits of this project as a learning tool. I’ve accomplished almost everything I want for this project, aside from possibly making an LLVM backend in the future. Perhaps it’s time for something new - possibly something that lets me experiment with different parsing techniques, or something in a different field entirely. We’ll see…



github · linkedin · email · rss