G-Machine 2#

This article is the second in the series on implementing lazy evaluation in MoonBit. In the first part, we explored the purposes of lazy evaluation and a typical abstract machine for lazy evaluation, the G-Machine, and implemented some basic G-Machine instructions. In this article, we will further extend the G-Machine implementation from the previous article to support let expressions and basic arithmetic, comparison, and other operations.

let 表达式#

The let expression in coref differs slightly from that in MoonBit. A let expression can create multiple variables but can only be used within a limited scope. Here is an example:

{
  let x = n + m
  let y = x + 42
  x * y
}

Equivalent coref expression:

(let ([x (add n m)]
      [y (add x 42)])
  (mul x y)) ;; xy can only be used within let

It is important to note that coref’s let expressions must follow a sequential order. For example, the following is not valid:

(let ([y (add x 42)]
      [x (add n m)])
  (mul x y))

In contrast, letrec is more complex as it allows the local variables defined to reference each other without considering the order of their definitions.

Before implementing let (and the more complex letrec), we first need to modify the current parameter passing method. The local variables created by let should intuitively be accessed in the same way as parameters, but the local variables defined by let do not correspond to NApp nodes. Therefore, we need to adjust the stack parameters before calling the supercombinator.

The adjustment is done in the implementation of the Unwind instruction. If the supercombinator has no parameters, it is the same as the original unwind. When there are parameters, the top address of the supercombinator node is discarded, and the rearrange function is called.

fn rearrange(self : GState, n : Int) -> Unit {
  let appnodes = self.stack.take(n)
  let args = appnodes.map(fn (addr) {
    guard let NApp(_, arg) = self.heap[addr] else {
      _ => panic()
    }
    arg 
  })
  self.stack = args + appnodes.drop(n - 1)
}

The rearrange function assumes that the first N addresses on the stack point to a series of NApp nodes. It keeps the bottommost one (used as Redex update), cleans up the top N-1 addresses, and then places N addresses that directly point to the parameters.

After this, both parameters and local variables can be accessed using the same command by changing the PushArg instruction to a more general Push instruction.

fn push(self : GState, offset : Int) -> Unit {
  // Push(n) a0 : . . . : an : s
  //     =>  an : a0 : . . . : an : s
  let addr = self.stack.unsafe_nth(offset)
  self.put_stack(addr)
}

The next issue is that we need something to clean up. Consider the following expression:

(let ([x1 e1]
      [x2 e2])
  expr)

After constructing the graph corresponding to the expression expr, the stack still contains addresses pointing to e1 and e2 (corresponding to variables x1 and x2), as shown below (the stack grows from bottom to top):

<Address pointing to expr>
       |
<Address pointing to x2>
       |
<Address pointing to x1>
       |
...remaining stack...

Therefore, we need a new instruction to clean up these no longer needed addresses. It is called Slide. As the name suggests, the function of Slide(n) is to skip the first address and delete the following N addresses.

fn slide(self : GState, n : Int) -> Unit {
  let addr = self.pop1()
  self.stack = Cons(addr, self.stack.drop(n))
}

Now we can compile let. We will compile the expressions corresponding to local variables using the compileC function. Then, traverse the list of variable definitions (defs), compile and update the corresponding offsets in order. Finally, use the passed comp function to compile the main expression and add the Slide instruction to clean up the unused addresses.

Compiling the main expression using the passed function makes it easy to reuse when adding subsequent features.

fn compileLet(
  comp : (RawExpr[String], List[(String, Int)]) -> List[Instruction],
  defs : List[(String, RawExpr[String])],
  expr : RawExpr[String],
  env : List[(String, Int)]
) -> List[Instruction] {
  let (env, codes) = loop env, List::Nil, defs {
    env, acc, Nil => (env, acc)
    env, acc, Cons((name, expr), rest) => {
      let code = compileC(expr, env)
      let env = List::Cons((name, 0), argOffset(1, env))
      continue env, acc + code, rest
    }
  }
  codes + comp(expr, env) + List::of([Slide(defs.length())])
}

The semantics of letrec are more complex - it allows the N variables within the expression to reference each other, so we need to pre-allocate N addresses and place them on the stack. We need a new instruction: Alloc(N), which pre-allocates N NInd nodes and pushes the addresses onto the stack sequentially. The addresses in these indirect nodes are negative and only serve as placeholders.

fn alloc_nodes(self : GState, n : Int) -> Unit {
  let dummynode : Node = NInd(Addr(-1))
  for i = 0; i < n; i = i + 1 {
    let addr = self.heap.alloc(dummynode)
    self.put_stack(addr)
  }
}

The steps to compile letrec are similar to let:

  • Use Alloc(n) to allocate N addresses.

  • Use the loop expression to build a complete environment.

  • Compile the local variables in defs, using the Update instruction to update the results to the pre-allocated addresses after compiling each one.

  • Compile the main expression and use the Slide instruction to clean up.

fn compileLetrec(
  comp : (RawExpr[String], List[(String, Int)]) -> List[Instruction],
  defs : List[(String, RawExpr[String])],
  expr : RawExpr[String],
  env : List[(String, Int)]
) -> List[Instruction] {
  let mut env = env
  loop defs {
    Nil => ()
    Cons((name, _), rest) => {
      env = Cons((name, 0), argOffset(1, env))
      continue rest
    }
  }
  let n = defs.length()
  fn compileDefs(
    defs : List[(String, RawExpr[String])],
    offset : Int
  ) -> List[Instruction] {
    match defs {
      Nil => comp(expr, env) + List::of([Slide(n)])
      Cons((_, expr), rest) =>
        compileC(expr, env) +
        Cons(Update(offset), compileDefs(rest, offset - 1))
    }
  }

  Cons(Alloc(n), compileDefs(defs, n - 1))
}

Adding Primitives#

From this step, we can finally perform basic integer operations such as arithmetic, comparison, and checking if two numbers are equal. First, modify the Instruction type to add related instructions.

  Add
  Sub
  Mul
  Div
  Neg
  Eq // ==
  Ne // !=
  Lt // <
  Le // <=
  Gt // >
  Ge // >=
  Cond(List[Instruction], List[Instruction])

At first glance, implementing these instructions seems simple. Take Add as an example: just pop two top addresses from the stack, retrieve the corresponding numbers from memory, perform the operation, and push the result address back onto the stack.

fn add(self : GState) -> Unit {
  let (a1, a2) = self.pop2() // Pop two top addresses
  match (self.heap[a1], self.heap[a2]) {
    (NNum(n1), NNum(n2)) => {
      let newnode = Node::NNum(n1 + n2)
      let addr = self.heap.alloc(newnode)
      self.putStack(addr)
    }
    ......
  }
}

However, the next problem we face is that this is a lazy evaluation language. The parameters of add are likely not yet computed (i.e., not NNum nodes). We also need an instruction that can force a computation to give a result or never stop computing. We call it Eval (short for Evaluation).

In jargon, the result of such a computation is called Weak Head Normal Form (WHNF).

At the same time, we need to modify the structure of GState and add a state called dump. Its type is List[(List[Instruction], List[Addr])], used by Eval and Unwind instructions.

The implementation of the Eval instruction is not complicated:

  • Pop the top address of the stack.

  • Save the current unexecuted instruction sequence and stack (by putting them into the dump).

  • Clear the current stack and place the previously saved address.

  • Clear the current instruction sequence and place the Unwind instruction.

This is similar to how strict evaluation languages handle saving caller contexts, but practical implementations would use more efficient methods.

fn eval(self : GState) -> Unit {
  let addr = self.pop1()
  self.put_dump(self.code, self.stack)
  self.stack = List::of([addr])
  self.code = List::of([Unwind])
}

This simple definition requires modifying the Unwind instruction to restore the context when Unwind in the NNum branch finds that there is a recoverable context (dump is not empty).

fn unwind(self : GState) -> Unit {
  let addr = self.pop1()
  match self.heap[addr] {
    NNum(_) => {
      match self.dump {
        Nil => self.put_stack(addr)
        Cons((instrs, stack), rest_dump) => {
          self.stack = stack
          self.put_stack(addr)
          self.dump = rest_dump
          self.code = instrs
        }
      }
    }
    NApp(a1, _) => {
      self.put_stack(addr)
      self.put_stack(a1)
      self.put_code(List::of([Unwind]))
    }
    NGlobal(_, n, c) => {
      if self.stack.length() < n {
        abort("Unwinding with too few arguments")
      } else {
        if n != 0 {
          self.rearrange(n)
        } else {
          self.put_stack(addr)
        }
        self.put_code(c)
      }
    }
    NInd(a) => {
      self.put_stack(a)
      self.put_code(List::of([Unwind]))
    }
  }
}

Next, we need to implement arithmetic and comparison instructions. We use two functions to simplify the form of binary operations. The result of the comparison instruction is a boolean value, and for simplicity, we use numbers to represent it: 0 for false, 1 for true.

fn negate(self : GState) -> Unit {
  let addr = self.pop1()
  match self.heap[addr] {
    NNum(n) => {
      let addr = self.heap.alloc(NNum(-n))
      self.put_stack(addr)
    }
    otherwise => {
      abort("negate: wrong kind of node \{otherwise}, address \{addr}")
    }
  }
}

fn lift_arith2(self : GState, op : (Int, Int) -> Int) -> Unit {
  let (a1, a2) = self.pop2()
  match (self.heap[a1], self.heap[a2]) {
    (NNum(n1), NNum(n2)) => {
      let newnode = Node::NNum(op(n1, n2))
      let addr = self.heap.alloc(newnode)
      self.put_stack(addr)
    }
    (node1, node2) => abort("liftArith2: \{a1} = \{node1} \{a2} = \{node2}")
  }
}

fn lift_cmp2(self : GState, op : (Int, Int) -> Bool) -> Unit {
  let (a1, a2) = self.pop2()
  match (self.heap[a1], self.heap[a2]) {
    (NNum(n1), NNum(n2)) => {
      let flag = op(n1, n2)
      let newnode = if flag { Node::NNum(1) } else { Node::NNum(0) }
      let addr = self.heap.alloc(newnode)
      self.put_stack(addr)
    }
    (node1, node2) => abort("liftCmp2: \{a1} = \{node1} \{a2} = \{node2}")
  }
}

Finally, implement branching:

fn condition(self : GState, i1 : List[Instruction], i2 : List[Instruction]) -> Unit {
  let addr = self.pop1()
  match self.heap[addr] {
    NNum(0) => {
      // false
      self.code = i2 + self.code
    }
    NNum(1) => {
      // true
      self.code = i1 + self.code
    }
    otherwise => abort("cond : \{addr} = \{otherwise}")
  }
}

No major adjustments are needed in the compilation part, just add some predefined programs:

let compiled_primitives : List[(String, Int, List[Instruction])] = List::of(
  [
    // Arith
    (
      "add",
      2,
      List::of([Push(1), Eval, Push(1), Eval, Add, Update(2), Pop(2), Unwind]),
    ),
    (
      "sub",
      2,
      List::of([Push(1), Eval, Push(1), Eval, Sub, Update(2), Pop(2), Unwind]),
    ),
    (
      "mul",
      2,
      List::of([Push(1), Eval, Push(1), Eval, Mul, Update(2), Pop(2), Unwind]),
    ),
    (
      "div",
      2,
      List::of([Push(1), Eval, Push(1), Eval, Div, Update(2), Pop(2), Unwind]),
    ),
    // Compare
    (
      "eq",
      2,
      List::of([Push(1), Eval, Push(1), Eval, Eq, Update(2), Pop(2), Unwind]),
    ),
    (
      "neq",
      2,
      List::of([Push(1), Eval, Push(1), Eval, Ne, Update(2), Pop(2), Unwind]),
    ),
    (
      "ge",
      2,
      List::of([Push(1), Eval, Push(1), Eval, Ge, Update(2), Pop(2), Unwind]),
    ),
    (
      "gt",
      2,
      List::of([Push(1), Eval, Push(1), Eval, Gt, Update(2), Pop(2), Unwind]),
    ),
    (
      "le",
      2,
      List::of([Push(1), Eval, Push(1), Eval, Le, Update(2), Pop(2), Unwind]),
    ),
    (
      "lt",
      2,
      List::of([Push(1), Eval, Push(1), Eval, Lt, Update(2), Pop(2), Unwind]),
    ),
    // MISC
    ("negate", 1, List::of([Push(0), Eval, Neg, Update(1), Pop(1), Unwind])),
    (
      "if",
      3,
      List::of(
        [
          Push(0),
          Eval,
          Cond(List::of([Push(1)]), List::of([Push(2)])),
          Update(3),
          Pop(3),
          Unwind,
        ],
      ),
    ),
  ],
)

and modify the initial instruction sequence

code : List::of([PushGlobal("main"), Eval]),

总结#

In the next part, we will improve the code generation for primitives and add support for data structures.