Skip to content

TM008 the "do" and "ret" statement separators

Daniel Patterson edited this page May 30, 2013 · 2 revisions

Motivation:

Once we add binary operators (TM008), we run into ambiguousness in our grammar. Consider, as a motivating example, the following piece of code:

obj.foo(1+2) + 3

In the current grammar, when we add binary operators, this parses as:

obj.foo
(1+2) + 3

And indeed, it is completely ambiguous what it should parse as. We've surveyed several languages, and we see the following approaches:

In C family languages, there are semi-colons separating these types of expressions, so it would not be ambiguous. In Python and other whitespace delimited languages, we have newlines / whitespace delimiting (where if the next line were indended the same amount, it would be a separate statement, if it was indented more, it would continue). Javascript deals with this (poorly) with heuristic semicolon insertion - if it sees the end of a line without a semicolon, it tries to figure out if there should have been one, or if the statement continues onto the next line. Go deals with this by trying to make it unambiguous, and when they can't they just disallow patterns (like you can't put the opening brace on the next line when starting blocks). Finally, Ruby does crazy things to make stuff like this possible, which results in the language being incredibly hard to parse.

None of these are things we are crazy about. Semi-colons are noise that give the language an imperative feel that we don't want, Javascripts rules are just bad, and we don't want to get stuck in the restrictions that Python / Go have.

But, we need something to resolve the ambiguity. What we propose is to have a few simple rules:

  1. A block with a single expression in it evaluates to that expression. This already occurs.
  2. A block with multiple statements cannot have bare expressions in it. To cause a side-effecting expression (ie, throw the result away), prefix the expression with "do", and if you want to return the last expression (ie, cause the block to evaluate to it), prefix it with "ret".

Examples:

All of these are taken from real programs. Note that with less imperative code, the only significant change with what we have currently will be "ret" on the tail position when you have let bindings, but that is something we've talked about wanting anyway. Here's an example of such a pure function:

fun birdinit(_):
  x = uniform(_centrex, _positionspread)
  y = uniform(_centrey, _positionspread)
  vx = uniform(0, _speedspread)
  vy = uniform(0, _speedspread)
  ret [x, y, vx, vy]
end

This example is sort of as bad as it can get, using lots of side effecting stuff (print statements).

fun format-results():
  passed = results.filter(\r: (r.passed))
  failed = results.filter(\r: (r.passed.not()))
  passedNum = passed.length()
  failedNum = failed.length()

  do print(passedNum.tostring().append(" tests passed."))
  do print(failedNum.tostring().append(" tests failed."))

  do results.map(\r: (
    cond:
      | r.passed => nothing
      | else =>
        do print("===========================")
        do print("Test: ".append(r.message))
        do print("Failed because: ".append(r.reason))
        do print("Expected:")
        do print(r.expected)
        do cond:
          | builtins.keys(r).member("actual") =>
            do print("Actual:")
            do print(r.actual)
          | builtins.keys(r).member("exception") =>
            do print("Exception:")
            do print(r.exception)
        end
        do print("")
    end
  ))
  ret nothing
end

You also have to use them for top level function calls, like test cases:

simple-cursor = find(simple-tree, \x: (x.equals(3)))
do test("Simple cursor find", simple-cursor, \c: (is-some(c)))
do test("Simple cursor find", simple-cursor, \c: (c.val.path.rest.first.node.value.eq    uals(4)))
do test("Simple cursor find", simple-cursor,
     \c: (c.val.path.rest.first.child.equals(0)))

As a final example, here's a rewrite of the core of union find. It is also possible to not need the "ret" on the "cond", but we're not sure if that makes things less clear (right now you know as soon as you start reading the cond that whatever the cond evaluates to is what the outer block evaluates to).

fun union-helper(n1,n2):
  cond:
    | n1.eq(n2) =>
      n1
    | else =>
      r1 = UFnode-max-rank(n1)
      r2 = UFnode-max-rank(n2)
      ret cond:
        | r1 > r2 =>
           do set-UFnode-parent(n2,n1)
           ret n1
        | r2 > r1 =>
           do set-UFnode-parent(n1,n2)
           ret n2
        | r1 == r2 =>
           do set-UFnode-parent(n2,n1)
           do set-UFnode-max-rank(n1,r1+1)
           ret n1
      end
  end
end

fun find(n):
  parent  = UFnode-parent(n)
  ret cond:
    | parent.not() => n
    | else =>
      ancestor = find(parent)
      do set-UFnode-parent(n,ancestor)
      ret ancestor
  end
end

Semantics:

"do expr" is desugared into "expr" and then "nothing". "ret expr" is equivalent to "expr"; its presence is enforced only at parse time. And the main problem that spawned this is just how to parse unambiguously, which this allows us to do.

Resolution:

We realized that if we added a few minor restrictions to our grammar, ie that function applications must have the opening parenthesis immediately following the function expression, and the opening paren of a binary operator parenthesis must have either a space, another opening paren, or an operator immediately preceding it, we could retain the language as is. We still like ret, but since this TM was about NEEDING to add these things, we consider it rejected.