Skip to content
This repository has been archived by the owner on Feb 26, 2024. It is now read-only.

Implement parser for mapping key lookup (e.g. m[0][1]) #3943

Draft
wants to merge 16 commits into
base: wrap
Choose a base branch
from

Conversation

gnidan
Copy link
Contributor

@gnidan gnidan commented Mar 23, 2021

Addresses #3877

Had this lying around from when I discovered parjs a few weeks ago. Decided to clean it up as a draft for review, in case it's something we can utilize in @truffle/decoder.

As it stands, this PR adds a new package @truffle/parse-mapping-lookup, whose name is terrible and needs to change. This new package exposes a function parse(), to accept a struct/mapping lookup expression (m[0], ledger.balances[0], etc.) and return an AST representation of that expression.

This handles literal floating point numbers and strings, but it doesn't do anything special with those values. It does handle escape sequences (thanks to parjs's own example). It currently loses information, in that m["0x"] will parse the same as m["\"0x\""]. @haltman-at I figure this is something that would be handled by @truffle/encoder, if we want it, so I didn't get fancy.

This also squashes all possibly useful error handling; that will likely be something to improve even slightly before merging.

See test cases for an overview of supported syntax.

Possibly minor note: I borrowed the original tsconfig for this new package from @truffle/db, so it uses src/, not lib/). This was because it seemed to be the fastest way to get jest working for tests, but we might want to update this to borrow from one of the codec packages instead.

@gnidan gnidan requested a review from haltman-at March 23, 2021 04:14
Copy link
Contributor

@haltman-at haltman-at left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I think my biggest comment is that I think we should distinguish string literals from other literals. But see below for more detailed comments.

I guess some of these comments are like "is this supposed to be Solidity or JS", and I guess the answer is Solidity, so, uh, I guess it's the Solidity branch of those conditionals that you want to take. :P

packages/parse-mapping-lookup/package.json Show resolved Hide resolved
packages/parse-mapping-lookup/package.json Show resolved Hide resolved
packages/parse-mapping-lookup/src/parser.ts Outdated Show resolved Hide resolved
packages/parse-mapping-lookup/src/parser.ts Outdated Show resolved Hide resolved
export const stringP = stringEntriesP.pipe(many(), stringify(), between('"'));
}

const numberP = int().pipe(or(float()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you're relying on builtins here? You may want to make sure these actually matches Solidity's numeric literal syntax, which for instance allows underscores (no more than one consecutively) between digits for grouping purposes.

Edit: Removed comment about JS numeric literals since those aren't relevant

packages/parse-mapping-lookup/src/ast.ts Outdated Show resolved Hide resolved

const stringP = Strings.stringP;

const literalP = numberP.pipe(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solidity has other sorts of literals that I think need to be accounted for. Specifically:

  1. Boolean literals (true and false)
  2. Hex string literals -- I assume static bytestring literals 0x1234 (and address literals) fall under numeric literals, but there's also dynamic bytestring / hex string literals hex"deadbeef". (Edit: This also has a single-quoted form.)
  3. It also might be worth supporting Solidity units here, as those are effectively part of numeric literals?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, one more I forgot about -- string literals are normally restricted to ASCII, but you can also do unicode string literals as unicode"..."; better account for that too.

Copy link
Contributor Author

@gnidan gnidan Apr 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Boolean literals
  • hex"deadbeef" literals
  • unicode"..." literals
  • Solidity units ( gwei, etc.)

Copy link
Contributor Author

@gnidan gnidan Apr 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm so unicode literals are a bit annoying, it seems, because Parjs seems to just assume its input is in the BMP (See their README section Parsing Unicode).

It should be doable to restrict "" literals to ASCII, but it might be a bit annoying to write the parser for that?

What do you think of this:

  • Exclude unicode"..." as unsupported for now (since BMP restriction suggests that we shouldn't make the promise of full-unicode support)
  • Don't restrict "" to ASCII in the meantime, to afford some way of specifying literals that include BMP characters.

Alternatively, if you want to be a stickler about "" meaning ASCII, we could just define our own literal bmp"" to be more explicit.

Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to actually enforce the ASCII restriction. Especially as that's relatively recent, older Solidity doesn't make a distinction.

I think it's also OK to exclude unicode"..." for now. Proper Unicode handling will be a pain regardless of how we do it, so maybe it's OK if we just don't support it all properly for now. (Especially the \xNN escape codes...)

Copy link
Contributor Author

@gnidan gnidan Apr 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Especially the \xNN escape codes...)

I don't want to know 😅

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too bad I'm going to explain anyway! :P In Solidity, you can do "\xNN" to insert a particular byte. Note, not a particular character -- a particular byte, regardless of whether or not this makes valid UTF-8. However, if it's invalid UTF-8, then you won't be able to use it as a string, only as a bytes or bytesN (since string literals can be used as either). So, e.g., unicode"\xA1" is not the same as unicode"\u00A1" or unicode"¡", but rather is the same as hex"A1", and so is an error if used as a string rather than a bytes/bytesN.

Copy link
Contributor Author

@gnidan gnidan Apr 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha! Does the Solidity compiler catch this up-front? Or is it a runtime problem?

Too bad I'm going to explain anyway!

Good cause I was lying.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Solidity compiler detects it upfront. It is possible to put bad UTF-8 inside a string at runtime; there are no runtime checks on whether your UTF-8 is good. But attempting to do so directly from a literal (whether unicode or hex or whatever) will trip the compile-time check and cause a compile error. If you really want to put bad UTF-8 inside a string, you'll need to be sneakier about it. (Putting it in a bytes and the converting that bytes to a string will do the trick. :) )

packages/parse-mapping-lookup/src/ast.ts Outdated Show resolved Hide resolved
packages/parse-mapping-lookup/src/ast.ts Outdated Show resolved Hide resolved
@gnidan gnidan changed the base branch from develop to wrap April 30, 2021 23:31
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants