GHC Cmm #1387

Merged
merged 102 commits into from Apr 14, 2020

Commits
dceb5e1
ghc-cmm: Bootstrap
supersven Dec 24, 2019
28ad0a8
ghc-cmm: Headings
supersven Dec 24, 2019
153331f
ghc-cmm: Brackets are Punctuation
supersven Dec 24, 2019
a50348a
ghc-cmm: Function definitions, Int arrays
supersven Dec 24, 2019
c1c275b
ghc-cmm: const is Keyword::Constant
supersven Dec 24, 2019
e15fe24
ghc-cmm: Operators and comparisons
supersven Dec 24, 2019
48f4248
ghc-cmm: ! is an operator
supersven Dec 24, 2019
aadb993
ghc-cmm: Hp and HpLim
supersven Dec 24, 2019
48f5cae
ghc-cmm: General registers
supersven Dec 24, 2019
4f663f6
ghc-cmm: calls
supersven Dec 24, 2019
23299f8
ghc-cmm: offset
supersven Dec 24, 2019
ab07693
ghc-cmm: properties
supersven Dec 24, 2019
c600632
ghc-cmm: properties & entities
supersven Dec 24, 2019
707a074
ghc-cmm: & is an operator
supersven Dec 24, 2019
4d9997f
ghc-cmm: combine Keyword rules
supersven Dec 24, 2019
4312702
ghc-cmm: array accesses
supersven Dec 24, 2019
6df481c
ghc-cmm: I128 would be valid, too
supersven Dec 24, 2019
c68d188
ghc-cmm: combine rules for global registers
supersven Dec 24, 2019
e5021b2
ghc-cmm: Add HpAlloc and BaseReg
supersven Dec 24, 2019
fc6b67f
ghc-cmm: Extract :infos mixin
supersven Dec 24, 2019
14dbe32
ghc-cmm: Extract :names mixin
supersven Dec 24, 2019
4b7d4ba
ghc-cmm: Extract :comments mixin
supersven Dec 24, 2019
e77b481
ghc-cmm: Extract :literals mixin
supersven Dec 24, 2019
a3da0b5
ghc-cmm: Extract ::operators_and_keywords mixin
supersven Dec 24, 2019
2df7adb
ghc-cmm: Inline :section
supersven Dec 24, 2019
c72ed7d
ghc-cmm: Multiline comments
supersven Dec 24, 2019
c48a6d7
ghc-cmm: Fix register name regex
supersven Dec 24, 2019
7eeb3c2
ghc-cmm: #include
supersven Dec 24, 2019
d75e918
ghc-cmm: #include and #if; String literals
supersven Dec 25, 2019
2b87cd2
ghc-cmm: #else
supersven Dec 25, 2019
99ac4fd
ghc-cmm: #define
supersven Dec 25, 2019
94a3bb5
ghc-cmm: Guess *.cmm
supersven Dec 25, 2019
c3f5110
ghc-cmm: functions and types in declarations
supersven Dec 25, 2019
dc3dd7d
ghc-cmm: ccall and jump
supersven Dec 25, 2019
ab545e4
ghc-cmm: foreign
supersven Dec 25, 2019
2bde8ac
ghc-cmm: prim
supersven Dec 25, 2019
25bd5e8
ghc-cmm: Simplify `#if defined`
supersven Dec 25, 2019
61ae71d
ghc-cmm: switch/case, .. operator
supersven Dec 25, 2019
2771bc6
ghc-cmm: never returns; don't match spaces, \s is much more stable
supersven Dec 25, 2019
19a4977
ghc-cmm: match %p string patterns
supersven Dec 25, 2019
7e4acd8
ghc-cmm: type annotations
supersven Dec 25, 2019
e0172af
ghc-cmm: unwind
supersven Dec 25, 2019
ed34c31
ghc-cmm: INFO_TABLE_* and ~
supersven Dec 26, 2019
2e2fe13
ghc-cmm: functions with explicit stack
supersven Dec 26, 2019
49654d0
ghc-cmm: stabilize keyword expressions
supersven Dec 26, 2019
fdd7c76
ghc-cmm: identifiers may contain '
supersven Dec 26, 2019
3dfd1cb
ghc-cmm: more section types
supersven Dec 26, 2019
1df5fe6
ghc-cmm: builtins with %
supersven Dec 26, 2019
4e9325e
ghc-cmm: memory access and more general array type parsing
supersven Dec 26, 2019
995bb37
ghc-cmm: cleanup
supersven Dec 26, 2019
57ab736
ghc-cmm: test and fix return statements
supersven Dec 26, 2019
617d7c2
ghc-cmm: add test
supersven Dec 26, 2019
aad3f71
ghc-cmm: #define with whitespace pattern
supersven Dec 26, 2019
c2344ac
ghc-cmm: refactor: move ws
supersven Dec 27, 2019
3c7fcc8
ghc-cmm: use ws
supersven Dec 27, 2019
16add46
ghc-cmm: more on ws
supersven Dec 27, 2019
85e71b7
ghc-cmm: introduce and use id
supersven Dec 27, 2019
afb867c
ghc-cmm: combine function expressions
supersven Dec 27, 2019
6073f5d
ghc-cmm: cleanup
supersven Dec 27, 2019
b897b0a
ghc-cmm: comment
supersven Dec 27, 2019
d300284
ghc-cmm: re-order states
supersven Dec 27, 2019
22baf87
ghc-cmm: Simplify: Type in variable or parameter declaration
supersven Dec 31, 2019
65154e7
ghc-cmm: refine switch
supersven Dec 31, 2019
87b90b3
ghc-cmm: refine section
supersven Dec 31, 2019
75a59f3
ghc-cmm: "match newline with dot" doesn't seem to work for mix-in reg…
supersven Dec 31, 2019
5cdbc75
ghc-cmm: Make white space matching more stable
supersven Jan 4, 2020
038d135
ghc-cmm: floating point numbers
supersven Jan 4, 2020
b94bf30
ghc-cmm: fix
supersven Jan 4, 2020
8499d15
ghc-cmm: info_tbls: Only markup special constructs, otherwise things …
supersven Jan 4, 2020
83dd651
ghc-cmm: arg & result hints
supersven Jan 4, 2020
fbeb984
ghc-cmm: Cleanup, some documentation
supersven Jan 5, 2020
e665267
ghc-cmm: reorder states
supersven Jan 5, 2020
17087ca
ghc-cmm: escaped newlines
supersven Jan 5, 2020
93dc2b3
ghc-cmm: #define and type lexing
supersven Jan 5, 2020
a342a04
ghc-cmm: macro variables & function calls
supersven Jan 5, 2020
a1342bc
ghc-cmm: inline function calls
supersven Jan 5, 2020
ab89b60
ghc-cmm: <highSp>
supersven Jan 5, 2020
c535499
ghc-cmm: lex complex ids after namespaces
supersven Jan 7, 2020
323a9c9
ghc-cmm: complex function names
supersven Jan 7, 2020
4742c02
ghc-cmm: complex function names - add tests
supersven Jan 7, 2020
a4dfa19
ghc-cmm: smaller demo
supersven Jan 11, 2020
21e5699
ghc-cmm: remove obsolete comment
supersven Jan 11, 2020
4efeba1
ghc-cmm: even more complex names
supersven Jan 11, 2020
01fc41b
ghc-cmm: complex ids in info tables
supersven Jan 11, 2020
cea654d
ghc-cmm: more on complex names
supersven Jan 11, 2020
c90de70
ghc-cmm: reduce lookahead rules, try to yield asap
supersven Jan 11, 2020
1132b3e
ghc-cmm: Move function detection to :names
supersven Jan 11, 2020
30d12fe
ghc-cmm: update visual sample
supersven Jan 11, 2020
027d5a0
Delete personal notes files
supersven Jan 11, 2020
582beae
Wrap comments
pyrmont Apr 5, 2020
e5974e2
Update description
pyrmont Apr 5, 2020
9e88960
Add missing newlines
pyrmont Apr 5, 2020
170c5b7
Remove extraneous newline
pyrmont Apr 5, 2020
42bf005
ghc-cmm: Remove alias
supersven Apr 7, 2020
4bc3348
ghc-cmm: Capture as much whitespace (\s) as possible
supersven Apr 7, 2020
90748b5
ghc-cmm: Simplify id regex
supersven Apr 7, 2020
aa1c6ce
ghc-cmm: Reintroduce aliases
supersven Apr 7, 2020
faaaee9
ghc-cmm: CLOSURE can be an import type or a function name
supersven Apr 8, 2020
d222dc4
ghc-cmm: `Data.Functor.Utils.#._closure` is a valid name
supersven Apr 11, 2020
9fc2c24
ghc-cmm: Keyword rule for `const` captured too much
supersven Apr 11, 2020
294d97f
ghc-cmm: Emit token for whitespace in const rule
supersven Apr 11, 2020
7354d9a
ghc-cmm: The quote `/` of a quoted newline is `Text`
supersven Apr 13, 2020
23 changes: 23 additions & 0 deletions lib/rouge/demos/ghc-cmm
@@ -0,0 +1,23 @@
[lvl_s4t3_entry() // [R1]
{ info_tbls: [(c4uB,
label: lvl_s4t3_info
rep: HeapRep 1 ptrs { Thunk }
srt: Nothing)]
stack_info: arg_space: 8 updfr_space: Just 8
}
{offset
c4uB: // global
if ((Sp + -32) < SpLim) (likely: False) goto c4uC; else goto c4uD;
c4uC: // global
R1 = R1;
call (stg_gc_enter_1)(R1) args: 8, res: 0, upd: 8;
c4uD: // global
I64[Sp - 16] = stg_upd_frame_info;
P64[Sp - 8] = R1;
R2 = P64[R1 + 16];
I64[Sp - 32] = stg_ap_p_info;
P64[Sp - 24] = Main.fib3_closure+1;
Sp = Sp - 32;
call GHC.Num.fromInteger_info(R2) args: 40, res: 0, upd: 24;
}
}
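As a rough illustration of how the lexer added in this PR (`lib/rouge/lexers/ghc_cmm.rb`) picks global registers out of the demo above, the sketch below copies the `id` pattern and the register pattern from that file and runs them over two demo lines with plain `String#scan`. The names `global_register` and `find_registers` are hypothetical helpers for this example only. Rouge itself applies the rule at the current lexing position and only after earlier rules have had their turn, so this shows the pattern in isolation rather than the lexer's exact output.

```ruby
# Patterns copied from lib/rouge/lexers/ghc_cmm.rb in this PR.
id = %r((?!#[a-zA-Z])[\w#\$%_']+)
global_register = /(Sp|SpLim|Hp|HpLim|HpAlloc|BaseReg|CurrentNursery|CurrentTSO|R\d{1,2}|gcptr)(?!#{id})/

# Hypothetical helper (not part of Rouge): collect every register-like match
# in a line. Unlike the lexer, `scan` is not anchored to a lexing position.
find_registers = ->(line) { line.scan(global_register).flatten }

demo_line = 'if ((Sp + -32) < SpLim) (likely: False) goto c4uC; else goto c4uD;'

p find_registers.call(demo_line)            # => ["Sp", "SpLim"]
p find_registers.call('R2 = P64[R1 + 16];') # => ["R2", "R1"]
p find_registers.call('goto c4uC;')         # => []
```

The negative lookahead `(?!#{id})` is what lets `SpLim` win over its prefix `Sp`: after matching `Sp` the lookahead sees `Lim`, rejects the match, and the alternation falls through to `SpLim`.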
340 changes: 340 additions & 0 deletions lib/rouge/lexers/ghc_cmm.rb
@@ -0,0 +1,340 @@
# -*- coding: utf-8 -*- #
# frozen_string_literal: true

# C minus minus (Cmm) is a pun on the name C++. It's an intermediate language
# of the Glasgow Haskell Compiler (GHC) that is very similar to C, but with
# many features missing and some special constructs added.
#
# Cmm is a dialect of C--. This lexer targets the Cmm that GHC produces and
# parses; C-- itself is not supported.
#
# https://gitlab.haskell.org/ghc/ghc/wikis/commentary/compiler/cmm-syntax
#
module Rouge
module Lexers
class GHCCmm < RegexLexer
title "GHC Cmm (C--)"
desc "GHC Cmm is an intermediate representation of the Glasgow Haskell Compiler (GHC)"
tag 'ghc-cmm'
filenames '*.cmm', '*.dump-cmm', '*.dump-cmm-*'
aliases 'cmm'

ws = %r(\s|//.*?\n|/[*](?:[^*]|(?:[*][^/]))*[*]+/)mx

# Make sure that this is not a preprocessor macro, e.g. `#if` or `#define`.
id = %r((?!#[a-zA-Z])[\w#\$%_']+)

complex_id = %r(
(?:[\w#$%_']|\(\)|\(,\)|\[\]|[0-9])*
(?:[\w#$%_']+)
)mx

state :root do
rule %r/\s+/m, Text

# section markers
rule %r/^=====.*=====$/, Generic::Heading

# timestamps
rule %r/^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ UTC$/, Comment::Single

mixin :detect_section
mixin :preprocessor_macros

mixin :info_tbls
mixin :comments
mixin :literals
mixin :keywords
mixin :types
mixin :infos
mixin :names
mixin :operators

# escaped newline
rule %r/\\\n/, Text

# rest is Text
rule %r/./, Text
end

state :detect_section do
rule %r/(section)(\s+)/ do |m|
token Keyword, m[1]
token Text, m[2]
push :section
end
end

state :section do
rule %r/"(data|cstring|text|rodata|relrodata|bss)"/, Name::Builtin

rule %r/{/, Punctuation, :pop!

mixin :names
mixin :operators
mixin :keywords

rule %r/\s+/, Text
end

state :preprocessor_macros do
rule %r/#(include|endif|else|if)/, Comment::Preproc

rule %r{
(\#define)
(#{ws}*)
(#{id})
}mx do |m|
token Comment::Preproc, m[1]
recurse m[2]
token Name::Label, m[3]
end
end

state :info_tbls do
rule %r/({ )(info_tbls)(:)/ do |m|
token Punctuation, m[1]
token Name::Entity, m[2]
token Punctuation, m[3]

push :info_tbls_body
end
end

state :info_tbls_body do
rule %r/}/, Punctuation, :pop!
rule %r/{/, Punctuation, :info_tbls_body

rule %r/(?=label:)/ do
push :label
end

rule %r{(\()(#{complex_id})(,)}mx do |m|
token Punctuation, m[1]
token Name::Label, m[2]
token Punctuation, m[3]
end

mixin :literals
mixin :infos
mixin :keywords
mixin :operators

rule %r/#{id}/, Text
rule %r/\s+/, Text
end

state :label do
mixin :infos
mixin :names
mixin :keywords
mixin :operators

rule %r/[^\S\n]+/, Text # Tab, space, etc. but not newline!
rule %r/\n/, Text, :pop!
end

state :comments do
rule %r/\/{2}.*/, Comment::Single
rule %r/\(likely.*?\)/, Comment
rule %r/\/\*.*?\*\//m, Comment::Multiline
end

state :literals do
rule %r/-?[0-9]+\.[0-9]+/, Literal::Number::Float
rule %r/-?[0-9]+/, Literal::Number::Integer
rule %r/"/, Literal::String::Delimiter, :literal_string
end

state :literal_string do
# escaped characters, e.g. `\"`
rule %r/\\./, Literal::String::Escape
rule %r/%./, Literal::String::Symbol
rule %r/"/, Literal::String::Delimiter, :pop!
rule %r/./, Literal::String
end

state :operators do
rule %r/\.\./, Operator
rule %r/[+\-*\/<>=!&|~]/, Operator
rule %r/[\[\].{}:;,()]/, Punctuation
end

state :keywords do
rule %r/(const)(\s+)/ do |m|
token Keyword::Constant, m[1]
token Text, m[2]
end

rule %r/"/, Literal::String::Double

rule %r/(switch)([^{]*)({)/ do |m|
token Keyword, m[1]
recurse m[2]
token Punctuation, m[3]
end

rule %r/(arg|result)(#{ws}+)(hints)(:)/ do |m|
token Name::Property, m[1]
recurse m[2]
token Name::Property, m[3]
token Punctuation, m[4]
end

rule %r/(returns)(#{ws}*)(to)/ do |m|
token Keyword, m[1]
recurse m[2]
token Keyword, m[3]
end

rule %r/(never)(#{ws}*)(returns)/ do |m|
token Keyword, m[1]
recurse m[2]
token Keyword, m[3]
end

rule %r{(return)(#{ws}*)(\()} do |m|
token Keyword, m[1]
recurse m[2]
token Punctuation, m[3]
end

rule %r{(if|else|goto|call|offset|import|jump|ccall|foreign|prim|case|unwind|export|reserve|push)(#{ws})} do |m|
token Keyword, m[1]
recurse m[2]
end

rule %r{(default)(#{ws}*)(:)} do |m|
token Keyword, m[1]
recurse m[2]
token Punctuation, m[3]
end
end

state :types do
# Memory access: `type[42]`
# Note: Only a token for type is produced.
rule %r/(#{id})(?=\[[^\]])/ do |m|
token Keyword::Type, m[1]
end

# Array type: `type[]`
rule %r/(#{id}\[\])/ do |m|
token Keyword::Type, m[1]
end

# Capture macro substitutions before lexing typed declarations
# I.e. there is no type in `PREPROCESSOR_MACRO_VARIABLE someFun()`
rule %r{
(^#{id})
(#{ws}+)
(#{id})
(#{ws}*)
(\()
}mx do |m|
token Name::Label, m[1]
recurse m[2]
token Name::Function, m[3]
recurse m[4]
token Punctuation, m[5]
end

# Type in variable or parameter declaration:
# `type /* optional whitespace */ var_name /* optional whitespace */;`
# `type /* optional whitespace */ var_name /* optional whitespace */, var_name2`
# `(type /* optional whitespace */ var_name /* optional whitespace */)`
# Note: Tokens are produced for the type, the whitespace and the first
# variable name; any further names are left to later rules.
rule %r{
(^#{id})
(#{ws}+)
(#{id})
}mx do |m|
token Keyword::Type, m[1]
recurse m[2]
token Name::Label, m[3]
end
end

state :infos do
rule %r/(args|res|upd|label|rep|srt|arity|fun_type|arg_space|updfr_space)(:)/ do |m|
token Name::Property, m[1]
token Punctuation, m[2]
end

rule %r/(stack_info)(:)/ do |m|
token Name::Entity, m[1]
token Punctuation, m[2]
end
end

state :names do
rule %r/(::)(#{ws}*)([A-Z]\w+)/ do |m|
token Operator, m[1]
recurse m[2]
token Keyword::Type, m[3]
end

rule %r/<(#{id})>/, Name::Builtin

rule %r/(Sp|SpLim|Hp|HpLim|HpAlloc|BaseReg|CurrentNursery|CurrentTSO|R\d{1,2}|gcptr)(?!#{id})/, Name::Variable::Global
rule %r/([A-Z]#{id})(\.)/ do |m|
token Name::Namespace, m[1]
token Punctuation, m[2]
push :namespace_name
end

# Inline function calls:
# ```
# arg1 `lt` arg2
# ```
rule %r/(`)(#{id})(`)/ do |m|
token Punctuation, m[1]
token Name::Function, m[2]
token Punctuation, m[3]
end

# Function: `name /* optional whitespace */ (`
# Function (arguments via explicit stack handling): `name /* optional whitespace */ {`
rule %r{(?=
#{complex_id}
#{ws}*
[\{\(]
)}mx do
push :function
end

rule %r/CLOSURE/, Keyword::Type
rule %r/#{complex_id}/, Name::Label
end

state :namespace_name do
rule %r/([A-Z]#{id})(\.)/ do |m|
token Name::Namespace, m[1]
token Punctuation, m[2]
end

rule %r{(#{complex_id})(#{ws}*)([\{\(])}mx do |m|
token Name::Function, m[1]
recurse m[2]
token Punctuation, m[3]
pop!
end

rule %r/#{complex_id}/, Name::Label, :pop!

rule %r/(?=.)/m do
pop!
end
end

state :function do
rule %r/INFO_TABLE_FUN|INFO_TABLE_CONSTR|INFO_TABLE_SELECTOR|INFO_TABLE_RET|INFO_TABLE/, Name::Builtin
rule %r/%#{id}/, Name::Builtin
rule %r/#{complex_id}/, Name::Function
rule %r/\s+/, Text
rule %r/[({]/, Punctuation, :pop!
mixin :comments
end
end
end
end
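The helper patterns `ws` and `id` carry most of the lexer's weight, so a small stand-alone check may help when reviewing them. The sketch below copies both patterns from the file above and tries them in plain Ruby, without loading Rouge. The lambdas `whole_id` and `only_ws` and the local `define_rule` are hypothetical names for this example; the anchors `\A` and `\z` are added here only so that the whole string has to match.

```ruby
# Patterns copied verbatim from lib/rouge/lexers/ghc_cmm.rb.
ws = %r(\s|//.*?\n|/[*](?:[^*]|(?:[*][^/]))*[*]+/)mx
id = %r((?!#[a-zA-Z])[\w#\$%_']+)

# Hypothetical helpers (not part of Rouge): require the entire string to match.
whole_id = ->(s) { /\A#{id}\z/.match?(s) }
only_ws  = ->(s) { /\A(?:#{ws})+\z/.match?(s) }

p whole_id.call('stg_upd_frame_info')    # => true
p whole_id.call("x'")                    # => true  (identifiers may contain ')
p whole_id.call('#define')               # => false (looks like a preprocessor macro)
p whole_id.call('Main.fib3_closure')     # => false (`.` is not part of an id)

p only_ws.call(" /* block */ // line\n") # => true  (comments count as whitespace)
p only_ws.call('x')                      # => false

# Same shape as the `#define` rule in the :preprocessor_macros state.
define_rule = %r{(\#define)(#{ws}*)(#{id})}mx
m = define_rule.match('#define  FOO 1')
p m.captures                             # => ["#define", "  ", "FOO"]
```

Because `ws` also matches comments, the rules that capture it call `recurse` on the captured text instead of emitting a plain `Text` token, which gives any comment inside the whitespace its own token.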