Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default arguments support #24 #62

Open
wants to merge 25 commits into
base: main
Choose a base branch
from

Conversation

theseanl
Copy link
Contributor

@theseanl theseanl commented Jan 2, 2024

The goal of this PR is to add support for macros with default arguments in macro expansion, e.g. as used in CLI's -e option. This should fix #24.

This PR mostly modifies unified-latex-util-macros package. Also the argspec parser was modified to correctly provide default arguments. In doing so, I have simplified the AST by removing Group and always returning an array of strings. Some complexity were moved to the argspec serializer. Previously default arguments and embellishment tokens were encoded in the same Groups, but it seemed to me that they need to behave differently in several aspects, so now in pegjs they are represented as different terms group and collections. This irons out several places which only had supported specifying a single character, but actually should have supported arbitrary token. Also #46 was fixed.

Macro extension is done in two phases, attaching arguments and expanding
macros. Only in the first phase has a call to `parseArgspec`, so it
seems natural to deal with default arguments in the first phase. We
additionally attach default arguments to the AST during attachMacroArgs
and then it will be picked up by existing macro expanders. This breaks
an invariant that printRaw(attachArgs(X)) = printRaw(X).
Doing so required a rewrite of pegjs grammar use for parsing xparse
argument specifications. Previously, a lot of things were modeled by an
ArgSpec struct `Group`, but in the end those used in O,R,D and those for
e,E,u required different behavior in many aspects, so the pegjs grammar
now differentiate them via `group` and `collection`. Also, the ArgSpec
does not have Group anymore, everything is parsed to an array of
strings.

This should fix siefkenj#46.
Windows

Previously, the test was failing when run on Windows terminals due to
how it treats quotes passed as arguments. We now use 'cross-spawn'
module to get around of this issue.
@theseanl
Copy link
Contributor Author

theseanl commented Jan 3, 2024

On a second thought, I am uneasy about attaching default args to AST, I'll think about moving that under createMacroExpander. What do you think?

If we choose that way, I may leave attachMacroArgs intact during this PR, and I don't see much point on working on expandMacros to support default arguments, because that will anyway require an overhaul while working on #43...

@siefkenj
Copy link
Owner

siefkenj commented Jan 4, 2024

This is exciting :-). I will review it soon.

Just a comment, though. _renderInfo is the place to stick data that isn't "part of the AST" but that you still want hanging around. It would be a reasonable place to stick default arguments. (It's already where named arguments are placed)

@@ -25,7 +25,7 @@ describe("unified-latex-util-argspec", () => {
"o m o !o m",
"!o r() m",
"O{somedefault} m o",
"m e{^}",
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this test removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was replaced with m e^. Now the printRaw function does something smarter, and it omits braces when there's a single embellishment token of single width (which is something that wasn't even accepted at pegjs grammar level before)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to have it default to always using braces. You can add e^ as a separate test though, just to make sure we support the non-braced version.

@siefkenj
Copy link
Owner

siefkenj commented Jan 4, 2024

Would you be able to make cross-spawn its own PR? That should be a fairly quick change :-)

@theseanl
Copy link
Contributor Author

theseanl commented Jan 4, 2024

Would you be able to make cross-spawn its own PR? That should be a fairly quick change :-)

Sure, done at #63

Copy link
Owner

@siefkenj siefkenj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for all this work!

I've made several comments. I think we can simplify the peggy grammar further. As well, we can move most logic about default arguments to the expander. If _renderInfo of the macro is supplied with the parsed defaultArgs, then the expander can have all the smarts to use them appropriately.

= "{" content:($(!"}" !braced_group .) / braced_group)* "}" {
return { type: "group", content: content };
= "{" content:(control_word_or_symbol / non_brace / braced_group)* "}" {
return content;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The signature is (Array | string)[]? That makes me a little uncomfortable...I prefer the {type: "group", content: ...}

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this comment was resolved. Did I miss the change?

embellishmentTokens: args.content.map(groupContent).flat(),
defaultArg: g,
embellishmentTokens: args,
embellishmentDefaultArg: g
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep this as defaultArg

@@ -25,7 +25,7 @@ describe("unified-latex-util-argspec", () => {
"o m o !o m",
"!o r() m",
"O{somedefault} m o",
"m e{^}",
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to have it default to always using braces. You can add e^ as a separate test though, just to make sure we support the non-braced version.

@theseanl
Copy link
Contributor Author

theseanl commented Jan 17, 2024

I need to get this going, and I propose to ditch Typescript projects in favor of lerna. Typescript project not allowing circular dependency is the problem. Lerna allows us to do everything that Typescript project does.

In the long term, I see benefits of disallowing circular dependencies, and in order to implement contextual macro expansion #43, I anyway see the needs for some architectural change. However, it'd be much better if we can do it incrementally, and introducing circular dependencies among projects in the meantime would be inevitable. There are many testimonies in microsoft/TypeScript#33685 that some had to abandon TS projects because there is no workaround for circular dependency issues.

I can make a separate PR that does this, merge that here and then make this PR to a state that is reviewable.

Initially we have assumed that 'until's token list can be parsed in a
way similar to that of embellishments, but it turned out that there are
some differences. Thus, I have simplified pegjs grammar for until and
also worked out to make multi-token stop work.
@siefkenj
Copy link
Owner

@theseanl I am strongly against circular dependencies. I think refactoring things into their own package is the solution most of the time.

Why do you think a circular dependency is required?

@theseanl
Copy link
Contributor Author

theseanl commented Jan 17, 2024

I didn't say that it is required. What I think is that attempting such a refactoring you are talking about is likely beyond the scope of this PR, and will slow down the development process.

The reason for this is that in order to parse default arguments, we need unified-latex-util-parse's parse function, and it is a gigantic function that invokes various other packages. Deferring invoking parse during macro expansion phase (which you have suggested) does not eliminate circular dependencies, because such an invocation will anyway be placed on unified-latex-util-macro package, which unified-latex-util-parse depends on.

IMO, we anyway will need some major architectural change to implement contextual macro expansion, and it will require passing down some config or host objects all the way down to a lot of places, including gobbleArguments function. There is already a place where we invoke the plain parse function somewhere (I forgot where exactly it is) and this is fundamentally flawed; Any configuration the user have provided to the parser need to be provided to that parse call as well. Such an issue will be resolved during such a refactoring. We will likely only import interfaces of such objects and thereby achieving loose coupling of packages and likely also an acyclic dependency graph.

I am not sure about your plans for tackling those problems, but I would first focus on default arguments support, and the then to contextual macro parsing which is likely to resolve the tight coupling issue in the meantime.

This package is a nodejs module. IMO circular dependency is not something that is tabooed in Javascript environment, ES modules specification is carefully defined so that it handles circular dependencies in a graceful and predictable way. I don't see a reason to avoid circular dependencies.

@theseanl
Copy link
Contributor Author

theseanl commented Jan 17, 2024

Frankly, I think the root cause is that we are re-parsing argspecs after stringifying it.
It'd be better if we are directly operating on AST to extract argspecs. Not only that it will make the need to re-parse default arguments unnecessary, but would also make supporting argument processors like > { \SplitArgument { 2 } { , } } m easier.

@siefkenj
Copy link
Owner

It seems appropriate to split expandMacros into its own package so that including unfied-latex-util-macros doesn't create a circular mess :-). That could be done in a separate PR or in this one (though would certainly be easiest to review in a separate PR).

@theseanl
Copy link
Contributor Author

That indeed seems a quick and easy solution to our situation without getting my hands dirty with cleaning up all the mess :)

Previously, typescript project references were used to wire imports
between workspace packages. Now this is done by package.json export
conditions. This nodejs feature has already been used to support
consuming packages in both esm and cjs format. We add an artificial
condition "_bundle", and use the respective bundler's export conditions
feature https://esbuild.github.io/api/#conditions and https://www.typescriptlang.org/tsconfig#customConditions
to achieve that IDE and TS compiler resolve any workspace package
imports to their respective Typescript source file, not their build
output .d.ts files.

This has an additional benefit of reducing bundle size of cjs build.
Previously, CJS build operated on ESM build output; They fed
dist/index.js files to esbuild. This changes this so that esbuild
directly operates on Typescript source files even in CJS build. As
esbuild will have more information, it will be able to better tree-shake
unused codes.
also used double quotes for interoparabilities with Windows
Until argspec's behavior was fixed, and while doing so, a support for
multi-token stops was added. Also, now it properly supports macro
delimiters (which was really just a by-product of applying uniform
treatment to any logic related to finding braces). This fixes
siefkenj#46.
@theseanl
Copy link
Contributor Author

theseanl commented Jan 20, 2024

Based on new observations above, I've simplified the pegjs part a bit and worked on gobbling until arguments. While doing so, I thought I could support multiple stop tokens, so I did it. It is currently 'dirty' in that it may introduce unnecessary string splits while doing stop tokens matching. Having extraneous string split should be harmless, but we may anyway 'clean' it if needed...

It currently lacks support for default arguments referencing other arguments, such as m o{#1}. This would be a goal of the next iteration.

It currently does have circular dependency. I'll try to see if it can be removed in next iterations, but I'm not sure if it will be done. The other PR should have removed any build-related issues that may arise from circular dependency.

In order to deal with default arguments referencing other arguments, it
seems to be better to move the logic from attachArgs to expandMacros.
Now expandMacros can correctly expand macros whose default arguments
reference other arguments. A special care was taken in order to follow
the behavior of xparse in case of circular references. In most of the
cases, xparse throws a compilation error, but it works in case like
`O{siefkenj#2} O{siefkenj#1}`. From this, we can reasonably guess that xparse treats
default arguments that directly copies another argument's value in a
special way. This behavior was implemented with a simple algorithm, and
it turns out to be consistent with xparse's behavior in cases considered
in unit tests.
@theseanl
Copy link
Contributor Author

Implemented support for default arguments referencing other arguments! It turned out to be quite tricky, but in the end it boiled down to a simple algorithm. Now I think this PR is feature complete, ready to be reviewed.

  • I have moved logic involving default arguments from attachArgs (the first phase) to expandMacros (the second phase). I still left some changes to unified-latex-util-arguments, I think it simplifies code a bit.

  • Previously nodes of type "hash_number" wasn't part of the Ast, and codes dealing with this hash_number nodes used forceful type casting which wasn't modelling the actual type correctly. I just added "hash_number" to Ast so that we have less type castings and type annotations to be more rigorous. This required adding some switch statement branches in distant packages.

  • It seems that although unified-latex-util-macros was declared to be a dependency of unified-latex-util-parse in tsconfig.json project references, actual code wasn't depending on it. Now the macros package depends on parse, but maybe it is still free of circular dependencies.

Since package.json is re-written before publishing, we are free to point
`import` condition to `.ts` source files. However, CLI tests need to be
able to resolve to `dist/*.js` files, so a custom condition `prebuilt`
pointing to those are kept in every package.json.
@siefkenj
Copy link
Owner

Now that #67 is merged, this should be easier to review :-)

@@ -83,6 +83,8 @@ export function printLatexAst(
return printVerbatimEnvironment(path, print, options);
case "whitespace":
return line;
case "hash_number":
return "#" + node.number.toString(10);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not #${node.number}?

packages/unified-latex-util-argspec/libs/argspec-parser.ts Outdated Show resolved Hide resolved
packages/unified-latex-util-argspec/libs/argspec-parser.ts Outdated Show resolved Hide resolved
packages/unified-latex-util-argspec/libs/argspec-parser.ts Outdated Show resolved Hide resolved
packages/unified-latex-util-macros/libs/newcommand.ts Outdated Show resolved Hide resolved
= "{" content:($(!"}" !braced_group .) / braced_group)* "}" {
return { type: "group", content: content };
= "{" content:(control_word_or_symbol / non_brace / braced_group)* "}" {
return content;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this comment was resolved. Did I miss the change?

@theseanl theseanl force-pushed the default-arguments branch 2 times, most recently from c705354 to 7848c5b Compare January 25, 2024 00:44
@theseanl theseanl force-pushed the default-arguments branch 3 times, most recently from 7b69a61 to 916e10f Compare January 28, 2024 18:41
The code path for optional argument without a default value wasn't being
tested. Added a test and fixed it so that it properly expands to
-NoValue-.
This was referenced Feb 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Macro default argument support
2 participants