Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not use Python's str.format() for expanding placeholders in commands #840

Closed
phoerious opened this issue Sep 22, 2021 · 46 comments
Closed

Comments

@phoerious
Copy link

Description

Placeholders in command environment variables such as CIBW_BEFORE_BUILD and CIBW_BEFORE_TEST are expanded using Python's str.format():

https://github.com/pypa/cibuildwheel/blob/main/cibuildwheel/util.py#L41

This makes it (nearly) impossible to use curly braces in a command, because a pair of single braces will throw a KeyError exception and a pair of double braces gets interpreted by the GitHub Actions YAML parser. The only option that might work is something ridiculous like {{ '{{' }}.

You should use a different parameter expansion mechanism that doesn't try to expand unknown parameter names.

Build log

No response

CI config

No response

@Czaki
Copy link
Contributor

Czaki commented Sep 22, 2021

Did you try to simply escape curly braces \{?

@henryiii
Copy link
Contributor

My first recommendation would be to set these in pyproject.toml if possible, then you avoid GitHub Actions expansion. (I was going to suggest \{ as a possible workaround after that, but @Czaki beat me to it)

@phoerious
Copy link
Author

\{ is not a valid escape mechanism. Braces in Python format strings are escaped as {{ and }}.

@Czaki
Copy link
Contributor

Czaki commented Sep 22, 2021

YAML allows to escape special characters by putting line in ".

@phoerious
Copy link
Author

I am using a | multiline string with Bash commands. Putting everything in quotes is beyond inconvenient.

@henryiii
Copy link
Contributor

henryiii commented Sep 23, 2021

How are you using { in bash commands, then? It expands there too, it's a shortcut. I'd avoid using that shortcut if that's what you are doing.

You should use a different parameter expansion mechanism that doesn't try to expand unknown parameter names

This would be bug prone, confusing to readers, and wouldn't allow us to add new expansions in the future. {python} would expand to our Python, while {python3} would get passed through and might not error at all. It's much better to be explicit and learn the correct escaping mechanisms. Users would not be able to tell the difference between {project} and something that doesn't get substituted. In the future, if we added a new substitution, we could break everyone's code that was expecting that to magically not get substituted. Debugging would be harder, too. You'd no longer get an error message if you tried an incorrect substitution (like python3).

I'd still recommend placing your config in pyproject.toml. Possibly bundle the logic into a script and avoid complex multiline commands if possible. I don't think it's our fault for having a substitution mechanism. Might be more GitHub Action's fault if you want to blame someone. Honestly, I thought GHA used ${{}}, not just {{}}. I've run into GHA collisions before due to cookiecutter also having a simpler escaping scheme. It's not a bug for either library, you just have to learn how to escape.

@phoerious
Copy link
Author

phoerious commented Sep 23, 2021

I am using Bash variables in the command, so I want the expansion on that level.

It's much better to be explicit and learn the correct escaping mechanisms.

Thing is, there is no escape mechanism. If there were, I'd by happy, but the way I escape this makes it vulnerable on another level and vice versa. Also, there is not a single word in the docs that other placeholders in braces are also expanded, so users have to debug why they are getting random KeyError exceptions without knowing why.

I thought GHA used ${{}}, not just {{}}.

${{ }} is GitHub's syntax, {{ }} is a standard YAML expression. So both are invalid in this context. Besides, ${{XX}} is how you would escape a Bash variable ${XX} for Python format strings, so this is broken on yet another level.

@Czaki
Copy link
Contributor

Czaki commented Sep 23, 2021

if you have a multiline bash script with needs of escape then maybe wrap this in bash script which use arguments?

@phoerious
Copy link
Author

phoerious commented Sep 23, 2021

That kind of defeats the purpose of my workflow file. The workflow file already litters my repository. I don't want to create additional shell scripts for every two lines of Bash code that I need for running my CI.

In my specific case, I could simply remove the curly braces, because the character following the Bash variable happens to be a non-alpha character. But in general, it's still not a good solution.

@Czaki
Copy link
Contributor

Czaki commented Sep 23, 2021

If you think that should be changed to avoid adding bash script files to the repository then maybe suggest another stable mechanism of text substitution?

@phoerious
Copy link
Author

First, it should be documented that anything between braces will be expanded, which isn't obvious. Then there should be a stable way to escape braces that is compatible with YAML/GHA syntax. An alternative suggestion would be to do string replacement instead of format string parsing, which would replace only defined placeholders (imho, it's rather unlikely that a large number of additional placeholders will be added in the future that could break people's workflows).

I could also live with some sort of signal character sequence at the beginning that turns off parameter substitution for the string, although that would imply that it's either a valid shell comment (difficult for Windows vs. Unix) or that the signal is stripped from the string before its passed on to the shell.

@henryiii
Copy link
Contributor

henryiii commented Sep 23, 2021

You should not be using something like ${PWD} instead of $PWD, the latter is both more readable, and more portable. If I type that into my terminal right now, I get "fish: Variables cannot be bracketed.". If it wasn't for this, you'd be able to do {{}}, because it doesn't conflict with GHA, only adding the $ does.

Windows doesn't have ${x} or $x, it uses %x% so there's already no conflict there.

I feel like we already have a solution (the pyproject.yaml file instead of placing code in the CI file, which has less conflicting substitutions, and is more portable too), and custom rolling our own text replacement is likely to be tricky and error-prone (and again, WORSE in my opinion have {project} be a variable expansion and {proiect} not be). That is my opinion; other maintainers may think differently.

We should be a bit clearer in the documentation, and maybe have an example of how to properly escape if anyone can think of a good use for ${} in a command. Bash slicing or arrays, but do people really need that in a command? Poorly named variables that don't stick together with $ should be avoided. Ideally, your package should be designed to build as simply as possible, and long, complex commands shouldn't be in your GitHub Actions files, that makes it harder for users to build your package for development, too.

In my specific case ... But in general, it's still not a good solution. ... I could also live with some sort of signal character sequence

Why is using the proper bash syntax for variables not a good solution? With hundreds of cibulidwheel users, no one has complained about {}, and it turns out you don't have a use case requiring ${} to be passed either. $X is a better form than ${X}, as mentioned above, and is a fine solution.

@phoerious
Copy link
Author

phoerious commented Sep 23, 2021

You should not be using something like ${PWD} instead of $PWD, the latter is both more readable, and more portable.

Both ${PWD} and $PWD are POSIX shell syntax. If fish doesn't accept the latter, then it's a problem with your shell, tbh (fish has various issues with POSIX compatibility). But it's not just a matter of style. The brace syntax is needed if the parameter substitution would otherwise be ambiguous, i.e., if the following character isn't a space or other non-alphanumeric character. But even if it is, it is sometimes more readable than without. One example I have is where I need to install a freshly built wheel before I can build and test another one or things like the Sphinx docs in a separate job:

python -m pip install wheelhouse/*-cp$PYTHON_ABI-*-manylinux*.whl

Even though there is a -, so the braces are not strictly needed, it's more readable with them and most Bash style guides would even require them:

python -m pip install wheelhouse/*-cp${PYTHON_ABI}-*-manylinux*.whl

Bash slicing or arrays, but do people really need that in a command?

I sometimes use string replacements via Bash syntax, which is much shorter than piping something to sed, although that's more of a special case. My main concern would be the general brace syntax for variables.

Why is using the proper bash syntax for variables not a good solution?

As I said, it is proper Bash syntax and if you look at various style guides, they even require them if you are using parameter substitution within a string of literal characters.

@henryiii
Copy link
Contributor

henryiii commented Sep 23, 2021

I didn't say ${X} was not proper bash syntax, I said $X was proper bash syntax (too), and more portable (as in, works on more shells. No shell requires the {}, but at least one shell requires it to not be there, therefore it's more portable. Portable means it works in more places, not that it only follows bash or POSIX specifications). It's helpful to see an example, though I think you are always going to have to deal with some substitution issues when you have multiple things processing strings. I'm assuming we didn't make these normal environment variables to be more cross-platform (?).

If we did have "smart" substitution that ignored missing names, what would you tell someone who expected this to work?

export wheel=*.wheel
python -m pip install wheelhouse/${wheel}

Depending on which command you are setting, this might be wheelhouse/*.whl or wheelhouse/$wheel-cp38-.... And how would you explain the difference to a reader between {dest_dir}, {a,b}, and ${var}? If you escape, then it's explicit, though a bit uglier.

I'll let another maintainer chime in and see if they are interested in using a custom solution. If we used format_map instead of format + DefaultDict, you'd be able to use simple strings, but not string replacements, expansions, array syntax, etc. We'd need a full custom rolled solution, I think, to accept {<whitespace>key<whitespace>}.

@phoerious
Copy link
Author

I doubt that users would expect anything from {a,b} if the docs say {dest_dir} is a placeholder. Even to me it was entirely unexpected that you were trying to replace anything that wasn't documented. I expected a simple string substitution and not a fully-fledged parser behind these placeholders and I would consider myself somewhat of an expert Python user. In fact, it took me a while to realise that the KeyError came from your parser and wasn't an error in my YAML definition or something.

@henryiii
Copy link
Contributor

I'm not horribly against allowing {x} to pass through if x is not a substitute key anymore, but I'm not for it either. I can see the usefulness, but also there are the points I raised above. I'll let @joerick, @mayeut, and/or @YannickJadoul decide.

@joerick
Copy link
Contributor

joerick commented Sep 23, 2021

For sure this is a sharp edge! Thank you for reporting it @phoerious. I'm surprised we have not hit this error sooner. I suppose some amount of nonsense is to be expected when the string processing pipeline looks like YAML -> Github actions -> cibuildwheel (str.format) -> bash.

If it's possible, I'd rather not change our string expansion mechanism, because as henryiii noted, it introduces a (small) namespace issue, and could cause confusing error messages if a substitution was mistyped {proejct}. But I don't feel strongly about it.

The least we can do is:

  • improve the error message when we get KeyError on str.format, including a workaround
  • (maybe) include a note in the docs explaining this gotcha

In terms of workarounds, what are we dealing with here? Apologies if I'm covering old ground, but I want to make sure I understand. So let's say the offending variable is CIBW_BEFORE_BUILD: make ${FOO}bar.

  • as written, this raises a KeyError in cibuildwheel's substitution.
  • You can't do make $FOObar or make $FOO bar because the first references the wrong variable and the second is a different output.
  • So then you try CIBW_BEFORE_BUILD: make ${{FOO}}bar. But this is interpreted as a Github actions expression. This isn't a YAML feature(?), it happens after YAML parsing. And, it seems that there is no way to escape out of this. This post suggests that the workaround here would be to substitute a string, like CIBW_BEFORE_BUILD: make ${{ '${{FOO}}' }}bar. I don't like this solution...!

Then... I think we're out of options in terms of things that we can do inside the workflow file? The remaining options would be to move this config to pyproject.toml or write a bash script?

@henryiii
Copy link
Contributor

As I've pointed out several times, there's also the option of moving the command to pyproject.toml, where make ${{FOO}}bar works just fine, your CI service is not parsing that string at all.

To make it more fun, how about cookiecutter -> YAML -> Github actions -> cibuildwheel (str.format) -> bash, which I also do. :)

@joerick
Copy link
Contributor

joerick commented Sep 23, 2021

As I've pointed out several times, there's also the option of moving the command to pyproject.toml, where make ${{FOO}}bar works just fine, your CI service is not parsing that string at all.

Ah, yes, good point. I updated my previous post. But it would be nice if there was a way to do this with environment variables, too. And the TOML would still get the weird KeyError when doing before_build = "make ${FOO}bar", right? The workaround would be to set before_build = "make ${{FOO}}bar", I think?

@phoerious
Copy link
Author

phoerious commented Sep 23, 2021

You could add a custom escaping syntax with backslashes and translate that to double braces before handing it to string format. But I don't know how that would interact with YAML's own escape sequences. Worst case, you have to double-backslash everything.

And yes, the TOML would be a workaround, but not a good one for three reasons:

  1. I have multiple packages and each has its own TOML, so that is a lot of duplication.
  2. I couldn't use it for anything that is above package build level (i.e. concerning integration of multiple packages, building docs, coverage etc.)
  3. I don't like to mix CI backend code with application code.

@henryiii
Copy link
Contributor

@joerick, is there a reason these are not normal environment variables made available to the command via the shell? To make cross-platform easier?

Yes, you'd still need to know that this is going to be processed with str.format, and double bracket to use brackets.

@joerick
Copy link
Contributor

joerick commented Sep 23, 2021

@joerick, is there a reason these are not normal environment variables made available to the command via the shell? To make cross-platform easier?

I guess that's a question for Github Actions? But I would assume it's so that they can do their special expression syntax inside the curly quotes, like ${{ fromJSON(env.time) }}

This syntax goes back to the introduction of CIBW_BEFORE_BUILD, version 0.2.0 :) Yes, I'd assume that it was to make cross-platform config easier. And I didn't foresee GitHub Actions' expression syntax at the time!! :)

@henryiii
Copy link
Contributor

henryiii commented Sep 23, 2021

And I didn't foresee GitHub Actions' expression syntax at the time! :)

That would have been impressive if you had, since it launched in 2019 and that's from 2017. :)

Maybe we could add them to the environment with CIBW_<UPPER>, then provide a mechanism for disabling str.format processing. Though I don't like adding new switches. If we use this mechanism for future additions, rather than expanding {}, then having a custom string format replacement isn't so bad. {pip} and {python} are deprecated or at least not recommended anymore, IIRC?

@phoerious
Copy link
Author

I haven't had the need for any substitutions other than input and output paths. Executables should be on the path, so there is no need for those.

@henryiii
Copy link
Contributor

That's why {pip} and {python} are deprecated or at least not recommended anymore. However there are other important substitutions besides {package} and {project}; specifically {wheel} and {dest_dir} for the repair command. I think that's the full set of them, though.

@henryiii
Copy link
Contributor

henryiii commented Sep 23, 2021

I have multiple packages and each has its own TOML, so that is a lot of duplication.

But you are running them all in the same repo and with the same config files? Okay. But now GHA is literally the only place that can build your code correctly. Unless you duplicate the config, or use the local runner package that I've forgotten the name of (but keep meaning to play with).

Oh, and you can set the file to a cibuildwheel specific toml file, you don't have to use pyproject.toml. You could have a central cibuildwheel.toml file and then point cibuildwheel at it for each package with --config-file=cibuildwheel.toml.

I couldn't use it for anything that is above package build level (i.e. concerning integration of multiple packages, building docs, coverage etc.)

cibuildwheel is not useable for docs or coverage either. It builds wheels, and can test them, but it does not build docs, etc. If you use the built wheels for building docs after cibuildwheel runs, great, but that's not part of cibuildwheel. It happens afterward.

I don't like to mix CI backend code with application code.

It's not CI backend code, it's a recipe for how to build your package. Ideally you can avoid any special setup, and just use pyproject.toml's python requirements, but if need command customization, then you can set things in pyproject.toml. And then you can build locally! Being able to run:

pipx run cibuildwheel --platform linux

locally is fantastic if your CI breaks down! Or if your ARM or PowerPC builds time out, Travis credits run out, etc. (All actually events that have happened, and where local builds were great).

@YannickJadoul
Copy link
Member

Two thoughts:

  1. Are we expecting users to use weird features of the format strings, like {wheel:42}? If not, a search-and-replace of '{wheel}' might just do the trick? However, that doesn't seem to be very future proof.
  2. What about not trying to replace {thing}s that we don't define, and not give an error? https://stackoverflow.com/a/17215533 seems to indicate it's not that hard. That won't catch @joerick's mentioned typos, but the error might still be quite obvious when you look at the logs (and arguably even more obvious than the cryptic KeyError) ?

@Czaki
Copy link
Contributor

Czaki commented Sep 23, 2021

Maybe I found a solution.
Base on this https://docs.python.org/3/library/stdtypes.html#str.format_map

class Default(dict):
    def __missing__(selfself, key):
        return "{" + key + "}"
    
"text {aa} pr vc {bb}".format_map(Default(aa="eee"))
Out[7]: 'text eee pr vc {bb}'

@henryiii
Copy link
Contributor

The map solution that you are both mentioning I've already brought up in #840 (comment):

If we used format_map instead of format + DefaultDict, you'd be able to use simple strings, but not string replacements, expansions, array syntax, etc. We'd need a full custom rolled solution, I think, to accept {key}.

Basically, nothing interesting on the bash side would work, like mv "$f" "${f%.txt}.text". I don't think we are expecting or seeing anyone use fancy Python string formatting, so simply replacing \{\w*<name>\w*\} wouldn't be that bad - the bash side is likely to get much fancier. The downsides are: easier to pass through typos, but environment variables aren't any better (and likely worse, since they turn into nothing). Special behavior for the 6 strings we define. Not future proof, but hopefully people aren't using lots of common lowercase names here that we might eventually want to add (and we should possibly think about adding $CIBW_PLATFORM and such as well? Adding options here is likely not extremely common since we so far only have six? Assuming our planned env-var pass-through won't try to piggy-back on this syntax?).

I'm not as against it as I was, though I also think most users should be avoiding complex scripting here anyway.

@henryiii
Copy link
Contributor

henryiii commented Sep 23, 2021

PS: existing workarounds would break, {{ would now be actual {{. If there are any.

@phoerious
Copy link
Author

phoerious commented Sep 24, 2021

cibuildwheel is not useable for docs or coverage either.

No, but I might want to use the pre-defined environment variables in another context. Not every workflow looks like your basic hands-off setup where you only invoke the action formula and that's it. My workflow is much more complex and calling CIBW (multiple times) is just one part of it.

It's not CI backend code, it's a recipe for how to build your package.

For me it definitely is. Running CIBW locally may be useful in some rare cases, but it's not the primary way to build the application, particularly since my workflow does a lot more than just building the wheels and some wheel build steps are specific to the CI environment and don't even work without the full CI workflow.

Another solution to the parameter expansion problem perhaps: With the dict approach you could also do shell/environment variable expansion yourself by either replacing {ENV_VARNAME} with the literal environment value or with ${VARNAME}. The latter is probably better, since it also works with locally defined or exported variables. The downside of this potential solution is that it's yet another layer of indirection and other things could go wrong, as you are doing string replacements without taking into account the surrounding Bash context.

@joerick
Copy link
Contributor

joerick commented Sep 24, 2021

PS: existing workarounds would break, {{ would now be actual {{. If there are any.

I did a little scan...

Aside: this is a pretty cool tool, btw! It doesn't search all repos, but they say it's searching 2.1m repos, which is a lot :)

I only found one use of the {{ escape.

That said, if we used the str.format_map solution, I think the existing str.format escape mechanism would be preserved. So both string substitution and format_map seems like valid options to me.

@YannickJadoul
Copy link
Member

The map solution that you are both mentioning I've already brought up in #840 (comment):

Yeah, I saw that, but didn't know you meant exactly the same since you said the special things wouldn't work and we'd need a fully custom rolled solution. What exactly did you mean?

That said, if we used the str.format_map solution, I think the existing str.format escape mechanism would be preserved. So both string substitution and format_map seems like valid options to me.

Yes, that does seem to work, actually! Stealing the code from the SO answer (which is basically the same as @Czaki's):

>>> '{a}{b}{{c}}{d}'.format_map({'a': 42, 'b': 3.14159, 'd': 'd'})
'423.14159{c}d'
>>> '{a}{b}{{c}}{d}'.format_map({'a': 42, 'b': 3.14159})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'd'
>>> class SafeDict(dict):
...     def __missing__(self, key):
...         return '{' + key + '}'
... 
>>> '{a}{b}{{c}}{d}'.format_map(SafeDict({'a': 42, 'b': 3.14159}))
'423.14159{c}{d}'
>>> '{a}{b:.2e}{{c}}{d}'.format_map(SafeDict({'a': 42, 'b': 3.14159}))
'423.14e+00{c}{d}'

@YannickJadoul
Copy link
Member

Would such a thing (which seems backwards-compatible) actually work for @phoerious, or is such a change still missing something for you?

@phoerious
Copy link
Author

I suppose it would work, but it needs to be documented clearly that these strings are parsed.

@henryiii
Copy link
Contributor

The problem with format map is it still parses the strings, meaning the only thing it fixes is ${unknown}. All other bash uses of this structure, like cp {a,b}.txt or mv "$f" "${f%.txt}.text" would be still broken, since special characters are invalid in a format string. The idea was to simplify this by being less invasive, but this still is invasive, a user needs to know that the parsing is happening, that {{ is sort-of not needed, etc. It's a bit more complex now, if you was me, since now {unknown} and {{unknown}} are exactly the same thing, but {wheel} and {{wheel}} are not.

Oh, just tried it, and it actually allows these to be passed through. The only character that's not allowed is :, which gets stripped along with anything after it. I don't remember if/what : does in brackets in bash. And [] behaves weirdly, but I think that might be fixable by dropping a formattable, sliceable class in instead of a raw string.

@henryiii
Copy link
Contributor

henryiii commented Sep 24, 2021

class Special:
    def __init__(self, inner: str) -> None:
        self.inner = inner
    def __format__(self, fmt: str) -> str:
        return str(self.__class__(self.inner + (f":{fmt}" if fmt else "")))
    def __getitem__(self: T, item: int) -> T:
        return self.__class__(f"{self.inner}[{item}]")
    def __str__(self) -> str:
        return f"{{{self.inner}}}"

class SafeDict(dict):
    def __missing__(self, key):
        return Special(key)
>>> '{a,b}{b:.2e}{{c}}{d%s}{e:3}{f[0]}'.format_map(SafeDict({'a': 42, 'b': 3.14159}))
'{a,b}3.14e+00{c}{d%s}{e:3}{f[0]}'

PS: I just discovered block select in iTerm. :D

@phoerious
Copy link
Author

A parsed string would probably not allow for unmatched braces or other invalid Python format syntax, would it?

@YannickJadoul
Copy link
Member

A parsed string would probably not allow for unmatched braces or other invalid Python format syntax, would it?

Yes, that does seem to be the case, when I quickly test this.

@joerick
Copy link
Contributor

joerick commented Oct 17, 2021

A parsed string would probably not allow for unmatched braces or other invalid Python format syntax, would it?

I suppose such expressions would be possible using {{ rather than {. Escaping (e.g. literal {wheel}) would be a bit harder with a regex-based solution.

I think the special format_map solution would be my preference. And yes, including a note about it in the docs.

@phoerious
Copy link
Author

I guess unmatched {{ would then be a YAML syntax error.

@joerick
Copy link
Contributor

joerick commented Oct 18, 2021

@henryiii
Copy link
Contributor

Only ${{ ... }} is special, due to GitHub Actions (and many other pre parsers), a plain {{ shouldn't do anything special inside a YAML string.

@phoerious
Copy link
Author

phoerious commented Oct 19, 2021

${{ ... }} ist GitHub syntax, but {{ .. }} is a YAML expression, which is the whole reason for this thread.

foo: 
  bar: {{baz

is invalid YAML. I guess what you could do is

--- 
foo: 
  bar: "{{baz"

@henryiii
Copy link
Contributor

henryiii commented Oct 19, 2021

The reason for this thread is that

foo: 
  bar: "${baz}"

is impossible to write currently. As written, it fails because {baz} runs through str.format and fails. If you double bracket, then this triggers GHA substitution.

Double brackets are fine in strings. This is not a thread on YAML syntax and strings - if you need a string, you should either be aware of the limitations of the shortcut allowing you to skip quotes, or you should use a quoting mechanism, like quotes or a block quote. Same issue if you try to use python-version: 3.10 in setup-python, you have to realize this is a floating point number so the extra 0 has no effect.

@joerick
Copy link
Contributor

joerick commented Jan 7, 2022

This was fixed in #889

@joerick joerick closed this as completed Jan 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants