Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Literal string concat with other scalars #6033

Closed
staabm opened this issue Nov 18, 2021 · 6 comments
Closed

Literal string concat with other scalars #6033

staabm opened this issue Nov 18, 2021 · 6 comments

Comments

@staabm
Copy link
Contributor

staabm commented Nov 18, 2021

Feature request

I would expect when a literal-string is concatenated with a int/float/bool/null value, that the resulting string is still considered literal

https://phpstan.org/r/20736d5a-ad79-425c-bd41-3cbc44aef509
https://phpstan.org/r/c7ce4e7b-140b-45c1-8e4d-38a7cc9e2b05

as described in the "integer values" section of the RFC this would be usefull behaviour so the literal-type can be used for sql queries, which are the result of concatenation of these values.

example

$id = 5;
$query = 'SELECT * FROM table where id='. $id; // should be considered literal, since $id is an int

$id = $_GET['abc']
$query = 'SELECT * FROM table where id='. $id; // should not be considered literal

$id = $_GET['abc']
$query = 'SELECT * FROM table where id='. mysql_real_escape_string($id); // should be considered literal

with these assumptions the literal string type, could be really usefull for db-querying methods like

class db {
  /** @param literal-string $queryString */
  public function query($queryString) {
  // ..
  }
}
@ondrejmirtes
Copy link
Member

Feel free to do simialr thing what was done here: phpstan/phpstan-src#768

@staabm
Copy link
Contributor Author

staabm commented Nov 18, 2021

ok, cool. wanted to check whether you agree with the sentiment first. but it seems we are on the same page

@craigfrancis
Copy link

Hi @staabm.

With your first example, you can use $id = '5';

As noted in the is_literal() RFC (as you have linked to), the PHP implementation is unlikely to support integers, because PHP can't track if they were developer defined not not (whereas string variables have a data structure with flags for things like this). It's why this type is called literal-string.

As to your third example, with mysql_real_escape_string(), what you're suggesting is more along the lines of Taint Checking, which is different to literal-string. This is because literal-string does not support any escaping functions, because escaping is very easy to get wrong. Your example is an example of the kind of mistakes that happen, because you haven't added quotes around the (not really) escaped value. e.g.

$id = $_GET['abc']

$query = 'SELECT * FROM table WHERE id = ' . mysqli_real_escape_string($link, $id); // INSECURE
$query = 'SELECT * FROM table WHERE id = "' . mysqli_real_escape_string($link, $id) . '"'; // Works, but not ideal.

// Much safer to use a Parameterised Query
$statement = $pdo->prepare('SELECT * FROM table where id = ?');
$statement->execute([$id]);

Notice how your variable hasn't been cast to an integer, and does not use quotes, this allows the attacker to set abc to "id" resulting in a "WHERE id = id" (returning all rows)... or the attacker could do something like "WHERE id = -1 UNION SELECT * FROM secret_table" :-)

It's because escaping is error prone (especially with HTML), that we are moving away from the Taint Checking approach (which tries to support escaping), and towards a much simpler system of "developer defined strings", and relying on parameterised queries (example), or a library, to handle unsafe user values.

And as an aside, when using real parameterised queries (not emulated), it means the SQL string is sent to the database first, for it to be parsed, and the Query Execution Plan to be created... then the user values are sent second, which reduces the risk of implementation flaws being an issue.

@staabm
Copy link
Contributor Author

staabm commented Nov 18, 2021

Hi Craig,

thx for the feedback

With your first example, you can use $id = '5';

the suggested change would of course work. in a more real-world example it renders the literal-strings as is useless though.

As noted in the is_literal() RFC (as you have linked to), the PHP implementation is unlikely to support integers, because PHP can't track if they were developer defined not not (whereas string variables have a data structure with flags for things like this). It's why this type is called literal-string.

I don't think the implementation in phpstan should try to mimic the shortcomings of the php-src runtime.
As we don't have the same constraints we IMO should afford doing the step regarding int (and also float/bool/null) which provides a more usefull feature.

@craigfrancis
Copy link

Hi Markus,

I appreciate where you're coming from... I have explored this before, where I also tried pushing this idea with Scott Arciszewski (who suggested the is_noble() name), but it was well and truly rejected (seen as too confusing).

It's how we got to the literal-string name, because these are "strings", that are defined as "literals"... if we start messing up that definition, it will cause problems.

It's also far from useless... I've got several systems using it, and I've not had a single problem (I'm happy to talk though any examples you might have, probably via email to avoid chatter here).

I believe the bit you are missing is the need for SQL to use Parameterised Queries (integers can and should be provided as parameters, not added to the SQL string, because of that separation of command and values, which can help with things like the Query Cache)... and with other languages (e.g. HTML or CLI), then a library should be used to do the escaping of these values for you (context aware).

And we should match how the PHP runtime can do this; otherwise you start getting multiple definitions, which will confuse developers over what is allowed and what is not (remember this definition is also being, or has been, implemented in other Static Analysis tools as well).

The limitation of "only strings" also applies in other languages, like the technique that's used in Go (e.g. this example library, where the library has defined an unexported "String" type, and developers using that library can call the method with an "Untyped String", where the compiler verifies this via Type Conversion); in Node the goog.string.Const is a "wrapper for strings"; and JavaScript will hopefully soon have isTemplateObject() only takes a Template String.

@github-actions
Copy link

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 20, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants