Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Postgres: public should not be considered a "application schema" and should be ignored the same as other system schema from getSchemaNames #5609

Closed
allan-simon opened this issue Aug 21, 2022 · 23 comments

Comments

@allan-simon
Copy link

allan-simon commented Aug 21, 2022

Bug Report

For now 10 years public has been returned by getSchemaNames generating a lot of bug and weird behaviour in higher level libraries relying on it.

Q A
Version 3.x

Summary

For now 10 years since commit b89490a (before that it was not working at all )

public is returned by PostgreSQLSchemaManager::getSchemaNames()

Current behaviour

How to reproduce

(a reproducer is soon coming)

  1. connect to any postgresql (a vanilla postgres docker is fine)
  2. call PostgreSQLSchemaManager::getSchemaNames
    -> public is returned among other user-defined schema

Expected behaviour

all application schema are returned , but not public (as well as other postgres-defined schema are also not returned

More context

This bug is at least the root cause of these issues

the more visible issue of it is with doctrine migration:

  1. all migrations contains a down migration trying to CREATE SCHEMA public that EVERYBODY deletes or workaround ( 💯 if you
    have done that too ) because it fails miserably (as it already exists)
  2. to avoid dropping it , we don't drop schemas at all cf https://github.com/doctrine/dbal/pull/5604/files , so if automated generated migrations are not bijectives because DROP SCHEMA will never be generated.

Why public should be considered a "system schema"

  1. public is not created by the user
  2. everybody assume that it is present
  3. even postgres assumes it is present
  4. it has a behaviour that no user-defined schema has (i.e omitting the name of the schema default to 'public' , i.e when you do CREATE TABLE foobar you're actually doing CREATE TABLE public.foobar
  5. if a ill-advised user drop it, a lot of libraries get broken ( doctrine migration by default create is own internal table in the public namespace , so as this SQL runs BEFORE the first migration, you can't even create back public in the first migration as it will fail before)
  6. weak argument: after 10 years of managing postgresql I never ever seen people dropping and even less (re)creating the public schema, the only case I could hypothetically see for it is a extremly ordered DBA that have severa applications on the same instance , and want each of them in its own schema to ease maintenance (but in that case it's the "infrastrcture" responsability to drop it once and for all, not one of the applications )

@morozov I understand the issue is "scary" because this library is a high profile one and after a time even bugs become part of the API , so I understand we may want to be careful on that one, so maybe we can find a deprecation strategy and changing this behaviour only on a minor/major release (at least not just a bug fix one)

@allan-simon allan-simon changed the title Postgres: public should not considered a "application schema" and should be ignored as other system schema Postgres: public should not be considered a "application schema" and should be ignored the same as other system schema from getSchemaNames Aug 21, 2022
@zerkms
Copy link

zerkms commented Aug 21, 2022

it has a behaviour that no user-defined schema has (i.e omitting the name of the schema default to 'public' , i.e when you do CREATE TABLE foobar you're actually doing CREATE TABLE public.foobar

It's only because it's in the user's/session's search_path.

@allan-simon
Copy link
Author

It's only because it's in the user's/session's search_path.

thanks for pointing that out, I was looking if it was overridable and it was right under my nose.

However my other point still stands and this one is slighltly amended , as in the case of a DBA requesting the search_path to be overrided to avoid accidental use of publc , it means even more that public should not be touched at all by the application layer

@zerkms
Copy link

zerkms commented Aug 21, 2022

for the records: I'm also affected by the bug for years and I agree with your points (just couldn't resist of being nit-picky, sorry!) :-D

@allan-simon
Copy link
Author

for the records: I'm also affected by the bug for years and I agree with your points (just couldn't resist of being nit-picky, sorry!) :-D

no offense taken , I also perfectly understand the maintainer point of view of being conservative and I think we're all on the same side of trying to take the best decision, so your nitpicking is welcomed as at least if forces me to really find good arguments :)

@zerkms
Copy link

zerkms commented Aug 21, 2022

Some other thoughts:

https://www.postgresql.org/docs/current/manage-ag-templatedbs.html

So theoretically a DBA may modify the template to NOT contain public schema.

But again, I agree that public exists in 99.999% installations and should be treated with exceptions.

@morozov
Copy link
Member

morozov commented Sep 22, 2022

I'm closing the issue as it doesn't state an objective problem. The opinion of "public should not be considered a "application schema" is subjective. The statement "everybody assume that it is present" doesn't have a proof.

All issues mentioned in the description are closed except for the one which is in a different repository.

If you believe that the current behavior needs to change, please file a new issue and state a problem, not an opinion.

@morozov morozov closed this as not planned Won't fix, can't repro, duplicate, stale Sep 22, 2022
@allan-simon
Copy link
Author

allan-simon commented Sep 22, 2022

i feel a bit disappointed , the stated issues are clear not a opinion, the "opinion" is a consequence of these issues, not the other way around,

it doesn't state an objective problem.

facts:

  • right now dbal can not handle dropping schema
  • right now dbal always report it will create the public schema (causing the "issue in an other repo" it's because it's the only repo using this feature in a way that is tangible for the end user , it does not take a long time to see doctrine migration is just doing a very thin layer on top of this, so there's no way to fix it except here) making people needing to edit their down migration to remove it

when you tried to consider public as a normal schema while adding the possibly for dbal to report SQL to drop schema , you had issues

as a consequence of that i see that the way to solve this is to consider public as special schema

I'm not some kind of militan that think public should be considered special out of the blue and here find a way to spread my propaganda , i'm just a pragmatic people who try to find solution to problem

so please first discard the facts before focusing on the proposed solution

@allan-simon
Copy link
Author

allan-simon commented Sep 22, 2022

All issues mentioned in the description are closed except for the one which is in a different repository.

the other issues have been closed by you these last days , so yes it's a self-fulling prophecy ...

@zerkms
Copy link

zerkms commented Sep 22, 2022

If you believe that the current behavior needs to change, please file a new issue and state a problem

#1110 <--- it already was explained 7 years ago, the problem hasn't changed since then.

@allan-simon
Copy link
Author

allan-simon commented Sep 22, 2022

"everybody assume that it is present" doesn't have a proof

  1. postgresql default value for search_path is public , so if you drop public select * from articles will break
  2. postgresql documentation states that By default such tables (and other objects) are automatically put into a schema named “public”. Every new database contains such a schema. https://www.postgresql.org/docs/current/ddl-schemas.html#DDL-SCHEMAS-PUBLIC so it read as .... well "it is present in every new database" , i don't know where's the subjective interpertration you can
  3. doctrine migrations breaks if you drop the public schema because it is the default one it uses

I'm really sorry but what kind of proof more than that do you need , if I rephrase it as "major projects, including the database itself use it as a default value because except if you're doing very specific stuff, it will be there " would a more objective statement ? if not how do you describe my points ?

also what's really frustrating is that these points are from my above comments, I really took the time to gather them, and you decide to discard them

@greg0ire
Copy link
Member

greg0ire commented Sep 23, 2022

@allan-simon in the bugs you mentioned, is there one that doesn't relate to migrations? Any thoughts about #1110 (comment) ?

@allan-simon
Copy link
Author

allan-simon commented Sep 23, 2022

@greg0ire it's not linked in my issue here, but yes #5596 is also affected by this issue (doctrine:schema:drop , if you try to enable the reporting of drop schem statement in dbal which is for now conveniently disabled )

so no I don't think it make sense to fix it in higher level library , as stated above, doctrine migration on this part is actually simply a think layer on top of the method provided by dbal

@greg0ire
Copy link
Member

Indeed, I've taken a look, and although it could be fixed in the ORM by tweaking SchemaTool::getSchemaFromMetadata(), that would make ORM schema creation attempt to create the public PG schema.

I see that DBAL has a schemaAssetsFilter configuration node, which I could not find documentation on but seems to be about making the DBAL ignore assets it should not manage. This leads me to formulating the following problem:

While it is possible to filter out assets in a schema, it's not possible to filter out schemas in a database.

A solution could be to add another configuration node just for this, and let the user configure it. That way we don't need to make any assumptions about what the user thinks about public. Assumptions should be made in Symfony recipes. That assumption should be made here, in the form of a commented out configuration node.

@allan-simon
Copy link
Author

allan-simon commented Sep 23, 2022

could be a way yes, at the end of the day I just want the problems quoted to be fixed my "opinion" was just IMHO the most sensible solution but if yours makes everyone happy , yes let's go with it

@allan-simon
Copy link
Author

if we just can verify it works first ( because I think to have already come accross it, i don't think it worked)

@greg0ire
Copy link
Member

greg0ire commented Sep 23, 2022

After taking a deeper look, I'm afraid my proposal would break getCurrentSchema():

public function getExistingSchemaSearchPaths()
{
if ($this->existingSchemaPaths === null) {
$this->determineExistingSchemaSearchPaths();
}
assert($this->existingSchemaPaths !== null);
return $this->existingSchemaPaths;
}
/**
* Returns the name of the current schema.
*
* @return string|null
*
* @throws Exception
*/
protected function getCurrentSchema()
{
$schemas = $this->getExistingSchemaSearchPaths();
return array_shift($schemas);
}
/**
* Sets or resets the order of the existing schemas in the current search path of the user.
*
* This is a PostgreSQL only function.
*
* @internal The method should be only used from within the PostgreSQLSchemaManager class hierarchy.
*
* @return void
*
* @throws Exception
*/
public function determineExistingSchemaSearchPaths()
{
$names = $this->listSchemaNames();
$paths = $this->getSchemaSearchPaths();
$this->existingSchemaPaths = array_filter($paths, static function ($v) use ($names): bool {
return in_array($v, $names, true);
});
}

Also, I'm not sure at what point in the schema comparison the method you're changing in #5600 is called. Maybe we should introduce another method that would return the filtered list of schemas when doing schema comparison, I don't know. I personally don't have to deal with the ORM or migrations so I'm less interested than you are in getting this fixed.

@allan-simon
Copy link
Author

I think having two separate method with different behaviour would introduce more bugs and will complexify things,

if you consider that "listSchemaNames" returns all the user-created schema [0], then it make sense to have it (for the above reaons) to ignore public as it's not user created , and the very few who have all:

  1. a DBA who deleted the public schema before handling it to developers
  2. a DBA who allows application code have a postgres user with CREATE SCHEMA rights
  3. a DBA who allows also the public schema to be recreated
  4. a DBA that doesn't complain that the application code who created the schema they didn't want in the first place, does not clean it on down migration

should be a minority (sorry no hard statitics here [1], the same as I don't have a hard statistics on how many people eat soup with a fork but at some point it feel like pointless to argue...)

to the point I feel like these people (if they exists) will not complain that now they are the one that need to manually add the public schema in the list of returned schema by dbal (or if in the case of indirect use through doctrine migration the fact to have to manually add it in their very first migration , while right now people need to remove it in every single migration)

[0] I know the documentation says " * Returns a list of the names of all schemata in the current database." but it's already wrong as it filter some schema ...

[1] on AWS , Azure , GCP , on default ubuntu, debian , fedora , windows , Mac installer , the postgres you got has a public schema, so you really must have a DBA that trick the install to remove it before handling it ot , what if we make a poll at the next php conference ?

@greg0ire
Copy link
Member

greg0ire commented Sep 24, 2022

it filter some schema

I think here you're referring to only one schema called information_schema. According to the official docs, it:

should be a minority

It seems like you are saying that the only thing we could break is doctrine/orm and doctrine/migrations. There is a lot more code that could break if we changed that method.

it feel like pointless to argue

We can stop the discussion if you want, that's not an issue to me.

Anyway, as you can see, I have a proposal, you have a proposal, and there might be more solutions, but I might be wrong about my solution, because I don't know schema comparison that well. For instance, it's unclear where listSchemaNames kicks in.

Likewise, you might be wrong.

Also, the problem you're willing to solve is not worded clearly, I believe this is why this issue was closed. Shouldn't there be somewhere a sentence like: when I run a comparison between this schema and that schema, the diff should be empty, but isn't?

I think instead of jumping to solutions, you should do what @morozov asked and file a new issue with a clearly stated problem, expressed with the DBAL APIs (meaning high-level APIs). A test case would be best.

BTW, I'm talking about schema comparison, but it's unclear if the issue you are trying to fix is with schema comparison, with Schema->toSql() (used in SchemaTool::getCreateSchemaSql(), with Schema->toDropSql(), or a combination of those.

@allan-simon
Copy link
Author

We can stop the discussion if you want, that's not an issue to me.

I'm arguying with @morozov handling of it , not you, we can continue to discuss, you're not ignoring my arguments or only handpicking the one that fit you.

@allan-simon
Copy link
Author

Also, the problem you're willing to solve is not worded clearly,

it is, it's the summary of dozen of specific issues , which all boils down to the same underlying issue of public being returned , so I was thinking that creating a summary issue, stating all the angle of the problems would be enough (it's not like i throwed a title-only issue without takign the time to git bissect accross 10 years of commit )

you should do what @morozov asked and file a new issue with a clearly stated problem

I'm sorry but #1110 was worded clearly and very specific but it was closed without any argument

so hence my frustration, I have a guy in front of me that contradicts himself:

  • ask for a clear problem BUT close issue where the issues is stated clearly ( so what do i do, i reopen the same one, take hours of my life, for something that will get the same fate ? )
  • ask for a PR , BUT when provided (Filter 'public' schema from listSchemaNames #5600) is ignoring it (the PR itself re-explain everything and give the clear exemples he is asking for
  • ask for the root cause to be fixed Filter 'public' schema from listSchemaNames #5600 (comment) BUT when I create this issue and the PR and, he closes it ask for specific symptoms ...

@allan-simon
Copy link
Author

I think here you're referring to only one schema called information_schema. According to the official docs, it:

no pg_ are also filtered

It seems like you are saying that the only thing we could break is doctrine/orm and doctrine/migrations. There is a lot more code that could break if we changed that method.

I'm not saying that , I'm saying the minority are people that have both a DBA who deleted the public schema but at the same time would allow people to recreate it

@greg0ire
Copy link
Member

which all boils down to the same underlying issue of public being returned

Again, let's not jump to solutions, we don't know that it is the correct one. Not returning public might fix your issues, it might also result in other issues.

I'm sorry but #1110 was worded clearly and very specific

It's quite clear, yet it's not easy to reproduce with just the DBAL APIs. I would expect something like

$schema1 = …
$schema2->getMigrateFromSql($schema1)

From reading the issue, you can't tell if there is a bug in the DBAL, or if doctrine/migrations is misusing it.

no pg_ are also filtered

Not sure how I missed that, that's indeed the case. The docs say this:

Schema names beginning with pg_ are reserved for system purposes and cannot be created by users.

For public, you'll have to admit that it's different: it's kind of common for people to create table and indexes inside public, while they would never do that inside information_schema or schemas starting with pg_, right?

I'm saying the minority are people that have both a DBA who deleted the public schema but at the same time would allow people to recreate it

There could also be other packages (different from ORM or migrations) relying on that method to display schema names, for purposes we don't know. The DBAL has 4k dependends: https://packagist.org/packages/doctrine/dbal/dependents?order_by=downloads

Hard to tell who is using this API for what. Hence my proposal of having this filtering be opt-in, and affect a new API instead of an existing one.

@doctrine doctrine locked as too heated and limited conversation to collaborators Sep 24, 2022
@greg0ire
Copy link
Member

greg0ire commented Sep 24, 2022

I'm locking this because it's starting to consume my energy as well, but anyone feel free to open another issue about this. If you do, make sure that you state a clear problem , preferably reproducible with the DBAL APIs that are used directly by doctrine/orm (specifically by SchemaTool) and by doctrine/migrations. And if you do, please leave your emotions at the door, or remarks about the issue being 10 years old. Avoid mentioning time of your life you lost with this when talking to maintainers that spend a lot more time working on Doctrine.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants