Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consistency issues with different connections (eg read/write replicas) #289

Open
Tristan971 opened this issue Mar 23, 2024 · 8 comments
Open

Comments

@Tristan971
Copy link

The issue is similar to #284, but easier to catch in a more common setup, like so (Symfony 6.4):

doctrine:
  dbal:
    default_connection: default
    connections:
      default:
        url: '%env(resolve:DATABASE_URL)%'
        ...
        replicas:
          ro:
            url: '%env(resolve:DATABASE_RO_URL)%'

In this case, you might end up with invisible writes if they're stuck within a transaction on the default connection, but you're doing a read with the readonly version (default.ro is its name with this config)

Indeed, this yields 2 different static connections, since the hash in

will have different values for each.

eg:

image

image

This isn't surprising, to be fair, and I don't think the project can do much for it, but hopefully this saves someone some time.

The solution being to make sure your tests use a single connection per database (at least within contexts where you expect to see writes done by tests). Unfortunately Symfony doesn't make this easy.

In principle, you could add config/test/doctrine.yaml with:

doctrine:
  dbal:
    connections:
      default:
        use_savepoints: true
        replicas: ~

But in practice, during configuration parsing, Symfony maps ~ to [] and then performs an array merge instead of replacement, and thus your default values with a replica remain active...

So you pretty much have to remove replicas from your default doctrine.yaml and configure it explicitly for all your environments that don't rely on this bundle.

I'm not entirely sure how this can be worked around, to be honest.

About the only way to work around it that I can see is to have the StaticDriver somehow force Doctrine to return a connection to the primary specifically, maybe checking for when $connection is instanceof \Doctrine\DBAL\Connections\PrimaryReadReplicaConnection and calling ensureConnectedToPrimary.

@dmaicher
Copy link
Owner

dmaicher commented Apr 1, 2024

Indeed this seems tricky to handle for this bundle 😕

The StaticDriver is operating on a lower level and does not know anything about PrimaryReadReplicaConnection.

Possibly one could decorate Doctrine\Bundle\DoctrineBundle\ConnectionFactory::connect and call ensureConnectedToPrimary 🤔

Or we find a way of ensuring the same connection hash is used for primary and the read replica inside StaticDriver.

For now I would also suggest to configure a connection without replicas for the test environment then.

@dmaicher
Copy link
Owner

@Tristan971 @prohalexey could you maybe test https://github.com/dmaicher/doctrine-test-bundle/compare/connection_key?expand=1 ?

Its some quick proof of concept to allow forcing usage of a specific key for keeping the connection in the static array.

Not sure yet how to expose that (via config maybe) but for this PoC I quickly enabled defining parameters for that.

See https://github.com/dmaicher/doctrine-test-bundle/compare/connection_key?expand=1#diff-2582698d058918d461f16ca8976229516f88622d0e77f46db9e8141394d1f176R29-R31

This also allows to define/overwrite the key that is used per replica.

So for testing this PoC with a config like this

doctrine:
  dbal:
    default_connection: default
    connections:
      default:
        url: '%env(resolve:DATABASE_URL)%'
        ...
        replicas:
          ro:
            url: '%env(resolve:DATABASE_RO_URL)%'

you would need to define Symfony parameters to force using the same key for both

parameters:
    dama.connection_key.default: some_custom_key
    dama.connection_key.default.ro: some_custom_key

WDYT? Would this solve your problem?

@prohalexey
Copy link

prohalexey commented May 1, 2024

@dmaicher I think yes, if i could set the same keys for all connections.

BTW json_encode does not garauntee the order of elements in the array, so even if connections have the same connection params, they may produce different sha1 hashes(due to sorting). By default, you have to ksort connection params before json_encode.

@dmaicher
Copy link
Owner

dmaicher commented May 1, 2024

@prohalexey

BTW json_encode does not garauntee the order of elements in the array,

Do you have a source for that? I tested it and at least on PHP 8.3 this always generates exactly the same json.

@prohalexey
Copy link

prohalexey commented May 5, 2024

Do you have a source for that?

The first part is https://www.rfc-editor.org/rfc/rfc7159.html#section-1

An object is an unordered collection of zero or more name/value
pairs, where a name is a string and a value is a string, number,
boolean, null, object, or array.

The second part is that keys in php array could be in the different order, but with the same values(values loaded from configuration), so it will produce the different string.

@dmaicher
Copy link
Owner

dmaicher commented May 5, 2024

The first part is https://www.rfc-editor.org/rfc/rfc7159.html#section-1

Ok but that is completely unrelated to the php implementation of json_encode. I'm pretty certain it will deterministically produce the same output string for the same input array.

The second part is that keys in php array could be in the different order, but with the same values(values loaded from configuration), so it will produce the different string.

How should this happen though? Also certain here that the order of keys will not change really. Its build deterministically from the Bundle configuration.

Anyway this is off-topic. I don't see an issue with the implementation. If you encounter any problems with changing hashes then please open a new issue.

@dmaicher
Copy link
Owner

dmaicher commented May 7, 2024

@Tristan971 @prohalexey do you think it would make sense to always (by default) use the same underlying driver connection for the primary and the read replicas? I can't really think of a scenario where you would need different connections and this would result in having those read inconsistencies due to non-committed transactions... 🤔

@Tristan971
Copy link
Author

Tristan971 commented May 7, 2024

Well, there's a reasonable argument that if someone is testing that something uses the primary/secondary specifically, then this side-effect/"bug" was actually a feature.

But in my opinion that's not very likely, so as long as it's documented in release notes then someone writing tests that are this specific will figure it out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants