Cannot start using "start --optimized" after upgrading to a newer version #29106

Closed
madmurl0c opened this issue Apr 26, 2024 · 9 comments

madmurl0c commented Apr 26, 2024

Before reporting an issue

  • I have read and understood the above terms for submitting issues, and I understand that my issue may be closed without action if I do not follow them.

Area

infinispan

Describe the bug

Hey everyone, I'm experiencing an issue while trying to upgrade Keycloak.

I've been using version 21.1.0 for quite a while, but I cannot upgrade to any newer version.

I narrowed it down and noticed that the server does start when I use "start-dev" instead of "start --optimized".

I build my optimized container like this:

FROM quay.io/keycloak/keycloak:latest AS keycloak

FROM keycloak as kc-builder

# Enable health and metrics support
ENV KC_HEALTH_ENABLED=true
ENV KC_METRICS_ENABLED=true

# Configure a database vendor
ENV KC_DB=mariadb

WORKDIR /opt/keycloak
# for demonstration purposes only, please make sure to use proper certificates in production instead
RUN keytool -genkeypair -storepass password -storetype PKCS12 -keyalg RSA -keysize 2048 -dname "CN=server" -alias server -ext "SAN:c=DNS:localhost,IP:127.0.0.1" -keystore conf/server.keystore
RUN /opt/keycloak/bin/kc.sh build

FROM keycloak
COPY --from=kc-builder /opt/keycloak/ /opt/keycloak/
ENTRYPOINT ["/opt/keycloak/bin/kc.sh"]
CMD ["start", "--optimized"]

Version

21.1.0, 24.0.3

Regression

  • The issue is a regression

Expected behavior

Keycloak should start up

Actual behavior

The following error message appears:

2024-04-26 09:18:52,332 WARN  [org.keycloak.quarkus.runtime.cli.Picocli] (main) The following build time non-cli options were found, but will be ignored during run time: kc.db
2024-04-26 09:18:52,333 WARN  [org.keycloak.quarkus.runtime.cli.Picocli] (main) The following used options or option values are DEPRECATED and will be removed in a future release:
	- proxy: Use proxy-headers.
Consult the Release Notes for details.
2024-04-26 09:18:53,537 INFO  [org.infinispan.CONTAINER] (keycloak-cache-init) ISPN000556: Starting user marshaller 'org.infinispan.jboss.marshalling.core.JBossUserMarshaller'
2024-04-26 09:18:53,701 INFO  [org.infinispan.CLUSTER] (keycloak-cache-init) ISPN000088: Unable to use any JGroups configuration mechanisms provided in properties {}. Using default JGroups configuration!
2024-04-26 09:18:53,871 INFO  [org.keycloak.quarkus.runtime.hostname.DefaultHostnameProvider] (main) Hostname settings: Base URL: <unset>, Hostname: keycloak-test.my.lan, Strict HTTPS: true, Path: <request>, Strict BackChannel: false, Admin URL: <unset>, Admin: <request>, Port: -1, Proxied: true
2024-04-26 09:18:53,877 INFO  [org.infinispan.CLUSTER] (keycloak-cache-init) ISPN000078: Starting JGroups channel `ISPN`
2024-04-26 09:18:53,883 INFO  [org.jgroups.JChannel] (keycloak-cache-init) local_addr: 84a417d6-caaa-4f17-a3e4-da2004cf2f86, name: a0c08860e31b-49509
2024-04-26 09:18:53,892 WARN  [org.jgroups.protocols.UDP] (keycloak-cache-init) JGRP000015: the send buffer of socket MulticastSocket was set to 1MB, but the OS only allocated 212.99KB
2024-04-26 09:18:53,892 WARN  [org.jgroups.protocols.UDP] (keycloak-cache-init) JGRP000015: the receive buffer of socket MulticastSocket was set to 20MB, but the OS only allocated 212.99KB
2024-04-26 09:18:53,892 WARN  [org.jgroups.protocols.UDP] (keycloak-cache-init) JGRP000015: the send buffer of socket MulticastSocket was set to 1MB, but the OS only allocated 212.99KB
2024-04-26 09:18:53,892 WARN  [org.jgroups.protocols.UDP] (keycloak-cache-init) JGRP000015: the receive buffer of socket MulticastSocket was set to 25MB, but the OS only allocated 212.99KB
2024-04-26 09:18:53,901 INFO  [org.jgroups.protocols.FD_SOCK2] (keycloak-cache-init) server listening on *.40533
2024-04-26 09:18:53,967 INFO  [org.infinispan.CLUSTER] (keycloak-cache-init) ISPN000094: Received new cluster view for channel ISPN: [1cbd0635f413-37696|51] (2) [1cbd0635f413-37696, a0c08860e31b-49509]
2024-04-26 09:18:54,000 INFO  [org.infinispan.CLUSTER] (keycloak-cache-init) ISPN000079: Channel `ISPN` local address is `a0c08860e31b-49509`, physical addresses are `[172.27.1.11:56069]`
2024-04-26 09:18:54,034 WARN  [org.infinispan.CONFIG] (keycloak-cache-init) ISPN000569: Unable to persist Infinispan internal caches as no global state enabled
2024-04-26 09:18:54,414 ERROR [org.infinispan.CLUSTER] (jgroups-7,a0c08860e31b-49509) ISPN000475: Error processing response for request 2 from 1cbd0635f413-37696: java.io.IOException: Unknown type: 28
	at org.infinispan.marshall.core.GlobalMarshaller.readNonNullableObject(GlobalMarshaller.java:720)
	at org.infinispan.marshall.core.GlobalMarshaller.readNullableObject(GlobalMarshaller.java:357)
	at org.infinispan.marshall.core.BytesObjectInput.readObject(BytesObjectInput.java:32)
	at org.infinispan.topology.CacheTopology$Externalizer.doReadObject(CacheTopology.java:269)
	at org.infinispan.topology.CacheTopology$Externalizer.doReadObject(CacheTopology.java:250)
	at org.infinispan.commons.marshall.InstanceReusingAdvancedExternalizer.readObject(InstanceReusingAdvancedExternalizer.java:102)
	at org.infinispan.marshall.core.GlobalMarshaller.readWithExternalizer(GlobalMarshaller.java:727)
	at org.infinispan.marshall.core.GlobalMarshaller.readNonNullableObject(GlobalMarshaller.java:708)
	at org.infinispan.marshall.core.GlobalMarshaller.readNullableObject(GlobalMarshaller.java:357)
	at org.infinispan.marshall.core.BytesObjectInput.readObject(BytesObjectInput.java:32)
	at org.infinispan.topology.CacheStatusResponse$Externalizer.readObject(CacheStatusResponse.java:98)
	at org.infinispan.topology.CacheStatusResponse$Externalizer.readObject(CacheStatusResponse.java:85)
	at org.infinispan.marshall.core.GlobalMarshaller.readWithExternalizer(GlobalMarshaller.java:727)
	at org.infinispan.marshall.core.GlobalMarshaller.readNonNullableObject(GlobalMarshaller.java:708)
	at org.infinispan.marshall.core.GlobalMarshaller.readNullableObject(GlobalMarshaller.java:357)
	at org.infinispan.marshall.core.BytesObjectInput.readObject(BytesObjectInput.java:32)
	at org.infinispan.remoting.responses.SuccessfulResponse$Externalizer.readObject(SuccessfulResponse.java:71)
	at org.infinispan.remoting.responses.SuccessfulResponse$Externalizer.readObject(SuccessfulResponse.java:63)
	at org.infinispan.marshall.core.GlobalMarshaller.readWithExternalizer(GlobalMarshaller.java:727)
	at org.infinispan.marshall.core.GlobalMarshaller.readNonNullableObject(GlobalMarshaller.java:708)
	at org.infinispan.marshall.core.GlobalMarshaller.readNullableObject(GlobalMarshaller.java:357)
	at org.infinispan.marshall.core.GlobalMarshaller.objectFromObjectInput(GlobalMarshaller.java:191)
	at org.infinispan.marshall.core.GlobalMarshaller.objectFromByteBuffer(GlobalMarshaller.java:220)
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processResponse(JGroupsTransport.java:1562)
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processMessage(JGroupsTransport.java:1471)
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport$ChannelCallbacks.up(JGroupsTransport.java:1663)
	at org.jgroups.JChannel.up(JChannel.java:734)
	at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:936)
	at org.jgroups.protocols.FRAG2.up(FRAG2.java:139)
	at org.jgroups.protocols.FlowControl.up(FlowControl.java:245)
	at org.jgroups.protocols.FlowControl.up(FlowControl.java:245)
	at org.jgroups.protocols.pbcast.GMS.up(GMS.java:846)
	at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:226)
	at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1086)
	at org.jgroups.protocols.UNICAST3.addMessage(UNICAST3.java:825)
	at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:807)
	at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:456)
	at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:680)
	at org.jgroups.protocols.VERIFY_SUSPECT2.up(VERIFY_SUSPECT2.java:105)
	at org.jgroups.protocols.FailureDetection.up(FailureDetection.java:180)
	at org.jgroups.protocols.FD_SOCK2.up(FD_SOCK2.java:189)
	at org.jgroups.protocols.MERGE3.up(MERGE3.java:274)
	at org.jgroups.protocols.Discovery.up(Discovery.java:294)
	at org.jgroups.stack.Protocol.up(Protocol.java:340)
	at org.jgroups.protocols.TP.passMessageUp(TP.java:1184)
	at org.jgroups.util.SubmitToThreadPool$SingleMessageHandler.run(SubmitToThreadPool.java:107)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:840)
2024-04-26 09:18:54,415 ERROR [org.infinispan.CLUSTER] (jgroups-8,a0c08860e31b-49509) ISPN000474: Error processing request 0@1cbd0635f413-37696: java.io.EOFException
	at org.infinispan.marshall.core.AbstractBytesObjectInput.checkPosLength(AbstractBytesObjectInput.java:261)
	at org.infinispan.marshall.core.AbstractBytesObjectInput.readByte(AbstractBytesObjectInput.java:107)
	at org.infinispan.marshall.core.AbstractBytesObjectInput.readBoolean(AbstractBytesObjectInput.java:102)
	at org.infinispan.commands.topology.TopologyUpdateStableCommand.readFrom(TopologyUpdateStableCommand.java:98)
	at org.infinispan.marshall.exts.ReplicableCommandExternalizer.readCommandParameters(ReplicableCommandExternalizer.java:116)
	at org.infinispan.marshall.exts.ReplicableCommandExternalizer.readObject(ReplicableCommandExternalizer.java:105)
	at org.infinispan.marshall.exts.ReplicableCommandExternalizer.readObject(ReplicableCommandExternalizer.java:67)
	at org.infinispan.marshall.core.GlobalMarshaller.readWithExternalizer(GlobalMarshaller.java:727)
	at org.infinispan.marshall.core.GlobalMarshaller.readNonNullableObject(GlobalMarshaller.java:708)
	at org.infinispan.marshall.core.GlobalMarshaller.readNullableObject(GlobalMarshaller.java:357)
	at org.infinispan.marshall.core.GlobalMarshaller.objectFromObjectInput(GlobalMarshaller.java:191)
	at org.infinispan.marshall.core.GlobalMarshaller.objectFromByteBuffer(GlobalMarshaller.java:220)
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processRequest(JGroupsTransport.java:1528)
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processMessage(JGroupsTransport.java:1468)
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport$ChannelCallbacks.up(JGroupsTransport.java:1663)
	at org.jgroups.JChannel.up(JChannel.java:734)
	at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:936)
	at org.jgroups.protocols.FRAG2.up(FRAG2.java:139)
	at org.jgroups.protocols.FlowControl.up(FlowControl.java:261)
	at org.jgroups.protocols.FlowControl.up(FlowControl.java:253)
	at org.jgroups.protocols.pbcast.GMS.up(GMS.java:846)
	at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:226)
	at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:443)
	at org.jgroups.protocols.pbcast.NAKACK2.deliver(NAKACK2.java:1011)
	at org.jgroups.protocols.pbcast.NAKACK2.handleMessage(NAKACK2.java:905)
	at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:690)
	at org.jgroups.protocols.VERIFY_SUSPECT2.up(VERIFY_SUSPECT2.java:105)
	at org.jgroups.protocols.FailureDetection.up(FailureDetection.java:180)
	at org.jgroups.protocols.FD_SOCK2.up(FD_SOCK2.java:189)
	at org.jgroups.protocols.MERGE3.up(MERGE3.java:274)
	at org.jgroups.protocols.Discovery.up(Discovery.java:294)
	at org.jgroups.stack.Protocol.up(Protocol.java:340)
	at org.jgroups.protocols.TP.passMessageUp(TP.java:1184)
	at org.jgroups.util.SubmitToThreadPool$SingleMessageHandler.run(SubmitToThreadPool.java:107)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:840)
2024-04-26 09:18:56,202 WARN  [io.quarkus.agroal.runtime.DataSources] (JPA Startup Thread) Datasource <default> enables XA but transaction recovery is not enabled. Please enable transaction recovery by setting quarkus.transaction-manager.enable-recovery=true, otherwise data may be lost if the application is terminated abruptly
2024-04-26 09:18:57,150 WARN  [io.quarkus.vertx.http.runtime.VertxHttpRecorder] (main) The X-Forwarded-* and Forwarded headers will be considered when determining the proxy address. This configuration can cause a security issue as clients can forge requests and send a forwarded header that is not overwritten by the proxy. Please consider use one of these headers just to forward the proxy address in requests.

How to Reproduce?

I created a new docker-compose stack like this:

version: '3.8'

services:
  mariadb:
    image: mariadb:10.11
    environment:
      MARIADB_DATABASE: keycloak
      MARIADB_USER: keycloak
      MARIADB_PASSWORD: keycloak
      MARIADB_ROOT_PASSWORD: admin
    volumes:
      - mariadb_data:/var/lib/mysql
    restart: unless-stopped

  keycloak:
    image: gitlab.my.lan/madmurl0c/keycloak:21.1.0
    command: start-dev
    ports:
      - 8080:8080
      - 8443:8443
    depends_on:
      mariadb:
        condition: service_healthy
    environment:
      KC_DB: mariadb
      KC_DB_URL: jdbc:mariadb://mariadb:3306/keycloak
      KC_DB_USERNAME: keycloak
      KC_DB_PASSWORD: keycloak
      KC_HOSTNAME: keycloak-test.my.lan
#      KC_HOSTNAME_STRICT: "false"
      KC_PROXY_ADDRESS_FORWARDING: "true"
      KC_PROXY: edge
      KEYCLOAK_ADMIN: admin
      KEYCLOAK_ADMIN_PASSWORD: admin
    restart: unless-stopped


volumes:
  mariadb_data:

Then try to upgrade like this:

  keycloak:
#    image: gitlab.my.lan/madmurl0c/keycloak:21.1.0
    image:  gitlab.my.lan/madmurl0c/keycloak:24.0.3

If I switch back to the older version, it starts up just fine.

Anything else?

No response

If you need any more info just let me know. Thanks for your help :)

sschu (Contributor) commented Apr 26, 2024

Your trace looks as if there is a second Keycloak container it is trying to form a cluster with, but this does not work. Could it be that the upgrade puts a new Keycloak container next to the existing one to do the update? This is generally not supported with different Keycloak versions. To update, you first have to stop the old container and then start the new one.
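
As a concrete sketch of that procedure with Docker Compose (assuming the service is named "keycloak" as in the compose file above):

docker compose stop keycloak
# change the image tag to the new version, then
docker compose up -d keycloak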

madmurl0c (Author)

@sschu I noticed that too, but I don't know where the second entry is coming from. I stop the stack, update the Keycloak container, and start it back up. There aren't two servers online at once. It always shows two, never three or more, no matter how often I restart or change versions.

The weird thing is that this error doesn't appear when using start-dev instead of start.

shawkins (Contributor)

The weird thing is that this error doesn't appear when using start-dev instead of start.

start-dev creates only a local cache.
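
A quick way to confirm which cache an optimized image was actually built with is kc.sh show-config, which should list the persisted build options such as kc.cache (the image tag below is just the one from the compose file, used for illustration):

docker run --rm --entrypoint /opt/keycloak/bin/kc.sh gitlab.my.lan/madmurl0c/keycloak:24.0.3 show-config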

madmurl0c (Author)

Ah okay, that explains it. Is there a way to clear the cache? Is that even a good idea? I still don't understand why it only happens when I change the version. Even if I completely kill and delete the container (retaining the database), as long as I start it up with the same version there's no problem. It probably doesn't like migrating from an older version if there are unreachable hosts configured in the cache.

shawkins (Contributor)

@madmurl0c the exception you are showing should only happen if there is another member of the cluster running with a different version.

a0c08860e31b-49509 is the local address, and in the cluster view you can see another one:

2024-04-26 09:18:53,967 INFO  [org.infinispan.CLUSTER] (keycloak-cache-init) ISPN000094: Received new cluster view for channel ISPN: [1cbd0635f413-37696|51] (2) [1cbd0635f413-37696, a0c08860e31b-49509]

madmurl0c (Author)

I feel kind of stupid now for not thinking of this earlier, but I think I know where the second instance comes from. I use Traefik as a reverse proxy and have the keycloak-test and keycloak stacks wired up to a shared proxy Docker network so that Traefik is able to route to them. I think my keycloak-test instance discovers my production system since they are able to see each other.

I know that I should probably have the test system running on different hardware, but is there a way to separate the caches? Give them a name or prefix maybe? Something like KC_CACHE_NAME=production and KC_CACHE_NAME=test, for example.

Thanks for pointing me in the right direction @shawkins

shawkins (Contributor)

I know that I should probably have the test system running on different hardware, but is there a way to separate the caches? Give them a name or prefix maybe? Something like KC_CACHE_NAME=production and KC_CACHE_NAME=test, for example.

The operator uses -Djgroups.dns.query and a headless discovery service to control this, as pointed out in #21067 (comment): you can create separate services per version to segregate the members. I'm not sure what the simplest way to do this is otherwise - cc @mhajas
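
Outside Kubernetes, one possible way to keep the two stacks from discovering each other over the default UDP stack (which the JGroups UDP warnings in the log suggest is in use) is to give each environment its own multicast address/port via JAVA_OPTS_APPEND. This is only a sketch: the property names assume Infinispan's default JGroups UDP configuration, and the values are arbitrary examples.

  keycloak:
    environment:
      # test stack only; the production stack keeps the defaults or uses different values
      JAVA_OPTS_APPEND: "-Djgroups.udp.mcast_addr=230.0.0.99 -Djgroups.udp.mcast_port=46699"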

pruivo (Contributor) commented Apr 30, 2024

@madmurl0c you're mixing different Infinispan versions in the same cluster, and that is not supported by Infinispan. The wire format of one of the messages was changed to fix an issue, and the old version is unable to parse it. That's why you get java.io.IOException: Unknown type: 28.

madmurl0c (Author)

Thanks everyone for your input. I worked around this issue by adding --cache=local in my build process. It should be fine since I don't use a cluster but a single Docker container anyway.
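
For reference, a sketch of what that workaround might look like in the Dockerfile from the report (cache is a build-time option, so it belongs on the kc.sh build line):

RUN /opt/keycloak/bin/kc.sh build --cache=local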

shawkins reopened this May 2, 2024
shawkins closed this as not planned May 2, 2024