Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CI] Failures in various sql tests due to ZoneRulesException: Unknown time-zone ID: Asia/Qostanay #72963

Closed
astefan opened this issue May 12, 2021 · 23 comments
Assignees
Labels
:Analytics/SQL SQL querying Team:QL (Deprecated) Meta label for query languages team >test-failure Triaged test failures from CI

Comments

@astefan
Copy link
Contributor

astefan commented May 12, 2021

Build scan: multiple. One of them https://gradle-enterprise.elastic.co/s/b5joeiwm33tdm

Repro line:

./gradlew ':x-pack:plugin:sql:qa:single-node:integTestRunner' \
  -Dtests.seed=28FE821C9735C914 \
  -Dtests.class=org.elasticsearch.xpack.sql.qa.single_node.JdbcResultSetIT \
  -Dtests.method="testNoInfiniteRecursiveGetObjectCalls" \
  -Dtests.security.manager=true \
  -Dtests.locale=ja-JP-u-ca-japanese-x-lvariant-JP \
  -Dtests.timezone=Africa/Porto-Novo \
  -Dcompiler.java=11 \
  -Druntime.java=8

Reproduces locally?: no

Applicable branches: 6.8

Failure history: started failing on May 11th.

Failure excerpt:

00:26:33 ERROR   0.12s | JdbcResultSetIT.testGettingInvalidLong <<< FAILURES!
00:26:33    > Throwable #1: java.sql.SQLException: Server sent bad type [x_content_parse_exception]. Original type was [[1:57] [sql/query] failed to parse field [time_zone]]. [org.elasticsearch.common.xcontent.XContentParseException: [1:57] [sql/query] failed to parse field [time_zone]
00:26:33    > 	at org.elasticsearch.common.xcontent.ObjectParser.parseValue(ObjectParser.java:316)
00:26:33    > 	at org.elasticsearch.common.xcontent.ObjectParser.parseSub(ObjectParser.java:362)
00:26:33    > 	at org.elasticsearch.common.xcontent.ObjectParser.parse(ObjectParser.java:168)
00:26:33    > 	at org.elasticsearch.common.xcontent.ObjectParser.apply(ObjectParser.java:182)
00:26:33    > 	at org.elasticsearch.xpack.sql.action.SqlQueryRequest.fromXContent(SqlQueryRequest.java:135)
00:26:33    > 	at org.elasticsearch.xpack.sql.plugin.RestSqlQueryAction.prepareRequest(RestSqlQueryAction.java:46)
00:26:33    > 	at org.elasticsearch.rest.BaseRestHandler.handleRequest(BaseRestHandler.java:91)
00:26:33    > 	at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:240)
00:26:33    > 	at org.elasticsearch.rest.RestController.tryAllHandlers(RestController.java:336)
00:26:33    > 	at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:174)
00:26:33    > 	at org.elasticsearch.http.netty4.Netty4HttpServerTransport.dispatchRequest(Netty4HttpServerTransport.java:551)
00:26:33    > 	at org.elasticsearch.http.netty4.Netty4HttpRequestHandler.channelRead0(Netty4HttpRequestHandler.java:137)
00:26:33    > 	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
00:26:33    > 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
00:26:33    > 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
00:26:33    > 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
00:26:33    > 	at org.elasticsearch.http.netty4.pipelining.HttpPipeliningHandler.channelRead(HttpPipeliningHandler.java:68)
00:26:33    > 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
00:26:33    > 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
00:26:33    > 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
00:26:33    > 	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
00:26:33    > 	at io.netty.handler.codec.MessageToMessageCodec.channelRead(MessageToMessageCodec.java:111)
00:26:33    > 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
00:26:33    > 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
00:26:33    > 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
00:26:33    > 	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
00:26:33    > 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
00:26:33    > 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
00:26:33    > 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
00:26:33    > 	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
00:26:33    > 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
00:26:33    > 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
00:26:33    > 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
00:26:33    > 	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323)
00:26:33    > 	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:297)
00:26:33    > 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
00:26:33    > 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
00:26:33    > 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
00:26:33    > 	at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
00:26:33    > 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
00:26:33    > 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
00:26:33    > 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
00:26:33    > 	at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
00:26:33    > 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
00:26:33    > 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
00:26:33    > 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
00:26:33    > 	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
00:26:33    > 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
00:26:33    > 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
00:26:33    > 	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
00:26:33    > 	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
00:26:33    > 	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:656)
00:26:33    > 	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:556)
00:26:33    > 	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:510)
00:26:33    > 	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470)
00:26:33    > 	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909)
00:26:33    > 	at java.base/java.lang.Thread.run(Thread.java:834)
00:26:33    > Caused by: java.time.zone.ZoneRulesException: Unknown time-zone ID: Asia/Qostanay
00:26:33    > 	at java.base/java.time.zone.ZoneRulesProvider.getProvider(ZoneRulesProvider.java:279)
00:26:33    > 	at java.base/java.time.zone.ZoneRulesProvider.getRules(ZoneRulesProvider.java:234)
00:26:33    > 	at java.base/java.time.ZoneRegion.ofId(ZoneRegion.java:120)
00:26:33    > 	at java.base/java.time.ZoneId.of(ZoneId.java:408)
00:26:33    > 	at java.base/java.time.ZoneId.of(ZoneId.java:356)
00:26:33    > 	at org.elasticsearch.xpack.sql.action.AbstractSqlQueryRequest.lambda$objectParser$3(AbstractSqlQueryRequest.java:79)
00:26:33    > 	at org.elasticsearch.common.xcontent.ObjectParser.lambda$declareField$1(ObjectParser.java:213)
00:26:33    > 	at org.elasticsearch.common.xcontent.ObjectParser.parseValue(ObjectParser.java:314)
00:26:33    > 	... 56 more
@astefan astefan added >test-failure Triaged test failures from CI :Analytics/SQL SQL querying labels May 12, 2021
@astefan astefan self-assigned this May 12, 2021
@elasticmachine elasticmachine added the Team:QL (Deprecated) Meta label for query languages team label May 12, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-ql (Team:QL)

@astefan
Copy link
Contributor Author

astefan commented May 12, 2021

Managed to reproduce this one. It's all about the java version.

Compiler JDK Version  : 11 (18.9 11.0.2 [OpenJDK 64-Bit Server VM 11.0.2+7])
23:48:35   Compiler java.home    : /var/lib/jenkins/.java/openjdk-11.0.2-linux
23:48:35   Runtime JDK Version   : 1.8 (Oracle Corporation 1.8.0_281 [Java HotSpot(TM) 64-Bit Server VM 25.281-b09])
23:48:35   Runtime java.home     : /var/lib/jenkins/.java/java8
23:48:35   Gradle JDK Version    : 11 (18.9 11.0.2 [OpenJDK 64-Bit Server VM 11.0.2+7])
23:48:35   Gradle java.home      : /var/lib/jenkins/.java/openjdk-11.0.2-linux

According to #72490 the Joda version has been updated two days ago to 2.10.10 and the Java 11 version used in CI doesn't seem to be the latest one. With 11.0.11 the issue doesn't reproduce for me.
Looking at the tzdata updates over time https://www.oracle.com/java/technologies/tzdata-versions.html:

tzdata2018h
2018/12/23
...
New zone Asia/Qostanay because Qostanay, Kazakhstan didn't move.
....

If I'm not mistaken this update ^ went into 11.0.4 which is consistent with the behavior I see in this test: 11.0.2 complains that Asia/Qostanay is not a recognized timezone while 11.0.11 doesn't.
@mark-vieira can you, please, double check the Java 11 CI version we use for tests?

@astefan
Copy link
Contributor Author

astefan commented May 12, 2021

Relates to #72963.

@astefan
Copy link
Contributor Author

astefan commented May 12, 2021

Also, the local test that fails on my machine show these logs:

[2021-05-12T12:19:48,114][INFO ][o.e.e.NodeEnvironment    ] [node-0] using [1] data paths, mounts [[Work (D:)]], net usable_space [373.8gb], net total_space [553.8gb], types [NTFS]
[2021-05-12T12:19:48,116][INFO ][o.e.e.NodeEnvironment    ] [node-0] heap size [494.9mb], compressed ordinary object pointers [true]
[2021-05-12T12:19:48,118][INFO ][o.e.n.Node               ] [node-0] node name [node-0], node ID [l6xZV3O8QguSV_6LGuzU7Q]
[2021-05-12T12:19:48,118][INFO ][o.e.n.Node               ] [node-0] version[6.8.16-SNAPSHOT], pid[12516], build[default/zip/103f38c/2021-05-12T08:30:43.562157300Z], OS[Windows 10/10.0/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/11.0.2/11.0.2+9]
[2021-05-12T12:19:48,118][INFO ][o.e.n.Node               ] [node-0] JVM arguments [-Xms1g, -Xmx1g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -Des.networkaddress.cache.ttl=60, -Des.networkaddress.cache.negative.ttl=10, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=

The jvm being used is 11.0.2. The runtime jvm to be used in this test according to the reproduceable command -Druntime.java=8.

@mark-vieira
Copy link
Contributor

Looks like we indeed were not correctly using RUNTIME_JAVA_HOME for external test clusters in the 6.8 branch. That has probably been the case for a long time 🤦

I've opened #73003 to fix that, but I'm not sure it'll necessarily fix the errors. Athough it at least with make the behavior consistent.

@mark-vieira
Copy link
Contributor

Not surprisingly my PR picked up a different error. Since we've effectively not been testing against runtime javas properly for who knows how long, my fear is merging the "fix" will break countless other things. That said, if we do have incompatibilities we should probably find out.

https://gradle-enterprise.elastic.co/s/kjpid2amlnu5m/console-log?task=:x-pack:plugin:sql:qa:single-node:integTestRunner

@astefan
Copy link
Contributor Author

astefan commented May 13, 2021

@mark-vieira I've looked at that failure you mentioned. Its root cause is not ES SQL or even Elasticsearch specific imo, but JVM specific. The query that is failing is SELECT TAN(RADIANS(66)). Which is basically Math.tan(Math.toRadians(66)). I tested locally a simple java program running the previous Math formula. This is what I got:

java 1.8.0_291 java 13.0.2
2.246036773904215 2.2460367739042164

Which is surprising given that this has never been caught before. If we test ES on 1.8, both runtime and compilation, this failure should have been caught somehow. To me it looks like different Java behavior unless I'm missing something.

@mark-vieira
Copy link
Contributor

That's exactly the case. We've only been running integration test clusters using the Gradle JDK (11). So no, we wouldn't have caught it until the fix in that PR.

@astefan
Copy link
Contributor Author

astefan commented May 13, 2021

Then how about upgrading jdk 11 (and all other JDKs) to the latest, which should have been the like this in the first place?

@mark-vieira
Copy link
Contributor

I'll reach out to infra to see why these aren't being regularly updated. For some reason it's still 11.0.2.

https://jvm-catalog.elastic.co/jdk/latest_openjdk_11_linux

@mark-vieira
Copy link
Contributor

mark-vieira commented May 13, 2021

I've opened an infra issue to investigate why oracle and openjdk releases seem to be lagging behind on jvm-catalog.elastic.co

https://github.com/elastic/infra/issues/29067

@nik9000
Copy link
Member

nik9000 commented May 13, 2021

This one is sort of another flavor of the same issue: https://gradle-enterprise.elastic.co/s/mwxjqeyzes2fy

@astefan
Copy link
Contributor Author

astefan commented May 14, 2021

And another one, not in sql tests: https://gradle-enterprise.elastic.co/s/vl3sa55rd7iey

gradlew ':server:integTest' -Dtests.seed=FE54386079882F76 -Dtests.class=org.elasticsearch.search.aggregations.bucket.DateRangeIT -Dtests.method="testSingleValueFieldWithDateMath" -Dtests.security.manager=true -Dtests.locale=id-ID -Dtests.timezone=Asia/Karachi -Dcompiler.java=11 -Druntime.java=8

10:44:26    > Caused by: NotSerializableExceptionWrapper[zone_rules_exception: Unknown time-zone ID: America/Nuuk]
10:44:26    > 	at java.time.zone.ZoneRulesProvider.getProvider(ZoneRulesProvider.java:272)
10:44:26    > 	at java.time.zone.ZoneRulesProvider.getRules(ZoneRulesProvider.java:227)
10:44:26    > 	at java.time.ZoneRegion.ofId(ZoneRegion.java:120)
10:44:26    > 	at java.time.ZoneId.of(ZoneId.java:411)
10:44:26    > 	at java.time.ZoneId.of(ZoneId.java:359)
10:44:26    > 	at org.elasticsearch.common.time.DateUtils.dateTimeZoneToZoneId(DateUtils.java:74)
10:44:26    > 	at org.elasticsearch.search.DocValueFormat$DateTime.<init>(DocValueFormat.java:181)
10:44:26    > 	at org.elasticsearch.index.mapper.DateFieldMapper$DateFieldType.docValueFormat(DateFieldMapper.java:394)

@droberts195
Copy link
Contributor

Java 11.0.2 is still on some CI machines - https://gradle-enterprise.elastic.co/s/hqzblhjznqpqg failed due to Asia/Qostanay.

@cbuescher
Copy link
Member

Several SQL test related issues that look the same today, here are a few of them, all on 6.8 an in periodic matrix tests:
https://gradle-enterprise.elastic.co/s/x6uxmdpe765ty
https://gradle-enterprise.elastic.co/s/p4glwbtr2nuty
https://gradle-enterprise.elastic.co/s/6rbn3s25q5az2

@cbuescher
Copy link
Member

Don't know if this is relevant but some of these have different problematic time zone:

Caused by: java.time.zone.ZoneRulesException: Unknown time-zone ID: America/Nuuk
	at java.base/java.time.zone.ZoneRulesProvider.getProvider(ZoneRulesProvider.java:279)
	at java.base/java.time.zone.ZoneRulesProvider.getRules(ZoneRulesProvider.java:234)
	at java.base/java.time.ZoneRegion.ofId(ZoneRegion.java:120)
	at java.base/java.time.ZoneId.of(ZoneId.java:408)
	at java.base/java.time.ZoneId.of(ZoneId.java:356)
	at org.elasticsearch.xpack.sql.action.AbstractSqlQueryRequest.lambda$objectParser$3(AbstractSqlQueryRequest.java:79)

e.g. https://gradle-enterprise.elastic.co/s/mvwz5nwsdha4a

@mark-vieira
Copy link
Contributor

Yeah, looks like this is still blowing up on 6.8.

@mark-vieira
Copy link
Contributor

JDK 11 has been updated on the CI machines and 6.8 has been "fixed" so that test executors and clusters run using the same configured JDK. Is there something else that needs addressing here in unit tests?

@mark-vieira
Copy link
Contributor

Ok, it seems there are still some of these failures due to our CentOS 6 images not having been updated yet. I've opened https://github.com/elastic/infra/issues/29279 to address this.

@astefan
Copy link
Contributor Author

astefan commented Jun 7, 2021

Closing this one now, as there were no more failures in the last week.

@astefan astefan closed this as completed Jun 7, 2021
@benwtrent
Copy link
Member

Surprise! Two more failures.

Both failed with:

Throwable #1: java.time.zone.ZoneRulesException: Unknown time-zone ID: America/Nuuk

https://gradle-enterprise.elastic.co/s/6jqxfpy5cjtqe

reproduce with:

./gradlew ':server:unitTest' -Dtests.seed=9BAD455C70F83B8C -Dtests.class=org.elasticsearch.index.query.RangeQueryBuilderTests -Dtests.method="testToQuery" -Dtests.security.manager=true -Dtests.locale=pl -Dtests.timezone=Asia/Harbin -Dcompiler.java=11 -Druntime.java=12

Runtime info

=======================================
Elasticsearch Build Hamster says Hello!
  Gradle Version        : 5.4.1
  OS Info               : Linux 4.15.0-1098-gcp (amd64)
  Compiler JDK Version  : 11 (18.9 11.0.11 [Java HotSpot(TM) 64-Bit Server VM 11.0.11+9-LTS-194])
  Compiler java.home    : /var/lib/jenkins/.java/oracle-11.0.11-linux
  Runtime JDK Version   : 12 (Oracle Corporation 12.0.2 [OpenJDK 64-Bit Server VM 12.0.2+10])
  Runtime java.home     : /var/lib/jenkins/.java/openjdk12
  Gradle JDK Version    : 11 (18.9 11.0.11 [Java HotSpot(TM) 64-Bit Server VM 11.0.11+9-LTS-194])
  Gradle java.home      : /var/lib/jenkins/.java/oracle-11.0.11-linux
  Random Testing Seed   : 9BAD455C70F83B8C
=======================================

https://gradle-enterprise.elastic.co/s/zwjln6kii34n2

reproduce with:

./gradlew ':server:unitTest' -Dtests.seed=E5067C3545BCA7FB -Dtests.class=org.elasticsearch.index.query.RangeQueryBuilderTests -Dtests.method="testToQuery" -Dtests.security.manager=true -Dtests.locale=is -Dtests.timezone=Mexico/General -Dcompiler.java=11 -Druntime.java=8

Runtime info

=======================================
Elasticsearch Build Hamster says Hello!
  Gradle Version        : 5.4.1
  OS Info               : Linux 5.3.18-24.64-default (amd64)
  Compiler JDK Version  : 11 (18.9 11.0.11 [Java HotSpot(TM) 64-Bit Server VM 11.0.11+9-LTS-194])
  Compiler java.home    : /var/lib/jenkins/.java/oracle-11.0.11-linux
  Runtime JDK Version   : 1.8 (Amazon.com Inc. 1.8.0_232 [OpenJDK 64-Bit Server VM 25.232-b09])
  Runtime java.home     : /var/lib/jenkins/.java/corretto8
  Gradle JDK Version    : 11 (18.9 11.0.11 [Java HotSpot(TM) 64-Bit Server VM 11.0.11+9-LTS-194])
  Gradle java.home      : /var/lib/jenkins/.java/oracle-11.0.11-linux
  Random Testing Seed   : E5067C3545BCA7FB
=======================================

@benwtrent benwtrent reopened this Jun 7, 2021
@astefan
Copy link
Contributor Author

astefan commented Jun 7, 2021

@benwtrent these failures are not in ES SQL tests. I'd suggest creating a new issue for these specifically.

@benwtrent
Copy link
Member

Opening new issue: #73899

Even though its probably the same root cause, new issue for tracking work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/SQL SQL querying Team:QL (Deprecated) Meta label for query languages team >test-failure Triaged test failures from CI
Projects
None yet
Development

No branches or pull requests

7 participants