# Activity · apache/spark

Public repository owned by apache; default branch `master`; created 2014-02-25.

## [MINOR][PYTHON][TESTS] Remove unnecessary hack imports
Pushed to `master` by zhengruifeng (Ruifeng Zheng) on 2024-05-20.

Removes unnecessary hack imports; they should no longer be needed after the introduction of the parent `Column` class. User-facing change: no (test only). Tested: CI. Generative AI tooling: no.

Closes #46656 from zhengruifeng/test_parity_column. Authored-by: Ruifeng Zheng. Signed-off-by: Ruifeng Zheng.

## [SPARK-48333][PYTHON][CONNECT][TESTS] Test `test_sorting_functions_with_column` with same `Column`
Pushed to `master` by HyukjinKwon (Hyukjin Kwon) on 2024-05-20.

Tests `test_sorting_functions_with_column` with the same `Column` as Spark Classic, since the imported `Column` is now the parent `Column` class. User-facing change: no (test only). Tested: CI. Generative AI tooling: no.

Closes #46654 from zhengruifeng/test_func_column. Authored-by: Ruifeng Zheng. Signed-off-by: Hyukjin Kwon.

## [SPARK-44924][SS] Add config for FileStreamSource cached files
Pushed to `master` by HeartSaVioR (Jungtaek Lim) on 2024-05-20.

Adds configuration options `maxCachedFiles` and `discardCachedInputRatio` for the streaming file source. These values were originally introduced with https://github.com/apache/spark/pull/27620 but were hardcoded to 10,000 and 0.2, respectively.

Under certain workloads with large `maxFilesPerTrigger` settings, capping the cached input files at 10,000 can leave a cluster underutilized and make jobs take longer to finish when each batch takes a while. For example, a job with `maxFilesPerTrigger` set to 100,000 would process all 100k files in batch 1 but only 10k in batch 2, and both batches could take just as long because some files cause skewed processing times. The cluster then spends nearly the same amount of time while processing only a tenth of the files it could have.

User-facing change: updated documentation for Structured Streaming sources describing the new configuration options. Tested: new and existing unit tests. Generative AI tooling: no.

Closes #45362 from ragnarok56/filestream-cached-files-config. Authored-by: ragnarok56. Signed-off-by: Jungtaek Lim.
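A minimal sketch of how the two new knobs above could be tuned together with `maxFilesPerTrigger`; it assumes they are exposed as file-source options (option names come from the PR; the path and schema are placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("file-stream-cache-demo").getOrCreate()

stream = (
    spark.readStream.format("json")
    .schema("id LONG, payload STRING")
    .option("maxFilesPerTrigger", 100_000)    # large micro-batches
    .option("maxCachedFiles", 100_000)        # keep the listing cache as large as a batch
    .option("discardCachedInputRatio", 0.2)   # previously hardcoded default
    .load("/data/landing/")
)
```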
## [SPARK-48175][SQL][PYTHON] Store collation information in metadata and not in type for SER/DE
Pushed to `master` by cloud-fan (Wenchen Fan) on 2024-05-18.

Changes serialization and deserialization of collated strings so that the collation information is put in the metadata of the enclosing struct field, and read back from there during parsing. The serialized format looks like this:

```json
{
  "type": "struct",
  "fields": [
    {
      "name": "colName",
      "type": "string",
      "nullable": true,
      "metadata": {
        "__COLLATIONS": {
          "colName": "UNICODE"
        }
      }
    }
  ]
}
```

For a map, the suffixes `.key` and `.value` are added in the metadata:

```json
{
  "type": "struct",
  "fields": [
    {
      "name": "mapField",
      "type": {
        "type": "map",
        "keyType": "string",
        "valueType": "string",
        "valueContainsNull": true
      },
      "nullable": true,
      "metadata": {
        "__COLLATIONS": {
          "mapField.key": "UNICODE",
          "mapField.value": "UNICODE"
        }
      }
    }
  ]
}
```

The story is similar for arrays (an `.element` suffix is added), and multiple suffixes can appear for deeply nested data types (`Map[String, Array[Array[String]]]` — see the tests for this example).

Putting collation info in field metadata is the only way to avoid breaking old clients reading new tables with collations. `CharVarcharUtils` does a similar thing, but this approach is much less hacky and friendlier for third-party clients, which is especially important since Delta also uses Spark for schema ser/de. It also removes the need for the additional logic introduced in #46083 to strip collations before writing to HMS, as the tables are now fully HMS-compatible.

User-facing change: no. Tested: unit tests. Generative AI tooling: no.

Closes #46280 from stefankandic/newDeltaSchema. Lead-authored-by: Stefan Kandic. Co-authored-by: Stefan Kandic. Signed-off-by: Wenchen Fan.
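A sketch of the corresponding array field under the same convention described above (the field name is a placeholder):

```json
{
  "name": "arrayField",
  "type": {
    "type": "array",
    "elementType": "string",
    "containsNull": true
  },
  "nullable": true,
  "metadata": {
    "__COLLATIONS": {
      "arrayField.element": "UNICODE"
    }
  }
}
```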
## [SPARK-48105][SS][3.5] Fix the race condition between state store unloading and snapshotting
Pushed to `branch-3.4` by HeartSaVioR (Jungtaek Lim) on 2024-05-18.

* When we close the HDFS state store, we should only remove the entry from `loadedMaps` rather than doing the active data cleanup; JVM GC can collect those objects for us.
* We should wait for the maintenance thread to stop before unloading the providers.

There are two race conditions between state store snapshotting and state store unloading which could result in query failure and potential data corruption.

Case 1:
1. The maintenance thread pool encounters an issue and calls [`stopMaintenanceTask`](https://github.com/apache/spark/blob/d9d79a54a3cd487380039c88ebe9fa708e0dcf23/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala#L774), which in turn calls [`threadPool.stop`](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala#L587). However, it does not wait for the stop operation to complete and moves on to the state store [unload and clear](https://github.com/apache/spark/blob/d9d79a54a3cd487380039c88ebe9fa708e0dcf23/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala#L775-L778).
2. The provider unload [closes the state store](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala#L719-L721), which [clears the values of `loadedMaps`](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala#L353-L355) for the HDFS-backed state store.
3. If the not-yet-stopped maintenance thread is still running and trying to take the snapshot, the data in the underlying `HDFSBackedStateStoreMap` has already been removed. If that snapshot completes successfully, we write corrupted data and the following batches consume it.

Case 2:
1. In executor_1, the maintenance thread is about to snapshot state_store_1. It retrieves the `HDFSBackedStateStoreMap` object from `loadedMaps` and then [releases the lock on `loadedMaps`](https://github.com/apache/spark/blob/c6696cdcd611a682ebf5b7a183e2970ecea3b58c/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala#L750-L751).
2. state_store_1 is loaded on another executor, e.g. executor_2.
3. Another state store, state_store_2, is loaded on executor_1 and [reports the active store instance](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala#L854-L871) to the driver.
4. executor_1 [unloads](https://github.com/apache/spark/blob/c6696cdcd611a682ebf5b7a183e2970ecea3b58c/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala#L713) the no-longer-active state stores, which clears the data entries in the `HDFSBackedStateStoreMap`.
5. The snapshotting thread terminates and uploads an incomplete snapshot to cloud storage because the [iterator has no next element](https://github.com/apache/spark/blob/c6696cdcd611a682ebf5b7a183e2970ecea3b58c/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala#L634) after the clear.
6. Future batches consume the corrupted data.

User-facing change: no.

Test run after the change:

```
[info] Run completed in 2 minutes, 55 seconds.
[info] Total number of tests run: 153
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 153, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 271 s (04:31), completed May 2, 2024, 6:26:33 PM
```

Before the change, the new test failed (framework stack frames omitted):

```
[info] - state store unload/close happens during the maintenance *** FAILED *** (648 milliseconds)
[info]   Vector("a1", "a10", "a11", "a12", "a13", "a14", "a15", "a16", "a17", "a18", "a19", "a2", "a20", "a3", "a4", "a5", "a6", "a7", "a8", "a9") did not equal ArrayBuffer("a8") (StateStoreSuite.scala:414)
[info]   org.scalatest.exceptions.TestFailedException:
[info]   at org.apache.spark.sql.execution.streaming.state.StateStoreSuite.$anonfun$new$39(StateStoreSuite.scala:414)
[info]   at org.apache.spark.sql.execution.streaming.state.StateStoreSuiteBase.tryWithProviderResource(StateStoreSuite.scala:1663)
[info]   at org.apache.spark.sql.execution.streaming.state.StateStoreSuite.$anonfun$new$38(StateStoreSuite.scala:394)
===== POSSIBLE THREAD LEAK IN SUITE o.a.s.sql.execution.streaming.state.StateStoreSuite, threads: ForkJoinPool.commonPool-worker-1 (daemon=true) =====
[info] Run completed in 2 seconds, 4 milliseconds.
[info] Total number of tests run: 1
[info] Tests: succeeded 0, failed 1, canceled 0, ignored 0, pending 0
[info] *** 1 TEST FAILED ***
```

Generative AI tooling: no.

Closes #46351 from huanliwang-db/race. Authored-by: Huanli Wang. Closes #46415 from huanliwang-db/race-3.5. Authored-by: Huanli Wang. Signed-off-by: Jungtaek Lim.
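An illustrative sketch of the ordering the fix enforces: stop the maintenance pool and wait for it before unloading providers. This is simplified, not the actual `StateStore` code; the names are hypothetical.

```scala
import java.util.concurrent.{Executors, TimeUnit}

object MaintenanceShutdownSketch {
  private val maintenancePool = Executors.newSingleThreadScheduledExecutor()

  def stopMaintenanceAndUnload(unloadProviders: () => Unit): Unit = {
    maintenancePool.shutdown()
    // Block until in-flight snapshot tasks finish so they never observe a cleared store.
    if (!maintenancePool.awaitTermination(30, TimeUnit.SECONDS)) {
      maintenancePool.shutdownNow()
    }
    // Only after the maintenance work has stopped is it safe to unload providers.
    unloadProviders()
  }
}
```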
## [SPARK-48294][SQL][3.5] Handle lowercase in nestedTypeMissingElementTypeError
Pushed to `branch-3.5` by gengliangwang (Gengliang Wang) on 2024-05-17.

Backport of #46623. Handles lowercase values inside `nestedTypeMissingElementTypeError` to prevent match errors. The previous match error was not user-friendly; now it gives an actionable `INCOMPLETE_TYPE_DEFINITION` error. User-facing change: N/A. Tested: newly added tests pass. Generative AI tooling: no.

Closes #46643 from michaelzhan-db/SPARK-48294-3.5. Authored-by: Michael Zhang. Signed-off-by: Gengliang Wang.

## [SPARK-47818][CONNECT][FOLLOW-UP] Introduce plan cache in SparkConnectPlanner to improve performance of Analyze requests
Pushed to `master` by hvanhovell (Herman van Hovell) on 2024-05-17.

In [the previous PR](https://github.com/apache/spark/pull/46012), we introduced two new confs for the plan cache: a static conf `spark.connect.session.planCache.maxSize` and a dynamic conf `spark.connect.session.planCache.enabled`. The plan cache is enabled by default with size 5. This PR marks both confs as internal, because they are not expected to be used under normal circumstances and do not need to be documented on the Spark Configuration reference page. User-facing change: no. Tested: existing tests. Generative AI tooling: no.

Closes #46638 from xi-db/SPARK-47818-plan-cache-followup2. Authored-by: Xi Lyu. Signed-off-by: Herman van Hovell.

## [SPARK-48312][SQL] Improve Alias.removeNonInheritableMetadata performance
Pushed to `master` by cloud-fan (Wenchen Fan) on 2024-05-17.

Improves `Alias.removeNonInheritableMetadata` performance by avoiding `MetadataBuilder` when there is no metadata or when there are no non-inheritable metadata keys to remove. For wide views with many `Alias` expressions this method slows down analysis. User-facing change: no. Tested: existing tests. Generative AI tooling: no.

Closes #46622 from vladimirg-db/vladimirg-db/improve-remove-non-inheritable-metadata-performance. Authored-by: Vladimir Golubev. Signed-off-by: Wenchen Fan.
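A minimal sketch of the fast path this describes, assuming a free-standing helper of this shape (not the actual Catalyst code):

```scala
import org.apache.spark.sql.types.{Metadata, MetadataBuilder}

def removeNonInheritable(metadata: Metadata, nonInheritableKeys: Seq[String]): Metadata = {
  if (metadata == Metadata.empty || nonInheritableKeys.forall(key => !metadata.contains(key))) {
    metadata // nothing to strip, so skip the MetadataBuilder round-trip entirely
  } else {
    val builder = new MetadataBuilder().withMetadata(metadata)
    nonInheritableKeys.foreach(builder.remove)
    builder.build()
  }
}
```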
## [SPARK-48321][CONNECT][TESTS] Avoid using deprecated methods in dsl
Pushed to `master` by zhengruifeng (Ruifeng Zheng) on 2024-05-17.

Avoids using deprecated methods in the dsl; `putAllRenameColumnsMap` was deprecated. User-facing change: no. Tested: CI. Generative AI tooling: no.

Closes #46635 from zhengruifeng/with_col_rename_dsl. Authored-by: Ruifeng Zheng. Signed-off-by: Ruifeng Zheng.

## [SPARK-48319][PYTHON][CONNECT][TESTS] Test `assert_true` and `raise_error` with the same error class as Spark Classic
Pushed to `master` by HyukjinKwon (Hyukjin Kwon) on 2024-05-17.

Tests `assert_true` and `raise_error` with the same error class as Spark Classic. Commit https://github.com/apache/spark/commit/578931678f5a6d6b33ebdae4bf866871e46fbb83 made `assert_true` and `raise_error` in Spark Connect throw `SparkRuntimeException`, so the error is now the same as in Spark Classic. User-facing change: no (test only). Tested: CI. Generative AI tooling: no.

Closes #46633 from zhengruifeng/test_assert_raise. Authored-by: Ruifeng Zheng. Signed-off-by: Hyukjin Kwon.
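A small illustration of the shared behavior the parity test asserts; a sketch assuming a running SparkSession (classic or Connect):

```python
from pyspark.errors import SparkRuntimeException
from pyspark.sql import SparkSession, functions as sf

spark = SparkSession.builder.getOrCreate()

try:
    # id is 0 here, so the condition is false and assert_true fails the query.
    spark.range(1).select(sf.assert_true(sf.col("id") > 0)).collect()
except SparkRuntimeException as e:
    # Per the change above, both Classic and Connect surface SparkRuntimeException.
    print("failed as expected:", e)
```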
## [SPARK-48303][CORE] Reorganize `LogKeys`
Pushed to `master` by gengliangwang (Gengliang Wang) on 2024-05-17.

Reorganizes `LogKeys` to make them easier to understand and more consistent:

- Remove unused `LogKeys`: ACTUAL_BROADCAST_OUTPUT_STATUS_SIZE, DEFAULT_COMPACTION_INTERVAL, DRIVER_LIBRARY_PATH_KEY, EXISTING_JARS, EXPECTED_ANSWER, FILTERS, HAS_R_PACKAGE, JAR_ENTRY, LOG_KEY_FILE, NUM_ADDED_MASTERS, NUM_ADDED_WORKERS, NUM_PARTITION_VALUES, OUTPUT_LINE, OUTPUT_LINE_NUMBER, PARTITIONS_SIZE, RULE_BATCH_NAME, SERIALIZE_OUTPUT_LENGTH, SHELL_COMMAND, STREAM_SOURCE.
- Merge `PARAMETER` into `PARAM` (some keys were fully spelled, some abbreviated): ESTIMATOR_PARAMETER_MAP -> ESTIMATOR_PARAM_MAP, FUNCTION_PARAMETER -> FUNCTION_PARAM, METHOD_PARAMETER_TYPES -> METHOD_PARAM_TYPES.
- Merge `NUMBER` into `NUM`: MIN_VERSION_NUMBER -> MIN_VERSION_NUM, RULE_NUMBER_OF_RUNS -> NUM_RULE_OF_RUNS, VERSION_NUMBER -> VERSION_NUM.
- Merge `TOTAL` into `NUM`: TOTAL_RECORDS_READ -> NUM_RECORDS_READ, TRAIN_WORD_COUNT -> NUM_TRAIN_WORD.
- Use `NUM` as a prefix: CHECKSUM_FILE_NUM -> NUM_CHECKSUM_FILE, DATA_FILE_NUM -> NUM_DATA_FILE, INDEX_FILE_NUM -> NUM_INDEX_FILE.
- COUNT -> NUM: EXECUTOR_DESIRED_COUNT -> NUM_EXECUTOR_DESIRED, EXECUTOR_LAUNCH_COUNT -> NUM_EXECUTOR_LAUNCH, EXECUTOR_TARGET_COUNT -> NUM_EXECUTOR_TARGET, KAFKA_PULLS_COUNT -> NUM_KAFKA_PULLS, KAFKA_RECORDS_PULLED_COUNT -> NUM_KAFKA_RECORDS_PULLED, MIN_FREQUENT_PATTERN_COUNT -> MIN_NUM_FREQUENT_PATTERN, POD_COUNT -> NUM_POD, POD_SHARED_SLOT_COUNT -> NUM_POD_SHARED_SLOT, POD_TARGET_COUNT -> NUM_POD_TARGET, RETRY_COUNT -> NUM_RETRY.
- Fix typos: MALFORMATTED_STIRNG -> MALFORMATTED_STRING.
- Other: MAX_LOG_NUM_POLICY -> MAX_NUM_LOG_POLICY, WEIGHTED_NUM -> NUM_WEIGHTED_EXAMPLES.

The remaining code changes follow from the adjustments above. User-facing change: no. Tested: GA passes. Generative AI tooling: no.

Closes #46612 from panbingkun/reorganize_logkey. Authored-by: panbingkun. Signed-off-by: Gengliang Wang.
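For context, a sketch of how a renamed key such as `NUM_RETRY` (formerly `RETRY_COUNT`) is referenced from Spark's structured logging; this assumes the internal `MDC` helper and the `log` string interpolator provided with the `Logging` trait, so treat it as illustrative rather than exact:

```scala
import org.apache.spark.internal.{Logging, LogKeys, MDC}

class RetryingClient extends Logging {
  def reportRetries(retryCount: Int): Unit = {
    // NUM_RETRY is the new name of the former RETRY_COUNT log key.
    logInfo(log"Request retried ${MDC(LogKeys.NUM_RETRY, retryCount)} times")
  }
}
```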
## [SPARK-48317][PYTHON][CONNECT][TESTS] Enable `test_udtf_with_analyze_using_archive` and `test_udtf_with_analyze_using_file`
Pushed to `master` by HyukjinKwon (Hyukjin Kwon) on 2024-05-17.

Enables the tests `test_udtf_with_analyze_using_archive` and `test_udtf_with_analyze_using_file` to ensure test coverage. User-facing change: no (test only). Tested: CI in the PR. Generative AI tooling: no.

Closes #46632 from HyukjinKwon/SPARK-48317. Lead-authored-by: Hyukjin Kwon. Co-authored-by: Hyukjin Kwon. Signed-off-by: Hyukjin Kwon.

## [MINOR][PYTHON][TESTS] Call `test_apply_schema_to_dict_and_rows` in `test_apply_schema_to_row`
Pushed to `master` by HyukjinKwon (Hyukjin Kwon) on 2024-05-17.

Fixes the test `test_apply_schema_to_row` to call `test_apply_schema_to_row` instead of `test_apply_schema_to_dict_and_rows`; it was a mistake, and fixing it avoids surprises when the test is enabled in the future. User-facing change: no (test only). Tested: CI in the PR. Generative AI tooling: no.

Closes #46631 from HyukjinKwon/minor-test-rename. Authored-by: Hyukjin Kwon. Signed-off-by: Hyukjin Kwon.

## [SPARK-41625][PYTHON][CONNECT][TESTS][FOLLOW-UP] Enable `DataFrameObservationParityTests.test_observe_str`
Pushed to `master` by zhengruifeng (Ruifeng Zheng) on 2024-05-17.

Enables `DataFrameObservationParityTests.test_observe_str` to ensure test coverage. User-facing change: no (test only). Tested: CI in the PR. Generative AI tooling: no.

Closes #46630 from HyukjinKwon/SPARK-41625-followup. Authored-by: Hyukjin Kwon. Signed-off-by: Ruifeng Zheng.

## [SPARK-48316][PS][CONNECT][TESTS] Fix comments for SparkFrameMethodsParityTests.test_coalesce and test_repartition
Pushed to `master` by HyukjinKwon (Hyukjin Kwon) on 2024-05-17.

Enables `SparkFrameMethodsParityTests.test_coalesce` and `SparkFrameMethodsParityTests.test_repartition` in Spark Connect by avoiding RDD usage in the tests, to ensure test coverage. User-facing change: no (test only). Tested: CI in the PR. Generative AI tooling: no.

Closes #46629 from HyukjinKwon/SPARK-48316. Authored-by: Hyukjin Kwon. Signed-off-by: Hyukjin Kwon.
## [SPARK-48306][SQL] Improve UDT in error message
Pushed to `master` by yaooqinn (Kent Yao) on 2024-05-17.

Improves how a UDT appears in error messages. Currently, a UDT is displayed as its inner SQL type, which is ambiguous for debugging. User-facing change: no. Tested: new tests. Generative AI tooling: no.

Closes #46616 from yaooqinn/SPARK-48306. Authored-by: Kent Yao. Signed-off-by: Kent Yao.

## [SPARK-48301][SQL][FOLLOWUP] Update the error message
Pushed to `master` by zhengruifeng (Ruifeng Zheng) on 2024-05-17.

Updates the error message: we don't support `CREATE PROCEDURE` in Spark. Addresses https://github.com/apache/spark/pull/46608#discussion_r1604205064. User-facing change: no. Tested: CI. Generative AI tooling: no.

Closes #46628 from zhengruifeng/nit_error. Authored-by: Ruifeng Zheng. Signed-off-by: Ruifeng Zheng.

## [SPARK-48310][PYTHON][CONNECT] Cached properties must return copies
Pushed to `master` by HyukjinKwon (Hyukjin Kwon) on 2024-05-17.

When a consumer modifies the result of a cached property, it ends up modifying the cached value itself. Before this patch, mutating the returned list also shrank `df.columns`:

```python
df_columns = df.columns
for col in ['id', 'name']:
    df_columns.remove(col)
assert len(df_columns) == len(df.columns)  # the cached property shrank too
```

This is wrong, and with the patch the cached value is unaffected:

```python
df_columns = df.columns
for col in ['id', 'name']:
    df_columns.remove(col)
assert len(df_columns) != len(df.columns)  # df.columns still returns all columns
```

Needed for the correctness of the API. User-facing change: no; this makes the code consistent with Spark Classic. Tested: UT. Generative AI tooling: no.

Closes #46621 from grundprinzip/grundprinzip/SPARK-48310. Authored-by: Martin Grund. Signed-off-by: Hyukjin Kwon.
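A minimal, self-contained illustration of the copy-on-return pattern described above; this is not the actual Spark Connect DataFrame code, just the shape of the fix:

```python
from functools import cached_property


class Frame:
    @cached_property
    def _columns(self):
        # Expensive to compute, so cache it once...
        return ["id", "name", "value"]

    @property
    def columns(self):
        # ...but hand every caller a fresh copy so mutations cannot leak back in.
        return list(self._columns)


df = Frame()
cols = df.columns
cols.remove("id")
assert df.columns == ["id", "name", "value"]  # the cache is unaffected
```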
## [SPARK-48268][CORE] Add a configuration for SparkContext.setCheckpointDir
Pushed to `master` by HyukjinKwon (Hyukjin Kwon) on 2024-05-16.

Adds a `spark.checkpoint.dir` configuration so users can set the checkpoint directory when they submit their application. This separates the configuration logic so the same app can run with a different checkpoint directory, and it is also useful for Spark Connect together with https://github.com/apache/spark/pull/46570. User-facing change: yes, it adds a new user-facing configuration. Tested: unit test added. Generative AI tooling: no.

Closes #46571 from HyukjinKwon/SPARK-48268. Lead-authored-by: Hyukjin Kwon. Co-authored-by: Hyukjin Kwon. Signed-off-by: Hyukjin Kwon.
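A sketch of how the new configuration might be supplied when building a session; the path is a placeholder, and it assumes the conf acts as the default for `SparkContext.setCheckpointDir`:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("checkpoint-config-demo")
    .config("spark.checkpoint.dir", "hdfs:///tmp/checkpoints")
    .getOrCreate()
)

# With the configuration in place, code no longer has to call
# spark.sparkContext.setCheckpointDir(...) explicitly before checkpointing.
df = spark.range(10).checkpoint()
```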
## [SPARK-48031][SQL][FOLLOW-UP] Use ANSI-enabled cast in view lookup test
Pushed to `master` by HyukjinKwon (Hyukjin Kwon) on 2024-05-16.

Follow-up of https://github.com/apache/spark/pull/46267 that uses ANSI-enabled cast in the tests; view lookup intentionally uses ANSI-enabled cast in `castColToType`. Needed to fix the scheduled CI build without ANSI:

- https://github.com/apache/spark/actions/runs/9072308206/job/24960016975
- https://github.com/apache/spark/actions/runs/9072308206/job/24960019187

```
[info] - look up view relation *** FAILED *** (72 milliseconds)
[info] == FAIL: Plans do not match ===
```

```
[info] - look up view created before Spark 3.0 *** FAILED *** (452 milliseconds)
[info] == FAIL: Plans do not match === (PlanTest.scala:179)
```

User-facing change: no, the main change has not been released yet. Tested: manually ran the tests with ANSI disabled. Generative AI tooling: no.

Closes #46614 from HyukjinKwon/SPARK-48031-followup. Authored-by: Hyukjin Kwon. Signed-off-by: Hyukjin Kwon.

## Branch deleted: `dependabot/bundler/docs/rexml-3.2.8`
Deleted by dependabot[bot] on 2024-05-16.
## [SPARK-48294][SQL] Handle lowercase in nestedTypeMissingElementTypeError
Pushed to `master` by gengliangwang (Gengliang Wang) on 2024-05-16.

Handles lowercase values inside `nestedTypeMissingElementTypeError` to prevent match errors. The previous match error was not user-friendly; now it gives an actionable `INCOMPLETE_TYPE_DEFINITION` error. User-facing change: N/A. Tested: newly added tests pass. Generative AI tooling: no.

Closes #46623 from michaelzhan-db/SPARK-48294. Authored-by: Michael Zhang. Signed-off-by: Gengliang Wang.
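An illustrative query shape for the error path described above; the table and column names are placeholders:

```sql
-- A nested type written in lowercase without an element type used to surface an
-- internal match error; it should now report INCOMPLETE_TYPE_DEFINITION instead.
CREATE TABLE demo_tbl (c1 array) USING parquet;
```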
## Branch created: `dependabot/bundler/docs/rexml-3.2.8`
Created by dependabot[bot] on 2024-05-16.

Bump rexml from 3.2.6 to 3.2.8 in /docs. Bumps [rexml](https://github.com/ruby/rexml) from 3.2.6 to 3.2.8.
- [Release notes](https://github.com/ruby/rexml/releases)
- [Changelog](https://github.com/ruby/rexml/blob/master/NEWS.md)
- [Commits](https://github.com/ruby/rexml/compare/v3.2.6...v3.2.8)

updated-dependencies: rexml (indirect). Signed-off-by: dependabot[bot].

## [SPARK-48291][CORE][FOLLOWUP] Rename Java *LoggerSuite* as *SparkLoggerSuite*
Pushed to `master` by gengliangwang (Gengliang Wang) on 2024-05-16.

Follow-up of https://github.com/apache/spark/pull/46600: for consistency, the Java `*LoggerSuite*` tests should similarly be renamed to `*SparkLoggerSuite*`. After `org.apache.spark.internal.Logger` was renamed to `org.apache.spark.internal.SparkLogger` and `org.apache.spark.internal.LoggerFactory` to `org.apache.spark.internal.SparkLoggerFactory`, the related test suites should also be renamed so that developers can easily locate them. User-facing change: no. Tested: GA passes. Generative AI tooling: no.

Closes #46615 from panbingkun/SPARK-48291_follow_up. Authored-by: panbingkun. Signed-off-by: Gengliang Wang.
## [SPARK-48308][CORE] Unify getting data schema without partition columns in FileSourceStrategy
Pushed to `master` by cloud-fan (Wenchen Fan) on 2024-05-16.

Computes the schema of the data without partition columns only once in `FileSourceStrategy`. Today that schema is computed twice in slightly different ways: once using an `AttributeSet` (`partitionSet`) and once using the attributes directly (`partitionColumns`). These do not have exactly the same semantics: an `AttributeSet` compares only expression ids, while comparing against the actual attributes also uses the name, type, nullability and metadata. We want to use the former here. User-facing change: no. Tested: existing tests. Generative AI tooling: no.

Closes #46619 from johanl-db/reuse-schema-without-partition-columns. Authored-by: Johan Lasperas. Signed-off-by: Wenchen Fan.
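A spark-shell sketch of the semantic difference described above (Catalyst internal API; names are placeholders):

```scala
import org.apache.spark.sql.catalyst.expressions.{AttributeReference, AttributeSet}
import org.apache.spark.sql.types.IntegerType

val partCol = AttributeReference("p", IntegerType, nullable = true)()
// Same expression id, different nullability.
val sameIdDifferentNullability = partCol.withNullability(false)

val partitionSet = AttributeSet(Seq(partCol))
val partitionColumns = Seq(partCol)

partitionSet.contains(sameIdDifferentNullability)      // true  - membership by exprId only
partitionColumns.contains(sameIdDifferentNullability)  // false - full attribute equality fails
```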
## [SPARK-48301][SQL] Rename `CREATE_FUNC_WITH_IF_NOT_EXISTS_AND_REPLACE` to `CREATE_ROUTINE_WITH_IF_NOT_EXISTS_AND_REPLACE`
Pushed to `master` by zhengruifeng (Ruifeng Zheng) on 2024-05-16.

Renames `CREATE_FUNC_WITH_IF_NOT_EXISTS_AND_REPLACE` to `CREATE_ROUTINE_WITH_IF_NOT_EXISTS_AND_REPLACE`. The `IF NOT EXISTS` + `REPLACE` restriction is a standard restriction, not just for functions, so the rename makes it reusable. User-facing change: no. Tested: updated tests. Generative AI tooling: no.

Closes #46608 from zhengruifeng/sql_rename_if_not_exists_replace. Lead-authored-by: Ruifeng Zheng. Co-authored-by: Ruifeng Zheng. Signed-off-by: Ruifeng Zheng.

## [SPARK-48288] Add source data type for connector cast expression
Pushed to `master` by cloud-fan (Wenchen Fan) on 2024-05-16.

`V2ExpressionBuilder` builds a `connector.Cast` expression from a `catalyst.Cast` expression. The Catalyst cast carries the expression's data type, but the connector cast does not. Since some casts are not allowed on an external engine, we need to know both the source and target data types; this finer granularity lets implementors of `SQLBuilder` block unsupported casts. User-facing change: yes, the `visitCast` function changes and needs to be overridden again. Tested: no tests; simple code change. Generative AI tooling: no.

Closes #46596 from urosstan-db/SPARK-48288-Add-source-data-type-to-connector-cast-expression. Authored-by: Uros Stankovic. Signed-off-by: Wenchen Fan.

## [SPARK-48296][SQL] Codegen Support for `to_xml`
Pushed to `master` by yaooqinn (Kent Yao) on 2024-05-16.

Adds codegen support for `to_xml` to improve codegen coverage. User-facing change: no. Tested: new and existing UTs; GA passes. Generative AI tooling: no.

Closes #46591 from panbingkun/minor_to_xml. Lead-authored-by: panbingkun. Co-authored-by: panbingkun. Signed-off-by: Kent Yao.
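For reference, a minimal use of the expression gaining codegen support here, assuming the SQL function form that ships with the built-in XML support:

```sql
-- Renders the struct as an XML string.
SELECT to_xml(named_struct('id', 1, 'name', 'Alice'));
```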
## [SPARK-48297][SQL] Fix a regression TRANSFORM clause with char/varchar
Pushed to `branch-3.5` by yaooqinn (Kent Yao) on 2024-05-16.

TRANSFORM with char/varchar has been accidentally invalidated since 3.1 with a `scala.MatchError`; this PR fixes it. Bugfix; no user-facing change. Tested: new tests. Generative AI tooling: no.

Closes #46603 from yaooqinn/SPARK-48297. Authored-by: Kent Yao. Signed-off-by: Kent Yao. (cherry picked from commit 3bd845ea930a4709b7a2f0447b5f8af64c697239) Signed-off-by: Kent Yao.

## [SPARK-48297][SQL] Fix a regression TRANSFORM clause with char/varchar
Pushed to `master` by yaooqinn (Kent Yao) on 2024-05-16.

TRANSFORM with char/varchar has been accidentally invalidated since 3.1 with a `scala.MatchError`; this PR fixes it. Bugfix; no user-facing change. Tested: new tests. Generative AI tooling: no.

Closes #46603 from yaooqinn/SPARK-48297. Authored-by: Kent Yao. Signed-off-by: Kent Yao.
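An illustrative shape of the query pattern affected by the regression; table and column names are placeholders, and the script transformation uses the default row format:

```sql
CREATE TABLE people (name CHAR(10), city VARCHAR(20)) USING parquet;

-- Script transformation over char/varchar columns hit a scala.MatchError before the fix.
SELECT TRANSFORM(name, city) USING 'cat' AS (name STRING, city STRING) FROM people;
```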