{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":436912897,"defaultBranch":"main","name":"celeborn","ownerLogin":"apache","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2021-12-10T08:57:16.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/47359?v=4","public":true,"private":false,"isOrgOwned":true},"refInfo":{"name":"","listCacheKey":"v0:1716968266.0","currentOid":""},"activityList":{"items":[{"before":"fd490013aecc2971a7c396f117b8df1465f3775e","after":"2a57fab8698a93049021caf41169c9df2f18bb0e","ref":"refs/heads/main","pushedAt":"2024-05-30T09:22:30.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"RexXiong","name":"Shuang","path":"/RexXiong","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/28799061?s=80&v=4"},"commit":{"message":"[CELEBORN-1400] Bump Ratis version from 2.5.1 to 3.0.1\n\n### What changes were proposed in this pull request?\n\nBump Ratis version from 2.5.1 to 3.0.1. Address incompatible changes:\n\n- RATIS-589. Eliminate buffer copying in SegmentedRaftLogOutputStream.(https://github.com/apache/ratis/pull/964)\n- RATIS-1677. Do not auto format RaftStorage in RECOVER.(https://github.com/apache/ratis/pull/718)\n- RATIS-1710. Refactor metrics api and implementation to separated modules. (https://github.com/apache/ratis/pull/749)\n\n### Why are the changes needed?\n\nBump Ratis version from 2.5.1 to 3.0.1. Ratis has released v3.0.0, v3.0.1, which release note refers to [3.0.0](https://ratis.apache.org/post/3.0.0.html), [3.0.1](https://ratis.apache.org/post/3.0.1.html). The 3.0.x version include new features like pluggable metrics and lease read, etc, some improvements and bugfixes including:\n\n- 3.0.0: Change list of ratis 3.0.0 In total, there are roughly 100 commits diffing from 2.5.1 including:\n - Incompatible Changes\n - RaftStorage Auto-Format\n - RATIS-1677. Do not auto format RaftStorage in RECOVER. 
(https://github.com/apache/ratis/pull/718)\n - RATIS-1694. Fix the compatibility issue of RATIS-1677. (https://github.com/apache/ratis/pull/731)\n - RATIS-1871. Auto format RaftStorage when there is only one directory configured. (https://github.com/apache/ratis/pull/903)\n - Pluggable Ratis-Metrics (RATIS-1688)\n - RATIS-1689. Remove the use of the thirdparty Gauge. (https://github.com/apache/ratis/pull/728)\n - RATIS-1692. Remove the use of the thirdparty Counter. (https://github.com/apache/ratis/pull/732)\n - RATIS-1693. Remove the use of the thirdparty Timer. (https://github.com/apache/ratis/pull/734)\n - RATIS-1703. Move MetricsReporting and JvmMetrics to impl. (https://github.com/apache/ratis/pull/741)\n - RATIS-1704. Fix SuppressWarnings(“VisibilityModifier”) in RatisMetrics. (https://github.com/apache/ratis/pull/742)\n - RATIS-1710. Refactor metrics api and implementation to separated modules. (https://github.com/apache/ratis/pull/749)\n - RATIS-1712. Add a dropwizard 3 implementation of ratis-metrics-api. (https://github.com/apache/ratis/pull/751)\n - RATIS-1391. Update library dropwizard.metrics version to 4.x (https://github.com/apache/ratis/pull/632)\n - RATIS-1601. Use the shaded dropwizard metrics and remove the dependency (https://github.com/apache/ratis/pull/671)\n - Streaming Protocol Change\n - RATIS-1569. Move the asyncRpcApi.sendForward(..) call to the client side. (https://github.com/apache/ratis/pull/635)\n - New Features\n - Leader Lease (RATIS-1864)\n - RATIS-1865. Add leader lease bound ratio configuration (https://github.com/apache/ratis/pull/897)\n - RATIS-1866. Maintain leader lease after AppendEntries (https://github.com/apache/ratis/pull/898)\n - RATIS-1894. Implement ReadOnly based on leader lease (https://github.com/apache/ratis/pull/925)\n - RATIS-1882. Support read-after-write consistency (https://github.com/apache/ratis/pull/913)\n - StateMachine API\n - RATIS-1874. 
Add notifyLeaderReady function in IStateMachine (https://github.com/apache/ratis/pull/906)\n - RATIS-1897. Make TransactionContext available in DataApi.write(..). (https://github.com/apache/ratis/pull/930)\n - New Configuration Properties\n - RATIS-1862. Add the parameter whether to take Snapshot when stopping to adapt to different services (https://github.com/apache/ratis/pull/896)\n - RATIS-1930. Add a conf for enable/disable majority-add. (https://github.com/apache/ratis/pull/961)\n - RATIS-1918. Introduces parameters that separately control the shutdown of RaftServerProxy by JVMPauseMonitor. (https://github.com/apache/ratis/pull/950)\n - RATIS-1636. Support re-config ratis properties (https://github.com/apache/ratis/pull/800)\n - RATIS-1860. Add ratis-shell cmd to generate a new raft-meta.conf. (https://github.com/apache/ratis/pull/901)\n - Improvements & Bug Fixes\n - Netty\n - RATIS-1898. Netty should use EpollEventLoopGroup by default (https://github.com/apache/ratis/pull/931)\n - RATIS-1899. Use EpollEventLoopGroup for Netty Proxies (https://github.com/apache/ratis/pull/932)\n - RATIS-1921. Shared worker group in WorkerGroupGetter should be closed. (https://github.com/apache/ratis/pull/955)\n - RATIS-1923. Netty: atomic operations require side-effect-free functions. (https://github.com/apache/ratis/pull/956)\n - RaftServer\n - RATIS-1924. Increase the default of raft.server.log.segment.size.max. (https://github.com/apache/ratis/pull/957)\n - RATIS-1892. Unify the lifetime of the RaftServerProxy thread pool (https://github.com/apache/ratis/pull/923)\n - RATIS-1889. NoSuchMethodError: RaftServerMetricsImpl.addNumPendingRequestsGauge https://github.com/apache/ratis/pull/922 (https://github.com/apache/ratis/pull/922)\n - RATIS-761. Handle writeStateMachineData failure in leader. (https://github.com/apache/ratis/pull/927)\n - RATIS-1902. The snapshot index is set incorrectly in InstallSnapshotReplyProto. (https://github.com/apache/ratis/pull/933)\n - RATIS-1912. 
Fix infinity election when perform membership change. (https://github.com/apache/ratis/pull/954)\n - RATIS-1858. Follower keeps logging first election timeout. (https://github.com/apache/ratis/pull/894)\n\n- 3.0.1:This is a bugfix release. See the [changes between 3.0.0 and 3.0.1](https://github.com/apache/ratis/compare/ratis-3.0.0...ratis-3.0.1) releases.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\nCluster manual test.\n\nCloses #2480 from SteNicholas/CELEBORN-1400.\n\nAuthored-by: SteNicholas \nSigned-off-by: Shuang ","shortMessageHtmlLink":"[CELEBORN-1400] Bump Ratis version from 2.5.1 to 3.0.1"}},{"before":"6346b591aa13513d786d1e687bd73cc3c17e082f","after":"fd490013aecc2971a7c396f117b8df1465f3775e","ref":"refs/heads/main","pushedAt":"2024-05-30T03:40:32.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"FMX","name":"Ethan Feng","path":"/FMX","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4150993?s=80&v=4"},"commit":{"message":"Revert \"[CELEBORN-1388] Use finer grained locks in changePartitionManager\"\n\nThis reverts commit 9f304798cb2147fe4e9d900e85832c1034397863.","shortMessageHtmlLink":"Revert \"[CELEBORN-1388] Use finer grained locks in changePartitionMan…"}},{"before":null,"after":"6346b591aa13513d786d1e687bd73cc3c17e082f","ref":"refs/heads/branch-0.5","pushedAt":"2024-05-29T07:37:46.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"FMX","name":"Ethan Feng","path":"/FMX","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4150993?s=80&v=4"},"commit":{"message":"[CELEBORN-1441] RocksDBLogger uses Logger#Logger(InfoLogLevel) instead of deprecated constructor of o.rocksdb.Logger\n\n### What changes were proposed in this pull request?\n\n`RocksDBLogger` uses `Logger#Logger(InfoLogLevel)` instead of deprecated constructor of `o.rocksdb.Logger` to clean up the use of deprecated APIs related to `o.rocksdb.Logger`.\n\n### Why are the changes 
needed?\n\n`RocksDBLogger` uses `Logger#Logger(InfoLogLevel)` instead of deprecated constructor of `o.rocksdb.Logger` that refers to [Logger.java#L39-L54](https://github.com/facebook/rocksdb/blob/5c2be544f5509465957706c955b6d623e889ac4e/java/src/main/java/org/rocksdb/Logger.java#L39-L54).\n\n```\n/**\n * AbstractLogger constructor.\n *\n * Important: the log level set within\n * the {@link org.rocksdb.Options} instance will be used as\n * maximum log level of RocksDB.\n *\n * @param options {@link org.rocksdb.Options} instance.\n *\n * @deprecated Use {@link Logger#Logger(InfoLogLevel)} instead, e.g. {@code new\n * Logger(options.infoLogLevel())}.\n */\n@Deprecated\npublic Logger(final Options options) {\n this(options.infoLogLevel());\n}\n```\n\nBackport https://github.com/apache/spark/pull/46436.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\nGA.\n\nCloses #2533 from SteNicholas/CELEBORN-1441.\n\nAuthored-by: SteNicholas \nSigned-off-by: Shuang ","shortMessageHtmlLink":"[CELEBORN-1441] RocksDBLogger uses Logger#Logger(InfoLogLevel) instea…"}},{"before":"043a20e85cb23213d04143e5ce1c1343a383a355","after":"6346b591aa13513d786d1e687bd73cc3c17e082f","ref":"refs/heads/main","pushedAt":"2024-05-29T04:05:14.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"RexXiong","name":"Shuang","path":"/RexXiong","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/28799061?s=80&v=4"},"commit":{"message":"[CELEBORN-1441] RocksDBLogger uses Logger#Logger(InfoLogLevel) instead of deprecated constructor of o.rocksdb.Logger\n\n### What changes were proposed in this pull request?\n\n`RocksDBLogger` uses `Logger#Logger(InfoLogLevel)` instead of deprecated constructor of `o.rocksdb.Logger` to clean up the use of deprecated APIs related to `o.rocksdb.Logger`.\n\n### Why are the changes needed?\n\n`RocksDBLogger` uses `Logger#Logger(InfoLogLevel)` instead of deprecated constructor of `o.rocksdb.Logger` that refers to [Logger.java#L39-L54](https://github.com/facebook/rocksdb/blob/5c2be544f5509465957706c955b6d623e889ac4e/java/src/main/java/org/rocksdb/Logger.java#L39-L54).\n\n```\n/**\n * AbstractLogger constructor.\n *\n * Important: the log level set within\n * the {@link org.rocksdb.Options} instance will be used as\n * maximum log level of RocksDB.\n *\n * @param options {@link org.rocksdb.Options} instance.\n *\n * @deprecated Use {@link Logger#Logger(InfoLogLevel)} instead, e.g. {@code new\n * Logger(options.infoLogLevel())}.\n */\n@Deprecated\npublic Logger(final Options options) {\n this(options.infoLogLevel());\n}\n```\n\nBackport https://github.com/apache/spark/pull/46436.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\nGA.\n\nCloses #2533 from SteNicholas/CELEBORN-1441.\n\nAuthored-by: SteNicholas \nSigned-off-by: Shuang ","shortMessageHtmlLink":"[CELEBORN-1441] RocksDBLogger uses Logger#Logger(InfoLogLevel) instea…"}},{"before":"493e0f10cfae5e1283b726b586684eac13b6f732","after":"043a20e85cb23213d04143e5ce1c1343a383a355","ref":"refs/heads/main","pushedAt":"2024-05-28T08:46:53.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"FMX","name":"Ethan Feng","path":"/FMX","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4150993?s=80&v=4"},"commit":{"message":"[CELEBORN-1432] ShuffleClientImpl should invoke loadFileGroupInternal only once when using the reduce partition mode\n\n### What changes were proposed in this pull request?\n\n`ShuffleClientImpl` invokes `loadFileGroupInternal` only once when using the reduce partition mode.\n\n### Why are the changes needed?\n\n`ShuffleClientImpl` may call `loadFileGroupInternal` multiple times when using reduce partition mode, which is not as expected. 
This bug was introduced in #2219.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\nGA.\n\nCloses #2531 from SteNicholas/CELEBORN-1432.\n\nAuthored-by: SteNicholas \nSigned-off-by: mingji ","shortMessageHtmlLink":"[CELEBORN-1432] ShuffleClientImpl should invoke loadFileGroupInternal…"}},{"before":"dd87419044c8a855abd65f7a237f9f8e0d71a657","after":"493e0f10cfae5e1283b726b586684eac13b6f732","ref":"refs/heads/main","pushedAt":"2024-05-27T07:12:57.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"SteNicholas","name":"Nicholas Jiang","path":"/SteNicholas","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/10048174?s=80&v=4"},"commit":{"message":"[CELEBORN-1317][FOLLOWUP] Fix threadDump UT stuck issue\n\n### What changes were proposed in this pull request?\n\nTry to fix ApiWorkerResourceSuite::threadDump UT stuck issue.\n1. Using program way to get thread dump.\n\nRelated code copied from apache/spark\nhttps://github.com/apache/spark/blob/v3.5.1/core/src/main/scala/org/apache/spark/util/Utils.scala\nhttps://github.com/apache/spark/blob/v3.5.1/core/src/main/scala/org/apache/spark/status/api/v1/api.scala\n\n### Why are the changes needed?\nI found that sometimes the UT stuck for threadDump api:\nFor example: https://github.com/apache/celeborn/actions/runs/8462056188/job/23182806487?pr=2428\n\nthreadDump api UT is new introduced in [CELEBORN-1317](https://issues.apache.org/jira/browse/CELEBORN-1317).\n\nBefore there is no UT to cover that, and now it stuck sometimes.\n\nAnd for getThreadDump, before it leverages processBuilder to get the thread info.\n\nI wonder that the process is stuck because of some unknown reason, so, in this pr, we try to use program way to get thread info.\n\n### Does this PR introduce _any_ user-facing change?\nNo.\n\n### How was this patch 
tested?\n\nUT.\n\n![image](https://github.com/apache/celeborn/assets/6757692/51aaa44e-0523-4b60-b6c8-f4e83c709497)\n\nCloses #2429 from turboFei/thread_dump.\n\nLead-authored-by: Fei Wang \nCo-authored-by: SteNicholas \nSigned-off-by: SteNicholas ","shortMessageHtmlLink":"[CELEBORN-1317][FOLLOWUP] Fix threadDump UT stuck issue"}},{"before":"d7c67512f3c2480a5ad8a8d34c77cf2dd5ffc671","after":"794ca7e1faa555e3829f67060776970032b15e38","ref":"refs/heads/branch-0.4","pushedAt":"2024-05-27T06:10:02.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"FMX","name":"Ethan Feng","path":"/FMX","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4150993?s=80&v=4"},"commit":{"message":"[CELEBORN-1380][FOLLOWUP] leveldbjni uses org.openlabtesting.leveldbjni to support linux aarch64 platform for leveldb via aarch64 profile\n\nDependency leveldbjni uses `org.openlabtesting.leveldbjni` to support linux aarch64 platform for leveldb via `aarch64` profile.\n\nFollow up #2476.\n\nCeleborn worker could not start on arm arch devices if db backend is `LevelDB`, which should support leveldbjni on the aarch64 platform.\n\naarch64 uses `org.openlabtesting.leveldbjni:leveldbjni-all.1.8`, and other platforms use `org.fusesource.leveldbjni:leveldbjni-all.1.8`. Meanwhile, because some hadoop dependencies packages are also depend on `org.fusesource.leveldbjni:leveldbjni-all`, but hadoop merge the similar change on trunk, details see\n[HADOOP-16614](https://issues.apache.org/jira/browse/HADOOP-16614), therefore it should exclude the dependency of `org.fusesource.leveldbjni` for these hadoop packages related.\n\nIn addtion, `org.openlabtesting.leveldbjni` requires glibc version 3.4.21. 
Otherwise, there will be the following potential runtime risks:\n\n```\n\n--------------- T H R E A D ---------------\n\nCurrent thread (0x00007f9308001000): JavaThread \"leveldb\" [_thread_in_native, id=878, stack(0x00007f9338cf0000,0x00007f93394f0000)]\n\nsiginfo: si_signo: 7 (SIGBUS), si_code: 2 (BUS_ADRERR), si_addr: 0x00007f97380d2220\n```\n\nBackport:\n\n- https://github.com/apache/spark/pull/26636\n- https://github.com/apache/spark/pull/31036\n\nNo.\n\nNo.\n\nCloses #2530 from SteNicholas/CELEBORN-1380.\n\nAuthored-by: SteNicholas \nSigned-off-by: mingji \n(cherry picked from commit dd87419044c8a855abd65f7a237f9f8e0d71a657)\nSigned-off-by: mingji ","shortMessageHtmlLink":"[CELEBORN-1380][FOLLOWUP] leveldbjni uses org.openlabtesting.leveldbj…"}},{"before":"4a3552103465365f216542a7e04c213ce43d20ed","after":"dd87419044c8a855abd65f7a237f9f8e0d71a657","ref":"refs/heads/main","pushedAt":"2024-05-27T06:07:20.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"FMX","name":"Ethan Feng","path":"/FMX","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4150993?s=80&v=4"},"commit":{"message":"[CELEBORN-1380][FOLLOWUP] leveldbjni uses org.openlabtesting.leveldbjni to support linux aarch64 platform for leveldb via aarch64 profile\n\n### What changes were proposed in this pull request?\n\nDependency leveldbjni uses `org.openlabtesting.leveldbjni` to support linux aarch64 platform for leveldb via `aarch64` profile.\n\nFollow up #2476.\n\n### Why are the changes needed?\n\nCeleborn worker could not start on arm arch devices if db backend is `LevelDB`, which should support leveldbjni on the aarch64 platform.\n\naarch64 uses `org.openlabtesting.leveldbjni:leveldbjni-all.1.8`, and other platforms use `org.fusesource.leveldbjni:leveldbjni-all.1.8`. 
Meanwhile, because some hadoop dependencies packages are also depend on `org.fusesource.leveldbjni:leveldbjni-all`, but hadoop merge the similar change on trunk, details see\n[HADOOP-16614](https://issues.apache.org/jira/browse/HADOOP-16614), therefore it should exclude the dependency of `org.fusesource.leveldbjni` for these hadoop packages related.\n\nIn addtion, `org.openlabtesting.leveldbjni` requires glibc version 3.4.21. Otherwise, there will be the following potential runtime risks:\n\n```\n#\n# A fatal error has been detected by the Java Runtime Environment:\n#\n# SIGBUS (0x7) at pc=0x00007fad3630b12a, pid=62, tid=0x00007f93394ef700\n#\n# JRE version: Java(TM) SE Runtime Environment (8.0_162-b12) (build 1.8.0_162-b12)\n# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.162-b12 mixed mode linux-amd64 )\n# Problematic frame:\n# C [libc.so.6+0x8412a]\n#\n# Core dump written. Default location: /data/service/celeborn/core or core.62\n#\n# If you would like to submit a bug report, please visit:\n# http://bugreport.java.com/bugreport/crash.jsp\n# The crash happened outside the Java Virtual Machine in native code.\n# See problematic frame for where to report the bug.\n#\n\n--------------- T H R E A D ---------------\n\nCurrent thread (0x00007f9308001000): JavaThread \"leveldb\" [_thread_in_native, id=878, stack(0x00007f9338cf0000,0x00007f93394f0000)]\n\nsiginfo: si_signo: 7 (SIGBUS), si_code: 2 (BUS_ADRERR), si_addr: 0x00007f97380d2220\n```\n\nBackport:\n\n- https://github.com/apache/spark/pull/26636\n- https://github.com/apache/spark/pull/31036\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\nNo.\n\nCloses #2530 from SteNicholas/CELEBORN-1380.\n\nAuthored-by: SteNicholas \nSigned-off-by: mingji ","shortMessageHtmlLink":"[CELEBORN-1380][FOLLOWUP] leveldbjni uses 
org.openlabtesting.leveldbj…"}},{"before":"74102d8e3e214418066c146e0c0bef7e0ef46262","after":"d7c67512f3c2480a5ad8a8d34c77cf2dd5ffc671","ref":"refs/heads/branch-0.4","pushedAt":"2024-05-24T15:28:38.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"SteNicholas","name":"Nicholas Jiang","path":"/SteNicholas","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/10048174?s=80&v=4"},"commit":{"message":"[MINOR] Remove incubator in vcs.xml\n\n### What changes were proposed in this pull request?\n\n### Why are the changes needed?\n\n### Does this PR introduce _any_ user-facing change?\n\n### How was this patch tested?\n\nCloses #2528 from cxzl25/minor_incubator_vcs.\n\nAuthored-by: sychen \nSigned-off-by: SteNicholas \n(cherry picked from commit 4a3552103465365f216542a7e04c213ce43d20ed)\nSigned-off-by: SteNicholas ","shortMessageHtmlLink":"[MINOR] Remove incubator in vcs.xml"}},{"before":"40ce49cbd5af6d15187fa3d0e7e4f5e71086782c","after":"4a3552103465365f216542a7e04c213ce43d20ed","ref":"refs/heads/main","pushedAt":"2024-05-24T15:28:09.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"SteNicholas","name":"Nicholas Jiang","path":"/SteNicholas","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/10048174?s=80&v=4"},"commit":{"message":"[MINOR] Remove incubator in vcs.xml\n\n### What changes were proposed in this pull request?\n\n### Why are the changes needed?\n\n### Does this PR introduce _any_ user-facing change?\n\n### How was this patch tested?\n\nCloses #2528 from cxzl25/minor_incubator_vcs.\n\nAuthored-by: sychen \nSigned-off-by: SteNicholas ","shortMessageHtmlLink":"[MINOR] Remove incubator in vcs.xml"}},{"before":"f527b22b4d579b5ccd37a5de29df391d7e42e0ac","after":"40ce49cbd5af6d15187fa3d0e7e4f5e71086782c","ref":"refs/heads/main","pushedAt":"2024-05-24T10:20:14.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"FMX","name":"Ethan 
Feng","path":"/FMX","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4150993?s=80&v=4"},"commit":{"message":"[CELEBORN-1438] Exclude celeborn-service_xx-test jar\n\n### What changes were proposed in this pull request?\nadd scope test for testjars\n\n### Why are the changes needed?\nRelease package should not include test jars\n\n### Does this PR introduce _any_ user-facing change?\nNo\n\n### How was this patch tested?\nPass GA\n\nCloses #2527 from RexXiong/CELEBORN-1438.\n\nAuthored-by: Shuang \nSigned-off-by: mingji ","shortMessageHtmlLink":"[CELEBORN-1438] Exclude celeborn-service_xx-test jar"}},{"before":"a2feb1287678491300f2ea7c2ae93a221f471e35","after":"74102d8e3e214418066c146e0c0bef7e0ef46262","ref":"refs/heads/branch-0.4","pushedAt":"2024-05-23T13:50:41.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"waitinfuture","name":"Keyong Zhou","path":"/waitinfuture","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/948245?s=80&v=4"},"commit":{"message":"[CELEBORN-1410] Combine multiple ShuffleBlockInfo into a single ShuffleBlockInfo\n\nMerging smaller `ShuffleBlockInfo` corresponding into same mapID, such that size of each block does not exceeds `celeborn.shuffle.chunk.size`\n\nAs sorted ShuffleBlocks are contiguous, we can compact multiple `ShuffleBlockInfo` into one as long as the size of compacted one does not exceeds half of `celeborn.shuffle.chunk.size`. 
This way we can decrease the number of ShuffleBlock objects.\n\nNo\n\nExisting UTs\n\nCloses #2524 from s0nskar/CELEBORN-1410.\n\nLead-authored-by: Sanskar Modi \nCo-authored-by: Fu Chen \nSigned-off-by: zky.zhoukeyong ","shortMessageHtmlLink":"[CELEBORN-1410] Combine multiple ShuffleBlockInfo into a single Shuff…"}},{"before":"89d56c9bbcef3249ff92d5e727ab1cbfd4a0057b","after":"f527b22b4d579b5ccd37a5de29df391d7e42e0ac","ref":"refs/heads/main","pushedAt":"2024-05-23T13:22:50.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"waitinfuture","name":"Keyong Zhou","path":"/waitinfuture","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/948245?s=80&v=4"},"commit":{"message":"[CELEBORN-1410] Combine multiple ShuffleBlockInfo into a single ShuffleBlockInfo\n\n### What changes were proposed in this pull request?\n\nMerging smaller `ShuffleBlockInfo` corresponding into same mapID, such that size of each block does not exceeds `celeborn.shuffle.chunk.size`\n\n### Why are the changes needed?\nAs sorted ShuffleBlocks are contiguous, we can compact multiple `ShuffleBlockInfo` into one as long as the size of compacted one does not exceeds half of `celeborn.shuffle.chunk.size`. 
This way we can decrease the number of ShuffleBlock objects.\n\n### Does this PR introduce _any_ user-facing change?\nNo\n\n### How was this patch tested?\nExisting UTs\n\nCloses #2524 from s0nskar/CELEBORN-1410.\n\nLead-authored-by: Sanskar Modi \nCo-authored-by: Fu Chen \nSigned-off-by: zky.zhoukeyong ","shortMessageHtmlLink":"[CELEBORN-1410] Combine multiple ShuffleBlockInfo into a single Shuff…"}},{"before":"308eed28c92263d7f96cce073c0c9916da5cb1bf","after":"89d56c9bbcef3249ff92d5e727ab1cbfd4a0057b","ref":"refs/heads/main","pushedAt":"2024-05-23T13:10:29.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"FMX","name":"Ethan Feng","path":"/FMX","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4150993?s=80&v=4"},"commit":{"message":"[CELEBORN-914] Support memory file storage\n\n### What changes were proposed in this pull request?\nTo support memory file storage.\n\n### Why are the changes needed?\nTo improve shuffle performance for small shuffle files.\n\nDesign doc: https://docs.google.com/document/d/1SM-oOM0JHEIoRHTYhE9PYH60_1D3NMxDR50LZIM7uW0/edit?usp=sharing\n\n### Does this PR introduce _any_ user-facing change?\nNo.\n\n### How was this patch tested?\nPass GA and manually test on a cluster.\n\nCloses #2300 from FMX/B914.\n\nAuthored-by: mingji \nSigned-off-by: mingji ","shortMessageHtmlLink":"[CELEBORN-914] Support memory file storage"}},{"before":"ba499704d1b058000ac20cffda93a6a39578eabe","after":"308eed28c92263d7f96cce073c0c9916da5cb1bf","ref":"refs/heads/main","pushedAt":"2024-05-23T08:06:16.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"SteNicholas","name":"Nicholas Jiang","path":"/SteNicholas","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/10048174?s=80&v=4"},"commit":{"message":"[CELEBORN-1427] Add Capacity metrics for Celeborn\n\n### What changes were proposed in this pull request?\nAs title\n\n### Why are the changes needed?\nThe Celeborn cluster does not currently provide metrics for 'TotalCapacity' and 
'TotalFreeCapacity\n\n### Does this PR introduce _any_ user-facing change?\nNo.\n\n### How was this patch tested?\nPass GA\n\nCloses #2521 from RexXiong/CELEBORN-1427.\n\nAuthored-by: Shuang \nSigned-off-by: SteNicholas ","shortMessageHtmlLink":"[CELEBORN-1427] Add Capacity metrics for Celeborn"}},{"before":"44e060733b3265e2d8e3f2f889b8ebf8ff318d92","after":"a2feb1287678491300f2ea7c2ae93a221f471e35","ref":"refs/heads/branch-0.4","pushedAt":"2024-05-22T08:37:02.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"FMX","name":"Ethan Feng","path":"/FMX","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4150993?s=80&v=4"},"commit":{"message":"[CELEBORN-1380][FOLLOWUP] Add org.openlabtesting.leveldbjni in BSD 3-clause of LICENSE-binary\n\n### What changes were proposed in this pull request?\n\nAdd `org.openlabtesting.leveldbjni` in BSD 3-clause of LICENSE-binary instead of `org.fusesource.leveldbjni`.\n\n### Why are the changes needed?\n\nBSD 3-clause of LICENSE-binary includes `org.fusesource.leveldbjni` which has been already changed to `org.openlabtesting.leveldbjni`.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\nNo.\n\nCloses #2526 from SteNicholas/CELEBORN-1380.\n\nAuthored-by: SteNicholas \nSigned-off-by: mingji \n(cherry picked from commit ba499704d1b058000ac20cffda93a6a39578eabe)\nSigned-off-by: mingji ","shortMessageHtmlLink":"[CELEBORN-1380][FOLLOWUP] Add org.openlabtesting.leveldbjni in BSD 3-…"}},{"before":"cd5609971f38b31b6f2cae6462c12bcb2c1aaa66","after":"ba499704d1b058000ac20cffda93a6a39578eabe","ref":"refs/heads/main","pushedAt":"2024-05-22T08:36:37.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"FMX","name":"Ethan Feng","path":"/FMX","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4150993?s=80&v=4"},"commit":{"message":"[CELEBORN-1380][FOLLOWUP] Add org.openlabtesting.leveldbjni in BSD 3-clause of LICENSE-binary\n\n### What changes were proposed in this pull 
request?\n\nAdd `org.openlabtesting.leveldbjni` in BSD 3-clause of LICENSE-binary instead of `org.fusesource.leveldbjni`.\n\n### Why are the changes needed?\n\nBSD 3-clause of LICENSE-binary includes `org.fusesource.leveldbjni` which has been already changed to `org.openlabtesting.leveldbjni`.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\nNo.\n\nCloses #2526 from SteNicholas/CELEBORN-1380.\n\nAuthored-by: SteNicholas \nSigned-off-by: mingji ","shortMessageHtmlLink":"[CELEBORN-1380][FOLLOWUP] Add org.openlabtesting.leveldbjni in BSD 3-…"}},{"before":"a13d16761772f638245a449187b0dc01e0b9ae12","after":"cd5609971f38b31b6f2cae6462c12bcb2c1aaa66","ref":"refs/heads/main","pushedAt":"2024-05-22T07:32:57.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"FMX","name":"Ethan Feng","path":"/FMX","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4150993?s=80&v=4"},"commit":{"message":"[CELEBORN-1434] Support MRAppMasterWithCeleborn to disable job recovery and job reduce slow start by default\n\n### What changes were proposed in this pull request?\n\n`MRAppMasterWithCeleborn` disables `yarn.app.mapreduce.am.job.recovery.enable` and sets `mapreduce.job.reduce.slowstart.completedmaps` to 1 by default.\n\n### Why are the changes needed?\n\nMapReduce does not set the flag which indicates whether to keep containers across application attempts in ApplicationSubmissionContext. Meanwhile, make sure reduces are scheduled only after all map are completed. 
Therefore, `MRAppMasterWithCeleborn` could disable `yarn.app.mapreduce.am.job.recovery.enable` and set `mapreduce.job.reduce.slowstart.completedmaps` to 1 by default.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\n`WordCountTest`\n\nCloses #2525 from SteNicholas/CELEBORN-1434.\n\nAuthored-by: SteNicholas \nSigned-off-by: mingji ","shortMessageHtmlLink":"[CELEBORN-1434] Support MRAppMasterWithCeleborn to disable job recove…"}},{"before":"8a10a2d465c5ec9d947128d9828bcd5bb1f6bfa8","after":"a13d16761772f638245a449187b0dc01e0b9ae12","ref":"refs/heads/main","pushedAt":"2024-05-17T09:08:18.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"RexXiong","name":"Shuang","path":"/RexXiong","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/28799061?s=80&v=4"},"commit":{"message":"[CELEBORN-1401] Add SSL support for ratis communication\n\n### What changes were proposed in this pull request?\n\nWhen SSL is enabled for master, secure the Ratis communication as well with TLS\n\n### Why are the changes needed?\n\nCurrently, when TLS is enabled for RPC, Ratis comms still goes in the clear - add support for TLS.\nNote that currently this only supports GRPC, and not netty.\n\n### Does this PR introduce _any_ user-facing change?\nSecures ratis communication when TLS is enabled at master for rpc.\n\n### How was this patch tested?\nLocal tests and additional unit tests added\n\nCloses #2515 from mridulm/CELEBORN-1401-add-ratis-ssl-support.\n\nAuthored-by: Mridul Muralidharan \nSigned-off-by: Shuang ","shortMessageHtmlLink":"[CELEBORN-1401] Add SSL support for ratis communication"}},{"before":"9908035ba890aab5d335d80e36333c47369ff67c","after":"8a10a2d465c5ec9d947128d9828bcd5bb1f6bfa8","ref":"refs/heads/main","pushedAt":"2024-05-17T06:06:42.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"SteNicholas","name":"Nicholas 
Jiang","path":"/SteNicholas","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/10048174?s=80&v=4"},"commit":{"message":"[CELEBORN-1421] Refine code in master to reduce unnecessary sync to get workers/lostworkers/shutdownWorkers\n\n### What changes were proposed in this pull request?\n\n1. Use ConcurrentSet to replace ArrayList for workers.\n2. Remove unnecessary sync and snapshot when get workers/lostworkers/shutdownWorkers\n\n### Why are the changes needed?\n\n1. Reduce unnecessary sync to get workers/lostworkers/shutdownWorkers.\n2. Somewhere in the Master, directly using statusSystem.workers(ArrayList) is not safe, potentially leading to concurrent modification issues.\n\n### Does this PR introduce _any_ user-facing change?\nNo\n\n### How was this patch tested?\nPass GA\n\nCloses #2507 from RexXiong/CELEBORN-1421.\n\nAuthored-by: Shuang \nSigned-off-by: SteNicholas ","shortMessageHtmlLink":"[CELEBORN-1421] Refine code in master to reduce unnecessary sync to g…"}},{"before":"5ea83f39bb9d13cca1976afc357a1476f5c0bf2e","after":"9908035ba890aab5d335d80e36333c47369ff67c","ref":"refs/heads/main","pushedAt":"2024-05-17T03:03:21.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"SteNicholas","name":"Nicholas Jiang","path":"/SteNicholas","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/10048174?s=80&v=4"},"commit":{"message":"[CELEBORN-1402] SparkShuffleManager print warning log for spark.executor.userClassPathFirst=true with ShuffleManager defined in user jar\n\n### What changes were proposed in this pull request?\n\n`SparkShuffleManager` print warning log for `spark.executor.userClassPathFirst=true` with `ShuffleManager` defined in user jar via `--jar` or `spark.jars`.\n\n### Why are the changes needed?\n\nWhen `spark.executor.userClassPathFirst` is enabled with ShuffleManager defined in user jar, the `ClassLoader` of `handle` is `ChildFirstURLClassLoader`, which is different from `CelebornShuffleHandle` of which the `ClassLoader` is 
`AppClassLoader` in `SparkShuffleManager#getWriter/getReader`. The local test log is as follows:\n\n```\n./bin/spark-sql --master yarn --deploy-mode client \\\n--conf spark.celeborn.master.endpoints=localhost:9099 \\\n--conf spark.executor.userClassPathFirst=true \\\n--conf spark.jars=/tmp/celeborn-client-spark-3-shaded_2.12-0.5.0-SNAPSHOT.jar \\\n--conf spark.shuffle.manager=org.apache.spark.shuffle.celeborn.SparkShuffleManager \\\n--conf spark.shuffle.service.enabled=false\n\n./bin/spark-sql --master yarn --deploy-mode client --jars /tmp/celeborn-client-spark-3-shaded_2.12-0.5.0-SNAPSHOT.jar \\\n--conf spark.celeborn.master.endpoints=localhost:9099 \\\n--conf spark.executor.userClassPathFirst=true \\\n--conf spark.shuffle.manager=org.apache.spark.shuffle.celeborn.SparkShuffleManager \\\n--conf spark.shuffle.service.enabled=false\n```\n```\n24/04/28 18:03:31 [Executor task launch worker for task 0.0 in stage 5.0 (TID 8)] WARN SparkShuffleManager: [getWriter] handle classloader: org.apache.spark.util.ChildFirstURLClassLoader, CelebornShuffleHandle classloader: sun.misc.Launcher$AppClassLoader\n```\n\nIt causes that `SparkShuffleManager` fallback to vanilla Spark `SortShuffleManager` for `spark.executor.userClassPathFirst=true` with `ShuffleManager` defined in user jar before https://github.com/apache/spark/pull/43627. 
After [SPARK-45762](https://issues.apache.org/jira/browse/SPARK-45762), the `ClassLoader` of `handle` and `CelebornShuffleHandle` are both `ChildFirstURLClassLoader`.\n\n```\n24/04/28 18:03:31 [Executor task launch worker for task 0.0 in stage 5.0 (TID 8)] WARN SparkShuffleManager: [getWriter] handle classloader: org.apache.spark.util.ChildFirstURLClassLoader, CelebornShuffleHandle classloader: org.apache.spark.util.ChildFirstURLClassLoader\n```\n\nTherefore, `SparkShuffleManager` should print warning log to remind for `spark.executor.userClassPathFirst=true` with `ShuffleManager` defined in user jar.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\nManual test.\n\nCloses #2482 from SteNicholas/CELEBORN-1402.\n\nAuthored-by: SteNicholas \nSigned-off-by: SteNicholas ","shortMessageHtmlLink":"[CELEBORN-1402] SparkShuffleManager print warning log for spark.execu…"}},{"before":"8875f20e727d107a6c65556935029ac0283236be","after":"5ea83f39bb9d13cca1976afc357a1476f5c0bf2e","ref":"refs/heads/main","pushedAt":"2024-05-16T12:16:52.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"pan3793","name":"Cheng Pan","path":"/pan3793","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/26535726?s=80&v=4"},"commit":{"message":"[CELEBORN-1416] Add CI for helm charts lint and test\n\n### What changes were proposed in this pull request?\n\n- Move the CI `tests/kubernetes-it/docker/helm/values.yaml` to `charts/celeborn/ci/values.yaml`, as this is a common convention, for example [prometheus-community/helm-charts](https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus/ci).\n- Add GitHub CI workflow to run lint and unit tests against helm charts.\n- Add `.helmignore` file to specify patterns for files and directories that should be ignored when packaging the chart.\n- Bump `actions/setup-helm` to `v4.2.0`\n- Bump `actions/setup-python` to `v5`\n- Bump `actions/setup-java` to `v4`\n- Bump 
`actions/checkout` to `v4`\n\n### Why are the changes needed?\n\n- CI/CD\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\nLocal test.\n\nCloses #2513 from ChenYi015/workflow/helm-charts.\n\nAuthored-by: Yi Chen \nSigned-off-by: Cheng Pan ","shortMessageHtmlLink":"[CELEBORN-1416] Add CI for helm charts lint and test"}},{"before":"b8647947c1552a95d3aeb97e336430fdab1c9136","after":"8875f20e727d107a6c65556935029ac0283236be","ref":"refs/heads/main","pushedAt":"2024-05-16T11:47:24.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"waitinfuture","name":"Keyong Zhou","path":"/waitinfuture","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/948245?s=80&v=4"},"commit":{"message":"[CELEBORN-1361] MaxInFlightPerWorker should use the value provided by PushStrategy\n\n### What changes were proposed in this pull request?\nThe data push thread should first send requests to workers that are not under pressure.\nUse PushStrategy's `currentMaxReqsInFlight` to better filter requests.\n\n### Why are the changes needed?\nPrevent blocking other requests\n\n### Does this PR introduce _any_ user-facing change?\n\n### How was this patch tested?\n\nCloses #2432 from mcdull-zhang/CELEBORN-1361.\n\nAuthored-by: mcdull-zhang \nSigned-off-by: zky.zhoukeyong ","shortMessageHtmlLink":"[CELEBORN-1361] MaxInFlightPerWorker should use the value provided by…"}},{"before":"8b7a2dac573e47f090a96a44e61403977101cded","after":"b8647947c1552a95d3aeb97e336430fdab1c9136","ref":"refs/heads/main","pushedAt":"2024-05-16T09:57:58.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"FMX","name":"Ethan Feng","path":"/FMX","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4150993?s=80&v=4"},"commit":{"message":"[CELEBORN-1430] TransportClientFactory should check whether handler is null when creating client\n\n### What changes were proposed in this pull request?\n\n`TransportClientFactory` checks whether `handler` is null when 
creating client.\n\n### Why are the changes needed?\n\nThere is a case that `cachedClient.isActive()` may return true and may return false when checked for the second time when another thread is closing the channel, which causes that the `handler` may be null. Therefore, `TransportClientFactory` should check whether handler is null when creating client.\n\nBackport https://github.com/apache/spark/pull/46506.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\nGA.\n\nCloses #2517 from SteNicholas/CELEBORN-1430.\n\nAuthored-by: SteNicholas \nSigned-off-by: mingji ","shortMessageHtmlLink":"[CELEBORN-1430] TransportClientFactory should check whether handler i…"}},{"before":"90db37b73346cb067cc08bbd710705d7fd6d6389","after":"8b7a2dac573e47f090a96a44e61403977101cded","ref":"refs/heads/main","pushedAt":"2024-05-16T07:30:44.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"FMX","name":"Ethan Feng","path":"/FMX","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4150993?s=80&v=4"},"commit":{"message":"[CELEBORN-1428] WrappedRpcResponseCallback should stop timer of PrimaryPushDataTime and ReplicaPushDataTime for failure\n\n### What changes were proposed in this pull request?\n\n`WrappedRpcResponseCallback` stops timer of `PrimaryPushDataTime` and `ReplicaPushDataTime` for failure.\n\n### Why are the changes needed?\n\n`WrappedRpcResponseCallback` does not stop timer of `PrimaryPushDataTime` and `ReplicaPushDataTime` for failure, which causes that the value of metric `PrimaryPushDataTime` and `ReplicaPushDataTime` is incorrect when failing to push data.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\nGA.\n\nCloses #2514 from SteNicholas/CELEBORN-1428.\n\nAuthored-by: SteNicholas \nSigned-off-by: mingji ","shortMessageHtmlLink":"[CELEBORN-1428] WrappedRpcResponseCallback should stop timer of 
Prima…"}},{"before":"cbaef742efec136cf4ed550223ad6d2d72f9301d","after":"90db37b73346cb067cc08bbd710705d7fd6d6389","ref":"refs/heads/main","pushedAt":"2024-05-16T01:43:50.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"waitinfuture","name":"Keyong Zhou","path":"/waitinfuture","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/948245?s=80&v=4"},"commit":{"message":"[CELEBORN-1422] Remove tmpRecords array when collecting written count metrics\n\n### What changes were proposed in this pull request?\nFor spark3 client, use a long variable to help to count written records instead of a `tmpRecords` array.\n\n### Why are the changes needed?\nThere is no need to use a array for spark3.\n\n### Does this PR introduce _any_ user-facing change?\nNo.\n\n### How was this patch tested?\nCluster test. Both shuffle writer count records correctly.\n\nCloses #2508 from onebox-li/remove_tmpRecords.\n\nAuthored-by: onebox-li \nSigned-off-by: zky.zhoukeyong ","shortMessageHtmlLink":"[CELEBORN-1422] Remove tmpRecords array when collecting written count…"}},{"before":"e66d509a956a80af83cb0eba9271a4f6a458da1f","after":"cbaef742efec136cf4ed550223ad6d2d72f9301d","ref":"refs/heads/main","pushedAt":"2024-05-16T01:33:09.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"waitinfuture","name":"Keyong Zhou","path":"/waitinfuture","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/948245?s=80&v=4"},"commit":{"message":"[CELEBORN-1424] Fix getChunk NPE when enable local read\n\n### What changes were proposed in this pull request?\nWhen build the open stream request, additionally check whether the host is equivalent to judge whether to read locally.\n\n### Why are the changes needed?\nWhen local read is enabled, batch open stream forgot to determine whether the hosts are equal, causing the worker to think it is a local read situation(stream state's `buffers` is null). 
When this is actually a remote read, NPE will be thrown during getChunk as below stack, and then a remote read retry will occur.\n```\n2024/05/14 11:07:43,933 WARN [data-client-5-7] TransportResponseHandler: Receive ChunkFetchFailure, errorMsg java.lang.NullPointerException\n\tat org.apache.celeborn.service.deploy.worker.storage.ChunkStreamManager.getChunk(ChunkStreamManager.java:85)\n\tat org.apache.celeborn.service.deploy.worker.FetchHandler.handleChunkFetchRequest(FetchHandler.scala:503)\n\tat org.apache.celeborn.service.deploy.worker.FetchHandler.handleRpcRequest(FetchHandler.scala:181)\n\tat org.apache.celeborn.service.deploy.worker.FetchHandler.receive(FetchHandler.scala:101)\n\tat org.apache.celeborn.common.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:96)\n\tat org.apache.celeborn.common.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:84)\n\tat org.apache.celeborn.common.network.server.TransportChannelHandler.channelRead(TransportChannelHandler.java:156)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)\n\tat io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)\n\tat io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:289)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)\n\tat io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)\n\tat org.apache.celeborn.common.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:74)\n\tat 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)\n\tat io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)\n\tat io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)\n\tat io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)\n\tat io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)\n\tat io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)\n\tat io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)\n\tat io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)\n\tat io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat java.lang.Thread.run(Thread.java:748)\n```\n\n### Does this PR introduce _any_ user-facing change?\nNo\n\n### How was this patch tested?\nManual test, and exception is gone.\n\nCloses #2510 from onebox-li/fix-local-read-npe.\n\nAuthored-by: onebox-li \nSigned-off-by: zky.zhoukeyong ","shortMessageHtmlLink":"[CELEBORN-1424] Fix getChunk NPE when enable local 
read"}},{"before":"c20536e5c53cb9d47069b36d159d92fccb3fb783","after":"e66d509a956a80af83cb0eba9271a4f6a458da1f","ref":"refs/heads/main","pushedAt":"2024-05-15T11:18:43.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"pan3793","name":"Cheng Pan","path":"/pan3793","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/26535726?s=80&v=4"},"commit":{"message":"[CELEBORN-1369][FOLLOWUP] Improve docs for shuffle fallback policy\n\n### What changes were proposed in this pull request?\n\nImprove docs for shuffle fallback policy\n\nRename a configuration\n\n```patch\n- celeborn.client.spark.shuffle.forceFallback.numPartitionsThreshold\n+ celeborn.client.spark.shuffle.fallback.numPartitionsThreshold\n````\n\n### Why are the changes needed?\n\nCanonicalize the words to \"spark built-in shuffle implementation\" everywhere.\n\nAnd `...forceFallback...` is confusing, use `...fallback...` with explicit docs instead.\n\n### Does this PR introduce _any_ user-facing change?\n\nDeprecate a configuration but still effective.\n\n### How was this patch tested?\n\nPass CI.\n\nCloses #2494 from pan3793/CELEBORN-1369-followup.\n\nLead-authored-by: Cheng Pan \nCo-authored-by: Fu Chen \nSigned-off-by: Cheng Pan ","shortMessageHtmlLink":"[CELEBORN-1369][FOLLOWUP] Improve docs for shuffle fallback policy"}},{"before":"6548759243813b878f2fccccf8cc3fd822019a59","after":"c20536e5c53cb9d47069b36d159d92fccb3fb783","ref":"refs/heads/main","pushedAt":"2024-05-15T11:17:36.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"pan3793","name":"Cheng Pan","path":"/pan3793","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/26535726?s=80&v=4"},"commit":{"message":"[CELEBORN-1425][HELM] Add helm chart unit tests to ensure manifests are rendered as expected\n\n### What changes were proposed in this pull request?\n\nAdd helm chart unit tests.\n\n### Why are the changes needed?\n\nUnit tests can make resource manifests are rendered as expected with various configurations.\n\n### 
Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\nDetailed information about how to run helm chart unit tests can be found here [helm-unittest/helm-unittest](https://github.com/helm-unittest/helm-unittest). First, you need to install helm unit test plugin:\n\n```shell\nhelm plugin install https://github.com/helm-unittest/helm-unittest.git\n```\n\nThen, run helm chart unitt tests as follows:\n\n```shell\n$ helm unittest charts/celeborn --file \"tests/**/*_test.yaml\" --strict --debug\nload_plugins.go:110: [info] file (/Users/chenyi/Library/helm/plugins/helm-acr/completion.yaml) not provided by plugin. No plugin auto-completion possible\n\n### Chart [ celeborn ] charts/celeborn\n\n PASS Test Celeborn configmap charts/celeborn/tests/configmap_test.yaml\n PASS Test Celeborn master pod monitor charts/celeborn/tests/master/podmonitor_test.yaml\n PASS Test Celeborn master priority class charts/celeborn/tests/master/priorityclass_test.yaml\n PASS Test Celeborn master service charts/celeborn/tests/master/service_test.yaml\n PASS Test Celeborn master statefulset charts/celeborn/tests/master/statefulset_test.yaml\n PASS Test Celeborn worker pod monitor charts/celeborn/tests/worker/podmonitor_test.yaml\n PASS Test Celeborn worker priority class charts/celeborn/tests/worker/priorityclass_test.yaml\n PASS Test Celeborn worker service charts/celeborn/tests/worker/service_test.yaml\n PASS Test Celeborn worker statefulset charts/celeborn/tests/worker/statefulset_test.yaml\n\nCharts: 1 passed, 1 total\nTest Suites: 9 passed, 9 total\nTests: 48 passed, 48 total\nSnapshot: 0 passed, 0 total\nTime: 183.011375ms\n\n```\n\nCloses #2511 from ChenYi015/helm-unittest.\n\nAuthored-by: Yi Chen \nSigned-off-by: Cheng Pan ","shortMessageHtmlLink":"[CELEBORN-1425][HELM] Add helm chart unit tests to ensure manifests 
a…"}},{"before":"06dccee770c07a78f1957f3e502da6ad3b48211f","after":"6548759243813b878f2fccccf8cc3fd822019a59","ref":"refs/heads/main","pushedAt":"2024-05-15T09:16:36.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"SteNicholas","name":"Nicholas Jiang","path":"/SteNicholas","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/10048174?s=80&v=4"},"commit":{"message":"[CELEBORN-1414] PartitionFilesSorter resolve DiskFileInfo without sorting lock\n\n### What changes were proposed in this pull request?\n\n`PartitionFilesSorter` calls `resolve` of `DiskFileInfo` without `sorting` lock to reduce the lock scope for performance improvement of `getSortedFileInfo`.\n\n### Why are the changes needed?\n\n`PartitionFilesSorter#resolve` is thread safe. Therefore, `PartitionFilesSorter` invokes `resolve` with `sorting` lock at present, which does not need to lock the resolving of `DiskFileInfo`. `PartitionFilesSorter` could resolve `DiskFileInfo` without `sorting` lock to improve performance of `getSortedFileInfo`.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\nGA.\n\nCloses #2498 from SteNicholas/CELEBORN-1414.\n\nAuthored-by: SteNicholas \nSigned-off-by: SteNicholas ","shortMessageHtmlLink":"[CELEBORN-1414] PartitionFilesSorter resolve DiskFileInfo without sor…"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAAEV9LXgQA","startCursor":null,"endCursor":null}},"title":"Activity · apache/celeborn"}