Releases: apache/orc
v2.0.1
Milestone
Branch
Improvements (tools)
- ORC-1644: Add merge tool to merge multiple ORC files into a single ORC file
- ORC-1647: Tips for supporting ORC in the convert command
- ORC-1667: Add check tool to check the index of the specified column
Bug Fix
- ORC-1646: Close the reader when reading the schema with the convert command
- ORC-1654: [C++] Count up EvaluatedRowGroupCount correctly
- ORC-1684: [C++] Find tzdb without TZDIR when in conda-environments
- ORC-1688: [C++] Do not access TZDB if there is no timestamp type
- ORC-1696: Fix ClassCastException when reading avro decimal type in benchmark
Task
- ORC-1649: [C++][Conan] Add 2.0.0 to conan recipe and update release guide
- ORC-1669: [C++] Deprecate HDFS support
- ORC-1686: [C++] Avoid using std::filesystem
Test
- ORC-1648: Add test to convert ORC in the convert command
- ORC-1663: [C++] Enable TestTimezone.testMissingTZDB on Windows
- ORC-1672: Remove test packages o.a.o.tools.check
- ORC-1673: Remove test packages o.a.o.tools.[count|merge|sizes]
- ORC-1676: Use Hive 4.0.0 in benchmark
- ORC-1681: Remove redundant import statement in tests to fix checkstyle failures
- ORC-1699: Fix SparkBenchmark in Parquet format according to SPARK-40918
- ORC-1704: Migration to Scala 2.13 of Apache Spark 3.5.1 at SparkBenchmark
- ORC-1707: Fix sun.util.calendar IllegalAccessException when SparkBenchmark runs on JDK17
- ORC-1708: Support data/compress options in Hive benchmark
Build and Dependency Changes
- ORC-1670: Upgrade zstd-jni to 1.5.6-1
- ORC-1679: Bump zstd-jni to 1.5.6-2
- ORC-1695: Upgrade gson to 2.10.1
- ORC-1698: Upgrade commons-cli to 1.7.0
- ORC-1705: Upgrade zstd-jni to 1.5.6-3
- ORC-1714: Bump commons-csv to 1.11.0
- ORC-1715: Bump org.objenesis:objenesis to 3.3
Documentation
- ORC-1668: Add merge command to Java tools documentation
v1.8.7
Milestone
Changelog
Bug
ORC-1528: Fix readBytes potential overflow in RecordReaderUtils.ChunkReader#create
ORC-1602: [C++] limit compression block size
Test
ORC-1556: Add Rocky Linux 9 Docker Test
ORC-1557: Add GitHub Action CI for Docker Test
ORC-1560: Remove Java11 and clang variants from docker/os-list.txt in branch-1.8
ORC-1562: Bump guava to 33.0.0-jre
ORC-1578: Fix SparkBenchmark on sales data according to SPARK-40918
ORC-1621: Switch to oraclelinux9 from rocky9
Documentation
ORC-1536: Remove hive-storage-api link from maven-javadoc-plugin
ORC-1563: Fix orc.bloom.filter.fpp default value and orc.compress notes of Spark and Hive config docs
v1.9.3
Milestone
Changelog
BugFix
- ORC-634 Fix the json output for double NaN and infinite
- ORC-1553 Reading information from Row group, where there are 0 records of SArg column
- ORC-1563 Fix orc.bloom.filter.fpp default value and orc.compress notes of Spark and Hive config docs
- ORC-1578 Fix SparkBenchmark according to SPARK-40918
- ORC-1586 Fix IllegalAccessError when SparkBenchmark runs on JDK17
- ORC-1602 [C++] limit compression block size
- ORC-1607 Fix testDoubleNaNAndInfinite to use TestFileDump.checkOutput
- ORC-1609 Fix the compilation problem of TestJsonFileDump in branch 1.9
Test
- ORC-1556 Add Rocky Linux 9 Docker Test
- ORC-1557 Add GitHub Action CI for Docker Test
- ORC-1559 Remove Java11 and clang variants from docker/os-list.txt from branch-1.9
Task
- ORC-1532 Upgrade opencsv to 5.9
- ORC-1536 Remove hive-storage-api link from maven-javadoc-plugin
- ORC-1576 Upgrade spark.jackson.version to 2.15.2 in bench module
- ORC-1591 Lower log level from INFO to DEBUG in *ReaderImpl/WriterImpl/PhysicalFsWriter
- ORC-1592 Suppress KeyProvider missing log
- ORC-1616 Upgrade aircompressor to 0.26
- ORC-1618 Disable building tests for snappy
Documentation
- ORC-1535 Remove generated Java docs from source tree
v2.0.0
Milestone
Branch
This is a new major release, for which we cannot provide a full changelog.
Summary of notable changes
ORC-1547: Spin-off ORC Format
ORC-1572: Use Apache ORC Format 1.0.0
ORC-1507: Support Java 21
ORC-1512: Drop Java 8/11 and make Java 17 by default
ORC-1577: Use ZSTD as the default compression
ORC-1430: Use Hadoop 3.3.5 shaded clients
ORC-1456: Update Hadoop to 3.3.6
ORC-1251: Use Hadoop Vectored IO
ORC-1463: Support brotli codec
ORC-1100: Support vcpkg
ORC-1620: Add Apple Silicon Test Coverage
New Feature
ORC-998: Refactor compression output buffer within OutStream for better portability
ORC-1088: Support ZSTD_JNI and column compress to set compression level
ORC-1100: Support vcpkg
ORC-1251: Use Hadoop Vectored IO
ORC-1387: [C++] Support schema evolution from decimal to numeric/decimal
ORC-1440: Check for protobuf config based module
ORC-1463: Support brotli codec
ORC-1507: Use Zulu JDK distribution and switch from 21-ea to 21
ORC-1512: Drop Java 8/11 and make Java 17 by default
ORC-1531: Create orc-format module and repo
ORC-1545: Use orc-format 1.0.0-SNAPSHOT
ORC-1546: Use orc-format 1.0.0-alpha
ORC-1547: Spin-off ORC Format
ORC-1551: Use orc-format 1.0.0-beta
ORC-1572: Use Apache ORC Format 1.0.0
ORC-1585: [C++] Add orc-format_ep as a dependency of orc
Improvement
ORC-1459: Mark DataBuffer::size() and DataBuffer::capacity() as const
ORC-1460: specification: Clarify how dictionary entries are sorted
ORC-1461: Mark Int128::getHighBits() and Int128::getLowBits() as const
ORC-1472: Replace deprecated method in TestMurmur3.java
ORC-1479: Enhance example usage message to use Uber jar
ORC-1481: [C++] Better error message when TZDB is unavailable
ORC-1504: Add lower bound check in get API for DynamicIntArray
ORC-1506: Replacing deprecated valueOf() with recommended forNumber()
ORC-1509: Auto grant contributor role to first-time contributors
ORC-1520: Remove JDK 8 settings from pom
ORC-1567: Add the -ignoreExtension configuration to the sizes and count commands of orc-tools
ORC-1570: Add supportVectoredIO API to HadoopShimsCurrent and use it
ORC-1571: Supports displaying raw data size in the meta command of orc-tools
ORC-1577: Use ZSTD as the default compression
ORC-1580: Change default DataBuffer constructor to use reserve instead of resize
ORC-1595: Add a short-cut to skip tiny inputs for ZstdCodec.compress
ORC-1596: Remove redundant Zstd.isError JNI usage
ORC-1597: Set bloom filter fpp to 1%
ORC-1600: Reduce getStaticMemoryManager sync block in OrcFile
ORC-1601: Reduce get HadoopShims sync block in HadoopShimsFactory
ORC-1610: Reduce the number of hash computation in CuckooSetBytes
ORC-1613: Zstd decompression supports direct buffer
ORC-1631: Supports summary output in sizes command
ORC-1637: [C++] Port conan recipe from upstream conan center
ORC-1638: Avoid System.exit(0) in count command
ORC-1639: [C++] Reduce unnecessary compiler flags in CMake
ORC-1641: Remove sourceFileExcludes from maven-javadoc-plugin
ORC-1642: Avoid System.exit(0) in scan command
ORC-1593: Set orc.compression.zstd.level to 3 by default
Bug Fix
ORC-634: Fix the json output for double NaN and infinite
ORC-1455: [C++] Fix build failure on non-x86 with unused macro in CpuInfoUtil.cc
ORC-1473: Zero-copy zeroCopyReadRanges and releaseBuffer bugs
ORC-1476: Maven build fail with unsupported platform: protoc-3.17.3-osx-aarch_64.exe
ORC-1480: [C++] Build failed when the BUILD_CPP_ENABLE_METRICS is ON
ORC-1500: [C++] The partition field does not support English special characters
ORC-1528: When using the orc.min.disk.seek.size configuration to read extremely large ORC files, a java.nio.BufferOverflowException may occur.
ORC-1553: Reading information from Row group, where there are 0 records of SArg column
ORC-1563: Fix orc.bloom.filter.fpp default value and orc.compress notes of Spark and Hive config docs
ORC-1568: Use readDiskRanges if orc.use.zerocopy is enabled
ORC-1575: Use ASF Archive URL instead of Download URL
ORC-1578: Fix SparkBenchmark according to SPARK-40918
ORC-1588: Fix incorrect Decimal assert in LeafFilterFactory
ORC-1602: [C++] limit compression block size
Task
ORC-1422: Setting version to 2.0.0-SNAPSHOT
ORC-1434: Remove org.apache.hadoop from dependabot.yml
ORC-1484: Use JIRA_ACCESS_TOKEN in merge_orc_pr.py
ORC-1485: Enable checkstyle checks for test classes
ORC-1486: Fix checkstyle violations for tests in orc-core module
ORC-1492: Fix checkstyle violations for tests in mapreduce, tools, bench modules
ORC-1496: Use iterator to suggest backporting branches
ORC-1515: Skip publishing orc-example module
ORC-1516: Fix minor typo in comments in IOUtils
ORC-1518: Remove findbugs folders
ORC-1529: Fix minor typos in pom.xml
ORC-1530: Rename variables in RecordReaderUtils.ChunkReader#create
ORC-1535: Remove generated Java docs from source tree
ORC-1536: Remove hive-storage-api link from maven-javadoc-plugin
ORC-1540: Remove MacOS 11 from GitHub Action CI
ORC-1542: Use Pattern Matching for instanceof (JEP-394)
ORC-1549: Update libhdfspp.tar.gz by adding #include <cstdint>
ORC-1569: Remove HadoopShimsPre2_3, HadoopShimsPre2_6, HadoopShimsPre2_7 classes
ORC-1579: Add ASF Generative Tooling Guidance to PR template
ORC-1591: Lower log level from INFO ...
v1.9.2
Milestone
Changelog
Bug
ORC-1475: [C++] Fix the failure of UT when char is unsigned
ORC-1480: [C++] Fix build break w/ BUILD_CPP_ENABLE_METRICS=ON
ORC-1482: Adaptation to read ORC files created by CUDF
ORC-1489: Assign a writer id to CUDF
ORC-1525: Fix bad read in RleDecoderV2::readByte
Test
ORC-1431: Use parquet to 1.13.1 in bench module
ORC-1454: Update Spark to 3.4.1
ORC-1487: Enable checkstyle on src/test with checkstyle-suppressions.xml
ORC-1498: Add Debian 12 Docker test
ORC-1502: Upgrade Maven to 3.9.4
ORC-1505: Upgrade Spark to 3.5.0
ORC-1511: Bump Avro to 1.11.3 in bench module
ORC-1513: Upgrade snappy-java to 1.1.10.4 in bench module
ORC-1517: Bump snappy-java to 1.1.10.5 in bench module
Task
ORC-1497: Bump maven-enforcer-plugin to 3.4.0
ORC-1499: Add MacOS 13 and 14 to building.md
ORC-1507: Use Zulu JDK distribution and switch from 21-ea to 21
ORC-1518: Remove findbugs folders
Documentation
ORC-1503: Updated README.md with Maven version 3.9.4
v1.8.6
v1.7.10
v1.8.5
v1.9.1
Milestone
Changelog
Bug
- ORC-1455 Fix build failure on non-x86 with unused macro in CpuInfoUtil.cc
- ORC-1457 Fix ambiguous overload of Type::createRowBatch
- ORC-1462 Bump aircompressor to 0.25 to fix JDK-8081450
Test
v1.9.0
Milestone
Changelog
New Feature and Notable Changes
- ORC-961 Expose metrics of the reader
- ORC-1167 Support orc.row.batch.size configuration
- ORC-1252 Expose io metrics for write operation
- ORC-1301 Enforce C++17
- ORC-1310 allowlist Support for plugin filter
- ORC-1356 Use Intel AVX-512 instructions to accelerate the Rle-bit-packing decode
- ORC-1385 Support schema evolution from numeric to numeric
- ORC-1386 Support schema evolution from primitive to string group/decimal/timestamp
Improvement
- ORC-827 Utilize Array copyOf
- ORC-1170 Optimize the RowReader::seekToRow function
- ORC-1232 Disable metrics collector by default
- ORC-1278 Update Readme.md cmake to 3.12
- ORC-1279 Update cmake version
- ORC-1286 Replace DataBuffer with BlockBuffer in the BufferedOutputStream
- ORC-1298 Support dedicated ColumnVectorBatch of numeric types
- ORC-1302 Upgrade Github workflow to build on Windows
- ORC-1306 Fixed indented code style for Java modules
- ORC-1307 Add coding style enforcement
- ORC-1314 Remove macros defined before C++11
- ORC-1347 Use make_unique and make_shared when creating unique_ptr and shared_ptr
- ORC-1348 TimezoneImpl constructor should pass std::vector<> & instead of std::vector<>
- ORC-1349 Remove useless bufStream definition
- ORC-1352 Remove ORC_[NOEXCEPT|NULLPTR|OVERRIDE|UNIQUE_PTR] macro usages
- ORC-1355 Writer::addUserMetadata change parameter to reference
- ORC-1373 Add log when DynamicByteArray length overflow
- ORC-1401 Allow writing an intermediate footer
- ORC-1421 Use PyArrow 12.0.0 in document
Bug
- ORC-1225 Bump maven-assembly-plugin to 3.4.2
- ORC-1266 DecimalColumnVector resets the isRepeating flag in the nextVector method
- ORC-1273 Bump opencsv to 5.7.0
- ORC-1297 Bump opencsv to 5.7.1
- ORC-1304 throw ParseError when using SearchArgument with nested struct
- ORC-1315 Byte to integer conversions fail on platforms with unsigned char type
- ORC-1320 Fix build break of C++ code on docker images
- ORC-1363 Upgrade zookeeper to 3.8.1
- ORC-1368 Bump commons-csv to 1.10.0
- ORC-1398 Bump aircompressor to 0.24
- ORC-1399 Fix boolean type with useTightNumericVector enabled
- ORC-1433 Fix comment in the Vector.hh
- ORC-1447 Fix a bug in CpuInfoUtil.cc to support ARM platform
- ORC-1449 Add -Wno-unused-macros for Clang 14.0
- ORC-1450 Stop enforcing override keyword
- ORC-1453 Fix fall-through warning cases
Task
- ORC-1164 Setting version to 1.9.0-SNAPSHOT
- ORC-1218 Bump apache pom to 27
- ORC-1219 Remove redundant toString
- ORC-1237 Remove a wrong image link to article-footer.png
- ORC-1239 Upgrade maven-shade-plugin to 3.3.0
- ORC-1256 Publish test-jar to maven central
- ORC-1259 Bump slf4j to 2.0.0
- ORC-1269 Remove FindBugs
- ORC-1270 Move opencsv dependency to the tools module.
- ORC-1274 Add a checkstyle rule to ban starting LAND and LOR
- ORC-1275 Bump maven-jar-plugin to 3.3.0
- ORC-1276 Bump slf4j to 2.0.1
- ORC-1277 Bump maven-shade-plugin to 3.4.0
- ORC-1284 Add permissions to GitHub Action labeler
- ORC-1296 Bump reproducible-build-maven-plugin to 0.16
- ORC-1311 Bump maven-shade-plugin to 3.4.1
- ORC-1316 Bump slf4j.version to 2.0.4
- ORC-1334 Bump slf4j.version to 2.0.6
- ORC-1335 Bump netty-all to 4.1.86.Final
- ORC-1351 Update PR Labeler definition
- ORC-1358 Use spotless to format pom files
- ORC-1371 Remove unsupported SLF4J bindings from classpath
- ORC-1372 Bump zstd to v1.5.4
- ORC-1375 Cancel old running ci tasks when a pr has a new commit
- ORC-1377 Enforce override keyword
- ORC-1383 Upgrade aircompressor to 0.22
- ORC-1395 Enforce license check
- ORC-1396 Bump slf4j to 2.0.7
- ORC-1410 Bump zstd to v1.5.5
- ORC-1411 Remove Ubuntu18.04 from docker-based tests
- ORC-1419 Bump protobuf-java to 3.22.3
- ORC-1428 Setup GitHub Action CI on branch-1.9
- ORC-1443 Enforce Java version
- ORC-1444 Enforce JDK Bytecode version
- ORC-1446 Publish snapshot from branch-1.9
Test
- ORC-1231 Update supported OS list in building.md
- ORC-1233 Bump junit to 5.9.0
- ORC-1234 Upgrade objenesis to 3.2 in Spark benchmark
- ORC-1235 Bump avro to 1.11.1
- ORC-1240 Update site README to use apache/orc-dev
- ORC-1241 Use apache/orc-dev DockerHub repository in Docker tests
- ORC-1250 Bump mockito to 4.7.0
- ORC-1254 Add spotbugs check
- ORC-1258 Bump byte-buddy to 1.12.14
- ORC-1262 Bump maven-checkstyle-plugin to 3.2.0
- ORC-1265 Upgrade spotbugs to 4.7.2
- ORC-1267 Bump mockito to 4.8.0
- ORC-1271 Bump spotbugs-maven-plugin to 4.7.2.0
- ORC-1272 Bump byte-buddy to 1.12.16
- ORC-1300 Update Spark to 3.3.1 and its dependencies
- ORC-1303 Upgrade GoogleTest to 1.12.1
- ORC-1318 Upgrade mockito.version to 4.9.0
- ORC-1319 Upgrade byte-buddy to 1.12.19
- ORC-1321 Bump checkstyle to 10.5.0
- ORC-1322 Upgrade centos7 docker image to use gcc9
- ORC-1324 Use Java 19 instead of 18 in GHA
- ORC-1333 Bump mockito to 4.10.0
- ORC-1341 Bump mockito to 4.11.0
- ORC-1353 Bump byte-buddy to 1.12.21
- ORC-1359 Bump byte-buddy to 1.12.22
- ORC-1366 Bump checkstyle to 10.7.0
- ORC-1367 Bump maven-enforcer-plugin to 3.2.1
- ORC-1369 Bump byte-buddy to 1.12.23
- ORC-1370 Bump snappy-java to 1.1.9.1
- ORC-1374 Update Spark to 3.3.2
- ORC-1378 Add slf4j impl to avoid warning message in example module
- ORC-1379 Upgrade spotbugs to 4.7.3.2
- ORC-1380 Upgrade checkstyle to 10.8.0
- ORC-1394 Bump maven-assembly-plugin to 3.5.0
- ORC-1397 Bump checkstyle to 10.9.2
- ORC-1405 Bump spotbugs-maven-plugin to 4.7.3.4
- ORC-1406 Bump maven-enforcer-plugin to 3.3.0
- ORC-1408 Add testVectorBatchHasNull test case and comment
- ORC-1415 Add Java 20 to GitHub Action CI
- ORC-1417 Bump checkstyle to 10.10.0
- ORC-1418 Bump junit to 5.9.3
- ORC-1426 Use Java 21-ea instead of 20 in GitHub Action
- ORC-1435 Bump maven-checkstyle-plugin to 3.3.0
- ORC-1436 Bump snappy-java to 1.1.10.0
- ORC-1452 Use the latest OS versions in variant tests