[Website]: Add nanoarrow 0.1 release post (#288)

Co-authored-by: David Li <li.davidm96@gmail.com>
apache · Mar 7, 2023 · a2c45ab · a2c45ab
1 parent 8f870fc
commit a2c45ab
Showing 1 changed file with 231 additions and 0 deletions.
diff --git a/_posts/2023-03-01-nanoarrow-0.1.0-release.md b/_posts/2023-03-01-nanoarrow-0.1.0-release.md
@@ -0,0 +1,231 @@
+---
+layout: post
+title: "Apache Arrow nanoarrow 0.1 Release"
+date: "2023-03-07 00:00:00"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+The Apache Arrow team is pleased to announce the 0.1.0 release of
+Apache Arrow nanoarrow. This initial release covers 31 resolved issues from
+6 contributors.
+
+## Release Highlights
+
+This initial release includes the following:
+
+- A [C library](#using-nanoarrow-in-c) bundled as two files (nanoarrow.c
+  and nanoarrow.h).
+- An [R package](#using-nanoarrow-in-c-r-and-python) providing bindings for users
+  of the R programming language.
+
+See the
+[Changelog](https://github.com/apache/arrow-nanoarrow/blob/apache-arrow-nanoarrow-0.1.0/CHANGELOG.md)
+for a detailed list of contributions leading up to the initial release.
+
+## Why nanoarrow?
+
+The adoption of the
+[Arrow C Data Interface](https://arrow.apache.org/docs/format/CDataInterface.html)
+and the [Arrow C Stream Interface](https://arrow.apache.org/docs/format/CStreamInterface.html)
+since their
+[introduction](https://arrow.apache.org/blog/2020/05/03/introducing-arrow-c-data-interface/)
+have been impressive and enthusiastic: not only have Arrow language bindings
+adopted the standard to pass data among themselves, a growing number of
+high-profile libraries like
+[GDAL](https://gdal.org/development/rfc/rfc86_column_oriented_api.html) and
+[DuckDB](https://duckdb.org/2021/12/03/duck-arrow.html) use the standard to
+improve performance and provide an ABI-stable interface to tabular input and output.
+
+GDAL and DuckDB are fortunate to have hard-working and forward-thinking maintainers
+that were motivated to provide support for the Arrow C Data and Stream interfaces
+even though the code to do so required an intimate knowledge of both the interface
+and the columnar specification on which it is based.
+
+The vision of [nanoarrow](https://github.com/apache/arrow-nanoarrow)
+is that it should be trivial for a library or application to implement an Arrow-based
+interface: if a library consumes or produces tabular data, Arrow should be the
+first place developers look. Developers shouldn't have to be familiar with the
+details of the columnar specification---nor should they have to take on any
+build-time dependencies---to get started.
+
+The [Arrow Database Connectivity (ADBC)](https://arrow.apache.org/docs/format/ADBC.html)
+specification is a good example of such a project, and provided a strong
+motivator for the development of nanoarrow: at the heart of ADBC is the
+idea of a core "driver manager" and database-specific drivers that are distributed
+as independent C/C++/Python/R/Java/Go projects. At least in R and Python,
+embedding an existing Arrow implementation (e.g., Arrow C++) is challenging
+in the context of multiple packages intended to be loaded into the same process.
+As of this writing, ADBC includes nanoarrow-based SQLite and PostgreSQL drivers
+and a nanoarrow-based validation suite for drivers.
+
+## Using nanoarrow in C
+
+The nanoarrow C library is distributed as
+[two files (nanoarrow.h and nanoarrow.c)](https://github.com/apache/arrow-nanoarrow/tree/main/dist)
+that can be copied and vendored into an existing code base. This results in
+a static library of about 50  KB and builds in less than a second. Some features
+that nanoarrow provides are:
+
+* [Helpers to create types, schemas, and metadata](https://apache.github.io/arrow-nanoarrow/dev/c.html#creating-schemas)
+* [Growable buffers](https://apache.github.io/arrow-nanoarrow/dev/c.html#owning-growable-buffers),
+  including the option for custom allocators/deallocators.
+* [Bitmap (i.e., bitpacked boolean) utilities](https://apache.github.io/arrow-nanoarrow/dev/c.html#bitmap-utilities)
+* An [API for building arrays from buffers](https://apache.github.io/arrow-nanoarrow/dev/c.html#creating-arrays)
+* An [API for building arrays element-wise](https://apache.github.io/arrow-nanoarrow/dev/c.html#creating-arrays)
+* An [API to extract elements element-wise](https://apache.github.io/arrow-nanoarrow/dev/c.html#reading-arrays)
+  from an existing array.
+
+For example, one can build an integer array element-wise:
+
+```c
+#include "nanoarrow.h"
+
+int make_simple_array(struct ArrowArray* array_out, struct ArrowSchema* schema_out) {
+  struct ArrowError error;
+  array_out->release = NULL;
+  schema_out->release = NULL;
+
+  NANOARROW_RETURN_NOT_OK(ArrowArrayInitFromType(array_out, NANOARROW_TYPE_INT32));
+
+  NANOARROW_RETURN_NOT_OK(ArrowArrayStartAppending(array_out));
+  NANOARROW_RETURN_NOT_OK(ArrowArrayAppendInt(array_out, 1));
+  NANOARROW_RETURN_NOT_OK(ArrowArrayAppendInt(array_out, 2));
+  NANOARROW_RETURN_NOT_OK(ArrowArrayAppendInt(array_out, 3));
+  NANOARROW_RETURN_NOT_OK(ArrowArrayFinishBuilding(array_out, &error));
+
+  NANOARROW_RETURN_NOT_OK(ArrowSchemaInitFromType(schema_out, NANOARROW_TYPE_INT32));
+
+  return NANOARROW_OK;
+}
+```
+
+Similarly, one can extract elements from an array:
+
+```c
+#include <stdio.h>
+#include "nanoarrow.h"
+
+int print_simple_array(struct ArrowArray* array, struct ArrowSchema* schema) {
+  struct ArrowError error;
+  struct ArrowArrayView array_view;
+  NANOARROW_RETURN_NOT_OK(ArrowArrayViewInitFromSchema(&array_view, schema, &error));
+
+  if (array_view.storage_type != NANOARROW_TYPE_INT32) {
+    printf("Array has storage that is not int32\n");
+  }
+
+  int result = ArrowArrayViewSetArray(&array_view, array, &error);
+  if (result != NANOARROW_OK) {
+    ArrowArrayViewReset(&array_view);
+    return result;
+  }
+
+  for (int64_t i = 0; i < array->length; i++) {
+    printf("%d\n", (int)ArrowArrayViewGetIntUnsafe(&array_view, i));
+  }
+
+  ArrowArrayViewReset(&array_view);
+  return NANOARROW_OK;
+}
+```
+
+## Using nanoarrow in C++, R, and Python
+
+Recognizing that many projects for which nanoarrow may be useful will have
+access a higher-level runtime than C, there are experiments to provide
+these users with a minimal set of useful tools.
+
+For C++ projects, an experimental
+["nanoarrow.hpp"](https://apache.github.io/arrow-nanoarrow/dev/cpp.html)
+interface provides `unique_ptr`-like wrappers for nanoarrow C objects to
+reduce the verbosity of using the nanoarrow API. For example, the previous
+`print_simple_array()` implementation would collapse to:
+
+```cpp
+#include <stdio.h>
+#include "nanoarrow.hpp"
+
+int print_simple_array2(struct ArrowArray* array, struct ArrowSchema* schema) {
+  struct ArrowError error;
+  nanoarrow::UniqueArrayView array_view;
+  NANOARROW_RETURN_NOT_OK(ArrowArrayViewInitFromSchema(array_view.get(), schema, &error));
+  NANOARROW_RETURN_NOT_OK(ArrowArrayViewSetArray(array_view.get(), array, &error));
+  for (int64_t i = 0; i < array->length; i++) {
+    printf("%d\n", (int)ArrowArrayViewGetIntUnsafe(array_view.get(), i));
+  }
+  return NANOARROW_OK;
+}
+```
+
+For R packages, experimental
+[R bindings](https://apache.github.io/arrow-nanoarrow/dev/r/index.html) provide
+a limited set of conversions between R vectors and Arrow arrays such that
+R bindings for a library with an Arrow-based interface do not need to provide
+this behaviour themselves. Additional features include printing and validating
+the content of the C structures at the heart of the C Data and C Stream
+interfaces to facilitate the development of bindings to Arrow-based libraries.
+
+```r
+# install.packages("remotes")
+remotes::install_github("apache/arrow-nanoarrow/r", build = FALSE)
+library(nanoarrow)
+
+as_nanoarrow_array(1:5)
+#> <nanoarrow_array int32[5]>
+#>  $ length    : int 5
+#>  $ null_count: int 0
+#>  $ offset    : int 0
+#>  $ buffers   :List of 2
+#>   ..$ :<nanoarrow_buffer_validity[0 b] at 0x0>
+#>   ..$ :<nanoarrow_buffer_data_int32[20 b] at 0x135d13c28>
+#>  $ dictionary: NULL
+#>  $ children  : list()
+```
+
+A [Python package skeleton](https://github.com/apache/arrow-nanoarrow/tree/main/python)
+exists in the nanoarrow repository and further functionality may be added once
+the C library interface has stabilized.
+
+## Try nanoarrow
+
+For any interested in giving nanoarrow a try, the easiest way to get started is to clone the
+[nanoarrow repository from GitHub](https://github.com/apache/arrow-nanoarrow/tree/apache-arrow-nanoarrow-0.1.0)
+and build/modify the
+[minimal CMake build example](https://github.com/apache/arrow-nanoarrow/tree/apache-arrow-nanoarrow-0.1.0/examples/cmake-minimal).
+For applied usage, one can refer to the
+[ADBC SQLite driver](https://github.com/apache/arrow-adbc/tree/main/c/driver/sqlite)
+and the [ADBC PostgreSQL driver](https://github.com/apache/arrow-adbc/tree/main/c/driver/postgresql).
+
+## Contributors
+
+This initial release consists of contributions from 6 contributors in addition
+to the invaluable advice and support of the Apache Arrow developer mailing list.
+
+```
+$ git shortlog -sn 8339114637919b661c1c8fae6764ceed532c935e..apache-arrow-nanoarrow-0.1.0 | grep -v "GitHub Actions"
+   100  Dewey Dunnington
+     7  David Li
+     2  Dirk Eddelbuettel
+     1  Dane Pitkin
+     1  Jonathan Keane
+     1  Joris Van den Bossche
+```