This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Releases: NVIDIA/thrust

Thrust 2.1.0

08 Mar 22:02

New Features

  • #1805: Add default constructors to transform_output_iterator and transform_input_output_iterator. Thanks to Mark Harris (@harrism) for this contribution.
  • #1836: Enable constructions of vectors from std::initializer_list.

Bug Fixes

  • #1768: Fix type conversion warning in the thrust::complex utilities. Thanks to Zishi Wu (@zishiwu123) for this contribution.
  • #1809: Fix some warnings about usage of __host__ functions in __device__ code.
  • #1825: Fix Thrust’s CMake install rules. Thanks to Robert Maynard (@robertmaynard) for this contribution.
  • #1827: Fix thrust::reduce_by_key when using non-default-initializable iterators.
  • #1832: Fix bug in device-side CDP thrust::reduce when using a large number of inputs.

Other Enhancements

  • #1815: Update Thrust’s libcu++ git submodule to version 1.8.1.
  • #1841: Fix invalid code in execution policy documentation example. Thanks to Raphaël Frantz (@Eren121) for this contribution.
  • #1848: Improve error messages when attempting to launch a kernel on a device that is not supported by compiled PTX versions. Thanks to Zahra Khatami (@zkhatami) for this contribution.
  • #1855: Remove usage of deprecated CUDA error codes.

Thrust 2.0.1

08 Mar 21:49

Other Enhancements

  • Disable CDP parallelization of device-side invocations of Thrust algorithms on SM90+. The removal of device-side synchronization support in recent architectures makes Thrust’s fork-join model unimplementable on device, so a serial implementation will be used instead. Host-side invocations of Thrust algorithms are not affected.

Thrust 1.17.2

13 Sep 17:03

Summary

Thrust 1.17.2 is a minor bugfix release that provides an updated version of CUB.

Thrust 2.0.0

15 Aug 16:53
786e5aa

Summary

The Thrust 2.0.0 major release adds a dependency on libcu++ and contains several breaking changes. These include new diagnostics when inspecting device-only lambdas from the host, removal of the cub symlink in the Thrust repository root, and removal of the deprecated THRUST_*_BACKEND macros. It also includes several minor bugfixes and cleanups.

Breaking Changes

  • #1605: Add libcu++ dependency.
    • A suitable version of libcu++ is provided through the ${THRUST_ROOT}/dependencies/libcudacxx/ submodule.
    • Non-CMake users may need to add the libcu++ include path to their builds (-I ${THRUST_ROOT}/dependencies/libcudacxx/include/).
    • The Thrust CMake packages have been updated to add this include path.
  • #1605: The following macros are no longer defined by default. They can be re-enabled by defining THRUST_PROVIDE_LEGACY_ARCH_MACROS. These will be removed completely in a future release.
    • THRUST_IS_HOST_CODE: Replace with NV_IF_TARGET.
    • THRUST_IS_DEVICE_CODE: Replace with NV_IF_TARGET.
    • THRUST_INCLUDE_HOST_CODE: Replace with NV_IF_TARGET.
    • THRUST_INCLUDE_DEVICE_CODE: Replace with NV_IF_TARGET.
    • THRUST_DEVICE_CODE: Replace with NV_IF_TARGET.
  • #1661: Thrust’s CUDA Runtime support macros have been updated to support NV_IF_TARGET. They are now defined consistently across all host/device compilation passes. This should not affect most usages of these macros, but may require changes for some edge cases.
    • THRUST_RUNTIME_FUNCTION: Execution space annotations for functions that invoke CUDA Runtime APIs.
      • Old behavior:
        • RDC enabled: Defined to __host__ __device__
        • RDC not enabled:
          • NVCC host pass: Defined to __host__ __device__
          • NVCC device pass: Defined to __host__
      • New behavior:
        • RDC enabled: Defined to __host__ __device__
        • RDC not enabled: Defined to __host__
    • __THRUST_HAS_CUDART__: No change in behavior, but no longer used in Thrust. Provided for legacy support only. Legacy behavior:
      • RDC enabled: Defined to 1.
      • RDC not enabled:
        • NVCC host pass: Defined to 1.
        • NVCC device pass: Defined to 0.
    • THRUST_RDC_ENABLED: New macro, may be combined with NV_IF_TARGET to replace most usages of __THRUST_HAS_CUDART__. Behavior:
      • RDC enabled: Macro is defined.
      • RDC not enabled: Macro is not defined.
  • #1701: Remove the cub symlink from the root of the Thrust repository.
    • This symlink caused issues in certain build environments (e.g. #1328).
    • Builds that relied on this symlink will need to add the full CUB include path (-I ${THRUST_ROOT}/dependencies/cub).
    • CMake builds that use the Thrust packages via CPM, add_subdirectory, or find_package are not affected.
  • #1760: A compile-time error is now emitted when a __device__-only lambda’s return type is queried from host code (requires libcu++ ≥ 1.9.0).
    • Due to limitations in the CUDA programming model, the result of this query is unreliable and can silently be incorrect, which leads to difficult-to-debug errors.
    • When using libcu++ 1.9.0, an error will be emitted with information about work-arounds:
      • Use a named function object with a __device__-only implementation of operator().
      • Use a __host__ __device__ lambda.
      • Use cuda::proclaim_return_type (Added in libcu++ 1.9.0)
  • #1761: Removed support for deprecated THRUST_DEVICE_BACKEND and THRUST_HOST_BACKEND macros. The THRUST_DEVICE_SYSTEM and THRUST_HOST_SYSTEM macros should be used instead.

Bug Fixes

  • #1605: Fix some execution space warnings in the allocator library.
  • #1683: Fix bug in iterator_category_to_traversal metafunctions.
  • #1715: Add missing __thrust_exec_check_disable__ annotation to thrust::make_zip_function. Thanks to @mfbalin for this contribution.
  • #1722: Remove CUDA-specific error handler from code that may be executed on non-CUDA backends. Thanks to @dkolsen-pgi for this contribution.
  • #1756: Fix copy_if for output iterators that don’t support copy assignment. Thanks to @mfbalin for this contribution.

Other Enhancements

  • #1605: Removed special case code for unsupported CUDA architectures.
  • #1605: Replace several usages of __CUDA_ARCH__ with <nv/target> to handle host/device code divergence.
  • #1752: Remove a leftover merge conflict from a documentation file. Thanks to @tabedzki for this contribution.

Thrust 1.17.1

15 Aug 16:18

Summary

Thrust 1.17.1 is a minor bugfix release that provides an updated version of CUB.

Thrust 1.17.0

09 May 18:06

Summary

Thrust 1.17.0 is the final minor release of the 1.X series. This release provides GDB pretty-printers for device vectors/references, a new unique_count algorithm, and an easier way to create tagged Thrust iterators. Several documentation fixes are included, which can be found on the new Thrust documentation site at https://nvidia.github.io/thrust. We’ll be migrating existing documentation sources to this new location over the next few months.

New Features

  • #1586: Add new thrust::make_tagged_iterator convenience function. Thanks to @karthikeyann for this contribution.
  • #1619: Add unique_count algorithm. Thanks to @upsj for this contribution.
  • #1631: Add GDB pretty-printers for device vectors/references to scripts/gdb-pretty-printers.py. Thanks to @upsj for this contribution.

Bug Fixes

  • #1671: Fixed reduce_by_key when called with 2^31 elements.

Other Enhancements

  • #1512: Use CUB to implement adjacent_difference.
  • #1555: Use CUB to implement scan_by_key.
  • #1611: Add new doxybook-based Thrust documentation at https://nvidia.github.io/thrust.
  • #1639: Fixed broken link in documentation. Thanks to @jrhemstad for this contribution.
  • #1644: Increase contrast of search input text in new doc site. Thanks to @bdice for this contribution.
  • #1647: Add __forceinline__ annotations to a functor wrapper. Thanks to @mkuron for this contribution.
  • #1660: Fixed typo in documentation example for permutation_iterator.
  • #1669: Add a new explicit_cuda_stream.cu example that shows how to use explicit CUDA streams and par/par_nosync execution policies.

Thrust 1.16.0

08 Feb 19:35

Summary

Thrust 1.16.0 provides a new “nosync” hint for the CUDA backend, as well as numerous bugfixes and stability improvements.

New thrust::cuda::par_nosync Execution Policy

Most of Thrust’s parallel algorithms are fully synchronous and will block the calling CPU thread until all work is completed. This design avoids many pitfalls associated with asynchronous GPU programming, resulting in simpler and less error-prone usage for new CUDA developers. Unfortunately, this improvement in user experience comes at a performance cost that often frustrates more experienced CUDA programmers.

Prior to this release, the only synchronous-to-asynchronous migration path for existing Thrust codebases involved significant refactoring, replacing calls to Thrust algorithms with a limited set of future-based thrust::async algorithms or lower-level CUB kernels. The new thrust::cuda::par_nosync execution policy provides a new, less-invasive entry point for asynchronous computation.

par_nosync is a hint to the Thrust execution engine that any non-essential internal synchronizations should be skipped and that an explicit synchronization will be performed by the caller before accessing results.

While some Thrust algorithms require internal synchronization to safely compute their results, many do not. For example, multiple thrust::for_each invocations can be launched without waiting for earlier calls to complete:

// Queue three `for_each` kernels:
thrust::for_each(thrust::cuda::par_nosync, vec1.begin(), vec1.end(), Op{});
thrust::for_each(thrust::cuda::par_nosync, vec2.begin(), vec2.end(), Op{});
thrust::for_each(thrust::cuda::par_nosync, vec3.begin(), vec3.end(), Op{});

// Do other work while kernels execute:
do_something();

// Must explicitly synchronize before accessing `for_each` results:
cudaDeviceSynchronize();

Thanks to @fkallen for this contribution.

Deprecation Notices

CUDA Dynamic Parallelism Support

A future version of Thrust will remove support for CUDA Dynamic Parallelism (CDP).

This will only affect calls to Thrust algorithms made from CUDA device-side code that currently launches a kernel; such calls will execute sequentially on the calling GPU thread instead of launching a device-wide kernel.

Breaking Changes

  • Thrust 1.14.0 included a change that aliased the cub namespace to thrust::cub. This has caused issues with ambiguous namespaces for projects that declare using namespace thrust; at global namespace scope. We recommend against this practice.
  • #1572: Removed several unnecessary header includes. Downstream projects may need to update their includes if they were relying on this behavior.

New Features

  • #1568: Add thrust::cuda::par_nosync policy. Thanks to @fkallen for this contribution.

Enhancements

  • #1511: Use CUB’s new DeviceMergeSort API and remove Thrust’s internal implementation.
  • #1566: Improved performance of thrust::shuffle. Thanks to @djns99 for this contribution.
  • #1584: Support user-defined CMAKE_INSTALL_INCLUDEDIR values in Thrust’s CMake install rules. Thanks to @robertmaynard for this contribution.

Bug Fixes

  • #1496: Fix some issues affecting icc builds.
  • #1552: Fix some collisions with the min/max macros defined in windows.h.
  • #1582: Fix issue with function type alias on 32-bit MSVC builds.
  • #1591: Work around an issue affecting compilation with nvc++.
  • #1597: Fix some collisions with the small macro defined in windows.h.
  • #1599, #1603: Fix some issues with version handling in Thrust’s CMake packages.
  • #1614: Clarify that scan algorithm results are non-deterministic for pseudo-associative operators (e.g. floating-point addition).
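
To illustrate why #1614 calls floating-point addition pseudo-associative, the sketch below sums the same three values with two different groupings. It uses plain C++ rather than Thrust, and the helper names are hypothetical; it only demonstrates the underlying arithmetic, not how Thrust schedules a scan.

```cpp
// Hypothetical helpers: the same three values summed with two groupings.

// Groups the first two operands: (a + b) + c.
inline double sum_grouped_left(double a, double b, double c)
{
  return (a + b) + c;
}

// Groups the last two operands: a + (b + c).
inline double sum_grouped_right(double a, double b, double c)
{
  return a + (b + c);
}
```

With a = 1e30, b = -1e30, and c = 1.0 under IEEE 754 double arithmetic (and without fast-math style optimizations), the left grouping yields 1.0, while the right grouping yields 0.0 because 1.0 is absorbed when added to -1e30. A parallel scan may combine partial results in different groupings from run to run, which is how such differences can surface.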

Thrust 1.15.0

25 Oct 19:07

Summary

Thrust 1.15.0 provides numerous bugfixes, including non-numeric thrust::sequence support, several MSVC-related compilation fixes, fewer conversion warnings, counting_iterator initialization, and documentation updates.

Deprecation Notices

A future version of Thrust will remove support for CUDA Dynamic Parallelism (CDP).

This will only affect calls to Thrust algorithms made from CUDA device-side code that currently launches a kernel; such calls will execute sequentially on the calling GPU thread instead of launching a device-wide kernel.

Bug Fixes

  • #1507: Allow thrust::sequence to work with non-numeric types. Thanks to Ben Jude (@bjude) for this contribution.
  • #1509: Avoid macro collision when calling max() on MSVC. Thanks to Thomas (@tomintheshell) for this contribution.
  • #1514: Initialize all members in counting_iterator's default constructor.
  • #1518: Fix std::allocator_traits on MSVC + C++17.
  • #1530: Fix several -Wconversion warnings. Thanks to Matt Stack (@matt-stack) for this contribution.
  • #1539: Fixed typo in thrust::for_each documentation. Thanks to Salman (@untamedimpala) for this contribution.
  • #1548: Avoid name collision with B0 macro in termios.h system header. Thanks to Philip Deegan (@PhilipDeegan) for this contribution.

Thrust 1.14.0 (NVIDIA HPC SDK 21.9)

24 Aug 18:39

Thrust 1.14.0 is a major release accompanying the NVIDIA HPC SDK 21.9.

This release adds the ability to wrap the thrust:: namespace in an external namespace, providing a workaround for a variety of shared library linking issues. Thrust also learned to detect when CUB's symbols are in a wrapped namespace and properly import them. To enable this feature, use #define THRUST_CUB_WRAPPED_NAMESPACE foo to wrap both Thrust and CUB in the foo:: namespace. See thrust/detail/config/namespace.h for details and more namespace options.

Several bugfixes are also included: The tuple_size and tuple_element helpers now support cv-qualified types. scan_by_key uses less memory. thrust::iterator_traits is better integrated with std::iterator_traits. See below for more details and references.

New Features

  • #1464: Add preprocessor hooks that allow thrust:: to be wrapped in an external namespace, and support cases when CUB is wrapped in an external namespace.

Bug Fixes

  • #1457: Support cv-qualified types in thrust::tuple_size and thrust::tuple_element. Thanks to Jake Hemstad for this contribution.
  • #1471: Fixed excessive memory allocation in scan_by_key. Thanks to Lilo Huang for this contribution.
  • #1476: Removed dead code from the expand example. Thanks to Lilo Huang for this contribution.
  • #1488: Fixed the path to the installed CUB headers in the CMake find_package configuration files.
  • #1491: Fallback to std::iterator_traits when no thrust::iterator_traits specialization exists for an iterator type. Thanks to Divye Gala for this contribution.

Thrust 1.13.1 (CUDA Toolkit 11.5)

25 Oct 19:02

Thrust 1.13.1 is a minor release accompanying the CUDA Toolkit 11.5.

This release provides a new hook for embedding the thrust:: namespace inside a custom namespace. This is intended to work around various issues related to linking multiple shared libraries that use Thrust. The existing CUB_NS_PREFIX and CUB_NS_POSTFIX macros already provided this capability for CUB; this update provides a simpler mechanism that is extended to and integrated with Thrust. Simply define THRUST_CUB_WRAPPED_NAMESPACE to a namespace name, and both thrust:: and cub:: will be placed inside the new namespace. Using different wrapped namespaces for each shared library will prevent issues like those reported in #1401.

New Features

  • #1464: Add THRUST_CUB_WRAPPED_NAMESPACE hooks.

Bug Fixes

  • #1488: Fix path to installed CUB in Thrust's CMake config files.