#######################################
# Thrust v1.9.5 (CUDA 10.1 Update 1) #
#######################################

Summary
    Thrust v1.9.5 is a minor release accompanying the CUDA 10.1 Update 1 CUDA Toolkit release.

Bug Fixes
    2502854 Assignment of complex vector between host and device fails to compile in CUDA >=9.1 with GCC 6

#######################################
# Thrust v1.9.4 (CUDA 10.1) #
#######################################

Summary
    Thrust v1.9.4 adds asynchronous interfaces for parallel algorithms, a new allocator system including caching allocators and unified memory support, as well as a variety of other enhancements, mostly related to C++11/C++14/C++17/C++20 support.
    The new asynchronous algorithms in the `thrust::async` namespace return `thrust::event` or `thrust::future` objects, which can be waited upon to synchronize with the completion of the parallel operation.

New Features
    `thrust::event` and `thrust::future`, uniquely-owned asynchronous handles consisting of a state (ready or not ready), content (some value; for `thrust::future` only), and an optional set of objects that should be destroyed only when the future's value is ready and has been consumed.
        The design is loosely based on C++11's `std::future`.
        They can be `.wait`'d on, and the value of a future can be waited on and retrieved with `.get` or `.extract`.
        Multiple `thrust::event`s and `thrust::future`s can be combined with `thrust::when_all`.
        `thrust::future`s can be converted to `thrust::event`s.
        Currently, these primitives are only implemented for the CUDA backend and are C++11 only.
    New asynchronous algorithms that return `thrust::event`/`thrust::future`s, implemented as C++20 range style customization points:
        `thrust::async::reduce`.
        `thrust::async::reduce_into`, which takes a target location to store the reduction result into.
        `thrust::async::copy`, including a two-policy overload that allows explicit cross-system copies to which execution policy properties can be attached.
        `thrust::async::transform`.
        `thrust::async::for_each`.
        `thrust::async::stable_sort`.
        `thrust::async::sort`.
        By default the asynchronous algorithms use the new caching allocators. Deallocation of temporary storage is deferred until the destruction of the returned `thrust::future`. The content of `thrust::future`s is stored in either device or universal memory and transferred to the host only upon request to prevent unnecessary data migration.
        Asynchronous algorithms are currently only implemented for the CUDA system and are C++11 only.
    `exec.after(f, g, ...)`, a new execution policy method that takes a set of `thrust::event`/`thrust::future`s that operations on the returned execution policy should depend upon.
    New logic and mindset for the type requirements for cross-system sequence copies (currently only used by `thrust::async::copy`), based on:
        `thrust::is_contiguous_iterator` and `THRUST_PROCLAIM_CONTIGUOUS_ITERATOR` for detecting/indicating that an iterator points to contiguous storage.
        `thrust::is_trivially_relocatable` and `THRUST_PROCLAIM_TRIVIALLY_RELOCATABLE` for detecting/indicating that a type is `memcpy`able (based on principles from https://wg21.link/P1144).
        The new approach reduces buffering, increases performance, and increases correctness.
        The fast path is now enabled when copying fp16 and CUDA vector types with `thrust::async::copy`.
    All Thrust synchronous algorithms for the CUDA backend now actually synchronize.
        Previously, any algorithm that did not allocate temporary storage (counterexample: `thrust::sort`) and did not have a computation-dependent result (counterexample: `thrust::reduce`) would actually be launched asynchronously.
        Additionally, synchronous algorithms that allocated temporary storage would become asynchronous if a custom allocator was supplied that did not synchronize on allocation/deallocation, unlike `cudaMalloc`/`cudaFree`.
        Now `thrust::for_each`, `thrust::transform`, `thrust::sort`, etc. are truly synchronous.
        In some cases this may be a performance regression; if you need asynchrony, use the new asynchronous algorithms.
    Thrust's allocator framework has been rewritten. It now uses a memory resource system, similar to C++17's `std::pmr` but supporting static polymorphism. In this new model, memory resources are objects that allocate untyped storage, and allocators are cheap handles to memory resources. The new facilities live in ``.
        `thrust::mr::memory_resource`, the memory resource base class, which takes a (possibly tagged) pointer-to-`void` type as a template parameter.
        `thrust::mr::allocator`, an allocator backed by a memory resource object.
        `thrust::mr::polymorphic_adaptor_resource`, a type-erased memory resource adaptor.
        `thrust::mr::polymorphic_allocator`, a C++17-style polymorphic allocator backed by a type-erased memory resource object.
        New tunable C++17-style caching memory resources, `thrust::mr::(disjoint_)?(un)?synchronized_pool_resource`, designed to cache both small object allocations and large repetitive temporary allocations. The disjoint variants use separate storage for management of the pool, which is necessary if the memory being allocated cannot be accessed on the host (e.g. device memory).
        System-specific allocators were rewritten to use the new memory resource framework.
            New `thrust::device_memory_resource` for allocating device memory.
            New `thrust::universal_memory_resource` for allocating memory that can be accessed from both the host and device (e.g. `cudaMallocManaged`).
            New `thrust::universal_host_pinned_memory_resource` for allocating memory that can be accessed from the host and the device but always resides in host memory (e.g. `cudaMallocHost`).
        `thrust::get_per_device_resource` and `thrust::per_device_allocator`, which lazily create and retrieve a per-device singleton memory resource.
        Rebinding mechanisms (`rebind_traits` and `rebind_alloc`) for `thrust::allocator_traits`.
        `thrust::device_make_unique`, a factory function for creating a `std::unique_ptr` to a newly allocated object in device memory.
        ``, a C++11 implementation of the C++17 uninitialized memory algorithms.
        `thrust::allocate_unique` and friends, based on the proposed C++23 `std::allocate_unique` (https://wg21.link/P0211).
    New type traits and metaprogramming facilities. Type traits are slowly being migrated out of `thrust::detail::` and ``; their new home will be `thrust::` and ``.
        `thrust::is_execution_policy`.
        `thrust::is_operator_less_or_greater_function_object`, which detects `thrust::less`, `thrust::greater`, `std::less`, and `std::greater`.
        `thrust::is_operator_plus_function_object`, which detects `thrust::plus` and `std::plus`.
        `thrust::remove_cvref(_t)?`, a C++11 implementation of C++20's `std::remove_cvref(_t)?`.
        `thrust::void_t`, and various other new type traits.
        `thrust::integer_sequence` and friends, a C++11 implementation of C++14's `std::integer_sequence`.
        `thrust::conjunction`, `thrust::disjunction`, and `thrust::negation`, a C++11 implementation of C++17's logical metafunctions.
        Some Thrust type traits (such as `thrust::is_constructible`) have been redefined in terms of C++11's type traits when they are available.
    ``, new `std::tuple` algorithms:
        `thrust::tuple_transform`.
        `thrust::tuple_for_each`.
        `thrust::tuple_subset`.
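The allocator rewrite above is described as similar to C++17's `std::pmr`. As a rough illustration of that model, the sketch below uses the standard `std::pmr` facilities, not Thrust's `thrust::mr` API: a memory resource owns the allocation strategy, a pool resource caches repeated temporary allocations, and the allocator is only a cheap handle that points at the resource. The names `counting_resource` and `run_repeated_temporaries` are hypothetical, and the exact number of upstream allocations depends on the standard library's pool implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <vector>

// Hypothetical upstream resource: forwards to new/delete and counts how many
// times the pool above it actually had to request fresh memory.
class counting_resource : public std::pmr::memory_resource {
public:
    std::size_t upstream_allocations = 0;

private:
    void* do_allocate(std::size_t bytes, std::size_t alignment) override {
        ++upstream_allocations;
        return std::pmr::new_delete_resource()->allocate(bytes, alignment);
    }
    void do_deallocate(void* p, std::size_t bytes, std::size_t alignment) override {
        std::pmr::new_delete_resource()->deallocate(p, bytes, alignment);
    }
    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }
};

// Creates and destroys a small temporary vector `iterations` times through a
// caching pool and returns how many upstream allocations were needed.
std::size_t run_repeated_temporaries(int iterations) {
    counting_resource upstream;
    {
        std::pmr::unsynchronized_pool_resource pool(&upstream);
        for (int i = 0; i < iterations; ++i) {
            // The allocator is a pointer-sized handle; the pool holds the state.
            std::pmr::polymorphic_allocator<int> alloc(&pool);
            std::pmr::vector<int> tmp(100, i, alloc);
        }
    } // the pool releases its cached blocks back to `upstream` here
    return upstream.upstream_allocations;
}
```

Because freed blocks are kept in the pool and reused, the upstream count stays far below the number of iterations. This is the effect the caching pool resources aim for with repetitive temporary allocations.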
    Miscellaneous new `std::`-like facilities:
        `thrust::optional`, a C++11 implementation of C++17's `std::optional`.
        `thrust::addressof`, an implementation of C++11's `std::addressof`.
        `thrust::next` and `thrust::prev`, implementations of C++11's `std::next` and `std::prev`.
        `thrust::square`, a `` style unary function object that multiplies its argument by itself.
        `` and `thrust::numeric_limits`, a customized version of `` and `std::numeric_limits`.
    ``, new general purpose preprocessor facilities:
        `THRUST_PP_CAT[2-5]`, concatenates two to five tokens.
        `THRUST_PP_EXPAND(_ARGS)?`, performs double expansion.
        `THRUST_PP_ARITY` and `THRUST_PP_DISPATCH`, tools for macro overloading.
        `THRUST_PP_BOOL`, boolean conversion.
        `THRUST_PP_INC` and `THRUST_PP_DEC`, increment/decrement.
        `THRUST_PP_HEAD`, a variadic macro that expands to the first argument.
        `THRUST_PP_TAIL`, a variadic macro that expands to all its arguments after the first.
        `THRUST_PP_IIF`, bitwise conditional.
        `THRUST_PP_COMMA_IF` and `THRUST_PP_HAS_COMMA`, facilities for adding and detecting comma tokens.
        `THRUST_PP_IS_VARIADIC_NULLARY`, returns true if called with a nullary `__VA_ARGS__`.
        `THRUST_CURRENT_FUNCTION`, expands to the name of the current function.
    New C++11 compatibility macros:
        `THRUST_NODISCARD`, expands to `[[nodiscard]]` when available and the best equivalent otherwise.
        `THRUST_CONSTEXPR`, expands to `constexpr` when available and the best equivalent otherwise.
        `THRUST_OVERRIDE`, expands to `override` when available and the best equivalent otherwise.
        `THRUST_DEFAULT`, expands to `= default;` when available and the best equivalent otherwise.
        `THRUST_NOEXCEPT`, expands to `noexcept` when available and the best equivalent otherwise.
        `THRUST_FINAL`, expands to `final` when available and the best equivalent otherwise.
        `THRUST_INLINE_CONSTANT`, expands to `inline constexpr` when available and the best equivalent otherwise.
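The preprocessor facilities listed above are described only by purpose. The sketch below shows two standard techniques that such facilities commonly rely on. It uses hypothetical `MY_PP_*` names and is not Thrust's actual definitions. The first technique adds a level of indirection so that arguments are macro-expanded before `##` pastes them, which is the "double expansion" idea. The second counts arguments by pushing a descending list of numbers to the right.

```cpp
#include <cassert>

// Pasting directly with ## suppresses expansion of the arguments.
#define DIRECT_CAT(a, b) a##b

// One extra level of indirection: MY_PP_CAT2 expands `a` and `b` first,
// then MY_PP_CAT2_IMPL pastes the already-expanded tokens.
#define MY_PP_CAT2_IMPL(a, b) a##b
#define MY_PP_CAT2(a, b) MY_PP_CAT2_IMPL(a, b)

#define MY_PREFIX value_

int DIRECT_CAT(MY_PREFIX, 1) = 7;   // declares MY_PREFIX1 (MY_PREFIX not expanded)
int MY_PP_CAT2(MY_PREFIX, 1) = 42;  // declares value_1

// Argument counting for 1 to 5 arguments: the caller's arguments shift the
// descending list to the right, so the sixth slot ends up holding the count.
// The trailing 0 keeps the variadic part non-empty in pre-C++20 modes.
#define MY_PP_ARITY_IMPL(a1, a2, a3, a4, a5, N, ...) N
#define MY_PP_ARITY(...) MY_PP_ARITY_IMPL(__VA_ARGS__, 5, 4, 3, 2, 1, 0)
```

For example, `MY_PP_ARITY(x, y, z)` becomes `MY_PP_ARITY_IMPL(x, y, z, 5, 4, 3, 2, 1, 0)`, which selects `3`. An arity macro like this is one common way to build the kind of macro overloading that `THRUST_PP_DISPATCH` is described as providing.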
    ``, new C++11-only type deduction helpers:
        `THRUST_DECLTYPE_RETURNS*`, expand to function definitions with suitable conditional `noexcept` qualifiers and trailing return types.
        `THRUST_FWD(x)`, expands to `::std::forward(x)`.
        `THRUST_MVCAP`, expands to a lambda move capture.
        `THRUST_RETOF`, expands to a decltype computing the return type of an invocable.

New Examples
    mr_basic demonstrates how to use the new memory resource allocator system.

Other Enhancements
    Tagged pointer enhancements:
        New `thrust::pointer_traits` specialization for `void const*`.
        Added `nullptr` support to Thrust tagged pointers.
        New `explicit operator bool` for Thrust tagged pointers when using C++11, for `std::unique_ptr` interoperability.
        Added `thrust::reinterpret_pointer_cast` and `thrust::static_pointer_cast` for casting Thrust tagged pointers.
    Iterator enhancements:
        `thrust::iterator_system` is now SFINAE friendly.
        Removed cv qualifiers from iterator types when using `thrust::iterator_system`.
    Static assert enhancements:
        New `THRUST_STATIC_ASSERT_MSG`, takes an optional string constant to be used as the error message when possible.
        Update `THRUST_STATIC_ASSERT(_MSG)` to use C++11's `static_assert` when it's available.
        Introduce a way to test for static assertions.
    Testing enhancements:
        Additional scalar and sequence types, including non-builtin types and vectors with unified memory allocators, have been added to the list of types used by generic unit tests.
        The generation of random input data has been improved to increase the range of values used and catch more corner cases.
        New `truncate_to_max_representable` utility for avoiding the generation of ranges that cannot be represented by the underlying element type in generic unit test code.
        The test driver now synchronizes with CUDA devices and checks for errors after each test, when switching devices, and after each raw kernel launch.
        The warningtester uber header is now compiled with NVCC to avoid needing to disable CUDA-specific code with the preprocessor.
        Fixed the unit test framework's `ASSERT_*` to print `char`s as `int`s.
        New `DECLARE_INTEGRAL_VARIABLE_UNITTEST` test declaration macro.
        New `DECLARE_VARIABLE_UNITTEST_WITH_TYPES_AND_NAME` test declaration macro.
    `thrust::system_error` in the CUDA backend now prints out its `cudaError_t` enumerator in addition to the diagnostic message.
    Stopped using conditionally signed types like `char`.

Bug Fixes
    #897, 2062242 Fix compilation error when using `__device__` lambdas with `reduce` on MSVC.
    #908, 2089386 Static assert that `thrust::generate`/`thrust::fill` doesn't operate on const iterators.
    #919 Fix compilation failure with `thrust::zip_iterator` and `thrust::complex`.
    #924, 2096679, 2315990 Fix dispatch for the CUDA backend's `thrust::reduce` to use two functions (one with the pragma for disabling exec checks, one with THRUST_RUNTIME_FUNCTION) instead of one. This fixes a regression with device compilation that started in CUDA 9.2.
    #928, 2341455 Add missing `__host__ __device__` annotations to a `thrust::complex::operator=` to satisfy GoUDA.
    2094642 Make `thrust::vector_base::clear` not depend on the element type being default constructible.
    2289115 Remove flaky `simple_cuda_streams` example.
    2328572 Add missing `thrust::device_vector` constructor that takes an allocator parameter.
    2455740 Update the `range_view` example to not use device-side launch.
    2455943 Ensure that sized unit tests that use `counting_iterator` perform proper truncation.
    2455952 Refactor questionable `copy_if` unit tests.

#######################################
# Thrust v1.9.3 (CUDA 10.0) #
#######################################

Summary
    Thrust v1.9.3 unifies and integrates CUDA Thrust and GitHub Thrust.

Bug Fixes
    #725, #850, #855, #859, #860 Unify `iter_swap` interface and fix `device_reference` swapping.
    2004663 Add a `data` method to `detail::temporary_array` and refactor temporary memory allocation in the CUDA backend to be exception and leak safe.
    #886, #894, #914 Various documentation typo fixes.
    #724 Provide NVVMIR_LIBRARY_DIR environment variable to NVCC.
    #878 Optimize min/max_element to only use `get_iterator_value` for non-numeric types.
    #899 Make `pinned_allocator`'s comparison operators `const`.
    2092152 Remove all includes of ``.
    #911 Fix default comparator element type for `merge_by_key`.

Acknowledgments
    Thanks to Andrew Corrigan for contributing fixes for swapping interfaces.
    Thanks to Francisco Facioni for contributing optimizations for min/max_element.

#######################################
# Thrust v1.9.2 (CUDA 9.2) #
#######################################

Summary
    Thrust v1.9.2 brings a variety of performance enhancements, bug fixes and test improvements. CUB 1.7.5 was integrated, enhancing the performance of `sort` on small data types and `reduce`. Changes were applied to `complex` to optimize memory access. Thrust now compiles with compiler warnings enabled and treated as errors. Additionally, the unit test suite and framework were enhanced to increase coverage.

New Features
    `` - utilities for memory alignment.

Breaking Changes
    The `fallback_allocator` example was removed, as it was buggy and difficult to support.

Bug Fixes
    200385527, 200385119, 200385113, 200349350, 2058778 Various compiler warning issues.
    200355591 `reduce` performance issues.
    2053727 ADL bug causing user-supplied `allocate` to be overlooked but `deallocate` to be called with GCC <= 4.3.
    1777043 `complex` does not work with `sequence`.

#######################################
# Thrust v1.9.1-2 (CUDA 9.1) #
#######################################

Summary
    Thrust v1.9.1-2 integrates version 1.7.4 of CUB for the new CUDA backend and introduces a new CUDA backend for `reduce` based on CUB.

Bug Fixes
    1965743 Remove unnecessary static qualifiers.
    1940974 Fix regression causing a compilation error when using `merge_by_key` with `constant_iterator`s.
    1904217 Allow callables that take non-const refs to be used with reduce and scan.

#######################################
# Thrust v1.9.0-4 (CUDA 9.0) #
#######################################

Summary
    Thrust v1.9.0-4 replaces the original CUDA backend (bulk) with a new one written using CUB, a high performance CUDA collectives library. This brings a substantial performance improvement to the CUDA backend across the board.

Breaking API Changes
    Any code depending on CUDA backend implementation details will likely be broken.

New Features
    thrust::transform_output_iterator

New Examples
    transform_output_iterator demonstrates use of transform_output_iterator, a new fancy output iterator that transforms output before storing the result in memory.

Other Enhancements
    If C++11 support is enabled, functors do not have to inherit from thrust::unary_function/thrust::binary_function anymore when using them with thrust::transform_iterator.
    The move constructor and move assignment operator have been implemented for host_vector, device_vector, cpp::vector, cuda::vector, omp::vector and tbb::vector.

Bug Fixes
    Calculating sin(complex) no longer has precision loss to float.

Acknowledgments
    Thanks to Manuel Schiller for contributing a C++11 based enhancement regarding the deduction of functor return types, improving the performance of thrust::unique and implementing transform_output_iterator.
    Thanks to Thibault Notargiacomo for the implementation of move semantics for the vector_base based class.
    Thanks to Duane Merrill for developing CUB and helping to integrate it into Thrust's backend.
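A transform_output_iterator, as described in the v1.9.0-4 notes above, applies a function to each value as it is written and stores the result. The standard C++ sketch below illustrates that idea with a hypothetical `transforming_output_iterator` class and a hypothetical `make_transforming_output_iterator` helper. It is not Thrust's implementation, which also works in device code and supports more iterator operations.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical output iterator: `*it = x` stores f(x) through the wrapped
// iterator and then advances it.
template <typename F, typename OutIt>
class transforming_output_iterator {
public:
    typedef std::output_iterator_tag iterator_category;
    typedef void value_type;
    typedef std::ptrdiff_t difference_type;
    typedef void pointer;
    typedef void reference;

    transforming_output_iterator(F f, OutIt out) : f_(f), out_(out) {}

    // The iterator acts as its own proxy: dereferencing returns *this,
    // and assigning a value transforms it and writes it out.
    template <typename T>
    transforming_output_iterator& operator=(const T& value) {
        *out_ = f_(value);
        ++out_;
        return *this;
    }
    transforming_output_iterator& operator*() { return *this; }
    transforming_output_iterator& operator++() { return *this; }
    transforming_output_iterator& operator++(int) { return *this; }

private:
    F f_;
    OutIt out_;
};

template <typename F, typename OutIt>
transforming_output_iterator<F, OutIt> make_transforming_output_iterator(F f, OutIt out) {
    return transforming_output_iterator<F, OutIt>(f, out);
}
```

With this sketch, `std::copy(in.begin(), in.end(), make_transforming_output_iterator(f, out.begin()))` writes `f(x)` for every input `x` without needing a separate temporary buffer for the transformed values.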

#######################################
# Thrust v1.8.3-2 (CUDA 8.0) #
#######################################

Summary
    Small bug fixes

New Examples
    range_view demonstrates use of a view: a non-owning wrapper for an iterator range with a container-like interface

Bug Fixes
    copy_if, set_operations, reduce_by_key, and their ilk access temporary data in a user-provided stream instead of the default one
    {min,max,minmax}_element can now accept raw device pointers with a device execution policy
    If C++11 support is enabled, functors do not have to inherit from thrust::unary_function/thrust::binary_function anymore when using them with thrust::transform_iterator.
    clear() operations on vector types no longer require the element type to have a default constructor

#######################################
# Thrust v1.8.2 (CUDA 7.0) #
#######################################

Summary
    Small bug fixes

Bug Fixes
    Avoid warnings and errors concerning user functions called from __host__ __device__ functions
    #632 CUDA set_intersection_by_key error
    #651 thrust::copy between host & device is not interoperable with thrust::cuda::par.on(stream)
    #664 CUDA for_each ignores execution policy's stream

Known Issues
    #628 CUDA's reduce_by_key fails on sm_50 devices

#######################################
# Thrust v1.8.1 (CUDA 7.0) #
#######################################

Summary
    Small bug fixes

Bug Fixes
    #615 CUDA for_each accesses illegal memory locations when given a large range
    #620 CUDA's reduce_by_key fails on large input

Known Issues
    #628 CUDA's reduce_by_key fails on sm_50 devices

#######################################
# Thrust v1.8.0 #
#######################################

Summary
    Thrust 1.8.0 introduces support for algorithm invocation from CUDA __device__ code, support for CUDA streams, and algorithm performance improvements.
    Users may now invoke Thrust algorithms from CUDA __device__ code, providing a parallel algorithms library to CUDA programmers authoring custom kernels, as well as allowing Thrust programmers to nest their algorithm calls within functors. The thrust::seq execution policy allows users to require sequential algorithm execution in the calling thread and makes a sequential algorithms library available to individual CUDA threads. The .on(stream) syntax allows users to request a CUDA stream for kernels launched during algorithm execution. Finally, new CUDA algorithm implementations provide substantial performance improvements.

Breaking API Changes
    None.

New Features
    Algorithms in CUDA __device__ code
        Thrust algorithms may now be invoked from CUDA __device__ and __host__ __device__ functions.

        Algorithms invoked in this manner must be invoked with an execution policy as the first parameter:

            #include <thrust/sort.h>
            #include <thrust/execution_policy.h>

            __device__ void my_device_sort(int *data, size_t n)
            {
              thrust::sort(thrust::device, data, data + n);
            }

        The following execution policies are supported in CUDA __device__ code:
            thrust::seq
            thrust::cuda::par
            thrust::device, when THRUST_DEVICE_SYSTEM == THRUST_DEVICE_SYSTEM_CUDA

        Parallel algorithm execution may not be accelerated unless CUDA Dynamic Parallelism is available.

    Execution Policies
        CUDA Streams
            The thrust::cuda::par.on(stream) syntax allows users to request that CUDA __global__ functions launched during algorithm execution should occur on a given stream:

                // execute for_each on stream s
                thrust::for_each(thrust::cuda::par.on(s), begin, end, my_functor);

            Algorithms executed with a CUDA stream in this manner may still synchronize with other streams when allocating temporary storage or returning results to the CPU.
        thrust::seq
            The thrust::seq execution policy allows users to require that an algorithm execute sequentially in the calling thread:

                // execute for_each sequentially in this thread
                thrust::for_each(thrust::seq, begin, end, my_functor);

    Other
        The new thrust::complex template provides complex number support.

New Examples
    simple_cuda_streams demonstrates how to request a CUDA stream during algorithm execution.
    async_reduce demonstrates ways to achieve algorithm invocations which are asynchronous with the calling thread.

Other Enhancements
    CUDA sort performance for user-defined types is 300% faster on Tesla K20c for large problem sizes.
    CUDA merge performance is 200% faster on Tesla K20c for large problem sizes.
    CUDA sort performance for primitive types is 50% faster on Tesla K20c for large problem sizes.
    CUDA reduce_by_key performance is 25% faster on Tesla K20c for large problem sizes.
    CUDA scan performance is 15% faster on Tesla K20c for large problem sizes.
    fallback_allocator example is simpler.

Bug Fixes
    #364 iterators with unrelated system tags may be used with algorithms invoked with an execution policy
    #371 do not redefine __CUDA_ARCH__
    #379 fix crash when dereferencing transform_iterator on the CPU
    #391 avoid use of uppercase variable names
    #392 fix thrust::copy between cusp::complex & std::complex
    #396 program compiled with gcc < 4.3 hangs during comparison sort
    #406 fallback_allocator.cu example checks device for unified addressing support
    #417 avoid using std::less in binary search algorithms
    #418 avoid various warnings
    #443 including version.h no longer configures default systems
    #578 nvcc produces warnings when sequential algorithms are used with cpu systems

Known Issues
    When invoked with primitive data types, thrust::sort, thrust::sort_by_key, thrust::stable_sort, & thrust::stable_sort_by_key may fail to link in some cases with nvcc -rdc=true.
    The CUDA implementation of thrust::reduce_by_key incorrectly outputs the last element in a segment of equivalent keys instead of the first.

Acknowledgments
    Thanks to Sean Baxter for contributing faster CUDA reduce, merge, and scan implementations.
    Thanks to Duane Merrill for contributing a faster CUDA radix sort implementation.
    Thanks to Filipe Maia for contributing the implementation of thrust::complex.

#######################################
# Thrust v1.7.2 (CUDA 6.5) #
#######################################

Summary
    Small bug fixes

Bug Fixes
    Avoid use of std::min in generic find implementation

#######################################
# Thrust v1.7.1 (CUDA 6.0) #
#######################################

Summary
    Small bug fixes

Bug Fixes
    Eliminate identifiers in set_operations.cu example with leading underscore
    Eliminate unused variable warning in CUDA reduce_by_key implementation
    Avoid deriving function objects from std::unary_function and std::binary_function

#######################################
# Thrust v1.7.0 (CUDA 5.5) #
#######################################

Summary
    Thrust 1.7.0 introduces a new interface for controlling algorithm execution as well as several new algorithms and performance improvements. With this new interface, users may directly control how algorithms execute as well as details such as the allocation of temporary storage. Key/value versions of thrust::merge and the set operation algorithms have been added, as well as stencil versions of partitioning algorithms. thrust::tabulate has been introduced to tabulate the values of functions taking integers. For 32b types, new CUDA merge and set operations provide 2-15x faster performance while a new CUDA comparison sort provides 1.3-4x faster performance. Finally, a new TBB reduce_by_key implementation provides 80% faster performance.

Breaking API Changes
    Dispatch
        Custom user backend systems' tag types must now inherit from the corresponding system's execution_policy template (e.g. thrust::cuda::execution_policy) instead of the tag struct (e.g. thrust::cuda::tag). Otherwise, algorithm specializations will silently fail to be found during dispatch. See examples/minimal_custom_backend.cu and examples/cuda/fallback_allocator.cu for usage examples.
        thrust::advance and thrust::distance are no longer dispatched based on iterator system type and thus may no longer be customized.
    Iterators
        iterator_facade and iterator_adaptor's Pointer template parameters have been eliminated.
        iterator_adaptor has been moved into the thrust namespace (previously thrust::experimental::iterator_adaptor).
        iterator_facade has been moved into the thrust namespace (previously thrust::experimental::iterator_facade).
        iterator_core_access has been moved into the thrust namespace (previously thrust::experimental::iterator_core_access).
        All iterators' nested pointer typedef (the type of the result of operator->) is now void instead of a pointer type to indicate that such expressions are currently impossible.
        Floating point counting_iterators' nested difference_type typedef is now a signed integral type instead of a floating point type.
    Other
        normal_distribution has been moved into the thrust::random namespace (previously thrust::random::experimental::normal_distribution).
        Placeholder expressions may no longer include the comma operator.

New Features
    Execution Policies
        Users may directly control the dispatch of algorithm invocations with optional execution policy arguments. For example, instead of wrapping raw pointers allocated by cudaMalloc with thrust::device_ptr, the thrust::device execution_policy may be passed as an argument to an algorithm invocation to enable CUDA execution.
        The following execution policies are supported in this version:
            thrust::host
            thrust::device
            thrust::cpp::par
            thrust::cuda::par
            thrust::omp::par
            thrust::tbb::par
    Algorithms
        free
        get_temporary_buffer
        malloc
        merge_by_key
        partition with stencil
        partition_copy with stencil
        return_temporary_buffer
        set_difference_by_key
        set_intersection_by_key
        set_symmetric_difference_by_key
        set_union_by_key
        stable_partition with stencil
        stable_partition_copy with stencil
        tabulate

New Examples
    uninitialized_vector demonstrates how to use a custom allocator to avoid the automatic initialization of elements in thrust::device_vector.

Other Enhancements
    Authors of custom backend systems may manipulate arbitrary state during algorithm dispatch by incorporating it into their execution_policy parameter.
    Users may control the allocation of temporary storage during algorithm execution by passing standard allocators as parameters via execution policies such as thrust::device.
    THRUST_DEVICE_SYSTEM_CPP has been added as a compile-time target for the device backend.
    CUDA merge performance is 2-15x faster.
    CUDA comparison sort performance is 1.3-4x faster.
    CUDA set operation performance is 1.5-15x faster.
    TBB reduce_by_key performance is 80% faster.
    Several algorithms have been parallelized with TBB.
    Support for user allocators in vectors has been improved.
    The sparse_vector example is now implemented with merge_by_key instead of sort_by_key.
    Warnings have been eliminated in various contexts.
    Warnings about __host__ or __device__-only functions called from __host__ __device__ functions have been eliminated in various contexts.
    Documentation about algorithm requirements has been improved.
    Simplified the minimal_custom_backend example.
    Simplified the cuda/custom_temporary_allocation example.
    Simplified the cuda/fallback_allocator example.
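thrust::tabulate, added in v1.7.0, fills a range by calling a function with each element's index. Below is a minimal sequential sketch of those semantics in standard C++. The name `tabulate_sketch` is hypothetical. The real algorithm dispatches to the selected backend rather than running a plain loop.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical sequential sketch of tabulate: the element at position i
// (counting from `first`) is assigned op(i).
template <typename ForwardIt, typename UnaryOp>
void tabulate_sketch(ForwardIt first, ForwardIt last, UnaryOp op) {
    typedef typename std::iterator_traits<ForwardIt>::difference_type index_type;
    for (index_type i = 0; first != last; ++first, ++i) {
        *first = op(i);
    }
}
```

For instance, tabulating with `[](std::ptrdiff_t i) { return static_cast<int>(i * i); }` over a five-element vector yields 0, 1, 4, 9, 16.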

Bug Fixes
    #248 fix broken counting_iterator behavior with OpenMP
    #231, #209 fix set operation failures with CUDA
    #187 fix incorrect occupancy calculation with CUDA
    #153 fix broken multigpu behavior with CUDA
    #142 eliminate warning produced by thrust::random::taus88 and MSVC 2010
    #208 correctly initialize elements in temporary storage when necessary
    #16 fix compilation error when sorting bool with CUDA
    #10 fix ambiguous overloads of reinterpret_tag

Known Issues
    g++ versions 4.3 and lower may fail to dispatch thrust::get_temporary_buffer correctly, causing infinite recursion in examples such as cuda/custom_temporary_allocation.

Acknowledgments
    Thanks to Sean Baxter, Bryan Catanzaro, and Manjunath Kudlur for contributing a faster merge implementation for CUDA.
    Thanks to Sean Baxter for contributing a faster set operation implementation for CUDA.
    Thanks to Cliff Woolley for contributing a correct occupancy calculation algorithm.

#######################################
# Thrust v1.6.0 #
#######################################

Summary
    Thrust v1.6.0 provides an interface for customization and extension and a new backend system based on the Threading Building Blocks library. With this new interface, programmers may customize the behavior of specific algorithms as well as control the allocation of temporary storage or invent entirely new backends. These enhancements also allow multiple different backend systems such as CUDA and OpenMP to coexist within a single program. Support for TBB allows Thrust programs to integrate more naturally into applications which may already employ the TBB task scheduler.

Breaking API Changes
    The header has been moved to
    thrust::experimental::cuda::pinned_allocator has been moved to thrust::cuda::experimental::pinned_allocator
    The macro THRUST_DEVICE_BACKEND has been renamed THRUST_DEVICE_SYSTEM
    The macro THRUST_DEVICE_BACKEND_CUDA has been renamed THRUST_DEVICE_SYSTEM_CUDA
    The macro THRUST_DEVICE_BACKEND_OMP has been renamed THRUST_DEVICE_SYSTEM_OMP
    thrust::host_space_tag has been renamed thrust::host_system_tag
    thrust::device_space_tag has been renamed thrust::device_system_tag
    thrust::any_space_tag has been renamed thrust::any_system_tag
    thrust::iterator_space has been renamed thrust::iterator_system

New Features
    Backend Systems
        Threading Building Blocks (TBB) is now supported
    Functions
        for_each_n
        raw_reference_cast
    Types
        pointer
        reference

New Examples
    cuda/custom_temporary_allocation
    cuda/fallback_allocator
    device_ptr
    expand
    minimal_custom_backend
    raw_reference_cast
    set_operations

Other Enhancements
    thrust::for_each now returns the end of the input range similar to most other algorithms
    thrust::pair and thrust::tuple have swap functionality
    all CUDA algorithms now support large data types
    iterators may be dereferenced in user __device__ or __global__ functions
    the safe use of different backend systems is now possible within a single binary

Bug Fixes
    #469 min_element and max_element algorithms no longer require a const comparison operator

Known Issues
    cudafe++.exe may crash when parsing TBB headers on Windows.

#######################################
# Thrust v1.5.3 (CUDA 5.0) #
#######################################

Summary
    Small bug fixes

Bug Fixes
    Avoid warnings about potential race due to __shared__ non-POD variable

#######################################
# Thrust v1.5.2 (CUDA 4.2) #
#######################################

Summary
    Small bug fixes

Bug Fixes
    Fixed warning about C-style initialization of structures

#######################################
# Thrust v1.5.1 (CUDA 4.1) #
#######################################

Summary
    Small bug fixes

Bug Fixes
    Sorting data referenced by permutation_iterators on CUDA produces invalid results

#######################################
# Thrust v1.5.0 #
#######################################

Summary
    Thrust v1.5.0 introduces new programmer productivity and performance enhancements. New functionality for creating anonymous "lambda" functions has been added. A faster host sort provides 2-10x faster performance for sorting arithmetic types on (single-threaded) CPUs. A new OpenMP sort provides 2.5x-3.0x speedup over the host sort using a quad-core CPU. When sorting arithmetic types with the OpenMP backend the combined performance improvement is 5.9x for 32-bit integers and ranges from 3.0x (64-bit types) to 14.2x (8-bit types). A new CUDA reduce_by_key implementation provides 2-3x faster performance.

Breaking API Changes
    device_ptr no longer unsafely converts to device_ptr without an explicit cast. Use the expression device_pointer_cast(static_cast(void_ptr.get())) to convert, for example, device_ptr to device_ptr.
New Features Functions stencil-less transform_if Types lambda placeholders New Examples lambda Other Enhancements host sort is 2-10x faster for arithmetic types OMP sort provides speedup over host sort reduce_by_key is 2-3x faster reduce_by_key no longer requires O(N) temporary storage CUDA scan algorithms are 10-40% faster host_vector and device_vector are now documented out-of-memory exceptions now provide detailed information from CUDART improved histogram example device_reference now has a specialized swap reduce_by_key and scan algorithms are compatible with discard_iterator Removed Functionality Bug Fixes #44 allow host_vector to compile when value_type uses __align__ #198 allow adjacent_difference to permit safe in-situ operation #303 make thrust thread-safe #313 avoid race conditions in device_vector::insert #314 avoid unintended adl invocation when dispatching copy #365 fix merge and set operation failures Known Issues None Acknowledgments Thanks to Manjunath Kudlur for contributing his Carbon library, from which the lambda functionality is derived. Thanks to Jean-Francois Bastien for suggesting a fix for issue 303. ####################################### # Thrust v1.4.0 (CUDA 4.0) # ####################################### Summary Thrust v1.4.0 provides support for CUDA 4.0 in addition to many feature and performance improvements. New set theoretic algorithms operating on sorted sequences have been added. Additionally, a new fancy iterator allows discarding redundant or otherwise unnecessary output from algorithms, conserving memory storage and bandwidth. 
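The v1.4.0 summary pairs the new set algorithms on sorted sequences with a fancy iterator that discards output. The sketch below combines the two ideas on the host. It defines a hypothetical discarding output iterator, discard_iterator_sketch, which drops every assignment and only counts increments. It then uses that iterator with the standard std::set_intersection to measure the size of an intersection without storing it. This is an illustration of the concept, not thrust::discard_iterator itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical discarding output iterator: writes through it are dropped,
// so an algorithm can run without allocating storage for unwanted output.
class discard_iterator_sketch
{
public:
  using iterator_category = std::output_iterator_tag;
  using value_type = void;
  using difference_type = std::ptrdiff_t;
  using pointer = void;
  using reference = void;

  // Proxy returned by operator*: assigning any value to it is a no-op.
  struct sink
  {
    template <typename T>
    sink& operator=(const T&) { return *this; }
  };

  explicit discard_iterator_sketch(std::ptrdiff_t position = 0) : position_(position)
  {
    assert(position >= 0);
  }

  sink operator*() const { return sink{}; }
  discard_iterator_sketch& operator++() { ++position_; return *this; }
  discard_iterator_sketch operator++(int)
  {
    discard_iterator_sketch previous = *this;
    ++position_;
    return previous;
  }

  // Number of increments so far, i.e. how many outputs were dropped.
  std::ptrdiff_t position() const { return position_; }

private:
  std::ptrdiff_t position_;
};

// Counts the elements common to two sorted ranges without storing them.
std::ptrdiff_t intersection_size(const std::vector<int>& a, const std::vector<int>& b)
{
  return std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                               discard_iterator_sketch{}).position();
}
```

For the sorted inputs {1, 3, 5, 7} and {3, 4, 5, 6}, intersection_size returns 2 while writing nothing to memory.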
Breaking API Changes Eliminations thrust/is_sorted.h thrust/utility.h thrust/set_intersection.h thrust/experimental/cuda/ogl_interop_allocator.h and the functionality therein thrust::deprecated::copy_when thrust::deprecated::absolute_value New Features Functions copy_n merge set_difference set_symmetric_difference set_union Types discard_iterator Device support Compute Capability 2.1 GPUs New Examples run_length_decoding Other Enhancements Compilation warnings are substantially reduced in various contexts. The compilation time of thrust::sort, thrust::stable_sort, thrust::sort_by_key, and thrust::stable_sort_by_key is substantially reduced. A fast sort implementation is used when sorting primitive types with thrust::greater. The performance of thrust::set_intersection is improved. The performance of thrust::fill is improved on SM 1.x devices. A code example is now provided in each algorithm's documentation. thrust::reverse now operates in-place Removed Functionality thrust::deprecated::copy_when thrust::deprecated::absolute_value thrust::experimental::cuda::ogl_interop_allocator thrust::gather and thrust::scatter from host to device and vice versa are no longer supported. Operations which modify the elements of a thrust::device_vector are no longer available from source code compiled without nvcc when the device backend is CUDA. Instead, use the idiom from the cpp_interop example. Bug Fixes #212 set_intersection works correctly for large input sizes. #275 counting_iterator and constant_iterator work correctly with OpenMP as the backend when compiling with optimization #256 min and max correctly return their first argument as a tie-breaker #248 NDEBUG is interpreted correctly Known Issues nvcc may generate code containing warnings when compiling some Thrust algorithms. When compiling with -arch=sm_1x, some Thrust algorithms may cause nvcc to issue benign pointer advisories.
When compiling with -arch=sm_1x and -G, some Thrust algorithms may fail to execute correctly. thrust::inclusive_scan, thrust::exclusive_scan, thrust::inclusive_scan_by_key, and thrust::exclusive_scan_by_key are currently incompatible with thrust::discard_iterator. Acknowledgments Thanks to David Tarjan for improving the performance of set_intersection. Thanks to Duane Merrill for continued help with sort. Thanks to Nathan Whitehead for help with CUDA Toolkit integration. ####################################### # Thrust v1.3.0 (CUDA 3.2) # ####################################### Summary Thrust v1.3.0 provides support for CUDA 3.2 in addition to many feature and performance enhancements. Performance of the sort and sort_by_key algorithms is improved by as much as 3x in certain situations. The performance of stream compaction algorithms, such as copy_if, is improved by as much as 2x. Reduction performance is also improved, particularly for small input sizes. CUDA errors are now converted to runtime exceptions using the system_error interface. Combined with a debug mode, also new in v1.3, runtime errors can be located with greater precision. Lastly, a few header files have been consolidated or renamed for clarity. See the deprecations section below for additional details. 
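The v1.3.0 summary says CUDA errors now surface as exceptions through the system_error interface, and the enhancements list notes that thrust::system_error derives from std::runtime_error. The host-only sketch below shows the consequence of that design using the standard std::system_error, which also derives from std::runtime_error: a handler written for std::runtime_error catches it. The function failing_operation is a hypothetical stand-in for a failing library call; no Thrust code is used.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <system_error>

// Hypothetical stand-in for a library call that reports failure by throwing
// a system_error carrying an error code and a descriptive message.
void failing_operation()
{
  throw std::system_error(std::make_error_code(std::errc::not_enough_memory),
                          "hypothetical allocation failed");
}

// A handler written against std::runtime_error also catches system_error,
// because system_error derives from runtime_error.
std::string catch_as_runtime_error()
{
  try
  {
    failing_operation();
  }
  catch (const std::runtime_error& e)
  {
    return e.what();  // message includes the constructor's what_arg
  }
  return "no error";
}
```

Catching the more specific system_error type instead would additionally give access to code() for inspecting the underlying error condition.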
Breaking API Changes Promotions thrust::experimental::inclusive_segmented_scan has been renamed thrust::inclusive_scan_by_key and exposes a different interface thrust::experimental::exclusive_segmented_scan has been renamed thrust::exclusive_scan_by_key and exposes a different interface thrust::experimental::partition_copy has been renamed thrust::partition_copy and exposes a different interface thrust::next::gather has been renamed thrust::gather thrust::next::gather_if has been renamed thrust::gather_if thrust::unique_copy_by_key has been renamed thrust::unique_by_key_copy Deprecations thrust::copy_when has been renamed thrust::deprecated::copy_when thrust::absolute_value has been renamed thrust::deprecated::absolute_value The header thrust/set_intersection.h is now deprecated; use thrust/set_operations.h instead The header thrust/utility.h is now deprecated; use thrust/swap.h instead The header thrust/swap_ranges.h is now deprecated; use thrust/swap.h instead Eliminations thrust::deprecated::gather thrust::deprecated::gather_if thrust/experimental/arch.h and the functions therein thrust/sorting/merge_sort.h thrust/sorting/radix_sort.h New Features Functions exclusive_scan_by_key find find_if find_if_not inclusive_scan_by_key is_partitioned is_sorted_until mismatch partition_point reverse reverse_copy stable_partition_copy Types system_error and related types experimental::cuda::ogl_interop_allocator bit_and, bit_or, and bit_xor Device support gf104-based GPUs New Examples opengl_interop.cu repeated_range.cu simple_moving_average.cu sparse_vector.cu strided_range.cu Other Enhancements Performance of thrust::sort and thrust::sort_by_key is substantially improved for primitive key types Performance of thrust::copy_if is substantially improved Performance of thrust::reduce and related reductions is improved THRUST_DEBUG mode added Callers of Thrust functions may detect error conditions by catching thrust::system_error, which derives from std::runtime_error The 
number of compiler warnings generated by Thrust has been substantially reduced Comparison sort now works correctly for input sizes > 32M min & max usage no longer collides with definitions Compiling against the OpenMP backend no longer requires nvcc Performance of device_vector initialized in .cpp files is substantially improved in common cases Performance of thrust::sort_by_key on the host is substantially improved Removed Functionality nvcc 2.3 is no longer supported Bug Fixes Debug device code now compiles correctly thrust::uninitialized_copy and thrust::uninitialized_fill now dispatch constructors on the device rather than the host Known Issues #212 set_intersection is known to fail for large input sizes partition_point is known to fail for 64b types with nvcc 3.2 Acknowledgments Thanks to Duane Merrill for contributing a fast CUDA radix sort implementation Thanks to Erich Elsen for contributing an implementation of find_if Thanks to Andrew Corrigan for contributing changes which allow the OpenMP backend to compile in the absence of nvcc Thanks to Andrew Corrigan, Cliff Woolley, David Coeurjolly, Janick Martinez Esturo, John Bowers, Maxim Naumov, Michael Garland, and Ryuta Suzuki for bug reports Thanks to Cliff Woolley for help with testing ####################################### # Thrust v1.2.1 (CUDA 3.1) # ####################################### Summary Small fixes for compatibility with CUDA 3.1 Known Issues inclusive_scan & exclusive_scan may fail with very large types the Microsoft compiler may fail to compile code using both sort and binary search algorithms uninitialized_fill & uninitialized_copy dispatch constructors on the host rather than the device # 109 some algorithms may exhibit poor performance with the OpenMP backend with large numbers (>= 6) of CPU threads default_random_engine::discard is not accelerated with nvcc 2.3 nvcc 3.1 may fail to compile code using types derived from thrust::subtract_with_carry_engine, such as thrust::ranlux24 &
thrust::ranlux48. ####################################### # Thrust v1.2.0 # ####################################### Summary Thrust v1.2 introduces support for compilation to multicore CPUs and the Ocelot virtual machine, and several new facilities for pseudo-random number generation. New algorithms such as set intersection and segmented reduction have also been added. Lastly, improvements to the robustness of the CUDA backend ensure correctness across a broad set of (uncommon) use cases. Breaking API Changes thrust::gather's interface was incorrect and has been removed. The old interface is deprecated but will be preserved for Thrust version 1.2 at thrust::deprecated::gather & thrust::deprecated::gather_if. The new interface is provided at thrust::next::gather & thrust::next::gather_if. The new interface will be promoted to thrust:: in Thrust version 1.3. For more details, please refer to this thread: http://groups.google.com/group/thrust-users/browse_thread/thread/f5f0583cb97b51fd The thrust::sorting namespace has been deprecated in favor of the top-level sorting functions, such as thrust::sort() and thrust::sort_by_key(). 
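The breaking change above concerns the interface of thrust::gather. As a reminder of what gather computes, result[i] = input[map[i]], here is a sequential host sketch. The name gather_sketch is hypothetical, and taking the map first mirrors the map-then-input argument order of the interface promoted to thrust::gather as far as we know; consult the Thrust reference for the exact signature.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sequential sketch of gather: each output element is read from
// the input position named by the corresponding map entry. Map entries may
// repeat, and every entry must be a valid index into input.
std::vector<int> gather_sketch(const std::vector<std::size_t>& map,
                               const std::vector<int>& input)
{
  std::vector<int> result;
  result.reserve(map.size());
  for (std::size_t index : map)
  {
    assert(index < input.size() && "map entry out of range");
    result.push_back(input[index]);
  }
  return result;
}
```

With input {10, 20, 30, 40} and map {3, 1, 0, 2}, the result is {40, 20, 10, 30}; the output length always equals the map length, not the input length.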
New Features Functions reduce_by_key set_intersection tie unique_copy unique_by_key unique_copy_by_key Types Random Number Generation discard_block_engine default_random_engine linear_congruential_engine linear_feedback_shift_engine minstd_rand minstd_rand0 normal_distribution (experimental) ranlux24 ranlux48 ranlux24_base ranlux48_base subtract_with_carry_engine taus88 uniform_int_distribution uniform_real_distribution xor_combine_engine Functionals project1st project2nd Fancy Iterators permutation_iterator reverse_iterator Device support Add support for multicore CPUs via OpenMP Add support for Fermi-class GPUs Add support for Ocelot virtual machine New Examples cpp_integration histogram mode monte_carlo monte_carlo_disjoint_sequences padded_grid_reduction permutation_iterator row_sum run_length_encoding segmented_scan stream_compaction summary_statistics transform_iterator word_count Other Enhancements vector functions operator!=, rbegin, crbegin, rend, crend, data, & shrink_to_fit integer sorting performance is improved when max is large but (max - min) is small and when min is negative performance of inclusive_scan() and exclusive_scan() is improved by 20-25% for primitive types support for nvcc 3.0 Removed Functionality removed support for equal between host & device sequences removed support for gather() and scatter() between host & device sequences Bug Fixes # 8 cause a compiler error if the required compiler is not found rather than a mysterious error at link time # 42 device_ptr & device_reference are classes rather than structs, eliminating warnings on certain platforms # 46 gather & scatter handle any space iterators correctly # 51 thrust::experimental::arch functions gracefully handle unrecognized GPUs # 52 avoid collisions with common user macros such as BLOCK_SIZE # 62 provide better documentation for device_reference # 68 allow built-in CUDA vector types to work with device_vector in pure C++ mode # 102 eliminated a race condition in 
device_vector::erase various compilation warnings eliminated Known Issues inclusive_scan & exclusive_scan may fail with very large types the Microsoft compiler may fail to compile code using both sort and binary search algorithms uninitialized_fill & uninitialized_copy dispatch constructors on the host rather than the device # 109 some algorithms may exhibit poor performance with the OpenMP backend with large numbers (>= 6) of CPU threads default_random_engine::discard is not accelerated with nvcc 2.3 Acknowledgments Thanks to Gregory Diamos for contributing a CUDA implementation of set_intersection Thanks to Ryuta Suzuki & Gregory Diamos for rigorously testing Thrust's unit tests and examples against Ocelot Thanks to Tom Bradley for contributing an implementation of normal_distribution Thanks to Joseph Rhoads for contributing the example summary_statistics ####################################### # Thrust v1.1.1 # ####################################### Summary Small fixes for compatibility with CUDA 2.3a and Mac OSX Snow Leopard. ####################################### # Thrust v1.1.0 # ####################################### Summary Thrust v1.1 introduces fancy iterators, binary search functions, and several specialized reduction functions. Experimental support for segmented scan has also been added. 
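The v1.1.0 summary introduces experimental segmented scan, which the v1.3.0 notes later promote to thrust::inclusive_scan_by_key. A segmented scan is a running sum that restarts whenever the key changes. The sequential host sketch below, with the hypothetical name inclusive_scan_by_key_sketch, shows those semantics for integer sums; it is not Thrust's parallel implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sequential sketch of an inclusive segmented scan: a running
// sum over values that restarts at every position whose key differs from
// the previous key.
std::vector<int> inclusive_scan_by_key_sketch(const std::vector<int>& keys,
                                              const std::vector<int>& values)
{
  assert(keys.size() == values.size());
  std::vector<int> result(values.size());
  for (std::size_t i = 0; i < values.size(); ++i)
  {
    bool new_segment = (i == 0) || (keys[i] != keys[i - 1]);
    result[i] = new_segment ? values[i] : result[i - 1] + values[i];
  }
  return result;
}
```

With keys {0, 0, 0, 1, 1, 2, 3, 3, 3, 3} and all values equal to 1, each segment counts upward from 1, giving {1, 2, 3, 1, 2, 1, 1, 2, 3, 4}.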
Breaking API Changes counting_iterator has been moved into the thrust namespace (previously thrust::experimental) New Features Functions copy_if lower_bound upper_bound vectorized lower_bound vectorized upper_bound equal_range binary_search vectorized binary_search all_of any_of none_of minmax_element advance inclusive_segmented_scan (experimental) exclusive_segmented_scan (experimental) Types pair tuple device_malloc_allocator Fancy Iterators constant_iterator counting_iterator transform_iterator zip_iterator New Examples computing the maximum absolute difference between vectors computing the bounding box of a two-dimensional point set sorting multiple arrays together (lexicographical sorting) constructing a summed area table using zip_iterator to mimic an array of structs using constant_iterator to increment array values Other Enhancements added pinned memory allocator (experimental) added more methods to host_vector & device_vector (issue #4) added variant of remove_if with a stencil argument (issue #29) scan and reduce use cudaFuncGetAttributes to determine grid size exceptions are reported when temporary device arrays cannot be allocated Bug Fixes #5 make vector work for larger data types #9 stable_partition_copy doesn't respect OutputIterator concept semantics #10 scans should return OutputIterator #16 make algorithms work for larger data types #27 dispatch radix_sort even when comp=less is explicitly provided Known Issues Using functors with Thrust entry points may not compile on Mac OSX with gcc-4.0.1 uninitialized_copy & uninitialized_fill dispatch constructors on the host rather than the device. inclusive_scan, inclusive_scan_by_key, exclusive_scan, and exclusive_scan_by_key may fail when used with large types with the CUDA 3.1 driver ####################################### # Thrust v1.0.0 # ####################################### Breaking API changes Rename top level namespace komrade to thrust. 
Move partition_copy() & stable_partition_copy() into thrust::experimental namespace until we can easily provide the standard interface. Rename range() to sequence() to avoid collision with Boost.Range. Rename copy_if() to copy_when() due to semantic differences with C++0x copy_if(). New Features Add C++0x style cbegin() & cend() methods to host_vector & device_vector. Add transform_if function. Add stencil versions of replace_if() & replace_copy_if(). Allow counting_iterator to work with for_each(). Allow types with constructors in comparison sort & reduce. Other Enhancements merge_sort and stable_merge_sort are now 2 to 5x faster when executed on the parallel device. Bug fixes Work around an issue where an incremented iterator causes nvcc to crash. (Komrade issue #6) Fix an issue where const_iterators could not be passed to transform. (Komrade issue #7) thrust-1.9.5/CMakeLists.txt000066400000000000000000000262121344621116200156220ustar00rootroot00000000000000cmake_minimum_required(VERSION 3.0) project(Thrust CXX) set(CMAKE_SKIP_INSTALL_ALL_DEPENDENCY true) file(READ "thrust/version.h" thrust_version_file) string(REGEX MATCH "THRUST_VERSION ([0-9]+)" DUMMY ${thrust_version_file}) set(thrust_version ${CMAKE_MATCH_1}) #message("thrust_version= ${thrust_version}") math(EXPR Thrust_VERSION_MAJOR "(${thrust_version} / 100000)") math(EXPR Thrust_VERSION_MINOR "(${thrust_version} / 100) % 1000") math(EXPR Thrust_VERSION_PATCH " ${thrust_version} % 100") message(STATUS "Thrust version ${Thrust_VERSION_MAJOR}.${Thrust_VERSION_MINOR}.${Thrust_VERSION_PATCH}") include(CTest) enable_testing() function(print_flags flags) message("${flags}:") set(flags ${${flags}}) set(__is_name True) foreach(arg ${flags}) if (__is_name) set(__arg_name ${arg}) set(__is_name False) else() separate_arguments(arg) set(arg ${arg}) message(" | ${__arg_name} : '${arg}'") set(__is_name True) endif() endforeach() endfunction() set( GNU_COMPILER_FLAGS WARN_ALL "-Wall" WARNINGS_AS_ERRORS "-Werror" RELEASE
"-O2" DEBUG "-g" EXCEPTION_HANDLING " " CPP " " OMP "-fopenmp" TBB " " CUDA " " CUDA_BULK " " WORKAROUNDS " " C++03 " " C++11 "-std=c++11" ) set( GNU_LINKER_FLAGS DEBUG " " RELEASE " " WORKAROUNDS " " CPP " " OMP "-fopenmp" TBB " " CUDA " " CUDA_BULK " " ) set( CLANG_COMPILER_FLAGS WARN_ALL "-Wall" WARNINGS_AS_ERRORS "-Werror" RELEASE "-O2" DEBUG "-g" EXCEPTION_HANDLING " " CPP " " OMP "-fopenmp" TBB " " CUDA " " CUDA_BULK " " WORKAROUNDS " " C++03 " " C++11 "-std=c++11" ) set( CLANG_LINKER_FLAGS DEBUG " " RELEASE " " WORKAROUNDS " " #-stdlib=libstdc++" CPP " " OMP "-fopenmp" TBB " " CUDA " " CUDA_BULK " " ) set( MSVC_COMPILER_FLAGS WARN_ALL "/Wall" WARNINGS_AS_ERRORS "/Wx" RELEASE "/Ox" DEBUG "/Zi -D_DEBUG /MTd" EXCEPTION_HANDLING "/EHsc" CPP " " OMP "/openmp" TBB " " CUDA " " CUDA_BULK " " WORKAROUNDS "/DNOMINMAX /wd4503" C++03 " " C++11 "-std=c++11" ) set( MSVC_LINKER_FLAGS DEBUG "/debug" RELEASE " " WORKAROUND "/nologo" CPP " " OMP "/openmp" TBB " " CUDA " " CUDA_BULK " " WORKAROUNDS " " ) set(NV_LINKER_FLAGS ${GNU_LINKER_FLAGS}) print_flags(MSVC_COMPILER_FLAGS) function(add_option OPTION_NAME DESCRIPTION TYPE) if (${ARGC} EQUAL 3) message(FATAL_ERROR "No option value [list] is provided") endif() if (${OPTION_NAME} AND "x${TYPE}" STREQUAL "xSTRING") LIST(FIND ARGN ${${OPTION_NAME}} index) if (index EQUAL -1) message(FATAL_ERROR "Invalid value '${${OPTION_NAME}}' for '${DESCRIPTION}'") endif() endif() set(value_list ${ARGN}) LIST(GET value_list 0 default_value) LIST(SORT value_list) set(${OPTION_NAME} ${default_value} CACHE ${TYPE} ${DESCRIPTION}) if ("x${TYPE}" STREQUAL "xSTRING") set_property(CACHE ${OPTION_NAME} PROPERTY STRINGS ${value_list}) endif() endfunction() add_option(CUDA_ARCH "Compute capability code generation" STRING sm_61 sm_30 sm_32 sm_35 sm_37 sm_50 sm_52 sm_61) add_option(HOST_BACKEND "The host backend to target" STRING CPP OMP TBB) add_option(DEVICE_BACKEND "The device backend to target" STRING CUDA CUDA_BULK CPP OMP TBB) add_option(CUDA_CDP 
"Enable CUDA dynamic parallelism" BOOL False) add_option(CXX_STD "C++ standard" STRING C++03 C++11) add_option(THRUST_MODE "Release versus debug mode" STRING RELEASE DEBUG) if (WIN32) set(WINNT True) set(NOT_WINNT False) add_option(MSVC_VERSION "MS Visual C++ version" STRING NONE 8.0 9.0 10.0 11.0 12.0 13.0 1900) else() set(WINNT False) set(NOT_WINNT True) endif() add_option(WARN_ALL "Enable all compilation warnings" BOOL ${NOT_WINNT}) add_option(WARN_ERROR "Treat warnings as errors" BOOL ${NOT_WINNT}) IF(NOT CMAKE_BUILD_TYPE) # possible cmake bug (?) : RelWithDebInfo passes -DNDEBUG SET(CMAKE_BUILD_TYPE RelWithDebInfo CACHE STRING "Choose the type of build, options are: None Debug Release RelWithDebInfo MinSizeRel." FORCE) ENDIF(NOT CMAKE_BUILD_TYPE) # Helpers macro(set_thrust_flags THRUST_FLAGS_) set(${THRUST_FLAGS_} "-DTHRUST_HOST_SYSTEM=THRUST_HOST_SYSTEM_${HOST_BACKEND}") LIST(APPEND ${THRUST_FLAGS_} "-DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_${DEVICE_BACKEND}") if (THRUST_MODE STREQUAL "DEBUG") LIST(APPEND ${THRUST_FLAGS_} "-DTHRUST_DEBUG") endif() endmacro() macro(get_compiler_id COMPILER_ID_) if("${CMAKE_CXX_COMPILER_ID}" STREQUAL "GNU") set(${COMPILER_ID_} "GNU") elseif ("${CMAKE_CXX_COMPILER_ID}" STREQUAL "Clang") set(${COMPILER_ID_} "CLANG") elseif ("${CMAKE_CXX_COMPILER_ID}" STREQUAL "AppleClang") set(${COMPILER_ID_} "CLANG") elseif ("${CMAKE_CXX_COMPILER_ID}" STREQUAL "Intel") set(${COMPILER_ID_} "Intel") elseif ("${CMAKE_CXX_COMPILER_ID}" STREQUAL "MSVC") set(${COMPILER_ID_} "MSVC") elseif ("${CMAKE_CXX_COMPILER_ID}" STREQUAL "PGI") set(${COMPILER_ID_} "PGI") endif() endmacro() macro(find_key_value LIST_ KEY_ VALUE_) LIST(FIND ${LIST_} ${KEY_} index_) if (index_ EQUAL -1) message(FATAL_ERROR "${KEY_} is not found in ${LIST_}." 
) endif() math(EXPR index_ "${index_}+1") LIST(GET ${LIST_} ${index_} ${VALUE_}) separate_arguments(${VALUE_}) endmacro() macro(set_cc_compiler_flags CC_COMPILER_FLAGS_) get_compiler_id(CXX_) set(CXX_ ${CXX_}_COMPILER_FLAGS) find_key_value(${CXX_} EXCEPTION_HANDLING flags_) LIST(APPEND ${CC_COMPILER_FLAGS_} ${flags_}) find_key_value(${CXX_} ${HOST_BACKEND} flags_) LIST(APPEND ${CC_COMPILER_FLAGS_} ${flags_}) find_key_value(${CXX_} ${DEVICE_BACKEND} flags_) LIST(APPEND ${CC_COMPILER_FLAGS_} ${flags_}) if (${WARN_ALL}) find_key_value(${CXX_} WARN_ALL flags_) LIST(APPEND ${CC_COMPILER_FLAGS_} ${flags_}) endif() if (${WARN_ERROR}) find_key_value(${CXX_} WARNINGS_AS_ERRORS flags_) LIST(APPEND ${CC_COMPILER_FLAGS_} ${flags_}) endif() find_key_value(${CXX_} ${CXX_STD} flags_) LIST(APPEND ${CC_COMPILER_FLAGS_} ${flags_}) endmacro() macro(set_nv_compiler_flags NV_COMPILER_FLAGS_) set(MACHINE_ARCH_ ${CUDA_ARCH}) # Transform sm_XX to compute_XX string(REGEX REPLACE "sm" "compute" VIRTUAL_ARCH_ ${MACHINE_ARCH_}) # Produce -gencode flags like this: -gencode=arch=compute_XX,code=\"sm_XX,compute_XX\" LIST(APPEND ${NV_COMPILER_FLAGS_} "-gencode=arch=${VIRTUAL_ARCH_},\\\"code=${MACHINE_ARCH_},${VIRTUAL_ARCH_}\\\"") if ("${THRUST_MODE}" STREQUAL "DEBUG") # turn on debug mode # XXX make this work when we've debugged nvcc -G # LIST(APPEND ${NV_COMPILER_FLAGS_} "-G") endif() if ((NOT "${DEVICE_BACKEND}" STREQUAL "CUDA") AND (NOT "${DEVICE_BACKEND}" STREQUAL "CUDA_BULK")) LIST(APPEND ${NV_COMPILER_FLAGS_} "--x=c++") endif() if (${CUDA_CDP}) # LIST(APPEND ${NV_COMPILER_FLAGS_} "-rdc=true") endif() # Untested on OSX 10.8.* if ("${CMAKE_SYSTEM_NAME}" STREQUAL "Darwin") if ("${CMAKE_SYSTEM_VERSION}" STREQUAL "10.8.") LIST(APPEND ${NV_COMPILER_FLAGS_} "-ccbin ${CMAKE_CXX_COMPILER}") endif() endif() endmacro() macro(set_linker_flags LINKER_FLAGS_) get_compiler_id(LINK_) set(LINK_ ${LINK_}_LINKER_FLAGS) find_key_value(${LINK_} ${THRUST_MODE} flags_) LIST(APPEND ${LINKER_FLAGS_} ${flags_}) 
find_key_value(${LINK_} WORKAROUNDS flags_) LIST(APPEND ${LINKER_FLAGS_} ${flags_}) find_key_value(${LINK_} ${HOST_BACKEND} flags_) LIST(APPEND ${LINKER_FLAGS_} ${flags_}) find_key_value(${LINK_} ${DEVICE_BACKEND} flags_) LIST(APPEND ${LINKER_FLAGS_} ${flags_}) endmacro() macro(thrust_add_executable TARGET) if ((NOT "${DEVICE_BACKEND}" STREQUAL "CUDA") AND (NOT "${DEVICE_BACKEND}" STREQUAL "CUDA_BULK")) # AND "${CMAKE_SYSTEM_NAME}" STREQUAL "Darwin") set_source_files_properties(${ARGN} PROPERTIES LANGUAGE CXX) add_executable(${TARGET} ${ARGN}) set_target_properties(${TARGET} PROPERTIES LINKER_LANGUAGE CXX) set_target_properties(${TARGET} PROPERTIES COMPILE_FLAGS "-x c++") else() cuda_add_executable(${TARGET} ${ARGN}) endif() endmacro() #macro(thrust_include_directories TARGET) # if (NOT "${DEVICE_BACKEND}" STREQUAL "CUDA") # AND "${CMAKE_SYSTEM_NAME}" STREQUAL "Darwin") # target_include_directories(${TARGET} PRIVATE ${ARGN}) # else() # cuda_include_directories(${ARGN}) # endif() #endmacro() # Find backends find_package(CUDA) find_package(OpenMP) # Set flags set_thrust_flags(THRUST_FLAGS) set_cc_compiler_flags(CC_FLAGS) set_nv_compiler_flags(NV_FLAGS) set_linker_flags(LINKER_FLAGS) # Debug output # message("THRUST_FLAGS= ${THRUST_FLAGS}") # message("CC_FLAGS= ${CC_FLAGS}") # message("NV_FLAGS= ${NV_FLAGS}") # message("LINKER_FLAGS= ${LINKER_FLAGS}") string (REPLACE ";" " " CC_FLAGS_STR "${CC_FLAGS} ${THRUST_FLAGS}") set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${CC_FLAGS_STR}") set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} ${NV_FLAGS}) string (REPLACE ";" " " LINKER_FLAGS_STR "${LINKER_FLAGS}") set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${LINKER_FLAGS_STR}") # Enable separable compilation when building with CUDA Dynamic Parallelism set(CUDA_SEPARABLE_COMPILATION ${CUDA_CDP}) # and find "cudadevrt" library for linking, otherwise <<<,>>> will fail to build if (${CUDA_CDP}) cuda_find_library_local_first(CUDADEVRT_LIBRARY cudadevrt "\"cudadevrt\" library") if
("${CUDADEVRT_LIBRARY}" STREQUAL "CUDADEVRT_LIBRARY-NOTFOUND") message(FATAL_ERROR "\"cudadevrt\" library not found. Consider disabling CUDA_CDP.") endif() link_libraries(${CUDADEVRT_LIBRARY}) endif() include_directories(${CMAKE_SOURCE_DIR}) cuda_include_directories(${CMAKE_SOURCE_DIR}) # Add targets # thrust target install(DIRECTORY ${CMAKE_SOURCE_DIR}/thrust/ DESTINATION thrust COMPONENT thrust) install(FILES ${CMAKE_SOURCE_DIR}/CHANGELOG DESTINATION thrust COMPONENT thrust) add_custom_target(install-thrust COMMAND "${CMAKE_COMMAND}" -DCMAKE_INSTALL_COMPONENT=thrust -P "${CMAKE_BINARY_DIR}/cmake_install.cmake" ) # add examples, testing and performance testing targets add_subdirectory(examples) add_subdirectory(testing) add_subdirectory(performance) ### make zip archive set(CPACK_ARCHIVE_COMPONENT_INSTALL ON) set(CPACK_GENERATOR "ZIP") set(CPACK_PACKAGE_VERSION "${Thrust_VERSION_MAJOR}.${Thrust_VERSION_MINOR}.${Thrust_VERSION_PATCH}") set(CPACK_PACKAGE_VERSION_MAJOR "${Thrust_VERSION_MAJOR}") set(CPACK_PACKAGE_VERSION_MINOR "${Thrust_VERSION_MINOR}") set(CPACK_PACKAGE_VERSION_PATCH "${Thrust_VERSION_PATCH}") set(CPACK_COMPONENTS_ALL thrust examples) set(CPACK_ZIP_USE_DISPLAY_NAME_IN_FILENAME ON) set(CPACK_PACKAGE_FILE_NAME "Thrust-${CPACK_PACKAGE_VERSION}") include(CPack) cpack_add_component(thrust DISPLAY_NAME "headers") cpack_add_component(examples DISPLAY_NAME "examples") thrust-1.9.5/LICENSE000066400000000000000000000236771344621116200141030ustar00rootroot00000000000000 Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 1. Definitions. "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document. "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity. "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License. "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files. "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types. "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below). "Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof. 
"Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution." "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work. 2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form. 3. Grant of Patent License. 
Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed. 4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions: (a) You must give any other recipients of the Work or Derivative Works a copy of this License; and (b) You must cause any modified files to carry prominent notices stating that You changed the files; and (c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and (d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the 
Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License. You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License. 5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions. 6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file. 7. Disclaimer of Warranty. 
Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License. 8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages. 9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability. 
END OF TERMS AND CONDITIONS thrust-1.9.5/Makefile000066400000000000000000000145651344621116200145320ustar00rootroot00000000000000# Copyright 1993-2010 NVIDIA Corporation. All rights reserved. # # NOTICE TO USER: # # This source code is subject to NVIDIA ownership rights under U.S. and # international Copyright laws. # # This software and the information contained herein is being provided # under the terms and conditions of a Source Code License Agreement. # # NVIDIA MAKES NO REPRESENTATION ABOUT THE SUITABILITY OF THIS SOURCE # CODE FOR ANY PURPOSE. IT IS PROVIDED "AS IS" WITHOUT EXPRESS OR # IMPLIED WARRANTY OF ANY KIND. NVIDIA DISCLAIMS ALL WARRANTIES WITH # REGARD TO THIS SOURCE CODE, INCLUDING ALL IMPLIED WARRANTIES OF # MERCHANTABILITY, NONINFRINGEMENT, AND FITNESS FOR A PARTICULAR PURPOSE. # IN NO EVENT SHALL NVIDIA BE LIABLE FOR ANY SPECIAL, INDIRECT, INCIDENTAL, # OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS # OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE # OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE # OR PERFORMANCE OF THIS SOURCE CODE. # # U.S. Government End Users. This source code is a "commercial item" as # that term is defined at 48 C.F.R. 2.101 (OCT 1995), consisting of # "commercial computer software" and "commercial computer software # documentation" as such terms are used in 48 C.F.R. 12.212 (SEPT 1995) # and is provided to the U.S. Government only as a commercial end item. # Consistent with 48 C.F.R.12.212 and 48 C.F.R. 227.7202-1 through # 227.7202-4 (JUNE 1995), all U.S. Government End Users acquire the # source code with only those rights set forth herein. # Makefile for building Thrust unit test driver # Force C++11 mode. NVCC will ignore it if the host compiler doesn't support it. 
#export CXX_STD = c++11 export VERBOSE = 1 ifndef PROFILE ifdef VULCAN_TOOLKIT_BASE include $(VULCAN_TOOLKIT_BASE)/build/getprofile.mk include $(VULCAN_TOOLKIT_BASE)/build/config/$(PROFILE).mk else include ../build/getprofile.mk include ../build/config/$(PROFILE).mk endif endif SOLNDIR := . ifdef VULCAN_TOOLKIT_BASE include $(VULCAN_TOOLKIT_BASE)/build/config/DetectOS.mk else include ../build/config/DetectOS.mk endif ifeq ($(OS),win32) export I_AM_SLOPPY := 1 endif TMP_DIR := built TMP_PREFIX := $(ROOTDIR) TMP_ARCH := $(ARCH)_$(PROFILE)_agnostic THRUST_MKDIR := $(TMP_PREFIX)/$(TMP_DIR)/$(TMP_ARCH)/thrust/mk THRUST_DIR := $(ROOTDIR)/thrust res:=$(shell $(PYTHON) ./generate_mk.py $(THRUST_MKDIR) $(THRUST_DIR)) # Use these environment variables to control what gets built: # # TEST_ALL # TEST_UNITTESTS # TEST_EXAMPLES # TEST_BENCH # TEST_OTHER ifneq ($(TEST_ALL),) override TEST_UNITTESTS := 1 override TEST_EXAMPLES := 1 override TEST_BENCH := 1 override TEST_OTHER := 1 endif ifeq ($(TEST_UNITTESTS)$(TEST_EXAMPLES)$(TEST_BENCH)$(TEST_OTHER),) override TEST_UNITTESTS := 1 override TEST_EXAMPLES := 1 override TEST_BENCH := 1 override TEST_OTHER := 1 endif ifneq ($(TEST_OTHER),) PROJECTS += internal/build/warningstester endif ifneq ($(TEST_BENCH),) PROJECTS += internal/benchmark/bench endif ifneq ($(TEST_UNITTESTS),) # copy existing projects PROJECTS_COPY := $(PROJECTS) # empty PROJECTS PROJECTS := # populate PROJECTS with unit tests. include $(THRUST_MKDIR)/testing.mk # Once PROJECTS is populated with unit tests, re-add the previous projects. PROJECTS += $(PROJECTS_COPY) endif ifneq ($(TEST_EXAMPLES),) # Copy existing projects. PROJECTS_COPY := $(PROJECTS) # Empty PROJECTS. PROJECTS := # Populate PROJECTS with examples. include $(THRUST_MKDIR)/examples.mk # Once PROJECTS is populated with examples, re-add the previous projects. 
PROJECTS += $(PROJECTS_COPY) endif ifdef VULCAN_TOOLKIT_BASE include $(VULCAN_TOOLKIT_BASE)/build/common.mk else include ../build/common.mk endif # Print host compiler version. VERSION_FLAG := ifeq ($(OS),$(filter $(OS),Linux Darwin)) ifdef USEPGCXX # PGI VERSION_FLAG := -V else ifdef USEXLC # XLC VERSION_FLAG := -qversion else # GCC, ICC or Clang AKA the sane ones. VERSION_FLAG := --version endif endif else ifeq ($(OS),win32) # MSVC # cl.exe run without any options will print its version info and exit. VERSION_FLAG := endif CCBIN_ENVIRONMENT := ifeq ($(OS), QNX) # QNX's GCC complains if QNX_HOST and QNX_TARGET aren't defined in the # environment. CCBIN_ENVIRONMENT := QNX_HOST=$(QNX_HOST) QNX_TARGET=$(QNX_TARGET) endif $(info #### CCBIN : $(CCBIN)) $(info #### CCBIN VERSION : $(shell $(CCBIN_ENVIRONMENT) $(CCBIN) $(VERSION_FLAG))) $(info #### CXX_STD : $(CXX_STD)) ifeq ($(OS), win32) CREATE_DVS_PACKAGE = $(ZIP) -r built/CUDA-thrust-package.zip bin thrust/internal/test thrust/internal/scripts thrust/internal/benchmark thrust/*.trs $(DVS_COMMON_TEST_PACKAGE_FILES) APPEND_HEADERS_DVS_PACKAGE = $(ZIP) -rg built/CUDA-thrust-package.zip thrust -9 -i *.h APPEND_INL_DVS_PACKAGE = $(ZIP) -rg built/CUDA-thrust-package.zip thrust -9 -i *.inl APPEND_CUH_DVS_PACKAGE = $(ZIP) -rg built/CUDA-thrust-package.zip thrust -9 -i *.cuh MAKE_DVS_PACKAGE = $(CREATE_DVS_PACKAGE) && $(APPEND_HEADERS_DVS_PACKAGE) && $(APPEND_INL_DVS_PACKAGE) && $(APPEND_CUH_DVS_PACKAGE) else CREATE_DVS_PACKAGE = tar -cv -f built/CUDA-thrust-package.tar bin thrust/internal/test thrust/internal/scripts thrust/internal/benchmark thrust/*.trs $(DVS_COMMON_TEST_PACKAGE_FILES) APPEND_HEADERS_DVS_PACKAGE = find thrust -name "*.h" | xargs tar rvf built/CUDA-thrust-package.tar APPEND_INL_DVS_PACKAGE = find thrust -name "*.inl" | xargs tar rvf built/CUDA-thrust-package.tar APPEND_CUH_DVS_PACKAGE = find thrust -name "*.cuh" | xargs tar rvf built/CUDA-thrust-package.tar COMPRESS_DVS_PACKAGE = bzip2 
built/CUDA-thrust-package.tar MAKE_DVS_PACKAGE = $(CREATE_DVS_PACKAGE) && $(APPEND_HEADERS_DVS_PACKAGE) && $(APPEND_INL_DVS_PACKAGE) && $(APPEND_CUH_DVS_PACKAGE) && $(COMPRESS_DVS_PACKAGE) endif DVS_OPTIONS := ifneq ($(TARGET_ARCH),$(HOST_ARCH)) DVS_OPTIONS += TARGET_ARCH=$(TARGET_ARCH) endif ifeq ($(TARGET_ARCH),ARMv7) DVS_OPTIONS += ABITYPE=$(ABITYPE) endif THRUST_DVS_BUILD = release pack: cd .. && $(MAKE_DVS_PACKAGE) dvs: $(MAKE) $(DVS_OPTIONS) -s -C ../cuda $(THRUST_DVS_BUILD) $(MAKE) $(DVS_OPTIONS) $(THRUST_DVS_BUILD) THRUST_DVS=1 cd .. && $(MAKE_DVS_PACKAGE) # XXX Deprecated, remove. dvs_nightly: dvs dvs_release: $(MAKE) dvs THRUST_DVS_BUILD=release dvs_debug: $(MAKE) dvs THRUST_DVS_BUILD=debug include $(THRUST_MKDIR)/dependencies.mk thrust-1.9.5/NOTICE000066400000000000000000000027711344621116200137720ustar00rootroot00000000000000Thrust includes source code from the Boost Iterator, Tuple, System, and Random Number libraries. Boost Software License - Version 1.0 - August 17th, 2003 Permission is hereby granted, free of charge, to any person or organization obtaining a copy of the software and accompanying documentation covered by this license (the "Software") to use, reproduce, display, distribute, execute, and transmit the Software, and to prepare derivative works of the Software, and to permit third-parties to whom the Software is furnished to do so, all subject to the following: The copyright notices in the Software and this entire statement, including the above license grant, this restriction and the following disclaimer, must be included in all copies of the Software, in whole or in part, and all derivative works of the Software, unless such copies or derivative works are solely in the form of machine-executable object code generated by a source language processor. 
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
thrust-1.9.5/README.md000066400000000000000000000043361344621116200143440ustar00rootroot00000000000000
Thrust: Code at the speed of light
==================================

Thrust is a parallel algorithms library which resembles the C++ Standard Template Library (STL). Thrust's **high-level** interface greatly enhances programmer **productivity** while enabling performance portability between GPUs and multicore CPUs. **Interoperability** with established technologies (such as CUDA, TBB, and OpenMP) facilitates integration with existing software. Develop **high-performance** applications rapidly with Thrust!

Examples
--------

Thrust is best explained through examples. The following source code generates random numbers serially and then transfers them to a parallel device where they are sorted.
```c++
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/generate.h>
#include <thrust/sort.h>
#include <thrust/copy.h>
#include <algorithm>
#include <cstdlib>

int main(void)
{
  // generate 32M random numbers serially
  thrust::host_vector<int> h_vec(32 << 20);
  std::generate(h_vec.begin(), h_vec.end(), rand);

  // transfer data to the device
  thrust::device_vector<int> d_vec = h_vec;

  // sort data on the device (846M keys per second on GeForce GTX 480)
  thrust::sort(d_vec.begin(), d_vec.end());

  // transfer data back to host
  thrust::copy(d_vec.begin(), d_vec.end(), h_vec.begin());

  return 0;
}
```

This code sample computes the sum of 100 random numbers in parallel:

```c++
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/generate.h>
#include <thrust/reduce.h>
#include <thrust/functional.h>
#include <algorithm>
#include <cstdlib>

int main(void)
{
  // generate random data serially
  thrust::host_vector<int> h_vec(100);
  std::generate(h_vec.begin(), h_vec.end(), rand);

  // transfer to device and compute sum
  thrust::device_vector<int> d_vec = h_vec;
  int x = thrust::reduce(d_vec.begin(), d_vec.end(), 0, thrust::plus<int>());
  return 0;
}
```

Refer to the [Quick Start Guide](http://github.com/thrust/thrust/wiki/Quick-Start-Guide) page for further information and examples.

Contributors
------------

The original creators of Thrust are [Jared Hoberock](http://github.com/jaredhoberock) and [Nathan Bell](http://research.nvidia.com/users/nathan-bell).
thrust-1.9.5/SConscript000066400000000000000000000037261344621116200151010ustar00rootroot00000000000000import os import re Import('env') # clone the environment so as not to pollute the parent my_env = env.Clone() # divine the version number from thrust/version.h version = int(re.search('THRUST_VERSION ([0-9]+)', File('#thrust/version.h').get_contents()).group(1)) major = int(version / 100000) minor = int(version / 100) % 1000 subminor = version % 100 # create the Thrust zip for item in my_env.RecursiveGlob('*', '#thrust'): my_env.InstallAs(os.path.join('thrust', Dir('#thrust').rel_path(item)), item) # grab the CHANGELOG as well my_env.Install('thrust', '#CHANGELOG') # make sure to change directory into the variant dir to ensure the paths are correct in the zipfile # note Zip uses the special site_scons/site_tools/zip.py to work around an issue with the chdir parameter thrust_zipfile = my_env.Zip('thrust-{0}.{1}.{2}.zip'.format(major,minor,subminor), 'thrust', chdir = 1) my_env.Alias('dist', thrust_zipfile) # create the examples zip # do not recurse into the 'targets' directory, should it exist for item in my_env.RecursiveGlob('*', '#examples', 'targets'): # avoid including SCons-related files in the distribution # XXX would be nice if we could ignore all dotfiles and anything in .gitignore if item.get_path(item.get_dir()) not in ['SConscript','.sconsign.dblite']: my_env.InstallAs(os.path.join('examples', Dir('#examples').rel_path(item)), item) # make sure to change directory into the variant dir to ensure the paths are correct in the zipfile # note Zip uses the special site_scons/site_tools/zip.py to work around an issue with the chdir parameter examples_zipfile = my_env.Zip('examples-{0}.{1}.zip'.format(major,minor), 'examples', chdir = 1) my_env.Alias('dist', examples_zipfile) # generate documentation # note that thrust.dox instructs doxygen to output to the targets directory public_headers = my_env.RecursiveGlob('*.h', '#thrust', exclude='detail') thrust_docs =
my_env.Command('doc/html', public_headers, 'doxygen doc/thrust.dox') my_env.Alias('doc', thrust_docs) thrust-1.9.5/SConstruct000066400000000000000000000402511344621116200151130ustar00rootroot00000000000000"""Exports a SCons construction environment 'env' with configuration common to all build projects""" EnsureSConsVersion(1,2) import os import platform import glob import itertools import subprocess def RecursiveGlob(env, pattern, directory = Dir('.'), exclude = '\B'): """Recursively globs a directory and its children, returning a list of sources. Allows exclusion of directories given a regular expression. """ directory = Dir(directory) result = directory.glob(pattern) for n in directory.glob('*'): # only recurse into directories which aren't in the blacklist import re if isinstance(n,type(directory)) and not re.match(exclude, directory.rel_path(n)): result.extend(RecursiveGlob(env, pattern, n, exclude)) return result # map features to the list of compiler switches implementing them gnu_compiler_flags = { 'warn_all' : ['-Wextra', '-Wall'], 'warnings_as_errors' : ['-Werror'], 'release' : ['-O2'], 'debug' : ['-g'], 'exception_handling' : [], 'cpp' : [], 'omp' : ['-fopenmp'], 'tbb' : [], 'cuda' : [], 'workarounds' : [], 'c++03' : [], 'c++11' : ['-std=c++11'] } clang_compiler_flags = { 'warn_all' : ['-Wextra', '-Wall'], 'warnings_as_errors' : ['-Werror'], 'release' : ['-O2'], 'debug' : ['-g'], 'exception_handling' : [], 'cpp' : [], 'omp' : ['-fopenmp'], 'tbb' : [], 'cuda' : [], 'workarounds' : [], 'c++03' : [], 'c++11' : ['-std=c++11'] } msvc_compiler_flags = { 'warn_all' : ['/Wall'], 'warnings_as_errors' : ['/WX'], 'release' : ['/Ox'], 'debug' : ['/Zi', '-D_DEBUG', '/MTd'], 'exception_handling' : ['/EHsc'], 'cpp' : [], 'omp' : ['/openmp'], 'tbb' : [], 'cuda' : [], # avoid min/max problems due to windows.h # suppress warnings due to "decorated name length exceeded" 'workarounds' : ['/DNOMINMAX', '/wd4503'], 'c++03' : [], 'c++11' : [] } compiler_to_flags = { 'g++' : 
gnu_compiler_flags, 'cl' : msvc_compiler_flags, 'clang++' : clang_compiler_flags } gnu_linker_flags = { 'debug' : [], 'release' : [], 'workarounds' : [] } nv_linker_flags = gnu_linker_flags clang_linker_flags = { 'debug' : [], 'release' : [], 'workarounds' : ['-stdlib=libstdc++'] } msvc_linker_flags = { 'debug' : ['/debug'], 'release' : [], 'workarounds' : ['/nologo'] } linker_to_flags = { 'gcc' : gnu_linker_flags, 'link' : msvc_linker_flags, 'nvcc' : nv_linker_flags, 'clang++' : clang_linker_flags } def cuda_installation(env): """Returns the details of CUDA's installation returns (bin_path,lib_path,inc_path,library_name) """ cuda_path = env['cuda_path'] bin_path = cuda_path + '/bin' lib_path = cuda_path + '/lib' inc_path = cuda_path + '/include' # fix up the name of the lib directory on 64b platforms if platform.machine()[-2:] == '64': if os.name == 'posix' and platform.system() != 'Darwin': lib_path += '64' elif os.name == 'nt': lib_path += '/x64' # override with environment variables if 'CUDA_BIN_PATH' in os.environ: bin_path = os.path.abspath(os.environ['CUDA_BIN_PATH']) if 'CUDA_LIB_PATH' in os.environ: lib_path = os.path.abspath(os.environ['CUDA_LIB_PATH']) if 'CUDA_INC_PATH' in os.environ: inc_path = os.path.abspath(os.environ['CUDA_INC_PATH']) return (bin_path,lib_path,inc_path,'cudart',cuda_path) def omp_installation(CXX): """Returns the details of OpenMP's installation returns (bin_path,lib_path,inc_path,library_name) """ bin_path = '' lib_path = '' inc_path = '' # the name of the library is compiler-dependent library_name = '' if CXX == 'g++': library_name = 'gomp' elif CXX == 'cl': library_name = 'VCOMP' elif CXX == 'clang++': raise NotImplementedError, "OpenMP not supported together with clang" else: raise ValueError, "Unknown compiler. What is the name of the OpenMP library?" 
return (bin_path,lib_path,inc_path,library_name) def tbb_installation(env): """Returns the details of TBB's installation returns (bin_path,lib_path,inc_path,library_name) """ # determine defaults if os.name == 'nt': try: # we assume that TBBROOT exists in the environment root = env['ENV']['TBBROOT'] # choose bitness bitness = 'ia32' if platform.machine()[-2:] == '64': bitness = 'intel64' # choose msvc version msvc_version = 'vc' + str(int(float(env['MSVC_VERSION']))) # assemble paths bin_path = os.path.join(root, 'bin', bitness, msvc_version) lib_path = os.path.join(root, 'lib', bitness, msvc_version) inc_path = os.path.join(root, 'include') except: raise ValueError, 'Where is TBB installed?' else: bin_path = '' lib_path = '' inc_path = '' return (bin_path,lib_path,inc_path,'tbb') def inc_paths(env, host_backend, device_backend): """Returns a list of include paths needed by the compiler""" result = [] thrust_inc_path = Dir('.') # note that the thrust path comes before the cuda path, which # may itself contain a different version of thrust result.append(thrust_inc_path) if host_backend == 'cuda' or device_backend == 'cuda': cuda_inc_path = cuda_installation(env)[2] result.append(cuda_inc_path) if host_backend == 'tbb' or device_backend == 'tbb': tbb_inc_path = tbb_installation(env)[2] result.append(tbb_inc_path) return result def lib_paths(env, host_backend, device_backend): """Returns a list of lib paths needed by the linker""" result = [] if host_backend == 'cuda' or device_backend == 'cuda': cuda_lib_path = cuda_installation(env)[1] result.append(cuda_lib_path) if host_backend == 'tbb' or device_backend == 'tbb': tbb_lib_path = tbb_installation(env)[1] result.append(tbb_lib_path) return result def libs(env, CCX, host_backend, device_backend): """Returns a list of libraries to link against""" result = [] # when compiling with g++, link against the standard library # we don't have to do this with cl if CCX == 'g++': result.append('stdc++') result.append('m') # link 
against backend-specific runtimes if host_backend == 'cuda' or device_backend == 'cuda': result.append(cuda_installation(env)[3]) # XXX clean this up if env['cdp']: result.append('cudadevrt') if host_backend == 'omp' or device_backend == 'omp': result.append(omp_installation(CCX)[3]) if host_backend == 'tbb' or device_backend == 'tbb': result.append(tbb_installation(env)[3]) return result def linker_flags(LINK, mode, platform, device_backend, arch): """Returns a list of command line flags needed by the linker""" result = [] flags = linker_to_flags[LINK] # debug/release result.extend(flags[mode]) # unconditional workarounds result.extend(flags['workarounds']) return result def macros(mode, host_backend, device_backend): """Returns a list of preprocessor macros needed by the compiler""" result = [] # backend defines result.append('-DTHRUST_HOST_SYSTEM=THRUST_HOST_SYSTEM_' + host_backend.upper()) result.append('-DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_' + device_backend.upper()) if mode == 'debug': # turn on thrust debug mode result.append('-DTHRUST_DEBUG') return result def cc_compiler_flags(CXX, mode, platform, host_backend, device_backend, warn_all, warnings_as_errors, cpp_standard): """Returns a list of command line flags needed by the c or c++ compiler""" # start with all platform-independent preprocessor macros result = macros(mode, host_backend, device_backend) flags = compiler_to_flags[CXX] # continue with unconditional flags # exception handling result.extend(flags['exception_handling']) # finish with conditional flags # debug/release result.extend(flags[mode]) # enable host_backend code generation result.extend(flags[host_backend]) # enable device_backend code generation result.extend(flags[device_backend]) # Wall if warn_all: result.extend(flags['warn_all']) # Werror if warnings_as_errors: result.extend(flags['warnings_as_errors']) # workarounds result.extend(flags['workarounds']) # c++ standard result.extend(flags[cpp_standard]) return result def 
nv_compiler_flags(mode, device_backend, arch, cdp): """Returns a list of command line flags specific to nvcc""" result = [] for machine_arch in arch: # transform arch_XX to compute_XX virtual_arch = machine_arch.replace('sm','compute') # the weird -gencode flag is formatted like this: # -gencode=arch=compute_10,code=\"sm_20,compute_20\" result.append('-gencode=arch={0},\\"code={1},{2}\\"'.format(virtual_arch, machine_arch, virtual_arch)) if mode == 'debug': # turn on debug mode # XXX make this work when we've debugged nvcc -G #result.append('-G') pass if device_backend != 'cuda': result.append("--x=c++") if cdp != False: result.append("-rdc=true") if device_backend == 'cuda' and master_env['PLATFORM'] == 'darwin': (release, versioninfo, machine) = platform.mac_ver() if(release[0:5] == '10.8.'): result.append('-ccbin') result.append(master_env.subst('$CXX')) return result def clang_compiler_flags(mode, arch): """Returns a list of command line flags specific to clang""" result = [] for machine_arch in arch: result.append('--cuda-gpu-arch={0}'.format(machine_arch)) return result def command_line_variables(): # allow the user discretion to select the MSVC version vars = Variables() if os.name == 'nt': vars.Add(EnumVariable('MSVC_VERSION', 'MS Visual C++ version', None, allowed_values=('8.0', '9.0', '10.0', '11.0', '12.0', '13.0'))) # add a variable to handle the host backend vars.Add(ListVariable('host_backend', 'The host backend to target', 'cpp', ['cpp', 'omp', 'tbb'])) # add a variable to handle the device backend vars.Add(ListVariable('device_backend', 'The parallel device backend to target', 'cuda', ['cuda', 'omp', 'tbb', 'cpp'])) # add a variable to handle release/debug mode vars.Add(EnumVariable('mode', 'Release versus debug mode', 'release', allowed_values = ('release', 'debug'))) # allow the option to send sm_1x to nvcc even though nvcc may not support it vars.Add(ListVariable('arch', 'Compute capability code generation', 'sm_30', ['sm_30', 'sm_32', 'sm_35', 
'sm_37', 'sm_50', 'sm_52', 'sm_60', 'sm_61'])) # add a variable to handle CUDA dynamic parallelism vars.Add(BoolVariable('cdp', 'Enable CUDA dynamic parallelism', False)) # add a variable to handle warnings # only enable Wall by default on compilers other than cl vars.Add(BoolVariable('Wall', 'Enable all compilation warnings', os.name != 'nt')) # add a variable to treat warnings as errors vars.Add(BoolVariable('Werror', 'Treat warnings as errors', os.name != 'nt')) # add a variable to switch between C++ standards vars.Add(EnumVariable('std', 'C++ standard', 'c++03', allowed_values = ('c++03', 'c++11'))) # add a variable to select the CUDA compiler vars.Add(EnumVariable('cuda_compiler', 'CUDA compiler', 'nvcc', allowed_values = ('nvcc', 'clang'))) # determine defaults if 'CUDA_PATH' in os.environ: default_cuda_path = os.path.abspath(os.environ['CUDA_PATH']) elif os.name == 'nt': default_cuda_path = 'C:/CUDA' elif os.name == 'posix': default_cuda_path = '/usr/local/cuda' else: raise ValueError, 'Error: unknown OS. Where is nvcc installed?'
vars.Add(PathVariable('cuda_path', 'CUDA installation path', default_cuda_path)) return vars # create a master Environment vars = command_line_variables() master_env = Environment(variables = vars, tools = ['default', 'zip']) Tool(master_env['cuda_compiler'])(master_env) # XXX it might be a better idea to harvest help text from subsidiary # SConscripts and only add their help text if one of their targets # is scheduled to be built Help(vars.GenerateHelpText(master_env)) # enable RecursiveGlob master_env.AddMethod(RecursiveGlob) # add CUDA's lib dir to LD_LIBRARY_PATH so that we can execute commands # which depend on shared libraries (e.g., cudart) # we don't need to do this on windows if master_env['PLATFORM'] == 'posix': master_env['ENV'].setdefault('LD_LIBRARY_PATH', []).append(cuda_installation(master_env)[1]) elif master_env['PLATFORM'] == 'darwin': master_env['ENV'].setdefault('DYLD_LIBRARY_PATH', []).append(cuda_installation(master_env)[1]) # Check if g++ really is g++ if(master_env.subst('$CXX') == 'g++'): output = subprocess.check_output(['g++','--version']) if(output.find('clang') != -1): # It's actually clang master_env.Replace(CXX = 'clang++') if(master_env.subst('$CC') == 'gcc'): output = subprocess.check_output(['gcc','--version']) if(output.find('clang') != -1): # It's actually clang master_env.Replace(CC = 'clang') if(master_env.subst('$LINK') == 'clang'): master_env.Replace(CC = 'clang++') elif master_env['PLATFORM'] == 'win32': master_env['ENV']['TBBROOT'] = os.environ['TBBROOT'] master_env['ENV']['PATH'] += ';' + tbb_installation(master_env)[0] # if the environment variable NVVMIR_LIBRARY_DIR is set, provide it to nvcc to prevent the following error: # "nvcc fatal : Path to libdevice library not specified" if 'NVVMIR_LIBRARY_DIR' in os.environ: master_env['ENV']['NVVMIR_LIBRARY_DIR'] = os.environ['NVVMIR_LIBRARY_DIR'] # get the list of requested backends host_backends = master_env.subst('$host_backend').split() device_backends = 
master_env.subst('$device_backend').split() for (host,device) in itertools.product(host_backends, device_backends): # clone the master environment for this config env = master_env.Clone() # populate the environment env.Append(CPPPATH = inc_paths(env, host, device)) env.Append(CCFLAGS = cc_compiler_flags(env.subst('$CXX'), env['mode'], env['PLATFORM'], host, device, env['Wall'], env['Werror'], env['std'])) env.Append(NVCCFLAGS = nv_compiler_flags(env['mode'], device, env['arch'], env['cdp'])) env.Append(CLANGFLAGS = clang_compiler_flags(env['mode'], env['arch'])) env.Append(LIBS = libs(env, env.subst('$CXX'), host, device)) # XXX this probably doesn't belong here # XXX ideally we'd integrate this into site_scons if 'cudadevrt' in env['LIBS']: # nvcc is required to link against cudadevrt env.Replace(LINK = 'nvcc') if os.name == 'nt': # the nv linker uses the same command line as the gnu linker env['LIBDIRPREFIX'] = '-L' env['LIBLINKPREFIX'] = '-l' env['LIBLINKSUFFIX'] = '' env.Replace(LINKCOM = '$LINK -o $TARGET $LINKFLAGS $__RPATH $SOURCES $_LIBDIRFLAGS $_LIBFLAGS') # we Replace instead of Append, to avoid picking-up MSVC-specific flags on Windows env.Replace(LINKFLAGS = linker_flags(env.subst('$LINK'), env['mode'], env['PLATFORM'], device, env['arch'])) env.Append(LIBPATH = lib_paths(env, host, device), RPATH = lib_paths(env, host, device)) # assemble the name of this configuration's targets directory targets_dir = 'targets/{0}_host_{1}_device_{2}_{3}'.format(host, device, env['mode'], env['cuda_compiler']) # allow subsidiary SConscripts to peek at the backends env['host_backend'] = host env['device_backend'] = device # invoke each SConscript with a variant directory env.SConscript('examples/SConscript', exports='env', variant_dir = 'examples/' + targets_dir, duplicate = 0) env.SConscript('testing/SConscript', exports='env', variant_dir = 'testing/' + targets_dir, duplicate = 0) env.SConscript('performance/SConscript', exports='env', variant_dir = 'performance/' + 
targets_dir, duplicate = 0) env = master_env master_env.SConscript('SConscript', exports='env', variant_dir = 'targets', duplicate = False) thrust-1.9.5/THANKS000066400000000000000000000020201344621116200137640ustar00rootroot00000000000000Thrust is an open source library of parallel algorithms with an interface resembling the C++ Standard Template Library (STL). The primary developers of Thrust are Jared Hoberock [1] and Nathan Bell [2] of NVIDIA Research. We wish to thank the following people, who have made important intellectual and/or software contributions to the project: * Andrew Corrigan * David Tarjan * Duane Merrill * Erich Elsen * Gregory Diamos * Manjunath Kudlur * Mark Harris * Michael Garland * Nadathur Satish * Nathan Whitehead * Ryuta Suzuki * Shubho Sengupta * Thomas Bradley We also thank the compiler group at NVIDIA for their continued improvements to nvcc. In particular, we appreciate the work Bastiaan Aarts has done to enhance nvcc's C++ support. Lastly, Thrust has greatly benefited from the design and implementation of the Boost Iterator, Tuple, System, Phoenix, and Random Number libraries [3]. [1] http://research.nvidia.com/users/jared-hoberock [2] http://research.nvidia.com/users/nathan-bell [3] http://www.boost.org/ thrust-1.9.5/doc/000077500000000000000000000000001344621116200136245ustar00rootroot00000000000000thrust-1.9.5/doc/thrust.dox000066400000000000000000001254221344621116200156770ustar00rootroot00000000000000# Doxyfile 1.3.4 # This file describes the settings to be used by the documentation system # doxygen (www.doxygen.org) for a project # # All text after a hash (#) is considered a comment and will be ignored # The format is: # TAG = value [value, ...] # For lists items can also be appended using: # TAG += value [value, ...] 
# Values that contain spaces should be placed between quotes (" ") #--------------------------------------------------------------------------- # Project related configuration options #--------------------------------------------------------------------------- # The PROJECT_NAME tag is a single word (or a sequence of words surrounded # by quotes) that should identify the project. PROJECT_NAME = thrust # The PROJECT_NUMBER tag can be used to enter a project or revision number. # This could be handy for archiving the generated documentation or # if some version control system is used. PROJECT_NUMBER = # The OUTPUT_DIRECTORY tag is used to specify the (relative or absolute) # base path where the generated documentation will be put. # If a relative path is entered, it will be relative to the location # where doxygen was started. If left blank the current directory will be used. OUTPUT_DIRECTORY = targets/doc # The OUTPUT_LANGUAGE tag is used to specify the language in which all # documentation generated by doxygen is written. Doxygen will use this # information to generate all constant output in the proper language. # The default language is English, other supported languages are: # Brazilian, Catalan, Chinese, Chinese-Traditional, Croatian, Czech, Danish, Dutch, # Finnish, French, German, Greek, Hungarian, Italian, Japanese, Japanese-en # (Japanese with English messages), Korean, Norwegian, Polish, Portuguese, # Romanian, Russian, Serbian, Slovak, Slovene, Spanish, Swedish, and Ukrainian. OUTPUT_LANGUAGE = English # This tag can be used to specify the encoding used in the generated output. # The encoding is not always determined by the language that is chosen, # but also whether or not the output is meant for Windows or non-Windows users. 
# In case there is a difference, setting the USE_WINDOWS_ENCODING tag to YES
# forces the Windows encoding (this is the default for the Windows binary),
# whereas setting the tag to NO uses a Unix-style encoding (the default for
# all platforms other than Windows).

USE_WINDOWS_ENCODING = NO

# If the BRIEF_MEMBER_DESC tag is set to YES (the default) Doxygen will
# include brief member descriptions after the members that are listed in
# the file and class documentation (similar to JavaDoc).
# Set to NO to disable this.

BRIEF_MEMBER_DESC = YES

# If the REPEAT_BRIEF tag is set to YES (the default) Doxygen will prepend
# the brief description of a member or function before the detailed description.
# Note: if both HIDE_UNDOC_MEMBERS and BRIEF_MEMBER_DESC are set to NO, the
# brief descriptions will be completely suppressed.

REPEAT_BRIEF = YES

# If the ALWAYS_DETAILED_SEC and REPEAT_BRIEF tags are both set to YES then
# Doxygen will generate a detailed section even if there is only a brief
# description.

ALWAYS_DETAILED_SEC = NO

# If the INLINE_INHERITED_MEMB tag is set to YES, doxygen will show all inherited
# members of a class in the documentation of that class as if those members were
# ordinary class members. Constructors, destructors and assignment operators of
# the base classes will not be shown.

INLINE_INHERITED_MEMB = NO

# If the FULL_PATH_NAMES tag is set to YES then Doxygen will prepend the full
# path before file names in the file list and in the header files. If set
# to NO the shortest path that makes the file name unique will be used.

FULL_PATH_NAMES = YES

# If the FULL_PATH_NAMES tag is set to YES then the STRIP_FROM_PATH tag
# can be used to strip a user-defined part of the path. Stripping is
# only done if one of the specified strings matches the left-hand part of
# the path. It is allowed to use relative paths in the argument list.
STRIP_FROM_PATH =

# If the SHORT_NAMES tag is set to YES, doxygen will generate much shorter
# (but less readable) file names. This can be useful if your file system
# doesn't support long names like on DOS, Mac, or CD-ROM.

SHORT_NAMES = NO

# If the JAVADOC_AUTOBRIEF tag is set to YES then Doxygen
# will interpret the first line (until the first dot) of a JavaDoc-style
# comment as the brief description. If set to NO, the JavaDoc
# comments will behave just like the Qt-style comments (thus requiring an
# explicit @brief command for a brief description).

JAVADOC_AUTOBRIEF = NO

# The MULTILINE_CPP_IS_BRIEF tag can be set to YES to make Doxygen
# treat a multi-line C++ special comment block (i.e. a block of //! or ///
# comments) as a brief description. This used to be the default behaviour.
# The new default is to treat a multi-line C++ comment block as a detailed
# description. Set this tag to YES if you prefer the old behaviour instead.

MULTILINE_CPP_IS_BRIEF = NO

# If the DETAILS_AT_TOP tag is set to YES then Doxygen
# will output the detailed description near the top, like JavaDoc.
# If set to NO, the detailed description appears after the member
# documentation.

DETAILS_AT_TOP = NO

# If the INHERIT_DOCS tag is set to YES (the default) then an undocumented
# member inherits the documentation from any documented member that it
# reimplements.

INHERIT_DOCS = YES

# If member grouping is used in the documentation and the DISTRIBUTE_GROUP_DOC
# tag is set to YES, then doxygen will reuse the documentation of the first
# member in the group (if any) for the other members of the group. By default
# all members of a group must be documented explicitly.

DISTRIBUTE_GROUP_DOC = NO

# The TAB_SIZE tag can be used to set the number of spaces in a tab.
# Doxygen uses this value to replace tabs by spaces in code fragments.

TAB_SIZE = 8

# This tag can be used to specify a number of aliases that act
# as commands in the documentation. An alias has the form "name=value".
# For example, adding "sideeffect=\par Side Effects:\n" will allow you to
# put the command \sideeffect (or @sideeffect) in the documentation, which
# will result in a user-defined paragraph with heading "Side Effects:".
# You can put \n's in the value part of an alias to insert newlines.

ALIASES =

# Set the OPTIMIZE_OUTPUT_FOR_C tag to YES if your project consists of C sources
# only. Doxygen will then generate output that is more tailored for C.
# For instance, some of the names that are used will be different. The list
# of all members will be omitted, etc.

OPTIMIZE_OUTPUT_FOR_C = NO

# Set the OPTIMIZE_OUTPUT_JAVA tag to YES if your project consists of Java sources
# only. Doxygen will then generate output that is more tailored for Java.
# For instance, namespaces will be presented as packages, qualified scopes
# will look different, etc.

OPTIMIZE_OUTPUT_JAVA = NO

# Set the SUBGROUPING tag to YES (the default) to allow class member groups of
# the same type (for instance a group of public functions) to be put as a
# subgroup of that type (e.g. under the Public Functions section). Set it to
# NO to prevent subgrouping. Alternatively, this can be done per class using
# the \nosubgrouping command.

SUBGROUPING = YES

#---------------------------------------------------------------------------
# Build related configuration options
#---------------------------------------------------------------------------

# If the EXTRACT_ALL tag is set to YES doxygen will assume all entities in
# documentation are documented, even if no documentation was available.
# Private class members and static file members will be hidden unless
# the EXTRACT_PRIVATE and EXTRACT_STATIC tags are set to YES.

EXTRACT_ALL = NO

# If the EXTRACT_PRIVATE tag is set to YES all private members of a class
# will be included in the documentation.

EXTRACT_PRIVATE = NO

# If the EXTRACT_STATIC tag is set to YES all static members of a file
# will be included in the documentation.
EXTRACT_STATIC = YES

# If the EXTRACT_LOCAL_CLASSES tag is set to YES classes (and structs)
# defined locally in source files will be included in the documentation.
# If set to NO only classes defined in header files are included.

EXTRACT_LOCAL_CLASSES = YES

# If the HIDE_UNDOC_MEMBERS tag is set to YES, Doxygen will hide all
# undocumented members of documented classes, files or namespaces.
# If set to NO (the default) these members will be included in the
# various overviews, but no documentation section is generated.
# This option has no effect if EXTRACT_ALL is enabled.

HIDE_UNDOC_MEMBERS = NO

# If the HIDE_UNDOC_CLASSES tag is set to YES, Doxygen will hide all
# undocumented classes that are normally visible in the class hierarchy.
# If set to NO (the default) these classes will be included in the various
# overviews. This option has no effect if EXTRACT_ALL is enabled.

HIDE_UNDOC_CLASSES = NO

# If the HIDE_FRIEND_COMPOUNDS tag is set to YES, Doxygen will hide all
# friend (class|struct|union) declarations.
# If set to NO (the default) these declarations will be included in the
# documentation.

HIDE_FRIEND_COMPOUNDS = NO

# If the HIDE_IN_BODY_DOCS tag is set to YES, Doxygen will hide any
# documentation blocks found inside the body of a function.
# If set to NO (the default) these blocks will be appended to the
# function's detailed documentation block.

HIDE_IN_BODY_DOCS = NO

# The INTERNAL_DOCS tag determines if documentation
# that is typed after a \internal command is included. If the tag is set
# to NO (the default) then the documentation will be excluded.
# Set it to YES to include the internal documentation.

INTERNAL_DOCS = NO

# If the CASE_SENSE_NAMES tag is set to NO then Doxygen will only generate
# file names in lower-case letters. If set to YES upper-case letters are also
# allowed. This is useful if you have classes or files whose names only differ
# in case and if your file system supports case sensitive file names.
# Windows users are advised to set this option to NO.

CASE_SENSE_NAMES = YES

# If the HIDE_SCOPE_NAMES tag is set to NO (the default) then Doxygen
# will show members with their full class and namespace scopes in the
# documentation. If set to YES the scope will be hidden.

HIDE_SCOPE_NAMES = NO

# If the SHOW_INCLUDE_FILES tag is set to YES (the default) then Doxygen
# will put a list of the files that are included by a file in the documentation
# of that file.

SHOW_INCLUDE_FILES = YES

# If the INLINE_INFO tag is set to YES (the default) then a tag [inline]
# is inserted in the documentation for inline members.

INLINE_INFO = YES

# If the SORT_MEMBER_DOCS tag is set to YES (the default) then doxygen
# will sort the (detailed) documentation of file and class members
# alphabetically by member name. If set to NO the members will appear in
# declaration order.

SORT_MEMBER_DOCS = YES

# The GENERATE_TODOLIST tag can be used to enable (YES) or
# disable (NO) the todo list. This list is created by putting \todo
# commands in the documentation.

GENERATE_TODOLIST = YES

# The GENERATE_TESTLIST tag can be used to enable (YES) or
# disable (NO) the test list. This list is created by putting \test
# commands in the documentation.

GENERATE_TESTLIST = YES

# The GENERATE_BUGLIST tag can be used to enable (YES) or
# disable (NO) the bug list. This list is created by putting \bug
# commands in the documentation.

GENERATE_BUGLIST = YES

# The GENERATE_DEPRECATEDLIST tag can be used to enable (YES) or
# disable (NO) the deprecated list. This list is created by putting
# \deprecated commands in the documentation.

GENERATE_DEPRECATEDLIST= YES

# The ENABLED_SECTIONS tag can be used to enable conditional
# documentation sections, marked by \if sectionname ... \endif.

ENABLED_SECTIONS =

# The MAX_INITIALIZER_LINES tag determines the maximum number of lines
# the initial value of a variable or define consists of for it to appear in
# the documentation.
# If the initializer consists of more lines than specified
# here it will be hidden. Use a value of 0 to hide initializers completely.
# The appearance of the initializer of individual variables and defines in the
# documentation can be controlled using the \showinitializer or \hideinitializer
# commands in the documentation regardless of this setting.

MAX_INITIALIZER_LINES = 30

# Set the SHOW_USED_FILES tag to NO to disable the list of files generated
# at the bottom of the documentation of classes and structs. If set to YES the
# list will mention the files that were used to generate the documentation.

SHOW_USED_FILES = YES

#---------------------------------------------------------------------------
# configuration options related to warning and progress messages
#---------------------------------------------------------------------------

# The QUIET tag can be used to turn on/off the messages that are generated
# by doxygen. Possible values are YES and NO. If left blank NO is used.

QUIET = NO

# The WARNINGS tag can be used to turn on/off the warning messages that are
# generated by doxygen. Possible values are YES and NO. If left blank
# NO is used.

WARNINGS = YES

# If WARN_IF_UNDOCUMENTED is set to YES, then doxygen will generate warnings
# for undocumented members. If EXTRACT_ALL is set to YES then this flag will
# automatically be disabled.

WARN_IF_UNDOCUMENTED = YES

# If WARN_IF_DOC_ERROR is set to YES, doxygen will generate warnings for
# potential errors in the documentation, such as not documenting some
# parameters in a documented function, or documenting parameters that
# don't exist or using markup commands wrongly.

WARN_IF_DOC_ERROR = YES

# The WARN_FORMAT tag determines the format of the warning messages that
# doxygen can produce. The string should contain the $file, $line, and $text
# tags, which will be replaced by the file and line number from which the
# warning originated and the warning text.
WARN_FORMAT = "$file:$line: $text"

# The WARN_LOGFILE tag can be used to specify a file to which warning
# and error messages should be written. If left blank the output is written
# to stderr.

WARN_LOGFILE =

#---------------------------------------------------------------------------
# configuration options related to the input files
#---------------------------------------------------------------------------

# The INPUT tag can be used to specify the files and/or directories that contain
# documented source files. You may enter file names like "myfile.cpp" or
# directories like "/usr/src/myproject". Separate the files or directories
# with spaces.

INPUT = thrust examples

# If the value of the INPUT tag contains directories, you can use the
# FILE_PATTERNS tag to specify one or more wildcard patterns (like *.cpp
# and *.h) to filter out the source-files in the directories. If left
# blank the following patterns are tested:
# *.c *.cc *.cxx *.cpp *.c++ *.java *.ii *.ixx *.ipp *.i++ *.inl *.h *.hh *.hxx *.hpp
# *.h++ *.idl *.odl *.cs *.php *.php3 *.inc

FILE_PATTERNS =

# The RECURSIVE tag can be used to specify whether or not subdirectories
# should be searched for input files as well. Possible values are YES and NO.
# If left blank NO is used.

RECURSIVE = YES

# The EXCLUDE tag can be used to specify files and/or directories that should be
# excluded from the INPUT source files. This way you can easily exclude a
# subdirectory from a directory tree whose root is specified with the INPUT tag.

EXCLUDE = examples

# The EXCLUDE_SYMLINKS tag can be used to select whether or not files or directories
# that are symbolic links (a Unix filesystem feature) are excluded from the input.

EXCLUDE_SYMLINKS = NO

# If the value of the INPUT tag contains directories, you can use the
# EXCLUDE_PATTERNS tag to specify one or more wildcard patterns to exclude
# certain files from those directories.
EXCLUDE_PATTERNS = */detail/*

# The EXAMPLE_PATH tag can be used to specify one or more files or
# directories that contain example code fragments that are included (see
# the \include command).

EXAMPLE_PATH = examples

# If the value of the EXAMPLE_PATH tag contains directories, you can use the
# EXAMPLE_PATTERNS tag to specify one or more wildcard patterns (like *.cpp
# and *.h) to filter out the source-files in the directories. If left
# blank all files are included.

EXAMPLE_PATTERNS =

# If the EXAMPLE_RECURSIVE tag is set to YES then subdirectories will be
# searched for input files to be used with the \include or \dontinclude
# commands irrespective of the value of the RECURSIVE tag.
# Possible values are YES and NO. If left blank NO is used.

EXAMPLE_RECURSIVE = NO

# The IMAGE_PATH tag can be used to specify one or more files or
# directories that contain images that are included in the documentation (see
# the \image command).

IMAGE_PATH =

# The INPUT_FILTER tag can be used to specify a program that doxygen should
# invoke to filter each input file. Doxygen will invoke the filter program
# by executing (via popen()) the command <filter> <input-file>, where <filter>
# is the value of the INPUT_FILTER tag, and <input-file> is the name of an
# input file. Doxygen will then use the output that the filter program writes
# to standard output.

INPUT_FILTER =

# If the FILTER_SOURCE_FILES tag is set to YES, the input filter (if set using
# INPUT_FILTER) will be used to filter the input files when producing source
# files to browse (i.e. when SOURCE_BROWSER is set to YES).

FILTER_SOURCE_FILES = NO

#---------------------------------------------------------------------------
# configuration options related to source browsing
#---------------------------------------------------------------------------

# If the SOURCE_BROWSER tag is set to YES then a list of source files will
# be generated. Documented entities will be cross-referenced with these sources.
SOURCE_BROWSER = NO

# Setting the INLINE_SOURCES tag to YES will include the body
# of functions and classes directly in the documentation.

INLINE_SOURCES = NO

# Setting the STRIP_CODE_COMMENTS tag to YES (the default) will instruct
# doxygen to hide any special comment blocks from generated source code
# fragments. Normal C and C++ comments will always remain visible.

STRIP_CODE_COMMENTS = YES

# If the REFERENCED_BY_RELATION tag is set to YES (the default)
# then for each documented function all documented
# functions referencing it will be listed.

REFERENCED_BY_RELATION = YES

# If the REFERENCES_RELATION tag is set to YES (the default)
# then for each documented function all documented entities
# called/used by that function will be listed.

REFERENCES_RELATION = YES

# If the VERBATIM_HEADERS tag is set to YES (the default) then Doxygen
# will generate a verbatim copy of the header file for each class for
# which an include is specified. Set to NO to disable this.

VERBATIM_HEADERS = YES

#---------------------------------------------------------------------------
# configuration options related to the alphabetical class index
#---------------------------------------------------------------------------

# If the ALPHABETICAL_INDEX tag is set to YES, an alphabetical index
# of all compounds will be generated. Enable this if the project
# contains a lot of classes, structs, unions or interfaces.

ALPHABETICAL_INDEX = NO

# If the alphabetical index is enabled (see ALPHABETICAL_INDEX) then
# the COLS_IN_ALPHA_INDEX tag can be used to specify the number of columns
# in which this list will be split (can be a number in the range [1..20])

COLS_IN_ALPHA_INDEX = 5

# In case all classes in a project start with a common prefix, all
# classes will be put under the same header in the alphabetical index.
# The IGNORE_PREFIX tag can be used to specify one or more prefixes that
# should be ignored while generating the index headers.
IGNORE_PREFIX =

#---------------------------------------------------------------------------
# configuration options related to the HTML output
#---------------------------------------------------------------------------

# If the GENERATE_HTML tag is set to YES (the default) Doxygen will
# generate HTML output.

GENERATE_HTML = YES

# The HTML_OUTPUT tag is used to specify where the HTML docs will be put.
# If a relative path is entered the value of OUTPUT_DIRECTORY will be
# put in front of it. If left blank `html' will be used as the default path.

HTML_OUTPUT = html

# The HTML_FILE_EXTENSION tag can be used to specify the file extension for
# each generated HTML page (for example: .htm,.php,.asp). If it is left blank
# doxygen will generate files with .html extension.

HTML_FILE_EXTENSION = .html

# The HTML_HEADER tag can be used to specify a personal HTML header for
# each generated HTML page. If it is left blank doxygen will generate a
# standard header.

HTML_HEADER =

# The HTML_FOOTER tag can be used to specify a personal HTML footer for
# each generated HTML page. If it is left blank doxygen will generate a
# standard footer.

HTML_FOOTER =

# The HTML_STYLESHEET tag can be used to specify a user-defined cascading
# style sheet that is used by each HTML page. It can be used to
# fine-tune the look of the HTML output. If the tag is left blank doxygen
# will generate a default style sheet.

HTML_STYLESHEET =

# If the HTML_ALIGN_MEMBERS tag is set to YES, the members of classes,
# files or namespaces will be aligned in HTML using tables. If set to
# NO a bullet list will be used.

HTML_ALIGN_MEMBERS = YES

# If the GENERATE_HTMLHELP tag is set to YES, additional index files
# will be generated that can be used as input for tools like the
# Microsoft HTML help workshop to generate a compressed HTML help file (.chm)
# of the generated HTML documentation.
GENERATE_HTMLHELP = NO

# If the GENERATE_HTMLHELP tag is set to YES, the CHM_FILE tag can
# be used to specify the file name of the resulting .chm file. You
# can add a path in front of the file if the result should not be
# written to the html output dir.

CHM_FILE =

# If the GENERATE_HTMLHELP tag is set to YES, the HHC_LOCATION tag can
# be used to specify the location (absolute path including file name) of
# the HTML help compiler (hhc.exe). If non-empty doxygen will try to run
# the HTML help compiler on the generated index.hhp.

HHC_LOCATION =

# If the GENERATE_HTMLHELP tag is set to YES, the GENERATE_CHI flag
# controls whether a separate .chi index file is generated (YES) or whether
# it should be included in the master .chm file (NO).

GENERATE_CHI = NO

# If the GENERATE_HTMLHELP tag is set to YES, the BINARY_TOC flag
# controls whether a binary table of contents is generated (YES) or a
# normal table of contents (NO) in the .chm file.

BINARY_TOC = NO

# The TOC_EXPAND flag can be set to YES to add extra items for group members
# to the contents of the HTML help documentation and to the tree view.

TOC_EXPAND = NO

# The DISABLE_INDEX tag can be used to turn on/off the condensed index at
# top of each HTML page. The value NO (the default) enables the index and
# the value YES disables it.

DISABLE_INDEX = NO

# This tag can be used to set the number of enum values (range [1..20])
# that doxygen will group on one line in the generated HTML documentation.

ENUM_VALUES_PER_LINE = 4

# If the GENERATE_TREEVIEW tag is set to YES, a side panel will be
# generated containing a tree-like index structure (just like the one that
# is generated for HTML Help). For this to work a browser that supports
# JavaScript, DHTML, CSS and frames is required (for instance Mozilla 1.0+,
# Netscape 6.0+, Internet explorer 5.0+, or Konqueror). Windows users are
# probably better off using the HTML help feature.
GENERATE_TREEVIEW = NO

# If the treeview is enabled (see GENERATE_TREEVIEW) then this tag can be
# used to set the initial width (in pixels) of the frame in which the tree
# is shown.

TREEVIEW_WIDTH = 250

#---------------------------------------------------------------------------
# configuration options related to the LaTeX output
#---------------------------------------------------------------------------

# If the GENERATE_LATEX tag is set to YES (the default) Doxygen will
# generate LaTeX output.

GENERATE_LATEX = NO

# The LATEX_OUTPUT tag is used to specify where the LaTeX docs will be put.
# If a relative path is entered the value of OUTPUT_DIRECTORY will be
# put in front of it. If left blank `latex' will be used as the default path.

LATEX_OUTPUT = latex

# The LATEX_CMD_NAME tag can be used to specify the LaTeX command name to be
# invoked. If left blank `latex' will be used as the default command name.

LATEX_CMD_NAME = latex

# The MAKEINDEX_CMD_NAME tag can be used to specify the command name to
# generate index for LaTeX. If left blank `makeindex' will be used as the
# default command name.

MAKEINDEX_CMD_NAME = makeindex

# If the COMPACT_LATEX tag is set to YES Doxygen generates more compact
# LaTeX documents. This may be useful for small projects and may help to
# save some trees in general.

COMPACT_LATEX = NO

# The PAPER_TYPE tag can be used to set the paper type that is used
# by the printer. Possible values are: a4, a4wide, letter, legal and
# executive. If left blank a4wide will be used.

PAPER_TYPE = a4wide

# The EXTRA_PACKAGES tag can be used to specify one or more names of LaTeX
# packages that should be included in the LaTeX output.

EXTRA_PACKAGES =

# The LATEX_HEADER tag can be used to specify a personal LaTeX header for
# the generated latex document. The header should contain everything until
# the first chapter. If it is left blank doxygen will generate a
# standard header. Notice: only use this tag if you know what you are doing!
LATEX_HEADER =

# If the PDF_HYPERLINKS tag is set to YES, the LaTeX that is generated
# is prepared for conversion to pdf (using ps2pdf). The pdf file will
# contain links (just like the HTML output) instead of page references.
# This makes the output suitable for online browsing using a pdf viewer.

PDF_HYPERLINKS = NO

# If the USE_PDFLATEX tag is set to YES, pdflatex will be used instead of
# plain latex in the generated Makefile. Set this option to YES to get
# higher quality PDF documentation.

USE_PDFLATEX = NO

# If the LATEX_BATCHMODE tag is set to YES, doxygen will add the \batchmode
# command to the generated LaTeX files. This will instruct LaTeX to keep
# running if errors occur, instead of asking the user for help.
# This option is also used when generating formulas in HTML.

LATEX_BATCHMODE = NO

# If LATEX_HIDE_INDICES is set to YES then doxygen will not
# include the index chapters (such as File Index, Compound Index, etc.)
# in the output.

LATEX_HIDE_INDICES = NO

#---------------------------------------------------------------------------
# configuration options related to the RTF output
#---------------------------------------------------------------------------

# If the GENERATE_RTF tag is set to YES Doxygen will generate RTF output.
# The RTF output is optimised for Word 97 and may not look very pretty with
# other RTF readers or editors.

GENERATE_RTF = NO

# The RTF_OUTPUT tag is used to specify where the RTF docs will be put.
# If a relative path is entered the value of OUTPUT_DIRECTORY will be
# put in front of it. If left blank `rtf' will be used as the default path.

RTF_OUTPUT = rtf

# If the COMPACT_RTF tag is set to YES Doxygen generates more compact
# RTF documents. This may be useful for small projects and may help to
# save some trees in general.

COMPACT_RTF = NO

# If the RTF_HYPERLINKS tag is set to YES, the RTF that is generated
# will contain hyperlink fields.
# The RTF file will
# contain links (just like the HTML output) instead of page references.
# This makes the output suitable for online browsing using WORD or other
# programs which support those fields.
# Note: wordpad (write) and others do not support links.

RTF_HYPERLINKS = NO

# Load stylesheet definitions from file. Syntax is similar to doxygen's
# config file, i.e. a series of assignments. You only have to provide
# replacements; missing definitions are set to their default value.

RTF_STYLESHEET_FILE =

# Set optional variables used in the generation of an rtf document.
# Syntax is similar to doxygen's config file.

RTF_EXTENSIONS_FILE =

#---------------------------------------------------------------------------
# configuration options related to the man page output
#---------------------------------------------------------------------------

# If the GENERATE_MAN tag is set to YES (the default) Doxygen will
# generate man pages.

GENERATE_MAN = NO

# The MAN_OUTPUT tag is used to specify where the man pages will be put.
# If a relative path is entered the value of OUTPUT_DIRECTORY will be
# put in front of it. If left blank `man' will be used as the default path.

MAN_OUTPUT = man

# The MAN_EXTENSION tag determines the extension that is added to
# the generated man pages (default is the subroutine's section .3)

MAN_EXTENSION = .3

# If the MAN_LINKS tag is set to YES and Doxygen generates man output,
# then it will generate one additional man file for each entity
# documented in the real man page(s). These additional files
# only source the real man page, but without them the man command
# would be unable to find the correct page. The default is NO.
MAN_LINKS = NO

#---------------------------------------------------------------------------
# configuration options related to the XML output
#---------------------------------------------------------------------------

# If the GENERATE_XML tag is set to YES Doxygen will
# generate an XML file that captures the structure of
# the code including all documentation. Note that this
# feature is still experimental and incomplete at the
# moment.

GENERATE_XML = NO

# The XML_OUTPUT tag is used to specify where the XML pages will be put.
# If a relative path is entered the value of OUTPUT_DIRECTORY will be
# put in front of it. If left blank `xml' will be used as the default path.

XML_OUTPUT = xml

# The XML_SCHEMA tag can be used to specify an XML schema,
# which can be used by a validating XML parser to check the
# syntax of the XML files.

XML_SCHEMA =

# The XML_DTD tag can be used to specify an XML DTD,
# which can be used by a validating XML parser to check the
# syntax of the XML files.

XML_DTD =

#---------------------------------------------------------------------------
# configuration options for the AutoGen Definitions output
#---------------------------------------------------------------------------

# If the GENERATE_AUTOGEN_DEF tag is set to YES Doxygen will
# generate an AutoGen Definitions (see autogen.sf.net) file
# that captures the structure of the code including all
# documentation. Note that this feature is still experimental
# and incomplete at the moment.

GENERATE_AUTOGEN_DEF = NO

#---------------------------------------------------------------------------
# configuration options related to the Perl module output
#---------------------------------------------------------------------------

# If the GENERATE_PERLMOD tag is set to YES Doxygen will
# generate a Perl module file that captures the structure of
# the code including all documentation. Note that this
# feature is still experimental and incomplete at the
# moment.
GENERATE_PERLMOD = NO

# If the PERLMOD_LATEX tag is set to YES Doxygen will generate
# the necessary Makefile rules, Perl scripts and LaTeX code to be able
# to generate PDF and DVI output from the Perl module output.

PERLMOD_LATEX = NO

# If the PERLMOD_PRETTY tag is set to YES the Perl module output will be
# nicely formatted so it can be parsed by a human reader. This is useful
# if you want to understand what is going on. On the other hand, if this
# tag is set to NO the size of the Perl module output will be much smaller
# and Perl will parse it just the same.

PERLMOD_PRETTY = YES

# The names of the make variables in the generated doxyrules.make file
# are prefixed with the string contained in PERLMOD_MAKEVAR_PREFIX.
# This is useful so different doxyrules.make files included by the same
# Makefile don't overwrite each other's variables.

PERLMOD_MAKEVAR_PREFIX =

#---------------------------------------------------------------------------
# Configuration options related to the preprocessor
#---------------------------------------------------------------------------

# If the ENABLE_PREPROCESSING tag is set to YES (the default) Doxygen will
# evaluate all C-preprocessor directives found in the sources and include
# files.

ENABLE_PREPROCESSING = YES

# If the MACRO_EXPANSION tag is set to YES Doxygen will expand all macro
# names in the source code. If set to NO (the default) only conditional
# compilation will be performed. Macro expansion can be done in a controlled
# way by setting EXPAND_ONLY_PREDEF to YES.

MACRO_EXPANSION = YES

# If the EXPAND_ONLY_PREDEF and MACRO_EXPANSION tags are both set to YES
# then the macro expansion is limited to the macros specified with the
# PREDEFINED and EXPAND_AS_DEFINED tags.

EXPAND_ONLY_PREDEF = NO

# If the SEARCH_INCLUDES tag is set to YES (the default) the include files
# in the INCLUDE_PATH (see below) will be searched if a #include is found.
SEARCH_INCLUDES = NO

# The INCLUDE_PATH tag can be used to specify one or more directories that
# contain include files that are not input files but should be processed by
# the preprocessor.

INCLUDE_PATH =

# You can use the INCLUDE_FILE_PATTERNS tag to specify one or more wildcard
# patterns (like *.h and *.hpp) to filter out the header-files in the
# directories. If left blank, the patterns specified with FILE_PATTERNS will
# be used.

INCLUDE_FILE_PATTERNS =

# The PREDEFINED tag can be used to specify one or more macro names that
# are defined before the preprocessor is started (similar to the -D option of
# gcc). The argument of the tag is a list of macros of the form: name
# or name=definition (no spaces). If the definition and the = are
# omitted, =1 is assumed.

PREDEFINED = THRUST_NOEXCEPT=noexcept THRUST_DEFAULT="{}" THRUST_NODISCARD="[[nodiscard]]" THRUST_MR_DEFAULT_ALIGNMENT="alignof(max_align_t)" THRUST_FINAL="final" THRUST_OVERRIDE=""

# If the MACRO_EXPANSION and EXPAND_ONLY_PREDEF tags are set to YES then
# this tag can be used to specify a list of macro names that should be expanded.
# The macro definition that is found in the sources will be used.
# Use the PREDEFINED tag if you want to use a different macro definition.

EXPAND_AS_DEFINED =

# If the SKIP_FUNCTION_MACROS tag is set to YES (the default) then
# doxygen's preprocessor will remove all function-like macros that are alone
# on a line, have an all uppercase name, and do not end with a semicolon. Such
# function macros are typically used for boiler-plate code, and will confuse the
# parser if not removed.

SKIP_FUNCTION_MACROS = YES

#---------------------------------------------------------------------------
# Configuration::additions related to external references
#---------------------------------------------------------------------------

# The TAGFILES option can be used to specify one or more tagfiles.
# Optionally an initial location of the external documentation # can be added for each tagfile. The format of a tag file without # this location is as follows: # TAGFILES = file1 file2 ... # Adding location for the tag files is done as follows: # TAGFILES = file1=loc1 "file2 = loc2" ... # where "loc1" and "loc2" can be relative or absolute paths or # URLs. If a location is present for each tag, the installdox tool # does not have to be run to correct the links. # Note that each tag file must have a unique name # (where the name does NOT include the path) # If a tag file is not located in the directory in which doxygen # is run, you must also specify the path to the tagfile here. TAGFILES = # When a file name is specified after GENERATE_TAGFILE, doxygen will create # a tag file that is based on the input files it reads. GENERATE_TAGFILE = # If the ALLEXTERNALS tag is set to YES all external classes will be listed # in the class index. If set to NO only the inherited external classes # will be listed. ALLEXTERNALS = NO # If the EXTERNAL_GROUPS tag is set to YES all external groups will be listed # in the modules index. If set to NO, only the current project's groups will # be listed. EXTERNAL_GROUPS = YES # The PERL_PATH should be the absolute path and name of the perl script # interpreter (i.e. the result of `which perl'). PERL_PATH = /usr/bin/perl #--------------------------------------------------------------------------- # Configuration options related to the dot tool #--------------------------------------------------------------------------- # If the CLASS_DIAGRAMS tag is set to YES (the default) Doxygen will # generate a inheritance diagram (in HTML, RTF and LaTeX) for classes with base or # super classes. Setting the tag to NO turns the diagrams off. Note that this # option is superceded by the HAVE_DOT option below. This is only a fallback. It is # recommended to install and use dot, since it yields more powerful graphs. 
CLASS_DIAGRAMS = YES # If set to YES, the inheritance and collaboration graphs will hide # inheritance and usage relations if the target is undocumented # or is not a class. HIDE_UNDOC_RELATIONS = YES # If you set the HAVE_DOT tag to YES then doxygen will assume the dot tool is # available from the path. This tool is part of Graphviz, a graph visualization # toolkit from AT&T and Lucent Bell Labs. The other options in this section # have no effect if this option is set to NO (the default) HAVE_DOT = NO # If the CLASS_GRAPH and HAVE_DOT tags are set to YES then doxygen # will generate a graph for each documented class showing the direct and # indirect inheritance relations. Setting this tag to YES will force the # the CLASS_DIAGRAMS tag to NO. CLASS_GRAPH = YES # If the COLLABORATION_GRAPH and HAVE_DOT tags are set to YES then doxygen # will generate a graph for each documented class showing the direct and # indirect implementation dependencies (inheritance, containment, and # class references variables) of the class with other documented classes. COLLABORATION_GRAPH = YES # If the UML_LOOK tag is set to YES doxygen will generate inheritance and # collaboration diagrams in a style similiar to the OMG's Unified Modeling # Language. UML_LOOK = NO # If set to YES, the inheritance and collaboration graphs will show the # relations between templates and their instances. TEMPLATE_RELATIONS = NO # If the ENABLE_PREPROCESSING, SEARCH_INCLUDES, INCLUDE_GRAPH, and HAVE_DOT # tags are set to YES then doxygen will generate a graph for each documented # file showing the direct and indirect include dependencies of the file with # other documented files. INCLUDE_GRAPH = YES # If the ENABLE_PREPROCESSING, SEARCH_INCLUDES, INCLUDED_BY_GRAPH, and # HAVE_DOT tags are set to YES then doxygen will generate a graph for each # documented header file showing the documented files that directly or # indirectly include this file. 
INCLUDED_BY_GRAPH = YES # If the CALL_GRAPH and HAVE_DOT tags are set to YES then doxygen will # generate a call dependency graph for every global function or class method. # Note that enabling this option will significantly increase the time of a run. # So in most cases it will be better to enable call graphs for selected # functions only using the \callgraph command. CALL_GRAPH = NO # If the GRAPHICAL_HIERARCHY and HAVE_DOT tags are set to YES then doxygen # will graphical hierarchy of all classes instead of a textual one. GRAPHICAL_HIERARCHY = YES # The DOT_IMAGE_FORMAT tag can be used to set the image format of the images # generated by dot. Possible values are png, jpg, or gif # If left blank png will be used. DOT_IMAGE_FORMAT = png # The tag DOT_PATH can be used to specify the path where the dot tool can be # found. If left blank, it is assumed the dot tool can be found on the path. DOT_PATH = # The DOTFILE_DIRS tag can be used to specify one or more directories that # contain dot files that are included in the documentation (see the # \dotfile command). DOTFILE_DIRS = # The MAX_DOT_GRAPH_WIDTH tag can be used to set the maximum allowed width # (in pixels) of the graphs generated by dot. If a graph becomes larger than # this value, doxygen will try to truncate the graph, so that it fits within # the specified constraint. Beware that most browsers cannot cope with very # large images. MAX_DOT_GRAPH_WIDTH = 1024 # The MAX_DOT_GRAPH_HEIGHT tag can be used to set the maximum allows height # (in pixels) of the graphs generated by dot. If a graph becomes larger than # this value, doxygen will try to truncate the graph, so that it fits within # the specified constraint. Beware that most browsers cannot cope with very # large images. MAX_DOT_GRAPH_HEIGHT = 1024 # The MAX_DOT_GRAPH_DEPTH tag can be used to set the maximum depth of the # graphs generated by dot. 
A depth value of 3 means that only nodes reachable # from the root by following a path via at most 3 edges will be shown. Nodes that # lay further from the root node will be omitted. Note that setting this option to # 1 or 2 may greatly reduce the computation time needed for large code bases. Also # note that a graph may be further truncated if the graph's image dimensions are # not sufficient to fit the graph (see MAX_DOT_GRAPH_WIDTH and MAX_DOT_GRAPH_HEIGHT). # If 0 is used for the depth value (the default), the graph is not depth-constrained. MAX_DOT_GRAPH_DEPTH = 0 # If the GENERATE_LEGEND tag is set to YES (the default) Doxygen will # generate a legend page explaining the meaning of the various boxes and # arrows in the dot generated graphs. GENERATE_LEGEND = YES # If the DOT_CLEANUP tag is set to YES (the default) Doxygen will # remove the intermediate dot files that are used to generate # the various graphs. DOT_CLEANUP = YES #--------------------------------------------------------------------------- # Configuration::addtions related to the search engine #--------------------------------------------------------------------------- # The SEARCHENGINE tag specifies whether or not a search engine should be # used. If set to NO the values of all tags below this one will be ignored. SEARCHENGINE = NO thrust-1.9.5/doc/thrust_logo.png000066400000000000000000000717731344621116200167220ustar00rootroot00000000000000‰PNG  IHDRžÂ\sBIT|dˆ pHYs<<*Æ€†tEXtSoftwarewww.inkscape.org›î< IDATxœìwxTeþ·ï©™I¤Nè½÷ŽQÖ¾Ø×†:ïêºî®»î®¿uí® Š•`Cì¢RÄ(HïHï=™L/çýãIÈLfR&L*¹¯ë\“9õ™$s>ç[™$IôÐC=´%:. 
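
A minimal sketch of how the preprocessing tags in this Doxyfile interact, assuming one wanted Doxygen to expand only the Thrust annotation macros rather than every macro it finds (this file currently keeps EXPAND_ONLY_PREDEF = NO, so this is an option to evaluate, not a change the project makes). With both MACRO_EXPANSION and EXPAND_ONLY_PREDEF set to YES, expansion is limited to the names listed in PREDEFINED and EXPAND_AS_DEFINED; a long PREDEFINED list can be continued across lines with a trailing backslash. The macro names and values below are copied from the PREDEFINED entry in this file.

```text
# Sketch: expand only the listed THRUST_* annotation macros
MACRO_EXPANSION    = YES
EXPAND_ONLY_PREDEF = YES
PREDEFINED         = THRUST_NOEXCEPT=noexcept \
                     THRUST_DEFAULT="{}" \
                     THRUST_NODISCARD="[[nodiscard]]" \
                     THRUST_MR_DEFAULT_ALIGNMENT="alignof(max_align_t)" \
                     THRUST_FINAL="final" \
                     THRUST_OVERRIDE=""
```

The backslash continuations only change the layout: Doxygen reads the same six definitions as the single-line PREDEFINED entry. Whether restricting expansion changes the rendered documentation depends on which other macros appear in the Thrust sources.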
thrust-1.9.5/examples/

thrust-1.9.5/examples/CMakeLists.txt
# message(STATUS "Adding \"examples\"")

#aux_source_directory("testing" sources)

FILE(GLOB SOURCES_CU *.cu)
FILE(GLOB SOURCES_CPP *.cpp)
set(SOURCES ${SOURCES_CU})

list(LENGTH SOURCES index)
message(STATUS "Found ${index} examples")

set(targets "")

foreach (src ${SOURCES})
  get_filename_component(exec_name ${src} NAME_WE)
  set(target example-${exec_name})
  thrust_add_executable(${target} ${src})
  set_target_properties(${target} PROPERTIES OUTPUT_NAME ${exec_name})
  install(TARGETS ${target}
    DESTINATION "examples/${HOST_BACKEND}_host_${DEVICE_BACKEND}_device_${THRUST_MODE}"
    OPTIONAL COMPONENT examples-bin)
  list(APPEND targets ${target})
endforeach()

add_subdirectory(cuda)
add_subdirectory(omp)
add_subdirectory(cpp_integration)

add_custom_target(examples-bin DEPENDS ${targets})
add_custom_target(install-examples-bin
  COMMAND "${CMAKE_COMMAND}"
    -DCMAKE_INSTALL_COMPONENT=examples-bin
    -P "${CMAKE_BINARY_DIR}/cmake_install.cmake"
)

install(FILES ${SOURCES} DESTINATION "examples" COMPONENT examples)

thrust-1.9.5/examples/README
Once Thrust has been installed, these example programs can be compiled
directly with nvcc. For example, the following command will compile the
norm example.

  $ nvcc norm.cu -o norm

These examples are also available online:

  https://github.com/thrust/thrust/tree/master/examples

For additional information refer to the Quick Start Guide:

  https://github.com/thrust/thrust/wiki/Quick-Start-Guide

thrust-1.9.5/examples/SConscript
import os

Import('env')

# create a clone of the environment so that we don't alter the parent
my_env = env.Clone()

# find all .cus & .cpps in the current directory and in the
# subdirectory named after the device backend
sources = []
directories = ['.', my_env['device_backend']]
extensions = ['.cu','.cpp']

for dir in directories:
  for ext in extensions:
    regex = os.path.join(dir, '*' + ext)
    sources.extend(my_env.Glob(regex))

# compile examples
for src in sources:
  program = my_env.Program(src)
  # add the program to the 'run_examples' alias
  program_alias = my_env.Alias('run_examples', [program], program[0].abspath)
  # always build the 'run_examples' target whether or not it needs it
  my_env.AlwaysBuild(program_alias)

thrust-1.9.5/examples/arbitrary_transformation.cu
#include <thrust/for_each.h>
#include <thrust/device_vector.h>
#include <thrust/iterator/zip_iterator.h>
#include <iostream>

// This example shows how to implement an arbitrary transformation of
// the form output[i] = F(first[i], second[i], third[i], ... ).
// In this example, we use a function with 3 inputs and 1 output.
//
// Iterators for all four vectors (3 inputs + 1 output) are "zipped"
// into a single sequence of tuples with the zip_iterator.
//
// The arbitrary_functor receives a tuple that contains four elements,
// which are references to values in each of the four sequences. When we
// access the tuple 't' with the get() function,
//      get<0>(t) returns a reference to A[i],
//      get<1>(t) returns a reference to B[i],
//      get<2>(t) returns a reference to C[i],
//      get<3>(t) returns a reference to D[i].
// // In this example, we can implement the transformation, // D[i] = A[i] + B[i] * C[i]; // by invoking arbitrary_functor() on each of the tuples using for_each. // // Note that we could extend this example to implement functions with an // arbitrary number of input arguments by zipping more sequence together. // With the same approach we can have multiple *output* sequences, if we // wanted to implement something like // D[i] = A[i] + B[i] * C[i]; // E[i] = A[i] + B[i] + C[i]; // // The possibilities are endless! :) struct arbitrary_functor { template __host__ __device__ void operator()(Tuple t) { // D[i] = A[i] + B[i] * C[i]; thrust::get<3>(t) = thrust::get<0>(t) + thrust::get<1>(t) * thrust::get<2>(t); } }; int main(void) { // allocate storage thrust::device_vector A(5); thrust::device_vector B(5); thrust::device_vector C(5); thrust::device_vector D(5); // initialize input vectors A[0] = 3; B[0] = 6; C[0] = 2; A[1] = 4; B[1] = 7; C[1] = 5; A[2] = 0; B[2] = 2; C[2] = 7; A[3] = 8; B[3] = 1; C[3] = 4; A[4] = 2; B[4] = 8; C[4] = 3; // apply the transformation thrust::for_each(thrust::make_zip_iterator(thrust::make_tuple(A.begin(), B.begin(), C.begin(), D.begin())), thrust::make_zip_iterator(thrust::make_tuple(A.end(), B.end(), C.end(), D.end())), arbitrary_functor()); // print the output for(int i = 0; i < 5; i++) std::cout << A[i] << " + " << B[i] << " * " << C[i] << " = " << D[i] << std::endl; } thrust-1.9.5/examples/basic_vector.cu000066400000000000000000000017361344621116200177000ustar00rootroot00000000000000#include #include #include int main(void) { // H has storage for 4 integers thrust::host_vector H(4); // initialize individual elements H[0] = 14; H[1] = 20; H[2] = 38; H[3] = 46; // H.size() returns the size of vector H std::cout << "H has size " << H.size() << std::endl; // print contents of H for(size_t i = 0; i < H.size(); i++) std::cout << "H[" << i << "] = " << H[i] << std::endl; // resize H H.resize(2); std::cout << "H now has size " << H.size() << 
    std::endl;

  // Copy host_vector H to device_vector D
  thrust::device_vector<int> D = H;

  // elements of D can be modified
  D[0] = 99;
  D[1] = 88;

  // print contents of D
  for(size_t i = 0; i < D.size(); i++)
    std::cout << "D[" << i << "] = " << D[i] << std::endl;

  // H and D are automatically deleted when the function returns
  return 0;
}
thrust-1.9.5/examples/bounding_box.cu000066400000000000000000000046641344621116200177150ustar00rootroot00000000000000
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <thrust/functional.h>
#include <thrust/extrema.h>
#include <thrust/random.h>
#include <iostream>

// This example shows how to compute a bounding box
// for a set of points in two dimensions.

struct point2d
{
  float x, y;

  __host__ __device__
  point2d() : x(0), y(0) {}

  __host__ __device__
  point2d(float _x, float _y) : x(_x), y(_y) {}
};

// bounding box type
struct bbox
{
  // construct an empty box
  __host__ __device__
  bbox() {}

  // construct a box from a single point
  __host__ __device__
  bbox(const point2d &point)
    : lower_left(point), upper_right(point)
  {}

  // assign a single point to the box (the box collapses to that point)
  __host__ __device__
  bbox& operator=(const point2d &point)
  {
    lower_left = point;
    upper_right = point;
    return *this;
  }

  // construct a box from a pair of points
  __host__ __device__
  bbox(const point2d &ll, const point2d &ur)
    : lower_left(ll), upper_right(ur)
  {}

  point2d lower_left, upper_right;
};

// reduce a pair of bounding boxes (a,b) to a bounding box containing a and b
struct bbox_reduction : public thrust::binary_function<bbox,bbox,bbox>
{
  __host__ __device__
  bbox operator()(bbox a, bbox b)
  {
    // lower left corner
    point2d ll(thrust::min(a.lower_left.x, b.lower_left.x), thrust::min(a.lower_left.y, b.lower_left.y));

    // upper right corner
    point2d ur(thrust::max(a.upper_right.x, b.upper_right.x), thrust::max(a.upper_right.y, b.upper_right.y));

    return bbox(ll, ur);
  }
};

int main(void)
{
  const size_t N = 40;
  thrust::default_random_engine rng;
  thrust::uniform_real_distribution<float> u01(0.0f, 1.0f);

  // allocate storage for points
  thrust::device_vector<point2d> points(N);

  // generate some random points in the unit square
  for(size_t i =
0; i < N; i++)
  {
    float x = u01(rng);
    float y = u01(rng);
    points[i] = point2d(x,y);
  }

  // initial bounding box contains first point
  bbox init = bbox(points[0], points[0]);

  // binary reduction operation
  bbox_reduction binary_op;

  // compute the bounding box for the point set
  bbox result = thrust::reduce(points.begin(), points.end(), init, binary_op);

  // print output
  std::cout << "bounding box " << std::fixed;
  std::cout << "(" << result.lower_left.x  << "," << result.lower_left.y  << ") ";
  std::cout << "(" << result.upper_right.x << "," << result.upper_right.y << ")" << std::endl;

  return 0;
}
thrust-1.9.5/examples/bucket_sort2d.cu000066400000000000000000000071341344621116200200050ustar00rootroot00000000000000
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/generate.h>
#include <thrust/transform.h>
#include <thrust/sort.h>
#include <thrust/binary_search.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/random.h>
#include <iostream>
#include <iomanip>

// define a 2d float vector
typedef thrust::tuple<float,float> vec2;

// return a random vec2 in [0,1)^2
vec2 make_random_vec2(void)
{
  static thrust::default_random_engine rng;
  static thrust::uniform_real_distribution<float> u01(0.0f, 1.0f);
  float x = u01(rng);
  float y = u01(rng);
  return vec2(x,y);
}

// hash a point in the unit square to the index of
// the grid bucket that contains it
struct point_to_bucket_index : public thrust::unary_function<vec2,unsigned int>
{
  unsigned int width;  // buckets in the x dimension (grid spacing = 1/width)
  unsigned int height; // buckets in the y dimension (grid spacing = 1/height)

  __host__ __device__
  point_to_bucket_index(unsigned int width, unsigned int height)
    : width(width), height(height) {}

  __host__ __device__
  unsigned int operator()(const vec2& v) const
  {
    // find the raster indices of v's bucket
    unsigned int x = static_cast<unsigned int>(thrust::get<0>(v) * width);
    unsigned int y = static_cast<unsigned int>(thrust::get<1>(v) * height);

    // return the bucket's linear index
    return y * width + x;
  }
};

int main(void)
{
  const size_t N = 1000000;

  // allocate some random points in the unit square on the host
  thrust::host_vector<vec2> h_points(N);
  thrust::generate(h_points.begin(), h_points.end(), make_random_vec2);

  // transfer
  // to device
  thrust::device_vector<vec2> points = h_points;

  // allocate storage for a 2D grid
  // of dimensions w x h
  unsigned int w = 200, h = 100;

  // the grid data structure keeps a range per grid bucket:
  // each bucket_begin[i] indexes the first element of bucket i's list of points
  // each bucket_end[i] indexes one past the last element of bucket i's list of points
  thrust::device_vector<unsigned int> bucket_begin(w*h);
  thrust::device_vector<unsigned int> bucket_end(w*h);

  // allocate storage for each point's bucket index
  thrust::device_vector<unsigned int> bucket_indices(N);

  // transform the points to their bucket indices
  thrust::transform(points.begin(),
                    points.end(),
                    bucket_indices.begin(),
                    point_to_bucket_index(w,h));

  // sort the points by their bucket index
  thrust::sort_by_key(bucket_indices.begin(),
                      bucket_indices.end(),
                      points.begin());

  // find the beginning of each bucket's list of points
  thrust::counting_iterator<unsigned int> search_begin(0);
  thrust::lower_bound(bucket_indices.begin(),
                      bucket_indices.end(),
                      search_begin,
                      search_begin + w*h,
                      bucket_begin.begin());

  // find the end of each bucket's list of points
  thrust::upper_bound(bucket_indices.begin(),
                      bucket_indices.end(),
                      search_begin,
                      search_begin + w*h,
                      bucket_end.begin());

  // write out bucket (150, 50)'s list of points
  unsigned int bucket_idx = 50 * w + 150;
  std::cout << "bucket (150, 50)'s list of points:" << std::endl;
  std::cout << std::fixed << std::setprecision(6);
  for(unsigned int point_idx = bucket_begin[bucket_idx];
      point_idx != bucket_end[bucket_idx];
      ++point_idx)
  {
    vec2 p = points[point_idx];

    std::cout << "(" << thrust::get<0>(p) << "," << thrust::get<1>(p) << ")" << std::endl;
  }

  return 0;
}
thrust-1.9.5/examples/constant_iterator.cu000066400000000000000000000013211344621116200207650ustar00rootroot00000000000000
#include <thrust/device_vector.h>
#include <thrust/iterator/constant_iterator.h>
#include <thrust/transform.h>
#include <thrust/functional.h>
#include <thrust/copy.h>
#include <iostream>
#include <iterator>

int main(void)
{
  thrust::device_vector<int> data(4);
  data[0] = 3;
  data[1] = 7;
  data[2] = 2;
  data[3] = 5;

  // add 10 to all values in data
  thrust::transform(data.begin(), data.end(),
                    thrust::constant_iterator<int>(10),
                    data.begin(),
                    thrust::plus<int>());

  // data is now [13, 17, 12, 15]

  // print result
  thrust::copy(data.begin(), data.end(), std::ostream_iterator<int>(std::cout, "\n"));

  return 0;
}
thrust-1.9.5/examples/counting_iterator.cu000066400000000000000000000025661344621116200207760ustar00rootroot00000000000000
#include <thrust/iterator/counting_iterator.h>
#include <thrust/copy.h>
#include <thrust/functional.h>
#include <thrust/device_vector.h>
#include <iostream>
#include <iterator>

int main(void)
{
  // this example computes indices for all the nonzero values in a sequence

  // sequence of zero and nonzero values
  thrust::device_vector<int> stencil(8);
  stencil[0] = 0;
  stencil[1] = 1;
  stencil[2] = 1;
  stencil[3] = 0;
  stencil[4] = 0;
  stencil[5] = 1;
  stencil[6] = 0;
  stencil[7] = 1;

  // storage for the nonzero indices
  thrust::device_vector<int> indices(8);

  // counting iterators define a sequence [0, 8)
  thrust::counting_iterator<int> first(0);
  thrust::counting_iterator<int> last = first + 8;

  // compute indices of nonzero elements
  typedef thrust::device_vector<int>::iterator IndexIterator;

  IndexIterator indices_end = thrust::copy_if(first, last,
                                              stencil.begin(),
                                              indices.begin(),
                                              thrust::identity<int>());
  // indices now contains [1,2,5,7]

  // print result
  std::cout << "found " << (indices_end - indices.begin()) << " nonzero values at indices:\n";
  thrust::copy(indices.begin(), indices_end, std::ostream_iterator<int>(std::cout, "\n"));

  return 0;
}
thrust-1.9.5/examples/cpp_integration/000077500000000000000000000000001344621116200200625ustar00rootroot00000000000000
thrust-1.9.5/examples/cpp_integration/CMakeLists.txt000066400000000000000000000016071344621116200226260ustar00rootroot00000000000000
FILE(GLOB SOURCES_CU *.cu)
FILE(GLOB SOURCES_CPP *.cpp)
FILE(GLOB SOURCES_H *.h)

set(SOURCES_BACKEND ${SOURCES_CU} ${SOURCES_CPP} ${SOURCES_H})
list(APPEND SOURCES_BACKEND "README")

install(FILES ${SOURCES_BACKEND}
  DESTINATION "examples/cpp_integration"
  COMPONENT examples)

if (NOT "x${DEVICE_BACKEND}" STREQUAL "xCUDA")
  return()
endif()

list(LENGTH SOURCES_BACKEND index)
message(STATUS "Found ${index} examples/cpp_integration")
set(targets_backend "")

set(exec_name "cpp_integration")
set(target example-${exec_name})

thrust_add_executable(${target} ${SOURCES_BACKEND})
set_target_properties(${target} PROPERTIES OUTPUT_NAME ${exec_name})

install(TARGETS ${target}
  DESTINATION "examples/cpp_integration/${HOST_BACKEND}_host_${DEVICE_BACKEND}_device_${THRUST_MODE}"
  OPTIONAL COMPONENT examples-bin)

list(APPEND targets_backend ${target})

set(targets ${targets} ${targets_backend} PARENT_SCOPE)
thrust-1.9.5/examples/cpp_integration/README000066400000000000000000000015521344621116200207450ustar00rootroot00000000000000
This example shows how to link a Thrust program contained in a .cu file
with a C++ program contained in a .cpp file. Note that device_vector only
appears in the .cu file while host_vector appears in both. This reflects
the fact that algorithms on device vectors are only available when the
contents of the program are located in a .cu file and compiled with the
nvcc compiler.

On a Linux system where Thrust is installed in the default location we can
use the following procedure to compile the two parts of the program and
link them together.

  $ nvcc -O2 -c device.cu
  $ g++  -O2 -c host.cpp -I/usr/local/cuda/include/
  $ nvcc -o tester device.o host.o

Alternatively, we can use g++ to perform the final linking step.

  $ nvcc -O2 -c device.cu
  $ g++  -O2 -c host.cpp -I/usr/local/cuda/include/
  $ g++ -o tester device.o host.o -L/usr/local/cuda/lib64 -lcudart
thrust-1.9.5/examples/cpp_integration/device.cu000066400000000000000000000006471344621116200216610ustar00rootroot00000000000000
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/copy.h>

#include "device.h"

void sort_on_device(thrust::host_vector<int>& h_vec)
{
  // transfer data to the device
  thrust::device_vector<int> d_vec = h_vec;

  // sort data on the device
  thrust::sort(d_vec.begin(), d_vec.end());

  // transfer data back to host
  thrust::copy(d_vec.begin(), d_vec.end(), h_vec.begin());
}
thrust-1.9.5/examples/cpp_integration/device.h000066400000000000000000000001701344621116200214700ustar00rootroot00000000000000
#pragma once

#include <thrust/host_vector.h>

// function prototype
void sort_on_device(thrust::host_vector<int>& V);
thrust-1.9.5/examples/cpp_integration/host.cpp000066400000000000000000000011531344621116200215430ustar00rootroot00000000000000
#include <thrust/host_vector.h>
#include <thrust/generate.h>
#include <thrust/copy.h>
#include <thrust/random.h>
#include <iostream>
#include <iterator>

// defines the function prototype
#include "device.h"

int main(void)
{
  // generate 20 random numbers on the host
  thrust::host_vector<int> h_vec(20);
  thrust::default_random_engine rng;
  thrust::generate(h_vec.begin(), h_vec.end(), rng);

  // interface to CUDA code
  sort_on_device(h_vec);

  // print sorted array
  thrust::copy(h_vec.begin(), h_vec.end(), std::ostream_iterator<int>(std::cout, "\n"));

  return 0;
}
thrust-1.9.5/examples/cuda/000077500000000000000000000000001344621116200156115ustar00rootroot00000000000000
thrust-1.9.5/examples/cuda/CMakeLists.txt000066400000000000000000000016021344621116200203500ustar00rootroot00000000000000
FILE(GLOB SOURCES_CU *.cu)
FILE(GLOB SOURCES_CPP *.cpp)
FILE(GLOB SOURCES_H *.h)

set(SOURCES_BACKEND ${SOURCES_CU} ${SOURCES_CPP} ${SOURCES_H})

install(FILES ${SOURCES_BACKEND}
  DESTINATION "examples/cuda"
  COMPONENT examples)

if (NOT "x${DEVICE_BACKEND}" STREQUAL "xCUDA")
  return()
endif()

list(LENGTH SOURCES_BACKEND index)
message(STATUS "Found ${index} examples/cuda")
set(targets_backend "")

foreach (src ${SOURCES_BACKEND})
  get_filename_component(exec_name ${src} NAME_WE)
  set(target example-${exec_name})

  thrust_add_executable(${target} ${src})
  set_target_properties(${target} PROPERTIES OUTPUT_NAME ${exec_name})

  install(TARGETS ${target}
    DESTINATION "examples/cuda/${HOST_BACKEND}_host_${DEVICE_BACKEND}_device_${THRUST_MODE}"
    OPTIONAL COMPONENT examples-bin)

  list(APPEND targets_backend ${target})
endforeach()

set(targets ${targets} ${targets_backend} PARENT_SCOPE)
thrust-1.9.5/examples/cuda/async_reduce.cu000066400000000000000000000051441344621116200206120ustar00rootroot00000000000000
#include <thrust/reduce.h>
#include <thrust/device_vector.h>
#include <thrust/system/cuda/execution_policy.h>
#include <cassert>

#if __cplusplus >= 201103L
#include <future>
#endif

// This example demonstrates two ways to achieve algorithm invocations that are asynchronous with
// the calling thread.
//
// The first method wraps a call to thrust::reduce inside a __global__ function. Since __global__ function
// launches are asynchronous with the launching thread, this achieves asynchrony. The result of the reduction
// is stored to a pointer to CUDA global memory. The calling thread waits for the result of the reduction to
// be ready by synchronizing with the CUDA stream on which the __global__ function is launched.
//
// The second method uses the C++11 library function, std::async, to create concurrency. The lambda function
// given to std::async returns the result of thrust::reduce to a std::future. The calling thread can use the
// std::future to wait for the result of the reduction. This method requires a compiler and standard library
// that support C++11.
template<typename Iterator, typename T, typename BinaryOperation, typename Pointer>
__global__ void reduce_kernel(Iterator first, Iterator last, T init, BinaryOperation binary_op, Pointer result)
{
  *result = thrust::reduce(thrust::cuda::par, first, last, init, binary_op);
}

int main()
{
  size_t n = 1 << 20;
  thrust::device_vector<unsigned int> data(n, 1);
  thrust::device_vector<unsigned int> result(1, 0);

  // method 1: call thrust::reduce from an asynchronous CUDA kernel launch

  // create a CUDA stream
  cudaStream_t s;
  cudaStreamCreate(&s);

  // launch a CUDA kernel with only 1 thread on our stream
  reduce_kernel<<<1,1,0,s>>>(data.begin(), data.end(), 0, thrust::plus<unsigned int>(), result.data());

  // wait for the stream to finish
  cudaStreamSynchronize(s);

  // our result should be ready
  assert(result[0] == n);

  cudaStreamDestroy(s);

  // reset the result
  result[0] = 0;

#if __cplusplus >= 201103L
  // method 2: use std::async to create asynchrony

  // copy all the algorithm parameters
  auto begin        = data.begin();
  auto end          = data.end();
  unsigned int init = 0;
  auto binary_op    = thrust::plus<unsigned int>();

  // std::async captures the algorithm parameters by value
  // use std::launch::async to ensure the creation of a new thread
  std::future<unsigned int> future_result = std::async(std::launch::async, [=]
  {
    return thrust::reduce(begin, end, init, binary_op);
  });

  // wait on the result and check that it is correct
  assert(future_result.get() == n);
#endif

  return 0;
}
thrust-1.9.5/examples/cuda/custom_temporary_allocation.cu000066400000000000000000000115351344621116200237700ustar00rootroot00000000000000
#include <thrust/system/cuda/vector.h>
#include <thrust/system/cuda/execution_policy.h>
#include <thrust/system/cuda/memory.h>
#include <thrust/host_vector.h>
#include <thrust/generate.h>
#include <thrust/sort.h>
#include <cstdlib>
#include <iostream>
#include <sstream>
#include <string>
#include <map>
#include <cassert>
#include <stdexcept>

// This example demonstrates how to pass a custom allocator to an execution
// policy (thrust::cuda::par(alloc)) to control how Thrust allocates temporary
// storage during algorithms such as thrust::sort. The idea will be to create a
// simple cache of allocations to search when temporary storage is requested.
// If a hit is found in the cache, we quickly return the cached allocation
// instead of resorting to the more expensive thrust::cuda::malloc.
//
// Note: this cached_allocator implementation is not thread-safe. If multiple
// (host) threads use the same cached_allocator then they should gain exclusive
// access to the allocator before accessing its methods.

struct not_my_pointer
{
  not_my_pointer(void* p)
    : message()
  {
    std::stringstream s;
    s << "Pointer `" << p << "` was not allocated by this allocator.";
    message = s.str();
  }

  virtual ~not_my_pointer() {}

  virtual const char* what() const
  {
    return message.c_str();
  }

private:
  std::string message;
};

// A simple allocator for caching cudaMalloc allocations.
struct cached_allocator
{
  typedef char value_type;

  cached_allocator() {}

  ~cached_allocator()
  {
    free_all();
  }

  char *allocate(std::ptrdiff_t num_bytes)
  {
    std::cout << "cached_allocator::allocate(): num_bytes == "
              << num_bytes
              << std::endl;

    char *result = 0;

    // Search the cache for a free block.
    free_blocks_type::iterator free_block = free_blocks.find(num_bytes);

    if (free_block != free_blocks.end())
    {
      std::cout << "cached_allocator::allocate(): found a free block"
                << std::endl;

      result = free_block->second;

      // Erase from the `free_blocks` map.
      free_blocks.erase(free_block);
    }
    else
    {
      // No allocation of the right size exists, so create a new one with
      // `thrust::cuda::malloc`.
      try
      {
        std::cout << "cached_allocator::allocate(): allocating new block"
                  << std::endl;

        // Allocate memory and convert the resulting `thrust::cuda::pointer` to
        // a raw pointer.
        result = thrust::cuda::malloc<char>(num_bytes).get();
      }
      catch (std::runtime_error&)
      {
        throw;
      }
    }

    // Insert the allocated pointer into the `allocated_blocks` map.
    allocated_blocks.insert(std::make_pair(result, num_bytes));

    return result;
  }

  void deallocate(char *ptr, size_t)
  {
    std::cout << "cached_allocator::deallocate(): ptr == "
              << reinterpret_cast<void*>(ptr) << std::endl;

    // Erase the allocated block from the allocated blocks map.
    allocated_blocks_type::iterator iter = allocated_blocks.find(ptr);

    if (iter == allocated_blocks.end())
      throw not_my_pointer(reinterpret_cast<void*>(ptr));

    std::ptrdiff_t num_bytes = iter->second;
    allocated_blocks.erase(iter);

    // Insert the block into the free blocks map.
    free_blocks.insert(std::make_pair(num_bytes, ptr));
  }

private:
  typedef std::multimap<std::ptrdiff_t, char*> free_blocks_type;
  typedef std::map<char*, std::ptrdiff_t>      allocated_blocks_type;

  free_blocks_type      free_blocks;
  allocated_blocks_type allocated_blocks;

  void free_all()
  {
    std::cout << "cached_allocator::free_all()" << std::endl;

    // Deallocate all outstanding blocks in both lists.
    for ( free_blocks_type::iterator i = free_blocks.begin()
        ; i != free_blocks.end()
        ; ++i)
    {
      // Wrap the raw pointer in a cuda::pointer before calling cuda::free.
      thrust::cuda::free(thrust::cuda::pointer<void>(i->second));
    }

    for( allocated_blocks_type::iterator i = allocated_blocks.begin()
       ; i != allocated_blocks.end()
       ; ++i)
    {
      // Wrap the raw pointer in a cuda::pointer before calling cuda::free.
      thrust::cuda::free(thrust::cuda::pointer<void>(i->first));
    }
  }
};

int main()
{
  std::size_t num_elements = 32768;

  thrust::host_vector<int> h_input(num_elements);

  // Generate random input.
  thrust::generate(h_input.begin(), h_input.end(), rand);

  thrust::cuda::vector<int> d_input = h_input;
  thrust::cuda::vector<int> d_result(num_elements);

  std::size_t num_trials = 5;

  cached_allocator alloc;

  for (std::size_t i = 0; i < num_trials; ++i)
  {
    d_result = d_input;

    // Pass alloc through cuda::par as the first parameter to sort
    // to cause allocations to be handled by alloc during sort.
    thrust::sort(thrust::cuda::par(alloc), d_result.begin(), d_result.end());

    // Ensure the result is sorted.
    assert(thrust::is_sorted(d_result.begin(), d_result.end()));
  }

  return 0;
}
thrust-1.9.5/examples/cuda/range_view.cu000066400000000000000000000126261344621116200202770ustar00rootroot00000000000000
#include <thrust/device_vector.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/reverse_iterator.h>
#include <thrust/iterator/iterator_traits.h>
#include <thrust/distance.h>
#include <thrust/for_each.h>
#include <thrust/execution_policy.h>
#include <iostream>

// This example demonstrates the use of a view: a non-owning wrapper for an
// iterator range which presents a container-like interface to the user.
//
// For example, a view of a device_vector's data can be helpful when we wish to
// access that data from a device function. Even though device_vectors are not
// accessible from device functions, the range_view class allows us to access
// and manipulate its data as if we were manipulating a real container.

template<class Iterator>
class range_view
{
public:
  typedef Iterator iterator;
  typedef typename thrust::iterator_traits<iterator>::value_type value_type;
  typedef typename thrust::iterator_traits<iterator>::pointer pointer;
  typedef typename thrust::iterator_traits<iterator>::difference_type difference_type;
  typedef typename thrust::iterator_traits<iterator>::reference reference;

private:
  const iterator first;
  const iterator last;

public:
  __host__ __device__
  range_view(Iterator first, Iterator last)
      : first(first), last(last) {}
  __host__ __device__
  ~range_view() {}

  __host__ __device__
  difference_type size() const { return thrust::distance(first, last); }

  __host__ __device__
  reference operator[](difference_type n)
  {
    return *(first + n);
  }
  __host__ __device__
  const reference operator[](difference_type n) const
  {
    return *(first + n);
  }

  __host__ __device__
  iterator begin()
  {
    return first;
  }
  __host__ __device__
  const iterator cbegin() const
  {
    return first;
  }
  __host__ __device__
  iterator end()
  {
    return last;
  }
  __host__ __device__
  const iterator cend() const
  {
    return last;
  }

  __host__ __device__
  thrust::reverse_iterator<iterator> rbegin()
  {
    return thrust::reverse_iterator<iterator>(end());
  }
  __host__ __device__
  const thrust::reverse_iterator<iterator> crbegin() const
  {
    return thrust::reverse_iterator<iterator>(cend());
  }
  __host__ __device__
  thrust::reverse_iterator<iterator> rend()
  {
    return
thrust::reverse_iterator<iterator>(begin());
  }
  __host__ __device__
  const thrust::reverse_iterator<iterator> crend() const
  {
    return thrust::reverse_iterator<iterator>(cbegin());
  }

  __host__ __device__
  reference front()
  {
    return *begin();
  }
  __host__ __device__
  const reference front() const
  {
    return *cbegin();
  }

  // the last element sits one position before end(); dereferencing end() is undefined
  __host__ __device__
  reference back()
  {
    return *(end() - 1);
  }
  __host__ __device__
  const reference back() const
  {
    return *(cend() - 1);
  }

  __host__ __device__
  bool empty() const
  {
    return size() == 0;
  }
};

// This helper function creates a range_view from an iterator and the number of
// elements
template<class Iterator, class Size>
range_view<Iterator>
__host__ __device__
make_range_view(Iterator first, Size n)
{
  return range_view<Iterator>(first, first+n);
}

// This helper function creates a range_view from a pair of iterators
template<class Iterator>
range_view<Iterator>
__host__ __device__
make_range_view(Iterator first, Iterator last)
{
  return range_view<Iterator>(first, last);
}

// This helper function creates a range_view from a Vector
template<class Vector>
range_view<typename Vector::iterator>
__host__
make_range_view(Vector& v)
{
  return range_view<typename Vector::iterator>(v.begin(), v.end());
}

// This saxpy functor stores views of the X, Y and Z arrays and accesses them
// like vectors
template<class View1, class View2, class View3>
struct saxpy_functor : public thrust::unary_function<int,void>
{
  const float a;
  View1 x;
  View2 y;
  View3 z;

  __host__ __device__
  saxpy_functor(float _a, View1 _x, View2 _y, View3 _z)
      : a(_a), x(_x), y(_y), z(_z)
  {
  }

  __host__ __device__
  void operator()(int i)
  {
    z[i] = a * x[i] + y[i];
  }
};

// saxpy function, which can be called from either host or device
// The views are passed by value
template<class View1, class View2, class View3>
__host__ __device__
void saxpy(float A, View1 X, View2 Y, View3 Z)
{
  // Z = A * X + Y
  const int size = X.size();
  thrust::for_each(thrust::device,
                   thrust::make_counting_iterator(0),
                   thrust::make_counting_iterator(size),
                   saxpy_functor<View1,View2,View3>(A,X,Y,Z));
}

struct f1 : public thrust::unary_function<float,float>
{
  __host__ __device__
  float operator()(float x) const
  {
    return x*3;
  }
};

int main()
{
  using std::cout;
  using std::endl;

  // initialize host arrays
  float x[4] = {1.0, 1.0, 1.0, 1.0};
  float y[4] =
{1.0, 2.0, 3.0, 4.0};
  float z[4] = {0.0};

  thrust::device_vector<float> X(x, x + 4);
  thrust::device_vector<float> Y(y, y + 4);
  thrust::device_vector<float> Z(z, z + 4);

  saxpy(
    2.0,
    // make a range view of a pair of transform_iterators
    make_range_view(thrust::make_transform_iterator(X.cbegin(), f1()),
                    thrust::make_transform_iterator(X.cend(), f1())),
    // range view of normal_iterators
    make_range_view(Y.begin(), thrust::distance(Y.begin(), Y.end())),
    // range view of naked pointers
    make_range_view(Z.data().get(), 4));

  // print values from original device_vector Z
  // to ensure that range view was mapped to this vector
  for (int i = 0, n = Z.size(); i < n; ++i)
  {
    cout << "z[" << i << "]= " << Z[i] << endl;
  }

  return 0;
}
thrust-1.9.5/examples/cuda/unwrap_pointer.cu000066400000000000000000000013541344621116200212210ustar00rootroot00000000000000
#include <thrust/device_ptr.h>
#include <thrust/device_malloc.h>
#include <thrust/device_free.h>
#include <thrust/device_vector.h>
#include <cuda_runtime.h>

int main(void)
{
  size_t N = 10;

  // create a device_ptr
  thrust::device_ptr<int> dev_ptr = thrust::device_malloc<int>(N);

  // extract raw pointer from device_ptr
  int * raw_ptr = thrust::raw_pointer_cast(dev_ptr);

  // use raw_ptr in CUDA API functions
  cudaMemset(raw_ptr, 0, N * sizeof(int));

  // free memory
  thrust::device_free(dev_ptr);

  // we can use the same approach for device_vector
  thrust::device_vector<int> d_vec(N);

  // note: d_vec.data() returns a device_ptr
  raw_ptr = thrust::raw_pointer_cast(d_vec.data());

  return 0;
}
thrust-1.9.5/examples/cuda/wrap_pointer.cu000066400000000000000000000011061344621116200206510ustar00rootroot00000000000000
#include <thrust/device_ptr.h>
#include <thrust/fill.h>
#include <cuda_runtime.h>

int main(void)
{
  size_t N = 10;

  // obtain raw pointer to device memory
  int * raw_ptr;
  cudaMalloc((void **) &raw_ptr, N * sizeof(int));

  // wrap raw pointer with a device_ptr
  thrust::device_ptr<int> dev_ptr = thrust::device_pointer_cast(raw_ptr);

  // use device_ptr in Thrust algorithms
  thrust::fill(dev_ptr, dev_ptr + N, (int) 0);

  // access device memory transparently through device_ptr
  dev_ptr[0] = 1;

  // free memory
  cudaFree(raw_ptr);

  return 0;
}
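The caching policy of cached_allocator in examples/cuda/custom_temporary_allocation.cu is plain host-side bookkeeping, so it can be tried out without a GPU. The following is a minimal sketch, not part of Thrust: it assumes std::malloc and std::free as stand-ins for thrust::cuda::malloc and thrust::cuda::free, and the class name host_cached_allocator and its cached_block_count() helper are hypothetical. As in the original, a cached block is reused only when a request matches its size exactly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <new>
#include <stdexcept>
#include <utility>

// Hypothetical host-only analogue of cached_allocator. std::malloc/std::free
// stand in for thrust::cuda::malloc/thrust::cuda::free.
class host_cached_allocator
{
public:
  typedef char value_type;

  host_cached_allocator() {}
  ~host_cached_allocator() { free_all(); }

  // Copying would make two objects free the same blocks, so forbid it.
  host_cached_allocator(const host_cached_allocator&) = delete;
  host_cached_allocator& operator=(const host_cached_allocator&) = delete;

  char* allocate(std::ptrdiff_t num_bytes)
  {
    assert(num_bytes > 0);
    char* result = 0;

    // Reuse a cached block of exactly the requested size, if there is one.
    free_blocks_type::iterator hit = free_blocks.find(num_bytes);
    if (hit != free_blocks.end())
    {
      result = hit->second;
      free_blocks.erase(hit);
    }
    else
    {
      // Cache miss: fall back to the underlying allocator.
      result = static_cast<char*>(std::malloc(static_cast<std::size_t>(num_bytes)));
      if (result == 0)
        throw std::bad_alloc();
    }

    allocated_blocks.insert(std::make_pair(result, num_bytes));
    return result;
  }

  void deallocate(char* ptr, std::size_t)
  {
    allocated_blocks_type::iterator it = allocated_blocks.find(ptr);
    if (it == allocated_blocks.end())
      throw std::invalid_argument("pointer was not allocated by this allocator");

    std::ptrdiff_t num_bytes = it->second;
    allocated_blocks.erase(it);

    // Keep the block for later reuse instead of releasing it.
    free_blocks.insert(std::make_pair(num_bytes, ptr));
  }

  // Number of blocks currently parked in the cache.
  std::size_t cached_block_count() const { return free_blocks.size(); }

private:
  typedef std::multimap<std::ptrdiff_t, char*> free_blocks_type;
  typedef std::map<char*, std::ptrdiff_t>      allocated_blocks_type;

  free_blocks_type      free_blocks;
  allocated_blocks_type allocated_blocks;

  // Release every block, whether cached or still handed out.
  void free_all()
  {
    for (free_blocks_type::iterator i = free_blocks.begin(); i != free_blocks.end(); ++i)
      std::free(i->second);
    for (allocated_blocks_type::iterator i = allocated_blocks.begin(); i != allocated_blocks.end(); ++i)
      std::free(i->first);
  }
};
```

Because a deallocated block goes back into free_blocks rather than to std::free, calling allocate(64) after deallocate(p, 64) hands back p again; memory is only released when the allocator is destroyed, which mirrors how the CUDA version avoids repeated cudaMalloc calls across the five sort trials.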
thrust-1.9.5/examples/device_ptr.cu000066400000000000000000000024141344621116200173530ustar00rootroot00000000000000
#include <thrust/device_ptr.h>
#include <thrust/device_malloc.h>
#include <thrust/device_free.h>
#include <thrust/sequence.h>
#include <thrust/reduce.h>
#include <iostream>
#include <cassert>

int main(void)
{
  // allocate memory buffer to store 10 integers on the device
  thrust::device_ptr<int> d_ptr = thrust::device_malloc<int>(10);

  // device_ptr supports pointer arithmetic
  thrust::device_ptr<int> first = d_ptr;
  thrust::device_ptr<int> last  = d_ptr + 10;
  std::cout << "device array contains " << (last - first) << " values\n";

  // algorithms work as expected
  thrust::sequence(first, last);
  std::cout << "sum of values is " << thrust::reduce(first, last) << "\n";

  // device memory can be read and written transparently
  d_ptr[0] = 10;
  d_ptr[1] = 11;
  d_ptr[2] = d_ptr[0] + d_ptr[1];

  // device_ptr can be converted to a "raw" pointer for use in other APIs and kernels, etc.
  int * raw_ptr = thrust::raw_pointer_cast(d_ptr);

  // note: raw_ptr cannot necessarily be accessed by the host!

  // conversely, raw pointers can be wrapped
  thrust::device_ptr<int> wrapped_ptr = thrust::device_pointer_cast(raw_ptr);

  // back to where we started
  assert(wrapped_ptr == d_ptr);

  // deallocate device memory
  thrust::device_free(d_ptr);

  return 0;
}
thrust-1.9.5/examples/discrete_voronoi.cu000066400000000000000000000143711344621116200206110ustar00rootroot00000000000000
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/transform.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/random.h>
#include <thrust/extrema.h>
#include <iostream>
#include <iomanip>
#include <cstdio>

#include "include/timer.h"

// Compute an approximate Voronoi Diagram with a Jump Flooding Algorithm (JFA)
//
// References
//   http://en.wikipedia.org/wiki/Voronoi_diagram
//   http://www.comp.nus.edu.sg/~tants/jfa.html
//   http://www.utdallas.edu/~guodongrong/Papers/Dissertation.pdf
//
// Thanks to David Coeurjolly for contributing this example

// minFunctor
// Tuple = the sites stored at offsets 0, +k, +m*k, -k, -m*k, +k+m*k, +k-m*k,
//         -k+m*k and -k-m*k from pixel i, followed by the pixel index i itself
struct minFunctor
{
  int m, n, k;

  __host__ __device__
  minFunctor(int m, int n, int k)
    : m(m), n(n), k(k) {}

  // Decide whether the current Voronoi site p should be replaced by candidate q
  __host__ __device__
  int minVoro(int x_i, int y_i, int p, int q)
  {
    if (q
== m*n)
      return p;

    // coordinates of points p and q
    int y_q = q / m;
    int x_q = q - y_q * m;
    int y_p = p / m;
    int x_p = p - y_p * m;

    // squared distances
    int d_iq = (x_i-x_q) * (x_i-x_q) + (y_i-y_q) * (y_i-y_q);
    int d_ip = (x_i-x_p) * (x_i-x_p) + (y_i-y_p) * (y_i-y_p);

    if (d_iq < d_ip)
      return q;  // q is closer
    else
      return p;
  }

  // For each point p+{-k,0,k}, we keep the site with minimum distance
  template<typename Tuple>
  __host__ __device__
  int operator()(const Tuple &t)
  {
    // Current point and site
    int i = thrust::get<9>(t);
    int v = thrust::get<0>(t);

    // Current point coordinates
    int y = i / m;
    int x = i - y * m;

    if (x >= k)
    {
      v = minVoro(x, y, v, thrust::get<3>(t));

      if (y >= k)
        v = minVoro(x, y, v, thrust::get<8>(t));

      if (y + k < n)
        v = minVoro(x, y, v, thrust::get<7>(t));
    }

    if (x + k < m)
    {
      v = minVoro(x, y, v, thrust::get<1>(t));

      if (y >= k)
        v = minVoro(x, y, v, thrust::get<6>(t));
      if (y + k < n)
        v = minVoro(x, y, v, thrust::get<5>(t));
    }

    if (y >= k)
      v = minVoro(x, y, v, thrust::get<4>(t));
    if (y + k < n)
      v = minVoro(x, y, v, thrust::get<2>(t));

    // global return
    return v;
  }
};

// print an M-by-N array
template <typename T>
void print(int m, int n, const thrust::device_vector<T>& d_data)
{
  thrust::host_vector<T> h_data = d_data;

  for(int i = 0; i < m; i++)
  {
    for(int j = 0; j < n; j++)
      std::cout << std::setw(4) << h_data[i * n + j] << " ";
    std::cout << "\n";
  }
}

void generate_random_sites(thrust::host_vector<int> &t, int Nb, int m, int n)
{
  thrust::default_random_engine rng;
  thrust::uniform_int_distribution<int> dist(0, m * n - 1);

  for(int k = 0; k < Nb; k++)
  {
    int index = dist(rng);
    t[index] = index + 1;
  }
}

// Export the array to PGM image format
void vector_to_pgm(thrust::host_vector<int> &t, int m, int n, const char *out)
{
  FILE *f;

  f = fopen(out, "w+t");
  fprintf(f, "P2\n");
  fprintf(f, "%d %d\n 253\n", m, n);

  for(int j = 0; j < n ; j++)
  {
    for(int i = 0; i < m ; i++)
    {
      fprintf(f, "%d ", (int)(71*t[j*m+i])%253); // hash function to map values to [0,252]
    }
  }
  fprintf(f, "\n");
  fclose(f);
}

/************Main Jfa
loop********************/
// Perform a jump with step k
void jfa(thrust::device_vector<int>& in, thrust::device_vector<int>& out, unsigned int k, int m, int n)
{
  thrust::transform(
      thrust::make_zip_iterator(
        thrust::make_tuple(in.begin(),
                           in.begin() + k,
                           in.begin() + m*k,
                           in.begin() - k,
                           in.begin() - m*k,
                           in.begin() + k+m*k,
                           in.begin() + k-m*k,
                           in.begin() - k+m*k,
                           in.begin() - k-m*k,
                           thrust::counting_iterator<int>(0))),
      thrust::make_zip_iterator(
        thrust::make_tuple(in.begin(),
                           in.begin() + k,
                           in.begin() + m*k,
                           in.begin() - k,
                           in.begin() - m*k,
                           in.begin() + k+m*k,
                           in.begin() + k-m*k,
                           in.begin() - k+m*k,
                           in.begin() - k-m*k,
                           thrust::counting_iterator<int>(0))) + n*m,
      out.begin(),
      minFunctor(m,n,k));
}
/********************************************/

void display_time(timer& t)
{
  std::cout << " ( "<< 1e3 * t.elapsed() << "ms )" << std::endl;
}

int main(void)
{
  int m = 2048; // number of rows
  int n = 2048; // number of columns
  int s = 1000; // number of sites

  timer t;

  // Host vector to encode a 2D image
  std::cout << "[Initialize " << m << "x" << n << " Image]" << std::endl;
  t.restart();
  thrust::host_vector<int> seeds_host(m*n, m*n);
  generate_random_sites(seeds_host,s,m,n);
  display_time(t);

  std::cout << "[Copy to Device]" << std::endl;
  t.restart();
  thrust::device_vector<int> seeds = seeds_host;
  thrust::device_vector<int> temp(seeds);
  display_time(t);

  // JFA+1 : before entering the log(n) loop, we perform a jump with k=1
  std::cout << "[JFA stepping]" << std::endl;
  t.restart();
  jfa(seeds,temp,1,m,n);
  seeds.swap(temp);

  // JFA : main loop with k=n/2, n/4, ..., 1
  for(int k = thrust::max(m,n) / 2; k > 0; k /= 2)
  {
    jfa(seeds,temp,k,m,n);
    seeds.swap(temp);
  }
  display_time(t);

  std::cout << " ( " << seeds.size() / (1e6 * t.elapsed()) << " MPixel/s ) " << std::endl;

  std::cout << "[Device to Host Copy]" << std::endl;
  t.restart();
  seeds_host = seeds;
  display_time(t);

  std::cout << "[PGM Export]" << std::endl;
  t.restart();
  vector_to_pgm(seeds_host, m, n, "discrete_voronoi.pgm");
  display_time(t);

  return 0;
}
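For checking the device output of discrete_voronoi.cu on a small grid, a sequential version of the same update can be handy. The sketch below is hypothetical host-only C++ (jfa_step_host, jfa_host and site_sq_dist are not part of the example). It assumes the encoding that minVoro decodes, a site stored as its linear index y*m + x with m*n meaning "no site", and it visits the same eight neighbours at distance k. One deliberate difference from minFunctor is that an empty current pixel is treated as infinitely far away instead of being decoded as a coordinate.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Squared distance from pixel (x, y) to the site stored as linear index s on a
// grid that is m pixels wide; returns -1 for the "no site" sentinel m*n.
long long site_sq_dist(int x, int y, int s, int m, int n)
{
  if (s == m * n)
    return -1;
  int sy = s / m;
  int sx = s - sy * m;
  long long dx = x - sx;
  long long dy = y - sy;
  return dx * dx + dy * dy;
}

// One jump-flooding pass with step k (hypothetical host-only helper): each pixel
// keeps the closest site among its own and those of the 8 pixels at offset k.
std::vector<int> jfa_step_host(const std::vector<int>& in, int k, int m, int n)
{
  assert(k > 0);
  std::vector<int> out(in.size());
  for (int y = 0; y < n; ++y)
    for (int x = 0; x < m; ++x)
    {
      int best = in[y * m + x];
      long long best_d = site_sq_dist(x, y, best, m, n);
      for (int dy = -k; dy <= k; dy += k)
        for (int dx = -k; dx <= k; dx += k)
        {
          int nx = x + dx, ny = y + dy;
          if (nx < 0 || nx >= m || ny < 0 || ny >= n)
            continue;
          int cand = in[ny * m + nx];
          long long d = site_sq_dist(x, y, cand, m, n);
          // an empty pixel (d < 0) never wins; otherwise keep strictly closer sites
          if (d >= 0 && (best_d < 0 || d < best_d))
          {
            best = cand;
            best_d = d;
          }
        }
      out[y * m + x] = best;
    }
  return out;
}

// The 1+JFA schedule used by the example: one pass with k = 1,
// then k = max(m,n)/2, max(m,n)/4, ..., 1.
std::vector<int> jfa_host(std::vector<int> grid, int m, int n)
{
  grid = jfa_step_host(grid, 1, m, n);
  for (int k = std::max(m, n) / 2; k > 0; k /= 2)
    grid = jfa_step_host(grid, k, m, n);
  return grid;
}
```

Like the device version, the result is an approximation: JFA can occasionally assign a pixel a site that is not its true nearest one, so comparing against a brute-force nearest-site search is a sanity check rather than an exact test.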
thrust-1.9.5/examples/dot_products_with_zip.cu000066400000000000000000000131201344621116200216510ustar00rootroot00000000000000
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/random.h>
#include <iostream>

// This example shows how thrust::zip_iterator can be used to create a
// 'virtual' array of structures. In this case the structure is a 3d
// vector type (Float3) whose (x,y,z) components will be stored in
// three separate float arrays. The zip_iterator "zips" these arrays
// into a single virtual Float3 array.

// We'll use a 3-tuple to store our 3d vector type
typedef thrust::tuple<float,float,float> Float3;

// This functor implements the dot product between 3d vectors
struct DotProduct : public thrust::binary_function<Float3,Float3,float>
{
  __host__ __device__
  float operator()(const Float3& a, const Float3& b) const
  {
    return thrust::get<0>(a) * thrust::get<0>(b) +    // x components
           thrust::get<1>(a) * thrust::get<1>(b) +    // y components
           thrust::get<2>(a) * thrust::get<2>(b);     // z components
  }
};

// Return a host vector with random values in the range [0,1)
thrust::host_vector<float> random_vector(const size_t N,
                                         unsigned int seed = thrust::default_random_engine::default_seed)
{
  thrust::default_random_engine rng(seed);
  thrust::uniform_real_distribution<float> u01(0.0f, 1.0f);
  thrust::host_vector<float> temp(N);
  for(size_t i = 0; i < N; i++) {
    temp[i] = u01(rng);
  }
  return temp;
}

int main(void)
{
  // number of vectors
  const size_t N = 1000;

  // We'll store the components of the 3d vectors in separate arrays. One set of
  // arrays will store the 'A' vectors and another set will store the 'B' vectors.

  // This 'structure of arrays' (SoA) approach is usually more efficient than the
  // 'array of structures' (AoS) approach. The primary reason is that structures,
  // like Float3, don't always obey the memory coalescing rules, so they are not
  // efficiently transferred to and from memory. Another reason to prefer SoA to
  // AoS is that we don't always want to process all members of the structure.
  // For example, if we only need to look at the first element of the structure, then it
  // is wasteful to load the entire structure from memory.  With the SoA approach,
  // we can choose which elements of the structure we wish to read.

  thrust::device_vector<float> A0 = random_vector(N);  // x components of the 'A' vectors
  thrust::device_vector<float> A1 = random_vector(N);  // y components of the 'A' vectors
  thrust::device_vector<float> A2 = random_vector(N);  // z components of the 'A' vectors

  thrust::device_vector<float> B0 = random_vector(N);  // x components of the 'B' vectors
  thrust::device_vector<float> B1 = random_vector(N);  // y components of the 'B' vectors
  thrust::device_vector<float> B2 = random_vector(N);  // z components of the 'B' vectors

  // Storage for result of each dot product
  thrust::device_vector<float> result(N);

  // We'll now illustrate two ways to use zip_iterator to compute the dot
  // products.  The first method is verbose but shows how the parts fit together.
  // The second method hides these details and is more concise.

  // METHOD #1
  // Defining a zip_iterator type can be a little cumbersome ...
  typedef thrust::device_vector<float>::iterator                     FloatIterator;
  typedef thrust::tuple<FloatIterator, FloatIterator, FloatIterator> FloatIteratorTuple;
  typedef thrust::zip_iterator<FloatIteratorTuple>                   Float3Iterator;

  // Now we'll create some zip_iterators for A and B
  Float3Iterator A_first = thrust::make_zip_iterator(thrust::make_tuple(A0.begin(), A1.begin(), A2.begin()));
  Float3Iterator A_last  = thrust::make_zip_iterator(thrust::make_tuple(A0.end(),   A1.end(),   A2.end()));
  Float3Iterator B_first = thrust::make_zip_iterator(thrust::make_tuple(B0.begin(), B1.begin(), B2.begin()));

  // Finally, we pass the zip_iterators into transform() as if they
  // were 'normal' iterators for a device_vector.
  thrust::transform(A_first, A_last, B_first, result.begin(), DotProduct());

  // METHOD #2
  // Alternatively, we can avoid creating variables for A_first, A_last,
  // and B_first and invoke transform() directly.
  thrust::transform( thrust::make_zip_iterator(thrust::make_tuple(A0.begin(), A1.begin(), A2.begin())),
                     thrust::make_zip_iterator(thrust::make_tuple(A0.end(),   A1.end(),   A2.end())),
                     thrust::make_zip_iterator(thrust::make_tuple(B0.begin(), B1.begin(), B2.begin())),
                     result.begin(),
                     DotProduct() );

  // Finally, we'll print a few results

  // Example output
  // (0.840188,0.45724,0.0860517) * (0.0587587,0.456151,0.322409) = 0.285683
  // (0.394383,0.640368,0.180886) * (0.0138811,0.24875,0.0221609) = 0.168775
  // (0.783099,0.717092,0.426423) * (0.622212,0.0699601,0.234811) = 0.63755
  // (0.79844,0.460067,0.0470658) * (0.0391351,0.742097,0.354747) = 0.389358
  std::cout << std::fixed;
  for(size_t i = 0; i < 4; i++)
  {
    Float3 a = A_first[i];
    Float3 b = B_first[i];
    float dot = result[i];

    std::cout << "(" << thrust::get<0>(a) << "," << thrust::get<1>(a) << "," << thrust::get<2>(a) << ")";
    std::cout << " * ";
    std::cout << "(" << thrust::get<0>(b) << "," << thrust::get<1>(b) << "," << thrust::get<2>(b) << ")";
    std::cout << " = ";
    std::cout << dot << std::endl;
  }

  return 0;
}

thrust-1.9.5/examples/expand.cu

#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <thrust/gather.h>
#include <thrust/scatter.h>
#include <thrust/scan.h>
#include <thrust/copy.h>
#include <thrust/functional.h>
#include <thrust/iterator/counting_iterator.h>
#include <iterator>
#include <iostream>
#include <string>

// This example demonstrates how to expand an input sequence by
// replicating each element a variable number of times.
// For example,
//
//   expand([2,2,2],[A,B,C]) -> [A,A,B,B,C,C]
//   expand([3,0,1],[A,B,C]) -> [A,A,A,C]
//   expand([1,3,2],[A,B,C]) -> [A,B,B,B,C,C]
//
// The element counts are assumed to be non-negative integers

template <typename InputIterator1,
          typename InputIterator2,
          typename OutputIterator>
OutputIterator expand(InputIterator1 first1,
                      InputIterator1 last1,
                      InputIterator2 first2,
                      OutputIterator output)
{
  typedef typename thrust::iterator_difference<InputIterator1>::type difference_type;

  difference_type input_size  = thrust::distance(first1, last1);
  difference_type output_size = thrust::reduce(first1, last1);

  // scan the counts to obtain output offsets for each input element
  thrust::device_vector<difference_type> output_offsets(input_size, 0);
  thrust::exclusive_scan(first1, last1, output_offsets.begin());

  // scatter each input index to its output offset (only where the count is nonzero)
  thrust::device_vector<difference_type> output_indices(output_size, 0);
  thrust::scatter_if
    (thrust::counting_iterator<difference_type>(0),
     thrust::counting_iterator<difference_type>(input_size),
     output_offsets.begin(),
     first1,
     output_indices.begin());

  // compute max-scan over the output indices, filling in the holes
  thrust::inclusive_scan
    (output_indices.begin(),
     output_indices.end(),
     output_indices.begin(),
     thrust::maximum<difference_type>());

  // gather input values according to index array (output = first2[output_indices])
  thrust::gather(output_indices.begin(),
                 output_indices.end(),
                 first2,
                 output);

  // return output + output_size
  thrust::advance(output, output_size);
  return output;
}

template <typename Vector>
void print(const std::string& s, const Vector& v)
{
  typedef typename Vector::value_type T;

  std::cout << s;
  thrust::copy(v.begin(), v.end(), std::ostream_iterator<T>(std::cout, " "));
  std::cout << std::endl;
}

int main(void)
{
  int counts[] = {3,5,2,0,1,3,4,2,4};
  int values[] = {1,2,3,4,5,6,7,8,9};

  size_t input_size  = sizeof(counts) / sizeof(int);
  size_t output_size = thrust::reduce(counts, counts + input_size);

  // copy inputs to device
  thrust::device_vector<int> d_counts(counts, counts + input_size);
  thrust::device_vector<int> d_values(values, values + input_size);
  thrust::device_vector<int> d_output(output_size);

  // expand values according to counts
  expand(d_counts.begin(), d_counts.end(),
         d_values.begin(),
         d_output.begin());

  std::cout << "Expanding values according to counts" << std::endl;
  print(" counts ", d_counts);
  print(" values ", d_values);
  print(" output ", d_output);

  return 0;
}

thrust-1.9.5/examples/fill_copy_sequence.cu

#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <thrust/fill.h>
#include <thrust/sequence.h>
#include <iostream>

int main(void)
{
  // initialize all ten integers of a device_vector to 1
  thrust::device_vector<int> D(10, 1);

  // set the first seven elements of a vector to 9
  thrust::fill(D.begin(), D.begin() + 7, 9);

  // initialize a host_vector with the first five elements of D
  thrust::host_vector<int> H(D.begin(), D.begin() + 5);

  // set the elements of H to 0, 1, 2, 3, ...
  thrust::sequence(H.begin(), H.end());

  // copy all of H back to the beginning of D
  thrust::copy(H.begin(), H.end(), D.begin());

  // print D
  for(size_t i = 0; i < D.size(); i++)
    std::cout << "D[" << i << "] = " << D[i] << std::endl;

  return 0;
}

thrust-1.9.5/examples/histogram.cu

#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/sort.h>
#include <thrust/copy.h>
#include <thrust/random.h>
#include <thrust/reduce.h>
#include <thrust/functional.h>
#include <thrust/inner_product.h>
#include <thrust/binary_search.h>
#include <thrust/adjacent_difference.h>
#include <thrust/iterator/constant_iterator.h>
#include <thrust/iterator/counting_iterator.h>
#include <iostream>
#include <iomanip>
#include <iterator>
#include <string>

// This example illustrates several methods for computing a
// histogram [1] with Thrust.  We consider standard "dense"
// histograms, where some bins may have zero entries, as well
// as "sparse" histograms, where only the nonzero bins are
// stored.  For example, histograms for the data set
//    [2 1 0 0 2 2 1 1 1 1 4]
// which contains 2 zeros, 5 ones, 3 twos, and 1 four, is
//    [2 5 3 0 1]
// using the dense method and
//    [(0,2), (1,5), (2,3), (4,1)]
// using the sparse method.  Since there are no threes, the
// sparse histogram representation does not contain a bin
// for that value.
//
// Note that we choose to store the sparse histogram in two
// separate arrays, one array of keys and one array of bin counts,
//    [0 1 2 4] - keys
//    [2 5 3 1] - bin counts
// This "structure of arrays" format is generally faster and
// more convenient to process than the alternative "array
// of structures" layout.
//
// The best histogramming method depends on the application.
// If the number of bins is relatively small compared to the
// input size, then the binary search-based dense histogram
// method is probably best.  If the number of bins is comparable
// to the input size, then the reduce_by_key-based sparse method
// ought to be faster.  When in doubt, try both and see which
// is fastest.
//
// [1] http://en.wikipedia.org/wiki/Histogram

// simple routine to print contents of a vector
template <typename Vector>
void print_vector(const std::string& name, const Vector& v)
{
  typedef typename Vector::value_type T;
  std::cout << " " << std::setw(20) << name << " ";
  thrust::copy(v.begin(), v.end(), std::ostream_iterator<T>(std::cout, " "));
  std::cout << std::endl;
}

// dense histogram using binary search
template <typename Vector1,
          typename Vector2>
void dense_histogram(const Vector1& input,
                           Vector2& histogram)
{
  typedef typename Vector1::value_type ValueType; // input value type
  typedef typename Vector2::value_type IndexType; // histogram index type

  // copy input data (could be skipped if input is allowed to be modified)
  thrust::device_vector<ValueType> data(input);

  // print the initial data
  print_vector("initial data", data);

  // sort data to bring equal elements together
  thrust::sort(data.begin(), data.end());

  // print the sorted data
  print_vector("sorted data", data);

  // number of histogram bins is equal to the maximum value plus one
  IndexType num_bins = data.back() + 1;

  // resize histogram storage
  histogram.resize(num_bins);

  // find the end of each bin of values
  thrust::counting_iterator<IndexType> search_begin(0);
  thrust::upper_bound(data.begin(), data.end(),
                      search_begin, search_begin + num_bins,
                      histogram.begin());

  // print the cumulative histogram
  print_vector("cumulative histogram", histogram);

  // compute the histogram by taking differences of the cumulative histogram
  thrust::adjacent_difference(histogram.begin(), histogram.end(),
                              histogram.begin());

  // print the histogram
  print_vector("histogram", histogram);
}

// sparse histogram using reduce_by_key
template <typename Vector1,
          typename Vector2,
          typename Vector3>
void sparse_histogram(const Vector1& input,
                            Vector2& histogram_values,
                            Vector3& histogram_counts)
{
  typedef typename Vector1::value_type ValueType; // input value type
  typedef typename Vector3::value_type IndexType; // histogram index type

  // copy input data (could be skipped if input is allowed to be modified)
  thrust::device_vector<ValueType> data(input);

  // print the initial data
  print_vector("initial data", data);

  // sort data to bring equal elements together
  thrust::sort(data.begin(), data.end());

  // print the sorted data
  print_vector("sorted data", data);

  // number of histogram bins is equal to number of unique values (assumes data.size() > 0)
  IndexType num_bins = thrust::inner_product(data.begin(), data.end() - 1,
                                             data.begin() + 1,
                                             IndexType(1),
                                             thrust::plus<IndexType>(),
                                             thrust::not_equal_to<ValueType>());

  // resize histogram storage
  histogram_values.resize(num_bins);
  histogram_counts.resize(num_bins);

  // compact the sorted data into one (value, count) pair per bin
  thrust::reduce_by_key(data.begin(), data.end(),
                        thrust::constant_iterator<IndexType>(1),
                        histogram_values.begin(),
                        histogram_counts.begin());

  // print the sparse histogram
  print_vector("histogram values", histogram_values);
  print_vector("histogram counts", histogram_counts);
}

int main(void)
{
  thrust::default_random_engine rng;
  thrust::uniform_int_distribution<int> dist(0, 9);

  const int N = 40;
  const int S = 4;

  // generate random data on the host
  thrust::host_vector<int> input(N);
  for(int i = 0; i < N; i++)
  {
    int sum = 0;
    for (int j = 0; j < S; j++)
      sum += dist(rng);
    input[i] = sum / S;
  }

  // demonstrate dense histogram method
  {
    std::cout << "Dense Histogram" << std::endl;
    thrust::device_vector<int> histogram;
    dense_histogram(input,
                    histogram);
  }

  // demonstrate sparse histogram method
  {
    std::cout << "Sparse Histogram" << std::endl;
    thrust::device_vector<int> histogram_values;
    thrust::device_vector<int> histogram_counts;
    sparse_histogram(input, histogram_values, histogram_counts);
  }

  // Note:
  // A dense histogram can be converted to a sparse histogram
  // using stream compaction (i.e. thrust::copy_if).
  // A sparse histogram can be expanded into a dense histogram
  // by initializing the dense histogram to zero (with thrust::fill)
  // and then scattering the histogram counts (with thrust::scatter).

  return 0;
}

thrust-1.9.5/examples/include/timer.h

/*
 *  Copyright 2008-2009 NVIDIA Corporation
 *
 *  Licensed under the Apache License, Version 2.0 (the "License");
 *  you may not use this file except in compliance with the License.
 *  You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 *  Unless required by applicable law or agreed to in writing, software
 *  distributed under the License is distributed on an "AS IS" BASIS,
 *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 *  See the License for the specific language governing permissions and
 *  limitations under the License.
 */

#pragma once

// A simple timer class

#ifdef __CUDACC__

// use CUDA's high-resolution timers when possible
#include <cuda_runtime_api.h>
#include <thrust/system/cuda/error.h>
#include <thrust/system_error.h>
#include <string>

// inline so that this header can be included from several translation units
inline void cuda_safe_call(cudaError_t error, const std::string& message = "")
{
  if(error)
    throw thrust::system_error(error, thrust::cuda_category(), message);
}

struct timer
{
  cudaEvent_t start;
  cudaEvent_t end;

  timer(void)
  {
    cuda_safe_call(cudaEventCreate(&start));
    cuda_safe_call(cudaEventCreate(&end));
    restart();
  }

  ~timer(void)
  {
    cuda_safe_call(cudaEventDestroy(start));
    cuda_safe_call(cudaEventDestroy(end));
  }

  void restart(void)
  {
    cuda_safe_call(cudaEventRecord(start, 0));
  }

  double elapsed(void)
  {
    cuda_safe_call(cudaEventRecord(end, 0));
    cuda_safe_call(cudaEventSynchronize(end));

    float ms_elapsed;
    cuda_safe_call(cudaEventElapsedTime(&ms_elapsed, start, end));
    return ms_elapsed / 1e3;
  }

  double epsilon(void)
  {
    return 0.5e-6;
  }
};

#else

// fallback to clock()
#include <ctime>

struct timer
{
  clock_t start;
  clock_t end;

  timer(void)
  {
    restart();
  }

  ~timer(void)
  {
  }

  void restart(void)
  {
    start = clock();
  }

  double elapsed(void)
  {
    end = clock();

    return static_cast<double>(end - start) / static_cast<double>(CLOCKS_PER_SEC);
  }

  double epsilon(void)
  {
    return 1.0 / static_cast<double>(CLOCKS_PER_SEC);
  }
};

#endif

thrust-1.9.5/examples/lambda.cu

#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/functional.h>
#include <iostream>

// This example demonstrates the use of placeholders to implement
// the SAXPY operation (i.e. Y[i] = a * X[i] + Y[i]).
//
// Placeholders enable developers to write concise inline expressions
// instead of full functors for many simple operations.  For example,
// the placeholder expression "_1 + _2" means to add the first argument,
// represented by _1, to the second argument, represented by _2.
// The names _1, _2, _3, _4 ... _10 represent the first ten arguments
// to the function.
//
// In this example, the placeholder expression "a * _1 + _2" is used
// to implement the SAXPY operation.
// Note that the placeholder implementation is considerably shorter and written inline.

// allows us to use "_1" instead of "thrust::placeholders::_1"
using namespace thrust::placeholders;

// implementing SAXPY with a functor is cumbersome and verbose
struct saxpy_functor
  : public thrust::binary_function<float, float, float>
{
  float a;

  saxpy_functor(float a) : a(a) {}

  __host__ __device__
  float operator()(float x, float y) const
  {
    return a * x + y;
  }
};

int main(void)
{
  // input data
  float a = 2.0f;
  float x[4] = {1, 2, 3, 4};
  float y[4] = {1, 1, 1, 1};

  // SAXPY implemented with a functor (function object)
  {
    thrust::device_vector<float> X(x, x + 4);
    thrust::device_vector<float> Y(y, y + 4);

    thrust::transform(X.begin(), X.end(),  // input range #1
                      Y.begin(),           // input range #2
                      Y.begin(),           // output range
                      saxpy_functor(a));   // functor

    std::cout << "SAXPY (functor method)" << std::endl;
    for (size_t i = 0; i < 4; i++)
      std::cout << a << " * " << x[i] << " + " << y[i] << " = " << Y[i] << std::endl;
  }

  // SAXPY implemented with placeholders
  {
    thrust::device_vector<float> X(x, x + 4);
    thrust::device_vector<float> Y(y, y + 4);

    thrust::transform(X.begin(), X.end(),  // input range #1
                      Y.begin(),           // input range #2
                      Y.begin(),           // output range
                      a * _1 + _2);        // placeholder expression

    std::cout << "SAXPY (placeholder method)" << std::endl;
    for (size_t i = 0; i < 4; i++)
      std::cout << a << " * " << x[i] << " + " << y[i] << " = " << Y[i] << std::endl;
  }

  return 0;
}

thrust-1.9.5/examples/lexicographical_sort.cu

#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/sort.h>
#include <thrust/gather.h>
#include <thrust/random.h>
#include <iostream>

// This example shows how to perform a lexicographical sort on multiple keys.
//
// http://en.wikipedia.org/wiki/Lexicographical_order

template <typename KeyVector, typename PermutationVector>
void update_permutation(KeyVector& keys, PermutationVector& permutation)
{
  // temporary storage for keys
  KeyVector temp(keys.size());

  // permute the keys with the current reordering
  thrust::gather(permutation.begin(), permutation.end(), keys.begin(), temp.begin());

  // stable_sort the permuted keys and update the permutation
  thrust::stable_sort_by_key(temp.begin(), temp.end(), permutation.begin());
}


template <typename KeyVector, typename PermutationVector>
void apply_permutation(KeyVector& keys, PermutationVector& permutation)
{
  // copy keys to temporary vector
  KeyVector temp(keys.begin(), keys.end());

  // permute the keys
  thrust::gather(permutation.begin(), permutation.end(), temp.begin(), keys.begin());
}

thrust::host_vector<int> random_vector(size_t N)
{
  thrust::host_vector<int> vec(N);
  static thrust::default_random_engine rng;
  static thrust::uniform_int_distribution<int> dist(0, 9);

  for (size_t i = 0; i < N; i++)
    vec[i] = dist(rng);

  return vec;
}

int main(void)
{
  size_t N = 20;

  // generate three arrays of random values
  thrust::device_vector<int> upper  = random_vector(N);
  thrust::device_vector<int> middle = random_vector(N);
  thrust::device_vector<int> lower  = random_vector(N);

  std::cout << "Unsorted Keys" << std::endl;
  for(size_t i = 0; i < N; i++)
  {
    std::cout << "(" << upper[i] << "," << middle[i] << "," << lower[i] << ")" << std::endl;
  }

  // initialize permutation to [0, 1, 2, ..., N-1]
  thrust::device_vector<int> permutation(N);
  thrust::sequence(permutation.begin(), permutation.end());

  // sort from least significant key to most significant keys
  update_permutation(lower,  permutation);
  update_permutation(middle, permutation);
  update_permutation(upper,  permutation);

  // Note: keys have not been modified
  // Note: permutation now maps unsorted keys to sorted order

  // permute the key arrays by the final permutation
  apply_permutation(lower,  permutation);
  apply_permutation(middle, permutation);
  apply_permutation(upper,  permutation);

  std::cout << "Sorted Keys" << std::endl;
  for(size_t i = 0; i < N; i++)
  {
    std::cout << "(" << upper[i] << "," << middle[i] << "," << lower[i] << ")" << std::endl;
  }

  return 0;
}

thrust-1.9.5/examples/max_abs_diff.cu

#include <thrust/device_vector.h>
#include <thrust/inner_product.h>
#include <thrust/functional.h>
#include <cmath>
#include <iostream>

// this example computes the maximum absolute difference
// between the elements of two vectors

template <typename T>
struct abs_diff : public thrust::binary_function<T,T,T>
{
  __host__ __device__
  T operator()(const T& a, const T& b) const
  {
    return fabsf(b - a);
  }
};

int main(void)
{
  thrust::device_vector<float> d_a(4);
  thrust::device_vector<float> d_b(4);

  d_a[0] = 1.0;  d_b[0] = 2.0;
  d_a[1] = 2.0;  d_b[1] = 4.0;
  d_a[2] = 3.0;  d_b[2] = 3.0;
  d_a[3] = 4.0;  d_b[3] = 0.0;

  // initial value of the reduction
  float init = 0;

  // binary operations
  thrust::maximum<float> binary_op1;
  abs_diff<float>        binary_op2;

  float max_abs_diff = thrust::inner_product(d_a.begin(), d_a.end(), d_b.begin(), init, binary_op1, binary_op2);

  std::cout << "maximum absolute difference: " << max_abs_diff << std::endl;
  return 0;
}

thrust-1.9.5/examples/minimal_custom_backend.cu

#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>
#include <thrust/for_each.h>
#include <thrust/transform.h>
#include <thrust/functional.h>
#include <iostream>

// This example demonstrates how to build a minimal custom
// Thrust backend by intercepting for_each's dispatch.
// We begin by defining a "system", which distinguishes our novel
// backend from other Thrust backends.
// We'll derive my_system from thrust::device_execution_policy to inherit
// the functionality of the default device backend.
// Note that we pass the name of our system as a template parameter
// to thrust::device_execution_policy.
struct my_system : thrust::device_execution_policy<my_system> {};

// Next, we'll create a novel version of for_each which only
// applies to algorithm invocations executed with my_system.
// Our version of for_each will print a message and then call
// the regular device version of for_each.

// The first parameter to our version of for_each is my_system. This allows
// Thrust to locate it when dispatching thrust::for_each.
// The following parameters are as normal.
template <typename Iterator, typename Function>
Iterator for_each(my_system,
                  Iterator first, Iterator last,
                  Function f)
{
  // output a message
  std::cout << "Hello, world from for_each(my_system)!" << std::endl;

  // to call the normal device version of for_each, pass thrust::device as the first parameter.
  return thrust::for_each(thrust::device, first, last, f);
}

int main()
{
  thrust::device_vector<int> vec(1);

  // create an instance of our system
  my_system sys;

  // To invoke our version of for_each, pass sys as the first parameter
  thrust::for_each(sys, vec.begin(), vec.end(), thrust::identity<int>());

  // Other algorithms that Thrust implements with thrust::for_each will also
  // cause our version of for_each to be invoked when we pass an instance of my_system as the first parameter.
  // Even though we did not define a special version of transform, Thrust dispatches the version it knows
  // for thrust::device_execution_policy, which my_system inherits.
  thrust::transform(sys, vec.begin(), vec.end(), vec.begin(), thrust::identity<int>());

  // Invocations without my_system are handled normally.
  thrust::for_each(vec.begin(), vec.end(), thrust::identity<int>());

  return 0;
}

thrust-1.9.5/examples/minmax.cu

#include <thrust/transform_reduce.h>
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/extrema.h>
#include <thrust/random.h>
#include <iostream>

// compute minimum and maximum values in a single reduction

// minmax_pair stores the minimum and maximum
// values that have been encountered so far
template <typename T>
struct minmax_pair
{
  T min_val;
  T max_val;
};

// minmax_unary_op is a functor that takes in a value x and
// returns a minmax_pair whose minimum and maximum values
// are initialized to x.
template <typename T>
struct minmax_unary_op
  : public thrust::unary_function< T, minmax_pair<T> >
{
  __host__ __device__
  minmax_pair<T> operator()(const T& x) const
  {
    minmax_pair<T> result;
    result.min_val = x;
    result.max_val = x;
    return result;
  }
};

// minmax_binary_op is a functor that accepts two minmax_pair
// structs and returns a new minmax_pair whose minimum and
// maximum values are the min() and max() respectively of
// the minimums and maximums of the input pairs
template <typename T>
struct minmax_binary_op
  : public thrust::binary_function< minmax_pair<T>, minmax_pair<T>, minmax_pair<T> >
{
  __host__ __device__
  minmax_pair<T> operator()(const minmax_pair<T>& x, const minmax_pair<T>& y) const
  {
    minmax_pair<T> result;
    result.min_val = thrust::min(x.min_val, y.min_val);
    result.max_val = thrust::max(x.max_val, y.max_val);
    return result;
  }
};


int main(void)
{
  // input size
  size_t N = 10;

  // initialize random number generator
  thrust::default_random_engine rng;
  thrust::uniform_int_distribution<int> dist(10, 99);

  // initialize data on the device, one element at a time from the host
  thrust::device_vector<int> data(N);
  for (size_t i = 0; i < data.size(); i++)
    data[i] = dist(rng);

  // setup arguments
  minmax_unary_op<int>  unary_op;
  minmax_binary_op<int> binary_op;

  // initialize reduction with the first value
  minmax_pair<int> init = unary_op(data[0]);

  // compute minimum and maximum values
  minmax_pair<int> result = thrust::transform_reduce(data.begin(), data.end(), unary_op, init, binary_op);

  // print results
  std::cout << "[ ";
  for(size_t i = 0; i < N; i++)
    std::cout << data[i] << " ";
  std::cout << "]" << std::endl;

  std::cout << "minimum = " << result.min_val << std::endl;
  std::cout << "maximum = " << result.max_val << std::endl;

  return 0;
}

thrust-1.9.5/examples/mode.cu

#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/sort.h>
#include <thrust/reduce.h>
#include <thrust/inner_product.h>
#include <thrust/extrema.h>
#include <thrust/functional.h>
#include <thrust/random.h>
#include <thrust/copy.h>
#include <thrust/iterator/constant_iterator.h>
#include <iostream>
#include <iterator>

// This example computes the mode [1] of a set of numbers.  If there
// are multiple modes, the one with the smallest value is returned.
//
// [1] http://en.wikipedia.org/wiki/Mode_(statistics)

int main(void)
{
  const size_t N = 30;
  const size_t M = 10;
  thrust::default_random_engine rng;
  thrust::uniform_int_distribution<int> dist(0, M - 1);

  // generate random data on the host
  thrust::host_vector<int> h_data(N);
  for(size_t i = 0; i < N; i++)
    h_data[i] = dist(rng);

  // transfer data to device
  thrust::device_vector<int> d_data(h_data);

  // print the initial data
  std::cout << "initial data" << std::endl;
  thrust::copy(d_data.begin(), d_data.end(), std::ostream_iterator<int>(std::cout, " "));
  std::cout << std::endl;

  // sort data to bring equal elements together
  thrust::sort(d_data.begin(), d_data.end());

  // print the sorted data
  std::cout << "sorted data" << std::endl;
  thrust::copy(d_data.begin(), d_data.end(), std::ostream_iterator<int>(std::cout, " "));
  std::cout << std::endl;

  // count number of unique keys
  size_t num_unique = thrust::inner_product(d_data.begin(), d_data.end() - 1,
                                            d_data.begin() + 1,
                                            0,
                                            thrust::plus<int>(),
                                            thrust::not_equal_to<int>()) + 1;

  // count multiplicity of each key
  thrust::device_vector<int> d_output_keys(num_unique);
  thrust::device_vector<int> d_output_counts(num_unique);
  thrust::reduce_by_key(d_data.begin(), d_data.end(),
                        thrust::constant_iterator<int>(1),
                        d_output_keys.begin(),
                        d_output_counts.begin());

  // print the values
  std::cout << "values" << std::endl;
  thrust::copy(d_output_keys.begin(), d_output_keys.end(), std::ostream_iterator<int>(std::cout, " "));
  std::cout << std::endl;

  // print the counts
  std::cout << "counts" << std::endl;
  thrust::copy(d_output_counts.begin(), d_output_counts.end(), std::ostream_iterator<int>(std::cout, " "));
  std::cout << std::endl;

  // find the index of the maximum count
  thrust::device_vector<int>::iterator mode_iter;
  mode_iter = thrust::max_element(d_output_counts.begin(), d_output_counts.end());

  int mode = d_output_keys[mode_iter - d_output_counts.begin()];
  int occurrences = *mode_iter;

  std::cout << "Modal value " << mode << " occurs " << occurrences << " times " << std::endl;

  return 0;
}

thrust-1.9.5/examples/monte_carlo.cu

#include <thrust/random.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/functional.h>
#include <thrust/transform_reduce.h>
#include <iostream>
#include <iomanip>
#include <cmath>

// we could vary M & N to find the perf sweet spot

__host__ __device__
unsigned int hash(unsigned int a)
{
  a = (a+0x7ed55d16) + (a<<12);
  a = (a^0xc761c23c) ^ (a>>19);
  a = (a+0x165667b1) + (a<<5);
  a = (a+0xd3a2646c) ^ (a<<9);
  a = (a+0xfd7046c5) + (a<<3);
  a = (a^0xb55a4f09) ^ (a>>16);
  return a;
}

struct estimate_pi : public thrust::unary_function<unsigned int,float>
{
  __host__ __device__
  float operator()(unsigned int thread_id)
  {
    float sum = 0;
    unsigned int N = 10000; // samples per thread

    unsigned int seed = hash(thread_id);

    // seed a random number generator
    thrust::default_random_engine rng(seed);

    // create a mapping from random numbers to [0,1)
    thrust::uniform_real_distribution<float> u01(0,1);

    // take N samples in a quarter circle
    for(unsigned int i = 0; i < N; ++i)
    {
      // draw a sample from the unit square
      float x = u01(rng);
      float y = u01(rng);

      // measure distance from the origin
      float dist = sqrtf(x*x + y*y);

      // add 1.0f if (x,y) is inside the quarter circle
      if(dist <= 1.0f)
        sum += 1.0f;
    }

    // multiply by 4 to get the area of the whole circle
    sum *= 4.0f;

    // divide by N
    return sum / N;
  }
};

int main(void)
{
  // use 30K independent seeds
  int M = 30000;

  float estimate = thrust::transform_reduce(thrust::counting_iterator<int>(0),
                                            thrust::counting_iterator<int>(M),
                                            estimate_pi(),
                                            0.0f,
                                            thrust::plus<float>());
  estimate /= M;

  std::cout << std::setprecision(3);
  std::cout << "pi is approximately " << estimate << std::endl;

  return 0;
}

thrust-1.9.5/examples/monte_carlo_disjoint_sequences.cu

#include <thrust/random.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/functional.h>
#include <thrust/transform_reduce.h>
#include <iostream>
#include <cmath>

// The technique demonstrated in the example monte_carlo.cu
// assigns an independently seeded random number generator to each
// of 30K threads, and uses a hashing scheme based on thread index to
// seed each RNG. This technique, while simple, may be susceptible
// to correlation among the streams of numbers generated by each RNG
// because there is no guarantee that the streams are disjoint.
// This example demonstrates a slightly more sophisticated technique
// which ensures that the subsequences generated in each thread are
// disjoint. To achieve this, we use a single common stream
// of random numbers, but partition it among threads to ensure no overlap
// of substreams. The substreams are generated procedurally using
// default_random_engine's discard(n) member function, which skips
// past n states of the RNG. This function is accelerated and executes
// in O(lg n) time.
struct estimate_pi : public thrust::unary_function<unsigned int,float>
{
  __host__ __device__
  float operator()(unsigned int thread_id)
  {
    float sum = 0;
    unsigned int N = 5000; // samples per stream

    // each sample draws two numbers (x and y), so each substream is
    // 2 * N numbers long; note that 2 * M * N <= default_random_engine::max,
    // which is also the period of this particular RNG
    // this ensures the substreams are disjoint

    // create a random number generator
    // note that each thread uses an RNG with the same seed
    thrust::default_random_engine rng;

    // jump past the numbers used by the subsequences before me
    rng.discard(2 * N * thread_id);

    // create a mapping from random numbers to [0,1)
    thrust::uniform_real_distribution<float> u01(0,1);

    // take N samples in a quarter circle
    for(unsigned int i = 0; i < N; ++i)
    {
      // draw a sample from the unit square
      float x = u01(rng);
      float y = u01(rng);

      // measure distance from the origin
      float dist = sqrtf(x*x + y*y);

      // add 1.0f if (x,y) is inside the quarter circle
      if(dist <= 1.0f)
        sum += 1.0f;
    }

    // multiply by 4 to get the area of the whole circle
    sum *= 4.0f;

    // divide by N
    return sum / N;
  }
};

int main(void)
{
  // use 30K subsequences of random numbers
  int M = 30000;

  float estimate = thrust::transform_reduce(thrust::counting_iterator<int>(0),
                                            thrust::counting_iterator<int>(M),
                                            estimate_pi(),
                                            0.0f,
                                            thrust::plus<float>());
  estimate /= M;

  std::cout << "pi is around " << estimate << std::endl;

  return 0;
}

thrust-1.9.5/examples/mr_basic.cu

#include <thrust/mr/allocator.h>
#include <thrust/mr/new.h>
#include <thrust/mr/polymorphic_adaptor.h>
#include <thrust/mr/pool.h>
#include <thrust/mr/disjoint_pool.h>
#include <thrust/host_vector.h>
#include <cassert>

template <typename Vec>
void do_stuff_with_vector(typename Vec::allocator_type alloc)
{
  Vec v1(alloc);
  v1.push_back(1);
  assert(v1.back() == 1);

  Vec v2(alloc);
  v2 = v1;

  v1.swap(v2);

  v1.clear();
  v1.resize(2);
  assert(v1.size() == 2);
}

int main()
{
  thrust::mr::new_delete_resource memres;

  {
    // no virtual calls will be issued
    typedef thrust::mr::allocator<int, thrust::mr::new_delete_resource> Alloc;
    Alloc alloc(&memres);

    do_stuff_with_vector<thrust::host_vector<int, Alloc> >(alloc);
  }

  {
    // virtual calls will be issued - wrapping in a polymorphic wrapper
    thrust::mr::polymorphic_adaptor_resource<void*> adaptor(&memres);
    typedef thrust::mr::polymorphic_allocator<int, void*> Alloc;
    Alloc alloc(&adaptor);

    do_stuff_with_vector<thrust::host_vector<int, Alloc> >(alloc);
  }

  typedef thrust::mr::unsynchronized_pool_resource<
    thrust::mr::new_delete_resource
  > Pool;
  Pool pool(&memres);
  {
    typedef thrust::mr::allocator<int, Pool> Alloc;
    Alloc alloc(&pool);

    do_stuff_with_vector<thrust::host_vector<int, Alloc> >(alloc);
  }

  typedef thrust::mr::disjoint_unsynchronized_pool_resource<
    thrust::mr::new_delete_resource,
    thrust::mr::new_delete_resource
  > DisjointPool;
  DisjointPool disjoint_pool(&memres, &memres);
  {
    typedef thrust::mr::allocator<int, DisjointPool> Alloc;
    Alloc alloc(&disjoint_pool);

    do_stuff_with_vector<thrust::host_vector<int, Alloc> >(alloc);
  }
}

thrust-1.9.5/examples/norm.cu

#include <thrust/transform_reduce.h>
#include <thrust/functional.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <cmath>
#include <iostream>

// This example computes the norm [1] of a vector.  The norm is
// computed by squaring all numbers in the vector, summing the
// squares, and taking the square root of the sum of squares.  In
// Thrust this operation is efficiently implemented with the
// transform_reduce() algorithm.  Specifically, we first transform
// x -> x^2 and then compute a standard plus reduction.  Since there
// is no built-in functor for squaring numbers, we define our own
// square functor.
//
// [1] http://en.wikipedia.org/wiki/Norm_(mathematics)#Euclidean_norm

// square<T> computes the square of a number f(x) -> x*x
template <typename T>
struct square
{
  __host__ __device__
  T operator()(const T& x) const
  {
    return x * x;
  }
};

int main(void)
{
  // initialize host array
  float x[4] = {1.0, 2.0, 3.0, 4.0};

  // transfer to device
  thrust::device_vector<float> d_x(x, x + 4);

  // setup arguments
  square<float>       unary_op;
  thrust::plus<float> binary_op;
  float init = 0;

  // compute norm
  float norm = std::sqrt( thrust::transform_reduce(d_x.begin(), d_x.end(), unary_op, init, binary_op) );

  std::cout << "norm is " << norm << std::endl;

  return 0;
}

thrust-1.9.5/examples/omp/CMakeLists.txt

FILE(GLOB SOURCES_CU *.cu)
FILE(GLOB SOURCES_CPP *.cpp)

set(SOURCES_BACKEND ${SOURCES_CU})

install(FILES ${SOURCES_BACKEND}
        DESTINATION "examples/omp"
        COMPONENT examples)

if (NOT "x${DEVICE_BACKEND}" STREQUAL "xOMP")
  return()
endif()

thrust-1.9.5/examples/padded_grid_reduction.cu

#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/functional.h>
#include <thrust/transform_reduce.h>
#include <thrust/extrema.h>
#include <thrust/random.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/zip_iterator.h>
#include <iostream>
#include <iomanip>
#include <cfloat>

// This example computes the minimum and maximum values
// over a padded grid.  The padded values are not considered
// during the reduction operation.
// transform a tuple (int,value) into a tuple (bool,value,value)
// where the bool is true for valid grid values and false for
// values in the padded region of the grid
template <typename IndexType, typename ValueType>
struct transform_tuple :
  public thrust::unary_function< thrust::tuple<IndexType,ValueType>,
                                 thrust::tuple<bool,ValueType,ValueType> >
{
  typedef typename thrust::tuple<IndexType,ValueType>      InputTuple;
  typedef typename thrust::tuple<bool,ValueType,ValueType> OutputTuple;

  IndexType n, N;

  transform_tuple(IndexType n, IndexType N) : n(n), N(N) {}

  __host__ __device__
  OutputTuple operator()(const InputTuple& t) const
  {
    bool is_valid = (thrust::get<0>(t) % N) < n;
    return OutputTuple(is_valid, thrust::get<1>(t), thrust::get<1>(t));
  }
};

// reduce two tuples (bool,value,value) into a single tuple such that output
// contains the smallest and largest *valid* values.
template <typename ValueType>
struct reduce_tuple :
  public thrust::binary_function< thrust::tuple<bool,ValueType,ValueType>,
                                  thrust::tuple<bool,ValueType,ValueType>,
                                  thrust::tuple<bool,ValueType,ValueType> >
{
  typedef typename thrust::tuple<bool,ValueType,ValueType> Tuple;

  __host__ __device__
  Tuple operator()(const Tuple& t0, const Tuple& t1) const
  {
    if(thrust::get<0>(t0) && thrust::get<0>(t1)) // both valid
      return Tuple(true,
                   thrust::min(thrust::get<1>(t0), thrust::get<1>(t1)),
                   thrust::max(thrust::get<2>(t0), thrust::get<2>(t1)));
    else if (thrust::get<0>(t0))
      return t0;
    else if (thrust::get<0>(t1))
      return t1;
    else
      return t1; // if neither is valid then it doesn't matter what we return
  }
};

int main(void)
{
  int M = 10;  // number of rows
  int n = 11;  // number of columns excluding padding
  int N = 16;  // number of columns including padding

  thrust::default_random_engine rng(12345);
  thrust::uniform_real_distribution<float> dist(0.0f, 1.0f);

  thrust::device_vector<float> data(M * N, -1);

  // initialize valid values in grid
  for(int i = 0; i < M; i++)
    for(int j = 0; j < n; j++)
      data[i * N + j] = dist(rng);

  // print full grid
  std::cout << "padded grid" << std::endl;
  std::cout << std::fixed << std::setprecision(4);
  for(int i = 0; i < M; i++)
  {
    std::cout << " ";
    for(int j = 0; j < N; j++)
    {
      std::cout << data[i * N + j] << " ";
    }
    std::cout << "\n";
  }
  std::cout << "\n";

  // compute min & max over valid region of the 2d grid
  typedef thrust::tuple<bool, float, float> result_type;

  result_type init(true, FLT_MAX, -FLT_MAX);   // initial value
  transform_tuple<int, float> unary_op(n, N);  // transformation operator
  reduce_tuple<float> binary_op;               // reduction operator

  result_type result =
    thrust::transform_reduce(
      thrust::make_zip_iterator(thrust::make_tuple(thrust::counting_iterator<int>(0), data.begin())),
      thrust::make_zip_iterator(thrust::make_tuple(thrust::counting_iterator<int>(0), data.begin())) + data.size(),
      unary_op,
      init,
      binary_op);

  std::cout << "minimum value: " << thrust::get<1>(result) << std::endl;
  std::cout << "maximum value: " << thrust::get<2>(result) << std::endl;

  return 0;
}
thrust-1.9.5/examples/permutation_iterator.cu000066400000000000000000000016701344621116200215120ustar00rootroot00000000000000
#include <thrust/iterator/permutation_iterator.h>
#include <thrust/reduce.h>
#include <thrust/device_vector.h>
#include <iostream>

// this example fuses a gather operation with a reduction for
// greater efficiency than separate gather() and reduce() calls

int main(void)
{
  // gather locations
  thrust::device_vector<int> map(4);
  map[0] = 3;
  map[1] = 1;
  map[2] = 0;
  map[3] = 5;

  // array to gather from
  thrust::device_vector<int> source(6);
  source[0] = 10;
  source[1] = 20;
  source[2] = 30;
  source[3] = 40;
  source[4] = 50;
  source[5] = 60;

  // fuse gather with reduction:
  //   sum = source[map[0]] + source[map[1]] + ...
  int sum = thrust::reduce(thrust::make_permutation_iterator(source.begin(), map.begin()),
                           thrust::make_permutation_iterator(source.begin(), map.end()));

  // print sum
  std::cout << "sum is " << sum << std::endl;

  return 0;
}
thrust-1.9.5/examples/raw_reference_cast.cu000066400000000000000000000064651344621116200210620ustar00rootroot00000000000000
#include <thrust/detail/raw_reference_cast.h>
#include <thrust/device_vector.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/sequence.h>
#include <thrust/fill.h>
#include <thrust/for_each.h>
#include <thrust/copy.h>
#include <iostream>
#include <iterator>
#include <string>

// This example illustrates how to use the raw_reference_cast to convert
// system-specific reference wrappers into native references.
//
// Using iterators in the manner described here is generally discouraged.
// Users should only resort to this technique if there is no viable
// implementation of a given operation in terms of Thrust algorithms.
// For example, this particular task is better solved with thrust::copy,
// which is safer and potentially faster.  Only use this approach after all
// safer alternatives have been exhausted.
//
// When a Thrust iterator is dereferenced (e.g. *iter) the result is not
// a native or "raw" reference like int& or float&.  Instead,
// the result is a type such as thrust::system::cuda::reference
// or thrust::system::tbb::reference, depending on the system
// to which the data belongs.  These reference wrappers are necessary
// to make expressions like *iter1 = *iter2; work correctly when
// iter1 and iter2 refer to data in different memory spaces on
// heterogeneous systems.
//
// The raw_reference_cast function essentially strips away the system-specific
// meta-data, so it should only be used when the code is guaranteed to be
// executed within an appropriate context.
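// Host-only illustration, assuming plain C++11: a toy proxy reference that
// mimics the behaviour described above.  It converts to a value but is not a
// T&, so it must be explicitly unwrapped before it can bind to an int&
// parameter.  tagged_reference, raw_ref, assign_ref and copy_one are
// hypothetical names; Thrust's real reference wrappers and
// thrust::raw_reference_cast are considerably more involved.
#include <cassert>

namespace host_sketch
{
  template <typename T>
  class tagged_reference
  {
    public:
    explicit tagged_reference(T* ptr) : ptr_(ptr) { assert(ptr != 0); }

    // reading through the wrapper yields a value, so it converts to T ...
    operator T() const { return *ptr_; }

    // ... but it is not a T&; raw() exposes the underlying native reference
    T& raw() const { return *ptr_; }

    private:
    T* ptr_;
  };

  // toy analogue of raw_reference_cast for this wrapper only
  template <typename T>
  T& raw_ref(const tagged_reference<T>& r) { return r.raw(); }

  inline void assign_ref(int& x, int& y) { y = x; }

  inline void copy_one(const tagged_reference<int>& in, const tagged_reference<int>& out)
  {
    // assign_ref(in, out);  // ill-formed: a tagged_reference<int> does not bind to int&
    assign_ref(raw_ref(in), raw_ref(out));  // valid once both wrappers are unwrapped
  }
}

// Usage: with int a = 7, b = 0, calling copy_one(tagged_reference<int>(&a),
// tagged_reference<int>(&b)) writes 7 into b through the unwrapped references.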
__host__ __device__
void assign_reference_to_reference(int& x, int& y)
{
  y = x;
}

__host__ __device__
void assign_value_to_reference(int x, int& y)
{
  y = x;
}

template <typename InputIterator,
          typename OutputIterator>
struct copy_iterators
{
  InputIterator  input;
  OutputIterator output;

  copy_iterators(InputIterator input, OutputIterator output)
    : input(input), output(output)
  {}

  __host__ __device__
  void operator()(int i)
  {
    InputIterator  in  = input  + i;
    OutputIterator out = output + i;

    // invalid - reference is not convertible to int&
    // assign_reference_to_reference(*in, *out);

    // valid - reference explicitly converted to int&
    assign_reference_to_reference(thrust::raw_reference_cast(*in), thrust::raw_reference_cast(*out));

    // valid - since reference is convertible to int
    assign_value_to_reference(*in, thrust::raw_reference_cast(*out));
  }
};

template <typename Vector>
void print(const std::string& name, const Vector& v)
{
  typedef typename Vector::value_type T;

  std::cout << name << ": ";
  thrust::copy(v.begin(), v.end(), std::ostream_iterator<T>(std::cout, " "));
  std::cout << "\n";
}

int main(void)
{
  typedef thrust::device_vector<int> Vector;
  typedef Vector::iterator           Iterator;
  typedef thrust::device_system_tag  System;

  size_t N = 5;

  // allocate device memory
  Vector A(N);
  Vector B(N);

  // initialize A and B
  thrust::sequence(A.begin(), A.end());
  thrust::fill(B.begin(), B.end(), 0);

  std::cout << "Before A->B Copy" << std::endl;
  print("A", A);
  print("B", B);

  // note: we must specify the System to ensure correct execution
  thrust::for_each(thrust::counting_iterator<int,System>(0),
                   thrust::counting_iterator<int,System>(N),
                   copy_iterators<Iterator,Iterator>(A.begin(), B.begin()));

  std::cout << "After A->B Copy" << std::endl;
  print("A", A);
  print("B", B);

  return 0;
}
thrust-1.9.5/examples/remove_points2d.cu000066400000000000000000000042001344621116200203410ustar00rootroot00000000000000
#include <thrust/host_vector.h>
#include <thrust/random.h>
#include <thrust/remove.h>
#include <thrust/iterator/zip_iterator.h>
#include <iostream>

// This example generates random points in the
// unit square [0,1)x[0,1) and then removes all
// points where x^2 + y^2 > 1
//
// The x and y coordinates are stored in separate arrays
// and a zip_iterator is used to combine them together

template <typename T>
struct is_outside_circle
{
  template <typename Tuple>
  inline __host__ __device__
  bool operator()(const Tuple& tuple) const
  {
    // unpack the tuple into x and y coordinates
    const T x = thrust::get<0>(tuple);
    const T y = thrust::get<1>(tuple);

    if (x*x + y*y > 1)
      return true;
    else
      return false;
  }
};

int main(void)
{
  const size_t N = 20;

  // generate random points in the unit square on the host
  thrust::default_random_engine rng;
  thrust::uniform_real_distribution<float> u01(0.0f, 1.0f);
  thrust::host_vector<float> x(N);
  thrust::host_vector<float> y(N);
  for(size_t i = 0; i < N; i++)
  {
    x[i] = u01(rng);
    y[i] = u01(rng);
  }

  // print the initial points
  std::cout << std::fixed;
  std::cout << "Generated " << N << " points" << std::endl;
  for(size_t i = 0; i < N; i++)
    std::cout << "(" << x[i] << "," << y[i] << ")" << std::endl;
  std::cout << std::endl;

  // remove points where x^2 + y^2 > 1 and determine new array sizes
  size_t new_size = thrust::remove_if(thrust::make_zip_iterator(thrust::make_tuple(x.begin(), y.begin())),
                                      thrust::make_zip_iterator(thrust::make_tuple(x.end(), y.end())),
                                      is_outside_circle<float>())
                    - thrust::make_zip_iterator(thrust::make_tuple(x.begin(), y.begin()));

  // resize the vectors (note: this does not free any memory)
  x.resize(new_size);
  y.resize(new_size);

  // print the filtered points
  std::cout << "After stream compaction, " << new_size << " points remain" << std::endl;
  for(size_t i = 0; i < new_size; i++)
    std::cout << "(" << x[i] << "," << y[i] << ")" << std::endl;

  return 0;
}
thrust-1.9.5/examples/repeated_range.cu000066400000000000000000000056111344621116200201760ustar00rootroot00000000000000
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/permutation_iterator.h>
#include <thrust/functional.h>
#include <thrust/fill.h>
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <iostream>
#include <iterator>

// this example illustrates how to make repeated access to a range of values
// examples:
//   repeated_range([0, 1, 2, 3], 1) -> [0, 1, 2, 3]
//   repeated_range([0, 1, 2, 3], 2) -> [0, 0, 1, 1, 2, 2, 3, 3]
//   repeated_range([0, 1, 2, 3], 3) -> [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
//   ...
template <typename Iterator>
class repeated_range
{
  public:

  typedef typename thrust::iterator_difference<Iterator>::type difference_type;

  struct repeat_functor : public thrust::unary_function<difference_type,difference_type>
  {
    difference_type repeats;

    repeat_functor(difference_type repeats)
      : repeats(repeats) {}

    __host__ __device__
    difference_type operator()(const difference_type& i) const
    {
      return i / repeats;
    }
  };

  typedef typename thrust::counting_iterator<difference_type>                   CountingIterator;
  typedef typename thrust::transform_iterator<repeat_functor, CountingIterator> TransformIterator;
  typedef typename thrust::permutation_iterator<Iterator,TransformIterator>     PermutationIterator;

  // type of the repeated_range iterator
  typedef PermutationIterator iterator;

  // construct repeated_range for the range [first,last)
  repeated_range(Iterator first, Iterator last, difference_type repeats)
    : first(first), last(last), repeats(repeats) {}

  iterator begin(void) const
  {
    return PermutationIterator(first, TransformIterator(CountingIterator(0), repeat_functor(repeats)));
  }

  iterator end(void) const
  {
    return begin() + repeats * (last - first);
  }

  protected:
  Iterator first;
  Iterator last;
  difference_type repeats;
};

int main(void)
{
  thrust::device_vector<int> data(4);
  data[0] = 10;
  data[1] = 20;
  data[2] = 30;
  data[3] = 40;

  // print the initial data
  std::cout << "range        ";
  thrust::copy(data.begin(), data.end(), std::ostream_iterator<int>(std::cout, " "));
  std::cout << std::endl;

  typedef thrust::device_vector<int>::iterator Iterator;

  // create repeated_range with elements repeated twice
  repeated_range<Iterator> twice(data.begin(), data.end(), 2);
  std::cout << "repeated x2: ";
  thrust::copy(twice.begin(), twice.end(), std::ostream_iterator<int>(std::cout, " "));
  std::cout << std::endl;

  // create repeated_range with elements repeated x3
  repeated_range<Iterator> thrice(data.begin(), data.end(), 3);
  std::cout << "repeated x3: ";
  thrust::copy(thrice.begin(), thrice.end(), std::ostream_iterator<int>(std::cout, " "));
  std::cout << std::endl;

  return 0;
}
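// Host-only sketch, assuming plain C++11 and the standard library: the index
// arithmetic behind repeated_range, materialized into a std::vector.  Position
// k of the repeated range reads source element k / repeats, which is the same
// mapping the counting_iterator -> transform_iterator(repeat_functor) ->
// permutation_iterator chain computes lazily.  materialize_repeated is a
// hypothetical helper name.  Unlike the iterator version it allocates storage,
// which is exactly what the fancy-iterator approach avoids.
#include <cassert>
#include <cstddef>
#include <vector>

namespace host_sketch
{
  template <typename T>
  std::vector<T> materialize_repeated(const std::vector<T>& source, std::size_t repeats)
  {
    assert(repeats > 0);
    std::vector<T> result(source.size() * repeats);
    for (std::size_t k = 0; k < result.size(); k++)
      result[k] = source[k / repeats];  // same mapping as repeat_functor
    return result;
  }
}

// Usage: materialize_repeated(std::vector<int>{10, 20}, 3) returns {10, 10, 10, 20, 20, 20}.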
thrust-1.9.5/examples/run_length_decoding.cu000066400000000000000000000035511344621116200212330ustar00rootroot00000000000000
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <thrust/scan.h>
#include <thrust/gather.h>
#include <thrust/binary_search.h>
#include <thrust/iterator/counting_iterator.h>
#include <iostream>
#include <iterator>

// This example decodes a run-length code [1] for an array of characters.
//
// [1] http://en.wikipedia.org/wiki/Run-length_encoding

int main(void)
{
  // allocate storage for compressed input and run lengths
  thrust::device_vector<char> input(6);
  thrust::device_vector<int>  lengths(6);
  input[0] = 'a';  lengths[0] = 3;
  input[1] = 'b';  lengths[1] = 5;
  input[2] = 'c';  lengths[2] = 1;
  input[3] = 'd';  lengths[3] = 2;
  input[4] = 'e';  lengths[4] = 9;
  input[5] = 'f';  lengths[5] = 2;

  // print the initial data
  std::cout << "run-length encoded input:" << std::endl;
  for(size_t i = 0; i < 6; i++)
    std::cout << "(" << input[i] << "," << lengths[i] << ")";
  std::cout << std::endl << std::endl;

  // scan the lengths
  thrust::inclusive_scan(lengths.begin(), lengths.end(), lengths.begin());

  // output size is sum of the run lengths
  int N = lengths.back();

  // compute input index for each output element
  thrust::device_vector<int> indices(N);
  thrust::lower_bound(lengths.begin(), lengths.end(),
                      thrust::counting_iterator<int>(1),
                      thrust::counting_iterator<int>(N + 1),
                      indices.begin());

  // gather input elements
  thrust::device_vector<char> output(N);
  thrust::gather(indices.begin(), indices.end(),
                 input.begin(),
                 output.begin());

  // print the decoded output
  std::cout << "decoded output:" << std::endl;
  thrust::copy(output.begin(), output.end(), std::ostream_iterator<char>(std::cout, ""));
  std::cout << std::endl;

  return 0;
}
thrust-1.9.5/examples/run_length_encoding.cu000066400000000000000000000032731344621116200212460ustar00rootroot00000000000000
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <thrust/reduce.h>
#include <thrust/iterator/constant_iterator.h>
#include <iostream>
#include <iterator>

// This example computes a run-length code [1] for an array of characters.
//
// [1] http://en.wikipedia.org/wiki/Run-length_encoding

int main(void)
{
  // input data on the host
  const char data[] = "aaabbbbbcddeeeeeeeeeff";

  const size_t N = (sizeof(data) / sizeof(char)) - 1;

  // copy input data to the device
  thrust::device_vector<char> input(data, data + N);

  // allocate storage for output data and run lengths
  thrust::device_vector<char> output(N);
  thrust::device_vector<int>  lengths(N);

  // print the initial data
  std::cout << "input data:" << std::endl;
  thrust::copy(input.begin(), input.end(), std::ostream_iterator<char>(std::cout, ""));
  std::cout << std::endl << std::endl;

  // compute run lengths
  size_t num_runs = thrust::reduce_by_key
                      (input.begin(), input.end(),          // input key sequence
                       thrust::constant_iterator<int>(1),   // input value sequence
                       output.begin(),                      // output key sequence
                       lengths.begin()                      // output value sequence
                       ).first - output.begin();            // compute the output size

  // print the output
  std::cout << "run-length encoded output:" << std::endl;
  for(size_t i = 0; i < num_runs; i++)
    std::cout << "(" << output[i] << "," << lengths[i] << ")";
  std::cout << std::endl;

  return 0;
}
thrust-1.9.5/examples/saxpy.cu000066400000000000000000000040261344621116200163740ustar00rootroot00000000000000
#include <thrust/transform.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/functional.h>
#include <thrust/fill.h>
#include <iostream>

// This example illustrates how to implement the SAXPY
// operation (Y[i] = a * X[i] + Y[i]) using Thrust.
// The saxpy_slow function demonstrates the most
// straightforward implementation using a temporary
// array and two separate transformations, one with
// multiplies and one with plus.  The saxpy_fast function
// implements the operation with a single transformation
// and represents "best practice".
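// Host-only reference sketch, assuming C++11 and the standard library rather
// than Thrust: the single-pass "fast" SAXPY formulation written with
// std::transform and a lambda.  A helper like this is one way to check device
// results on the host.  saxpy_reference is a hypothetical helper name.
#include <algorithm>
#include <cassert>
#include <vector>

namespace host_sketch
{
  // Y <- a * X + Y in one pass, fusing the multiply and the add
  inline void saxpy_reference(float a, const std::vector<float>& X, std::vector<float>& Y)
  {
    assert(X.size() == Y.size());
    std::transform(X.begin(), X.end(), Y.begin(), Y.begin(),
                   [a](float x, float y) { return a * x + y; });
  }
}

// Usage: with X = {1, 1, 1, 1}, Y = {1, 2, 3, 4} and a = 2 (the inputs main uses),
// Y becomes {3, 4, 5, 6}.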
struct saxpy_functor : public thrust::binary_function<float,float,float>
{
  const float a;

  saxpy_functor(float _a) : a(_a) {}

  __host__ __device__
  float operator()(const float& x, const float& y) const
  {
    return a * x + y;
  }
};

void saxpy_fast(float A, thrust::device_vector<float>& X, thrust::device_vector<float>& Y)
{
  // Y <- A * X + Y
  thrust::transform(X.begin(), X.end(), Y.begin(), Y.begin(), saxpy_functor(A));
}

void saxpy_slow(float A, thrust::device_vector<float>& X, thrust::device_vector<float>& Y)
{
  thrust::device_vector<float> temp(X.size());

  // temp <- A
  thrust::fill(temp.begin(), temp.end(), A);

  // temp <- A * X
  thrust::transform(X.begin(), X.end(), temp.begin(), temp.begin(), thrust::multiplies<float>());

  // Y <- A * X + Y
  thrust::transform(temp.begin(), temp.end(), Y.begin(), Y.begin(), thrust::plus<float>());
}

int main(void)
{
  // initialize host arrays
  float x[4] = {1.0, 1.0, 1.0, 1.0};
  float y[4] = {1.0, 2.0, 3.0, 4.0};

  {
    // transfer to device
    thrust::device_vector<float> X(x, x + 4);
    thrust::device_vector<float> Y(y, y + 4);

    // slow method
    saxpy_slow(2.0, X, Y);
  }

  {
    // transfer to device
    thrust::device_vector<float> X(x, x + 4);
    thrust::device_vector<float> Y(y, y + 4);

    // fast method
    saxpy_fast(2.0, X, Y);
  }

  return 0;
}
thrust-1.9.5/examples/scan_by_key.cu000066400000000000000000000056631344621116200175260ustar00rootroot00000000000000
#include <thrust/scan.h>
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <iostream>

// BinaryPredicate for the head flag segment representation
// equivalent to thrust::not2(thrust::project2nd<int,int>())
template <typename HeadFlagType>
struct head_flag_predicate
  : public thrust::binary_function<HeadFlagType,HeadFlagType,bool>
{
  __host__ __device__
  bool operator()(HeadFlagType left, HeadFlagType right) const
  {
    return !right;
  }
};

template <typename Vector>
void print(const Vector& v)
{
  for(size_t i = 0; i < v.size(); i++)
    std::cout << v[i] << " ";
  std::cout << "\n";
}

int main(void)
{
  int keys[]   = {0,0,0,1,1,2,2,2,2,3,4,4,5,5,5};  // segments represented with keys
  int flags[]  = {1,0,0,1,0,1,0,0,0,1,1,0,1,0,0};  // segments represented with head flags
  int values[] = {2,2,2,2,2,2,2,2,2,2,2,2,2,2,2};  // values corresponding to each key

  int N = sizeof(keys) / sizeof(int); // number of elements

  // copy input data to device
  thrust::device_vector<int> d_keys  (keys,   keys   + N);
  thrust::device_vector<int> d_flags (flags,  flags  + N);
  thrust::device_vector<int> d_values(values, values + N);

  // allocate storage for output
  thrust::device_vector<int> d_output(N);

  // inclusive scan using keys
  thrust::inclusive_scan_by_key
    (d_keys.begin(), d_keys.end(),
     d_values.begin(),
     d_output.begin());

  std::cout << "Inclusive Segmented Scan w/ Key Sequence\n";
  std::cout << " keys          : ";  print(d_keys);
  std::cout << " input values  : ";  print(d_values);
  std::cout << " output values : ";  print(d_output);

  // inclusive scan using head flags
  thrust::inclusive_scan_by_key
    (d_flags.begin(), d_flags.end(),
     d_values.begin(),
     d_output.begin(),
     head_flag_predicate<int>());

  std::cout << "\nInclusive Segmented Scan w/ Head Flag Sequence\n";
  std::cout << " head flags    : ";  print(d_flags);
  std::cout << " input values  : ";  print(d_values);
  std::cout << " output values : ";  print(d_output);

  // exclusive scan using keys
  thrust::exclusive_scan_by_key
    (d_keys.begin(), d_keys.end(),
     d_values.begin(),
     d_output.begin());

  std::cout << "\nExclusive Segmented Scan w/ Key Sequence\n";
  std::cout << " keys          : ";  print(d_keys);
  std::cout << " input values  : ";  print(d_values);
  std::cout << " output values : ";  print(d_output);

  // exclusive scan using head flags
  thrust::exclusive_scan_by_key
    (d_flags.begin(), d_flags.end(),
     d_values.begin(),
     d_output.begin(),
     0,
     head_flag_predicate<int>());

  std::cout << "\nExclusive Segmented Scan w/ Head Flag Sequence\n";
  std::cout << " head flags    : ";  print(d_flags);
  std::cout << " input values  : ";  print(d_values);
  std::cout << " output values : ";  print(d_output);

  return 0;
}
thrust-1.9.5/examples/set_operations.cu000066400000000000000000000107461344621116200202740ustar00rootroot00000000000000
#include <thrust/set_operations.h>
#include <thrust/merge.h>
#include <thrust/extrema.h>
#include <thrust/device_vector.h>
#include <thrust/iterator/discard_iterator.h>
#include <iostream>

// This example illustrates use of the set operation algorithms
//  - merge
//  - set_union
//  - set_intersection
//  - set_difference
//  - set_symmetric_difference
//
// In this context a "set" is simply a sequence of sorted values,
// allowing the standard set operations to be performed more efficiently
// than on unsorted data.  Since the output of a set operation is a valid
// set (i.e. a sorted sequence) it is possible to apply the set operations
// in a nested fashion to compute arbitrary set expressions.
//
// Set operation usage notes:
//  - The output set size is variable (except for thrust::merge),
//    so the return value is important.
//  - Generally one would conservatively allocate storage for the output
//    and then resize or shrink an output container as necessary.
//    Alternatively, one can compute the exact output size by
//    outputting to a discard_iterator.  This approach is more computationally
//    expensive (approximately 2x), but conserves memory capacity.
//    Refer to the SetIntersectionSize function for implementation details.
//  - Sets are allowed to have duplicate elements, which are carried
//    through to the output in an algorithm-specific manner.  Refer
//    to the full documentation for precise semantics.
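// Host-only sketch, assuming C++11 and the standard library rather than
// Thrust: the two output-sizing strategies from the usage notes above.  The
// standard library has no discard_iterator, so counting_output_iterator is a
// hypothetical minimal stand-in that drops every value written through it and
// only counts the writes.  intersection and intersection_size are also
// hypothetical helper names.
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

namespace host_sketch
{
  struct counting_output_iterator
  {
    typedef std::output_iterator_tag iterator_category;
    typedef void value_type;
    typedef void difference_type;
    typedef void pointer;
    typedef void reference;

    std::size_t* count;

    explicit counting_output_iterator(std::size_t* c) : count(c) {}

    // each assignment through the iterator is counted; the value is discarded
    template <typename T>
    counting_output_iterator& operator=(const T&) { ++*count; return *this; }
    counting_output_iterator& operator*() { return *this; }
    counting_output_iterator& operator++() { return *this; }
    counting_output_iterator operator++(int) { return *this; }
  };

  // strategy 1: allocate min(|A|, |B|) conservatively, then shrink to fit
  inline std::vector<int> intersection(const std::vector<int>& A, const std::vector<int>& B)
  {
    std::vector<int> C(std::min(A.size(), B.size()));
    std::vector<int>::iterator C_end =
      std::set_intersection(A.begin(), A.end(), B.begin(), B.end(), C.begin());
    C.erase(C_end, C.end());
    return C;
  }

  // strategy 2: run the algorithm once only to count the exact output size
  inline std::size_t intersection_size(const std::vector<int>& A, const std::vector<int>& B)
  {
    std::size_t n = 0;
    std::set_intersection(A.begin(), A.end(), B.begin(), B.end(),
                          counting_output_iterator(&n));
    return n;
  }
}

// Usage: with the sets from main, A = {0,2,4,5,6,8,9} and B = {0,1,2,3,5,7,8},
// intersection(A, B) returns {0, 2, 5, 8} and intersection_size(A, B) returns 4.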
// helper routine
template <typename String, typename Vector>
void print(const String& s, const Vector& v)
{
  std::cout << s << " [";
  for(size_t i = 0; i < v.size(); i++)
    std::cout << " " << v[i];
  std::cout << " ]\n";
}

template <typename Vector>
void Merge(const Vector& A, const Vector& B)
{
  // merged output is always exactly A.size() + B.size()
  Vector C(A.size() + B.size());

  thrust::merge(A.begin(), A.end(), B.begin(), B.end(), C.begin());

  print("Merge(A,B)", C);
}

template <typename Vector>
void SetUnion(const Vector& A, const Vector& B)
{
  // union output is at most A.size() + B.size()
  Vector C(A.size() + B.size());

  // set_union returns an iterator C_end denoting the end of the output
  typename Vector::iterator C_end;

  C_end = thrust::set_union(A.begin(), A.end(), B.begin(), B.end(), C.begin());

  // shrink C to exactly fit output
  C.erase(C_end, C.end());

  print("Union(A,B)", C);
}

template <typename Vector>
void SetIntersection(const Vector& A, const Vector& B)
{
  // intersection output is at most min(A.size(), B.size())
  Vector C(thrust::min(A.size(), B.size()));

  // set_intersection returns an iterator C_end denoting the end of the output
  typename Vector::iterator C_end;

  C_end = thrust::set_intersection(A.begin(), A.end(), B.begin(), B.end(), C.begin());

  // shrink C to exactly fit output
  C.erase(C_end, C.end());

  print("Intersection(A,B)", C);
}

template <typename Vector>
void SetDifference(const Vector& A, const Vector& B)
{
  // difference output is at most A.size()
  Vector C(A.size());

  // set_difference returns an iterator C_end denoting the end of the output
  typename Vector::iterator C_end;

  C_end = thrust::set_difference(A.begin(), A.end(), B.begin(), B.end(), C.begin());

  // shrink C to exactly fit output
  C.erase(C_end, C.end());

  print("Difference(A,B)", C);
}

template <typename Vector>
void SetSymmetricDifference(const Vector& A, const Vector& B)
{
  // symmetric difference output is at most A.size() + B.size()
  Vector C(A.size() + B.size());

  // set_symmetric_difference returns an iterator C_end denoting the end of the output
  typename Vector::iterator C_end;

  C_end = thrust::set_symmetric_difference(A.begin(), A.end(), B.begin(), B.end(), C.begin());

  // shrink C to exactly fit output
  C.erase(C_end, C.end());

  print("SymmetricDifference(A,B)", C);
}

template <typename Vector>
void SetIntersectionSize(const Vector& A, const Vector& B)
{
  // computes the exact size of the intersection without allocating output
  thrust::discard_iterator<> C_begin, C_end;

  C_end = thrust::set_intersection(A.begin(), A.end(), B.begin(), B.end(), C_begin);

  std::cout << "SetIntersectionSize(A,B) " << (C_end - C_begin) << std::endl;
}

int main(void)
{
  int a[] = {0,2,4,5,6,8,9};
  int b[] = {0,1,2,3,5,7,8};

  thrust::device_vector<int> A(a, a + sizeof(a) / sizeof(int));
  thrust::device_vector<int> B(b, b + sizeof(b) / sizeof(int));

  print("Set A", A);
  print("Set B", B);

  Merge(A,B);
  SetUnion(A,B);
  SetIntersection(A,B);
  SetDifference(A,B);
  SetSymmetricDifference(A,B);

  SetIntersectionSize(A,B);

  return 0;
}
thrust-1.9.5/examples/simple_moving_average.cu000066400000000000000000000053541344621116200215770ustar00rootroot00000000000000
#include <thrust/device_vector.h>
#include <thrust/scan.h>
#include <thrust/transform.h>
#include <thrust/functional.h>
#include <thrust/random.h>
#include <iostream>
#include <iomanip>

// Efficiently computes the simple moving average (SMA) [1] of a data series
// using a parallel prefix-sum or "scan" operation.
//
// Note: additional numerical precision should be used in the cumulative summing
// stage when computing the SMA of large data series.  The most straightforward
// remedy is to replace 'float' with 'double'.  Alternatively a Kahan or
// "compensated" summation algorithm could be applied [2].
//
// [1] http://en.wikipedia.org/wiki/Moving_average#Simple_moving_average
// [2] http://en.wikipedia.org/wiki/Kahan_summation_algorithm

// compute the difference of two positions in the cumulative sum and
// divide by the SMA window size w.
template <typename T>
struct minus_and_divide : public thrust::binary_function<T,T,T>
{
  T w;

  minus_and_divide(T w) : w(w) {}

  __host__ __device__
  T operator()(const T& a, const T& b) const
  {
    return (a - b) / w;
  }
};

template <typename InputVector, typename OutputVector>
void simple_moving_average(const InputVector& data, size_t w, OutputVector& output)
{
  typedef typename InputVector::value_type T;

  if (data.size() < w)
    return;

  // allocate storage for cumulative sum
  thrust::device_vector<T> temp(data.size() + 1);

  // compute cumulative sum
  thrust::exclusive_scan(data.begin(), data.end(), temp.begin());
  temp[data.size()] = data.back() + temp[data.size() - 1];

  // compute moving averages from cumulative sum
  thrust::transform(temp.begin() + w, temp.end(), temp.begin(), output.begin(), minus_and_divide<T>(T(w)));
}

int main(void)
{
  // length of data series
  size_t n = 30;

  // window size of the moving average
  size_t w = 4;

  // generate random data series
  thrust::device_vector<float> data(n);
  thrust::default_random_engine rng;
  thrust::uniform_int_distribution<int> dist(0, 10);
  for (size_t i = 0; i < n; i++)
    data[i] = static_cast<float>(dist(rng));

  // allocate storage for averages
  thrust::device_vector<float> averages(data.size() - (w - 1));

  // compute SMA using standard summation
  simple_moving_average(data, w, averages);

  // print data series
  std::cout << "data series: [ ";
  for (size_t i = 0; i < data.size(); i++)
    std::cout << data[i] << " ";
  std::cout << "]" << std::endl;

  // print moving averages
  std::cout << "simple moving averages (window = " << w << ")" << std::endl;
  for (size_t i = 0; i < averages.size(); i++)
    std::cout << "  [" << std::setw(2) << i << "," << std::setw(2) << (i + w) << ") = " << averages[i] << std::endl;

  return 0;
}
thrust-1.9.5/examples/sort.cu000066400000000000000000000075541344621116200162300ustar00rootroot00000000000000
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/functional.h>
#include <thrust/random.h>
#include <thrust/pair.h>
#include <iostream>
#include <iomanip>

// Helper routines

void initialize(thrust::device_vector<int>& v)
{
  thrust::default_random_engine rng(123456);
  thrust::uniform_int_distribution<int> dist(10, 99);
  for(size_t i = 0; i < v.size(); i++)
    v[i] = dist(rng);
}

void initialize(thrust::device_vector<float>& v)
{
  thrust::default_random_engine rng(123456);
  thrust::uniform_int_distribution<int> dist(2, 19);
  for(size_t i = 0; i < v.size(); i++)
    v[i] = dist(rng) / 2.0f;
}

void initialize(thrust::device_vector< thrust::pair<int,int> >& v)
{
  thrust::default_random_engine rng(123456);
  thrust::uniform_int_distribution<int> dist(0,9);
  for(size_t i = 0; i < v.size(); i++)
  {
    int a = dist(rng);
    int b = dist(rng);
    v[i] = thrust::make_pair(a,b);
  }
}

void initialize(thrust::device_vector<int>& v1, thrust::device_vector<int>& v2)
{
  thrust::default_random_engine rng(123456);
  thrust::uniform_int_distribution<int> dist(10, 99);
  for(size_t i = 0; i < v1.size(); i++)
  {
    v1[i] = dist(rng);
    v2[i] = i;
  }
}

void print(const thrust::device_vector<int>& v)
{
  for(size_t i = 0; i < v.size(); i++)
    std::cout << " " << v[i];
  std::cout << "\n";
}

void print(const thrust::device_vector<float>& v)
{
  for(size_t i = 0; i < v.size(); i++)
    std::cout << " " << std::fixed << std::setprecision(1) << v[i];
  std::cout << "\n";
}

void print(const thrust::device_vector< thrust::pair<int,int> >& v)
{
  for(size_t i = 0; i < v.size(); i++)
  {
    thrust::pair<int,int> p = v[i];
    std::cout << " (" << p.first << "," << p.second << ")";
  }
  std::cout << "\n";
}

// both vectors are taken by const reference to avoid copying device data
void print(const thrust::device_vector<int>& v1, const thrust::device_vector<int>& v2)
{
  for(size_t i = 0; i < v1.size(); i++)
    std::cout << " (" << v1[i] << "," << std::setw(2) << v2[i] << ")";
  std::cout << "\n";
}

// user-defined comparison operator that acts like less<int>,
// except even numbers are considered to be smaller than odd numbers
struct evens_before_odds
{
  __host__ __device__
  bool operator()(int x, int y) const
  {
    if (x % 2 == y % 2)
      return x < y;
    else if (x % 2)
      return false;
    else
      return true;
  }
};

int main(void)
{
  size_t N = 16;

  std::cout << "sorting integers\n";
  {
    thrust::device_vector<int> keys(N);
    initialize(keys);
    print(keys);
    thrust::sort(keys.begin(), keys.end());
    print(keys);
  }

  std::cout << "\nsorting integers (descending)\n";
  {
    thrust::device_vector<int> keys(N);
    initialize(keys);
    print(keys);
    thrust::sort(keys.begin(), keys.end(), thrust::greater<int>());
    print(keys);
  }

  std::cout << "\nsorting integers (user-defined comparison)\n";
  {
    thrust::device_vector<int> keys(N);
    initialize(keys);
    print(keys);
    thrust::sort(keys.begin(), keys.end(), evens_before_odds());
    print(keys);
  }

  std::cout << "\nsorting floats\n";
  {
    thrust::device_vector<float> keys(N);
    initialize(keys);
    print(keys);
    thrust::sort(keys.begin(), keys.end());
    print(keys);
  }

  std::cout << "\nsorting pairs\n";
  {
    thrust::device_vector< thrust::pair<int,int> > keys(N);
    initialize(keys);
    print(keys);
    thrust::sort(keys.begin(), keys.end());
    print(keys);
  }

  std::cout << "\nkey-value sorting\n";
  {
    thrust::device_vector<int> keys(N);
    thrust::device_vector<int> values(N);
    initialize(keys, values);
    print(keys, values);
    thrust::sort_by_key(keys.begin(), keys.end(), values.begin());
    print(keys, values);
  }

  std::cout << "\nkey-value sorting (descending)\n";
  {
    thrust::device_vector<int> keys(N);
    thrust::device_vector<int> values(N);
    initialize(keys, values);
    print(keys, values);
    thrust::sort_by_key(keys.begin(), keys.end(), values.begin(), thrust::greater<int>());
    print(keys, values);
  }

  return 0;
}
thrust-1.9.5/examples/sorting_aos_vs_soa.cu000066400000000000000000000045241344621116200211340ustar00rootroot00000000000000
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/random.h>
#include <thrust/sort.h>
#include <cassert>
#include <iostream>
#include "include/timer.h"

// This example compares sorting performance using Array of Structures (AoS)
// and Structure of Arrays (SoA) data layout.  Legacy applications will often
// store data in C/C++ structs, such as MyStruct defined below.  Although
// Thrust can process arrays of structs, it is typically less efficient than
// the equivalent structure of arrays layout.  In this particular example,
// the optimized SoA approach is approximately *five times faster* than the
// traditional AoS method.  Therefore, it is almost always worthwhile to
// convert AoS data structures to SoA.
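// Host-only sketch, assuming C++11 and the standard library rather than
// Thrust: converting an array of structures into a structure of arrays, then
// sorting the SoA form by key while permuting the values alongside, which is
// the role sort_by_key plays on the device.  Record, KeyValueArrays, to_soa and
// sort_soa_by_key are hypothetical names.  The host version sorts an index
// permutation with std::stable_sort for deterministic results; it is not how
// Thrust implements sort_by_key.
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

namespace host_sketch
{
  struct Record { int key; float value; };  // AoS element

  struct KeyValueArrays                      // SoA layout
  {
    std::vector<int>   keys;
    std::vector<float> values;
  };

  inline KeyValueArrays to_soa(const std::vector<Record>& records)
  {
    KeyValueArrays soa;
    soa.keys.reserve(records.size());
    soa.values.reserve(records.size());
    for (std::size_t i = 0; i < records.size(); i++)
    {
      soa.keys.push_back(records[i].key);
      soa.values.push_back(records[i].value);
    }
    return soa;
  }

  inline void sort_soa_by_key(KeyValueArrays& soa)
  {
    assert(soa.keys.size() == soa.values.size());

    // permutation of positions, ordered by key
    std::vector<std::size_t> perm(soa.keys.size());
    for (std::size_t i = 0; i < perm.size(); i++)
      perm[i] = i;
    const std::vector<int>& keys = soa.keys;
    std::stable_sort(perm.begin(), perm.end(),
                     [&keys](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });

    // gather keys and values through the permutation
    KeyValueArrays sorted;
    for (std::size_t i = 0; i < perm.size(); i++)
    {
      sorted.keys.push_back(soa.keys[perm[i]]);
      sorted.values.push_back(soa.values[perm[i]]);
    }
    soa = sorted;
  }
}

// Usage: to_soa({{3, 0.5f}, {1, 1.5f}, {2, 2.5f}}) followed by sort_soa_by_key
// leaves keys = {1, 2, 3} and values = {1.5f, 2.5f, 0.5f}.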
struct MyStruct
{
  int key;
  float value;

  __host__ __device__
  bool operator<(const MyStruct other) const
  {
    return key < other.key;
  }
};

void initialize_keys(thrust::device_vector<int>& keys)
{
  thrust::default_random_engine rng;
  thrust::uniform_int_distribution<int> dist(0, 2147483647);

  thrust::host_vector<int> h_keys(keys.size());

  for(size_t i = 0; i < h_keys.size(); i++)
    h_keys[i] = dist(rng);

  keys = h_keys;
}

void initialize_keys(thrust::device_vector<MyStruct>& structures)
{
  thrust::default_random_engine rng;
  thrust::uniform_int_distribution<int> dist(0, 2147483647);

  thrust::host_vector<MyStruct> h_structures(structures.size());

  for(size_t i = 0; i < h_structures.size(); i++)
    h_structures[i].key = dist(rng);

  structures = h_structures;
}

int main(void)
{
  size_t N = 2 * 1024 * 1024;

  // Sort Key-Value pairs using Array of Structures (AoS) storage
  {
    thrust::device_vector<MyStruct> structures(N);

    initialize_keys(structures);

    timer t;

    thrust::sort(structures.begin(), structures.end());

    assert(thrust::is_sorted(structures.begin(), structures.end()));

    std::cout << "AoS sort took " << 1e3 * t.elapsed() << " milliseconds" << std::endl;
  }

  // Sort Key-Value pairs using Structure of Arrays (SoA) storage
  {
    thrust::device_vector<int>   keys(N);
    thrust::device_vector<float> values(N);

    initialize_keys(keys);

    timer t;

    thrust::sort_by_key(keys.begin(), keys.end(), values.begin());

    assert(thrust::is_sorted(keys.begin(), keys.end()));

    std::cout << "SoA sort took " << 1e3 * t.elapsed() << " milliseconds" << std::endl;
  }

  return 0;
}
thrust-1.9.5/examples/sparse_vector.cu000066400000000000000000000074471344621116200201210ustar00rootroot00000000000000
#include <thrust/device_vector.h>
#include <thrust/merge.h>
#include <thrust/inner_product.h>
#include <thrust/reduce.h>
#include <thrust/functional.h>
#include <cassert>
#include <iostream>

template <typename IndexVector,
          typename ValueVector>
void print_sparse_vector(const IndexVector& A_index,
                         const ValueVector& A_value)
{
  // sanity test
  assert(A_index.size() == A_value.size());

  for(size_t i = 0; i < A_index.size(); i++)
    std::cout << "(" << A_index[i] << "," << A_value[i] << ") ";
  std::cout << std::endl;
}

template <typename IndexVector1,
          typename ValueVector1,
          typename IndexVector2,
          typename ValueVector2,
          typename IndexVector3,
          typename ValueVector3>
void sum_sparse_vectors(const IndexVector1& A_index,
const ValueVector1& A_value, const IndexVector2& B_index, const ValueVector2& B_value, IndexVector3& C_index, ValueVector3& C_value) { typedef typename IndexVector3::value_type IndexType; typedef typename ValueVector3::value_type ValueType; // sanity test assert(A_index.size() == A_value.size()); assert(B_index.size() == B_value.size()); size_t A_size = A_index.size(); size_t B_size = B_index.size(); // allocate storage for the combined contents of sparse vectors A and B IndexVector3 temp_index(A_size + B_size); ValueVector3 temp_value(A_size + B_size); // merge A and B by index thrust::merge_by_key(A_index.begin(), A_index.end(), B_index.begin(), B_index.end(), A_value.begin(), B_value.begin(), temp_index.begin(), temp_value.begin()); // compute number of unique indices size_t C_size = thrust::inner_product(temp_index.begin(), temp_index.end() - 1, temp_index.begin() + 1, size_t(0), thrust::plus(), thrust::not_equal_to()) + 1; // allocate space for output C_index.resize(C_size); C_value.resize(C_size); // sum values with the same index thrust::reduce_by_key(temp_index.begin(), temp_index.end(), temp_value.begin(), C_index.begin(), C_value.begin(), thrust::equal_to(), thrust::plus()); } int main(void) { // initialize sparse vector A with 4 elements thrust::device_vector A_index(4); thrust::device_vector A_value(4); A_index[0] = 2; A_value[0] = 10; A_index[1] = 3; A_value[1] = 60; A_index[2] = 5; A_value[2] = 20; A_index[3] = 8; A_value[3] = 40; // initialize sparse vector B with 6 elements thrust::device_vector B_index(6); thrust::device_vector B_value(6); B_index[0] = 1; B_value[0] = 50; B_index[1] = 2; B_value[1] = 30; B_index[2] = 4; B_value[2] = 80; B_index[3] = 5; B_value[3] = 30; B_index[4] = 7; B_value[4] = 90; B_index[5] = 8; B_value[5] = 10; // compute sparse vector C = A + B thrust::device_vector C_index; thrust::device_vector C_value; sum_sparse_vectors(A_index, A_value, B_index, B_value, C_index, C_value); std::cout << "Computing C = A + B for sparse 
vectors A and B" << std::endl; std::cout << "A "; print_sparse_vector(A_index, A_value); std::cout << "B "; print_sparse_vector(B_index, B_value); std::cout << "C "; print_sparse_vector(C_index, C_value); } thrust-1.9.5/examples/stream_compaction.cu000066400000000000000000000043331344621116200207400ustar00rootroot00000000000000#include #include #include #include #include #include #include #include // this functor returns true if the argument is odd, and false otherwise template struct is_odd : public thrust::unary_function { __host__ __device__ bool operator()(T x) { return x % 2; } }; template void print_range(const std::string& name, Iterator first, Iterator last) { typedef typename std::iterator_traits::value_type T; std::cout << name << ": "; thrust::copy(first, last, std::ostream_iterator(std::cout, " ")); std::cout << "\n"; } int main(void) { // input size size_t N = 10; // define some types typedef thrust::device_vector Vector; typedef Vector::iterator Iterator; // allocate storage for array Vector values(N); // initialize array to [0, 1, 2, ... 
] thrust::sequence(values.begin(), values.end()); print_range("values", values.begin(), values.end()); // allocate output storage, here we conservatively assume all values will be copied Vector output(values.size()); // copy odd numbers to separate array Iterator output_end = thrust::copy_if(values.begin(), values.end(), output.begin(), is_odd()); print_range("output", output.begin(), output_end); // another approach is to count the number of values that will // be copied, and allocate an array of the right size size_t N_odd = thrust::count_if(values.begin(), values.end(), is_odd()); Vector small_output(N_odd); thrust::copy_if(values.begin(), values.end(), small_output.begin(), is_odd()); print_range("small_output", small_output.begin(), small_output.end()); // we can also compact sequences with the remove functions, which do the opposite of copy Iterator values_end = thrust::remove_if(values.begin(), values.end(), is_odd()); // since the values after values_end are garbage, we'll resize the vector values.resize(values_end - values.begin()); print_range("values", values.begin(), values.end()); return 0; } thrust-1.9.5/examples/strided_range.cu000066400000000000000000000061231344621116200200420ustar00rootroot00000000000000#include #include #include #include #include #include #include #include // this example illustrates how to make strided access to a range of values // examples: // strided_range([0, 1, 2, 3, 4, 5, 6], 1) -> [0, 1, 2, 3, 4, 5, 6] // strided_range([0, 1, 2, 3, 4, 5, 6], 2) -> [0, 2, 4, 6] // strided_range([0, 1, 2, 3, 4, 5, 6], 3) -> [0, 3, 6] // ... 
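strided_range builds its view from two pieces of index arithmetic: element i of the view reads position stride * i of the underlying range, and the view holds ceil((last - first) / stride) elements, which end() computes as ((last - first) + (stride - 1)) / stride. The host-only C++ sketch below checks that arithmetic against the examples in the comment above. The helper name strided_copy is hypothetical, and unlike the Thrust version it copies the selected elements instead of composing counting_iterator, transform_iterator and permutation_iterator.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-only helper (not a Thrust API): materializes the
// elements a strided view would visit. Element i of the view is
// data[stride * i]; the view holds ceil(size / stride) elements.
template <typename T>
std::vector<T> strided_copy(const std::vector<T>& data, std::size_t stride)
{
    std::vector<T> out;
    if (stride == 0) return out;  // a zero stride has no meaningful view
    const std::size_t count = (data.size() + stride - 1) / stride;  // ceiling division
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
        out.push_back(data[stride * i]);  // same mapping as stride_functor
    return out;
}
```

With stride 2 on [0, 1, 2, 3, 4, 5, 6] this selects [0, 2, 4, 6], matching the second example in the comment.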
template class strided_range { public: typedef typename thrust::iterator_difference::type difference_type; struct stride_functor : public thrust::unary_function { difference_type stride; stride_functor(difference_type stride) : stride(stride) {} __host__ __device__ difference_type operator()(const difference_type& i) const { return stride * i; } }; typedef typename thrust::counting_iterator CountingIterator; typedef typename thrust::transform_iterator TransformIterator; typedef typename thrust::permutation_iterator PermutationIterator; // type of the strided_range iterator typedef PermutationIterator iterator; // construct strided_range for the range [first,last) strided_range(Iterator first, Iterator last, difference_type stride) : first(first), last(last), stride(stride) {} iterator begin(void) const { return PermutationIterator(first, TransformIterator(CountingIterator(0), stride_functor(stride))); } iterator end(void) const { return begin() + ((last - first) + (stride - 1)) / stride; } protected: Iterator first; Iterator last; difference_type stride; }; int main(void) { thrust::device_vector data(8); data[0] = 10; data[1] = 20; data[2] = 30; data[3] = 40; data[4] = 50; data[5] = 60; data[6] = 70; data[7] = 80; // print the initial data std::cout << "data: "; thrust::copy(data.begin(), data.end(), std::ostream_iterator(std::cout, " ")); std::cout << std::endl; typedef thrust::device_vector::iterator Iterator; // create strided_range with indices [0,2,4,6] strided_range evens(data.begin(), data.end(), 2); std::cout << "sum of even indices: " << thrust::reduce(evens.begin(), evens.end()) << std::endl; // create strided_range with indices [1,3,5,7] strided_range odds(data.begin() + 1, data.end(), 2); std::cout << "sum of odd indices: " << thrust::reduce(odds.begin(), odds.end()) << std::endl; // set odd elements to 0 with fill() std::cout << "setting odd indices to zero: "; thrust::fill(odds.begin(), odds.end(), 0); thrust::copy(data.begin(), data.end(), 
std::ostream_iterator(std::cout, " ")); std::cout << std::endl; return 0; } thrust-1.9.5/examples/sum.cu000066400000000000000000000015631344621116200160370ustar00rootroot00000000000000#include #include #include #include #include #include int my_rand(void) { static thrust::default_random_engine rng; static thrust::uniform_int_distribution dist(0, 9999); return dist(rng); } int main(void) { // generate random data on the host thrust::host_vector h_vec(100); thrust::generate(h_vec.begin(), h_vec.end(), my_rand); // transfer to device and compute sum thrust::device_vector d_vec = h_vec; // initial value of the reduction int init = 0; // binary operation used to reduce values thrust::plus binary_op; // compute sum on the device int sum = thrust::reduce(d_vec.begin(), d_vec.end(), init, binary_op); // print the sum std::cout << "sum is " << sum << std::endl; return 0; } thrust-1.9.5/examples/sum_rows.cu000066400000000000000000000031171344621116200171060ustar00rootroot00000000000000#include #include #include #include #include #include #include // convert a linear index to a row index template struct linear_index_to_row_index : public thrust::unary_function { T C; // number of columns __host__ __device__ linear_index_to_row_index(T C) : C(C) {} __host__ __device__ T operator()(T i) { return i / C; } }; int main(void) { int R = 5; // number of rows int C = 8; // number of columns thrust::default_random_engine rng; thrust::uniform_int_distribution dist(10, 99); // initialize data thrust::device_vector array(R * C); for (size_t i = 0; i < array.size(); i++) array[i] = dist(rng); // allocate storage for row sums and indices thrust::device_vector row_sums(R); thrust::device_vector row_indices(R); // compute row sums by summing values with equal row indices thrust::reduce_by_key (thrust::make_transform_iterator(thrust::counting_iterator(0), linear_index_to_row_index(C)), thrust::make_transform_iterator(thrust::counting_iterator(0), linear_index_to_row_index(C)) + (R*C), 
array.begin(), row_indices.begin(), row_sums.begin(), thrust::equal_to(), thrust::plus()); // print data for(int i = 0; i < R; i++) { std::cout << "[ "; for(int j = 0; j < C; j++) std::cout << array[i * C + j] << " "; std::cout << "] = " << row_sums[i] << "\n"; } return 0; } thrust-1.9.5/examples/summary_statistics.cu000066400000000000000000000114721344621116200212020ustar00rootroot00000000000000#include #include #include #include #include #include #include #include // This example computes several statistical properties of a data // series in a single reduction. The algorithm is described in detail here: // http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm // // Thanks to Joseph Rhoads for contributing this example // structure used to accumulate the moments and other // statistical properties encountered so far. template struct summary_stats_data { T n; T min; T max; T mean; T M2; T M3; T M4; // initialize to the identity element void initialize() { n = mean = M2 = M3 = M4 = 0; min = std::numeric_limits::max(); max = -std::numeric_limits::max(); } T variance() { return M2 / (n - 1); } T variance_n() { return M2 / n; } T skewness() { return std::sqrt(n) * M3 / std::pow(M2, (T) 1.5); } T kurtosis() { return n * M4 / (M2 * M2); } }; // summary_stats_unary_op is a functor that takes in a value x and // returns a summary_stats_data describing the single value x (n = 1, min = max = mean = x).
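The parallel algorithm referenced above rests on a pairwise merge: two partial results can be combined without revisiting the underlying data. The minimal host-only sketch below keeps only the count, mean and M2 terms (the higher moments M3 and M4 follow the same pattern in summary_stats_binary_op); the partial_stats type and the merge and sample_variance functions are hypothetical names. Merging the partial results for {4, 7} and {13, 16}, the same data the example's main() uses, gives mean 10 and sample variance 30.

```cpp
// Hypothetical partial result for one chunk of data; mirrors the
// n, mean and M2 fields of summary_stats_data.
struct partial_stats {
    double n;     // number of samples
    double mean;  // mean of the samples
    double M2;    // sum of squared deviations from the mean
};

// Pairwise merge of two partial results:
//   delta = mean_b - mean_a
//   mean  = mean_a + delta * n_b / n
//   M2    = M2_a + M2_b + delta^2 * n_a * n_b / n
partial_stats merge(const partial_stats& a, const partial_stats& b)
{
    partial_stats r;
    r.n = a.n + b.n;
    if (r.n == 0) { r.mean = 0; r.M2 = 0; return r; }  // both chunks empty
    const double delta = b.mean - a.mean;
    r.mean = a.mean + delta * b.n / r.n;
    r.M2 = a.M2 + b.M2 + delta * delta * a.n * b.n / r.n;
    return r;
}

// Sample variance, dividing by n - 1 like summary_stats_data::variance().
double sample_variance(const partial_stats& s) { return s.M2 / (s.n - 1); }
```

Because merge is associative (up to rounding), a reduction can apply it in any tree order, which is what lets thrust::transform_reduce compute the statistics in a single parallel pass.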
template struct summary_stats_unary_op { __host__ __device__ summary_stats_data operator()(const T& x) const { summary_stats_data result; result.n = 1; result.min = x; result.max = x; result.mean = x; result.M2 = 0; result.M3 = 0; result.M4 = 0; return result; } }; // summary_stats_binary_op is a functor that accepts two summary_stats_data // structs and returns a new summary_stats_data which is an // approximation to the summary statistics for // all values that have been aggregated so far template struct summary_stats_binary_op : public thrust::binary_function&, const summary_stats_data&, summary_stats_data > { __host__ __device__ summary_stats_data operator()(const summary_stats_data& x, const summary_stats_data & y) const { summary_stats_data result; // precompute some common subexpressions T n = x.n + y.n; T n2 = n * n; T n3 = n2 * n; T delta = y.mean - x.mean; T delta2 = delta * delta; T delta3 = delta2 * delta; T delta4 = delta3 * delta; //Basic number of samples (n), min, and max result.n = n; result.min = thrust::min(x.min, y.min); result.max = thrust::max(x.max, y.max); result.mean = x.mean + delta * y.n / n; result.M2 = x.M2 + y.M2; result.M2 += delta2 * x.n * y.n / n; result.M3 = x.M3 + y.M3; result.M3 += delta3 * x.n * y.n * (x.n - y.n) / n2; result.M3 += (T) 3.0 * delta * (x.n * y.M2 - y.n * x.M2) / n; result.M4 = x.M4 + y.M4; result.M4 += delta4 * x.n * y.n * (x.n * x.n - x.n * y.n + y.n * y.n) / n3; result.M4 += (T) 6.0 * delta2 * (x.n * x.n * y.M2 + y.n * y.n * x.M2) / n2; result.M4 += (T) 4.0 * delta * (x.n * y.M3 - y.n * x.M3) / n; return result; } }; template void print_range(const std::string& name, Iterator first, Iterator last) { typedef typename std::iterator_traits::value_type T; std::cout << name << ": "; thrust::copy(first, last, std::ostream_iterator(std::cout, " ")); std::cout << "\n"; } int main(void) { typedef float T; // initialize host array T h_x[] = {4, 7, 13, 16}; // transfer to device thrust::device_vector d_x(h_x, h_x + sizeof(h_x)
/ sizeof(T)); // setup arguments summary_stats_unary_op unary_op; summary_stats_binary_op binary_op; summary_stats_data init; init.initialize(); // compute summary statistics summary_stats_data result = thrust::transform_reduce(d_x.begin(), d_x.end(), unary_op, init, binary_op); std::cout <<"******Summary Statistics Example*****"< #include #include #include #include #include #include #include #include // This example computes a summed area table using segmented scan // http://en.wikipedia.org/wiki/Summed_area_table // convert a linear index to a linear index in the transpose struct transpose_index : public thrust::unary_function { size_t m, n; __host__ __device__ transpose_index(size_t _m, size_t _n) : m(_m), n(_n) {} __host__ __device__ size_t operator()(size_t linear_index) { size_t i = linear_index / n; size_t j = linear_index % n; return m * j + i; } }; // convert a linear index to a row index struct row_index : public thrust::unary_function { size_t n; __host__ __device__ row_index(size_t _n) : n(_n) {} __host__ __device__ size_t operator()(size_t i) { return i / n; } }; // transpose an M-by-N array template void transpose(size_t m, size_t n, thrust::device_vector& src, thrust::device_vector& dst) { thrust::counting_iterator indices(0); thrust::gather (thrust::make_transform_iterator(indices, transpose_index(n, m)), thrust::make_transform_iterator(indices, transpose_index(n, m)) + dst.size(), src.begin(), dst.begin()); } // scan the rows of an M-by-N array template void scan_horizontally(size_t n, thrust::device_vector& d_data) { thrust::counting_iterator indices(0); thrust::inclusive_scan_by_key (thrust::make_transform_iterator(indices, row_index(n)), thrust::make_transform_iterator(indices, row_index(n)) + d_data.size(), d_data.begin(), d_data.begin()); } // print an M-by-N array template void print(size_t m, size_t n, thrust::device_vector& d_data) { thrust::host_vector h_data = d_data; for(size_t i = 0; i < m; i++) { for(size_t j = 0; j < n; j++) std::cout 
<< std::setw(8) << h_data[i * n + j] << " "; std::cout << "\n"; } } int main(void) { size_t m = 3; // number of rows size_t n = 4; // number of columns // 2d array stored in row-major order [(0,0), (0,1), (0,2) ... ] thrust::device_vector data(m * n, 1); std::cout << "[step 0] initial array" << std::endl; print(m, n, data); std::cout << "[step 1] scan horizontally" << std::endl; scan_horizontally(n, data); print(m, n, data); std::cout << "[step 2] transpose array" << std::endl; thrust::device_vector temp(m * n); transpose(m, n, data, temp); print(n, m, temp); std::cout << "[step 3] scan transpose horizontally" << std::endl; scan_horizontally(m, temp); print(n, m, temp); std::cout << "[step 4] transpose the transpose" << std::endl; transpose(n, m, temp, data); print(m, n, data); return 0; } thrust-1.9.5/examples/tiled_range.cu000066400000000000000000000054761344621116200175170ustar00rootroot00000000000000#include #include #include #include #include #include #include #include // this example illustrates how to tile a range multiple times // examples: // tiled_range([0, 1, 2, 3], 1) -> [0, 1, 2, 3] // tiled_range([0, 1, 2, 3], 2) -> [0, 1, 2, 3, 0, 1, 2, 3] // tiled_range([0, 1, 2, 3], 3) -> [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3] // ... 
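tiled_range relies on modular index arithmetic: element i of the view reads position i % n of an underlying range of length n, and the view holds tiles * n elements. A host-only sketch of that mapping follows; tiled_copy is a hypothetical helper that materializes the tiles instead of using Thrust's fancy iterators.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-only helper (not a Thrust API): element i of the
// tiled view is data[i % n], and the view holds tiles * n elements.
template <typename T>
std::vector<T> tiled_copy(const std::vector<T>& data, std::size_t tiles)
{
    const std::size_t n = data.size();
    std::vector<T> out;
    out.reserve(tiles * n);
    // When n == 0 the loop body never runs, so there is no modulo by zero.
    for (std::size_t i = 0; i < tiles * n; ++i)
        out.push_back(data[i % n]);  // same mapping as tile_functor
    return out;
}
```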
template class tiled_range { public: typedef typename thrust::iterator_difference::type difference_type; struct tile_functor : public thrust::unary_function { difference_type tile_size; tile_functor(difference_type tile_size) : tile_size(tile_size) {} __host__ __device__ difference_type operator()(const difference_type& i) const { return i % tile_size; } }; typedef typename thrust::counting_iterator CountingIterator; typedef typename thrust::transform_iterator TransformIterator; typedef typename thrust::permutation_iterator PermutationIterator; // type of the tiled_range iterator typedef PermutationIterator iterator; // construct tiled_range for the range [first,last) tiled_range(Iterator first, Iterator last, difference_type tiles) : first(first), last(last), tiles(tiles) {} iterator begin(void) const { return PermutationIterator(first, TransformIterator(CountingIterator(0), tile_functor(last - first))); } iterator end(void) const { return begin() + tiles * (last - first); } protected: Iterator first; Iterator last; difference_type tiles; }; int main(void) { thrust::device_vector data(4); data[0] = 10; data[1] = 20; data[2] = 30; data[3] = 40; // print the initial data std::cout << "range "; thrust::copy(data.begin(), data.end(), std::ostream_iterator(std::cout, " ")); std::cout << std::endl; typedef thrust::device_vector::iterator Iterator; // create tiled_range with two tiles tiled_range two(data.begin(), data.end(), 2); std::cout << "two tiles: "; thrust::copy(two.begin(), two.end(), std::ostream_iterator(std::cout, " ")); std::cout << std::endl; // create tiled_range with three tiles tiled_range three(data.begin(), data.end(), 3); std::cout << "three tiles: "; thrust::copy(three.begin(), three.end(), std::ostream_iterator(std::cout, " ")); std::cout << std::endl; return 0; } thrust-1.9.5/examples/transform_iterator.cu000066400000000000000000000074461344621116200211650ustar00rootroot00000000000000#include #include #include #include #include #include #include
#include // this functor clamps a value to the range [lo, hi] template struct clamp : public thrust::unary_function { T lo, hi; __host__ __device__ clamp(T _lo, T _hi) : lo(_lo), hi(_hi) {} __host__ __device__ T operator()(T x) { if (x < lo) return lo; else if (x < hi) return x; else return hi; } }; template struct simple_negate : public thrust::unary_function { __host__ __device__ T operator()(T x) { return -x; } }; template void print_range(const std::string& name, Iterator first, Iterator last) { typedef typename std::iterator_traits::value_type T; std::cout << name << ": "; thrust::copy(first, last, std::ostream_iterator(std::cout, " ")); std::cout << "\n"; } int main(void) { // clamp values to the range [1, 5] int lo = 1; int hi = 5; // define some types typedef thrust::device_vector Vector; typedef Vector::iterator VectorIterator; // initialize values Vector values(8); values[0] = 2; values[1] = 5; values[2] = 7; values[3] = 1; values[4] = 6; values[5] = 0; values[6] = 3; values[7] = 8; print_range("values ", values.begin(), values.end()); // define some more types typedef thrust::transform_iterator, VectorIterator> ClampedVectorIterator; // create a transform_iterator that applies clamp() to the values array ClampedVectorIterator cv_begin = thrust::make_transform_iterator(values.begin(), clamp(lo, hi)); ClampedVectorIterator cv_end = cv_begin + values.size(); // now [cv_begin, cv_end) defines a sequence of clamped values print_range("clamped values ", cv_begin, cv_end); //// // compute the sum of the clamped sequence with reduce() std::cout << "sum of clamped values : " << thrust::reduce(cv_begin, cv_end) << "\n"; //// // combine transform_iterator with other fancy iterators like counting_iterator typedef thrust::counting_iterator CountingIterator; typedef thrust::transform_iterator, CountingIterator> ClampedCountingIterator; CountingIterator count_begin(0); CountingIterator count_end(10); print_range("sequence ", count_begin, count_end);
ClampedCountingIterator cs_begin = thrust::make_transform_iterator(count_begin, clamp(lo, hi)); ClampedCountingIterator cs_end = thrust::make_transform_iterator(count_end, clamp(lo, hi)); print_range("clamped sequence ", cs_begin, cs_end); //// // combine transform_iterator with another transform_iterator typedef thrust::transform_iterator, ClampedCountingIterator> NegatedClampedCountingIterator; NegatedClampedCountingIterator ncs_begin = thrust::make_transform_iterator(cs_begin, thrust::negate()); NegatedClampedCountingIterator ncs_end = thrust::make_transform_iterator(cs_end, thrust::negate()); print_range("negated sequence ", ncs_begin, ncs_end); //// // when a functor does not define result_type, a third template argument must be provided typedef thrust::transform_iterator, VectorIterator, int> NegatedVectorIterator; NegatedVectorIterator nv_begin(values.begin(), simple_negate()); NegatedVectorIterator nv_end(values.end(), simple_negate()); print_range("negated values ", nv_begin, nv_end); return 0; } thrust-1.9.5/examples/transform_output_iterator.cu000066400000000000000000000021521344621116200225720ustar00rootroot00000000000000#include #include #include #include #include struct Functor { template __host__ __device__ float operator()(const Tuple& tuple) const { const float x = thrust::get<0>(tuple); const float y = thrust::get<1>(tuple); return x*y*2.0f / 3.0f; } }; int main(void) { float u[4] = { 4 , 3, 2, 1}; float v[4] = {-1, 1, 1, -1}; int idx[3] = {3, 0, 1}; float w[3] = {0, 0, 0}; thrust::device_vector U(u, u + 4); thrust::device_vector V(v, v + 4); thrust::device_vector IDX(idx, idx + 3); thrust::device_vector W(w, w + 3); // gather multiple elements and apply a function before writing result in memory thrust::gather( IDX.begin(), IDX.end(), thrust::make_zip_iterator(thrust::make_tuple(U.begin(), V.begin())), thrust::make_transform_output_iterator(W.begin(), Functor())); std::cout << "result= [ "; for (size_t i = 0; i < 3; i++) std::cout << W[i] << " "; 
std::cout << "] \n"; return 0; } thrust-1.9.5/examples/uninitialized_vector.cu000066400000000000000000000044501344621116200214630ustar00rootroot00000000000000// Occasionally, it is advantageous to avoid initializing the individual // elements of a device_vector. For example, the default behavior of // zero-initializing numeric data may introduce undesirable overhead. // This example demonstrates how to avoid default construction of a // device_vector's data by using a custom allocator. #include #include #include #include #include // uninitialized_allocator is an allocator which // derives from device_allocator and which has a // no-op construct member function template struct uninitialized_allocator : thrust::device_allocator { // the default generated constructors and destructors are implicitly // marked __host__ __device__, but the current Thrust device_allocator // can only be constructed and destroyed on the host; therefore, we // define these as host only __host__ uninitialized_allocator() {} __host__ uninitialized_allocator(const uninitialized_allocator & other) : thrust::device_allocator(other) {} __host__ ~uninitialized_allocator() {} // for correctness, you should also redefine rebind when you inherit // from an allocator type; this way, if the allocator is rebound somewhere, // it's going to be rebound to the correct type - and not to its base // type for U template struct rebind { typedef uninitialized_allocator other; }; // note that construct is annotated as // a __host__ __device__ function __host__ __device__ void construct(T *p) { // no-op } }; // to make a device_vector which does not initialize its elements, // use uninitialized_allocator as the 2nd template parameter typedef thrust::device_vector > uninitialized_vector; int main() { uninitialized_vector vec(10); // the initial value of vec's 10 elements is undefined // resize without default value does not initialize elements vec.resize(20); // resize with default value does initialize elements 
vec.resize(30, 13); // the value of elements [0,20) is still undefined // but the value of elements [20,30) is 13: using namespace thrust::placeholders; assert(thrust::all_of(vec.begin() + 20, vec.end(), _1 == 13)); return 0; } thrust-1.9.5/examples/version.cu000066400000000000000000000005301344621116200167110ustar00rootroot00000000000000#include #include int main(void) { int major = THRUST_MAJOR_VERSION; int minor = THRUST_MINOR_VERSION; int subminor = THRUST_SUBMINOR_VERSION; int patch = THRUST_PATCH_NUMBER; std::cout << "Thrust v" << major << "." << minor << "." << subminor << "-" << patch << std::endl; return 0; } thrust-1.9.5/examples/weld_vertices.cu000066400000000000000000000050521344621116200200670ustar00rootroot00000000000000#include #include #include #include #include #include /* * This example "welds" triangle vertices together by taking as * input "triangle soup" and eliminating redundant vertex positions * and shared edges. A connected mesh is the result. * * * Input: 9 vertices representing a mesh with 3 triangles * * Mesh Vertices * ------ (2) (5)--(4) (8) * | \ 2| \ | \ \ | | \ * | \ | \ <-> | \ \ | | \ * | 0 \| 1 \ | \ \ | | \ * ----------- (0)--(1) (3) (6)--(7) * * (vertex 1 equals vertex 3, vertex 2 equals vertex 5, ...) 
* * Output: mesh representation with 5 vertices and 9 indices * * Vertices Indices * (1)--(3) [(0,2,1), * | \ | \ (2,3,1), * | \ | \ (2,4,3)] * | \| \ * (0)--(2)--(4) */ // define a 2d float vector typedef thrust::tuple vec2; int main(void) { // allocate memory for input mesh representation thrust::device_vector input(9); input[0] = vec2(0,0); // First Triangle input[1] = vec2(1,0); input[2] = vec2(0,1); input[3] = vec2(1,0); // Second Triangle input[4] = vec2(1,1); input[5] = vec2(0,1); input[6] = vec2(1,0); // Third Triangle input[7] = vec2(2,0); input[8] = vec2(1,1); // allocate space for output mesh representation thrust::device_vector vertices = input; thrust::device_vector indices(input.size()); // sort vertices to bring duplicates together thrust::sort(vertices.begin(), vertices.end()); // find unique vertices and erase redundancies vertices.erase(thrust::unique(vertices.begin(), vertices.end()), vertices.end()); // find index of each input vertex in the list of unique vertices thrust::lower_bound(vertices.begin(), vertices.end(), input.begin(), input.end(), indices.begin()); // print output mesh representation std::cout << "Output Representation" << std::endl; for(size_t i = 0; i < vertices.size(); i++) { vec2 v = vertices[i]; std::cout << " vertices[" << i << "] = (" << thrust::get<0>(v) << "," << thrust::get<1>(v) << ")" << std::endl; } for(size_t i = 0; i < indices.size(); i++) { std::cout << " indices[" << i << "] = " << indices[i] << std::endl; } return 0; } thrust-1.9.5/examples/word_count.cu000066400000000000000000000053351344621116200174170ustar00rootroot00000000000000#include #include #include #include #include // This example computes the number of words in a text sample // with a single call to thrust::inner_product. The algorithm // counts the number of characters which start a new word, i.e. // the number of characters where input[i] is an alphabetical // character and input[i-1] is not an alphabetical character. 
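The counting rule described above can be reproduced on the host with std::inner_product from <numeric>, which takes the same (first1, last1, first2, init, op1, op2) arguments as the thrust::inner_product call in word_count. This is a sketch under two assumptions: letters are classified with std::isalpha in the default "C" locale, and count_words and is_letter are hypothetical names. The lambda requires C++11.

```cpp
#include <cctype>
#include <functional>
#include <numeric>
#include <string>

// Letter test via <cctype>; the cast to unsigned char avoids undefined
// behaviour for negative char values.
inline bool is_letter(char c)
{
    return std::isalpha(static_cast<unsigned char>(c)) != 0;
}

// A word starts at position i when text[i] is a letter and text[i-1]
// is not; position 0 has no left neighbour and is checked separately.
int count_words(const std::string& text)
{
    if (text.empty()) return 0;
    int wc = std::inner_product(
        text.begin(), text.end() - 1,   // left characters
        text.begin() + 1,               // right characters
        0,                              // initial count
        std::plus<int>(),               // sum the 0/1 flags
        [](char left, char right) {     // 1 if `right` begins a word
            return (is_letter(right) && !is_letter(left)) ? 1 : 0;
        });
    if (is_letter(text.front())) ++wc;  // a leading letter also starts a word
    return wc;
}
```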
// determines whether the character is alphabetical __host__ __device__ bool is_alpha(const char c) { return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'); } // determines whether the right character begins a new word struct is_word_start : public thrust::binary_function { __host__ __device__ bool operator()(const char& left, const char& right) const { return is_alpha(right) && !is_alpha(left); } }; int word_count(const thrust::device_vector& input) { // check for empty string if (input.empty()) return 0; // compute the number of characters that start a new word int wc = thrust::inner_product(input.begin(), input.end() - 1, // sequence of left characters input.begin() + 1, // sequence of right characters 0, // initialize sum to 0 thrust::plus(), // sum values together is_word_start()); // how to compare the left and right characters // if the first character is alphabetical, then it also begins a word if (is_alpha(input.front())) wc++; return wc; } int main(void) { // Paragraph from 'The Raven' by Edgar Allan Poe // http://en.wikipedia.org/wiki/The_Raven const char raw_input[] = " But the raven, sitting lonely on the placid bust, spoke only,\n" " That one word, as if his soul in that one word he did outpour.\n" " Nothing further then he uttered - not a feather then he fluttered -\n" " Till I scarcely more than muttered `Other friends have flown before -\n" " On the morrow he will leave me, as my hopes have flown before.'\n" " Then the bird said, `Nevermore.'\n"; std::cout << "Text sample:" << std::endl; std::cout << raw_input << std::endl; // transfer to device thrust::device_vector input(raw_input, raw_input + sizeof(raw_input)); // count words int wc = word_count(input); std::cout << "Text sample contains " << wc << " words" << std::endl; return 0; } thrust-1.9.5/generate_mk.py000077500000000000000000000106141344621116200157170ustar00rootroot00000000000000#!/usr/bin/env python # Generate a set of project mk files.
# Usage: python generate_mk.py PROJECTS_MK_DIR THRUST_SOURCE_DIR # The program scans through unit tests and examples in THRUST_SOURCE_DIR # and generates project mk for each of the tests and examples in PROJECTS_MK_DIR # A single example or unit test source file generates its own executable # This program is called by a top level Makefile, but can also be used stand-alone for debugging # This program also generates testing.mk, examples.mk and dependencies.mk import sys import shutil as sh import os import glob import re test_template = """ TEST_SRC := %(TEST_SRC)s TEST_NAME := %(TEST_NAME)s include $(ROOTDIR)/thrust/internal/build/generic_test.mk """ example_template = """ EXAMPLE_SRC := %(EXAMPLE_SRC)s EXAMPLE_NAME := %(EXAMPLE_NAME)s include $(ROOTDIR)/thrust/internal/build/generic_example.mk """ def Glob(pattern, directory,exclude='\B'): src = glob.glob(os.path.join(directory,pattern)) p = re.compile(exclude) src = [s for s in src if not p.match(s)] return src def generate_test_mk(mk_path, test_path, group, TEST_DIR): print 'Generating makefiles in "'+mk_path+'" for tests in "'+test_path+'"' src_cu = Glob("*.cu", test_path, ".*testframework.cu$") src_cxx = Glob("*.cpp", test_path, ".*testframework.cpp$") src_cu.sort(); src_cxx.sort(); src_all = src_cu + src_cxx; tests_all = [] dependencies_all = [] for s in src_all: fn = os.path.splitext(os.path.basename(s)); t = "thrust."+group+"."+fn[0] e = fn[1] mkfile = test_template % {"TEST_SRC" : s, "TEST_NAME" : t} f = open(os.path.join(mk_path,t+".mk"), 'w') f.write(mkfile) f.close() tests_all.append(os.path.join(mk_path,t)) dependencies_all.append(t+": testframework") return [tests_all, dependencies_all] def generate_example_mk(mk_path, example_path, group, EXAMPLE_DIR): print 'Generating makefiles in "'+mk_path+'" for examples in "'+example_path+'"' src_cu = Glob("*.cu", example_path) src_cxx = Glob("*.cpp", example_path) src_cu.sort(); src_cxx.sort(); src_all = src_cu + src_cxx; examples_all = [] for s in src_all: fn 
= os.path.splitext(os.path.basename(s)); t = "thrust."+group+"."+fn[0] e = fn[1] mkfile = example_template % {"EXAMPLE_SRC" : s, "EXAMPLE_NAME" : t} f = open(os.path.join(mk_path,t+".mk"), 'w') f.write(mkfile) f.close() examples_all.append(os.path.join(mk_path,t)) return examples_all ## relpath: backported from os.path.relpath in Python 2.6+ def relpath(path, start): """Return a relative version of a path""" import posixpath if not path: raise ValueError("no path specified") start_list = posixpath.abspath(start).split(posixpath.sep) path_list = posixpath.abspath(path).split(posixpath.sep) # Work out how much of the filepath is shared by start and path. i = len(posixpath.commonprefix([start_list, path_list])) rel_list = [posixpath.pardir] * (len(start_list)-i) + path_list[i:] if not rel_list: return posixpath.curdir return posixpath.join(*rel_list) mk_path=sys.argv[1] REL_DIR="../../" if (len(sys.argv) > 2): root_path=sys.argv[2]; mk_path = relpath(mk_path, root_path) REL_DIR = relpath(root_path,mk_path) try: sh.rmtree(mk_path) except: pass os.makedirs(mk_path) tests_all, dependencies_all = generate_test_mk(mk_path, "testing/", "test", REL_DIR) tests_cu, dependencies_cu = generate_test_mk(mk_path, "testing/backend/cuda/", "test.cuda", REL_DIR) tests_all.extend(tests_cu) dependencies_all.extend(dependencies_cu) testing_mk = "" for t in tests_all: testing_mk += "PROJECTS += "+t+"\n" testing_mk += "PROJECTS += internal/build/testframework\n" f = open(os.path.join(mk_path,"testing.mk"),'w') f.write(testing_mk) f.close() dependencies_mk = "" for d in dependencies_all: dependencies_mk += d + "\n" f = open(os.path.join(mk_path,"dependencies.mk"),'w') f.write(dependencies_mk) f.close() examples_mk = "" examples_all = generate_example_mk(mk_path, "examples/", "example", REL_DIR) examples_cuda = generate_example_mk(mk_path, "examples/cuda/", "example.cuda", REL_DIR) examples_all.extend(examples_cuda) for e in examples_all: examples_mk += "PROJECTS += "+e+"\n" f =
open(os.path.join(mk_path,"examples.mk"),'w') f.write(examples_mk) f.close() thrust-1.9.5/internal/000077500000000000000000000000001344621116200146735ustar00rootroot00000000000000thrust-1.9.5/internal/benchmark/000077500000000000000000000000001344621116200166255ustar00rootroot00000000000000thrust-1.9.5/internal/benchmark/README.txt000066400000000000000000000021671344621116200203310ustar00rootroot00000000000000Directions for compiling and running the benchmark with Ubuntu Linux: Install Intel's Threading Building Blocks library (TBB): $ sudo apt-get install libtbb-dev Compile the benchmark: $ nvcc -O3 -arch=sm_20 bench.cu -ltbb -o bench Run the benchmark: $ ./bench Typical output (Tesla C2050): Benchmarking with input size 33554432 Core Primitive Performance (elements per second) Algorithm, STL, TBB, Thrust reduce, 3121746688, 3739585536, 26134038528 transform, 1869492736, 2347719424, 13804681216 scan, 1394143744, 1439394816, 5039195648 sort, 11070660, 34622352, 673543168 Sorting Performance (keys per second) Type, STL, TBB, Thrust char, 24050078, 62987040, 2798874368 short, 15644141, 41275164, 1428603008 int, 11062616, 33478628, 682295744 long, 11249874, 33972564, 219719184 float, 9850043, 29011806, 692407232 double, 9700181, 27153626, 224345568 The reported numbers are performance rates in "elements per second" (higher is better). thrust-1.9.5/internal/benchmark/bench.cu000066400000000000000000001042531344621116200202420ustar00rootroot00000000000000#include #include #include #include #include #include #include #include #include #include #include #include #include #include // For `atoi`. #include // For CHAR_BIT. #include // For `sqrt` and `abs`. #include // For `intN_t`. 
#include "random.h" #include "timer.h" #if defined(HAVE_TBB) #include "tbb_algos.h" #endif #if THRUST_DEVICE_SYSTEM == THRUST_DEVICE_SYSTEM_CUDA #include // For `thrust::system_error` #include // For `thrust::cuda_category` #endif // We don't use THRUST_PP_STRINGIZE and THRUST_PP_CAT because they are new, and // we want this benchmark to be backwards-compatible to older versions of Thrust. #define PP_STRINGIZE_(expr) #expr #define PP_STRINGIZE(expr) PP_STRINGIZE_(expr) #define PP_CAT(a, b) a ## b // We don't use THRUST_NOEXCEPT because it's new, and we want this benchmark to // be backwards-compatible to older versions of Thrust. #if __cplusplus >= 201103L #define NOEXCEPT noexcept #else #define NOEXCEPT throw() #endif /////////////////////////////////////////////////////////////////////////////// template struct squared_difference { private: T const average; public: __host__ __device__ squared_difference(squared_difference const& rhs) : average(rhs.average) {} __host__ __device__ squared_difference(T average_) : average(average_) {} __host__ __device__ T operator()(T x) const { return (x - average) * (x - average); } }; template struct value_and_count { T value; uint64_t count; __host__ __device__ value_and_count(value_and_count const& other) : value(other.value), count(other.count) {} __host__ __device__ value_and_count(T const& value_) : value(value_), count(1) {} __host__ __device__ value_and_count(T const& value_, uint64_t count_) : value(value_), count(count_) {} __host__ __device__ value_and_count& operator=(value_and_count const& other) { value = other.value; count = other.count; return *this; } __host__ __device__ value_and_count& operator=(T const& value_) { value = value_; count = 1; return *this; } }; template struct counting_op { private: ReduceOp reduce; public: __host__ __device__ counting_op() : reduce() {} __host__ __device__ counting_op(counting_op const& other) : reduce(other.reduce) {} __host__ __device__ counting_op(ReduceOp const& reduce_) : 
reduce(reduce_) {} __host__ __device__ value_and_count operator()( value_and_count const& x , T const& y ) const { return value_and_count(reduce(x.value, y), x.count + 1); } __host__ __device__ value_and_count operator()( value_and_count const& x , value_and_count const& y ) const { return value_and_count(reduce(x.value, y.value), x.count + y.count); } }; template T arithmetic_mean(InputIt first, InputIt last, T init) { value_and_count init_vc(init, 0); counting_op > reduce_vc; value_and_count vc = thrust::reduce(first, last, init_vc, reduce_vc); return vc.value / vc.count; } template typename thrust::iterator_traits::value_type arithmetic_mean(InputIt first, InputIt last) { typedef typename thrust::iterator_traits::value_type T; return arithmetic_mean(first, last, T()); } template T sample_standard_deviation(InputIt first, InputIt last, T average) { value_and_count init_vc(T(), 0); counting_op > reduce_vc; squared_difference transform(average); value_and_count vc = thrust::transform_reduce(first, last, transform, init_vc, reduce_vc); return std::sqrt(vc.value / T(vc.count - 1)); } /////////////////////////////////////////////////////////////////////////////// // Formulas for propagation of uncertainty from: // // https://en.wikipedia.org/wiki/Propagation_of_uncertainty#Example_formulas // // Even though it's Wikipedia, I trust it as I helped write that table. // // XXX Replace with a proper reference. // Compute the propagated uncertainty from the multiplication of two uncertain // values, `A +/- A_unc` and `B +/- B_unc`. 
Given `f = AB` or `f = A/B`, where // `A != 0` and `B != 0`, the uncertainty in `f` is approximately: // // f_unc = abs(f) * sqrt((A_unc / A) ^ 2 + (B_unc / B) ^ 2) // template __host__ __device__ T uncertainty_multiplicative( T const& f , T const& A, T const& A_unc , T const& B, T const& B_unc ) { return std::abs(f) * std::sqrt((A_unc / A) * (A_unc / A) + (B_unc / B) * (B_unc / B)); } // Compute the propagated uncertainty from addition of two uncertain values, // `A +/- A_unc` and `B +/- B_unc`. Given `f = cA + dB` (where `c` and `d` are // certain constants), the uncertainty in `f` is approximately: // // f_unc = sqrt(c ^ 2 * A_unc ^ 2 + d ^ 2 * B_unc ^ 2) // template __host__ __device__ T uncertainty_additive( T const& c, T const& A_unc , T const& d, T const& B_unc ) { return std::sqrt((c * c * A_unc * A_unc) + (d * d * B_unc * B_unc)); } /////////////////////////////////////////////////////////////////////////////// // Return the significant digit of `x`. The result is the number of digits // after the decimal place to round to (negative numbers indicate rounding // before the decimal place). template int find_significant_digit(T x) { if (x == T(0)) return 0; return -int(std::floor(std::log10(std::abs(x)))); } // Round `x` to `ndigits` after the decimal place (Python-style). template T round_to_precision(T x, N ndigits) { double m = (x < 0.0) ?
-1.0 : 1.0; double pwr = std::pow(T(10.0), ndigits); return (std::floor(x * m * pwr + 0.5) / pwr) * m; } /////////////////////////////////////////////////////////////////////////////// void print_experiment_header() { // {{{ std::cout << "Thrust Version" << "," << "Algorithm" << "," << "Element Type" << "," << "Element Size" << "," << "Elements per Trial" << "," << "Total Input Size" << "," << "STL Trials" << "," << "STL Average Walltime" << "," << "STL Walltime Uncertainty" << "," << "STL Average Throughput" << "," << "STL Throughput Uncertainty" << "," << "Thrust Trials" << "," << "Thrust Average Walltime" << "," << "Thrust Walltime Uncertainty" << "," << "Thrust Average Throughput" << "," << "Thrust Throughput Uncertainty" #if defined(HAVE_TBB) << "," << "TBB Trials" << "," << "TBB Average Walltime" << "," << "TBB Walltime Uncertainty" << "," << "TBB Average Throughput" << "," << "TBB Throughput Uncertainty" #endif << std::endl; std::cout << "" // Thrust Version. << "," << "" // Algorithm. << "," << "" // Element Type. << "," << "bits/element" // Element Size. << "," << "elements" // Elements per Trial. << "," << "MiBs" // Total Input Size. << "," << "trials" // STL Trials. << "," << "secs" // STL Average Walltime. << "," << "secs" // STL Walltime Uncertainty. << "," << "elements/sec" // STL Average Throughput. << "," << "elements/sec" // STL Throughput Uncertainty. << "," << "trials" // Thrust Trials. << "," << "secs" // Thrust Average Walltime. << "," << "secs" // Thrust Walltime Uncertainty. << "," << "elements/sec" // Thrust Average Throughput. << "," << "elements/sec" // Thrust Throughput Uncertainty. #if defined(HAVE_TBB) << "," << "trials" // TBB Trials. << "," << "secs" // TBB Average Walltime. << "," << "secs" // TBB Walltime Uncertainty. << "," << "elements/sec" // TBB Average Throughput. << "," << "elements/sec" // TBB Throughput Uncertainty. 
#endif << std::endl; } // }}} /////////////////////////////////////////////////////////////////////////////// struct experiment_results { double const average_time; // Arithmetic mean of trial times in seconds. double const stdev_time; // Sample standard deviation of trial times. experiment_results(double average_time_, double stdev_time_) : average_time(average_time_), stdev_time(stdev_time_) {} }; /////////////////////////////////////////////////////////////////////////////// template < template class Test , typename ElementMetaType // Has an embedded typedef `type`, // and a static method `name` that // returns a char const*. , uint64_t Elements , uint64_t BaselineTrials , uint64_t RegularTrials > struct experiment_driver { typedef typename ElementMetaType::type element_type; static char const* const test_name; static char const* const element_type_name; // Element type name as a string. static uint64_t const elements; // # of elements per trial. static uint64_t const element_size; // Size of each element in bits. static double const input_size; // `elements` * `element_size` in MiB. static uint64_t const baseline_trials; // # of baseline trials per experiment. static uint64_t const regular_trials; // # of regular trials per experiment.
static void run_experiment() { // {{{ experiment_results stl = std_experiment(); experiment_results thrust = thrust_experiment(); #if defined(HAVE_TBB) experiment_results tbb = tbb_experiment(); #endif double stl_average_walltime = stl.average_time; double thrust_average_walltime = thrust.average_time; #if defined(HAVE_TBB) double tbb_average_walltime = tbb.average_time; #endif double stl_average_throughput = elements / stl.average_time; double thrust_average_throughput = elements / thrust.average_time; #if defined(HAVE_TBB) double tbb_average_throughput = elements / tbb.average_time; #endif double stl_walltime_uncertainty = stl.stdev_time; double thrust_walltime_uncertainty = thrust.stdev_time; #if defined(HAVE_TBB) double tbb_walltime_uncertainty = tbb.stdev_time; #endif double stl_throughput_uncertainty = uncertainty_multiplicative( stl_average_throughput , double(elements), 0.0 , stl_average_walltime, stl_walltime_uncertainty ); double thrust_throughput_uncertainty = uncertainty_multiplicative( thrust_average_throughput , double(elements), 0.0 , thrust_average_walltime, thrust_walltime_uncertainty ); #if defined(HAVE_TBB) double tbb_throughput_uncertainty = uncertainty_multiplicative( tbb_average_throughput , double(elements), 0.0 , tbb_average_walltime, tbb_walltime_uncertainty ); #endif // Round the average walltime and walltime uncertainty to the // significant figure of the walltime uncertainty. 
int stl_walltime_precision = std::max( find_significant_digit(stl.average_time) , find_significant_digit(stl.stdev_time) ); int thrust_walltime_precision = std::max( find_significant_digit(thrust.average_time) , find_significant_digit(thrust.stdev_time) ); #if defined(HAVE_TBB) int tbb_walltime_precision = std::max( find_significant_digit(tbb.average_time) , find_significant_digit(tbb.stdev_time) ); #endif /* stl_average_walltime = round_to_precision( stl_average_walltime, stl_walltime_precision ); thrust_average_walltime = round_to_precision( thrust_average_walltime, thrust_walltime_precision ); #if defined(HAVE_TBB) tbb_average_walltime = round_to_precision( tbb_average_walltime, tbb_walltime_precision ); #endif stl_walltime_uncertainty = round_to_precision( stl_walltime_uncertainty, stl_walltime_precision ); thrust_walltime_uncertainty = round_to_precision( thrust_walltime_uncertainty, thrust_walltime_precision ); #if defined(HAVE_TBB) tbb_walltime_uncertainty = round_to_precision( tbb_walltime_uncertainty, tbb_walltime_precision ); #endif */ // Round the average throughput and throughput uncertainty to the // significant figure of the throughput uncertainty. 
int stl_throughput_precision = std::max( find_significant_digit(stl_average_throughput) , find_significant_digit(stl_throughput_uncertainty) ); int thrust_throughput_precision = std::max( find_significant_digit(thrust_average_throughput) , find_significant_digit(thrust_throughput_uncertainty) ); #if defined(HAVE_TBB) int tbb_throughput_precision = std::max( find_significant_digit(tbb_average_throughput) , find_significant_digit(tbb_throughput_uncertainty) ); #endif /* stl_average_throughput = round_to_precision( stl_average_throughput, stl_throughput_precision ); thrust_average_throughput = round_to_precision( thrust_average_throughput, thrust_throughput_precision ); #if defined(HAVE_TBB) tbb_average_throughput = round_to_precision( tbb_average_throughput, tbb_throughput_precision ); #endif stl_throughput_uncertainty = round_to_precision( stl_throughput_uncertainty, stl_throughput_precision ); thrust_throughput_uncertainty = round_to_precision( thrust_throughput_uncertainty, thrust_throughput_precision ); #if defined(HAVE_TBB) tbb_throughput_uncertainty = round_to_precision( tbb_throughput_uncertainty, tbb_throughput_precision ); #endif */ std::cout << THRUST_VERSION // Thrust Version. << "," << test_name // Algorithm. << "," << element_type_name // Element Type. << "," << element_size // Element Size. << "," << elements // Elements per Trial. << "," << input_size // Total Input Size. << "," << baseline_trials // STL Trials. << "," << stl_average_walltime // STL Average Walltime. << "," << stl_walltime_uncertainty // STL Walltime Uncertainty. << "," << stl_average_throughput // STL Average Throughput. << "," << stl_throughput_uncertainty // STL Throughput Uncertainty. << "," << regular_trials // Thrust Trials. << "," << thrust_average_walltime // Thrust Average Walltime. << "," << thrust_walltime_uncertainty // Thrust Walltime Uncertainty. << "," << thrust_average_throughput // Thrust Average Throughput. 
<< "," << thrust_throughput_uncertainty // Thrust Throughput Uncertainty. #if defined(HAVE_TBB) << "," << regular_trials // TBB Trials. << "," << tbb_average_walltime // TBB Average Walltime. << "," << tbb_walltime_uncertainty // TBB Walltime Uncertainty. << "," << tbb_average_throughput // TBB Average Throughput. << "," << tbb_throughput_uncertainty // TBB Throughput Uncertainty. #endif << std::endl; } // }}} private: static experiment_results std_experiment() { return experiment::std_trial>(); } static experiment_results thrust_experiment() { return experiment::thrust_trial>(); } #if defined(HAVE_TBB) static experiment_results tbb_experiment() { return experiment::tbb_trial>(); } #endif template static experiment_results experiment() { // {{{ Trial trial; // Allocate storage and generate random input for the warmup trial. trial.setup(elements); // Warmup trial. trial(); uint64_t const trials = trial.is_baseline() ? baseline_trials : regular_trials; std::vector times; times.reserve(trials); for (uint64_t t = 0; t < trials; ++t) { // Generate random input for next trial. trial.setup(elements); steady_timer e; // Benchmark. 
e.start(); trial(); e.stop(); times.push_back(e.seconds_elapsed()); } double average_time = arithmetic_mean(times.begin(), times.end()); double stdev_time = sample_standard_deviation(times.begin(), times.end(), average_time); return experiment_results(average_time, stdev_time); } // }}} }; template < template class Test , typename ElementMetaType , uint64_t Elements , uint64_t BaselineTrials , uint64_t RegularTrials > char const* const experiment_driver< Test, ElementMetaType, Elements, BaselineTrials, RegularTrials >::test_name = Test::test_name(); template < template class Test , typename ElementMetaType , uint64_t Elements , uint64_t BaselineTrials , uint64_t RegularTrials > char const* const experiment_driver< Test, ElementMetaType, Elements, BaselineTrials, RegularTrials >::element_type_name = ElementMetaType::name(); template < template class Test , typename ElementMetaType , uint64_t Elements , uint64_t BaselineTrials , uint64_t RegularTrials > uint64_t const experiment_driver< Test, ElementMetaType, Elements, BaselineTrials, RegularTrials >::element_size = CHAR_BIT * sizeof(typename ElementMetaType::type); template < template class Test , typename ElementMetaType , uint64_t Elements , uint64_t BaselineTrials , uint64_t RegularTrials > uint64_t const experiment_driver< Test, ElementMetaType, Elements, BaselineTrials, RegularTrials >::elements = Elements; template < template class Test , typename ElementMetaType , uint64_t Elements , uint64_t BaselineTrials , uint64_t RegularTrials > double const experiment_driver< Test, ElementMetaType, Elements, BaselineTrials, RegularTrials >::input_size = double( Elements /* [elements] */ * sizeof(typename ElementMetaType::type) /* [bytes/element] */ ) / double(1024 * 1024 /* [bytes/MiB] */); template < template class Test , typename ElementMetaType , uint64_t Elements , uint64_t BaselineTrials , uint64_t RegularTrials > uint64_t const experiment_driver< Test, ElementMetaType, Elements, BaselineTrials, RegularTrials 
>::baseline_trials = BaselineTrials; template < template class Test , typename ElementMetaType , uint64_t Elements , uint64_t BaselineTrials , uint64_t RegularTrials > uint64_t const experiment_driver< Test, ElementMetaType, Elements, BaselineTrials, RegularTrials >::regular_trials = RegularTrials; /////////////////////////////////////////////////////////////////////////////// // Never create variables, pointers or references of any of the `*_trial_base` // classes. They are purely mixin base classes and do not have vtables and // virtual destructors. Using them for polymorphism instead of composition will // probably cause slicing. struct baseline_trial {}; struct regular_trial {}; template struct trial_base; template <> struct trial_base { static bool is_baseline() { return true; } }; template <> struct trial_base { static bool is_baseline() { return false; } }; template struct inplace_trial_base : trial_base { Container input; void setup(uint64_t elements) { input.resize(elements); randomize(input); } }; template struct copy_trial_base : trial_base { Container input; Container output; void setup(uint64_t elements) { input.resize(elements); output.resize(elements); randomize(input); } }; /////////////////////////////////////////////////////////////////////////////// template struct reduce_tester { static char const* test_name() { return "reduce"; } struct std_trial : inplace_trial_base, baseline_trial> { void operator()() { if (std::accumulate(this->input.begin(), this->input.end(), T(0)) == 0) // Prevent optimizer from removing body. 
std::cout << "xyz"; } }; struct thrust_trial : inplace_trial_base > { void operator()() { thrust::reduce(this->input.begin(), this->input.end()); } }; #if defined(HAVE_TBB) struct tbb_trial : inplace_trial_base > { void operator()() { tbb_reduce(this->input); } }; #endif }; template struct sort_tester { static char const* test_name() { return "sort"; } struct std_trial : inplace_trial_base, baseline_trial> { void operator()() { std::sort(this->input.begin(), this->input.end()); } }; struct thrust_trial : inplace_trial_base > { void operator()() { thrust::sort(this->input.begin(), this->input.end()); #if THRUST_DEVICE_SYSTEM == THRUST_DEVICE_SYSTEM_CUDA cudaError_t err = cudaDeviceSynchronize(); if (err != cudaSuccess) throw thrust::system_error(err, thrust::cuda_category()); #endif } }; #if defined(HAVE_TBB) struct tbb_trial : inplace_trial_base > { void operator()() { tbb_sort(this->input); } }; #endif }; template struct transform_inplace_tester { static char const* test_name() { return "transform_inplace"; } struct std_trial : inplace_trial_base, baseline_trial> { void operator()() { std::transform( this->input.begin(), this->input.end(), this->input.begin() , thrust::negate() ); } }; struct thrust_trial : inplace_trial_base > { void operator()() { thrust::transform( this->input.begin(), this->input.end(), this->input.begin() , thrust::negate() ); #if THRUST_DEVICE_SYSTEM == THRUST_DEVICE_SYSTEM_CUDA cudaError_t err = cudaDeviceSynchronize(); if (err != cudaSuccess) throw thrust::system_error(err, thrust::cuda_category()); #endif } }; #if defined(HAVE_TBB) struct tbb_trial : inplace_trial_base > { void operator()() { tbb_transform(this->input); } }; #endif }; template struct inclusive_scan_inplace_tester { static char const* test_name() { return "inclusive_scan_inplace"; } struct std_trial : inplace_trial_base, baseline_trial> { void operator()() { std::partial_sum( this->input.begin(), this->input.end(), this->input.begin() ); } }; struct thrust_trial :
inplace_trial_base > { void operator()() { thrust::inclusive_scan( this->input.begin(), this->input.end(), this->input.begin() ); #if THRUST_DEVICE_SYSTEM == THRUST_DEVICE_SYSTEM_CUDA cudaError_t err = cudaDeviceSynchronize(); if (err != cudaSuccess) throw thrust::system_error(err, thrust::cuda_category()); #endif } }; #if defined(HAVE_TBB) struct tbb_trial : inplace_trial_base > { void operator()() { tbb_scan(this->input); } }; #endif }; template struct copy_tester { static char const* test_name() { return "copy"; } struct std_trial : copy_trial_base > { void operator()() { std::copy(this->input.begin(), this->input.end(), this->output.begin()); } }; struct thrust_trial : copy_trial_base > { void operator()() { thrust::copy(this->input.begin(), this->input.end(), this->output.begin()); #if THRUST_DEVICE_SYSTEM == THRUST_DEVICE_SYSTEM_CUDA cudaError_t err = cudaDeviceSynchronize(); if (err != cudaSuccess) throw thrust::system_error(err, thrust::cuda_category()); #endif } }; #if defined(HAVE_TBB) struct tbb_trial : copy_trial_base > { void operator()() { tbb_copy(this->input, this->output); } }; #endif }; /////////////////////////////////////////////////////////////////////////////// template < typename ElementMetaType , uint64_t Elements , uint64_t BaselineTrials , uint64_t RegularTrials > void run_core_primitives_experiments_for_type() { experiment_driver< reduce_tester , ElementMetaType , Elements / sizeof(typename ElementMetaType::type) , BaselineTrials , RegularTrials >::run_experiment(); experiment_driver< transform_inplace_tester , ElementMetaType , Elements / sizeof(typename ElementMetaType::type) , BaselineTrials , RegularTrials >::run_experiment(); experiment_driver< inclusive_scan_inplace_tester , ElementMetaType , Elements / sizeof(typename ElementMetaType::type) , BaselineTrials , RegularTrials >::run_experiment(); experiment_driver< sort_tester , ElementMetaType // , Elements / sizeof(typename ElementMetaType::type) , (Elements >> 6) // Sorting is more
sensitive to element count than // memory footprint. , BaselineTrials , RegularTrials >::run_experiment(); experiment_driver< copy_tester , ElementMetaType , Elements / sizeof(typename ElementMetaType::type) , BaselineTrials , RegularTrials >::run_experiment(); } /////////////////////////////////////////////////////////////////////////////// #define DEFINE_ELEMENT_META_TYPE(T) \ struct PP_CAT(T, _meta) \ { \ typedef T type; \ \ static char const* name() { return PP_STRINGIZE(T); } \ }; \ /**/ DEFINE_ELEMENT_META_TYPE(char); DEFINE_ELEMENT_META_TYPE(int); DEFINE_ELEMENT_META_TYPE(int8_t); DEFINE_ELEMENT_META_TYPE(int16_t); DEFINE_ELEMENT_META_TYPE(int32_t); DEFINE_ELEMENT_META_TYPE(int64_t); DEFINE_ELEMENT_META_TYPE(float); DEFINE_ELEMENT_META_TYPE(double); /////////////////////////////////////////////////////////////////////////////// template < uint64_t Elements , uint64_t BaselineTrials , uint64_t RegularTrials > void run_core_primitives_experiments() { run_core_primitives_experiments_for_type< char_meta, Elements, BaselineTrials, RegularTrials >(); run_core_primitives_experiments_for_type< int_meta, Elements, BaselineTrials, RegularTrials >(); run_core_primitives_experiments_for_type< int8_t_meta, Elements, BaselineTrials, RegularTrials >(); run_core_primitives_experiments_for_type< int16_t_meta, Elements, BaselineTrials, RegularTrials >(); run_core_primitives_experiments_for_type< int32_t_meta, Elements, BaselineTrials, RegularTrials >(); run_core_primitives_experiments_for_type< int64_t_meta, Elements, BaselineTrials, RegularTrials >(); run_core_primitives_experiments_for_type< float_meta, Elements, BaselineTrials, RegularTrials >(); run_core_primitives_experiments_for_type< double_meta, Elements, BaselineTrials, RegularTrials >(); } /////////////////////////////////////////////////////////////////////////////// // XXX Use `std::string_view` when possible. 
std::vector split(std::string const& str, std::string const& delim) { std::vector tokens; std::string::size_type prev = 0, pos = 0; do { pos = str.find(delim, prev); if (pos == std::string::npos) pos = str.length(); std::string token = str.substr(prev, pos - prev); if (!token.empty()) tokens.push_back(token); prev = pos + delim.length(); } while (pos < str.length() && prev < str.length()); return tokens; } /////////////////////////////////////////////////////////////////////////////// struct command_line_option_error : std::exception { virtual ~command_line_option_error() NOEXCEPT {} virtual const char* what() const NOEXCEPT = 0; }; struct only_one_option_allowed : command_line_option_error { // Construct a new `only_one_option_allowed` exception. `key` is the // option name and `[first, last)` is a sequence of // `std::pair`s (the values). template only_one_option_allowed(std::string const& key, InputIt first, InputIt last) : message() { message = "Only one `--"; message += key; message += "` option is allowed, but multiple were received: "; for (; first != last; ++first) { message += "`"; message += (*first).second; message += "` "; } // Remove the trailing space added by the last iteration of the above loop. message.erase(message.size() - 1, 1); message += "."; } virtual ~only_one_option_allowed() NOEXCEPT {} virtual const char* what() const NOEXCEPT { return message.c_str(); } private: std::string message; }; struct required_option_missing : command_line_option_error { // Construct a new `required_option_missing` exception. `key` is the
required_option_missing(std::string const& key) : message() { message = "`--"; message += key; message += "` option is required."; } virtual ~required_option_missing() NOEXCEPT {} virtual const char* what() const NOEXCEPT { return message.c_str(); } private: std::string message; }; struct command_line_processor { typedef std::vector positional_options_type; typedef std::multimap keyword_options_type; typedef std::pair< keyword_options_type::const_iterator , keyword_options_type::const_iterator > keyword_option_values; command_line_processor(int argc, char** argv) : pos_args(), kw_args() { // {{{ for (int i = 1; i < argc; ++i) { std::string arg(argv[i]); // Look for --key or --key=value options. if (arg.substr(0, 2) == "--") { std::string::size_type n = arg.find('=', 2); keyword_options_type::value_type key_value; if (n == std::string::npos) // --key kw_args.insert(keyword_options_type::value_type( arg.substr(2), "" )); else // --key=value kw_args.insert(keyword_options_type::value_type( arg.substr(2, n - 2), arg.substr(n + 1) )); kw_args.insert(key_value); } else // Assume it's positional. pos_args.push_back(arg); } } // }}} // Return the value for option `key`. // // Throws: // * `only_one_option_allowed` if there is more than one value for `key`. // * `required_option_missing` if there is no value for `key`. std::string operator()(std::string const& key) const { keyword_option_values v = kw_args.equal_range(key); keyword_options_type::difference_type d = std::distance(v.first, v.second); if (1 < d) // Too many options. throw only_one_option_allowed(key, v.first, v.second); else if (0 == d) // No option. throw required_option_missing(key); return (*v.first).second; } // Return the value for option `key`, or `dflt` if `key` has no value. // // Throws: `only_one_option_allowed` if there is more than one value for `key`. 
std::string operator()(std::string const& key, std::string const& dflt) const { keyword_option_values v = kw_args.equal_range(key); keyword_options_type::difference_type d = std::distance(v.first, v.second); if (1 < d) // Too many options. throw only_one_option_allowed(key, v.first, v.second); if (0 == d) // No option. return dflt; else // 1 option. return (*v.first).second; } // Returns `true` if the option `key` was specified at least once. bool has(std::string const& key) const { return kw_args.count(key) > 0; } private: positional_options_type pos_args; keyword_options_type kw_args; }; /////////////////////////////////////////////////////////////////////////////// int main(int argc, char** argv) { command_line_processor clp(argc, argv); #if defined(HAVE_TBB) tbb::task_scheduler_init init; test_tbb(); #endif #if THRUST_DEVICE_SYSTEM == THRUST_DEVICE_SYSTEM_CUDA // Set the CUDA device to use for the benchmark - `0` by default. int device = std::atoi(clp("device", "0").c_str()); // `std::atoi` returns 0 if the conversion fails. cudaSetDevice(device); #endif if (!clp.has("no-header")) print_experiment_header(); /* Elements | Trials */ /* | Baseline | Regular */ //run_core_primitives_experiments< 1LLU << 21LLU , 4 , 16 >(); //run_core_primitives_experiments< 1LLU << 22LLU , 4 , 16 >(); //run_core_primitives_experiments< 1LLU << 23LLU , 4 , 16 >(); //run_core_primitives_experiments< 1LLU << 24LLU , 4 , 16 >(); //run_core_primitives_experiments< 1LLU << 25LLU , 4 , 16 >(); run_core_primitives_experiments< 1LLU << 26LLU , 4 , 16 >(); run_core_primitives_experiments< 1LLU << 27LLU , 4 , 16 >(); //run_core_primitives_experiments< 1LLU << 28LLU , 4 , 16 >(); //run_core_primitives_experiments< 1LLU << 29LLU , 4 , 16 >(); return 0; } // TODO: Add different input sizes and half precision thrust-1.9.5/internal/benchmark/bench.mk000066400000000000000000000007101344621116200202330ustar00rootroot00000000000000# XXX Use the common Thrust Makefiles instead of this. 
EXECUTABLE := bench BUILD_SRC := $(ROOTDIR)/thrust/internal/benchmark/bench.cu ifeq ($(OS),Linux) LIBRARIES += m endif # XXX Why is this needed? ifeq ($(OS),Linux) ifeq ($(ABITYPE), androideabi) override ALL_SASS_ARCHITECTURES := 32 endif endif ARCH_NEG_FILTER += 20 21 include $(ROOTDIR)/thrust/internal/build/common_detect.mk include $(ROOTDIR)/thrust/internal/build/common_build.mk thrust-1.9.5/internal/benchmark/combine_benchmark_results.py000077500000000000000000000654431344621116200244250ustar00rootroot00000000000000#! /usr/bin/env python # -*- coding: utf-8 -*- ############################################################################### # Copyright (c) 2012-7 Bryce Adelstein Lelbach aka wash # # Distributed under the Boost Software License, Version 1.0. (See accompanying # file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) ############################################################################### ############################################################################### # Copyright (c) 2018 NVIDIA Corporation # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. ############################################################################### # XXX Put code shared with `compare_benchmark_results.py` in a common place. # XXX Relative uncertainty. from sys import exit, stdout from os.path import splitext from itertools import imap # Lazy map. 
from math import sqrt, log10, floor from collections import deque from argparse import ArgumentParser as argument_parser from csv import DictReader as csv_dict_reader from csv import DictWriter as csv_dict_writer from re import compile as regex_compile ############################################################################### def unpack_tuple(f): """Return a unary function that calls `f` with its argument unpacked.""" return lambda args: f(*iter(args)) def strip_dict(d): """Strip leading and trailing whitespace from all values in `d`.""" d.update({key: value.strip() for (key, value) in d.items()}) def merge_dicts(d0, d1): """Create a new `dict` that is the union of `dict`s `d0` and `d1`.""" d = d0.copy() d.update(d1) return d def strip_list(l): """Strip leading and trailing whitespace from all values in `l`.""" for i, value in enumerate(l): l[i] = value.strip() ############################################################################### def int_or_float(x): """Convert `x` to either `int` or `float`, preferring `int`. Raises: ValueError : If `x` is not convertible to either `int` or `float`. """ try: return int(x) except ValueError: return float(x) def try_int_or_float(x): """Try to convert `x` to either `int` or `float`, preferring `int`. `x` is returned unmodified if conversion fails. """ try: return int_or_float(x) except ValueError: return x ############################################################################### def find_significant_digit(x): """Return the significant digit of the number x. The result is the number of digits after the decimal place to round to (negative numbers indicate rounding before the decimal place).""" if x == 0: return 0 return -int(floor(log10(abs(x)))) def round_with_int_conversion(x, ndigits = None): """Rounds `x` to `ndigits` after the decimal place. If `ndigits` is less than 1, convert the result to `int`.
  If `ndigits` is `None`, the significant digit of `x` is used."""
  if ndigits is None: ndigits = find_significant_digit(x)
  x_rounded = round(x, ndigits)
  return int(x_rounded) if ndigits < 1 else x_rounded

###############################################################################

class measured_variable(object):
  """A meta-variable representing measured data. It is composed of three raw
  variables plus units meta-data.

  Attributes:
    quantity (`str`) :
      Name of the quantity variable of this object.
    uncertainty (`str`) :
      Name of the uncertainty variable of this object.
    sample_size (`str`) :
      Name of the sample size variable of this object.
    units (units class or `None`) :
      The units the value is measured in.
  """

  def __init__(self, quantity, uncertainty, sample_size, units = None):
    self.quantity = quantity
    self.uncertainty = uncertainty
    self.sample_size = sample_size
    self.units = units

  def as_tuple(self):
    return (self.quantity, self.uncertainty, self.sample_size, self.units)

  def __iter__(self):
    return iter(self.as_tuple())

  def __str__(self):
    return str(self.as_tuple())

  def __repr__(self):
    return str(self)

class measured_value(object):
  """An object that represents a value determined by multiple measurements.

  Attributes:
    quantity (scalar) :
      The quantity of the value, e.g. the arithmetic mean.
    uncertainty (scalar) :
      The measurement uncertainty, e.g. the sample standard deviation.
    sample_size (`int`) :
      The number of observations contributing to the value.
    units (units class or `None`) :
      The units the value is measured in.
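
  Example:
    A minimal sketch, assuming Python 3 and its standard `statistics` module,
    of how one such value might be derived from raw observations. The
    walltimes are hypothetical; `statistics.stdev` computes the sample (n - 1)
    standard deviation, matching the example given for `uncertainty` above.

```python
from statistics import mean, stdev

# Hypothetical walltimes (in seconds) from four trials of one benchmark.
observations = [1.0, 2.0, 3.0, 4.0]

# The three components of a measured value, in `as_tuple` order (minus units).
quantity = mean(observations)       # arithmetic mean
uncertainty = stdev(observations)   # sample standard deviation
sample_size = len(observations)     # number of observations

print(quantity, uncertainty, sample_size)
```
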
  """

  def __init__(self, quantity, uncertainty, sample_size = 1, units = None):
    self.quantity = quantity
    self.uncertainty = uncertainty
    self.sample_size = sample_size
    self.units = units

  def as_tuple(self):
    return (self.quantity, self.uncertainty, self.sample_size, self.units)

  def __iter__(self):
    return iter(self.as_tuple())

  def __str__(self):
    return str(self.as_tuple())

  def __repr__(self):
    return str(self)

###############################################################################

def arithmetic_mean(X):
  r"""Computes the arithmetic mean of the sequence `X`.

  Let:

    * `n = len(X)`.
    * `u` denote the arithmetic mean of `X`.

  .. math::

    u = \frac{\sum_{i = 0}^{n - 1} X_i}{n}
  """
  return sum(X) / len(X)

def sample_variance(X, u = None):
  r"""Computes the sample variance of the sequence `X`.

  Let:

    * `n = len(X)`.
    * `u` denote the arithmetic mean of `X`.
    * `v` denote the sample variance of `X`.

  .. math::

    v = \frac{\sum_{i = 0}^{n - 1} (X_i - u)^2}{n - 1}

  Args:
    X (`Iterable`) : The sequence of values.
    u (number) : The arithmetic mean of `X`.
  """
  if u is None: u = arithmetic_mean(X)
  return sum(imap(lambda X_i: (X_i - u) ** 2, X)) / (len(X) - 1)

def sample_standard_deviation(X, u = None, v = None):
  r"""Computes the sample standard deviation of the sequence `X`.

  Let:

    * `n = len(X)`.
    * `u` denote the arithmetic mean of `X`.
    * `v` denote the sample variance of `X`.
    * `s` denote the sample standard deviation of `X`.

  .. math::

    s &= \sqrt{v}
      &= \sqrt{\frac{\sum_{i = 0}^{n - 1} (X_i - u)^2}{n - 1}}

  Args:
    X (`Iterable`) : The sequence of values.
    u (number) : The arithmetic mean of `X`.
    v (number) : The sample variance of `X`.
  """
  if u is None: u = arithmetic_mean(X)
  if v is None: v = sample_variance(X, u)
  return sqrt(v)

def combine_sample_size(As):
  r"""Computes the combined sample size of a group of `measured_value`s.

  Let:

    * `g = len(As)`.
    * `n_i = As[i].sample_size`.
    * `n` denote the combined sample size of `As`.

  .. math::

    n = \sum_{i = 0}^{g - 1} n_i
  """
  return sum(imap(unpack_tuple(lambda u_i, s_i, n_i, t_i: n_i), As))

def combine_arithmetic_mean(As, n = None):
  r"""Computes the combined arithmetic mean of a group of `measured_value`s.

  Let:

    * `g = len(As)`.
    * `u_i = As[i].quantity`.
    * `n_i = As[i].sample_size`.
    * `n` denote the combined sample size of `As`.
    * `u` denote the combined (sample-size-weighted) arithmetic mean of `As`.

  .. math::

    u = \frac{\sum_{i = 0}^{g - 1} n_i u_i}{n}
  """
  if n is None: n = combine_sample_size(As)
  return sum(imap(unpack_tuple(lambda u_i, s_i, n_i, t_i: n_i * u_i), As)) / n

def combine_sample_variance(As, n = None, u = None):
  r"""Computes the combined sample variance of a group of `measured_value`s.

  Let:

    * `g = len(As)`.
    * `u_i = As[i].quantity`.
    * `s_i = As[i].uncertainty`.
    * `n_i = As[i].sample_size`.
    * `n` denote the combined sample size of `As`.
    * `u` denote the combined arithmetic mean of `As`.
    * `v` denote the combined sample variance of `As`.

  .. math::

    v = \frac{\sum_{i = 0}^{g - 1} \left( n_i (u_i - u)^2 + s_i^2 (n_i - 1) \right)}{n - 1}

  Args:
    As (`Iterable` of `measured_value`s) : The sequence of values.
    n (number) : The combined sample sizes of `As`.
    u (number) : The combined arithmetic mean of `As`.
  """
  # `n` must be computed before it is compared; `None <= 1` is `True` in
  # Python 2, which would make this function always return 0.
  if n is None: n = combine_sample_size(As)
  if n <= 1: return 0
  if u is None: u = combine_arithmetic_mean(As, n)
  return sum(imap(unpack_tuple(
    lambda u_i, s_i, n_i, t_i: n_i * (u_i - u) ** 2 + (s_i ** 2) * (n_i - 1)
  ), As)) / (n - 1)

def combine_sample_standard_deviation(As, n = None, u = None, v = None):
  r"""Computes the combined sample standard deviation of a group of
  `measured_value`s.

  Let:

    * `g = len(As)`.
    * `u_i = As[i].quantity`.
    * `s_i = As[i].uncertainty`.
    * `n_i = As[i].sample_size`.
    * `n` denote the combined sample size of `As`.
    * `u` denote the combined arithmetic mean of `As`.
    * `v` denote the combined sample variance of `As`.
    * `s` denote the combined sample standard deviation of `As`.

  .. math::

    s &= \sqrt{v}
      &= \sqrt{\frac{\sum_{i = 0}^{g - 1} \left( n_i (u_i - u)^2 + s_i^2 (n_i - 1) \right)}{n - 1}}

  Args:
    As (`Iterable` of `measured_value`s) : The sequence of values.
    n (number) : The combined sample sizes of `As`.
    u (number) : The combined arithmetic mean of `As`.
    v (number) : The combined sample variance of `As`.
  """
  if n is None: n = combine_sample_size(As)
  if n <= 1: return 0
  if u is None: u = combine_arithmetic_mean(As, n)
  if v is None: v = combine_sample_variance(As, n, u)
  return sqrt(v)

###############################################################################

def process_program_arguments():
  ap = argument_parser(
    description = (
      "Aggregates the results of multiple runs of benchmark results stored in "
      "CSV format."
    )
  )

  ap.add_argument(
    "-d", "--dependent-variable",
    help = ("Treat the specified three variables as a dependent variable. The "
            "1st variable is the measured quantity, the 2nd is the uncertainty "
            "of the measurement and the 3rd is the sample size. The defaults "
            "are the dependent variables of Thrust's benchmark suite. May be "
            "specified multiple times."),
    action = "append", type = str, dest = "dependent_variables",
    metavar = "QUANTITY,UNCERTAINTY,SAMPLES"
  )

  ap.add_argument(
    "-p", "--preserve-whitespace",
    help = ("Don't trim leading and trailing whitespace from each CSV cell."),
    action = "store_true", default = False
  )

  ap.add_argument(
    "-o", "--output-file",
    help = ("The file that results are written to. If `-`, results are "
            "written to stdout."),
    action = "store", type = str, default = "-",
    metavar = "OUTPUT"
  )

  ap.add_argument(
    "input_files",
    help = ("Input CSV files. The first two rows should be a header. The 1st "
            "header row specifies the name of each variable, and the 2nd "
            "header row specifies the units for that variable."),
    type = str, nargs = "+",
    metavar = "INPUTS"
  )

  return ap.parse_args()

###############################################################################

def filter_comments(f, s = "#"):
  """Return an iterator to the file `f` which filters out all lines beginning
  with `s`."""
  return filter(lambda line: not line.startswith(s), f)

###############################################################################

class io_manager(object):
  """Manages I/O operations and represents the input data as an `Iterable`
  sequence of `dict`s.

  It is `Iterable` and an `Iterator`. It can be used with `with`.

  Attributes:
    preserve_whitespace (`bool`) :
      If `False`, leading and trailing whitespace is stripped from each CSV cell.
    writer (`csv_dict_writer`) :
      CSV writer object that the output is written to.
    output_file (`file` or `stdout`) :
      The output `file` object.
    readers (`list` of `csv_dict_reader`s) :
      List of input files as CSV reader objects.
    input_files (list of `file`s) :
      List of input `file` objects.
    variable_names (`list` of `str`s) :
      Names of the variables, in order.
    variable_units (`dict` of `str`s) :
      Mapping of each variable name to its units.
  """

  def __init__(self, input_files, output_file, preserve_whitespace = True):
    """Read the input files, open the output file and construct a new
    `io_manager` object.

    If `preserve_whitespace` is `False`, leading and trailing whitespace is
    stripped from each CSV cell.

    Raises:
      AssertionError :
        If `len(input_files) <= 0` or `type(preserve_whitespace) != bool`.
    """
    assert len(input_files) > 0, "No input files provided."
    assert type(preserve_whitespace) == bool

    self.preserve_whitespace = preserve_whitespace

    self.readers = deque()

    self.variable_names = None
    self.variable_units = None

    self.input_files = deque()

    for input_file in input_files:
      input_file_object = open(input_file)
      reader = csv_dict_reader(filter_comments(input_file_object))

      if not self.preserve_whitespace:
        strip_list(reader.fieldnames)

      if self.variable_names is None:
        self.variable_names = reader.fieldnames
      else:
        # Make sure all inputs have the same schema.
        assert self.variable_names == reader.fieldnames, \
          "Input file (`" + input_file + "`) variable schema `" + \
          str(reader.fieldnames) + "` does not match the variable schema `" + \
          str(self.variable_names) + "`."

      # Consume the next row, which should be the second line of the header.
      variable_units = reader.next()

      if not self.preserve_whitespace:
        strip_dict(variable_units)

      if self.variable_units is None:
        self.variable_units = variable_units
      else:
        # Make sure all inputs have the same units schema.
        assert self.variable_units == variable_units, \
          "Input file (`" + input_file + "`) units schema `" + \
          str(variable_units) + "` does not match the units schema `" + \
          str(self.variable_units) + "`."

      self.readers.append(reader)
      self.input_files.append(input_file_object)

    if output_file == "-": # Output to stdout.
      self.output_file = stdout
    else: # Output to user-specified file.
      self.output_file = open(output_file, "w")

    self.writer = csv_dict_writer(
      self.output_file, fieldnames = self.variable_names
    )

  def __enter__(self):
    """Called upon entering a `with` statement."""
    return self

  def __exit__(self, *args):
    """Called upon exiting a `with` statement."""
    if self.output_file is stdout:
      self.output_file = None
    elif self.output_file is not None:
      self.output_file.__exit__(*args)

    for input_file in self.input_files:
      input_file.__exit__(*args)

  #############################################################################
  # Input Stream.
  def __iter__(self):
    """Return an iterator to the input sequence.

    This is a requirement for the `Iterable` protocol.
    """
    return self

  def next(self):
    """Consume and return the next record (a `dict` representing a CSV row) in
    the input.

    This is a requirement for the `Iterator` protocol.

    Raises:
      StopIteration : If there is no more input.
    """
    if len(self.readers) == 0:
      raise StopIteration()

    try:
      row = self.readers[0].next()
      if not self.preserve_whitespace: strip_dict(row)
      return row
    except StopIteration:
      # The current reader is empty, so pop it, pop its input file, close the
      # input file, and then call ourselves again.
      self.readers.popleft()
      self.input_files.popleft().close()
      return self.next()

  #############################################################################
  # Output.

  def write_header(self):
    """Write the header for the output CSV file."""
    # Write the first line of the header.
    self.writer.writeheader()

    # Write the second line of the header.
    self.writer.writerow(self.variable_units)

  def write(self, d):
    """Write a record (a `dict`) to the output CSV file."""
    self.writer.writerow(d)

###############################################################################

class dependent_variable_parser(object):
  """Parses a `--dependent-variable=AVG,STDEV,TRIALS` command line argument."""

  #############################################################################
  # Grammar

  # Parse a variable_name.
  variable_name_rule = r'[^,]+'

  # Parse a variable classification.
  dependent_variable_rule = r'(' + variable_name_rule + r')' \
                          + r',' \
                          + r'(' + variable_name_rule + r')' \
                          + r',' \
                          + r'(' + variable_name_rule + r')'

  engine = regex_compile(dependent_variable_rule)

  #############################################################################

  def __call__(self, s):
    """Parses the string `s` with the form "AVG,STDEV,TRIALS".

    Returns:
      A `measured_variable`.

    Raises:
      AssertionError : If parsing fails.
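
    Example:
      A standalone sketch (Python 3 `re`) that rebuilds the same grammar to
      show how a `-d` argument is split. Because `match` only anchors at the
      start of the string, a fourth comma-separated field is silently ignored;
      `fullmatch` (Python 3.4+) would reject it instead.

```python
import re

# Same grammar as `dependent_variable_parser`: three comma-separated,
# non-empty names that cannot themselves contain commas.
variable_name_rule = '[^,]+'
rule = ('(' + variable_name_rule + '),(' + variable_name_rule + '),('
        + variable_name_rule + ')')
engine = re.compile(rule)

m = engine.match('Thrust Average Walltime,Thrust Walltime Uncertainty,Thrust Trials')
print(m.group(1), '|', m.group(2), '|', m.group(3))

# An extra field is dropped by `match` but rejected by `fullmatch`.
print(engine.match('A,B,C,D').groups())
print(engine.fullmatch('A,B,C,D'))
```
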
    """
    match = self.engine.match(s)

    assert match is not None, \
      "Dependent variable (-d) `" + s + "` is invalid, the format is " + \
      "`AVG,STDEV,TRIALS`."

    return measured_variable(match.group(1), match.group(2), match.group(3))

###############################################################################

class record_aggregator(object):
  """Consumes and combines records and represents the result as an `Iterable`
  sequence of `dict`s.

  It is `Iterable` and an `Iterator`.

  Attributes:
    dependent_variables (`list` of `measured_variable`s) :
      A list of dependent variables provided on the command line.
    dataset (`dict`) :
      A mapping of distinguishing (e.g. control + independent) values (`tuple`s
      of variable-quantity pairs) to dependent values (`dict`s from dependent
      variables to `list`s of cells).
    in_order_dataset_keys :
      A list of unique dataset keys (e.g. distinguishing variables) in order of
      appearance.
  """

  parse_dependent_variable = dependent_variable_parser()

  def __init__(self, raw_dependent_variables):
    """Parse dependent variables and construct a new `record_aggregator` object.

    Raises:
      AssertionError : If parsing of dependent variables fails.
    """
    self.dependent_variables = []

    if raw_dependent_variables is not None:
      for variable in raw_dependent_variables:
        self.dependent_variables.append(self.parse_dependent_variable(variable))

    self.dataset = {}

    self.in_order_dataset_keys = deque()

  #############################################################################
  # Insertion.

  def append(self, record):
    """Add `record` to the dataset.

    Raises:
      ValueError : If any `str`-to-numeric conversions fail.
    """
    # The distinguishing variables are the control and independent variables.
    # They form the key for each record in the dataset. Records with the same
    # distinguishing variables are treated as observations of the same data
    # point.
    dependent_values = {}

    # To allow the same sample size variable to be used for multiple dependent
    # variables, we don't pop sample size variables until we're done processing
    # all variables.
    sample_size_variables = []

    # Separate the dependent values from the distinguishing variables and
    # perform `str`-to-numeric conversions.
    for variable in self.dependent_variables:
      quantity, uncertainty, sample_size, units = variable.as_tuple()

      dependent_values[quantity] = [int_or_float(record.pop(quantity))]
      dependent_values[uncertainty] = [int_or_float(record.pop(uncertainty))]
      dependent_values[sample_size] = [int(record[sample_size])]

      sample_size_variables.append(sample_size)

    # Pop sample size variables.
    for sample_size_variable in sample_size_variables:
      # Allowed to fail, as we may have duplicates.
      record.pop(sample_size_variable, None)

    # `dict`s aren't hashable, so create a tuple of key-value pairs.
    distinguishing_values = tuple(record.items())

    if distinguishing_values in self.dataset:
      # These distinguishing values already exist, so get the `dict` they're
      # mapped to, look up each key in `dependent_values` in the `dict`, and
      # add the corresponding quantity in `dependent_values` to the list in the
      # `dict`.
      for variable, columns in dependent_values.iteritems():
        self.dataset[distinguishing_values][variable] += columns
    else:
      # These distinguishing values aren't in the dataset, so add them and
      # record them in `in_order_dataset_keys`.
      self.dataset[distinguishing_values] = dependent_values
      self.in_order_dataset_keys.append(distinguishing_values)

  #############################################################################
  # Postprocessing.

  def combine_dependent_values(self, dependent_values):
    """Takes a mapping of dependent variables to lists of cells and returns
    a new mapping with the cells combined.

    Raises:
      AssertionError : If class invariants were violated.
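
    Example:
      A standalone sketch (Python 3) of the pooling this method performs via
      `combine_sample_size`, `combine_arithmetic_mean` and
      `combine_sample_standard_deviation`. The helper `combine` and the two
      runs are hypothetical; the point is that pooling per-run
      (mean, standard deviation, count) triples gives the same result as
      computing the statistics over all raw observations at once.

```python
from math import sqrt
from statistics import mean, stdev

def combine(groups):
  # `groups` holds (quantity, uncertainty, sample_size) triples, one per run.
  n = sum(n_i for _, _, n_i in groups)
  u = sum(n_i * u_i for u_i, _, n_i in groups) / n
  if n <= 1:
    return (u, 0.0, n)
  # Between-run spread plus within-run spread, over n - 1 degrees of freedom.
  v = sum(n_i * (u_i - u) ** 2 + (n_i - 1) * s_i ** 2
          for u_i, s_i, n_i in groups) / (n - 1)
  return (u, sqrt(v), n)

# Two hypothetical benchmark runs of the same configuration.
run_a = [1.0, 2.0, 3.0]
run_b = [4.0, 5.0, 6.0, 7.0]
groups = [(mean(r), stdev(r), len(r)) for r in (run_a, run_b)]

pooled = combine(groups)
direct = (mean(run_a + run_b), stdev(run_a + run_b), len(run_a + run_b))
print(pooled)
print(direct)
```
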
    """
    combined_dependent_values = dependent_values.copy()

    for variable in self.dependent_variables:
      quantity, uncertainty, sample_size, units = variable.as_tuple()

      quantities = dependent_values[quantity]
      uncertainties = dependent_values[uncertainty]
      sample_sizes = dependent_values[sample_size]

      if type(sample_sizes) is list:
        # Sample size hasn't been combined yet.
        assert len(quantities) == len(uncertainties) \
          and len(uncertainties) == len(sample_sizes), \
          "Length of quantities list `(" + str(len(quantities)) + ")`, " + \
          "length of uncertainties list `(" + str(len(uncertainties)) + \
          ")`, and length of sample sizes list `(" + str(len(sample_sizes)) + \
          ")` are not the same."
      else:
        # Another dependent variable that uses our sample size has combined it
        # already.
        assert len(quantities) == len(uncertainties), \
          "Length of quantities list `(" + str(len(quantities)) + ")` and " + \
          "length of uncertainties list `(" + str(len(uncertainties)) + \
          ")` are not the same."

      # Convert the three separate `list`s into one list of `measured_value`s.
      measured_values = []

      for i in range(len(quantities)):
        mv = measured_value(
          quantities[i], uncertainties[i], sample_sizes[i], units
        )

        measured_values.append(mv)

      # Combine the `measured_value`s.
      combined_sample_size = combine_sample_size(
        measured_values
      )

      combined_arithmetic_mean = combine_arithmetic_mean(
        measured_values, combined_sample_size
      )

      combined_sample_standard_deviation = combine_sample_standard_deviation(
        measured_values, combined_sample_size, combined_arithmetic_mean
      )

      # Round the quantity and uncertainty to the significant digit of
      # uncertainty and insert the combined values into the results.
      sigdig = find_significant_digit(combined_sample_standard_deviation)

      # combined_arithmetic_mean = round_with_int_conversion(
      #   combined_arithmetic_mean, sigdig
      # )

      # combined_sample_standard_deviation = round_with_int_conversion(
      #   combined_sample_standard_deviation, sigdig
      # )

      combined_dependent_values[quantity] = combined_arithmetic_mean
      combined_dependent_values[uncertainty] = combined_sample_standard_deviation
      combined_dependent_values[sample_size] = combined_sample_size

    return combined_dependent_values

  #############################################################################
  # Output Stream.

  def __iter__(self):
    """Return an iterator to the output sequence of separated distinguishing
    variables and dependent variables (a tuple of two `dict`s).

    This is a requirement for the `Iterable` protocol.
    """
    return self

  def records(self):
    """Return an iterator to the output sequence of CSV rows (`dict`s of
    variables to values).
    """
    return imap(unpack_tuple(lambda dist, dep: merge_dicts(dist, dep)), self)

  def next(self):
    """Produce the components of the next output record: a tuple of two
    `dict`s. The first `dict` is a mapping of distinguishing variables to
    distinguishing values, the second `dict` is a mapping of dependent
    variables to combined dependent values. Combining the two dicts forms a
    CSV row suitable for output.

    This is a requirement for the `Iterator` protocol.

    Raises:
      StopIteration : If there is no more output.
      AssertionError : If class invariants were violated.
    """
    assert len(self.dataset.keys()) == len(self.in_order_dataset_keys), \
      "Number of dataset keys (`" + str(len(self.dataset.keys())) + \
      "`) is not equal to the number of keys in the ordering list (`" + \
      str(len(self.in_order_dataset_keys)) + "`)."

    if len(self.in_order_dataset_keys) == 0:
      raise StopIteration()

    # Get the next set of distinguishing values and convert them to a `dict`.
    raw_distinguishing_values = self.in_order_dataset_keys.popleft()
    distinguishing_values = dict(raw_distinguishing_values)

    dependent_values = self.dataset.pop(raw_distinguishing_values)

    combined_dependent_values = self.combine_dependent_values(dependent_values)

    return (distinguishing_values, combined_dependent_values)

###############################################################################

args = process_program_arguments()

if args.dependent_variables is None:
  args.dependent_variables = [
    "STL Average Walltime,STL Walltime Uncertainty,STL Trials",
    "STL Average Throughput,STL Throughput Uncertainty,STL Trials",
    "Thrust Average Walltime,Thrust Walltime Uncertainty,Thrust Trials",
    "Thrust Average Throughput,Thrust Throughput Uncertainty,Thrust Trials"
  ]

# Read input files and open the output file.
with io_manager(args.input_files,
                args.output_file,
                args.preserve_whitespace) as iom:
  # Parse dependent variable options.
  ra = record_aggregator(args.dependent_variables)

  # Add all input data to the `record_aggregator`.
  for record in iom:
    ra.append(record)

  iom.write_header()

  # Write combined results out.
  for record in ra.records():
    iom.write(record)

thrust-1.9.5/internal/benchmark/compare_benchmark_results.py000077500000000000000000001311231344621116200244240ustar00rootroot00000000000000
#! /usr/bin/env python
# -*- coding: utf-8 -*-

###############################################################################
# Copyright (c) 2012-7 Bryce Adelstein Lelbach aka wash
#
# Distributed under the Boost Software License, Version 1.0. (See accompanying
# file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
###############################################################################

###############################################################################
# Copyright (c) 2018 NVIDIA Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
###############################################################################

# XXX Put code shared with `combine_benchmark_results.py` in a common place.

# XXX Relative uncertainty.

# XXX Create uncertain value class which is quantity + uncertainty.

from sys import exit, stdout
from os.path import splitext
from itertools import imap # Lazy map.
from math import sqrt, log10, floor
from collections import deque
from argparse import ArgumentParser as argument_parser
from argparse import Action as argument_action
from csv import DictReader as csv_dict_reader
from csv import DictWriter as csv_dict_writer
from re import compile as regex_compile

###############################################################################

def unpack_tuple(f):
  """Return a unary function that calls `f` with its argument unpacked."""
  return lambda args: f(*iter(args))

def strip_dict(d):
  """Strip leading and trailing whitespace from all values in `d`.

  Returns:
    The modified dict `d`.
  """
  d.update({key: value.strip() for (key, value) in d.items()})
  return d

def merge_dicts(d0, d1):
  """Create a new `dict` that is the union of `dict`s `d0` and `d1`."""
  d = d0.copy()
  d.update(d1)
  return d

def change_key_in_dict(d, old_key, new_key):
  """Change the key of the entry in `d` with key `old_key` to `new_key`. If
  there is an existing entry with key `new_key`, it is overwritten.

  Returns:
    The modified dict `d`.

  Raises:
    KeyError : If `old_key` is not in `d`.
  """
  d[new_key] = d.pop(old_key)
  return d

def key_from_dict(d):
  """Create a hashable key from a `dict` by converting the `dict` to a tuple."""
  return tuple(sorted(d.items()))

def strip_list(l):
  """Strip leading and trailing whitespace from all values in `l`."""
  for i, value in enumerate(l):
    l[i] = value.strip()
  return l

def remove_from_list(l, item):
  """Remove the first occurrence of `item` from list `l` and return a tuple of
  the index that was removed and the element that was removed.

  Raises:
    ValueError : If `item` is not in `l`.
  """
  idx = l.index(item)
  item = l.pop(idx)
  return (idx, item)

###############################################################################

def int_or_float(x):
  """Convert `x` to either `int` or `float`, preferring `int`.

  Raises:
    ValueError : If `x` is not convertible to either `int` or `float`.
  """
  try:
    return int(x)
  except ValueError:
    return float(x)

def try_int_or_float(x):
  """Try to convert `x` to either `int` or `float`, preferring `int`. `x` is
  returned unmodified if conversion fails.
  """
  try:
    return int_or_float(x)
  except ValueError:
    return x

###############################################################################

def ranges_overlap(x1, x2, y1, y2):
  """Returns true if the ranges `[x1, x2]` and `[y1, y2]` overlap,
  where `x1 <= x2` and `y1 <= y2`.

  Raises:
    AssertionError : If `x1 > x2` or `y1 > y2`.
  """
  assert x1 <= x2
  assert y1 <= y2
  return x1 <= y2 and y1 <= x2

def ranges_overlap_uncertainty(x, x_unc, y, y_unc):
  """Returns true if the ranges `[x - x_unc, x + x_unc]` and
  `[y - y_unc, y + y_unc]` overlap, where `x_unc >= 0` and `y_unc >= 0`.

  Raises:
    AssertionError : If `x_unc < 0` or `y_unc < 0`.
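
  Example:
    A standalone sketch (Python 3) of the overlap test with hypothetical
    numbers. Reading "the uncertainty ranges overlap" as "no statistically
    significant change" is presumably how this script uses it, but that is an
    assumption here.

```python
def ranges_overlap(x1, x2, y1, y2):
  # Two closed intervals intersect iff each one starts before the other ends.
  assert x1 <= x2 and y1 <= y2
  return x1 <= y2 and y1 <= x2

def ranges_overlap_uncertainty(x, x_unc, y, y_unc):
  assert x_unc >= 0 and y_unc >= 0
  return ranges_overlap(x - x_unc, x + x_unc, y - y_unc, y + y_unc)

# 10.0 +/- 0.5 spans [9.5, 10.5]; 10.8 +/- 0.4 spans [10.4, 11.2]: overlap.
wide = ranges_overlap_uncertainty(10.0, 0.5, 10.8, 0.4)
# 10.0 +/- 0.2 spans [9.8, 10.2]; 10.8 +/- 0.2 spans [10.6, 11.0]: disjoint.
narrow = ranges_overlap_uncertainty(10.0, 0.2, 10.8, 0.2)
print(wide, narrow)
```
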
  """
  assert x_unc >= 0
  assert y_unc >= 0

  return ranges_overlap(x - x_unc, x + x_unc, y - y_unc, y + y_unc)

###############################################################################

# Formulas for propagation of uncertainty from:
#
#   https://en.wikipedia.org/wiki/Propagation_of_uncertainty#Example_formulas
#
# Even though it's Wikipedia, I trust it as I helped write that table.
#
# XXX Replace with a proper reference.

def uncertainty_multiplicative(f, A, A_abs_unc, B, B_abs_unc):
  r"""Compute the propagated uncertainty from the multiplication of two
  uncertain values, `A +/- A_abs_unc` and `B +/- B_abs_unc`. Given `f = AB` or
  `f = A/B`, where `A != 0` and `B != 0`, the uncertainty in `f` is
  approximately:

  .. math::

    \sigma_f = |f| \sqrt{\left(\frac{\sigma_A}{A}\right)^2 + \left(\frac{\sigma_B}{B}\right)^2}

  Raises:
    ZeroDivisionError : If `A == 0` or `B == 0`.
  """
  # Convert to `float` so integer inputs are not truncated by Python 2 division.
  return abs(f) * sqrt((float(A_abs_unc) / A) ** 2 + (float(B_abs_unc) / B) ** 2)

def uncertainty_additive(c, A_abs_unc, d, B_abs_unc):
  r"""Compute the propagated uncertainty from addition of two uncertain values,
  `A +/- A_abs_unc` and `B +/- B_abs_unc`. Given `f = cA + dB`, where `c` and
  `d` are certain constants, the uncertainty in `f` is approximately:

  .. math::

    \sigma_f = \sqrt{c^2 \sigma_A^2 + d^2 \sigma_B^2}
  """
  return sqrt(((c ** 2) * (A_abs_unc ** 2)) + ((d ** 2) * (B_abs_unc ** 2)))

###############################################################################

# XXX Create change class.

def absolute_change(old, new):
  """Computes the absolute change from old to new:

  .. math::

    absolute_change = new - old
  """
  return new - old

def absolute_change_uncertainty(old, old_unc, new, new_unc):
  """Computes the uncertainty in the absolute change from old to new and returns
  a tuple of the absolute change and the absolute change uncertainty.
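
  Example:
    A standalone sketch (Python 3) with hypothetical measurements. The change
    `new - old` corresponds to `f = cA + dB` with `c = 1` for `new` and
    `d = -1` for `old`, so the uncertainties add in quadrature rather than
    linearly: 0.3 and 0.4 combine to about 0.5, not 0.7.

```python
from math import sqrt

def uncertainty_additive(c, A_abs_unc, d, B_abs_unc):
  # Propagated uncertainty of f = c*A + d*B for independent A and B.
  return sqrt((c ** 2) * (A_abs_unc ** 2) + (d ** 2) * (B_abs_unc ** 2))

old, old_unc = 10.0, 0.3   # hypothetical baseline walltime +/- uncertainty
new, new_unc = 12.0, 0.4   # hypothetical new walltime +/- uncertainty

change = new - old
change_unc = uncertainty_additive(1.0, new_unc, -1.0, old_unc)
print(change, change_unc)
```
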
  """
  absolute_change = new - old
  absolute_change_unc = uncertainty_additive(1.0, new_unc, -1.0, old_unc)

  return (absolute_change, absolute_change_unc)

def percent_change(old, new):
  r"""Computes the percent change from old to new:

  .. math::

    percent_change = 100 \frac{new - old}{abs(old)}
  """
  return 100.0 * float(new - old) / abs(old)

def percent_change_uncertainty(old, old_unc, new, new_unc):
  """Computes the uncertainty in the percent change from old to new and returns
  a tuple of the absolute change, the absolute change uncertainty, the percent
  change and the percent change uncertainty.
  """
  # Let's break this down into a few sub-operations:
  #
  #   absolute_change = new - old                  <- Additive propagation.
  #   relative_change = absolute_change / abs(old) <- Multiplicative propagation.
  #   percent_change  = 100 * relative_change      <- Multiplicative propagation.

  if old == 0:
    # We can't compute relative change because the old value is 0.
    return (float("nan"), float("nan"), float("nan"), float("nan"))

  (absolute_change, absolute_change_unc) = absolute_change_uncertainty(
    old, old_unc, new, new_unc
  )

  if absolute_change == 0:
    # We can't compute relative change uncertainty because the relative
    # uncertainty of a value of 0 is undefined.
    return (absolute_change, absolute_change_unc, float("nan"), float("nan"))

  relative_change = float(absolute_change) / abs(old)
  relative_change_unc = uncertainty_multiplicative(
    relative_change, absolute_change, absolute_change_unc, old, old_unc
  )

  percent_change = 100.0 * relative_change
  percent_change_unc = uncertainty_multiplicative(
    percent_change, 100.0, 0.0, relative_change, relative_change_unc
  )

  return (
    absolute_change, absolute_change_unc, percent_change, percent_change_unc
  )

###############################################################################

def find_significant_digit(x):
  """Return the significant digit of the number x.
  The result is the number of digits after the decimal place to round to
  (negative numbers indicate rounding before the decimal place)."""
  if x == 0: return 0
  return -int(floor(log10(abs(x))))

def round_with_int_conversion(x, ndigits = None):
  """Rounds `x` to `ndigits` after the decimal place. If `ndigits` is less than
  1, convert the result to `int`. If `ndigits` is `None`, the significant digit
  of `x` is used."""
  if ndigits is None: ndigits = find_significant_digit(x)
  x_rounded = round(x, ndigits)
  return int(x_rounded) if ndigits < 1 else x_rounded

###############################################################################

class measured_variable(object):
  """A meta-variable representing measured data. It is composed of three raw
  variables plus units meta-data.

  Attributes:
    quantity (`str`) :
      Name of the quantity variable of this object.
    uncertainty (`str`) :
      Name of the uncertainty variable of this object.
    sample_size (`str`) :
      Name of the sample size variable of this object.
    units (units class or `None`) :
      The units the value is measured in.
  """

  def __init__(self, quantity, uncertainty, sample_size, units = None):
    self.quantity = quantity
    self.uncertainty = uncertainty
    self.sample_size = sample_size
    self.units = units

  def as_tuple(self):
    return (self.quantity, self.uncertainty, self.sample_size, self.units)

  def __iter__(self):
    return iter(self.as_tuple())

  def __str__(self):
    return str(self.as_tuple())

  def __repr__(self):
    return str(self)

class measured_value(object):
  """An object that represents a value determined by multiple measurements.

  Attributes:
    quantity (scalar) :
      The quantity of the value, e.g. the arithmetic mean.
    uncertainty (scalar) :
      The measurement uncertainty, e.g. the sample standard deviation.
    sample_size (`int`) :
      The number of observations contributing to the value.
    units (units class or `None`) :
      The units the value is measured in.
""" def __init__(self, quantity, uncertainty, sample_size = 1, units = None): self.quantity = quantity self.uncertainty = uncertainty self.sample_size = sample_size self.units = units def as_tuple(self): return (self.quantity, self.uncertainty, self.sample_size, self.units) def __iter__(self): return iter(self.as_tuple()) def __str__(self): return str(self.as_tuple()) def __repr__(self): return str(self) ############################################################################### def arithmetic_mean(X): """Computes the arithmetic mean of the sequence `X`. Let: * `n = len(X)`. * `u` denote the arithmetic mean of `X`. .. math:: u = \frac{\sum_{i = 0}^{n - 1} X_i}{n} """ return sum(X) / len(X) def sample_variance(X, u = None): """Computes the sample variance of the sequence `X`. Let: * `n = len(X)`. * `u` denote the arithmetic mean of `X`. * `s` denote the sample standard deviation of `X`. .. math:: v = \frac{\sum_{i = 0}^{n - 1} (X_i - u)^2}{n - 1} Args: X (`Iterable`) : The sequence of values. u (number) : The arithmetic mean of `X`. """ if u is None: u = arithmetic_mean(X) return sum(imap(lambda X_i: (X_i - u) ** 2, X)) / (len(X) - 1) def sample_standard_deviation(X, u = None, v = None): """Computes the sample standard deviation of the sequence `X`. Let: * `n = len(X)`. * `u` denote the arithmetic mean of `X`. * `v` denote the sample variance of `X`. * `s` denote the sample standard deviation of `X`. .. math:: s &= \sqrt{v} &= \sqrt{\frac{\sum_{i = 0}^{n - 1} (X_i - u)^2}{n - 1}} Args: X (`Iterable`) : The sequence of values. u (number) : The arithmetic mean of `X`. v (number) : The sample variance of `X`. """ if u is None: u = arithmetic_mean(X) if v is None: v = sample_variance(X, u) return sqrt(v) def combine_sample_size(As): """Computes the combined sample variance of a group of `measured_value`s. Let: * `g = len(As)`. * `n_i = As[i].samples`. * `n` denote the combined sample size of `As`. .. 
math:: n = \sum_{i = 0}^{g - 1} n_i """ return sum(imap(unpack_tuple(lambda u_i, s_i, n_i, t_i: n_i), As)) def combine_arithmetic_mean(As, n = None): """Computes the combined arithmetic mean of a group of `measured_value`s. Let: * `g = len(As)`. * `u_i = As[i].quantity`. * `n_i = As[i].sample_size`. * `n` denote the combined sample size of `As`. * `u` denote the combined arithmetic mean of `As`. .. math:: u = \frac{\sum_{i = 0}^{g - 1} n_i u_i}{n} """ if n is None: n = combine_sample_size(As) return sum(imap(unpack_tuple(lambda u_i, s_i, n_i, t_i: n_i * u_i), As)) / n def combine_sample_variance(As, n = None, u = None): """Computes the combined sample variance of a group of `measured_value`s. Let: * `g = len(As)`. * `u_i = As[i].quantity`. * `s_i = As[i].uncertainty`. * `n_i = As[i].sample_size`. * `n` denote the combined sample size of `As`. * `u` denote the combined arithmetic mean of `As`. * `v` denote the combined sample variance of `As`. .. math:: v = \frac{\sum_{i = 0}^{g - 1} (n_i (u_i - u)^2 + s_i^2 (n_i - 1))}{n - 1} Args: As (`Iterable` of `measured_value`s) : The sequence of values. n (number) : The combined sample size of `As`. u (number) : The combined arithmetic mean of `As`. """ if n is None: n = combine_sample_size(As) if n <= 1: return 0 if u is None: u = combine_arithmetic_mean(As, n) return sum(imap(unpack_tuple( lambda u_i, s_i, n_i, t_i: n_i * (u_i - u) ** 2 + (s_i ** 2) * (n_i - 1) ), As)) / (n - 1) def combine_sample_standard_deviation(As, n = None, u = None, v = None): """Computes the combined sample standard deviation of a group of `measured_value`s. Let: * `g = len(As)`. * `u_i = As[i].quantity`. * `s_i = As[i].uncertainty`. * `n_i = As[i].sample_size`. * `n` denote the combined sample size of `As`. * `u` denote the combined arithmetic mean of `As`. * `v` denote the combined sample variance of `As`. * `s` denote the combined sample standard deviation of `As`. .. 
math:: v &= \frac{\sum_{i = 0}^{g - 1} (n_i (u_i - u)^2 + s_i^2 (n_i - 1))}{n - 1} s &= \sqrt{v} Args: As (`Iterable` of `measured_value`s) : The sequence of values. n (number) : The combined sample size of `As`. u (number) : The combined arithmetic mean of `As`. v (number) : The combined sample variance of `As`. """ if n is None: n = combine_sample_size(As) if n <= 1: return 0 if u is None: u = combine_arithmetic_mean(As, n) if v is None: v = combine_sample_variance(As, n, u) return sqrt(v) ############################################################################### def store_const_multiple(const, *destinations): """Returns an `argument_action` class that sets multiple argument destinations (`destinations`) to `const`.""" class store_const_multiple_action(argument_action): def __init__(self, *args, **kwargs): super(store_const_multiple_action, self).__init__( metavar = None, nargs = 0, const = const, *args, **kwargs ) def __call__(self, parser, namespace, values, option_string = None): for destination in destinations: setattr(namespace, destination, const) return store_const_multiple_action def store_true_multiple(*destinations): """Returns an `argument_action` class that sets multiple argument destinations (`destinations`) to `True`.""" return store_const_multiple(True, *destinations) def store_false_multiple(*destinations): """Returns an `argument_action` class that sets multiple argument destinations (`destinations`) to `False`.""" return store_const_multiple(False, *destinations) ############################################################################### def process_program_arguments(): ap = argument_parser( description = ( "Compares two sets of combined performance results and identifies " "statistically significant changes." ) ) ap.add_argument( "baseline_input_file", help = ("CSV file containing the baseline performance results. The first " "two rows should be a header.
The 1st header row specifies the " "name of each variable, and the 2nd header row specifies the units " "for that variable. The baseline results may be a superset of the " "observed performance results, but the reverse is not true. The " "baseline results must contain data for every datapoint in the " "observed performance results."), type = str ) ap.add_argument( "observed_input_file", help = ("CSV file containing the observed performance results. The first " "two rows should be a header. The 1st header row specifies the name " "of each variable, and the 2nd header row specifies the units for " "that variable."), type = str ) ap.add_argument( "-o", "--output-file", help = ("The file that results are written to. If `-`, results are " "written to stdout."), action = "store", type = str, default = "-", metavar = "OUTPUT" ) ap.add_argument( "-c", "--control-variable", help = ("Treat the specified variable as a control variable. This means " "it will be filtered out when forming dataset keys. For example, " "this could be used to ignore a timestamp variable that is " "different in the baseline and observed results. May be specified " "multiple times."), action = "append", type = str, dest = "control_variables", default = [], metavar = "QUANTITY" ) ap.add_argument( "-d", "--dependent-variable", help = ("Treat the specified three variables as a dependent variable. The " "1st variable is the measured quantity, the 2nd is the uncertainty " "of the measurement, and the 3rd is the sample size. The defaults " "are the dependent variables of Thrust's benchmark suite. May be " "specified multiple times."), action = "append", type = str, dest = "dependent_variables", default = [], metavar = "QUANTITY,UNCERTAINTY,SAMPLES" ) ap.add_argument( "-t", "--change-threshold", help = ("Treat relative changes less than this amount (a percentage) as " "statistically insignificant.
The default is 5%%."), action = "store", type = float, default = 5, metavar = "PERCENTAGE" ) ap.add_argument( "-p", "--preserve-whitespace", help = ("Don't trim leading and trailing whitespace from each CSV cell."), action = "store_true", default = False ) ap.add_argument( "--output-all-variables", help = ("Don't omit original absolute values in output."), action = "store_true", default = False ) ap.add_argument( "--output-all-datapoints", help = ("Don't omit datapoints that are statistically indistinguishable " "in output."), action = "store_true", default = False ) ap.add_argument( "-a", "--output-all", help = ("Equivalent to `--output-all-variables --output-all-datapoints`."), action = store_true_multiple("output_all_variables", "output_all_datapoints") ) return ap.parse_args() ############################################################################### def filter_comments(f, s = "#"): """Return an iterator to the file `f` which filters out all lines beginning with `s`.""" return filter(lambda line: not line.startswith(s), f) ############################################################################### class io_manager(object): """Manages I/O operations and exposes the input data as iterators over sequences of `dict`s (see `baseline` and `observed`). It can be used with `with`. Attributes: preserve_whitespace (`bool`) : If `False`, leading and trailing whitespace is stripped from each CSV cell. writer (`csv_dict_writer`) : CSV writer object that the output is written to. output_file (`file` or `stdout`) : The output `file` object. baseline_reader (`csv_dict_reader`) : CSV reader object for the baseline results. observed_reader (`csv_dict_reader`) : CSV reader object for the observed results. baseline_input_file (`file`) : `file` object for the baseline results. observed_input_file (`file`) : `file` object for the observed results. variable_names (`list` of `str`s) : Names of the variables, in order.
variable_units (`list` of `str`s) : Units of the variables, in order. """ def __init__(self, baseline_input_file, observed_input_file, output_file, preserve_whitespace = False): """Read input files and open the output file and construct a new `io_manager` object. If `preserve_whitespace` is `False`, leading and trailing whitespace is stripped from each CSV cell. Raises AssertionError : If `type(preserve_whitespace) != bool`. """ assert type(preserve_whitespace) == bool self.preserve_whitespace = preserve_whitespace # Open baseline results. self.baseline_input_file = open(baseline_input_file) self.baseline_reader = csv_dict_reader( filter_comments(self.baseline_input_file) ) if not self.preserve_whitespace: strip_list(self.baseline_reader.fieldnames) self.variable_names = list(self.baseline_reader.fieldnames) # Copy. self.variable_units = self.baseline_reader.next() if not self.preserve_whitespace: strip_dict(self.variable_units) # Open observed results. self.observed_input_file = open(observed_input_file) self.observed_reader = csv_dict_reader( filter_comments(self.observed_input_file) ) if not self.preserve_whitespace: strip_list(self.observed_reader.fieldnames) # Make sure all inputs have the same variables schema. assert self.variable_names == self.observed_reader.fieldnames, \ "Observed results input file (`" + observed_input_file + "`) " + \ "variable schema `" + str(self.observed_reader.fieldnames) + "` does " + \ "not match the baseline results input file (`" + baseline_input_file + \ "`) variable schema `" + str(self.variable_names) + "`." # Consume the next row, which should be the second line of the header. observed_variable_units = self.observed_reader.next() if not self.preserve_whitespace: strip_dict(observed_variable_units) # Make sure all inputs have the same units schema. 
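Each input file is expected to start with two header rows (variable names, then units), and lines beginning with `#` are skipped by `filter_comments`. A minimal Python 3 sketch of that parsing step using only the standard `csv` module follows; `read_results` is a hypothetical helper, not part of the script, and its whitespace handling only approximates `strip_list`/`strip_dict`, which are defined outside this excerpt.

```python
import csv


def read_results(lines, preserve_whitespace=False):
    """Parse CSV lines whose first two non-comment rows are names and units.

    Returns (names, units, rows): the variable names, a dict of units keyed
    by name, and the remaining data rows as dicts.
    """
    # Skip comment lines, as filter_comments does.
    reader = csv.DictReader(line for line in lines if not line.startswith("#"))
    names = list(reader.fieldnames)
    if not preserve_whitespace:
        names = [n.strip() for n in names]
        reader.fieldnames = names  # Re-key subsequent rows by stripped names.
    units = next(reader)  # The second header row holds the units.
    rows = list(reader)
    if not preserve_whitespace:
        units = {k: v.strip() for k, v in units.items()}
        rows = [{k: v.strip() for k, v in r.items()} for r in rows]
    return names, units, rows
```

Reading both files this way and comparing `names` and `units` gives the same schema check that `io_manager` performs with its assertions.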
assert self.variable_units == observed_variable_units, \ "Observed results input file (`" + observed_input_file + "`) " + \ "units schema `" + str(observed_variable_units) + "` does not " + \ "match the baseline results input file (`" + baseline_input_file + \ "`) units schema `" + str(self.variable_units) + "`." if output_file == "-": # Output to stdout. self.output_file = stdout else: # Output to user-specified file. self.output_file = open(output_file, "w") self.writer = csv_dict_writer( self.output_file, fieldnames = self.variable_names ) def __enter__(self): """Called upon entering a `with` statement.""" return self def __exit__(self, *args): """Called upon exiting a `with` statement.""" if self.output_file is stdout: self.output_file = None elif self.output_file is not None: self.output_file.__exit__(*args) self.baseline_input_file.__exit__(*args) self.observed_input_file.__exit__(*args) def append_variable(self, name, units): """Add a new variable to the output schema.""" self.variable_names.append(name) self.variable_units.update({name : units}) # Update CSV writer field names. self.writer.fieldnames = self.variable_names def insert_variable(self, idx, name, units): """Insert a new variable into the output schema at index `idx`.""" self.variable_names.insert(idx, name) self.variable_units.update({name : units}) # Update CSV writer field names. self.writer.fieldnames = self.variable_names def remove_variable(self, name): """Remove variable from the output schema and return a tuple of the variable index and the variable units. Raises: ValueError : If `name` is not in the output schema. """ # Remove the variable and get its index, which we'll need to remove the # corresponding units entry. (idx, item) = remove_from_list(self.variable_names, name) # Remove the units entry. units = self.variable_units.pop(item) # Update CSV writer field names. 
self.writer.fieldnames = self.variable_names return (idx, units) ############################################################################# # Input Stream. def baseline(self): """Return an iterator to the baseline results input sequence.""" return imap(lambda row: strip_dict(row), self.baseline_reader) def observed(self): """Return an iterator to the observed results input sequence.""" return imap(lambda row: strip_dict(row), self.observed_reader) ############################################################################# # Output. def write_header(self): """Write the header for the output CSV file.""" # Write the first line of the header. self.writer.writeheader() # Write the second line of the header. self.writer.writerow(self.variable_units) def write(self, d): """Write a record (a `dict`) to the output CSV file.""" self.writer.writerow(d) ############################################################################### class dependent_variable_parser(object): """Parses a `--dependent-variable=AVG,STDEV,TRIALS` command line argument.""" ############################################################################# # Grammar # Parse a variable_name. variable_name_rule = r'[^,]+' # Parse a variable classification. dependent_variable_rule = r'(' + variable_name_rule + r')' \ + r',' \ + r'(' + variable_name_rule + r')' \ + r',' \ + r'(' + variable_name_rule + r')' engine = regex_compile(dependent_variable_rule) ############################################################################# def __call__(self, s): """Parses the string `s` with the form "AVG,STDEV,TRIALS". Returns: A `measured_variable`. Raises: AssertionError : If parsing fails. """ match = self.engine.match(s) assert match is not None, \ "Dependent variable (-d) `" +s+ "` is invalid, the format is " + \ "`AVG,STDEV,TRIALS`." 
return measured_variable(match.group(1), match.group(2), match.group(3)) ############################################################################### class record_aggregator(object): """Consumes and combines records and represents the result as an `Iterable` sequence of `dict`s. It is `Iterable` and an `Iterator`. Attributes: dependent_variables (`list` of `measured_variable`s) : A list of dependent variables provided on the command line. control_variables (`list` of `str`s) : A list of control variables provided on the command line. dataset (`dict`) : A mapping of distinguishing (i.e. independent) values (`tuple`s of variable-value pairs) to dependent values (`dict`s mapping variables to lists of cells). in_order_dataset_keys : A list of unique dataset keys (i.e. distinguishing values) in order of appearance. """ def __init__(self, dependent_variables, control_variables): """Construct a new `record_aggregator` object.""" self.dependent_variables = dependent_variables self.control_variables = control_variables self.dataset = {} self.in_order_dataset_keys = deque() ############################################################################# # Insertion. def key_from_dict(self, d): """Create a hashable key from a `dict` by filtering out control variables and then converting the `dict` to a tuple. Control variables missing from `d` are ignored. """ distinguishing_values = d.copy() # Filter out control variables. for var in self.control_variables: distinguishing_values.pop(var, None) return key_from_dict(distinguishing_values) def append(self, record): """Add `record` to the dataset. Raises: ValueError : If any `str`-to-numeric conversions fail. """ # The distinguishing variables are the independent variables, i.e. every # variable that is neither a dependent nor a control variable. # They form the key for each record in the dataset.
Records with the same # distinguishing variables are treated as observations of the same # datapoint. dependent_values = {} # To allow the same sample size variable to be used for multiple dependent # variables, we don't pop sample size variables until we're done processing # all variables. sample_size_variables = [] # Separate the dependent values from the distinguishing variables and # perform `str`-to-numeric conversions. for var in self.dependent_variables: quantity, uncertainty, sample_size, units = var.as_tuple() dependent_values[quantity] = [int_or_float(record.pop(quantity))] dependent_values[uncertainty] = [int_or_float(record.pop(uncertainty))] dependent_values[sample_size] = [int(record[sample_size])] sample_size_variables.append(sample_size) # Pop sample size variables. for var in sample_size_variables: # Allowed to fail, as we may have duplicates. record.pop(var, None) distinguishing_values = self.key_from_dict(record) if distinguishing_values in self.dataset: # These distinguishing values already exist, so get the `dict` they're # mapped to, look up each key in `dependent_values` in the `dict`, and # add the corresponding quantity in `dependent_values` to the list in # the `dict`. for var, columns in dependent_values.iteritems(): self.dataset[distinguishing_values][var] += columns else: # These distinguishing values aren't in the dataset, so add them and # record them in `in_order_dataset_keys`. self.dataset[distinguishing_values] = dependent_values self.in_order_dataset_keys.append(distinguishing_values) ############################################################################# # Postprocessing. def combine_dependent_values(self, dependent_values): """Takes a mapping of dependent variables to lists of cells and returns a new mapping with the cells combined. Raises: AssertionError : If class invariants were violated.
""" combined_dependent_values = dependent_values.copy() for var in self.dependent_variables: quantity, uncertainty, sample_size, units = var.as_tuple() quantities = dependent_values[quantity] uncertainties = dependent_values[uncertainty] sample_sizes = dependent_values[sample_size] if type(sample_sizes) is list: # Sample size hasn't been combined yet. assert len(quantities) == len(uncertainties) \ and len(uncertainties) == len(sample_sizes), \ "Length of quantities list `(" + str(len(quantities)) + ")`, " + \ "length of uncertainties list `(" + str(len(uncertainties)) + \ ")`, and length of sample sizes list `(" + str(len(sample_sizes)) + \ ")` are not the same." else: # Another dependent variable that uses our sample size has combined it # already. assert len(quantities) == len(uncertainties), \ "Length of quantities list `(" + str(len(quantities)) + ")` and " + \ "length of uncertainties list `(" + str(len(uncertainties)) + \ ")` are not the same." # Convert the three separate `list`s into one list of `measured_value`s. measured_values = [] for i in range(len(quantities)): mv = measured_value( quantities[i], uncertainties[i], sample_sizes[i], units ) measured_values.append(mv) # Combine the `measured_value`s. combined_sample_size = combine_sample_size( measured_values ) combined_arithmetic_mean = combine_arithmetic_mean( measured_values, combined_sample_size ) combined_sample_standard_deviation = combine_sample_standard_deviation( measured_values, combined_sample_size, combined_arithmetic_mean ) # Round the quantity and uncertainty to the significant digit of # uncertainty and insert the combined values into the results.
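The combination step pools per-run summaries without access to the raw samples: sample sizes add, means are weighted by sample size, and the variance adds a between-run term `n_i (u_i - u)^2` to the within-run term `s_i^2 (n_i - 1)`, as in `combine_sample_variance`. A standalone sketch of the same arithmetic on plain `(mean, stdev, n)` tuples; the tuple layout and the `combine` name are illustrative assumptions, not part of the script.

```python
from math import sqrt


def combine(groups):
    """Combine (mean, stdev, n) summaries into one (mean, stdev, n)."""
    n = sum(n_i for _, _, n_i in groups)
    # Sample-size-weighted mean, as in combine_arithmetic_mean.
    u = sum(n_i * u_i for u_i, _, n_i in groups) / float(n)
    if n <= 1:
        # A single observation has no sample variance.
        return u, 0.0, n
    # Total sum of squares = between-group term + within-group term.
    v = sum(n_i * (u_i - u) ** 2 + s_i ** 2 * (n_i - 1)
            for u_i, s_i, n_i in groups) / float(n - 1)
    return u, sqrt(v), n
```

Pooling the summaries of `[1, 2, 3]` (mean 2, stdev 1) and `[4, 5, 6, 7]` (mean 5.5, stdev sqrt(5/3)) gives mean 4 and stdev sqrt(28/6), the same values computed directly on `[1, ..., 7]`. That makes it a quick sanity check for the formula.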
sigdig = find_significant_digit(combined_sample_standard_deviation) # combined_arithmetic_mean = round_with_int_conversion( # combined_arithmetic_mean, sigdig # ) # combined_sample_standard_deviation = round_with_int_conversion( # combined_sample_standard_deviation, sigdig # ) combined_dependent_values[quantity] = combined_arithmetic_mean combined_dependent_values[uncertainty] = combined_sample_standard_deviation combined_dependent_values[sample_size] = combined_sample_size return combined_dependent_values ############################################################################# # Output Stream. def __iter__(self): """Return an iterator to the output sequence of separated distinguishing variables and dependent variables (a tuple of two `dict`s). This is a requirement for the `Iterable` protocol. """ return self def records(self): """Return an iterator to the output sequence of CSV rows (`dict`s of variables to values). """ return imap(unpack_tuple(lambda dist, dep: merge_dicts(dist, dep)), self) def next(self): """Produce the components of the next output record - a tuple of two `dict`s. The first `dict` is a mapping of distinguishing variables to distinguishing values, the second `dict` is a mapping of dependent variables to combined dependent values. Combining the two dicts forms a CSV row suitable for output. This is a requirement for the `Iterator` protocol. Raises: StopIteration : If there is no more output. AssertionError : If class invariants were violated. """ assert len(self.dataset.keys()) == len(self.in_order_dataset_keys), \ "Number of dataset keys (`" + str(len(self.dataset.keys())) + \ "`) is not equal to the number of keys in the ordering list (`" + \ str(len(self.in_order_dataset_keys)) + "`)." if len(self.in_order_dataset_keys) == 0: raise StopIteration() # Get the next set of distinguishing values and convert them to a `dict`. 
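The rounding of the combined values to the uncertainty's significant digit is currently commented out. For reference, here is a self-contained copy of the two helpers defined earlier in the script (`find_significant_digit` and `round_with_int_conversion`), applied to an illustrative mean of 12.3456 with a standard deviation of 0.0321:

```python
from math import floor, log10


def find_significant_digit(x):
    # Position of the leading significant digit of x, expressed as an
    # ndigits argument for round(): 0.0321 -> 2, 250 -> -2.
    if x == 0:
        return 0
    return -int(floor(log10(abs(x))))


def round_with_int_conversion(x, ndigits=None):
    # Round x to ndigits after the decimal point; results with no
    # fractional digits (ndigits < 1) are converted to int.
    if ndigits is None:
        ndigits = find_significant_digit(x)
    x_rounded = round(x, ndigits)
    return int(x_rounded) if ndigits < 1 else x_rounded


mean, stdev = 12.3456, 0.0321
sigdig = find_significant_digit(stdev)           # 2
print(round_with_int_conversion(mean, sigdig),   # 12.35
      round_with_int_conversion(stdev, sigdig))  # 0.03
```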
raw_distinguishing_values = self.in_order_dataset_keys.popleft() distinguishing_values = dict(raw_distinguishing_values) dependent_values = self.dataset.pop(raw_distinguishing_values) combined_dependent_values = self.combine_dependent_values(dependent_values) return (distinguishing_values, combined_dependent_values) def __getitem__(self, distinguishing_values): """Produce the dependent component, a `dict` mapping dependent variables to combined dependent values, associated with `distinguishing_values`. Args: distinguishing_values (`dict`) : A `dict` mapping distinguishing variables to distinguishing values. Raises: KeyError : If `distinguishing_values` is not in the dataset. """ raw_distinguishing_values = self.key_from_dict(distinguishing_values) dependent_values = self.dataset[raw_distinguishing_values] combined_dependent_values = self.combine_dependent_values(dependent_values) return combined_dependent_values ############################################################################### args = process_program_arguments() if len(args.dependent_variables) == 0: args.dependent_variables = [ "STL Average Walltime,STL Walltime Uncertainty,STL Trials", "STL Average Throughput,STL Throughput Uncertainty,STL Trials", "Thrust Average Walltime,Thrust Walltime Uncertainty,Thrust Trials", "Thrust Average Throughput,Thrust Throughput Uncertainty,Thrust Trials" ] # Parse dependent variable options. dependent_variables = [] parse_dependent_variable = dependent_variable_parser() #if args.dependent_variables is not None: for var in args.dependent_variables: dependent_variables.append(parse_dependent_variable(var)) # Read input files and open the output file. with io_manager(args.baseline_input_file, args.observed_input_file, args.output_file, args.preserve_whitespace) as iom: # Create record aggregators. 
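Each `-d` value, including the defaults listed above, is parsed by `dependent_variable_parser` with a regular expression for three comma-free names. The sketch below reuses that grammar with Python's `re` module. One difference is deliberate: the script calls `engine.match`, which anchors only at the start, so an input such as `a,b,c,d` is accepted and the trailing field is silently dropped. The sketch uses `fullmatch` (Python 3.4+) to reject it instead. `parse_dependent_variable` is a hypothetical name.

```python
import re

# Same grammar as dependent_variable_parser: three comma-free names
# separated by commas.
variable_name_rule = r'[^,]+'
dependent_variable_rule = r'({0}),({0}),({0})'.format(variable_name_rule)
engine = re.compile(dependent_variable_rule)


def parse_dependent_variable(s):
    # fullmatch rejects extra trailing fields that engine.match would ignore.
    match = engine.fullmatch(s)
    if match is None:
        raise ValueError("expected AVG,STDEV,TRIALS, got %r" % s)
    return match.groups()
```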
baseline_ra = record_aggregator(dependent_variables, args.control_variables) observed_ra = record_aggregator(dependent_variables, args.control_variables) # Duplicate dependent variables: one for baseline results, one for observed # results. baseline_suffix = " - `{0}`".format( args.baseline_input_file ) observed_suffix = " - `{0}`".format( args.observed_input_file ) for var in dependent_variables: # Remove the existing quantity variable: # # [ ..., a, b, c, ... ] # ^- remove b at index i # (quantity_idx, quantity_units) = iom.remove_variable(var.quantity) # If the `--output-all-variables` option was specified, add the new baseline # and observed quantity variables. Note that we insert in the reverse of # the order we desire (which is baseline then observed): # # [ ..., a, b_1, c, ... ] # ^- insert b_1 at index i # # [ ..., a, b_0, b_1, c, ... ] # ^- insert b_0 at index i # if args.output_all_variables: iom.insert_variable( quantity_idx, var.quantity + observed_suffix, quantity_units ) iom.insert_variable( quantity_idx, var.quantity + baseline_suffix, quantity_units ) # Remove the existing uncertainty variable. (uncertainty_idx, uncertainty_units) = iom.remove_variable(var.uncertainty) # If the `--output-all-variables` option was specified, add the new baseline # and observed uncertainty variables. if args.output_all_variables: iom.insert_variable( uncertainty_idx, var.uncertainty + observed_suffix, uncertainty_units ) iom.insert_variable( uncertainty_idx, var.uncertainty + baseline_suffix, uncertainty_units ) try: # Remove the existing sample size variable. (sample_size_idx, sample_size_units) = iom.remove_variable(var.sample_size) # If the `--output-all-variables` option was specified, add the new # baseline and observed sample size variables. 
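The reverse-order insertion described in the comments above can be checked on a plain list. `split_variable` is a hypothetical helper that mimics the `remove_variable`/`insert_variable` pair on `io_manager.variable_names` only; it ignores units and the CSV writer.

```python
def split_variable(names, name, baseline_suffix, observed_suffix):
    # Replace `name` with baseline and observed copies at the same position.
    idx = names.index(name)
    del names[idx]
    # Each insert at idx pushes the previous one to the right, so inserting
    # observed first and baseline second yields baseline, observed.
    names.insert(idx, name + observed_suffix)
    names.insert(idx, name + baseline_suffix)
    return names
```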
if args.output_all_variables: iom.insert_variable( sample_size_idx, var.sample_size + observed_suffix, sample_size_units ) iom.insert_variable( sample_size_idx, var.sample_size + baseline_suffix, sample_size_units ) except ValueError: # This is alright, because dependent variables may share the same sample # size variable. pass for var in args.control_variables: iom.remove_variable(var) # Add change variables. absolute_change_suffix = " - Change (`{0}` - `{1}`)".format( args.observed_input_file, args.baseline_input_file ) percent_change_suffix = " - % Change (`{0}` to `{1}`)".format( args.observed_input_file, args.baseline_input_file ) for var in dependent_variables: iom.append_variable(var.quantity + absolute_change_suffix, var.units) iom.append_variable(var.uncertainty + absolute_change_suffix, var.units) iom.append_variable(var.quantity + percent_change_suffix, "") iom.append_variable(var.uncertainty + percent_change_suffix, "") # Add all baseline input data to the `record_aggregator`. for record in iom.baseline(): baseline_ra.append(record) for record in iom.observed(): observed_ra.append(record) iom.write_header() # Compare and output results. for distinguishing_values, observed_dependent_values in observed_ra: try: baseline_dependent_values = baseline_ra[distinguishing_values] except KeyError: assert False, \ "Distinguishing value `" + \ str(baseline_ra.key_from_dict(distinguishing_values)) + \ "` was not found in the baseline results." statistically_significant_change = False record = distinguishing_values.copy() # Compute changes, add the values and changes to the record, and identify # changes that are statistically significant. for var in dependent_variables: # Compute changes. 
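`percent_change_uncertainty` is defined outside this excerpt, so its exact formula is not visible here. Assuming independent baseline and observed errors and standard first-order error propagation, a comparable computation could look like the sketch below. `percent_change_sketch` is a hypothetical name, and its formulas are an assumption rather than the script's implementation. It returns NaN percentages when either quantity is 0, which is the NaN case the script's comments anticipate.

```python
from math import sqrt


def percent_change_sketch(b, sb, o, so):
    """Return (abs_change, abs_change_unc, per_change, per_change_unc).

    Assumes b +/- sb (baseline) and o +/- so (observed) are independent and
    uses first-order error propagation.
    """
    abs_change = o - b
    # Uncertainties of independent terms add in quadrature.
    abs_change_unc = sqrt(sb ** 2 + so ** 2)
    if b == 0 or o == 0:
        nan = float("nan")
        return abs_change, abs_change_unc, nan, nan
    ratio = o / float(b)
    per_change = (ratio - 1.0) * 100.0
    # Relative uncertainty of a quotient: sqrt((so/o)^2 + (sb/b)^2).
    per_change_unc = 100.0 * abs(ratio) * sqrt(
        (so / float(o)) ** 2 + (sb / float(b)) ** 2
    )
    return abs_change, abs_change_unc, per_change, per_change_unc
```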
baseline_quantity = baseline_dependent_values[var.quantity] baseline_uncertainty = baseline_dependent_values[var.uncertainty] baseline_sample_size = baseline_dependent_values[var.sample_size] observed_quantity = observed_dependent_values[var.quantity] observed_uncertainty = observed_dependent_values[var.uncertainty] observed_sample_size = observed_dependent_values[var.sample_size] (abs_change, abs_change_unc, per_change, per_change_unc) = \ percent_change_uncertainty( baseline_quantity, baseline_uncertainty, observed_quantity, observed_uncertainty ) # Round the change quantities and uncertainties to the significant digit # of uncertainty. try: abs_change_sigdig = max( find_significant_digit(abs_change), find_significant_digit(abs_change_unc), ) # abs_change = round_with_int_conversion( # abs_change, abs_change_sigdig # ) # abs_change_unc = round_with_int_conversion( # abs_change_unc, abs_change_sigdig # ) except: # Any value errors should be due to NaNs returned by # `percent_change_uncertainty` because quantities or changes in # quantities were 0. We can ignore these. pass try: per_change_sigdig = max( find_significant_digit(per_change), find_significant_digit(per_change_unc) ) # per_change = round_with_int_conversion( # per_change, per_change_sigdig # ) # per_change_unc = round_with_int_conversion( # per_change_unc, per_change_sigdig # ) except: # Any value errors should be due to NaNs returned by # `percent_change_uncertainty` because quantities or changes in # quantities were 0. We can ignore these. pass # Add the values (if the `--output-all-variables` option was specified) # and the changes to the record. Note that the record's schema is # different from the original schema. If multiple dependent variables # share the same sample size variable, it's fine - they will overwrite # each other, but with the same value.
if args.output_all_variables: record[var.quantity + baseline_suffix] = baseline_quantity record[var.uncertainty + baseline_suffix] = baseline_uncertainty record[var.sample_size + baseline_suffix] = baseline_sample_size record[var.quantity + observed_suffix] = observed_quantity record[var.uncertainty + observed_suffix] = observed_uncertainty record[var.sample_size + observed_suffix] = observed_sample_size record[var.quantity + absolute_change_suffix] = abs_change record[var.uncertainty + absolute_change_suffix] = abs_change_unc record[var.quantity + percent_change_suffix] = per_change record[var.uncertainty + percent_change_suffix] = per_change_unc # If the uncertainty ranges don't overlap and the magnitude of the # percentage change is at least the change threshold, then the change is # statistically significant. overlap = ranges_overlap_uncertainty( baseline_quantity, baseline_uncertainty, observed_quantity, observed_uncertainty ) if not overlap and abs(per_change) >= args.change_threshold: statistically_significant_change = True # Print the record if a statistically significant change was found or if the # `--output-all-datapoints` option was specified.
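The significance rule combines two conditions: the uncertainty intervals must be disjoint, and the percentage change must reach the threshold. `ranges_overlap_uncertainty` is defined outside this excerpt. The sketch below assumes it tests whether the `[q - u, q + u]` intervals intersect, and it compares the magnitude of the percentage change so that improvements and regressions are treated alike. Both helper names are hypothetical.

```python
def ranges_overlap(b, sb, o, so):
    # Assumed semantics: do [b - sb, b + sb] and [o - so, o + so] intersect?
    return b - sb <= o + so and o - so <= b + sb


def is_significant(b, sb, o, so, per_change, threshold):
    # A NaN per_change compares False, so it is never reported as significant.
    return not ranges_overlap(b, sb, o, so) and abs(per_change) >= threshold
```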
if args.output_all_datapoints or statistically_significant_change: iom.write(record) thrust-1.9.5/internal/benchmark/random.h000066400000000000000000000043771344621116200202710ustar00rootroot00000000000000#pragma once #include #include struct hash32 { __host__ __device__ unsigned int operator()(unsigned int h) const { h = ~h + (h << 15); h = h ^ (h >> 12); h = h + (h << 2); h = h ^ (h >> 4); h = h + (h << 3) + (h << 11); h = h ^ (h >> 16); return h; } }; struct hash64 { __host__ __device__ unsigned long long operator()(unsigned long long h) const { h = ~h + (h << 21); h = h ^ (h >> 24); h = (h + (h << 3)) + (h << 8); h = h ^ (h >> 14); h = (h + (h << 2)) + (h << 4); h = h ^ (h >> 28); h = h + (h << 31); return h; } }; struct hashtofloat { __host__ __device__ float operator()(unsigned int h) const { return static_cast(hash32()(h)) / 4294967296.0f; } }; struct hashtodouble { __host__ __device__ double operator()(unsigned long long h) const { return static_cast(hash64()(h)) / 18446744073709551616.0; } }; template void _randomize(Vector& v, T) { thrust::transform(thrust::counting_iterator(0), thrust::counting_iterator(0) + v.size(), v.begin(), hash32()); } template void _randomize(Vector& v, long long) { thrust::transform(thrust::counting_iterator(0), thrust::counting_iterator(0) + v.size(), v.begin(), hash64()); } template void _randomize(Vector& v, float) { thrust::transform(thrust::counting_iterator(0), thrust::counting_iterator(0) + v.size(), v.begin(), hashtofloat()); } template void _randomize(Vector& v, double) { thrust::transform(thrust::counting_iterator(0), thrust::counting_iterator(0) + v.size(), v.begin(), hashtodouble()); } // fill Vector with random values template void randomize(Vector& v) { _randomize(v, typename Vector::value_type()); } thrust-1.9.5/internal/benchmark/tbb_algos.h000066400000000000000000000072511344621116200207370ustar00rootroot00000000000000#pragma once #include #include #include #include #include #include #include #include // For 
std::size_t. #include template struct NegateBody { void operator()(T& x) const { x = -x; } }; template struct ForBody { typedef typename Vector::value_type T; private: Vector& v; public: ForBody(Vector& x) : v(x) {} void operator()(tbb::blocked_range const& r) const { for (std::size_t i = r.begin(); i != r.end(); ++i) v[i] = -v[i]; } }; template struct ReduceBody { typedef typename Vector::value_type T; private: Vector& v; public: T sum; ReduceBody(Vector& x) : v(x), sum(0) {} ReduceBody(ReduceBody& x, tbb::split) : v(x.v), sum(0) {} void operator()(tbb::blocked_range const& r) { for (std::size_t i = r.begin(); i != r.end(); ++i) sum += v[i]; } void join(ReduceBody const& x) { sum += x.sum; } }; template struct ScanBody { typedef typename Vector::value_type T; private: Vector& v; public: T sum; ScanBody(Vector& x) : v(x), sum(0) {} ScanBody(ScanBody& x, tbb::split) : v(x.v), sum(0) {} template void operator()(tbb::blocked_range const& r, Tag) { T temp = sum; for (std::size_t i = r.begin(); i < r.end(); ++i) { temp = temp + v[i]; if (Tag::is_final_scan()) v[i] = temp; } sum = temp; } void assign(ScanBody const& x) { sum = x.sum; } T get_sum() const { return sum; } void reverse_join(ScanBody const& x) { sum = x.sum + sum;} }; template struct CopyBody { typedef typename Vector::value_type T; private: Vector &v; Vector &u; public: CopyBody(Vector& x, Vector& y) : v(x), u(y) {} void operator()(tbb::blocked_range const& r) const { for (std::size_t i = r.begin(); i != r.end(); ++i) v[i] = u[i]; } }; template typename Vector::value_type tbb_reduce(Vector& v) { ReduceBody body(v); tbb::parallel_reduce(tbb::blocked_range(0, v.size()), body); return body.sum; } template void tbb_sort(Vector& v) { tbb::parallel_sort(v.begin(), v.end()); } template void tbb_transform(Vector& v) { ForBody body(v); tbb::parallel_for(tbb::blocked_range(0, v.size()), body); } template void tbb_scan(Vector& v) { ScanBody body(v); tbb::parallel_scan(tbb::blocked_range(0, v.size()), body); } template
void tbb_copy(Vector& v, Vector& u)
{
  CopyBody<Vector> body(v, u);
  tbb::parallel_for(tbb::blocked_range<std::size_t>(0, v.size()), body);
}

void test_tbb()
{
  std::size_t elements = 1 << 20;

  std::vector<int> A(elements);
  std::vector<int> B(elements);
  std::vector<int> C(elements);
  std::vector<int> D(elements);

  randomize(A);
  randomize(B);
  assert(std::accumulate(A.begin(), A.end(), 0) == tbb_reduce(A));

  randomize(A);
  randomize(B);
  std::transform(A.begin(), A.end(), A.begin(), thrust::negate<int>());
  tbb_transform(B);
  assert(A == B);

  randomize(A);
  randomize(B);
  std::partial_sum(A.begin(), A.end(), A.begin());
  tbb_scan(B);
  assert(A == B);

  randomize(A);
  randomize(B);
  std::sort(A.begin(), A.end());
  tbb_sort(B);
  assert(A == B);

  randomize(A);
  randomize(B);
  randomize(C);
  randomize(D);
  std::copy(A.begin(), A.end(), C.begin());
  tbb_copy(B, D);
  assert(A == B);
  assert(C == D);
}
thrust-1.9.5/internal/benchmark/timer.h000066400000000000000000000062041344621116200201200ustar00rootroot00000000000000
#pragma once

#include

# define CUDA_SAFE_CALL_NO_SYNC( call) do { \
    cudaError err = call; \
    if( cudaSuccess != err) { \
      fprintf(stderr, "CUDA error in file '%s' in line %i : %s.\n", \
              __FILE__, __LINE__, cudaGetErrorString( err) ); \
      exit(EXIT_FAILURE); \
    } } while (0)

# define CUDA_SAFE_CALL( call) do { \
    CUDA_SAFE_CALL_NO_SYNC(call); \
    cudaError err = cudaDeviceSynchronize(); \
    if( cudaSuccess != err) { \
      fprintf(stderr, "CUDA error in file '%s' in line %i : %s.\n", \
              __FILE__, __LINE__, cudaGetErrorString( err) ); \
      exit(EXIT_FAILURE); \
    } } while (0)

class cuda_timer
{
  cudaEvent_t start_;
  cudaEvent_t stop_;

public:
  cuda_timer()
  {
    CUDA_SAFE_CALL(cudaEventCreate(&start_));
    CUDA_SAFE_CALL(cudaEventCreate(&stop_));
  }

  ~cuda_timer()
  {
    CUDA_SAFE_CALL(cudaEventDestroy(start_));
    CUDA_SAFE_CALL(cudaEventDestroy(stop_));
  }

  void start()
  {
    CUDA_SAFE_CALL(cudaEventRecord(start_, 0));
  }

  void stop()
  {
    CUDA_SAFE_CALL(cudaEventRecord(stop_, 0));
    CUDA_SAFE_CALL(cudaEventSynchronize(stop_));
  }

  double milliseconds_elapsed()
  {
    float elapsed_time;
    CUDA_SAFE_CALL(cudaEventElapsedTime(&elapsed_time, start_, stop_));
    return elapsed_time;
  }

  double seconds_elapsed()
  {
    return milliseconds_elapsed() / 1000.0;
  }
};

#if (THRUST_HOST_COMPILER == THRUST_HOST_COMPILER_MSVC)

#include

class steady_timer
{
  LARGE_INTEGER frequency_; // Cached to avoid system calls.
  LARGE_INTEGER start_;
  LARGE_INTEGER stop_;

public:
  steady_timer() : frequency_(), start_(), stop_()
  {
    BOOL const r = QueryPerformanceFrequency(&frequency_);
    assert(0 != r);
  }

  void start()
  {
    BOOL const r = QueryPerformanceCounter(&start_);
    assert(0 != r);
  }

  void stop()
  {
    BOOL const r = QueryPerformanceCounter(&stop_);
    assert(0 != r);
  }

  double seconds_elapsed()
  {
    return double(stop_.QuadPart - start_.QuadPart)
         / double(frequency_.QuadPart);
  }
};

#else

#include

class steady_timer
{
  timespec start_;
  timespec stop_;

public:
  steady_timer() : start_(), stop_() {}

  void start()
  {
    int const r = clock_gettime(CLOCK_MONOTONIC, &start_);
    assert(0 == r);
  }

  void stop()
  {
    int const r = clock_gettime(CLOCK_MONOTONIC, &stop_);
    assert(0 == r);
  }

  double seconds_elapsed()
  {
    return double(stop_.tv_sec - start_.tv_sec)
         + double(stop_.tv_nsec - start_.tv_nsec) * 1.0e-9;
  }
};

#endif
thrust-1.9.5/internal/build/000077500000000000000000000000001344621116200157725ustar00rootroot00000000000000
thrust-1.9.5/internal/build/common_build.mk000066400000000000000000000046141344621116200207770ustar00rootroot00000000000000
USE_NEW_PROJECT_MK := 1

ifeq ($(OS),Linux)
  LIBRARIES += m
endif

include $(ROOTDIR)/thrust/internal/build/common_warnings.mk

# Add /bigobj to the Windows build flags to work around failures when
# building Thrust in debug mode.
ifeq ($(OS), win32)
  CUDACC_FLAGS += -Xcompiler "/bigobj"
endif

ARCH_NEG_FILTER += 20 21
# Determine which SASS to generate
# if DVS (either per-CL or on-demand)
ifneq ($(or $(THRUST_DVS),$(THRUST_DVS_NIGHTLY)),)
  # DVS doesn't run Thrust on Fermi, so filter out SM 2.0/2.1.
  # DVS doesn't run Thrust on mobile, so filter those out as well.
  # DVS doesn't have Pascal configs at the moment.
  ARCH_NEG_FILTER += 20 21 32 37 53 60
else
  # If building for ARMv7 (32-bit ARM), build only mobile SASS, since
  # dGPU+ARM32 configurations are no longer supported.
  ifeq ($(TARGET_ARCH),ARMv7)
    ARCH_FILTER = 32 53 62
  endif
  # If it's androideabi, we know it's mobile, so we can target specific SASS.
  ifeq ($(OS),Linux)
    ifeq ($(ABITYPE), androideabi)
      ARCH_FILTER = 32 53 62
      ifeq ($(THRUST_TEST),1)
        NVCC_OPTIONS += -include "$(ROOTDIR)/cuda/tools/demangler/demangler.h"
        LIBRARIES += demangler
      endif
    endif
  endif
endif

# Add -mthumb for Linux on ARM to work around a bug in the ARM cross compiler from p4.
ifeq ($(TARGET_ARCH),ARMv7)
  ifneq ($(HOST_ARCH),ARMv7)
    ifeq ($(THRUST_TEST),1)
      CUDACC_FLAGS += -Xcompiler "-mthumb"
    endif
  endif
endif

# Make PGI statically link against its libraries.
ifeq ($(OS),$(filter $(OS),Linux Darwin))
  ifdef USEPGCXX
    NVCC_LDFLAGS += -Xcompiler "-Bstatic_pgi"
  endif
endif

ifeq ($(SRC_PATH),)
  SRC_PATH:=$(dir $(BUILD_SRC))
  BUILD_SRC:=$(notdir $(BUILD_SRC))
endif

BUILD_SRC_SUFFIX:=$(suffix $(BUILD_SRC))
ifeq ($(BUILD_SRC_SUFFIX),.cu)
  CU_FILES += $(BUILD_SRC)
else ifeq ($(BUILD_SRC_SUFFIX),.cpp)
  FILES += $(BUILD_SRC)
endif

# CUDA includes
ifdef VULCAN
  INCLUDES_ABSPATH += $(VULCAN_INSTALL_DIR)/cuda/include
  INCLUDES_ABSPATH += $(VULCAN_INSTALL_DIR)/cuda/_internal/cudart
else
  INCLUDES_ABSPATH += $(ROOTDIR)/cuda/inc
  INCLUDES_ABSPATH += $(ROOTDIR)/cuda/tools/cudart
endif

# Thrust includes
ifdef VULCAN
  INCLUDES_ABSPATH += $(VULCAN_TOOLKIT_BASE)/thrust
else
  INCLUDES_ABSPATH += $(ROOTDIR)/thrust
endif

ifdef VULCAN
  LIBDIRS_ABSPATH += $(VULCAN_BUILD_DIR)/bin/$(VULCAN_ARCH)_$(VULCAN_OS)$(VULCAN_ABI)_$(VULCAN_BUILD)
endif

ifdef VULCAN_TOOLKIT_BASE
  include $(VULCAN_TOOLKIT_BASE)/build/common.mk
else
  include $(ROOTDIR)/build/common.mk
endif
thrust-1.9.5/internal/build/common_detect.mk000066400000000000000000000006401344621116200211430ustar00rootroot00000000000000
ifeq ($(THRUST_TEST),1)
  include $(ROOTDIR)/build/getprofile.mk
  include $(ROOTDIR)/build/config/$(PROFILE).mk
else
  ifdef VULCAN_TOOLKIT_BASE
include $(VULCAN_TOOLKIT_BASE)/build/getprofile.mk include $(VULCAN_TOOLKIT_BASE)/build/config/$(PROFILE).mk else include $(ROOTDIR)/build/getprofile.mk include $(ROOTDIR)/build/config/$(PROFILE).mk endif # VULCAN_TOOLKIT_BASE endif # THRUST_TEST thrust-1.9.5/internal/build/common_warnings.mk000066400000000000000000000112331344621116200215230ustar00rootroot00000000000000ifeq ($(OS),$(filter $(OS),Linux Darwin)) ifndef USEPGCXX CUDACC_FLAGS += -Xcompiler "-Wall -Wextra -Werror" ifdef USEXLC # GCC does not warn about unused parameters in uninstantiated # template functions, but xlC does. This causes xlC to choke on the # OMP backend, which is mostly #ifdef'd out when you aren't using it. CUDACC_FLAGS += -Xcompiler "-Wno-unused-parameter" else # GCC, ICC or Clang AKA the sane ones. # XXX Enable -Wcast-align. CUDACC_FLAGS += -Xcompiler "-Winit-self -Woverloaded-virtual -Wno-cast-align -Wcast-qual -Wno-long-long -Wno-variadic-macros -Wno-unused-function" ifdef USE_CLANGLLVM IS_CLANG := 1 endif ifeq ($(ABITYPE), androideabi) ifneq ($(findstring clang, $(BASE_COMPILER)),) IS_CLANG := 1 endif endif ifeq ($(OS), Darwin) IS_CLANG := 1 endif ifdef IS_CLANG ifdef USE_CLANGLLVM CLANG_VERSION = $(shell $(USE_CLANGLLVM) --version 2>/dev/null | head -1 | sed -e 's/.*\([0-9]\)\.\([0-9]\)\(\.[0-9]\).*/\1\2/g') else CLANG_VERSION = $(shell $(CCBIN) --version 2>/dev/null | head -1 | sed -e 's/.*\([0-9]\)\.\([0-9]\)\(\.[0-9]\).*/\1\2/g') endif # GCC does not warn about unused parameters in uninstantiated # template functions, but Clang does. This causes Clang to choke on the # OMP backend, which is mostly #ifdef'd out when you aren't using it. CUDACC_FLAGS += -Xcompiler "-Wno-unused-parameter" # -Wunneeded-internal-declaration misfires in the unit test framework # on older versions of Clang. 
CUDACC_FLAGS += -Xcompiler "-Wno-unneeded-internal-declaration" ifeq ($(shell if test $(CLANG_VERSION) -ge 60; then echo true; fi),true) # Clang complains about name mangling changes due to `noexcept` # becoming part of the type system; we don't care. CUDACC_FLAGS += -Xcompiler "-Wno-noexcept-type" endif else # GCC ifdef CCBIN CCBIN_ENVIRONMENT := ifeq ($(OS), QNX) # QNX's GCC complains if QNX_HOST and QNX_TARGET aren't defined in the # environment. CCBIN_ENVIRONMENT := QNX_HOST=$(QNX_HOST) QNX_TARGET=$(QNX_TARGET) endif # Newer versions of GCC only print the major number with the # -dumpversion flag, but they print all three with -dumpfullversion. GCC_VERSION = $(shell $(CCBIN_ENVIRONMENT) $(CCBIN) -dumpfullversion 2>/dev/null | sed -e 's/\([0-9]\)\.\([0-9]\)\(\.[0-9]\)\?/\1\2/g') ifeq ($(GCC_VERSION),) # Older versions of GCC (~4.4 and older) seem to print three version # numbers (major, minor and patch) with the -dumpversion flag; newer # versions only print one or two numbers. GCC_VERSION = $(shell $(CCBIN_ENVIRONMENT) $(CCBIN) -dumpversion | sed -e 's/\([0-9]\)\.\([0-9]\)\(\.[0-9]\)\?/\1\2/g') endif ifeq ($(shell if test $(GCC_VERSION) -lt 42; then echo true; fi),true) # In GCC 4.1.2 and older, numeric conversion warnings are not # suppressible, so pass -Wno-error to keep them from failing the build. CUDACC_FLAGS += -Xcompiler "-Wno-error" endif ifeq ($(shell if test $(GCC_VERSION) -eq 44; then echo true; fi),true) # In GCC 4.4, the CUDA backend's kernel launch templates cause # impossible-to-decipher "'' is used uninitialized in # this function" warnings, so disable uninitialized variable # warnings. CUDACC_FLAGS += -Xcompiler "-Wno-uninitialized" endif ifeq ($(shell if test $(GCC_VERSION) -ge 45; then echo true; fi),true) # This isn't available until GCC 4.3, and misfires on TMP code until # GCC 4.5.
CUDACC_FLAGS += -Xcompiler "-Wlogical-op" endif ifeq ($(shell if test $(GCC_VERSION) -ge 73; then echo true; fi),true) # GCC 7.3 complains about name mangling changes due to `noexcept` # becoming part of the type system; we don't care. CUDACC_FLAGS += -Xcompiler "-Wno-noexcept-type" endif else $(error CCBIN is not defined.) endif endif endif endif else ifeq ($(OS),win32) # XXX Enable /Wall CUDACC_FLAGS += -Xcompiler "/WX" # Disable loss-of-data conversion warnings. # XXX Re-enable. CUDACC_FLAGS += -Xcompiler "/wd4244 /wd4267" # Suppress numeric conversion-to-bool warnings. # XXX Re-enable. CUDACC_FLAGS += -Xcompiler "/wd4800" # Disable warning about applying unary - to unsigned type. CUDACC_FLAGS += -Xcompiler "/wd4146" endif thrust-1.9.5/internal/build/generic_example.mk000066400000000000000000000010661344621116200214550ustar00rootroot00000000000000# Generic project mk that is included by examples mk # EXAMPLE_NAME : the name of the example # EXAMPLE_SRC : path to the source code relative to thrust EXECUTABLE := $(EXAMPLE_NAME) BUILD_SRC := $(ROOTDIR)/thrust/$(EXAMPLE_SRC) include $(ROOTDIR)/thrust/internal/build/common_detect.mk EXAMPLE_MAKEFILE := $(join $(dir $(BUILD_SRC)), $(basename $(notdir $(BUILD_SRC))).mk) ifneq ("$(wildcard $(EXAMPLE_MAKEFILE))","") # Check if the file exists.
include $(EXAMPLE_MAKEFILE) endif include $(ROOTDIR)/thrust/internal/build/common_build.mk thrust-1.9.5/internal/build/generic_test.mk000066400000000000000000000013261344621116200210000ustar00rootroot00000000000000# Generic project mk that is included by unit tests mk # TEST_NAME : the name of the test # TEST_SRC : path to the source code relative to thrust EXECUTABLE := $(TEST_NAME) BUILD_SRC := $(ROOTDIR)/thrust/$(TEST_SRC) ifdef VULCAN INCLUDES_ABSPATH += $(VULCAN_TOOLKIT_BASE)/thrust/testing else INCLUDES_ABSPATH += $(ROOTDIR)/thrust/testing endif PROJ_LIBRARIES += testframework THRUST_TEST := 1 include $(ROOTDIR)/thrust/internal/build/common_detect.mk TEST_MAKEFILE := $(join $(dir $(BUILD_SRC)), $(basename $(notdir $(BUILD_SRC))).mk) ifneq ("$(wildcard $(TEST_MAKEFILE))","") # Check if the file exists. include $(TEST_MAKEFILE) endif include $(ROOTDIR)/thrust/internal/build/common_build.mk thrust-1.9.5/internal/build/testframework.mk000066400000000000000000000007151344621116200212230ustar00rootroot00000000000000STATIC_LIBRARY := testframework SRC_PATH := $(ROOTDIR)/thrust/testing/ BUILD_SRC := testframework.cpp CUSRC := backend/cuda/testframework.cu $(CUSRC).CUDACC_FLAGS := -I$(ROOTDIR)/thrust/testing/backend/cuda/ $(CUSRC).TARGET_BASENAME := testframework_cu CU_FILES += $(CUSRC) INCLUDES_ABSPATH += $(ROOTDIR)/thrust/testing THRUST_TEST := 1 include $(ROOTDIR)/thrust/internal/build/common_detect.mk include $(ROOTDIR)/thrust/internal/build/common_build.mk thrust-1.9.5/internal/build/warningstester.mk000066400000000000000000000031101344621116200213750ustar00rootroot00000000000000USE_NEW_PROJECT_MK := 1 EXECUTABLE := warningstester PROJ_DIR := internal/build #GENCODE := ifndef PROFILE ifdef VULCAN_TOOLKIT_BASE include $(VULCAN_TOOLKIT_BASE)/build/getprofile.mk include $(VULCAN_TOOLKIT_BASE)/build/config/$(PROFILE).mk else include $(ROOTDIR)/build/getprofile.mk include $(ROOTDIR)/build/config/$(PROFILE).mk endif endif ARCH_NEG_FILTER += 20 21 ifdef 
VULCAN_TOOLKIT_BASE include $(VULCAN_TOOLKIT_BASE)/build/config/DetectOS.mk else include $(ROOTDIR)/build/config/DetectOS.mk endif CU_FILES += ../test/warningstester.cu # Thrust includes (thrust/) ifdef VULCAN INCLUDES += $(VULCAN_INSTALL_DIR)/cuda/include/ INCLUDES += $(VULCAN_INSTALL_DIR)/cuda/_internal/cudart else INCLUDES += ../../ INCLUDES += ../../../cuda/tools/cudart endif # Location of generated include file that includes all Thrust public headers GENERATED_SOURCES = $(BUILT_CWD) CUDACC_FLAGS += -I$(GENERATED_SOURCES) include $(ROOTDIR)/thrust/internal/build/common_warnings.mk ifdef VULCAN_TOOLKIT_BASE include $(VULCAN_TOOLKIT_BASE)/build/common.mk else include $(ROOTDIR)/build/common.mk endif warningstester$(OBJSUFFIX): $(GENERATED_SOURCES)/warningstester.h $(GENERATED_SOURCES)/warningstester.h: FORCE ifdef VULCAN ifeq ($(TARGET_ARCH), ppc64le) $(PYTHON) $(SRC_CWD)/warningstester_create_uber_header.py $(VULCAN_INSTALL_DIR)/cuda/targets/ppc64le-linux/include > $@ else $(PYTHON) $(SRC_CWD)/warningstester_create_uber_header.py $(VULCAN_INSTALL_DIR)/cuda/include > $@ endif else $(PYTHON) $(SRC_CWD)/warningstester_create_uber_header.py $(SRC_CWD)/../.. > $@ endif FORCE: thrust-1.9.5/internal/build/warningstester_create_uber_header.py000066400000000000000000000030371344621116200252760ustar00rootroot00000000000000''' Helper script for creating a header file that includes all of Thrust's public headers. This is useful for instance, to quickly check that all the thrust headers obey proper syntax or are warning free. This script simply outputs a list of C-style #include's to the standard output--this should be redirected to a header file by the caller. 
''' import sys import os import re from stat import * thrustdir = sys.argv[1] def find_headers(base_dir, rel_dir, exclude = ['\B']): ''' Recursively find all *.h files inside base_dir/rel_dir, except any that match the exclude regexp list ''' assert(type(exclude) == list) full_dir = base_dir + '/' + rel_dir result = [] for f in os.listdir(full_dir): rel_file = rel_dir + '/' + f for e in exclude: if re.match(e, rel_file): break else: if f.endswith('.h'): result.append(rel_file) elif S_ISDIR(os.stat(full_dir + '/' + f).st_mode): result.extend(find_headers(base_dir, rel_file, exclude)) return result print('/* File is generated by ' + sys.argv[0] + ' */') exclude_re = ['.*/detail$', 'thrust/iterator', 'thrust/random', 'thrust/system/tbb'] headers = find_headers(thrustdir, 'thrust', exclude_re) if len(headers) == 0: print('#error no include files found\n') print('#define THRUST_CPP11_REQUIRED_NO_ERROR') print('#define THRUST_MODERN_GCC_REQUIRED_NO_ERROR') for h in headers: print('#include <' + h + '>') exit() thrust-1.9.5/internal/racecheck.sh000077500000000000000000000014571344621116200171510ustar00rootroot00000000000000#!/bin/sh MEMCHECK=/work/nightly/memcheck/bin/x86_64_Linux_release/cuda-memcheck ######################### files=`ls thrust.test.*`; files=`ls thrust.example.*`; ######################### nfiles=0 for fn in $files; do nfiles=$((nfiles + 1)) done j=1 for fn in $files; do echo " ----------------------------------------------------------------------" echo " *** MEMCHECK *** [$j/$nfiles] $fn" echo " ----------------------------------------------------------------------" $MEMCHECK --tool memcheck ./$fn --verbose echo " ----------------------------------------------------------------------" echo " *** RACECHECK *** [$j/$nfiles] $fn" echo " ----------------------------------------------------------------------" $MEMCHECK --tool racecheck ./$fn --verbose --sizes=small j=$((j+1)) done; 
thrust-1.9.5/internal/rename_cub_namespace.sh000077500000000000000000000002741344621116200213510ustar00rootroot00000000000000#! /bin/bash # Run this in //sw/gpgpu/thrust/thrust/system/cuda/detail/cub to add a THRUST_ # prefix to CUB's namespace macro. sed -i -e 's/CUB_NS_P/THRUST_CUB_NS_P/g' `find . -type f` thrust-1.9.5/internal/reverse_rename_cub_namespace.sh000077500000000000000000000002711344621116200231010ustar00rootroot00000000000000#! /bin/bash # Run this in //sw/gpgpu/thrust/thrust/system/cuda/detail/cub to undo the # renaming of CUB's namespace macro. sed -i -e 's|THRUST_CUB_NS_P|CUB_NS_P|g' `find . -type f` thrust-1.9.5/internal/scripts/000077500000000000000000000000001344621116200163625ustar00rootroot00000000000000thrust-1.9.5/internal/scripts/eris_perf.py000077500000000000000000000135221344621116200207200ustar00rootroot00000000000000#! /usr/bin/env python # -*- coding: utf-8 -*- ############################################################################### # Copyright (c) 2018 NVIDIA Corporation # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. 
############################################################################### from sys import exit from os.path import join, dirname, basename, realpath from csv import DictReader as csv_dict_reader from subprocess import Popen from argparse import ArgumentParser as argument_parser ############################################################################### def printable_cmd(c): """Converts a `list` of `str`s representing a shell command to a printable `str`.""" return " ".join(map(lambda e: '"' + str(e) + '"', c)) ############################################################################### def print_file(p): """Open the path `p` and print its contents to `stdout`.""" print "********************************************************************************" with open(p) as f: for line in f: print line, print "********************************************************************************" ############################################################################### ap = argument_parser( description = ( "CUDA Eris driver script: runs a benchmark suite multiple times, combines " "the results, and outputs them in the CUDA Eris performance result format." ) ) ap.add_argument( "-b", "--benchmark", help = ("The location of the benchmark suite executable to run."), type = str, default = join(dirname(realpath(__file__)), "bench"), metavar = "R" ) ap.add_argument( "-p", "--postprocess", help = ("The location of the postprocessing script to run to combine the " "results."), type = str, default = join(dirname(realpath(__file__)), "combine_benchmark_results.py"), metavar = "R" ) ap.add_argument( "-r", "--runs", help = ("Run the benchmark suite `R` times."), type = int, default = 5, metavar = "R" ) args = ap.parse_args() if args.runs <= 0: print "ERROR: `--runs` must be greater than `0`."
ap.print_help() exit(1) BENCHMARK_EXE = args.benchmark BENCHMARK_NAME = basename(BENCHMARK_EXE) POSTPROCESS_EXE = args.postprocess OUTPUT_FILE_NAME = lambda i: BENCHMARK_NAME + "_" + str(i) + ".csv" COMBINED_OUTPUT_FILE_NAME = BENCHMARK_NAME + "_combined.csv" ############################################################################### print '&&&& RUNNING {0}'.format(BENCHMARK_NAME) print '#### RUNS {0}'.format(args.runs) ############################################################################### print '#### CMD {0}'.format(BENCHMARK_EXE) for i in xrange(args.runs): with open(OUTPUT_FILE_NAME(i), "w") as output_file: print '#### RUN {0} OUTPUT -> {1}'.format(i, OUTPUT_FILE_NAME(i)) p = None try: p = Popen(BENCHMARK_EXE, stdout = output_file, stderr = output_file) p.communicate() except OSError as ex: print_file(OUTPUT_FILE_NAME(i)) print '#### ERROR Caught OSError `{0}`.'.format(ex) print '&&&& FAILED {0}'.format(BENCHMARK_NAME) exit(-1) print_file(OUTPUT_FILE_NAME(i)) if p.returncode != 0: print '#### ERROR Process exited with code {0}.'.format(p.returncode) print '&&&& FAILED {0}'.format(BENCHMARK_NAME) exit(p.returncode) ############################################################################### post_cmd = [POSTPROCESS_EXE] # Add dependent variable options. 
post_cmd += ["-dSTL Average Walltime,STL Walltime Uncertainty,STL Trials"] post_cmd += ["-dSTL Average Throughput,STL Throughput Uncertainty,STL Trials"] post_cmd += ["-dThrust Average Walltime,Thrust Walltime Uncertainty,Thrust Trials"] post_cmd += ["-dThrust Average Throughput,Thrust Throughput Uncertainty,Thrust Trials"] post_cmd += [OUTPUT_FILE_NAME(i) for i in range(args.runs)] print '#### CMD {0}'.format(printable_cmd(post_cmd)) with open(COMBINED_OUTPUT_FILE_NAME, "w") as output_file: p = None try: p = Popen(post_cmd, stdout = output_file, stderr = output_file) p.communicate() except OSError as ex: print_file(COMBINED_OUTPUT_FILE_NAME) print '#### ERROR Caught OSError `{0}`.'.format(ex) print '&&&& FAILED {0}'.format(BENCHMARK_NAME) exit(-1) print_file(COMBINED_OUTPUT_FILE_NAME) if p.returncode != 0: print '#### ERROR Process exited with code {0}.'.format(p.returncode) print '&&&& FAILED {0}'.format(BENCHMARK_NAME) exit(p.returncode) with open(COMBINED_OUTPUT_FILE_NAME) as input_file: reader = csv_dict_reader(input_file) variable_units = reader.next() # Get units header row. distinguishing_variables = reader.fieldnames measured_variables = [ ("STL Average Throughput", "+"), ("Thrust Average Throughput", "+") ] for record in reader: for variable, directionality in measured_variables: print "&&&& PERF {0}_{1}_{2}bit_{3}mib_{4} {5} {6}{7}".format( record["Algorithm"], record["Element Type"], record["Element Size"], record["Total Input Size"], variable.replace(" ", "_").lower(), record[variable], directionality, variable_units[variable] ) ############################################################################### print '&&&& PASSED {0}'.format(BENCHMARK_NAME) thrust-1.9.5/internal/scripts/refresh_from_github2.sh000077500000000000000000000056521344621116200230360ustar00rootroot00000000000000branch="master" while getopts ":hb:c:" opt; do case $opt in h) echo "Usage: $0 [-h] [-b ] -c " exit 1 ;; b) branch=$OPTARG ;; c) changelist=$OPTARG ;; \?)
echo "Invalid option: -$OPTARG" >&2; exit 1 ;; :) echo "Option -$OPTARG requires an argument"; exit 1 ;; esac done if [ "$changelist" == "" ]; then echo "Missing required option -c to specify P4 changelist to put changed files into" exit 1 fi # Cause script to exit on any command that results in an error set -e echo "Downloading thrust code from the $branch branch into /tmp/thrust-${branch}" rm -rf /tmp/thrust-${branch} git clone -q git://github.com/thrust/thrust.git -b ${branch} /tmp/thrust-${branch} cd `dirname $0`/../.. echo "Changed current directory to `pwd`" vulcan_files=`echo *.vlcc *.vlct` logdir=`mktemp -d /tmp/tmp.XXXXXXXX` echo "Logging p4 command outputs to temporary directory $logdir" for i in *; do if [[ "$i" != "internal" && "$i" != "Makefile" ]]; then ii="$i"; if [ -d $i ]; then ii="$i/..."; fi echo "Reverting, force syncing, and then removing $ii" p4 revert $ii >> $logdir/$i.revert.log 2>&1 p4 sync -f $ii >> $logdir/$i.sync.log 2>&1 rm -rf $i fi done echo "Copying downloaded thrust code to p4 client" cp -R /tmp/thrust-${branch}/* . find . -name ".gitignore" | xargs -n 1 rm echo "Checking if version has been bumped" new_version=`grep "#define THRUST_VERSION" thrust/version.h | sed -e "s/#define THRUST_VERSION //"` old_version=`p4 print thrust/version.h | grep "#define THRUST_VERSION" | sed -e "s/#define THRUST_VERSION //"` if [ "$new_version" != "$old_version" ]; then p4 edit internal/test/version.gold new_version_print="$(( $new_version / 100000 )).$(( ($new_version / 100) % 1000 )).$(( $new_version % 100 ))" sed -e "s/v[0-9\.][0-9\.]*/v${new_version_print}/" internal/test/version.gold > internal/test/version.gold.tmp mv internal/test/version.gold.tmp internal/test/version.gold echo "Updated version.gold to version $new_version_print" else echo "Version has not changed" fi echo "Reconciling changed code into changelist $changelist" p4 reconcile -c $changelist ... 
>> $logdir/reconcile.log 2>&1 p4 revert -c $changelist Makefile $vulcan_files internal/... >> $logdir/internal_files_revert.log 2>&1 echo "Looking for examples that were added" for e in `find examples -name "*.cu"`; do if [ ! -e internal/build/`basename $e .cu`.mk ]; then echo "ADDED: `basename $e .cu`"; fi done echo "Looking for examples that were deleted or moved" for e in `find internal/build -name "*.mk"`; do ee=`basename $e .mk` case "$ee" in generic_example | unittester* | warningstester) continue;; esac if [ "`find examples -name $ee.cu`" == "" ]; then echo "DELETED: $ee"; fi; done thrust-1.9.5/internal/scripts/tounix000077500000000000000000000005241344621116200176370ustar00rootroot00000000000000#!/bin/bash # converts all files in the current directory with extensions .h .inl or .cu to unix format #find . -type f \( -name "*.h" -o -name "*.inl" -o -name "*.cu" \) -a \( -not -wholename "*\.hg/*" \) -print find . -type f \( -name "*.h" -o -name "*.inl" -o -name "*.cu" \) -a \( -not -wholename "*\.hg/*" \) -exec fromdos -d {} \; thrust-1.9.5/internal/scripts/wiki2tex.py000066400000000000000000000132401344621116200205020ustar00rootroot00000000000000''' Convert Google Code .wiki files into .tex formatted files. Output is designed to be included within a larger TeX project, it is not standalone. 
''' import sys import re import codecs print(sys.argv) ''' A "rule" is a begin tag, an end tag, and how to reformat the inner text (function) ''' def encase(pre, post, strip=False): """Return a function that prepends pre and appends post""" def f(txt): if strip: return pre + txt.strip() + post else: return pre + txt + post return f def constant(text): def f(txt): return text return f def encase_with_rules(pre, post, rules, strip=False): def f(txt): if strip: return pre + apply_rules(txt, rules).strip() + post else: return pre + apply_rules(txt, rules) + post return f def encase_escape_underscore(pre, post): def f(txt): txt = sub(r'_', r'\_', txt) return pre + txt + post return f def sub(pat, repl, txt): """Substitute in repl for pat in txt, txt can be multiple lines""" return re.compile(pat, re.MULTILINE).sub(repl, txt) def process_list(rules): def f(txt): txt = ' *' + txt # was removed to match begin tag of list res = '\\begin{itemize}\n' for ln in txt.split('\n'): # Convert " *" to "\item " ln = sub(r'^ \*', r'\\item ', ln) res += apply_rules(ln, rules) + '\n' res += '\\end{itemize}\n' return res return f def process_link(rules): def f(txt): lst = txt.split(' ') lnk = lst[0] desc = apply_rules(' '.join(lst[1:]), rules) if lnk[:7] == 'http://': desc = apply_rules(' '.join(lst[1:]), rules) return r'\href{' + lnk + r'}{' + desc + r'}' if len(lst) > 1: return r'\href{}{' + desc + r'}' return r'\href{}{' + lnk + r'}' return f # Some rules can be used inside some other rules (backticks in section names) link_rules = [ ['_', '', constant(r'\_')], ] section_rules = [ ['`', '`', encase_escape_underscore(r'\texttt{', r'}')], ] item_rules = [ ['`', '`', encase(r'\verb|', r'|')], ['[', ']', process_link(link_rules)], ] # Main rules for Latex formatting rules = [ ['{{{', '}}}', encase(r'\begin{lstlisting}[language=c++]', r'\end{lstlisting}')], ['[', ']', process_link(link_rules)], [' *', '\n\n', process_list(item_rules)], ['"', '"', encase("``", "''")], ['`', '`',
encase(r'\verb|', r'|')], ['*', '*', encase(r'\emph{', r'}')], ['_', '_', encase(r'\emph{', r'}')], ['==', '==', encase_with_rules(r'\section{', r'}', section_rules, True)], ['=', '=', encase_with_rules(r'\chapter{', r'}', section_rules, True)], ['(e.g. f(x) -> y and f(x,y) -> ', 'z)', constant(r'(e.g. $f(x)\to y$ and $f(x,y)\to z$)')], ] def match_rules(txt, rules): """Find rule that first matches in txt""" # Find first begin tag first_begin_loc = 10e100 matching_rule = None for rule in rules: begin_tag, end_tag, func = rule loc = txt.find(begin_tag) if loc > -1 and loc < first_begin_loc: first_begin_loc = loc matching_rule = rule return (matching_rule, first_begin_loc) def apply_rules(txt, rules): """Apply a set of rules to the given txt, return transformed version of txt""" matching_rule, first_begin_loc = match_rules(txt, rules) if matching_rule is None: return txt begin_tag, end_tag, func = matching_rule end_loc = txt.find(end_tag, first_begin_loc + 1) if end_loc == -1: sys.exit('Could not find end tag {0} after position {1}'.format(end_tag, first_begin_loc + 1)) inner_txt = txt[first_begin_loc + len(begin_tag) : end_loc] # Copy characters up until begin tag # Then have output of rule function on inner text new_txt_start = txt[:first_begin_loc] + func(inner_txt) # Follow with the remaining processed text remaining_txt = txt[end_loc + len(end_tag):] return new_txt_start + apply_rules(remaining_txt, rules) def split_sections(contents): """Given one string of all file contents, return list of sections Return format is list of pairs, each pair has section title and list of lines. Result is ordered as the original input.
""" res = [] cur_section = '' section = [] for ln in contents.split('\n'): if len(ln) > 0 and ln[0] == '=': # remove = formatting from line section_title = sub(r'^\=+ (.*) \=+', r'\1', ln) res.append((cur_section, section)) cur_section = section_title section = [ln] else: section.append(ln) res.append((cur_section, section)) return res def filter_sections(splitinput, removelst): """Take split input and remove sections in removelst""" res = [] for sectname, sectcontents in splitinput: if sectname in removelst: pass else: res.extend(sectcontents) # convert to single string for output return '\n'.join(res) def main(): infile = codecs.open(sys.argv[1], encoding='utf-8') outfile = codecs.open(sys.argv[2], mode='w', encoding='utf-8') contents = infile.read() # Remove first three lines contents = '\n'.join(contents.split('\n')[3:]) # Split sections and filter out some of them sections = split_sections(contents) contents = filter_sections(sections, ['Introduction', 'Prerequisites', 'Simple Example']) # Convert to latex format contents = apply_rules(contents, rules) infile.close() outfile.write(contents) outfile.close() return 0 if __name__ == '__main__': sys.exit(main()) thrust-1.9.5/internal/test/000077500000000000000000000000001344621116200156525ustar00rootroot00000000000000thrust-1.9.5/internal/test/dvstest.lst000077500000000000000000000300351344621116200200760ustar00rootroot00000000000000TestAdjacentDifference TestAdjacentDifferenceDiscardIterator TestAdjacentDifferenceDispatchExplicit TestAdjacentDifferenceDispatchImplicit TestAdjacentDifferenceInPlaceWithRelatedIteratorTypes TestAdjacentDifferenceSimpleDevice TestAdjacentDifferenceSimpleHost TestAllOfDevice TestAllOfDispatchExplicit TestAllOfDispatchImplicit TestAllOfHost TestAnyOfDevice TestAnyOfDispatchExplicit TestAnyOfDispatchImplicit TestAnyOfHost TestComputeCapability TestCopyConstantIteratorToZipIteratorDevice TestCopyConstantIteratorToZipIteratorHost TestCopyCountingIteratorDevice TestCopyCountingIteratorHost 
TestCopyDispatchExplicit TestCopyDispatchImplicit TestCopyFromConstIterator TestCopyIf TestCopyIfDispatchExplicit TestCopyIfDispatchImplicit TestCopyIfSimpleDevice TestCopyIfSimpleHost TestCopyIfStencil TestCopyIfStencilDispatchExplicit TestCopyIfStencilDispatchImplicit TestCopyIfStencilSimpleDevice TestCopyIfStencilSimpleHost TestCopyListToDevice TestCopyListToHost TestCopyMatchingTypesDevice TestCopyMatchingTypesHost TestCopyMixedTypesDevice TestCopyMixedTypesHost TestCopyToDiscardIterator TestCopyToDiscardIteratorZipped TestCopyVectorBool TestCopyZipIteratorDevice TestCopyZipIteratorHost TestCount TestCountDispatchExplicit TestCountDispatchImplicit TestCountFromConstIteratorSimpleDevice TestCountFromConstIteratorSimpleHost TestCountIf TestCountIfSimpleDevice TestCountIfSimpleHost TestCountSimpleDevice TestCountSimpleHost TestFill TestFillDiscardIterator TestFillDispatchExplicit TestFillDispatchImplicit TestFillMixedTypesDevice TestFillMixedTypesHost TestFillN TestFillNDiscardIterator TestFillNDispatchExplicit TestFillNDispatchImplicit TestFillNMixedTypesDevice TestFillNMixedTypesHost TestFillNSimpleDevice TestFillNSimpleHost TestFillSimpleDevice TestFillSimpleHost TestFillTuple TestFillWithNonTrivialAssignment TestFillWithTrivialAssignment TestFillZipIteratorDevice TestFillZipIteratorHost TestForEach TestForEachDispatchExplicit TestForEachDispatchImplicit TestForEachN TestForEachNDispatchExplicit TestForEachNDispatchImplicit TestForEachNSimpleAnySystem TestForEachNSimpleDevice TestForEachNSimpleHost TestForEachNWithLargeTypes TestForEachSimpleAnySystem TestForEachSimpleDevice TestForEachSimpleHost TestForEachWithLargeTypes TestGather TestGatherCountingIteratorDevice TestGatherCountingIteratorHost TestGatherDispatchExplicit TestGatherDispatchImplicit TestGatherIf TestGatherIfDispatchExplicit TestGatherIfDispatchImplicit TestGatherIfSimpleDevice TestGatherIfSimpleHost TestGatherIfToDiscardIterator TestGatherSimpleDevice TestGatherSimpleHost 
TestGatherToDiscardIterator TestGenerate TestGenerateDispatchExplicit TestGenerateDispatchImplicit TestGenerateNDispatchExplicit TestGenerateNDispatchImplicit TestGenerateNSimpleDevice TestGenerateNSimpleHost TestGenerateNToDiscardIterator TestGenerateSimpleDevice TestGenerateSimpleHost TestGenerateToDiscardIterator TestGenerateTuple TestGenerateZipIteratorDevice TestGenerateZipIteratorHost TestInnerProduct TestInnerProductDispatchExplicit TestInnerProductDispatchImplicit TestInnerProductSimpleDevice TestInnerProductSimpleHost TestInnerProductWithOperatorDevice TestInnerProductWithOperatorHost TestIsCommutative TestIsPlainOldData TestIsTrivialIterator TestMaxActiveBlocks TestMaxBlocksizeWithHighestOccupancy TestMaxElement TestMaxElementDispatchExplicit TestMaxElementDispatchImplicit TestMaxElementSimpleDevice TestMaxElementSimpleHost TestMerge TestMergeDescending TestMergeDispatchExplicit TestMergeDispatchImplicit TestMergeKeyValue TestMergeKeyValueDescending TestMergeSimpleDevice TestMergeSimpleHost TestMergeToDiscardIterator TestMinElement TestMinElementDispatchExplicit TestMinElementDispatchImplicit TestMinElementSimpleDevice TestMinElementSimpleHost TestMinMaxElement TestMinMaxElementDispatchExplicit TestMinMaxElementDispatchImplicit TestMinMaxElementSimpleDevice TestMinMaxElementSimpleHost TestNoneOfDevice TestNoneOfDispatchExplicit TestNoneOfDispatchImplicit TestNoneOfHost TestPartition TestPartitionCopy TestPartitionCopyDispatchExplicit TestPartitionCopyDispatchImplicit TestPartitionCopySimpleDevice TestPartitionCopySimpleHost TestPartitionCopyStencil TestPartitionCopyStencilDispatchExplicit TestPartitionCopyStencilDispatchImplicit TestPartitionCopyStencilSimpleDevice TestPartitionCopyStencilSimpleHost TestPartitionCopyStencilToDiscardIterator TestPartitionCopyToDiscardIterator TestPartitionDispatchExplicit TestPartitionDispatchImplicit TestPartitionPointDevice TestPartitionPointDispatchExplicit TestPartitionPointDispatchImplicit TestPartitionPointHost 
TestPartitionPointSimpleDevice TestPartitionPointSimpleHost TestPartitionSimpleDevice TestPartitionSimpleHost TestPartitionStencil TestPartitionStencilDispatchExplicit TestPartitionStencilDispatchImplicit TestPartitionStencilSimpleDevice TestPartitionStencilSimpleHost TestPartitionStencilZipIteratorDevice TestPartitionStencilZipIteratorHost TestPartitionZipIteratorDevice TestPartitionZipIteratorHost TestRadixSort TestRadixSortByKey TestRadixSortKeySimple TestRadixSortKeyValueSimple TestReduce TestReduceByKey TestReduceByKeyDispatchExplicit TestReduceByKeyDispatchImplicit TestReduceByKeySimpleDevice TestReduceByKeySimpleHost TestReduceByKeyToDiscardIterator TestReduceCountingIterator TestReduceDispatchExplicit TestReduceDispatchImplicit TestReduceMixedTypesDevice TestReduceMixedTypesHost TestReduceSimpleDevice TestReduceSimpleHost TestReduceWithIndirectionDevice TestReduceWithIndirectionHost TestReduceWithOperator TestRemove TestRemoveCopy TestRemoveCopyDispatchExplicit TestRemoveCopyDispatchImplicit TestRemoveCopyIf TestRemoveCopyIfDispatchExplicit TestRemoveCopyIfDispatchImplicit TestRemoveCopyIfSimpleDevice TestRemoveCopyIfSimpleHost TestRemoveCopyIfStencil TestRemoveCopyIfStencilDispatchExplicit TestRemoveCopyIfStencilDispatchImplicit TestRemoveCopyIfStencilSimpleDevice TestRemoveCopyIfStencilSimpleHost TestRemoveCopyIfStencilToDiscardIterator TestRemoveCopyIfToDiscardIterator TestRemoveCopySimpleDevice TestRemoveCopySimpleHost TestRemoveCopyToDiscardIterator TestRemoveCopyToDiscardIteratorZipped TestRemoveDispatchExplicit TestRemoveDispatchImplicit TestRemoveIf TestRemoveIfDispatchExplicit TestRemoveIfDispatchImplicit TestRemoveIfSimpleDevice TestRemoveIfSimpleHost TestRemoveIfStencil TestRemoveIfStencilDispatchExplicit TestRemoveIfStencilDispatchImplicit TestRemoveIfStencilSimpleDevice TestRemoveIfStencilSimpleHost TestRemoveSimpleDevice TestRemoveSimpleHost TestReplace TestReplaceCopy TestReplaceCopyDispatchExplicit TestReplaceCopyDispatchImplicit 
TestReplaceCopyIf TestReplaceCopyIfDispatchExplicit TestReplaceCopyIfDispatchImplicit TestReplaceCopyIfSimpleDevice TestReplaceCopyIfSimpleHost TestReplaceCopyIfStencil TestReplaceCopyIfStencilDispatchExplicit TestReplaceCopyIfStencilDispatchImplicit TestReplaceCopyIfStencilSimpleDevice TestReplaceCopyIfStencilSimpleHost TestReplaceCopyIfStencilToDiscardIterator TestReplaceCopyIfToDiscardIterator TestReplaceCopySimpleDevice TestReplaceCopySimpleHost TestReplaceCopyToDiscardIterator TestReplaceDispatchExplicit TestReplaceDispatchImplicit TestReplaceIf TestReplaceIfDispatchExplicit TestReplaceIfDispatchImplicit TestReplaceIfSimpleDevice TestReplaceIfSimpleHost TestReplaceIfStencil TestReplaceIfStencilDispatchExplicit TestReplaceIfStencilDispatchImplicit TestReplaceIfStencilSimpleDevice TestReplaceIfStencilSimpleHost TestReplaceSimpleDevice TestReplaceSimpleHost TestReverse TestReverseCopy TestReverseCopyDispatchExplicit TestReverseCopyDispatchImplicit TestReverseCopySimpleDevice TestReverseCopySimpleHost TestReverseCopyToDiscardIterator TestReverseDispatchExplicit TestReverseDispatchImplicit TestReverseSimpleDevice TestReverseSimpleHost TestSetIntersection TestSetIntersectionDispatchExplicit TestSetIntersectionDispatchImplicit TestSetIntersectionEquivalentRanges TestSetIntersectionMultiset TestSetIntersectionSimpleDevice TestSetIntersectionSimpleHost TestSetIntersectionToDiscardIterator TestSetSymmetricDifference TestSetSymmetricDifferenceDispatchExplicit TestSetSymmetricDifferenceDispatchImplicit TestSetSymmetricDifferenceEquivalentRanges TestSetSymmetricDifferenceKeyValue TestSetSymmetricDifferenceMultiset TestSetSymmetricDifferenceSimpleDevice TestSetSymmetricDifferenceSimpleHost TestSetUnion TestSetUnionDispatchExplicit TestSetUnionDispatchImplicit TestSetUnionSimpleDevice TestSetUnionSimpleHost TestSetUnionToDiscardIterator TestSetUnionWithEquivalentElementsSimpleDevice TestSetUnionWithEquivalentElementsSimpleHost TestStablePartition TestStablePartitionCopy 
TestStablePartitionCopyDispatchExplicit TestStablePartitionCopyDispatchImplicit TestStablePartitionCopySimpleDevice TestStablePartitionCopySimpleHost TestStablePartitionCopyStencil TestStablePartitionCopyStencilDispatchExplicit TestStablePartitionCopyStencilDispatchImplicit TestStablePartitionCopyStencilSimpleDevice TestStablePartitionCopyStencilSimpleHost TestStablePartitionCopyStencilToDiscardIterator TestStablePartitionCopyToDiscardIterator TestStablePartitionDispatchExplicit TestStablePartitionDispatchImplicit TestStablePartitionSimpleDevice TestStablePartitionSimpleHost TestStablePartitionStencil TestStablePartitionStencilDispatchExplicit TestStablePartitionStencilDispatchImplicit TestStablePartitionStencilSimpleDevice TestStablePartitionStencilSimpleHost TestStablePartitionStencilZipIteratorDevice TestStablePartitionStencilZipIteratorHost TestStablePartitionZipIteratorDevice TestStablePartitionZipIteratorHost TestTransformBinary TestTransformBinaryCountingIterator TestTransformBinaryDispatchExplicit TestTransformBinaryDispatchImplicit TestTransformBinarySimpleDevice TestTransformBinarySimpleHost TestTransformBinaryToDiscardIterator TestTransformExclusiveScanDispatchExplicit TestTransformExclusiveScanDispatchImplicit TestTransformIfBinary TestTransformIfBinaryDispatchExplicit TestTransformIfBinaryDispatchImplicit TestTransformIfBinarySimpleDevice TestTransformIfBinarySimpleHost TestTransformIfBinaryToDiscardIterator TestTransformIfUnary TestTransformIfUnaryDispatchExplicit TestTransformIfUnaryDispatchImplicit TestTransformIfUnaryNoStencil TestTransformIfUnaryNoStencilDispatchExplicit TestTransformIfUnaryNoStencilDispatchImplicit TestTransformIfUnaryNoStencilSimpleDevice TestTransformIfUnaryNoStencilSimpleHost TestTransformIfUnarySimpleDevice TestTransformIfUnarySimpleHost TestTransformIfUnaryToDiscardIterator TestTransformInclusiveScanDispatchExplicit TestTransformInclusiveScanDispatchImplicit TestTransformScan TestTransformScanCountingIteratorDevice 
TestTransformScanCountingIteratorHost TestTransformScanSimpleDevice TestTransformScanSimpleHost TestTransformScanToDiscardIterator TestTransformUnary TestTransformUnaryCountingIterator TestTransformUnaryDispatchExplicit TestTransformUnaryDispatchImplicit TestTransformUnarySimpleDevice TestTransformUnarySimpleHost TestTransformUnaryToDiscardIterator TestTransformUnaryToDiscardIteratorZipped TestTransformWithIndirectionDevice TestTransformWithIndirectionHost TestUnique TestUniqueByKey TestUniqueByKeyCopyDispatchExplicit TestUniqueByKeyCopyDispatchImplicit TestUniqueByKeyDispatchExplicit TestUniqueByKeyDispatchImplicit TestUniqueByKeySimpleDevice TestUniqueByKeySimpleHost TestUniqueCopy TestUniqueCopyByKey TestUniqueCopyByKeySimpleDevice TestUniqueCopyByKeySimpleHost TestUniqueCopyByKeyToDiscardIterator TestUniqueCopyDispatchExplicit TestUniqueCopyDispatchImplicit TestUniqueCopySimpleDevice TestUniqueCopySimpleHost TestUniqueCopyToDiscardIterator TestUniqueDispatchExplicit TestUniqueDispatchImplicit TestUniqueSimpleDevice TestUniqueSimpleHost TestUnknownDeviceRobustness TestVectorBinarySearch TestVectorBinarySearchDiscardIterator TestVectorBinarySearchDispatchExplicit TestVectorBinarySearchDispatchImplicit TestVectorBinarySearchSimpleDevice TestVectorBinarySearchSimpleHost TestVectorCppZeroSizeDevice TestVectorCppZeroSizeHost TestVectorLowerBound TestVectorLowerBoundDiscardIterator TestVectorLowerBoundDispatchExplicit TestVectorLowerBoundDispatchImplicit TestVectorLowerBoundSimpleDevice TestVectorLowerBoundSimpleHost TestVectorUpperBound TestVectorUpperBoundDiscardIterator TestVectorUpperBoundDispatchExplicit TestVectorUpperBoundDispatchImplicit TestVectorUpperBoundSimpleDevice TestVectorUpperBoundSimpleHost thrust-1.9.5/internal/test/thrust.example.arbitrary_transformation.filecheck000066400000000000000000000002071344621116200276170ustar00rootroot00000000000000 CHECK: 3 + 6 * 2 = 15 CHECK-NEXT: 4 + 7 * 5 = 39 CHECK-NEXT: 0 + 2 * 7 = 14 CHECK-NEXT: 8 + 1 * 4 = 12 
CHECK-NEXT: 2 + 8 * 3 = 26 thrust-1.9.5/internal/test/thrust.example.basic_vector.filecheck000066400000000000000000000002721344621116200251370ustar00rootroot00000000000000 CHECK: H has size 4 CHECK-NEXT: H[0] = 14 CHECK-NEXT: H[1] = 20 CHECK-NEXT: H[2] = 38 CHECK-NEXT: H[3] = 46 CHECK-NEXT: H now has size 2 CHECK-NEXT: D[0] = 99 CHECK-NEXT: D[1] = 88 thrust-1.9.5/internal/test/thrust.example.bounding_box.filecheck000066400000000000000000000001011344621116200251400ustar00rootroot00000000000000 CHECK: bounding box (0.000022,0.037300) (0.967956,0.995085) thrust-1.9.5/internal/test/thrust.example.bucket_sort2d.filecheck000066400000000000000000000033571344621116200252550ustar00rootroot00000000000000 CHECK: bucket (150, 50)'s list of points: CHECK-NEXT: (0.751041,0.505377) CHECK-NEXT: (0.750647,0.505272) CHECK-NEXT: (0.752243,0.509601) CHECK-NEXT: (0.750937,0.503519) CHECK-NEXT: (0.753879,0.506217) CHECK-NEXT: (0.754956,0.501953) CHECK-NEXT: (0.754439,0.502353) CHECK-NEXT: (0.754128,0.501410) CHECK-NEXT: (0.750917,0.502195) CHECK-NEXT: (0.754024,0.507150) CHECK-NEXT: (0.750565,0.502896) CHECK-NEXT: (0.753444,0.509374) CHECK-NEXT: (0.754874,0.506500) CHECK-NEXT: (0.754646,0.508721) CHECK-NEXT: (0.753527,0.504378) CHECK-NEXT: (0.754563,0.502366) CHECK-NEXT: (0.751227,0.502014) CHECK-NEXT: (0.753009,0.508329) CHECK-NEXT: (0.752284,0.500607) CHECK-NEXT: (0.753341,0.503853) CHECK-NEXT: (0.751787,0.501364) CHECK-NEXT: (0.750171,0.500588) CHECK-NEXT: (0.752243,0.501621) CHECK-NEXT: (0.752056,0.509570) CHECK-NEXT: (0.752263,0.507172) CHECK-NEXT: (0.754024,0.501935) CHECK-NEXT: (0.751538,0.500686) CHECK-NEXT: (0.754024,0.508004) CHECK-NEXT: (0.750358,0.506688) CHECK-NEXT: (0.751083,0.505733) CHECK-NEXT: (0.750150,0.505805) CHECK-NEXT: (0.750585,0.505232) CHECK-NEXT: (0.753838,0.508040) CHECK-NEXT: (0.750461,0.501308) CHECK-NEXT: (0.753527,0.501546) CHECK-NEXT: (0.751145,0.508224) CHECK-NEXT: (0.751953,0.506566) CHECK-NEXT: (0.750378,0.502955) CHECK-NEXT: (0.751704,0.507102) 
CHECK-NEXT: (0.754646,0.502674) CHECK-NEXT: (0.750772,0.501464) CHECK-NEXT: (0.752325,0.502761) CHECK-NEXT: (0.752408,0.502305) CHECK-NEXT: (0.751000,0.508639) CHECK-NEXT: (0.754252,0.506525) CHECK-NEXT: (0.753175,0.504877) CHECK-NEXT: (0.753071,0.502682) CHECK-NEXT: (0.750109,0.503627) CHECK-NEXT: (0.754936,0.506406) CHECK-NEXT: (0.754521,0.500953) CHECK-NEXT: (0.753941,0.509584) CHECK-NEXT: (0.754915,0.504699) CHECK-NEXT: (0.751476,0.509525) CHECK-NEXT: (0.752823,0.507129) thrust-1.9.5/internal/test/thrust.example.constant_iterator.filecheck000066400000000000000000000000741344621116200262360ustar00rootroot00000000000000 CHECK: 13 CHECK-NEXT: 17 CHECK-NEXT: 12 CHECK-NEXT: 15 thrust-1.9.5/internal/test/thrust.example.counting_iterator.filecheck000066400000000000000000000001471344621116200262340ustar00rootroot00000000000000 CHECK: found 4 nonzero values at indices: CHECK-NEXT: 1 CHECK-NEXT: 2 CHECK-NEXT: 5 CHECK-NEXT: 7 thrust-1.9.5/internal/test/thrust.example.cuda.async_reduce.filecheck000066400000000000000000000000001344621116200260400ustar00rootroot00000000000000thrust-1.9.5/internal/test/thrust.example.cuda.custom_temporary_allocation.filecheck000066400000000000000000000020301344621116200312220ustar00rootroot00000000000000 CHECK: cached_allocator::allocate(): num_bytes == {{[0-9]+}} CHECK-NEXT: cached_allocator::allocate(): allocating new block CHECK-NEXT: cached_allocator::deallocate(): ptr == {{(0x)?}}{{[0-9a-z]+}} CHECK-NEXT: cached_allocator::allocate(): num_bytes == {{[0-9]+}} CHECK-NEXT: cached_allocator::allocate(): found a free block CHECK-NEXT: cached_allocator::deallocate(): ptr == {{(0x)?}}{{[0-9a-z]+}} CHECK-NEXT: cached_allocator::allocate(): num_bytes == {{[0-9]+}} CHECK-NEXT: cached_allocator::allocate(): found a free block CHECK-NEXT: cached_allocator::deallocate(): ptr == {{(0x)?}}{{[0-9a-z]+}} CHECK-NEXT: cached_allocator::allocate(): num_bytes == {{[0-9]+}} CHECK-NEXT: cached_allocator::allocate(): found a free block CHECK-NEXT: 
cached_allocator::deallocate(): ptr == {{(0x)?}}{{[0-9a-z]+}} CHECK-NEXT: cached_allocator::allocate(): num_bytes == {{[0-9]+}} CHECK-NEXT: cached_allocator::allocate(): found a free block CHECK-NEXT: cached_allocator::deallocate(): ptr == {{(0x)?}}{{[0-9a-z]+}} CHECK-NEXT: cached_allocator::free_all() thrust-1.9.5/internal/test/thrust.example.cuda.fallback_allocator.filecheck000066400000000000000000000007111344621116200272040ustar00rootroot00000000000000 CHECK: Testing fallback_allocator on device CHECK-SAME: with {{[0-9]+}} bytes of device memory CHECK: attempting to sort {{[0-9]+}} values CHECK: allocated {{[0-9]+}} bytes of device memory CHECK: allocated {{[0-9]+}} bytes of device memory CHECK: attempting to sort {{[0-9]+}} values CHECK: allocated {{[0-9]+}} bytes of device memory CHECK: allocated {{[0-9]+}} bytes of pinned host memory (fallback successful) thrust-1.9.5/internal/test/thrust.example.cuda.range_view.filecheck000066400000000000000000000001211344621116200255260ustar00rootroot00000000000000 CHECK: z[0]= 7 CHECK-NEXT: z[1]= 8 CHECK-NEXT: z[2]= 9 CHECK-NEXT: z[3]= 10 thrust-1.9.5/internal/test/thrust.example.cuda.unwrap_pointer.filecheck000066400000000000000000000000001344621116200264500ustar00rootroot00000000000000thrust-1.9.5/internal/test/thrust.example.cuda.wrap_pointer.filecheck000066400000000000000000000000001344621116200261050ustar00rootroot00000000000000thrust-1.9.5/internal/test/thrust.example.device_ptr.filecheck000066400000000000000000000001141344621116200246130ustar00rootroot00000000000000 CHECK: device array contains 10 values CHECK-NEXT: sum of values is 45 thrust-1.9.5/internal/test/thrust.example.discrete_voronoi.filecheck000066400000000000000000000005621344621116200260530ustar00rootroot00000000000000 CHECK: [Inititialize {{[0-9]+}}x{{[0-9]+}} Image] CHECK-NEXT: ( {{[0-9.]+}}ms ) CHECK-NEXT: [Copy to Device] CHECK-NEXT: ( {{[0-9.]+}}ms ) CHECK-NEXT: [JFA stepping] CHECK-NEXT: ( {{[0-9.]+}}ms ) CHECK-NEXT: ( {{[0-9.]+}} MPixel/s ) 
CHECK-NEXT: [Device to Host Copy] CHECK-NEXT: ( {{[0-9.]+}}ms ) CHECK-NEXT: [PGM Export] CHECK-NEXT: ( {{[0-9.]+}}ms ) thrust-1.9.5/internal/test/thrust.example.dot_products_with_zip.filecheck000066400000000000000000000005141344621116200271210ustar00rootroot00000000000000 CHECK: (0.000022,0.000022,0.000022) * (0.000022,0.000022,0.000022) = 0.000000 CHECK-NEXT: (0.085032,0.085032,0.085032) * (0.085032,0.085032,0.085032) = 0.021692 CHECK-NEXT: (0.601353,0.601353,0.601353) * (0.601353,0.601353,0.601353) = 1.084875 CHECK-NEXT: (0.891611,0.891611,0.891611) * (0.891611,0.891611,0.891611) = 2.384912 thrust-1.9.5/internal/test/thrust.example.expand.filecheck000066400000000000000000000003041344621116200237470ustar00rootroot00000000000000 CHECK: Expanding values according to counts CHECK-NEXT: counts 3 5 2 0 1 3 4 2 4 CHECK-NEXT: values 1 2 3 4 5 6 7 8 9 CHECK-NEXT: output 1 1 1 2 2 2 2 2 3 3 5 6 6 6 7 7 7 7 8 8 9 9 9 9 thrust-1.9.5/internal/test/thrust.example.fill_copy_sequence.filecheck000066400000000000000000000003221344621116200263400ustar00rootroot00000000000000 CHECK: D[0] = 0 CHECK-NEXT: D[1] = 1 CHECK-NEXT: D[2] = 2 CHECK-NEXT: D[3] = 3 CHECK-NEXT: D[4] = 4 CHECK-NEXT: D[5] = 9 CHECK-NEXT: D[6] = 9 CHECK-NEXT: D[7] = 1 CHECK-NEXT: D[8] = 1 CHECK-NEXT: D[9] = 1 thrust-1.9.5/internal/test/thrust.example.histogram.filecheck000066400000000000000000000013511344621116200244700ustar00rootroot00000000000000 CHECK: Dense Histogram CHECK-NEXT: initial data 3 4 3 5 8 5 6 6 4 4 5 3 2 5 6 3 1 3 2 3 6 5 3 3 3 2 4 2 3 3 2 5 5 5 8 2 5 6 6 3 CHECK-NEXT: sorted data 1 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 8 8 CHECK-NEXT: cumulative histogram 0 1 7 19 23 32 38 38 40 CHECK-NEXT: histogram 0 1 6 12 4 9 6 0 2 CHECK-NEXT: Sparse Histogram CHECK-NEXT: initial data 3 4 3 5 8 5 6 6 4 4 5 3 2 5 6 3 1 3 2 3 6 5 3 3 3 2 4 2 3 3 2 5 5 5 8 2 5 6 6 3 CHECK-NEXT: sorted data 1 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 8 8 
CHECK-NEXT: histogram values 1 2 3 4 5 6 8 CHECK-NEXT: histogram counts 1 6 12 4 9 6 2 thrust-1.9.5/internal/test/thrust.example.lambda.filecheck000066400000000000000000000004321344621116200237120ustar00rootroot00000000000000 CHECK: SAXPY (functor method) CHECK-NEXT: 2 * 1 + 1 = 3 CHECK-NEXT: 2 * 2 + 1 = 5 CHECK-NEXT: 2 * 3 + 1 = 7 CHECK-NEXT: 2 * 4 + 1 = 9 CHECK-NEXT: SAXPY (placeholder method) CHECK-NEXT: 2 * 1 + 1 = 3 CHECK-NEXT: 2 * 2 + 1 = 5 CHECK-NEXT: 2 * 3 + 1 = 7 CHECK-NEXT: 2 * 4 + 1 = 9 thrust-1.9.5/internal/test/thrust.example.lexicographical_sort.filecheck000066400000000000000000000015221344621116200267000ustar00rootroot00000000000000 CHECK: Unsorted Keys CHECK-NEXT: (0,2,6) CHECK-NEXT: (0,4,4) CHECK-NEXT: (6,8,5) CHECK-NEXT: (8,6,8) CHECK-NEXT: (9,9,4) CHECK-NEXT: (1,9,7) CHECK-NEXT: (5,1,0) CHECK-NEXT: (3,8,1) CHECK-NEXT: (2,9,2) CHECK-NEXT: (7,2,7) CHECK-NEXT: (0,9,0) CHECK-NEXT: (5,4,1) CHECK-NEXT: (5,3,6) CHECK-NEXT: (8,5,5) CHECK-NEXT: (5,3,7) CHECK-NEXT: (5,7,3) CHECK-NEXT: (8,6,4) CHECK-NEXT: (9,5,4) CHECK-NEXT: (7,5,9) CHECK-NEXT: (9,0,9) CHECK-NEXT: Sorted Keys CHECK-NEXT: (0,2,6) CHECK-NEXT: (0,4,4) CHECK-NEXT: (0,9,0) CHECK-NEXT: (1,9,7) CHECK-NEXT: (2,9,2) CHECK-NEXT: (3,8,1) CHECK-NEXT: (5,1,0) CHECK-NEXT: (5,3,6) CHECK-NEXT: (5,3,7) CHECK-NEXT: (5,4,1) CHECK-NEXT: (5,7,3) CHECK-NEXT: (6,8,5) CHECK-NEXT: (7,2,7) CHECK-NEXT: (7,5,9) CHECK-NEXT: (8,5,5) CHECK-NEXT: (8,6,4) CHECK-NEXT: (8,6,8) CHECK-NEXT: (9,0,9) CHECK-NEXT: (9,5,4) CHECK-NEXT: (9,9,4) thrust-1.9.5/internal/test/thrust.example.max_abs_diff.filecheck000066400000000000000000000000531344621116200250730ustar00rootroot00000000000000 CHECK: maximum absolute difference: 4 thrust-1.9.5/internal/test/thrust.example.minimal_custom_backend.filecheck000066400000000000000000000000631344621116200271610ustar00rootroot00000000000000 CHECK: Hello, world from for_each(my_system)! 
thrust-1.9.5/internal/test/thrust.example.minmax.filecheck000066400000000000000000000001401344621116200237570ustar00rootroot00000000000000 CHECK: [ 10 17 64 90 97 27 56 45 33 76 ] CHECK-NEXT: minimum = 10 CHECK-NEXT: maximum = 97 thrust-1.9.5/internal/test/thrust.example.mode.filecheck000066400000000000000000000005251344621116200234210ustar00rootroot00000000000000 CHECK: initial data CHECK-NEXT: 0 0 6 8 9 1 5 3 2 7 0 5 5 8 5 5 8 9 7 9 2 4 8 6 9 9 1 8 9 2 CHECK-NEXT: sorted data CHECK-NEXT: 0 0 0 1 1 2 2 2 3 4 5 5 5 5 5 6 6 7 7 8 8 8 8 8 9 9 9 9 9 9 CHECK-NEXT: values CHECK-NEXT: 0 1 2 3 4 5 6 7 8 9 CHECK-NEXT: counts CHECK-NEXT: 3 2 3 1 1 5 2 2 5 6 CHECK-NEXT: Modal value 9 occurs 6 times thrust-1.9.5/internal/test/thrust.example.monte_carlo.filecheck000066400000000000000000000000451344621116200247740ustar00rootroot00000000000000 CHECK: pi is approximately 3.14 thrust-1.9.5/internal/test/thrust.example.monte_carlo_disjoint_sequences.filecheck000066400000000000000000000000411344621116200307460ustar00rootroot00000000000000 CHECK: pi is around 3.14151 thrust-1.9.5/internal/test/thrust.example.mr_basic.filecheck000066400000000000000000000000001344621116200242400ustar00rootroot00000000000000thrust-1.9.5/internal/test/thrust.example.norm.filecheck000066400000000000000000000000341344621116200234430ustar00rootroot00000000000000 CHECK: norm is 5.47723 thrust-1.9.5/internal/test/thrust.example.padded_grid_reduction.filecheck000066400000000000000000000025721344621116200270030ustar00rootroot00000000000000 CHECK: padded grid CHECK-NEXT: 0.2775 0.7256 0.6979 0.9412 0.4131 0.7202 0.3765 0.4136 0.5766 0.6612 0.4672 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 CHECK-NEXT: 0.0137 0.6256 0.1003 0.2374 0.0915 0.0455 0.3187 0.0839 0.8173 0.7281 0.5975 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 CHECK-NEXT: 0.2990 0.2693 0.4408 0.1262 0.3812 0.8537 0.9962 0.7528 0.9272 0.7873 0.8984 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 CHECK-NEXT: 0.3529 0.5803 0.8900 0.4505 0.0477 0.2683 0.8613 0.0877 0.2438 
0.4363 0.6292 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 CHECK-NEXT: 0.4561 0.7896 0.6662 0.4988 0.4404 0.6277 0.5752 0.6816 0.1240 0.5018 0.8027 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 CHECK-NEXT: 0.9527 0.5223 0.9500 0.2376 0.0110 0.7803 0.6221 0.2488 0.7006 0.6347 0.9137 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 CHECK-NEXT: 0.0027 0.4972 0.7421 0.4674 0.8961 0.2355 0.9507 0.9211 0.1650 0.4517 0.7143 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 CHECK-NEXT: 0.8649 0.2082 0.8464 0.2547 0.4789 0.9534 0.0403 0.6872 0.8964 0.3910 0.2292 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 CHECK-NEXT: 0.9017 0.1525 0.9041 0.1460 0.1646 0.3839 0.6994 0.0900 0.1671 0.2587 0.5893 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 CHECK-NEXT: 0.9075 0.2186 0.4626 0.8713 0.7073 0.1520 0.9495 0.4137 0.6746 0.7064 0.5609 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 CHECK: minimum value: 0.0027 CHECK-NEXT: maximum value: 0.9962 thrust-1.9.5/internal/test/thrust.example.permutation_iterator.filecheck000066400000000000000000000000271344621116200267520ustar00rootroot00000000000000 CHECK: sum is 130 thrust-1.9.5/internal/test/thrust.example.raw_reference_cast.filecheck000066400000000000000000000002411344621116200263110ustar00rootroot00000000000000 CHECK: Before A->B Copy CHECK-NEXT: A: 0 1 2 3 4 CHECK-NEXT: B: 0 0 0 0 0 CHECK-NEXT: After A->B Copy CHECK-NEXT: A: 0 1 2 3 4 CHECK-NEXT: B: 0 1 2 3 4 thrust-1.9.5/internal/test/thrust.example.remove_points2d.filecheck000066400000000000000000000022261344621116200256140ustar00rootroot00000000000000 CHECK: Generated 20 points CHECK-NEXT: (0.000022,0.085032) CHECK-NEXT: (0.601353,0.891611) CHECK-NEXT: (0.967956,0.189690) CHECK-NEXT: (0.514976,0.398008) CHECK-NEXT: (0.262906,0.743512) CHECK-NEXT: (0.089548,0.560390) CHECK-NEXT: (0.582230,0.809567) CHECK-NEXT: (0.591919,0.511713) CHECK-NEXT: (0.876634,0.995085) CHECK-NEXT: (0.726212,0.966611) CHECK-NEXT: (0.297102,0.426051) CHECK-NEXT: (0.899498,0.652999) CHECK-NEXT: (0.901534,0.961533) CHECK-NEXT: (0.164713,0.857987) CHECK-NEXT: 
(0.906845,0.294026) CHECK-NEXT: (0.936244,0.414645) CHECK-NEXT: (0.308457,0.514893) CHECK-NEXT: (0.395430,0.789785) CHECK-NEXT: (0.689141,0.544273) CHECK-NEXT: (0.592407,0.093630) CHECK: After stream compaction, 14 points remain CHECK-NEXT: (0.000022,0.085032) CHECK-NEXT: (0.967956,0.189690) CHECK-NEXT: (0.514976,0.398008) CHECK-NEXT: (0.262906,0.743512) CHECK-NEXT: (0.089548,0.560390) CHECK-NEXT: (0.582230,0.809567) CHECK-NEXT: (0.591919,0.511713) CHECK-NEXT: (0.297102,0.426051) CHECK-NEXT: (0.164713,0.857987) CHECK-NEXT: (0.906845,0.294026) CHECK-NEXT: (0.308457,0.514893) CHECK-NEXT: (0.395430,0.789785) CHECK-NEXT: (0.689141,0.544273) CHECK-NEXT: (0.592407,0.093630) thrust-1.9.5/internal/test/thrust.example.repeated_range.filecheck000066400000000000000000000002261344621116200254400ustar00rootroot00000000000000 CHECK: range 10 20 30 40 CHECK-NEXT: repeated x2: 10 10 20 20 30 30 40 40 CHECK-NEXT: repeated x3: 10 10 10 20 20 20 30 30 30 40 40 40 thrust-1.9.5/internal/test/thrust.example.run_length_decoding.filecheck000066400000000000000000000002201344621116200264660ustar00rootroot00000000000000 CHECK: run-length encoded input: CHECK-NEXT: (a,3)(b,5)(c,1)(d,2)(e,9)(f,2) CHECK: decoded output: CHECK-NEXT: aaabbbbbcddeeeeeeeeeff thrust-1.9.5/internal/test/thrust.example.run_length_encoding.filecheck000066400000000000000000000002151344621116200265040ustar00rootroot00000000000000 CHECK: input data: CHECK-NEXT: aaabbbbbcddeeeeeeeeeff CHECK: run-length encoded output: CHECK-NEXT: (a,3)(b,5)(c,1)(d,2)(e,9)(f,2) thrust-1.9.5/internal/test/thrust.example.saxpy.filecheck000066400000000000000000000000001344621116200236250ustar00rootroot00000000000000thrust-1.9.5/internal/test/thrust.example.scan_by_key.filecheck000066400000000000000000000016601344621116200247640ustar00rootroot00000000000000 CHECK: Inclusive Segmented Scan w/ Key Sequence CHECK-NEXT: keys : 0 0 0 1 1 2 2 2 2 3 4 4 5 5 5 CHECK-NEXT: input values : 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 CHECK-NEXT: output values : 2 4 6 2 
4 2 4 6 8 2 2 4 2 4 6 CHECK: Inclusive Segmented Scan w/ Head Flag Sequence CHECK-NEXT: head flags : 1 0 0 1 0 1 0 0 0 1 1 0 1 0 0 CHECK-NEXT: input values : 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 CHECK-NEXT: output values : 2 4 6 2 4 2 4 6 8 2 2 4 2 4 6 CHECK: Exclusive Segmented Scan w/ Key Sequence CHECK-NEXT: keys : 0 0 0 1 1 2 2 2 2 3 4 4 5 5 5 CHECK-NEXT: input values : 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 CHECK-NEXT: output values : 0 2 4 0 2 0 2 4 6 0 0 2 0 2 4 CHECK: Exclusive Segmented Scan w/ Head Flag Sequence CHECK-NEXT: head flags : 1 0 0 1 0 1 0 0 0 1 1 0 1 0 0 CHECK-NEXT: input values : 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 CHECK-NEXT: output values : 0 2 4 0 2 0 2 4 6 0 0 2 0 2 4 thrust-1.9.5/internal/test/thrust.example.set_operations.filecheck000066400000000000000000000005321344621116200255310ustar00rootroot00000000000000 CHECK: Set A [ 0 2 4 5 6 8 9 ] CHECK-NEXT: Set B [ 0 1 2 3 5 7 8 ] CHECK-NEXT: Merge(A,B) [ 0 0 1 2 2 3 4 5 5 6 7 8 8 9 ] CHECK-NEXT: Union(A,B) [ 0 1 2 3 4 5 6 7 8 9 ] CHECK-NEXT: Intersection(A,B) [ 0 2 5 8 ] CHECK-NEXT: Difference(A,B) [ 4 6 9 ] CHECK-NEXT: SymmetricDifference(A,B) [ 1 3 4 6 7 9 ] CHECK-NEXT: SetIntersectionSize(A,B) 4 thrust-1.9.5/internal/test/thrust.example.simple_moving_average.filecheck000066400000000000000000000016031344621116200270350ustar00rootroot00000000000000 CHECK: data series: [ 0 0 6 9 10 2 5 4 2 8 0 6 6 8 6 5 9 10 7 10 3 4 9 7 9 10 1 9 9 3 ] CHECK-NEXT: simple moving averages (window = 4) CHECK-NEXT: [ 0, 4) = 3.75 CHECK-NEXT: [ 1, 5) = 6.25 CHECK-NEXT: [ 2, 6) = 6.75 CHECK-NEXT: [ 3, 7) = 6.5 CHECK-NEXT: [ 4, 8) = 5.25 CHECK-NEXT: [ 5, 9) = 3.25 CHECK-NEXT: [ 6,10) = 4.75 CHECK-NEXT: [ 7,11) = 3.5 CHECK-NEXT: [ 8,12) = 4 CHECK-NEXT: [ 9,13) = 5 CHECK-NEXT: [10,14) = 5 CHECK-NEXT: [11,15) = 6.5 CHECK-NEXT: [12,16) = 6.25 CHECK-NEXT: [13,17) = 7 CHECK-NEXT: [14,18) = 7.5 CHECK-NEXT: [15,19) = 7.75 CHECK-NEXT: [16,20) = 9 CHECK-NEXT: [17,21) = 7.5 CHECK-NEXT: [18,22) = 6 CHECK-NEXT: [19,23) = 6.5 CHECK-NEXT: [20,24) = 
5.75 CHECK-NEXT: [21,25) = 7.25 CHECK-NEXT: [22,26) = 8.75 CHECK-NEXT: [23,27) = 6.75 CHECK-NEXT: [24,28) = 7.25 CHECK-NEXT: [25,29) = 7.25 CHECK-NEXT: [26,30) = 5.5 thrust-1.9.5/internal/test/thrust.example.sort.filecheck000066400000000000000000000030221344621116200234570ustar00rootroot00000000000000 CHECK: sorting integers CHECK-NEXT: 79 78 62 78 94 40 86 57 40 16 28 54 77 87 93 98 CHECK-NEXT: 16 28 40 40 54 57 62 77 78 78 79 86 87 93 94 98 CHECK: sorting integers (descending) CHECK-NEXT: 79 78 62 78 94 40 86 57 40 16 28 54 77 87 93 98 CHECK-NEXT: 98 94 93 87 86 79 78 78 77 62 57 54 40 40 28 16 CHECK: sorting integers (user-defined comparison) CHECK-NEXT: 79 78 62 78 94 40 86 57 40 16 28 54 77 87 93 98 CHECK-NEXT: 16 28 40 40 54 62 78 78 86 94 98 57 77 79 87 93 CHECK: sorting floats CHECK-NEXT: 7.5 7.5 6.0 7.5 9.0 4.0 8.5 5.5 4.0 1.5 2.5 5.0 7.5 8.5 9.0 9.5 CHECK-NEXT: 1.5 2.5 4.0 4.0 5.0 5.5 6.0 7.5 7.5 7.5 7.5 8.5 8.5 9.0 9.0 9.5 CHECK: sorting pairs CHECK-NEXT: (7,7) (5,7) (9,3) (8,5) (3,0) (2,4) (7,8) (9,9) (7,1) (1,9) (0,5) (3,6) (8,0) (7,6) (4,2) (8,3) CHECK-NEXT: (0,5) (1,9) (2,4) (3,0) (3,6) (4,2) (5,7) (7,1) (7,6) (7,7) (7,8) (8,0) (8,3) (8,5) (9,3) (9,9) CHECK: key-value sorting CHECK-NEXT: (79, 0) (78, 1) (62, 2) (78, 3) (94, 4) (40, 5) (86, 6) (57, 7) (40, 8) (16, 9) (28,10) (54,11) (77,12) (87,13) (93,14) (98,15) CHECK-NEXT: (16, 9) (28,10) (40, 5) (40, 8) (54,11) (57, 7) (62, 2) (77,12) (78, 1) (78, 3) (79, 0) (86, 6) (87,13) (93,14) (94, 4) (98,15) CHECK: key-value sorting (descending) CHECK-NEXT: (79, 0) (78, 1) (62, 2) (78, 3) (94, 4) (40, 5) (86, 6) (57, 7) (40, 8) (16, 9) (28,10) (54,11) (77,12) (87,13) (93,14) (98,15) CHECK-NEXT: (98,15) (94, 4) (93,14) (87,13) (86, 6) (79, 0) (78, 1) (78, 3) (77,12) (62, 2) (57, 7) (54,11) (40, 5) (40, 8) (28,10) (16, 9) thrust-1.9.5/internal/test/thrust.example.sorting_aos_vs_soa.filecheck000066400000000000000000000001461344621116200263750ustar00rootroot00000000000000 CHECK: AoS sort took {{[0-9.]+}} 
milliseconds CHECK-NEXT: SoA sort took {{[0-9.]+}} milliseconds thrust-1.9.5/internal/test/thrust.example.sparse_vector.filecheck000066400000000000000000000003371344621116200253550ustar00rootroot00000000000000 CHECK: Computing C = A + B for sparse vectors A and B CHECK-NEXT: A (2,10) (3,60) (5,20) (8,40) CHECK-NEXT: B (1,50) (2,30) (4,80) (5,30) (7,90) (8,10) CHECK-NEXT: C (1,50) (2,40) (3,60) (4,80) (5,50) (7,90) (8,50) thrust-1.9.5/internal/test/thrust.example.stream_compaction.filecheck000066400000000000000000000002141344621116200261770ustar00rootroot00000000000000 CHECK: values: 0 1 2 3 4 5 6 7 8 9 CHECK-NEXT: output: 1 3 5 7 9 CHECK-NEXT: small_output: 1 3 5 7 9 CHECK-NEXT: values: 0 2 4 6 8 thrust-1.9.5/internal/test/thrust.example.strided_range.filecheck000066400000000000000000000002631344621116200253060ustar00rootroot00000000000000 CHECK: data: 10 20 30 40 50 60 70 80 CHECK-NEXT: sum of even indices: 160 CHECK-NEXT: sum of odd indices: 200 CHECK-NEXT: setting odd indices to zero: 10 0 30 0 50 0 70 0 thrust-1.9.5/internal/test/thrust.example.sum.filecheck000066400000000000000000000000321344621116200232720ustar00rootroot00000000000000 CHECK: sum is 509773 thrust-1.9.5/internal/test/thrust.example.sum_rows.filecheck000066400000000000000000000003461344621116200243540ustar00rootroot00000000000000 CHECK: [ 10 17 64 90 97 27 56 45 ] = 406 CHECK-NEXT: [ 33 76 18 60 62 82 63 56 ] = 450 CHECK-NEXT: [ 88 99 75 96 36 48 90 68 ] = 600 CHECK-NEXT: [ 91 96 24 87 91 36 94 47 ] = 566 CHECK-NEXT: [ 37 56 45 81 72 58 63 18 ] = 430 thrust-1.9.5/internal/test/thrust.example.summary_statistics.filecheck000066400000000000000000000005671344621116200264520ustar00rootroot00000000000000 CHECK: ******Summary Statistics Example***** CHECK-NEXT: The data: 4 7 13 16 CHECK-NEXT: Count : 4 CHECK-NEXT: Minimum : 4 CHECK-NEXT: Maximum : 16 CHECK-NEXT: Mean : 10 CHECK-NEXT: Variance : 30 CHECK-NEXT: Standard Deviation : 4.74342 CHECK-NEXT: Skewness : 0 CHECK-NEXT: Kurtosis : 1.36 
thrust-1.9.5/internal/test/thrust.example.summed_area_table.filecheck000066400000000000000000000017061344621116200261300ustar00rootroot00000000000000 CHECK: [step 0] initial array CHECK-NEXT: 1 1 1 1 CHECK-NEXT: 1 1 1 1 CHECK-NEXT: 1 1 1 1 CHECK-NEXT: [step 1] scan horizontally CHECK-NEXT: 1 2 3 4 CHECK-NEXT: 1 2 3 4 CHECK-NEXT: 1 2 3 4 CHECK-NEXT: [step 2] transpose array CHECK-NEXT: 1 1 1 CHECK-NEXT: 2 2 2 CHECK-NEXT: 3 3 3 CHECK-NEXT: 4 4 4 CHECK-NEXT: [step 3] scan transpose horizontally CHECK-NEXT: 1 2 3 CHECK-NEXT: 2 4 6 CHECK-NEXT: 3 6 9 CHECK-NEXT: 4 8 12 CHECK-NEXT: [step 4] transpose the transpose CHECK-NEXT: 1 2 3 4 CHECK-NEXT: 2 4 6 8 CHECK-NEXT: 3 6 9 12 thrust-1.9.5/internal/test/thrust.example.tiled_range.filecheck000066400000000000000000000002261344621116200247500ustar00rootroot00000000000000 CHECK: range 10 20 30 40 CHECK-NEXT: two tiles: 10 20 30 40 10 20 30 40 CHECK-NEXT: three tiles: 10 20 30 40 10 20 30 40 10 20 30 40 thrust-1.9.5/internal/test/thrust.example.transform_iterator.filecheck000066400000000000000000000005361344621116200264230ustar00rootroot00000000000000 CHECK: values : 2 5 7 1 6 0 3 8 CHECK-NEXT: clamped values : 2 5 5 1 5 1 3 5 CHECK-NEXT: sum of clamped values : 27 CHECK-NEXT: sequence : 0 1 2 3 4 5 6 7 8 9 CHECK-NEXT: clamped sequence : 1 1 2 3 4 5 5 5 5 5 CHECK-NEXT: negated sequence : -1 -1 -2 -3 -4 -5 -5 -5 -5 -5 CHECK-NEXT: negated values : -2 -5 -7 -1 -6 0 -3 -8 thrust-1.9.5/internal/test/thrust.example.transform_output_iterator.filecheck000066400000000000000000000000561344621116200300400ustar00rootroot00000000000000 CHECK: result= [ -0.666667 -2.66667 2 ] thrust-1.9.5/internal/test/thrust.example.uninitialized_vector.filecheck000066400000000000000000000000001344621116200267130ustar00rootroot00000000000000thrust-1.9.5/internal/test/thrust.example.version.filecheck000066400000000000000000000000701344621116200241550ustar00rootroot00000000000000 CHECK: Thrust v{{[0-9]+[.][0-9]+[.][0-9]+-[0-9]+}} 
thrust-1.9.5/internal/test/thrust.example.weld_vertices.filecheck000066400000000000000000000007031344621116200253320ustar00rootroot00000000000000 CHECK: Output Representation CHECK-NEXT: vertices[0] = (0,0) CHECK-NEXT: vertices[1] = (0,1) CHECK-NEXT: vertices[2] = (1,0) CHECK-NEXT: vertices[3] = (1,1) CHECK-NEXT: vertices[4] = (2,0) CHECK-NEXT: indices[0] = 0 CHECK-NEXT: indices[1] = 2 CHECK-NEXT: indices[2] = 1 CHECK-NEXT: indices[3] = 2 CHECK-NEXT: indices[4] = 3 CHECK-NEXT: indices[5] = 1 CHECK-NEXT: indices[6] = 2 CHECK-NEXT: indices[7] = 4 CHECK-NEXT: indices[8] = 3 thrust-1.9.5/internal/test/thrust.example.word_count.filecheck000066400000000000000000000007771344621116200246710ustar00rootroot00000000000000 CHECK: Text sample: CHECK-NEXT: But the raven, sitting lonely on the placid bust, spoke only, CHECK-NEXT: That one word, as if his soul in that one word he did outpour. CHECK-NEXT: Nothing further then he uttered - not a feather then he fluttered - CHECK-NEXT: Till I scarcely more than muttered `Other friends have flown before - CHECK-NEXT: On the morrow he will leave me, as my hopes have flown before.' CHECK-NEXT: Then the bird said, `Nevermore.' CHECK: Text sample contains 65 words thrust-1.9.5/internal/test/thrust.sanity.filecheck000066400000000000000000000000231344621116200223430ustar00rootroot00000000000000 CHECK: SANITY thrust-1.9.5/internal/test/thrust_nightly.pl000077500000000000000000000466131344621116200213130ustar00rootroot00000000000000#! /usr/bin/perl ############################################################################### # Copyright (c) 2018 NVIDIA Corporation # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
# You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. ############################################################################### use strict; use warnings; print(`perl --version`); use Getopt::Long; use Cwd; use Cwd "abs_path"; use Config; # For signal names and numbers. use IPC::Open2; use File::Temp; use POSIX "strftime"; my $have_time_hi_res = 0; if (eval { require Time::HiRes }) { printf("#### CONFIG timestamp `gettimeofday`\n"); import Time::HiRes "gettimeofday"; $have_time_hi_res = 1; } else { printf("#### CONFIG timestamp `time`\n"); } sub timestamp() { if ($have_time_hi_res) { return gettimeofday(); } else { return time(); } } my %CmdLineOption; my $arch = ""; my $abi = ""; my $os = ""; my $build = "release"; my $bin_path; my $filecheck_path; my $filecheck_data_path = "internal/test"; my $timeout_min = 15; # https://stackoverflow.com/questions/29862178/name-of-signal-number-2 my @sig_names; @sig_names[ split ' ', $Config{sig_num} ] = split ' ', $Config{sig_name}; my %sig_nums; @sig_nums{ split ' ', $Config{sig_name} } = split ' ', $Config{sig_num}; if (`uname` =~ m/CYGWIN/) { $os = "win32"; } elsif ($^O eq "MSWin32") { $os = "win32"; } else { $os = `uname`; chomp($os); } if ($os eq "win32") { $ENV{'PROCESSOR_ARCHITECTURE'} ||= ""; $ENV{'PROCESSOR_ARCHITEW6432'} ||= ""; if ((lc($ENV{PROCESSOR_ARCHITECTURE}) ne "x86") || (lc($ENV{PROCESSOR_ARCHITECTURE}) eq "amd64") || (lc($ENV{PROCESSOR_ARCHITEW6432}) eq "amd64")) { $arch = "x86_64"; } else { $arch = "i686"; } } else { $arch = `uname -m`; chomp($arch); } sub usage() { printf("Usage: thrust_nightly.pl \n"); printf("Options:\n"); printf(" -help : Print help 
message\n"); printf(" -forcearch : i686|x86_64|ARMv7|aarch64 (default: $arch)\n"); printf(" -forceabi : Specify abi to be used for arm (gnueabi|gnueabihf)\n"); printf(" -forceos : win32|Linux|Darwin (default: $os)\n"); printf(" -build : (default: $build)\n"); printf(" -bin-path : Specify location of test binaries\n"); printf(" -filecheck-path : Specify location of filecheck binary\n"); printf(" -filecheck-data-path : Specify location of filecheck data (default: $filecheck_data_path)\n"); printf(" -timeout-min : timeout in minutes for each individual test\n"); } GetOptions(\%CmdLineOption, 'help' => sub { usage() and exit 0 }, "forcearch=s" => \$arch, "forceabi=s" => \$abi, "forceos=s" => \$os, "build=s" => \$build, "bin-path=s" => \$bin_path, "filecheck-path=s" => \$filecheck_path, "filecheck-data-path=s" => \$filecheck_data_path, "timeout-min=i" => \$timeout_min, ); my $pwd = getcwd(); my $bin_path_root = abs_path ("${pwd}/.."); if ($arch eq "ARMv7") { if ($abi eq "") { $abi = "_gnueabi"; #Use default abi for arm if not specified } else { $abi = "_${abi}"; } } else { $abi = ""; #Ignore abi for architectures other than arm } my $uname = ""; $uname = $arch; chomp($uname); if (not $bin_path) { $bin_path = "${bin_path_root}/bin/${uname}_${os}${abi}_${build}"; } if (not $filecheck_path) { $filecheck_path = "${bin_path}/nvvm/tools"; } sub process_return_code { my ($name, $ret, $msg) = @_; if ($ret != 0) { my $signal = $ret & 127; my $app_exit = $ret >> 8; my $dumped_core = $ret & 0x80; if ($app_exit != 0) { if ($msg ne "") { printf("#### ERROR $name exited with return value $app_exit. $msg\n"); } else { printf("#### ERROR $name exited with return value $app_exit.\n"); } } if ($signal != 0) { if ($msg ne "") { printf("#### ERROR $name received signal SIG$sig_names[$signal] ($signal).
$msg\n"); } else { printf("#### ERROR $name received signal SIG$sig_names[$signal] ($signal).\n"); } if ($sig_nums{'INT'} eq $signal) { die("Terminating testing due to SIGINT."); } } if ($dumped_core != 0) { if ($msg ne "") { printf("#### ERROR $name generated a core dump. $msg\n"); } else { printf("#### ERROR $name generated a core dump.\n"); } } } } my $have_filecheck = 1; sub filecheck_sanity { my $filecheck_cmd = "$filecheck_path/FileCheck $filecheck_data_path/thrust.sanity.filecheck"; my $filecheck_pid = open(my $filecheck_stdin, "|-", "$filecheck_cmd 2>&1"); print $filecheck_stdin "SANITY"; my $filecheck_ret = 0; if (close($filecheck_stdin) == 0) { $filecheck_ret = $?; } if ($filecheck_ret == 0) { printf("#### SANE FileCheck\n"); } else { # Use a temporary file to send the output to # FileCheck so we can get the output this time, # because Perl and bidirectional pipes suck. my $tmp = File::Temp->new(); my $tmp_filename = $tmp->filename; print $tmp "SANITY"; printf("********************************************************************************\n"); print `$filecheck_cmd -input-file $tmp_filename`; printf("********************************************************************************\n"); process_return_code("FileCheck Sanity", $filecheck_ret, ""); printf("#### INSANE FileCheck\n"); $have_filecheck = 0; } } # Wrapper for system that logs the commands so you can see what it did sub run_cmd { my ($cmd) = @_; my $ret = 0; my @executable; my @output; my $syst_cmd; my $start = timestamp(); eval { local $SIG{ALRM} = sub { die("Command timed out (received SIGALRM).\n") }; alarm (60 * $timeout_min); $syst_cmd = $cmd; @executable = split(' ', $syst_cmd, 2); open(my $child, "-|", "$syst_cmd") or die("Could not execute $syst_cmd.\n"); if ($child) { @output = <$child>; } if (close($child) == 0) { $ret = $?; } alarm 0; }; my $elapsed = timestamp() - $start; if ($@) { printf("\n#### ERROR Command timeout reached, killing $executable[0].\n"); system("killall 
".$executable[0]); return ($sig_nums{'KILL'}, $elapsed, @output); } return ($ret, $elapsed, @output); } sub current_time { return strftime("%x %X %Z", localtime()); } my $failures = 0; my $known_failures = 0; my $errors = 0; my $passes = 0; sub run_examples { # Get list of tests in binary folder. my $dir = cwd(); chdir $bin_path; my @examplelist; if ($os eq "win32") { @examplelist = glob('thrust.example.*.exe'); } else { @examplelist = glob('thrust.example.*'); } chdir $dir; my $test; foreach $test (@examplelist) { my $test_exe = $test; # Ignore FileCheck files. if ($test =~ /[.]filecheck$/) { next; } if ($os eq "win32") { $test =~ s/\.exe//g; } # Check the test actually exists. if (!-e "${bin_path}/${test_exe}") { next; } my $cmd = "${bin_path}/${test_exe} --verbose 2>&1"; printf("&&&& RUNNING $test\n"); printf("#### CURRENT_TIME " . current_time() . "\n"); my ($ret, $elapsed, @output) = run_cmd($cmd); printf("********************************************************************************\n"); print @output; printf("********************************************************************************\n"); if ($ret != 0) { process_return_code($test, $ret, "Example crash?"); printf("&&&& FAILED $test\n"); printf("#### WALLTIME $test %.2f [s]\n", $elapsed); $errors = $errors + 1; } else { printf("&&&& PASSED $test\n"); printf("#### WALLTIME $test %.2f [s]\n", $elapsed); $passes = $passes + 1; if ($have_filecheck) { # Check output with LLVM FileCheck. printf("&&&& RUNNING FileCheck $test\n"); if (-f "${filecheck_data_path}/${test}.filecheck") { # If the filecheck file is empty, don't use filecheck, just # check if the output file is also empty. 
if (-z "${filecheck_data_path}/${test}.filecheck") { if (join("", @output) eq "") { printf("&&&& PASSED FileCheck $test\n"); $passes = $passes + 1; } else { printf("#### ERROR Output received but not expected.\n"); printf("&&&& FAILED FileCheck $test\n"); $failures = $failures + 1; } } else { my $filecheck_cmd = "$filecheck_path/FileCheck $filecheck_data_path/$test.filecheck"; my $filecheck_pid = open(my $filecheck_stdin, "|-", "$filecheck_cmd 2>&1"); print $filecheck_stdin @output; my $filecheck_ret = 0; if (close($filecheck_stdin) == 0) { $filecheck_ret = $?; } if ($filecheck_ret == 0) { printf("&&&& PASSED FileCheck $test\n"); $passes = $passes + 1; } else { # Use a temporary file to send the output to # FileCheck so we can get the output this time, # because Perl and bidirectional pipes suck. my $tmp = File::Temp->new(); my $tmp_filename = $tmp->filename; print $tmp @output; printf("********************************************************************************\n"); print `$filecheck_cmd -input-file $tmp_filename`; printf("********************************************************************************\n"); process_return_code("FileCheck $test", $filecheck_ret, ""); printf("&&&& FAILED FileCheck $test\n"); $failures = $failures + 1; } } } else { printf("#### ERROR $test has no FileCheck comparison.\n"); printf("&&&& FAILED FileCheck $test\n"); $errors = $errors + 1; } } } printf("\n"); } } sub run_unit_tests { # Get list of tests in binary folder. my $dir = cwd(); chdir $bin_path; my @unittestlist; if ($os eq "win32") { @unittestlist = glob('thrust.test.*.exe'); } else { @unittestlist = glob('thrust.test.*'); } chdir $dir; my $test; foreach $test (@unittestlist) { my $test_exe = $test; # Ignore FileCheck files. if ($test =~ /[.]filecheck$/) { next; } if ($os eq "win32") { $test =~ s/\.exe//g; } # Check the test actually exists. 
if (!-e "${bin_path}/${test_exe}") { next; } # Check the test actually exists next unless (-e "${bin_path}/${test_exe}"); my $cmd = "${bin_path}/${test_exe} --verbose 2>&1"; printf("&&&& RUNNING $test\n"); printf("#### CURRENT_TIME " . current_time() . "\n"); my ($ret, $elapsed, @output) = run_cmd($cmd); printf("********************************************************************************\n"); print @output; printf("********************************************************************************\n"); my $fail = 0; my $known_fail = 0; my $error = 0; my $pass = 0; my $found_totals = 0; foreach my $line (@output) { if (($fail, $known_fail, $error, $pass) = $line =~ /Totals: ([0-9]+) failures, ([0-9]+) known failures, ([0-9]+) errors, and ([0-9]+) passes[.]/igs) { $found_totals = 1; $failures = $failures + $fail; $known_failures = $known_failures + $known_fail; $errors = $errors + $error; $passes = $passes + $pass; last; } else { $fail = 0; $known_fail = 0; $error = 0; $pass = 0; } } if ($ret == 0) { if ($found_totals == 0) { $errors = $errors + 1; printf("#### ERROR $test returned 0 and no summary line was found. Invalid test?\n"); printf("&&&& FAILED $test\n"); printf("#### WALLTIME $test %.2f [s]\n", $elapsed); } else { if ($fail != 0 or $error != 0) { $errors = $errors + 1; printf("#### ERROR $test returned 0 and had failures or errors. Test driver error?\n"); printf("&&&& FAILED $test\n"); printf("#### WALLTIME $test %.2f [s]\n", $elapsed); } elsif ($known_fail == 0 and $pass == 0) { printf("#### DISABLED $test returned 0 and had no failures, known failures, errors or passes.\n"); printf("&&&& PASSED $test\n"); printf("#### WALLTIME $test %.2f [s]\n", $elapsed); } else { printf("&&&& PASSED $test\n"); printf("#### WALLTIME $test %.2f [s]\n", $elapsed); if ($have_filecheck) { # Check output with LLVM FileCheck if the test has a FileCheck input. 
if (-f "${filecheck_data_path}/${test}.filecheck") { printf("&&&& RUNNING FileCheck $test\n"); # If the filecheck file is empty, don't use filecheck, # just check if the output file is also empty. if (-z "${filecheck_data_path}/${test}.filecheck") { if (join("", @output) eq "") { printf("&&&& PASSED FileCheck $test\n"); $passes = $passes + 1; } else { printf("#### ERROR Output received but not expected.\n"); printf("&&&& FAILED FileCheck $test\n"); $failures = $failures + 1; } } else { my $filecheck_cmd = "$filecheck_path/FileCheck $filecheck_data_path/$test.filecheck"; my $filecheck_pid = open(my $filecheck_stdin, "|-", "$filecheck_cmd 2>&1"); print $filecheck_stdin @output; my $filecheck_ret = 0; if (close($filecheck_stdin) == 0) { $filecheck_ret = $?; } if ($filecheck_ret == 0) { printf("&&&& PASSED FileCheck $test\n"); $passes = $passes + 1; } else { # Use a temporary file to send the output to # FileCheck so we can get the output this time, # because Perl and bidirectional pipes suck. my $tmp = File::Temp->new(); my $tmp_filename = $tmp->filename; print $tmp @output; printf("********************************************************************************\n"); print `$filecheck_cmd -input-file $tmp_filename`; printf("********************************************************************************\n"); process_return_code("FileCheck $test", $filecheck_ret, ""); printf("&&&& FAILED FileCheck $test\n"); $failures = $failures + 1; } } } } } } } else { $errors = $errors + 1; process_return_code($test, $ret, "Test crash?"); printf("&&&& FAILED $test\n"); printf("#### WALLTIME $test %.2f [s]\n", $elapsed); } printf("\n"); } } sub dvs_summary { my $dvs_score = 0; my $denominator = $failures + $known_failures + $errors + $passes; if ($denominator == 0) { $dvs_score = 0; } else { $dvs_score = 100 * (($passes + $known_failures) / $denominator); } printf("\n"); printf("%*%*%*%* FA!LUR3S $failures\n"); printf("%*%*%*%* KN0WN FA!LUR3S $known_failures\n"); printf("%*%*%*%* 3RR0RS $errors\n");
printf("%*%*%*%* PASS3S $passes\n"); printf("\n"); printf("CUDA DVS BASIC SANITY SCORE : %.1f\n", $dvs_score); if ($failures + $errors > 0) { exit(1); } } ############################################################################### printf("#### CONFIG arch `%s`\n", $arch); printf("#### CONFIG abi `%s`\n", $abi); printf("#### CONFIG os `%s`\n", $os); printf("#### CONFIG build `%s`\n", $build); printf("#### CONFIG bin_path `%s`\n", $bin_path); printf("#### CONFIG have_filecheck `$have_filecheck`\n"); printf("#### CONFIG filecheck_path `%s`\n", $filecheck_path); printf("#### CONFIG filecheck_data_path `%s`\n", $filecheck_data_path); printf("#### CONFIG have_time_hi_res `$have_time_hi_res`\n"); printf("#### CONFIG timeout_min `%s`\n", $timeout_min); printf("#### ENV PATH `%s`\n", defined $ENV{'PATH'} ? $ENV{'PATH'} : ''); printf("#### ENV LD_LIBRARY_PATH `%s`\n", defined $ENV{'LD_LIBRARY_PATH'} ? $ENV{'LD_LIBRARY_PATH'} : ''); printf("\n"); filecheck_sanity(); printf("\n"); my $START_TIME = current_time(); run_examples(); run_unit_tests(); my $STOP_TIME = current_time(); printf("#### START_TIME $START_TIME\n"); printf("#### STOP_TIME $STOP_TIME\n"); dvs_summary(); thrust-1.9.5/internal/test/unittest.lst000066400000000000000000001120531344621116200202570ustar00rootroot00000000000000TestAdjacentDifference TestAdjacentDifferenceCudaStreams TestAdjacentDifferenceDeviceSeq TestAdjacentDifferenceDiscardIterator TestAdjacentDifferenceDispatchExplicit TestAdjacentDifferenceDispatchImplicit TestAdjacentDifferenceInPlaceWithRelatedIteratorTypes TestAdjacentDifferenceSimpleDevice TestAdjacentDifferenceSimpleHost TestAdvanceDevice TestAdvanceHost TestAllOfCudaStreams TestAllOfDevice TestAllOfDeviceSeq TestAllOfDispatchExplicit TestAllOfDispatchImplicit TestAllOfHost TestAllocatorCustomCopyConstruct TestAllocatorCustomDefaultConstruct TestAllocatorCustomDestroy TestAllocatorMinimal TestAnyOfCudaStreams TestAnyOfDevice TestAnyOfDeviceSeq TestAnyOfDispatchExplicit 
TestAnyOfDispatchImplicit TestAnyOfHost TestAssertEqual TestAssertGEqual TestAssertLEqual TestBitAndFunctionalDevice TestBitAndFunctionalHost TestBitOrFunctionalDevice TestBitOrFunctionalHost TestBitXorFunctionalDevice TestBitXorFunctionalHost TestComplexArithmeticTransform TestComplexBasicArithmetic TestComplexBinaryArithmetic TestComplexConstructors TestComplexExponentialFunctions TestComplexExponentialTransform TestComplexGetters TestComplexMemberOperators TestComplexPlaneTransform TestComplexPowerFunctions TestComplexPowerTransform TestComplexStreamOperators TestComplexTrigonometricFunctions TestComplexTrigonometricTransform TestComplexUnaryArithmetic TestComputeCapability TestConstantIteratorComparison TestConstantIteratorConstructFromConvertibleSystem TestConstantIteratorCopyDevice TestConstantIteratorCopyHost TestConstantIteratorIncrement TestConstantIteratorReduce TestConstantIteratorTransformDevice TestConstantIteratorTransformHost TestCopyConstantIteratorToZipIteratorDevice TestCopyConstantIteratorToZipIteratorHost TestCopyCountingIteratorDevice TestCopyCountingIteratorHost TestCopyDispatchExplicit TestCopyDispatchImplicit TestCopyFromConstIterator TestCopyIf TestCopyIfDispatchExplicit TestCopyIfDispatchImplicit TestCopyIfSimpleDevice TestCopyIfSimpleHost TestCopyIfStencil TestCopyIfStencilDispatchExplicit TestCopyIfStencilDispatchImplicit TestCopyIfStencilSimpleDevice TestCopyIfStencilSimpleHost TestCopyListToDevice TestCopyListToHost TestCopyMatchingTypesDevice TestCopyMatchingTypesHost TestCopyMixedTypesDevice TestCopyMixedTypesHost TestCopyNConstantIteratorToZipIteratorDevice TestCopyNConstantIteratorToZipIteratorHost TestCopyNCountingIteratorDevice TestCopyNCountingIteratorHost TestCopyNDispatchExplicit TestCopyNDispatchImplicit TestCopyNFromConstIterator TestCopyNListToDevice TestCopyNListToHost TestCopyNMatchingTypesDevice TestCopyNMatchingTypesHost TestCopyNMixedTypesDevice TestCopyNMixedTypesHost TestCopyNToDiscardIterator TestCopyNVectorBool 
TestCopyNZipIteratorDevice TestCopyNZipIteratorHost TestCopyToDiscardIterator TestCopyToDiscardIteratorZipped TestCopyVectorBool TestCopyZipIteratorDevice TestCopyZipIteratorHost TestCount TestCountCudaStreams TestCountDeviceSeq TestCountDispatchExplicit TestCountDispatchImplicit TestCountFromConstIteratorSimpleDevice TestCountFromConstIteratorSimpleHost TestCountIf TestCountIfDeviceSeq TestCountIfSimpleDevice TestCountIfSimpleHost TestCountSimpleDevice TestCountSimpleHost TestCountingIteratorComparison TestCountingIteratorCopyConstructor TestCountingIteratorDifference TestCountingIteratorDistance TestCountingIteratorFloatComparison TestCountingIteratorIncrement TestCountingIteratorLowerBound TestCountingIteratorUnsignedType TestCudaMallocResultAligned TestCudaReduceIntervals TestCudaReduceIntervalsSimple TestDeviceDeleteDestructorInvocation TestDeviceDereferenceCountingIterator TestDeviceDereferenceDevicePtr TestDeviceDereferenceDeviceVectorIterator TestDeviceDereferenceTransformIterator TestDeviceDereferenceTransformedCountingIterator TestDevicePointerManipulation TestDeviceReferenceAssignmentFromDeviceReference TestDeviceReferenceConstructorFromDevicePointer TestDeviceReferenceConstructorFromDeviceReference TestDeviceReferenceManipulation TestDiscardIteratorComparison TestDiscardIteratorIncrement TestDistanceDevice TestDistanceHost TestDividesFunctionalDevice TestDividesFunctionalHost TestEqual TestEqualCudaStreams TestEqualDeviceSeq TestEqualDispatchExplicit TestEqualDispatchImplicit TestEqualSimpleDevice TestEqualSimpleHost TestEqualToFunctionalDevice TestEqualToFunctionalHost TestExclusiveScan32 TestExclusiveScanByKeyCudaStreams TestExclusiveScanByKeyDispatchExplicit TestExclusiveScanByKeyDispatchImplicit TestExclusiveScanByKeySimpleDevice TestExclusiveScanByKeySimpleHost TestExclusiveScanDispatchExplicit TestExclusiveScanDispatchImplicit TestFill TestFillCudaStreams TestFillDeviceSeq TestFillDiscardIterator TestFillDispatchExplicit TestFillDispatchImplicit 
TestFillMixedTypesDevice TestFillMixedTypesHost TestFillN TestFillNDeviceSeq TestFillNDiscardIterator TestFillNDispatchExplicit TestFillNDispatchImplicit TestFillNMixedTypesDevice TestFillNMixedTypesHost TestFillNSimpleDevice TestFillNSimpleHost TestFillSimpleDevice TestFillSimpleHost TestFillTuple TestFillWithNonTrivialAssignment TestFillWithTrivialAssignment TestFillZipIteratorDevice TestFillZipIteratorHost TestFind TestFindCudaStreams TestFindDeviceSeq TestFindDispatchExplicit TestFindDispatchImplicit TestFindIf TestFindIfDeviceSeq TestFindIfDispatchExplicit TestFindIfDispatchImplicit TestFindIfNot TestFindIfNotDeviceSeq TestFindIfNotDispatchExplicit TestFindIfNotDispatchImplicit TestFindIfNotSimpleDevice TestFindIfNotSimpleHost TestFindIfSimpleDevice TestFindIfSimpleHost TestFindSimpleDevice TestFindSimpleHost TestForEach TestForEachCudaStreams TestForEachDeviceSeq TestForEachDispatchExplicit TestForEachDispatchImplicit TestForEachLargeRegisterFootprint TestForEachN TestForEachNDeviceSeq TestForEachNDispatchExplicit TestForEachNDispatchImplicit TestForEachNLargeRegisterFootprint TestForEachNSimpleAnySystem TestForEachNSimpleDevice TestForEachNSimpleHost TestForEachNWithLargeTypes TestForEachSimpleAnySystem TestForEachSimpleDevice TestForEachSimpleHost TestForEachWithLargeTypes TestFreeDispatchExplicit TestFunctionalPlaceholdersBinaryEqualToDevice TestFunctionalPlaceholdersBinaryEqualToHost TestFunctionalPlaceholdersBinaryGreaterDevice TestFunctionalPlaceholdersBinaryGreaterEqualDevice TestFunctionalPlaceholdersBinaryGreaterEqualHost TestFunctionalPlaceholdersBinaryGreaterHost TestFunctionalPlaceholdersBinaryLessDevice TestFunctionalPlaceholdersBinaryLessEqualDevice TestFunctionalPlaceholdersBinaryLessEqualHost TestFunctionalPlaceholdersBinaryLessHost TestFunctionalPlaceholdersBinaryNotEqualToDevice TestFunctionalPlaceholdersBinaryNotEqualToHost TestFunctionalPlaceholdersBitAnd TestFunctionalPlaceholdersBitAnd TestFunctionalPlaceholdersBitAndEqual 
TestFunctionalPlaceholdersBitAndEqual TestFunctionalPlaceholdersBitNegateDevice TestFunctionalPlaceholdersBitNegateHost TestFunctionalPlaceholdersBitOr TestFunctionalPlaceholdersBitOr TestFunctionalPlaceholdersBitOrEqual TestFunctionalPlaceholdersBitOrEqual TestFunctionalPlaceholdersBitRshiftEqual TestFunctionalPlaceholdersBitRshiftEqual TestFunctionalPlaceholdersBitXor TestFunctionalPlaceholdersBitXor TestFunctionalPlaceholdersBitXorEqual TestFunctionalPlaceholdersBitXorEqual TestFunctionalPlaceholdersDivides TestFunctionalPlaceholdersDivides TestFunctionalPlaceholdersDividesEqual TestFunctionalPlaceholdersDividesEqual TestFunctionalPlaceholdersLogicalAndDevice TestFunctionalPlaceholdersLogicalAndHost TestFunctionalPlaceholdersLogicalNotDevice TestFunctionalPlaceholdersLogicalNotHost TestFunctionalPlaceholdersLogicalOrDevice TestFunctionalPlaceholdersLogicalOrHost TestFunctionalPlaceholdersMinus TestFunctionalPlaceholdersMinus TestFunctionalPlaceholdersMinusEqual TestFunctionalPlaceholdersMinusEqual TestFunctionalPlaceholdersModulus TestFunctionalPlaceholdersModulus TestFunctionalPlaceholdersModulusEqual TestFunctionalPlaceholdersModulusEqual TestFunctionalPlaceholdersMultiplies TestFunctionalPlaceholdersMultiplies TestFunctionalPlaceholdersMultipliesEqual TestFunctionalPlaceholdersMultipliesEqual TestFunctionalPlaceholdersNegateDevice TestFunctionalPlaceholdersNegateHost TestFunctionalPlaceholdersPlus TestFunctionalPlaceholdersPlus TestFunctionalPlaceholdersPlusEqual TestFunctionalPlaceholdersPlusEqual TestFunctionalPlaceholdersPrefixDecrementDevice TestFunctionalPlaceholdersPrefixDecrementHost TestFunctionalPlaceholdersPrefixIncrementDevice TestFunctionalPlaceholdersPrefixIncrementHost TestFunctionalPlaceholdersSuffixDecrementDevice TestFunctionalPlaceholdersSuffixDecrementHost TestFunctionalPlaceholdersSuffixIncrementDevice TestFunctionalPlaceholdersSuffixIncrementHost TestFunctionalPlaceholdersTransformIterator TestFunctionalPlaceholdersTransformIterator 
TestFunctionalPlaceholdersUnaryPlusDevice TestFunctionalPlaceholdersUnaryPlusHost TestFunctionalPlaceholdersValue TestFunctionalPlaceholdersValue TestGather TestGatherCountingIteratorDevice TestGatherCountingIteratorHost TestGatherCudaStreams TestGatherDeviceSeq TestGatherDispatchExplicit TestGatherDispatchImplicit TestGatherIf TestGatherIfCudaStreams TestGatherIfDeviceSeq TestGatherIfDispatchExplicit TestGatherIfDispatchImplicit TestGatherIfSimpleDevice TestGatherIfSimpleHost TestGatherIfToDiscardIterator TestGatherSimpleDevice TestGatherSimpleHost TestGatherToDiscardIterator TestGenerate TestGenerateCudaStreams TestGenerateDeviceSeq TestGenerateDispatchExplicit TestGenerateDispatchImplicit TestGenerateNCudaStreams TestGenerateNDeviceSeq TestGenerateNDispatchExplicit TestGenerateNDispatchImplicit TestGenerateNSimpleDevice TestGenerateNSimpleHost TestGenerateNToDiscardIterator TestGenerateSimpleDevice TestGenerateSimpleHost TestGenerateToDiscardIterator TestGenerateTuple TestGenerateZipIteratorDevice TestGenerateZipIteratorHost TestGetTemporaryBuffer TestGetTemporaryBufferDeviceSeq TestGetTemporaryBufferDispatchExplicit TestGetTemporaryBufferDispatchImplicit TestGreaterEqualFunctionalDevice TestGreaterEqualFunctionalHost TestGreaterFunctionalDevice TestGreaterFunctionalHost TestIdentityFunctionalDevice TestIdentityFunctionalHost TestInclusiveScan32 TestInclusiveScanByKeyCudaStreams TestInclusiveScanByKeyDispatchExplicit TestInclusiveScanByKeyDispatchImplicit TestInclusiveScanByKeySimpleDevice TestInclusiveScanByKeySimpleHost TestInclusiveScanByKeyTransformIteratorDevice TestInclusiveScanByKeyTransformIteratorHost TestInclusiveScanDispatchExplicit TestInclusiveScanDispatchImplicit TestInclusiveScanWithIndirectionDevice TestInclusiveScanWithIndirectionHost TestInnerProduct TestInnerProductCudaStreams TestInnerProductDeviceSeq TestInnerProductDispatchExplicit TestInnerProductDispatchImplicit TestInnerProductSimpleDevice TestInnerProductSimpleHost 
TestInnerProductWithOperatorDevice TestInnerProductWithOperatorHost TestIsCommutative TestIsPartitionedCudaStreams TestIsPartitionedDevice TestIsPartitionedDeviceSeq TestIsPartitionedDispatchExplicit TestIsPartitionedDispatchImplicit TestIsPartitionedHost TestIsPartitionedSimpleDevice TestIsPartitionedSimpleHost TestIsPlainOldData TestIsSortedCudaStreams TestIsSortedDevice TestIsSortedDeviceSeq TestIsSortedDispatchExplicit TestIsSortedDispatchImplicit TestIsSortedHost TestIsSortedRepeatedElementsDevice TestIsSortedRepeatedElementsHost TestIsSortedSimpleDevice TestIsSortedSimpleHost TestIsSortedUntilCudaStreams TestIsSortedUntilDevice TestIsSortedUntilDeviceSeq TestIsSortedUntilExplicit TestIsSortedUntilHost TestIsSortedUntilImplicit TestIsSortedUntilRepeatedElementsDevice TestIsSortedUntilRepeatedElementsHost TestIsSortedUntilSimpleDevice TestIsSortedUntilSimpleHost TestIsTrivialIterator TestLessEqualFunctionalDevice TestLessEqualFunctionalHost TestLessFunctionalDevice TestLessFunctionalHost TestLog2 TestLogicalAndFunctionalDevice TestLogicalAndFunctionalHost TestLogicalNotFunctionalDevice TestLogicalNotFunctionalHost TestLogicalOrFunctionalDevice TestLogicalOrFunctionalHost TestMakeConstantIterator TestMakeDevicePointer TestMakeDiscardIterator TestMakePermutationIteratorDevice TestMakePermutationIteratorHost TestMakeTransformIteratorDevice TestMakeTransformIteratorHost TestMakeTuple TestMalloc TestMallocDeviceSeq TestMallocDispatchExplicit TestMax TestMaxActiveBlocks TestMaxBlocksizeWithHighestOccupancy TestMaxElement TestMaxElementCudaStreams TestMaxElementDeviceSeq TestMaxElementDispatchExplicit TestMaxElementDispatchImplicit TestMaxElementSimpleDevice TestMaxElementSimpleHost TestMaximumFunctionalDevice TestMaximumFunctionalHost TestMerge TestMergeByKey TestMergeByKeyCudaStreams TestMergeByKeyDescending TestMergeByKeyDeviceSeq TestMergeByKeyDispatchExplicit TestMergeByKeyDispatchImplicit TestMergeByKeySimpleDevice TestMergeByKeySimpleHost 
TestMergeByKeyToDiscardIterator TestMergeCudaStreams TestMergeDescending TestMergeDeviceSeq TestMergeDispatchExplicit TestMergeDispatchImplicit TestMergeKeyValue TestMergeKeyValueDescending TestMergeSimpleDevice TestMergeSimpleHost TestMergeSortAscendingKeyValue TestMergeSortDescendingKey TestMergeSortDescendingKeyValue TestMergeSortKeySimple TestMergeSortKeyValue TestMergeSortKeyValueSimple TestMergeSortStableKeySimple TestMergeToDiscardIterator TestMin TestMinElement TestMinElementCudaStreams TestMinElementDeviceSeq TestMinElementDispatchExplicit TestMinElementDispatchImplicit TestMinElementSimpleDevice TestMinElementSimpleHost TestMinMaxElement TestMinMaxElementCudaStreams TestMinMaxElementDeviceSeq TestMinMaxElementDispatchExplicit TestMinMaxElementDispatchImplicit TestMinMaxElementSimpleDevice TestMinMaxElementSimpleHost TestMinimumFunctionalDevice TestMinimumFunctionalHost TestMinstdRand0Equal TestMinstdRand0Max TestMinstdRand0Min TestMinstdRand0SaveRestore TestMinstdRand0Unequal TestMinstdRand0Validation TestMinstdRandEqual TestMinstdRandMax TestMinstdRandMin TestMinstdRandSaveRestore TestMinstdRandUnequal TestMinstdRandValidation TestMinusFunctionalDevice TestMinusFunctionalHost TestMismatchCudaStreams TestMismatchDeviceSeq TestMismatchDispatchExplicit TestMismatchDispatchImplicit TestMismatchSimpleDevice TestMismatchSimpleHost TestModulusFunctionalDevice TestModulusFunctionalHost TestMultipliesFunctionalDevice TestMultipliesFunctionalHost TestNegateFunctionalDevice TestNegateFunctionalHost TestNoneOfCudaStreams TestNoneOfDevice TestNoneOfDeviceSeq TestNoneOfDispatchExplicit TestNoneOfDispatchImplicit TestNoneOfHost TestNormalDistributionMax TestNormalDistributionMin TestNormalDistributionSaveRestore TestNot1Device TestNot1Host TestNot2Device TestNot2Host TestNotEqualToFunctionalDevice TestNotEqualToFunctionalHost TestPairComparison TestPairGet TestPairManipulation TestPairReduce TestPairScan TestPairScanByKey TestPairStableSort TestPairStableSortByKey 
TestPairStableSortByKeyDeviceSeq TestPairStableSortDeviceSeq TestPairSwap TestPairTransform TestPairTupleElement TestPairTupleSize TestPartition TestPartitionCopy TestPartitionCopyDeviceSeq TestPartitionCopyDispatchExplicit TestPartitionCopyDispatchImplicit TestPartitionCopySimpleDevice TestPartitionCopySimpleHost TestPartitionCopyStencil TestPartitionCopyStencilDispatchExplicit TestPartitionCopyStencilDispatchImplicit TestPartitionCopyStencilSimpleDevice TestPartitionCopyStencilSimpleHost TestPartitionCopyStencilToDiscardIterator TestPartitionCopyToDiscardIterator TestPartitionCudaStreams TestPartitionDeviceSeq TestPartitionDispatchExplicit TestPartitionDispatchImplicit TestPartitionPointCudaStreams TestPartitionPointDevice TestPartitionPointDeviceSeq TestPartitionPointDispatchExplicit TestPartitionPointDispatchImplicit TestPartitionPointHost TestPartitionPointSimpleDevice TestPartitionPointSimpleHost TestPartitionSimpleDevice TestPartitionSimpleHost TestPartitionStencil TestPartitionStencilDeviceSeq TestPartitionStencilDispatchExplicit TestPartitionStencilDispatchImplicit TestPartitionStencilSimpleDevice TestPartitionStencilSimpleHost TestPartitionStencilZipIteratorDevice TestPartitionStencilZipIteratorHost TestPartitionZipIteratorDevice TestPartitionZipIteratorHost TestPermutationIteratorGatherDevice TestPermutationIteratorGatherHost TestPermutationIteratorHostDeviceGather TestPermutationIteratorHostDeviceScatter TestPermutationIteratorReduceDevice TestPermutationIteratorReduceHost TestPermutationIteratorScatterDevice TestPermutationIteratorScatterHost TestPermutationIteratorSimpleDevice TestPermutationIteratorSimpleHost TestPermutationIteratorWithCountingIteratorDevice TestPermutationIteratorWithCountingIteratorHost TestPinnedAllocatorSimple TestPlusFunctionalDevice TestPlusFunctionalHost TestProject1stFunctionalDevice TestProject1stFunctionalHost TestProject2ndFunctionalDevice TestProject2ndFunctionalHost TestRadixSort TestRadixSortByKey 
TestRadixSortByKeyLongLongValues TestRadixSortByKeyShortValues TestRadixSortKeySimple TestRadixSortKeyValueSimple TestRanlux24BaseEqual TestRanlux24BaseMax TestRanlux24BaseMin TestRanlux24BaseSaveRestore TestRanlux24BaseUnequal TestRanlux24BaseValidation TestRanlux24Equal TestRanlux24Max TestRanlux24Min TestRanlux24SaveRestore TestRanlux24Unequal TestRanlux24Validation TestRanlux48BaseEqual TestRanlux48BaseMax TestRanlux48BaseMin TestRanlux48BaseSaveRestore TestRanlux48BaseUnequal TestRanlux48BaseValidation TestRanlux48Equal TestRanlux48Max TestRanlux48Min TestRanlux48SaveRestore TestRanlux48Unequal TestRanlux48Validation TestRawPointerCastDevice TestRawPointerCastHost TestReduce TestReduceByKey TestReduceByKeyCudaStreams TestReduceByKeyDeviceSeq TestReduceByKeyDispatchExplicit TestReduceByKeyDispatchImplicit TestReduceByKeySimpleDevice TestReduceByKeySimpleHost TestReduceByKeyToDiscardIterator TestReduceCountingIterator TestReduceCudaStreams TestReduceDeviceSeq TestReduceDispatchExplicit TestReduceDispatchImplicit TestReduceMixedTypesDevice TestReduceMixedTypesHost TestReduceSimpleDevice TestReduceSimpleHost TestReduceWithIndirectionDevice TestReduceWithIndirectionHost TestReduceWithLargeTypes TestReduceWithOperator TestRemove TestRemoveCopy TestRemoveCopyCudaStreams TestRemoveCopyDeviceSeq TestRemoveCopyDispatchExplicit TestRemoveCopyDispatchImplicit TestRemoveCopyIf TestRemoveCopyIfCudaStreams TestRemoveCopyIfDeviceSeq TestRemoveCopyIfDispatchExplicit TestRemoveCopyIfDispatchImplicit TestRemoveCopyIfSimpleDevice TestRemoveCopyIfSimpleHost TestRemoveCopyIfStencil TestRemoveCopyIfStencilCudaStreams TestRemoveCopyIfStencilDeviceSeq TestRemoveCopyIfStencilDispatchExplicit TestRemoveCopyIfStencilDispatchImplicit TestRemoveCopyIfStencilSimpleDevice TestRemoveCopyIfStencilSimpleHost TestRemoveCopyIfStencilToDiscardIterator TestRemoveCopyIfToDiscardIterator TestRemoveCopySimpleDevice TestRemoveCopySimpleHost TestRemoveCopyToDiscardIterator 
TestRemoveCopyToDiscardIteratorZipped TestRemoveCudaStreams TestRemoveDeviceSeq TestRemoveDispatchExplicit TestRemoveDispatchImplicit TestRemoveIf TestRemoveIfCudaStreams TestRemoveIfDeviceSeq TestRemoveIfDispatchExplicit TestRemoveIfDispatchImplicit TestRemoveIfSimpleDevice TestRemoveIfSimpleHost TestRemoveIfStencil TestRemoveIfStencilCudaStreams TestRemoveIfStencilDeviceSeq TestRemoveIfStencilDispatchExplicit TestRemoveIfStencilDispatchImplicit TestRemoveIfStencilSimpleDevice TestRemoveIfStencilSimpleHost TestRemoveSimpleDevice TestRemoveSimpleHost TestReplace TestReplaceCopy TestReplaceCopyDeviceSeq TestReplaceCopyDispatchExplicit TestReplaceCopyDispatchImplicit TestReplaceCopyIf TestReplaceCopyIfDeviceSeq TestReplaceCopyIfDispatchExplicit TestReplaceCopyIfDispatchImplicit TestReplaceCopyIfSimpleDevice TestReplaceCopyIfSimpleHost TestReplaceCopyIfStencil TestReplaceCopyIfStencilDeviceSeq TestReplaceCopyIfStencilDispatchExplicit TestReplaceCopyIfStencilDispatchImplicit TestReplaceCopyIfStencilSimpleDevice TestReplaceCopyIfStencilSimpleHost TestReplaceCopyIfStencilToDiscardIterator TestReplaceCopyIfToDiscardIterator TestReplaceCopySimpleDevice TestReplaceCopySimpleHost TestReplaceCopyToDiscardIterator TestReplaceCudaStreams TestReplaceDeviceSeq TestReplaceDispatchExplicit TestReplaceDispatchImplicit TestReplaceIf TestReplaceIfDeviceSeq TestReplaceIfDispatchExplicit TestReplaceIfDispatchImplicit TestReplaceIfSimpleDevice TestReplaceIfSimpleHost TestReplaceIfStencil TestReplaceIfStencilDeviceSeq TestReplaceIfStencilDispatchExplicit TestReplaceIfStencilDispatchImplicit TestReplaceIfStencilSimpleDevice TestReplaceIfStencilSimpleHost TestReplaceSimpleDevice TestReplaceSimpleHost TestReverse TestReverseCopy TestReverseCopyDeviceSeq TestReverseCopyDispatchExplicit TestReverseCopyDispatchImplicit TestReverseCopySimpleDevice TestReverseCopySimpleHost TestReverseCopyToDiscardIterator TestReverseCudaStreams TestReverseDeviceSeq TestReverseDispatchExplicit 
TestReverseDispatchImplicit TestReverseIteratorCopyConstructor TestReverseIteratorCopyDevice TestReverseIteratorCopyHost TestReverseIteratorExclusiveScan TestReverseIteratorExclusiveScanSimple TestReverseIteratorIncrement TestReverseSimpleDevice TestReverseSimpleHost TestScalarBinarySearchDescendingSimpleDevice TestScalarBinarySearchDescendingSimpleHost TestScalarBinarySearchDispatchExplicit TestScalarBinarySearchDispatchImplicit TestScalarBinarySearchSimpleDevice TestScalarBinarySearchSimpleHost TestScalarEqualRangeDescendingSimpleDevice TestScalarEqualRangeDescendingSimpleHost TestScalarEqualRangeDispatchExplicit TestScalarEqualRangeDispatchImplicit TestScalarEqualRangeSimpleDevice TestScalarEqualRangeSimpleHost TestScalarLowerBoundDescendingSimpleDevice TestScalarLowerBoundDescendingSimpleHost TestScalarLowerBoundDispatchExplicit TestScalarLowerBoundDispatchImplicit TestScalarLowerBoundSimpleDevice TestScalarLowerBoundSimpleHost TestScalarUpperBoundDescendingSimpleDevice TestScalarUpperBoundDescendingSimpleHost TestScalarUpperBoundDispatchExplicit TestScalarUpperBoundDispatchImplicit TestScalarUpperBoundSimpleDevice TestScalarUpperBoundSimpleHost TestScan TestScanByKeyDeviceSeq TestScanByKeyHeadFlagsDevice TestScanByKeyHeadFlagsHost TestScanByKeyLargeInput TestScanByKeyMixedTypes TestScanByKeyReusedKeysDevice TestScanByKeyReusedKeysHost TestScanByKeyWithLargeTypes TestScanCudaStreams TestScanDeviceDevice TestScanDeviceSeq TestScanMixedTypes TestScanMixedTypesDevice TestScanMixedTypesHost TestScanSimpleDevice TestScanSimpleHost TestScanToDiscardIterator TestScanWithLargeTypes TestScanWithOperator TestScanWithOperatorToDiscardIterator TestScatter TestScatterCountingIteratorDevice TestScatterCountingIteratorHost TestScatterCudaStreams TestScatterDeviceSeq TestScatterDispatchExplicit TestScatterDispatchImplicit TestScatterIf TestScatterIfCountingIteratorDevice TestScatterIfCountingIteratorHost TestScatterIfCudaStreams TestScatterIfDeviceSeq 
TestScatterIfDispatchExplicit TestScatterIfDispatchImplicit TestScatterIfSimpleDevice TestScatterIfSimpleHost TestScatterIfToDiscardIterator TestScatterSimpleDevice TestScatterSimpleHost TestScatterToDiscardIterator TestSelectSystemCudaToCpp TestSelectSystemDifferentTypes TestSelectSystemSameTypes TestSequence TestSequenceCudaStreams TestSequenceDeviceSeq TestSequenceDispatchExplicit TestSequenceDispatchImplicit TestSequenceSimpleDevice TestSequenceSimpleHost TestSequenceToDiscardIterator TestSetDifference TestSetDifferenceByKey TestSetDifferenceByKeyCudaStreams TestSetDifferenceByKeyDescending TestSetDifferenceByKeyDescendingSimpleDevice TestSetDifferenceByKeyDescendingSimpleHost TestSetDifferenceByKeyDeviceSeq TestSetDifferenceByKeyDispatchExplicit TestSetDifferenceByKeyDispatchImplicit TestSetDifferenceByKeyEquivalentRanges TestSetDifferenceByKeyMultiset TestSetDifferenceByKeySimpleDevice TestSetDifferenceByKeySimpleHost TestSetDifferenceCudaStreams TestSetDifferenceDescending TestSetDifferenceDescendingSimpleDevice TestSetDifferenceDescendingSimpleHost TestSetDifferenceDeviceSeq TestSetDifferenceDispatchExplicit TestSetDifferenceDispatchImplicit TestSetDifferenceEquivalentRanges TestSetDifferenceKeyValue TestSetDifferenceMultiset TestSetDifferenceSimpleDevice TestSetDifferenceSimpleHost TestSetIntersection TestSetIntersectionByKey TestSetIntersectionByKeyCudaStreams TestSetIntersectionByKeyDescending TestSetIntersectionByKeyDescendingSimpleDevice TestSetIntersectionByKeyDescendingSimpleHost TestSetIntersectionByKeyDeviceSeq TestSetIntersectionByKeyDispatchExplicit TestSetIntersectionByKeyDispatchImplicit TestSetIntersectionByKeyEquivalentRanges TestSetIntersectionByKeyMultiset TestSetIntersectionByKeySimpleDevice TestSetIntersectionByKeySimpleHost TestSetIntersectionCudaStreams TestSetIntersectionDescending TestSetIntersectionDescendingSimpleDevice TestSetIntersectionDescendingSimpleHost TestSetIntersectionDeviceSeq TestSetIntersectionDispatchExplicit 
TestSetIntersectionDispatchImplicit TestSetIntersectionEquivalentRanges TestSetIntersectionKeyValue TestSetIntersectionMultiset TestSetIntersectionSimpleDevice TestSetIntersectionSimpleHost TestSetIntersectionToDiscardIterator TestSetSymmetricDifference TestSetSymmetricDifferenceByKey TestSetSymmetricDifferenceByKeyCudaStreams TestSetSymmetricDifferenceByKeyDescending TestSetSymmetricDifferenceByKeyDescendingSimpleDevice TestSetSymmetricDifferenceByKeyDescendingSimpleHost TestSetSymmetricDifferenceByKeyDeviceSeq TestSetSymmetricDifferenceByKeyDispatchExplicit TestSetSymmetricDifferenceByKeyDispatchImplicit TestSetSymmetricDifferenceByKeyEquivalentRanges TestSetSymmetricDifferenceByKeyMultiset TestSetSymmetricDifferenceByKeySimpleDevice TestSetSymmetricDifferenceByKeySimpleHost TestSetSymmetricDifferenceCudaStreams TestSetSymmetricDifferenceDescending TestSetSymmetricDifferenceDescendingSimpleDevice TestSetSymmetricDifferenceDescendingSimpleHost TestSetSymmetricDifferenceDeviceSeq TestSetSymmetricDifferenceDispatchExplicit TestSetSymmetricDifferenceDispatchImplicit TestSetSymmetricDifferenceEquivalentRanges TestSetSymmetricDifferenceKeyValue TestSetSymmetricDifferenceMultiset TestSetSymmetricDifferenceSimpleDevice TestSetSymmetricDifferenceSimpleHost TestSetUnion TestSetUnionByKey TestSetUnionByKeyCudaStreams TestSetUnionByKeyDescending TestSetUnionByKeyDescendingSimpleDevice TestSetUnionByKeyDescendingSimpleHost TestSetUnionByKeyDeviceSeq TestSetUnionByKeyDispatchExplicit TestSetUnionByKeyDispatchImplicit TestSetUnionByKeyEquivalentRanges TestSetUnionByKeyMultiset TestSetUnionByKeySimpleDevice TestSetUnionByKeySimpleHost TestSetUnionCudaStreams TestSetUnionDescending TestSetUnionDescendingSimpleDevice TestSetUnionDescendingSimpleHost TestSetUnionDeviceSeq TestSetUnionDispatchExplicit TestSetUnionDispatchImplicit TestSetUnionKeyValue TestSetUnionKeyValueDescending TestSetUnionSimpleDevice TestSetUnionSimpleHost TestSetUnionToDiscardIterator 
TestSetUnionWithEquivalentElementsSimpleDevice TestSetUnionWithEquivalentElementsSimpleHost TestSortAscendingKey TestSortAscendingKeyValue TestSortBool TestSortBoolDescending TestSortByKeyBool TestSortByKeyBoolDescending TestSortByKeyCudaStreams TestSortByKeyDeviceSeq TestSortByKeyDispatchExplicit TestSortByKeyDispatchImplicit TestSortByKeyPermutationIteratorDevice TestSortByKeyPermutationIteratorHost TestSortByKeySimpleDevice TestSortByKeySimpleHost TestSortByKeyVariableBits TestSortCudaStreams TestSortDescendingKey TestSortDescendingKeyValue TestSortDeviceSeq TestSortDispatchExplicit TestSortDispatchImplicit TestSortPermutationIteratorDevice TestSortPermutationIteratorHost TestSortSimpleDevice TestSortSimpleHost TestSortVariableBits TestStablePartition TestStablePartitionCopy TestStablePartitionCopyDeviceSeq TestStablePartitionCopyDispatchExplicit TestStablePartitionCopyDispatchImplicit TestStablePartitionCopySimpleDevice TestStablePartitionCopySimpleHost TestStablePartitionCopyStencil TestStablePartitionCopyStencilDispatchExplicit TestStablePartitionCopyStencilDispatchImplicit TestStablePartitionCopyStencilSimpleDevice TestStablePartitionCopyStencilSimpleHost TestStablePartitionCopyStencilToDiscardIterator TestStablePartitionCopyToDiscardIterator TestStablePartitionDeviceSeq TestStablePartitionDispatchExplicit TestStablePartitionDispatchImplicit TestStablePartitionSimpleDevice TestStablePartitionSimpleHost TestStablePartitionStencil TestStablePartitionStencilDeviceSeq TestStablePartitionStencilDispatchExplicit TestStablePartitionStencilDispatchImplicit TestStablePartitionStencilSimpleDevice TestStablePartitionStencilSimpleHost TestStablePartitionStencilZipIteratorDevice TestStablePartitionStencilZipIteratorHost TestStablePartitionZipIteratorDevice TestStablePartitionZipIteratorHost TestStableSort TestStableSortByKey TestStableSortByKeyDispatchExplicit TestStableSortByKeyDispatchImplicit TestStableSortByKeyPermutationIteratorDevice 
TestStableSortByKeyPermutationIteratorHost TestStableSortByKeySemantics TestStableSortByKeySimpleDevice TestStableSortByKeySimpleHost TestStableSortByKeyWithLargeKeys TestStableSortByKeyWithLargeKeysAndValues TestStableSortByKeyWithLargeValues TestStableSortDispatchExplicit TestStableSortDispatchImplicit TestStableSortPermutationIteratorDevice TestStableSortPermutationIteratorHost TestStableSortSemantics TestStableSortSimpleDevice TestStableSortSimpleHost TestStableSortWithIndirectionDevice TestStableSortWithIndirectionHost TestStableSortWithLargeKeys TestStandardIntegerTypes TestSwapRanges TestSwapRangesCudaStreams TestSwapRangesDeviceSeq TestSwapRangesDispatchExplicit TestSwapRangesDispatchImplicit TestSwapRangesSimpleDevice TestSwapRangesSimpleHost TestSwapRangesUserSwap TestTabulate TestTabulateCudaStreams TestTabulateDeviceSeq TestTabulateDispatchExplicit TestTabulateDispatchImplicit TestTabulateSimpleDevice TestTabulateSimpleHost TestTabulateToDiscardIterator TestTaus88Equal TestTaus88Max TestTaus88Min TestTaus88SaveRestore TestTaus88Unequal TestTaus88Validation TestTransformBinary TestTransformBinaryCountingIterator TestTransformBinaryCudaStreams TestTransformBinaryDeviceSeq TestTransformBinaryDispatchExplicit TestTransformBinaryDispatchImplicit TestTransformBinarySimpleDevice TestTransformBinarySimpleHost TestTransformBinaryToDiscardIterator TestTransformExclusiveScanDispatchExplicit TestTransformExclusiveScanDispatchImplicit TestTransformIfBinary TestTransformIfBinaryDeviceSeq TestTransformIfBinaryDispatchExplicit TestTransformIfBinaryDispatchImplicit TestTransformIfBinarySimpleDevice TestTransformIfBinarySimpleHost TestTransformIfBinaryToDiscardIterator TestTransformIfUnary TestTransformIfUnaryDeviceSeq TestTransformIfUnaryDispatchExplicit TestTransformIfUnaryDispatchImplicit TestTransformIfUnaryNoStencil TestTransformIfUnaryNoStencilDeviceSeq TestTransformIfUnaryNoStencilDispatchExplicit TestTransformIfUnaryNoStencilDispatchImplicit 
TestTransformIfUnaryNoStencilSimpleDevice TestTransformIfUnaryNoStencilSimpleHost TestTransformIfUnarySimpleDevice TestTransformIfUnarySimpleHost TestTransformIfUnaryToDiscardIterator TestTransformInclusiveScanDispatchExplicit TestTransformInclusiveScanDispatchImplicit TestTransformIteratorDevice TestTransformIteratorHost TestTransformIteratorReduce TestTransformReduce TestTransformReduceCountingIteratorDevice TestTransformReduceCountingIteratorHost TestTransformReduceCudaStreams TestTransformReduceDeviceSeq TestTransformReduceDispatchExplicit TestTransformReduceDispatchImplicit TestTransformReduceFromConst TestTransformReduceSimpleDevice TestTransformReduceSimpleHost TestTransformScan TestTransformScanCountingIteratorDevice TestTransformScanCountingIteratorHost TestTransformScanCudaStreams TestTransformScanDeviceSeq TestTransformScanSimpleDevice TestTransformScanSimpleHost TestTransformScanToDiscardIterator TestTransformUnary TestTransformUnaryCountingIterator TestTransformUnaryCudaStreams TestTransformUnaryDeviceSeq TestTransformUnaryDispatchExplicit TestTransformUnaryDispatchImplicit TestTransformUnarySimpleDevice TestTransformUnarySimpleHost TestTransformUnaryToDiscardIterator TestTransformUnaryToDiscardIteratorZipped TestTransformWithIndirectionDevice TestTransformWithIndirectionHost TestTrivialSequenceDevice TestTrivialSequenceHost TestTupleComparison TestTupleConstructor TestTupleGet TestTupleReduce TestTupleScan TestTupleStableSort TestTupleSwap TestTupleTie TestTupleTransform TestTypeName TestUniformDecomposition TestUniformIntDistributionMax TestUniformIntDistributionMin TestUniformIntDistributionSaveRestore TestUniformRealDistributionMax TestUniformRealDistributionMin TestUniformRealDistributionSaveRestore TestUninitializedCopyCudaStreams TestUninitializedCopyDeviceSeq TestUninitializedCopyDispatchExplicit TestUninitializedCopyDispatchImplicit TestUninitializedCopyNCudaStreams TestUninitializedCopyNDeviceSeq TestUninitializedCopyNDispatchExplicit 
TestUninitializedCopyNDispatchImplicit TestUninitializedCopyNNonPODDevice TestUninitializedCopyNNonPODHost TestUninitializedCopyNSimplePODDevice TestUninitializedCopyNSimplePODHost TestUninitializedCopyNonPODDevice TestUninitializedCopyNonPODHost TestUninitializedCopySimplePODDevice TestUninitializedCopySimplePODHost TestUninitializedFillCudaStreams TestUninitializedFillDeviceSeq TestUninitializedFillDispatchExplicit TestUninitializedFillDispatchImplicit TestUninitializedFillNCudaStreams TestUninitializedFillNDeviceSeq TestUninitializedFillNDispatchExplicit TestUninitializedFillNDispatchImplicit TestUninitializedFillNNonPOD TestUninitializedFillNPODDevice TestUninitializedFillNPODHost TestUninitializedFillNonPOD TestUninitializedFillPODDevice TestUninitializedFillPODHost TestUnique TestUniqueByKey TestUniqueByKeyCopyDispatchExplicit TestUniqueByKeyCopyDispatchImplicit TestUniqueByKeyCudaStreams TestUniqueByKeyDeviceSeq TestUniqueByKeyDispatchExplicit TestUniqueByKeyDispatchImplicit TestUniqueByKeySimpleDevice TestUniqueByKeySimpleHost TestUniqueCopy TestUniqueCopyByKey TestUniqueCopyByKeyCudaStreams TestUniqueCopyByKeyDeviceSeq TestUniqueCopyByKeySimpleDevice TestUniqueCopyByKeySimpleHost TestUniqueCopyByKeyToDiscardIterator TestUniqueCopyCudaStreams TestUniqueCopyDeviceSeq TestUniqueCopyDispatchExplicit TestUniqueCopyDispatchImplicit TestUniqueCopySimpleDevice TestUniqueCopySimpleHost TestUniqueCopyToDiscardIterator TestUniqueCudaStreams TestUniqueDeviceSeq TestUniqueDispatchExplicit TestUniqueDispatchImplicit TestUniqueSimpleDevice TestUniqueSimpleHost TestUnknownDeviceRobustness TestVectorAssignFromBiDirectionalIteratorDevice TestVectorAssignFromBiDirectionalIteratorHost TestVectorAssignFromDeviceVectorDevice TestVectorAssignFromDeviceVectorHost TestVectorAssignFromHostVectorDevice TestVectorAssignFromHostVectorHost TestVectorAssignFromSTLVectorDevice TestVectorAssignFromSTLVectorHost TestVectorBinarySearch TestVectorBinarySearchDescending 
TestVectorBinarySearchDescendingSimpleDevice TestVectorBinarySearchDescendingSimpleHost TestVectorBinarySearchDiscardIterator TestVectorBinarySearchDispatchExplicit TestVectorBinarySearchDispatchImplicit TestVectorBinarySearchSimpleDevice TestVectorBinarySearchSimpleHost TestVectorBool TestVectorContainingLargeType TestVectorCppZeroSizeDevice TestVectorCppZeroSizeHost TestVectorDataDevice TestVectorDataHost TestVectorElementAssignmentDevice TestVectorElementAssignmentHost TestVectorEquality TestVectorErasePositionDevice TestVectorErasePositionHost TestVectorEraseRangeDevice TestVectorEraseRangeHost TestVectorFillAssignDevice TestVectorFillAssignHost TestVectorFillInsert TestVectorFillInsertSimple TestVectorFillInsertSimple TestVectorFromBiDirectionalIteratorDevice TestVectorFromBiDirectionalIteratorHost TestVectorFromSTLVectorDevice TestVectorFromSTLVectorHost TestVectorFrontBackDevice TestVectorFrontBackHost TestVectorInequality TestVectorLowerBound TestVectorLowerBoundDescending TestVectorLowerBoundDescendingSimpleDevice TestVectorLowerBoundDescendingSimpleHost TestVectorLowerBoundDiscardIterator TestVectorLowerBoundDispatchExplicit TestVectorLowerBoundDispatchImplicit TestVectorLowerBoundSimpleDevice TestVectorLowerBoundSimpleHost TestVectorManipulationDevice TestVectorManipulationHost TestVectorRangeInsert TestVectorRangeInsertSimple TestVectorRangeInsertSimple TestVectorReservingDevice TestVectorReservingHost TestVectorResizingDevice TestVectorResizingHost TestVectorReversedDevice TestVectorReversedHost TestVectorShrinkToFitDevice TestVectorShrinkToFitHost TestVectorSwapDevice TestVectorSwapHost TestVectorToAndFromDeviceVectorDevice TestVectorToAndFromDeviceVectorHost TestVectorToAndFromHostVectorDevice TestVectorToAndFromHostVectorHost TestVectorUpperBound TestVectorUpperBoundDescending TestVectorUpperBoundDescendingSimpleDevice TestVectorUpperBoundDescendingSimpleHost TestVectorUpperBoundDiscardIterator TestVectorUpperBoundDispatchExplicit 
TestVectorUpperBoundDispatchImplicit TestVectorUpperBoundSimpleDevice TestVectorUpperBoundSimpleHost TestVectorWithInitialValueDevice TestVectorWithInitialValueHost TestVectorZeroSizeDevice TestVectorZeroSizeHost TestZipIteratorCopyAoSToSoA TestZipIteratorCopyDevice TestZipIteratorCopyHost TestZipIteratorCopySoAToAoS TestZipIteratorManipulation TestZipIteratorReduce TestZipIteratorReduceByKey TestZipIteratorReference TestZipIteratorScan TestZipIteratorStableSort TestZipIteratorStableSortByKey TestZipIteratorSystem TestZipIteratorTransform TestZipIteratorTraversal TestZippedDiscardIterator thrust-1.9.5/internal/test/unittest_omp.lst000066400000000000000000000564151344621116200211430ustar00rootroot00000000000000TestAdjacentDifference TestAdjacentDifferenceDiscardIterator TestAdjacentDifferenceInPlaceWithRelatedIteratorTypes TestAdjacentDifferenceSimpleDevice TestAdjacentDifferenceSimpleHost TestAdvanceDevice TestAdvanceHost TestAllOfDevice TestAllOfHost TestAnyOfDevice TestAnyOfHost TestAssertEqual TestAssertGEqual TestAssertLEqual TestBitAndFunctionalDevice TestBitAndFunctionalHost TestBitOrFunctionalDevice TestBitOrFunctionalHost TestBitXorFunctionalDevice TestBitXorFunctionalHost TestComputeCapability TestConstantIteratorComparison TestConstantIteratorConstructFromConvertibleSpace TestConstantIteratorCopyDevice TestConstantIteratorCopyHost TestConstantIteratorIncrement TestConstantIteratorReduce TestConstantIteratorTransformDevice TestConstantIteratorTransformHost TestCopyConstantIteratorToZipIteratorDevice TestCopyConstantIteratorToZipIteratorHost TestCopyCountingIteratorDevice TestCopyCountingIteratorHost TestCopyDeviceThrow TestCopyFromConstIterator TestCopyIf TestCopyIfSimpleDevice TestCopyIfSimpleHost TestCopyIfStencil TestCopyIfStencilSimpleDevice TestCopyIfStencilSimpleHost TestCopyListToDevice TestCopyListToHost TestCopyMatchingTypesDevice TestCopyMatchingTypesHost TestCopyMixedTypesDevice TestCopyMixedTypesHost TestCopyNConstantIteratorToZipIteratorDevice 
TestCopyNConstantIteratorToZipIteratorHost TestCopyNCountingIteratorDevice TestCopyNCountingIteratorHost TestCopyNFromConstIterator TestCopyNListToDevice TestCopyNListToHost TestCopyNMatchingTypesDevice TestCopyNMatchingTypesHost TestCopyNMixedTypesDevice TestCopyNMixedTypesHost TestCopyNToDiscardIterator TestCopyNVectorBool TestCopyNZipIteratorDevice TestCopyNZipIteratorHost TestCopyToDiscardIterator TestCopyToDiscardIteratorZipped TestCopyVectorBool TestCopyZipIteratorDevice TestCopyZipIteratorHost TestCount TestCountFromConstIteratorSimpleDevice TestCountFromConstIteratorSimpleHost TestCountIf TestCountIfSimpleDevice TestCountIfSimpleHost TestCountSimpleDevice TestCountSimpleHost TestCountingIteratorComparison TestCountingIteratorCopyConstructor TestCountingIteratorDifference TestCountingIteratorDistance TestCountingIteratorIncrement TestCountingIteratorLowerBound TestCountingIteratorUnsignedType TestDeviceDeleteDestructorInvocation TestDeviceDereferenceCountingIterator TestDeviceDereferenceDevicePtr TestDeviceDereferenceDeviceVectorIterator TestDeviceDereferenceTransformIterator TestDeviceDereferenceTransformedCountingIterator TestDevicePointerManipulation TestDeviceReferenceAssignmentFromDeviceReference TestDeviceReferenceConstructorFromDevicePointer TestDeviceReferenceConstructorFromDeviceReference TestDeviceReferenceManipulation TestDiscardIteratorComparison TestDiscardIteratorIncrement TestDistanceDevice TestDistanceHost TestDividesFunctionalDevice TestDividesFunctionalHost TestEqual TestEqualSimpleDevice TestEqualSimpleHost TestEqualToFunctionalDevice TestEqualToFunctionalHost TestExclusiveScan32 TestExclusiveScanByKeySimpleDevice TestExclusiveScanByKeySimpleHost TestExclusiveScanNullPtr TestFill TestFillDiscardIterator TestFillMixedTypesDevice TestFillMixedTypesHost TestFillN TestFillNDiscardIterator TestFillNMixedTypesDevice TestFillNMixedTypesHost TestFillNSimpleDevice TestFillNSimpleHost TestFillSimpleDevice TestFillSimpleHost TestFillTuple 
TestFillWithNonTrivialAssignment TestFillWithTrivialAssignment TestFillZipIteratorDevice TestFillZipIteratorHost TestFind TestFindIf TestFindIfNot TestFindIfNotSimpleDevice TestFindIfNotSimpleHost TestFindIfSimpleDevice TestFindIfSimpleHost TestFindSimpleDevice TestFindSimpleHost TestForEach TestForEachLargeRegisterFootprint TestForEachSimpleAnySpace TestForEachSimpleDevice TestForEachSimpleHost TestForEachWithLargeTypes TestFunctionalPlaceholdersBinaryEqualToDevice TestFunctionalPlaceholdersBinaryEqualToHost TestFunctionalPlaceholdersBinaryGreaterDevice TestFunctionalPlaceholdersBinaryGreaterEqualDevice TestFunctionalPlaceholdersBinaryGreaterEqualHost TestFunctionalPlaceholdersBinaryGreaterHost TestFunctionalPlaceholdersBinaryLessDevice TestFunctionalPlaceholdersBinaryLessEqualDevice TestFunctionalPlaceholdersBinaryLessEqualHost TestFunctionalPlaceholdersBinaryLessHost TestFunctionalPlaceholdersBinaryNotEqualToDevice TestFunctionalPlaceholdersBinaryNotEqualToHost TestFunctionalPlaceholdersBitAnd TestFunctionalPlaceholdersBitAnd TestFunctionalPlaceholdersBitAndEqual TestFunctionalPlaceholdersBitAndEqual TestFunctionalPlaceholdersBitNegateDevice TestFunctionalPlaceholdersBitNegateHost TestFunctionalPlaceholdersBitOr TestFunctionalPlaceholdersBitOr TestFunctionalPlaceholdersBitOrEqual TestFunctionalPlaceholdersBitOrEqual TestFunctionalPlaceholdersBitRshiftEqual TestFunctionalPlaceholdersBitRshiftEqual TestFunctionalPlaceholdersBitXor TestFunctionalPlaceholdersBitXor TestFunctionalPlaceholdersBitXorEqual TestFunctionalPlaceholdersBitXorEqual TestFunctionalPlaceholdersDivides TestFunctionalPlaceholdersDivides TestFunctionalPlaceholdersDividesEqual TestFunctionalPlaceholdersDividesEqual TestFunctionalPlaceholdersLogicalAndDevice TestFunctionalPlaceholdersLogicalAndHost TestFunctionalPlaceholdersLogicalNotDevice TestFunctionalPlaceholdersLogicalNotHost TestFunctionalPlaceholdersLogicalOrDevice TestFunctionalPlaceholdersLogicalOrHost TestFunctionalPlaceholdersMinus 
TestFunctionalPlaceholdersMinus TestFunctionalPlaceholdersMinusEqual TestFunctionalPlaceholdersMinusEqual TestFunctionalPlaceholdersModulus TestFunctionalPlaceholdersModulus TestFunctionalPlaceholdersModulusEqual TestFunctionalPlaceholdersModulusEqual TestFunctionalPlaceholdersMultiplies TestFunctionalPlaceholdersMultiplies TestFunctionalPlaceholdersMultipliesEqual TestFunctionalPlaceholdersMultipliesEqual TestFunctionalPlaceholdersNegateDevice TestFunctionalPlaceholdersNegateHost TestFunctionalPlaceholdersPlus TestFunctionalPlaceholdersPlus TestFunctionalPlaceholdersPlusEqual TestFunctionalPlaceholdersPlusEqual TestFunctionalPlaceholdersPrefixDecrementDevice TestFunctionalPlaceholdersPrefixDecrementHost TestFunctionalPlaceholdersPrefixIncrementDevice TestFunctionalPlaceholdersPrefixIncrementHost TestFunctionalPlaceholdersSuffixDecrementDevice TestFunctionalPlaceholdersSuffixDecrementHost TestFunctionalPlaceholdersSuffixIncrementDevice TestFunctionalPlaceholdersSuffixIncrementHost TestFunctionalPlaceholdersTransformIterator TestFunctionalPlaceholdersTransformIterator TestFunctionalPlaceholdersUnaryPlusDevice TestFunctionalPlaceholdersUnaryPlusHost TestFunctionalPlaceholdersValue TestFunctionalPlaceholdersValue TestGather TestGatherCountingIteratorDevice TestGatherCountingIteratorHost TestGatherIf TestGatherIfSimpleDevice TestGatherIfSimpleHost TestGatherIfToDiscardIterator TestGatherSimpleDevice TestGatherSimpleHost TestGatherToDiscardIterator TestGenerate TestGenerateNSimpleDevice TestGenerateNSimpleHost TestGenerateNToDiscardIterator TestGenerateSimpleDevice TestGenerateSimpleHost TestGenerateToDiscardIterator TestGenerateTuple TestGenerateZipIteratorDevice TestGenerateZipIteratorHost TestGreaterEqualFunctionalDevice TestGreaterEqualFunctionalHost TestGreaterFunctionalDevice TestGreaterFunctionalHost TestIdentityFunctionalDevice TestIdentityFunctionalHost TestInclusiveScan32 TestInclusiveScanByKeySimpleDevice TestInclusiveScanByKeySimpleHost 
TestInclusiveScanByKeyTransformIteratorDevice TestInclusiveScanByKeyTransformIteratorHost TestInclusiveScanWithIndirectionDevice TestInclusiveScanWithIndirectionHost TestInnerProduct TestInnerProductSimpleDevice TestInnerProductSimpleHost TestInnerProductWithOperatorDevice TestInnerProductWithOperatorHost TestIsCommutative TestIsPartitionedDevice TestIsPartitionedHost TestIsPartitionedSimpleDevice TestIsPartitionedSimpleHost TestIsPlainOldData TestIsSortedDevice TestIsSortedHost TestIsSortedRepeatedElementsDevice TestIsSortedRepeatedElementsHost TestIsSortedSimpleDevice TestIsSortedSimpleHost TestIsSortedUntilDevice TestIsSortedUntilHost TestIsSortedUntilRepeatedElementsDevice TestIsSortedUntilRepeatedElementsHost TestIsSortedUntilSimpleDevice TestIsSortedUntilSimpleHost TestIsTrivialIterator TestLessEqualFunctionalDevice TestLessEqualFunctionalHost TestLessFunctionalDevice TestLessFunctionalHost TestLog2 TestLogicalAndFunctionalDevice TestLogicalAndFunctionalHost TestLogicalNotFunctionalDevice TestLogicalNotFunctionalHost TestLogicalOrFunctionalDevice TestLogicalOrFunctionalHost TestMakeConstantIterator TestMakeDevicePointer TestMakeDiscardIterator TestMakePermutationIteratorDevice TestMakePermutationIteratorHost TestMakeTransformIteratorDevice TestMakeTransformIteratorHost TestMakeTuple TestMax TestMaxActiveBlocks TestMaxBlocksize TestMaxBlocksizeWithHighestOccupancy TestMaxElement TestMaxElementSimpleDevice TestMaxElementSimpleHost TestMaximumFunctionalDevice TestMaximumFunctionalHost TestMerge TestMergeDescending TestMergeKeyValue TestMergeKeyValueDescending TestMergeSimpleDevice TestMergeSimpleHost TestMergeSortAscendingKey TestMergeSortAscendingKeyValue TestMergeSortDescendingKey TestMergeSortDescendingKeyValue TestMergeSortKeySimple TestMergeSortKeyValueSimple TestMergeSortStableKeySimple TestMergeToDiscardIterator TestMin TestMinElement TestMinElementSimpleDevice TestMinElementSimpleHost TestMinMaxElement TestMinMaxElementSimpleDevice 
TestMinMaxElementSimpleHost TestMinimumFunctionalDevice TestMinimumFunctionalHost TestMinstdRand0Equal TestMinstdRand0Max TestMinstdRand0Min TestMinstdRand0SaveRestore TestMinstdRand0Unequal TestMinstdRand0Validation TestMinstdRandEqual TestMinstdRandMax TestMinstdRandMin TestMinstdRandSaveRestore TestMinstdRandUnequal TestMinstdRandValidation TestMinusFunctionalDevice TestMinusFunctionalHost TestMismatchSimpleDevice TestMismatchSimpleHost TestModulusFunctionalDevice TestModulusFunctionalHost TestMultipliesFunctionalDevice TestMultipliesFunctionalHost TestNegateFunctionalDevice TestNegateFunctionalHost TestNoneOfDevice TestNoneOfHost TestNot1Device TestNot1Host TestNot2Device TestNot2Host TestNotEqualToFunctionalDevice TestNotEqualToFunctionalHost TestNullPtrDereferenceYieldsError TestPairComparison TestPairGet TestPairManipulation TestPairReduce TestPairScan TestPairScanByKey TestPairStableSort TestPairStableSortByKey TestPairTransform TestPairTupleElement TestPairTupleSize TestPartition TestPartitionCopy TestPartitionCopySimpleDevice TestPartitionCopySimpleHost TestPartitionCopyToDiscardIterator TestPartitionPointDevice TestPartitionPointHost TestPartitionPointSimpleDevice TestPartitionPointSimpleHost TestPartitionSimpleDevice TestPartitionSimpleHost TestPartitionZipIteratorDevice TestPartitionZipIteratorHost TestPermutationIteratorGatherDevice TestPermutationIteratorGatherHost TestPermutationIteratorHostDeviceGather TestPermutationIteratorHostDeviceScatter TestPermutationIteratorReduceDevice TestPermutationIteratorReduceHost TestPermutationIteratorScatterDevice TestPermutationIteratorScatterHost TestPermutationIteratorSimpleDevice TestPermutationIteratorSimpleHost TestPermutationIteratorWithCountingIteratorDevice TestPermutationIteratorWithCountingIteratorHost TestPlusFunctionalDevice TestPlusFunctionalHost TestProject1stFunctionalDevice TestProject1stFunctionalHost TestProject2ndFunctionalDevice TestProject2ndFunctionalHost TestRadixSort TestRadixSortByKey 
TestRadixSortByKeyLongLongValues TestRadixSortByKeyShortValues TestRadixSortByKeyUnaligned TestRadixSortKeySimple TestRadixSortKeyValueSimple TestRanlux24BaseEqual TestRanlux24BaseMax TestRanlux24BaseMin TestRanlux24BaseSaveRestore TestRanlux24BaseUnequal TestRanlux24BaseValidation TestRanlux24Equal TestRanlux24Max TestRanlux24Min TestRanlux24SaveRestore TestRanlux24Unequal TestRanlux24Validation TestRanlux48BaseEqual TestRanlux48BaseMax TestRanlux48BaseMin TestRanlux48BaseSaveRestore TestRanlux48BaseUnequal TestRanlux48BaseValidation TestRanlux48Equal TestRanlux48Max TestRanlux48Min TestRanlux48SaveRestore TestRanlux48Unequal TestRanlux48Validation TestRawPointerCastDevice TestRawPointerCastHost TestReduce TestReduceByKey TestReduceByKeySimpleDevice TestReduceByKeySimpleHost TestReduceByKeyToDiscardIterator TestReduceIntervals TestReduceIntervalsSimpleDevice TestReduceIntervalsSimpleHost TestReduceMixedTypesDevice TestReduceMixedTypesHost TestReduceNullPtr TestReduceSimpleDevice TestReduceSimpleHost TestReduceWithIndirectionDevice TestReduceWithIndirectionHost TestReduceWithLargeTypes TestReduceWithOperator TestRemove TestRemoveCopy TestRemoveCopyIf TestRemoveCopyIfSimpleDevice TestRemoveCopyIfSimpleHost TestRemoveCopyIfStencil TestRemoveCopyIfStencilSimpleDevice TestRemoveCopyIfStencilSimpleHost TestRemoveCopyIfStencilToDiscardIterator TestRemoveCopyIfToDiscardIterator TestRemoveCopySimpleDevice TestRemoveCopySimpleHost TestRemoveCopyToDiscardIterator TestRemoveCopyToDiscardIteratorZipped TestRemoveIf TestRemoveIfSimpleDevice TestRemoveIfSimpleHost TestRemoveIfStencil TestRemoveIfStencilSimpleDevice TestRemoveIfStencilSimpleHost TestRemoveSimpleDevice TestRemoveSimpleHost TestReplace TestReplaceCopy TestReplaceCopyIf TestReplaceCopyIfSimpleDevice TestReplaceCopyIfSimpleHost TestReplaceCopyIfStencil TestReplaceCopyIfStencilSimpleDevice TestReplaceCopyIfStencilSimpleHost TestReplaceCopyIfStencilToDiscardIterator TestReplaceCopyIfToDiscardIterator 
TestReplaceCopySimpleDevice TestReplaceCopySimpleHost TestReplaceCopyToDiscardIterator TestReplaceIf TestReplaceIfSimpleDevice TestReplaceIfSimpleHost TestReplaceIfStencil TestReplaceIfStencilSimpleDevice TestReplaceIfStencilSimpleHost TestReplaceSimpleDevice TestReplaceSimpleHost TestReverse TestReverseCopy TestReverseCopySimpleDevice TestReverseCopySimpleHost TestReverseCopyToDiscardIterator TestReverseIteratorCopyConstructor TestReverseIteratorCopyDevice TestReverseIteratorCopyHost TestReverseIteratorExclusiveScan TestReverseIteratorExclusiveScanSimple TestReverseIteratorIncrement TestReverseSimpleDevice TestReverseSimpleHost TestScalarBinarySearchDescendingSimpleDevice TestScalarBinarySearchDescendingSimpleHost TestScalarBinarySearchSimpleDevice TestScalarBinarySearchSimpleHost TestScalarEqualRangeDescendingSimpleDevice TestScalarEqualRangeDescendingSimpleHost TestScalarEqualRangeSimpleDevice TestScalarEqualRangeSimpleHost TestScalarLowerBoundDescendingSimpleDevice TestScalarLowerBoundDescendingSimpleHost TestScalarLowerBoundSimpleDevice TestScalarLowerBoundSimpleHost TestScalarUpperBoundDescendingSimpleDevice TestScalarUpperBoundDescendingSimpleHost TestScalarUpperBoundSimpleDevice TestScalarUpperBoundSimpleHost TestScan TestScanByKeyHeadFlagsDevice TestScanByKeyHeadFlagsHost TestScanByKeyLargeInput TestScanByKeyMixedTypes TestScanByKeyReusedKeysDevice TestScanByKeyReusedKeysHost TestScanByKeyWithLargeTypes TestScanMixedTypes TestScanMixedTypesDevice TestScanMixedTypesHost TestScanSimpleDevice TestScanSimpleHost TestScanToDiscardIterator TestScanWithLargeTypes TestScanWithOperator TestScanWithOperatorToDiscardIterator TestScatter TestScatterCountingIteratorDevice TestScatterCountingIteratorHost TestScatterIf TestScatterIfCountingIteratorDevice TestScatterIfCountingIteratorHost TestScatterIfSimpleDevice TestScatterIfSimpleHost TestScatterIfToDiscardIterator TestScatterSimpleDevice TestScatterSimpleHost TestScatterToDiscardIterator TestSelect TestSelectKeyValue 
TestSelectSemantics TestSequence TestSequenceSimpleDevice TestSequenceSimpleHost TestSequenceToDiscardIterator TestSetDifference TestSetDifferenceDescending TestSetDifferenceDescendingSimpleDevice TestSetDifferenceDescendingSimpleHost TestSetDifferenceEquivalentRanges TestSetDifferenceKeyValue TestSetDifferenceMultiset TestSetDifferenceSimpleDevice TestSetDifferenceSimpleHost TestSetIntersection TestSetIntersectionDescending TestSetIntersectionDescendingSimpleDevice TestSetIntersectionDescendingSimpleHost TestSetIntersectionEquivalentRanges TestSetIntersectionKeyValue TestSetIntersectionMultiset TestSetIntersectionSimpleDevice TestSetIntersectionSimpleHost TestSetIntersectionToDiscardIterator TestSetSymmetricDifference TestSetSymmetricDifferenceDescending TestSetSymmetricDifferenceDescendingSimpleDevice TestSetSymmetricDifferenceDescendingSimpleHost TestSetSymmetricDifferenceEquivalentRanges TestSetSymmetricDifferenceKeyValue TestSetSymmetricDifferenceMultiset TestSetSymmetricDifferenceSimpleDevice TestSetSymmetricDifferenceSimpleHost TestSetUnion TestSetUnionDescending TestSetUnionKeyValue TestSetUnionKeyValueDescending TestSetUnionSimpleDevice TestSetUnionSimpleHost TestSetUnionToDiscardIterator TestSetUnionWithEquivalentElementsSimpleDevice TestSetUnionWithEquivalentElementsSimpleHost TestSortAscendingKey TestSortAscendingKeyValue TestSortByKeySimpleDevice TestSortByKeySimpleHost TestSortByKeyVariableBits TestSortDescendingKey TestSortDescendingKeyValue TestSortNullPtr TestSortSimpleDevice TestSortSimpleHost TestSortVariableBits TestStablePartition TestStablePartitionCopy TestStablePartitionCopySimpleDevice TestStablePartitionCopySimpleHost TestStablePartitionCopyToDiscardIterator TestStablePartitionSimpleDevice TestStablePartitionSimpleHost TestStablePartitionZipIteratorDevice TestStablePartitionZipIteratorHost TestStableSort TestStableSortByKey TestStableSortByKeySemantics TestStableSortByKeySimpleDevice TestStableSortByKeySimpleHost 
TestStableSortByKeyWithLargeKeys TestStableSortByKeyWithLargeKeysAndValues TestStableSortByKeyWithLargeValues TestStableSortSemantics TestStableSortSimpleDevice TestStableSortSimpleHost TestStableSortWithIndirectionDevice TestStableSortWithIndirectionHost TestStableSortWithLargeKeys TestStandardIntegerTypes TestSwapRanges TestSwapRangesSimpleDevice TestSwapRangesSimpleHost TestSwapRangesUserSwap TestTaus88Equal TestTaus88Max TestTaus88Min TestTaus88SaveRestore TestTaus88Unequal TestTaus88Validation TestTransformBinary TestTransformBinaryCountingIteratorDevice TestTransformBinaryCountingIteratorHost TestTransformBinarySimpleDevice TestTransformBinarySimpleHost TestTransformBinaryToDiscardIterator TestTransformIfBinary TestTransformIfBinarySimpleDevice TestTransformIfBinarySimpleHost TestTransformIfBinaryToDiscardIterator TestTransformIfUnary TestTransformIfUnaryNoStencil TestTransformIfUnaryNoStencilSimpleDevice TestTransformIfUnaryNoStencilSimpleHost TestTransformIfUnarySimpleDevice TestTransformIfUnarySimpleHost TestTransformIfUnaryToDiscardIterator TestTransformIteratorDevice TestTransformIteratorHost TestTransformIteratorReduce TestTransformNullPtr TestTransformReduce TestTransformReduceCountingIteratorDevice TestTransformReduceCountingIteratorHost TestTransformReduceFromConst TestTransformReduceSimpleDevice TestTransformReduceSimpleHost TestTransformScan TestTransformScanCountingIteratorDevice TestTransformScanCountingIteratorHost TestTransformScanSimpleDevice TestTransformScanSimpleHost TestTransformScanToDiscardIterator TestTransformUnary TestTransformUnaryCountingIteratorDevice TestTransformUnaryCountingIteratorHost TestTransformUnarySimpleDevice TestTransformUnarySimpleHost TestTransformUnaryToDiscardIterator TestTransformUnaryToDiscardIteratorZipped TestTransformWithIndirectionDevice TestTransformWithIndirectionHost TestTrivialSequenceDevice TestTrivialSequenceHost TestTupleComparison TestTupleConstructor TestTupleGet TestTupleReduce TestTupleScan 
TestTupleStableSort TestTupleTie TestTupleTransform TestTypeName TestUniformDecomposition TestUniformIntDistributionMax TestUniformIntDistributionMin TestUniformIntDistributionSaveRestore TestUniformRealDistributionMax TestUniformRealDistributionMin TestUniformRealDistributionSaveRestore TestUninitializedCopyNonPODDevice TestUninitializedCopyNonPODHost TestUninitializedCopySimplePODDevice TestUninitializedCopySimplePODHost TestUninitializedFillNNonPOD TestUninitializedFillNPODDevice TestUninitializedFillNPODHost TestUninitializedFillNonPOD TestUninitializedFillPODDevice TestUninitializedFillPODHost TestUnique TestUniqueByKey TestUniqueByKeySimpleDevice TestUniqueByKeySimpleHost TestUniqueCopy TestUniqueCopyByKey TestUniqueCopyByKeySimpleDevice TestUniqueCopyByKeySimpleHost TestUniqueCopyByKeyToDiscardIterator TestUniqueCopySimpleDevice TestUniqueCopySimpleHost TestUniqueCopyToDiscardIterator TestUniqueSimpleDevice TestUniqueSimpleHost TestUnknownDeviceRobustness TestVectorAssignFromBiDirectionalIteratorDevice TestVectorAssignFromBiDirectionalIteratorHost TestVectorAssignFromDeviceVectorDevice TestVectorAssignFromDeviceVectorHost TestVectorAssignFromHostVectorDevice TestVectorAssignFromHostVectorHost TestVectorAssignFromSTLVectorDevice TestVectorAssignFromSTLVectorHost TestVectorBinarySearch TestVectorBinarySearchDescending TestVectorBinarySearchDescendingSimpleDevice TestVectorBinarySearchDescendingSimpleHost TestVectorBinarySearchDiscardIterator TestVectorBinarySearchSimpleDevice TestVectorBinarySearchSimpleHost TestVectorBool TestVectorContainingLargeType TestVectorCppZeroSizeDevice TestVectorCppZeroSizeHost TestVectorDataDevice TestVectorDataHost TestVectorElementAssignmentDevice TestVectorElementAssignmentHost TestVectorEquality TestVectorErasePositionDevice TestVectorErasePositionHost TestVectorEraseRangeDevice TestVectorEraseRangeHost TestVectorFillAssignDevice TestVectorFillAssignHost TestVectorFillInsert TestVectorFillInsertSimple TestVectorFillInsertSimple 
TestVectorFromBiDirectionalIteratorDevice TestVectorFromBiDirectionalIteratorHost TestVectorFromSTLVectorDevice TestVectorFromSTLVectorHost TestVectorFrontBackDevice TestVectorFrontBackHost TestVectorInequality TestVectorLowerBound TestVectorLowerBoundDescending TestVectorLowerBoundDescendingSimpleDevice TestVectorLowerBoundDescendingSimpleHost TestVectorLowerBoundDiscardIterator TestVectorLowerBoundSimpleDevice TestVectorLowerBoundSimpleHost TestVectorManipulationDevice TestVectorManipulationHost TestVectorRangeInsert TestVectorRangeInsertSimple TestVectorRangeInsertSimple TestVectorReservingDevice TestVectorReservingHost TestVectorResizingDevice TestVectorResizingHost TestVectorReversedDevice TestVectorReversedHost TestVectorShrinkToFitDevice TestVectorShrinkToFitHost TestVectorSwapDevice TestVectorSwapHost TestVectorToAndFromDeviceVectorDevice TestVectorToAndFromDeviceVectorHost TestVectorToAndFromHostVectorDevice TestVectorToAndFromHostVectorHost TestVectorUpperBound TestVectorUpperBoundDescending TestVectorUpperBoundDescendingSimpleDevice TestVectorUpperBoundDescendingSimpleHost TestVectorUpperBoundDiscardIterator TestVectorUpperBoundSimpleDevice TestVectorUpperBoundSimpleHost TestVectorWithInitialValueDevice TestVectorWithInitialValueHost TestVectorZeroSizeDevice TestVectorZeroSizeHost TestZipIteratorCopyAoSToSoA TestZipIteratorCopyDevice TestZipIteratorCopyHost TestZipIteratorCopySoAToAoS TestZipIteratorManipulation TestZipIteratorReduce TestZipIteratorReduceByKey TestZipIteratorReference TestZipIteratorScan TestZipIteratorSpace TestZipIteratorStableSort TestZipIteratorStableSortByKey TestZipIteratorTransform TestZipIteratorTraversal TestZippedDiscardIterator

thrust-1.9.5/internal/test/warningstester.cu000066400000000000000000000001311344621116200212550ustar00rootroot00000000000000
//#include "cuda_runtime_api.h"
#include "warningstester.h"

int main()
{
  return 0;
}
thrust-1.9.5/perf_test/000077500000000000000000000000001344621116200150525ustar00rootroot00000000000000
thrust-1.9.5/perf_test/adjacent_difference.h000066400000000000000000000014511344621116200211470ustar00rootroot00000000000000
#include

template >
struct AdjacentDifference
{
  Policy policy;
  Container1 A;
  Container2 B;
  BinaryFunction binary_op;

  template
  AdjacentDifference(Policy policy, const Range1& X, const Range2& Y,
                     BinaryFunction binary_op = BinaryFunction())
    : policy(policy), A(X.begin(), X.end()), B(Y.begin(), Y.end()),
      binary_op(binary_op)
  {}

  void operator()(void)
  {
    thrust::adjacent_difference(policy, A.begin(), A.end(), B.begin(), binary_op);
  }
};

thrust-1.9.5/perf_test/binary_search.h000066400000000000000000000053361344621116200200430ustar00rootroot00000000000000
#include
#include

template >
struct LowerBound
{
  Policy policy;
  Container1 A; // haystack
  Container2 B; // needles
  Container3 C; // positions
  StrictWeakOrdering comp;

  template
  LowerBound(Policy policy, const Range1& X, const Range2& Y, const Range3& Z,
             StrictWeakOrdering comp = StrictWeakOrdering())
    : policy(policy), A(X.begin(), X.end()), B(Y.begin(), Y.end()),
      C(Z.begin(), Z.end()), comp(comp)
  {
    thrust::stable_sort(policy, A.begin(), A.end(), comp);
  }

  void operator()(void)
  {
    thrust::lower_bound(policy, A.begin(), A.end(), B.begin(), B.end(), C.begin(), comp);
  }
};

template >
struct UpperBound
{
  Policy policy;
  Container1 A; // haystack
  Container2 B; // needles
  Container3 C; // positions
  StrictWeakOrdering comp;

  template
  UpperBound(Policy policy, const Range1& X, const Range2& Y, const Range3& Z,
             StrictWeakOrdering comp = StrictWeakOrdering())
    : policy(policy), A(X.begin(), X.end()), B(Y.begin(), Y.end()),
      C(Z.begin(), Z.end()), comp(comp)
  {
    thrust::stable_sort(policy, A.begin(), A.end(), comp);
  }

  void operator()(void)
  {
    thrust::upper_bound(policy, A.begin(), A.end(), B.begin(), B.end(), C.begin(), comp);
  }
};

template >
struct BinarySearch
{
  Policy policy;
  Container1 A; // haystack
  Container2 B; // needles
  Container3 C; // booleans
  StrictWeakOrdering comp;

  template
  BinarySearch(Policy policy, const Range1& X, const Range2& Y, const Range3& Z,
               StrictWeakOrdering comp = StrictWeakOrdering())
    : policy(policy), A(X.begin(), X.end()), B(Y.begin(), Y.end()),
      C(Z.begin(), Z.end()), comp(comp)
  {
    thrust::stable_sort(policy, A.begin(), A.end(), comp);
  }

  void operator()(void)
  {
    thrust::binary_search(policy, A.begin(), A.end(), B.begin(), B.end(), C.begin(), comp);
  }
};

thrust-1.9.5/perf_test/clock_timer.h000066400000000000000000000004211344621116200175130ustar00rootroot00000000000000
#pragma once

#include

struct clock_timer
{
  std::clock_t start;

  clock_timer() : start(std::clock()) {}

  void restart() { start = std::clock(); }

  double elapsed_seconds()
  {
    return double(std::clock() - start) / CLOCKS_PER_SEC;
  }
};

thrust-1.9.5/perf_test/copy.h000066400000000000000000000031611344621116200161760ustar00rootroot00000000000000
#include

template
struct Copy
{
  Container1 A;
  Container2 B;
  Policy policy;

  template
  Copy(Policy policy, const Range1& X, const Range2& Y)
    : A(X.begin(), X.end()), B(Y.begin(), Y.end()), policy(policy)
  {}

  void operator()(void)
  {
    thrust::copy(policy, A.begin(), A.end(), B.begin());
  }
};

template
struct CopyN
{
  Container1 A;
  Container2 B;
  Policy policy;

  template
  CopyN(Policy policy, const Range1& X, const Range2& Y)
    : A(X.begin(), X.end()), B(Y.begin(), Y.end()), policy(policy)
  {}

  void operator()(void)
  {
    thrust::copy_n(policy, A.begin(), A.size(), B.begin());
  }
};

template >
struct CopyIf
{
  Container1 A; // values
  Container2 B; // stencil
  Container3 C; // output
  Predicate pred;
  Policy policy;

  template
  CopyIf(Policy p_, const Range1& X, const Range2& Y, const Range3& Z,
         Predicate pred = Predicate())
    : A(X.begin(), X.end()), B(Y.begin(), Y.end()), C(Z.begin(), Z.end()),
      pred(pred), policy(p_)
  {}

  void operator()(void)
  {
    thrust::copy_if(policy, A.begin(), A.end(), B.begin(), C.begin(), pred);
  }
};
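`LowerBound`, `UpperBound`, and `BinarySearch` each sort the haystack `A` once in their constructors, outside the timed call, and then time a vectorized search that writes one result per needle in `B` into `C`. The host-only sketch below shows what those vectorized calls compute. It uses the C++ standard library rather than Thrust, and the helper names `vectorized_lower_bound` and `vectorized_binary_search` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side analogue of the vectorized thrust::lower_bound call
// timed by LowerBound: for each needle, record the index of the first
// haystack element that is not less than that needle.
std::vector<std::size_t> vectorized_lower_bound(const std::vector<int>& haystack,
                                                const std::vector<int>& needles)
{
  // Binary search requires a sorted haystack; the perf tests sort it in
  // their constructors so that the sort is not part of the timed call.
  assert(std::is_sorted(haystack.begin(), haystack.end()));

  std::vector<std::size_t> positions(needles.size());
  for (std::size_t i = 0; i < needles.size(); ++i)
  {
    auto it = std::lower_bound(haystack.begin(), haystack.end(), needles[i]);
    positions[i] = static_cast<std::size_t>(it - haystack.begin());
  }
  return positions;
}

// Hypothetical analogue of the vectorized thrust::binary_search call timed by
// BinarySearch: one true/false result per needle.
std::vector<bool> vectorized_binary_search(const std::vector<int>& haystack,
                                           const std::vector<int>& needles)
{
  assert(std::is_sorted(haystack.begin(), haystack.end()));

  std::vector<bool> found(needles.size());
  for (std::size_t i = 0; i < needles.size(); ++i)
  {
    found[i] = std::binary_search(haystack.begin(), haystack.end(), needles[i]);
  }
  return found;
}
```

For example, with `haystack = {1, 3, 3, 7}` and `needles = {0, 3, 4, 8}`, the lower-bound positions are `{0, 1, 3, 4}` and the search results are `{false, true, false, false}`.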
thrust-1.9.5/perf_test/count.h000066400000000000000000000017141344621116200163560ustar00rootroot00000000000000
#include

template
struct Count
{
  Container A;
  EqualityComparable value;
  Policy policy;

  template
  Count(Policy policy_, const Range& X, EqualityComparable value = EqualityComparable())
    : A(X.begin(), X.end()), value(value), policy(policy_)
  {}

  void operator()(void)
  {
    thrust::count(policy, A.begin(), A.end(), value);
  }
};

template >
struct CountIf
{
  Container A;
  Predicate pred;
  Policy policy;

  template
  CountIf(Policy policy_, const Range& X, Predicate pred = Predicate())
    : A(X.begin(), X.end()), pred(pred), policy(policy_)
  {}

  void operator()(void)
  {
    thrust::count_if(policy, A.begin(), A.end(), pred);
  }
};

thrust-1.9.5/perf_test/cuda_timer.h000066400000000000000000000023031344621116200173350ustar00rootroot00000000000000
#include

// do not attempt to compile this code, which relies on
// CUDART, without system support
#if THRUST_DEVICE_COMPILER == THRUST_DEVICE_COMPILER_NVCC
#include
#if THRUST_VERSION < 100600
#include
#else
#include
#endif
#include
#include

void cuda_safe_call(cudaError_t error, const std::string& message = "")
{
  if(error)
    throw thrust::system_error(error, thrust::cuda_category(), message);
}

struct cuda_timer
{
  cudaEvent_t start;
  cudaEvent_t end;

  cuda_timer(void)
  {
    cuda_safe_call(cudaEventCreate(&start));
    cuda_safe_call(cudaEventCreate(&end));
    restart();
  }

  ~cuda_timer(void)
  {
    cuda_safe_call(cudaEventDestroy(start));
    cuda_safe_call(cudaEventDestroy(end));
  }

  void restart(void)
  {
    cuda_safe_call(cudaEventRecord(start, 0));
  }

  double elapsed_seconds(void)
  {
    cuda_safe_call(cudaEventRecord(end, 0));
    cuda_safe_call(cudaEventSynchronize(end));

    float ms_elapsed;
    cuda_safe_call(cudaEventElapsedTime(&ms_elapsed, start, end));
    return ms_elapsed / 1e3;
  }
};

#endif // THRUST_DEVICE_COMPILER_NVCC

thrust-1.9.5/perf_test/demangle.hpp000066400000000000000000000010031344621116200173310ustar00rootroot00000000000000
#pragma once

#include
#include

#ifdef __GNUC__

// see http://gcc.gnu.org/onlinedocs/libstdc++/manual/ext_demangling.html
#include

std::string demangle(const std::string &mangled)
{
  int status;
  char *realname = abi::__cxa_demangle(mangled.c_str(), 0, 0, &status);

  // __cxa_demangle returns a null pointer and a nonzero status on failure;
  // fall back to the mangled name instead of constructing a string from null
  if(status != 0 || realname == 0)
  {
    std::free(realname);
    return mangled;
  }

  std::string result(realname);
  std::free(realname);
  return result;
}

#else

// MSVC doesn't mangle the result of typeid().name()
std::string demangle(const std::string &mangled)
{
  return mangled;
}

#endif

thrust-1.9.5/perf_test/device_timer.h000066400000000000000000000005031344621116200176600ustar00rootroot00000000000000
#include

#if THRUST_DEVICE_SYSTEM == THRUST_DEVICE_SYSTEM_CUDA
#include "cuda_timer.h"
typedef cuda_timer device_timer;
#elif THRUST_DEVICE_SYSTEM == THRUST_DEVICE_SYSTEM_TBB
#include "tbb_timer.h"
typedef tbb_timer device_timer;
#else
#include "clock_timer.h"
typedef clock_timer device_timer;
#endif

thrust-1.9.5/perf_test/driver.cu000066400000000000000000000253031344621116200167010ustar00rootroot00000000000000
#include
#include
#include
#include
#include
#include

#include "device_timer.h"
#include "random.h"
#include "demangle.hpp"

// Algos
#include "adjacent_difference.h"
#include "binary_search.h"
#include "copy.h"
#include "count.h"
#include "equal.h"
#include "extrema.h"
#include "fill.h"
#include "find.h"
#include "for_each.h"
#include "gather.h"
#include "generate.h"
#include "inner_product.h"
#include "logical.h"
#include "merge.h"
#include "mismatch.h"
#include "partition.h"
#include "reduce.h"
#include "remove.h"
#include "replace.h"
#include "reverse.h"
#include "scan.h"
#include "scatter.h"
#include "sequence.h"
#include "set_operations.h"
#include "set_operations_by_key.h"
#include "sort.h"
#include "swap.h"
#include "transform.h"
#include "transform_reduce.h"
#include "transform_scan.h"
#include "uninitialized_copy.h"
#include "uninitialized_fill.h"
#include "unique.h"
#if THRUST_VERSION >= 100700
#include "tabulate.h"
#endif

template
std::string name_of_type()
{
  return std::string(demangle(typeid(T).name()));
}

template
void
report(const Test& test, double time)
{
  std::string test_name = name_of_type();

  if (test_name.find("<") != std::string::npos)
  {
    test_name.resize(test_name.find("<"));
  }

  std::cout << test_name << ", " << time << ", " << std::endl;
}

__THRUST_DEFINE_HAS_MEMBER_FUNCTION(has_reset, reset);

template
typename thrust::detail::enable_if<
  has_reset::value
>::type
benchmark(Test& test, size_t iterations = 100)
{
  // run a few iterations to warm up
  for (int i = 0; i < 3; ++i)
  {
    test();
    test.reset();
  }

  thrust::host_vector times(iterations);

  // the test has a reset function, so we time each iteration
  // separately to avoid including the time reset() takes
  for (size_t i = 0; i < iterations; i++)
  {
    cudaDeviceSynchronize();
    device_timer timer;
    test();
    cudaDeviceSynchronize();
    times[i] = timer.elapsed_seconds();
    test.reset();
  }

  double mean = thrust::reduce(times.begin(), times.end()) / times.size();

  report(test, mean);
}

template
typename thrust::detail::disable_if<
  has_reset::value
>::type
benchmark(Test& test, size_t iterations = 100)
{
  // run a few iterations to warm up
  for (int i = 0; i < 3; ++i)
  {
    test();
  }

  // the test doesn't have a reset function, so we can
  // time all iterations together and take the average
  cudaDeviceSynchronize();
  device_timer timer;

  for (size_t i = 0; i < iterations; i++)
  {
    test();
  }

  cudaDeviceSynchronize();
  double time = timer.elapsed_seconds() / iterations;

  report(test, time);
}

int main(int argc, char **argv)
{
  size_t N = 16 << 20;

  // check for too many arguments first; otherwise the usage branch is unreachable
  if(argc > 2)
  {
    std::cerr << "usage: driver [datasize]" << std::endl;
    exit(-1);
  }
  else if(argc > 1)
  {
    N = atoi(argv[1]);
  }

  typedef thrust::device_vector Vector;
  typedef testing::random_integers RandomIntegers;
  typedef testing::random_integers RandomBooleans;

  RandomIntegers A(N, 123);
  RandomIntegers B(N, 234);
  RandomIntegers C(N, 345);
  RandomBooleans D(N, 456);
  Vector T(N, 1);
  Vector F(N, 0);
  Vector S(N); thrust::sequence(S.begin(), S.end());
  Vector U1(2*N, 0);
  Vector U2(2*N, 0);

  thrust::identity I;

  { AdjacentDifference temp(A,B); benchmark(temp); } // adjacent_difference
  { LowerBound temp(A,B,C); benchmark(temp); } // binary_search
  { UpperBound temp(A,B,C); benchmark(temp); }
  { BinarySearch temp(A,B,C); benchmark(temp); }
  { Copy temp(A,B); benchmark(temp); } // copy
  { CopyN temp(A,B); benchmark(temp); }
  { CopyIf temp(A,D,B); benchmark(temp); }
  { Count temp(D); benchmark(temp); } // count
  { CountIf temp(D); benchmark(temp); }
  { Equal temp(A,A); benchmark(temp); } // equal
  { MinElement temp(A); benchmark(temp); } // extrema
  { MaxElement temp(A); benchmark(temp); }
  { MinMaxElement temp(A); benchmark(temp); }
  { Fill temp(A); benchmark(temp); } // fill
  { FillN temp(A); benchmark(temp); }
  { Find temp(F,1); benchmark(temp); } // find
  { FindIf temp(F); benchmark(temp); }
  { FindIfNot temp(T); benchmark(temp); }
  { ForEach temp(A); benchmark(temp); } // for_each
  { Gather temp(S,A,B); benchmark(temp); } // gather
  { GatherIf temp(S,D,A,B); benchmark(temp); }
  { Generate temp(A); benchmark(temp); } // generate
  { GenerateN temp(A); benchmark(temp); }
  { InnerProduct temp(A,B); benchmark(temp); } // inner_product
  { AllOf temp(T); benchmark(temp); } // logical
  { AnyOf temp(F); benchmark(temp); }
  { NoneOf temp(F); benchmark(temp); }
  { Merge temp(A,B,U1); benchmark(temp); } // merge
  { Mismatch temp(A,A); benchmark(temp); } // mismatch
  { Partition temp(A); benchmark(temp); } // partition
  { PartitionCopy temp(D,A,B); benchmark(temp); }
  { StablePartition temp(A); benchmark(temp); }
  { StablePartitionCopy temp(D,A,B); benchmark(temp); }
  { IsPartitioned temp(T); benchmark(temp); }
  { PartitionPoint temp(T); benchmark(temp); }
  { Reduce temp(A); benchmark(temp); } // reduce
  { ReduceByKey temp(D,A,B,C); benchmark(temp); }
  { Remove temp(D,0); benchmark(temp); } // remove
  { RemoveCopy temp(D,A,0); benchmark(temp); }
  { RemoveIf temp(A,D); benchmark(temp); }
  { RemoveCopyIf temp(A,D,B); benchmark(temp); }
  { Replace temp(D,0,2); benchmark(temp); } // replace
  { ReplaceCopy temp(D,A,0,2); benchmark(temp); }
  { ReplaceIf temp(A,D,I,0); benchmark(temp); }
  { ReplaceCopyIf temp(A,D,B,I,0); benchmark(temp); }
  { Reverse temp(A); benchmark(temp); } // reverse
  { ReverseCopy temp(A,B); benchmark(temp); }
  { InclusiveScan temp(A,B); benchmark(temp); } // scan
  { ExclusiveScan temp(A,B); benchmark(temp); }
  { InclusiveScanByKey temp(D,A,B); benchmark(temp); }
  { ExclusiveScanByKey temp(D,A,B); benchmark(temp); }
  { Scatter temp(A,S,B); benchmark(temp); } // scatter
  { ScatterIf temp(A,S,D,B); benchmark(temp); }
  { Sequence temp(A); benchmark(temp); } // sequence
  { SetDifference temp(A,B,U1); benchmark(temp); } // set_operations
  { SetIntersection temp(A,B,U1); benchmark(temp); }
  { SetSymmetricDifference temp(A,B,U1); benchmark(temp); }
  { SetUnion temp(A,B,U1); benchmark(temp); }
  { Sort temp(A); benchmark(temp); } // sort
  { SortByKey temp(A,B); benchmark(temp); }
  { StableSort temp(A); benchmark(temp); }
  { StableSortByKey temp(A,B); benchmark(temp); }
  { ComparisonSort temp(A); benchmark(temp); }
  { ComparisonSortByKey temp(A,B); benchmark(temp); }
  { IsSorted temp(S); benchmark(temp); }
  { IsSortedUntil temp(S); benchmark(temp); }
  { SwapRanges temp(A,B); benchmark(temp); } // swap
  { UnaryTransform temp(A,B); benchmark(temp); } // transform
  { BinaryTransform temp(A,B,C); benchmark(temp); }
  { UnaryTransformIf temp(A,D,B); benchmark(temp); }
  { BinaryTransformIf temp(A,B,D,C); benchmark(temp); }
  { TransformReduce temp(A); benchmark(temp); } // transform_reduce
  { TransformInclusiveScan temp(A,B); benchmark(temp); } // transform_scan
  { TransformExclusiveScan temp(A,B); benchmark(temp); }
  { UninitializedCopy temp(A,B); benchmark(temp); } // uninitialized_copy
  { UninitializedFill temp(A); benchmark(temp); } // uninitialized_fill
  { UninitializedFillN temp(A); benchmark(temp); }
  { Unique temp(D); benchmark(temp); } // unique
  { UniqueCopy temp(D,A); benchmark(temp); }
  { UniqueByKey temp(D,A); benchmark(temp); }
  { UniqueByKeyCopy temp(D,A,B,C); benchmark(temp); }

// same version test as the tabulate.h include and MergeByKey definition
#if THRUST_VERSION >= 100700
  { MergeByKey temp(A,B,C,D,U1,U2); benchmark(temp); } // merge_by_key
  { SetDifferenceByKey temp(A,B,C,D,U1,U2); benchmark(temp); } // set_operations by_key
  { SetIntersectionByKey temp(A,B,C,U1,U2); benchmark(temp); }
  { SetSymmetricDifferenceByKey temp(A,B,C,D,U1,U2); benchmark(temp); }
  { SetUnionByKey temp(A,B,C,D,U1,U2); benchmark(temp); }
  { Tabulate temp(A); benchmark(temp); } // tabulate
#endif

  // host<->device copy

  return 0;
}

thrust-1.9.5/perf_test/equal.h000066400000000000000000000012631344621116200163340ustar00rootroot00000000000000
#include

template >
struct Equal
{
  Container1 A;
  Container2 B;
  BinaryPredicate binary_pred;
  Policy policy;

  template
  Equal(Policy policy_, const Range1& X, const Range2& Y,
        BinaryPredicate binary_pred = BinaryPredicate())
    : A(X.begin(), X.end()), B(Y.begin(), Y.end()),
      binary_pred(binary_pred), policy(policy_)
  {}

  void operator()(void)
  {
    thrust::equal(policy, A.begin(), A.end(), B.begin(), binary_pred);
  }
};

thrust-1.9.5/perf_test/extrema.h000066400000000000000000000031531344621116200166720ustar00rootroot00000000000000
#include

template >
struct MinElement
{
  Container A;
  BinaryPredicate binary_pred;
  Policy policy;

  template
  MinElement(Policy policy_, const Range& X, BinaryPredicate binary_pred = BinaryPredicate())
    : A(X.begin(), X.end()), binary_pred(binary_pred), policy(policy_)
  {}

  void operator()(void)
  {
    thrust::min_element(policy, A.begin(), A.end(), binary_pred);
  }
};

template >
struct MaxElement
{
  Container A;
  BinaryPredicate binary_pred;
  Policy policy;

  template
  MaxElement(Policy policy_, const Range& X, BinaryPredicate binary_pred = BinaryPredicate())
    : A(X.begin(), X.end()), binary_pred(binary_pred), policy(policy_)
  {}

  void operator()(void)
  {
    thrust::max_element(policy, A.begin(), A.end(), binary_pred);
  }
};

template >
struct MinMaxElement
{
  Container A;
  BinaryPredicate binary_pred;
  Policy policy;

  template
  MinMaxElement(Policy policy_, const Range& X, BinaryPredicate binary_pred = BinaryPredicate())
    : A(X.begin(), X.end()), binary_pred(binary_pred), policy(policy_)
  {}

  void operator()(void)
  {
thrust::minmax_element(policy,A.begin(), A.end(), binary_pred); } }; thrust-1.9.5/perf_test/fill.h000066400000000000000000000015371344621116200161570ustar00rootroot00000000000000#include template struct Fill { Container A; T value; Policy policy; template Fill(Policy policy_, const Range& X, T value = T()) : A(X.begin(), X.end()), value(value), policy(policy_) {} void operator()(void) { thrust::fill(policy, A.begin(), A.end(), value); } }; template struct FillN { Container A; T value; Policy policy; template FillN(Policy policy_, const Range& X, T value = T()) : A(X.begin(), X.end()), value(value), policy(policy_) {} void operator()(void) { thrust::fill_n(policy, A.begin(), A.size(), value); } }; thrust-1.9.5/perf_test/find.h000066400000000000000000000026261344621116200161510ustar00rootroot00000000000000#include template struct Find { Container A; EqualityComparable value; Policy policy; template Find(Policy policy_, const Range& X, EqualityComparable value) : A(X.begin(), X.end()), value(value), policy(policy_) {} void operator()(void) { thrust::find(policy,A.begin(), A.end(), value); } }; template > struct FindIf { Container A; Predicate pred; Policy policy; template FindIf(Policy policy_, const Range& X, Predicate pred = Predicate()) : A(X.begin(), X.end()), pred(pred), policy(policy_) {} void operator()(void) { thrust::find_if(policy,A.begin(), A.end(), pred); } }; template > struct FindIfNot { Container A; Predicate pred; Policy policy; template FindIfNot(Policy policy_, const Range& X, Predicate pred = Predicate()) : A(X.begin(), X.end()), pred(pred), policy(policy_) {} void operator()(void) { thrust::find_if_not(policy,A.begin(), A.end(), pred); } }; thrust-1.9.5/perf_test/for_each.h000066400000000000000000000011751344621116200167750ustar00rootroot00000000000000#include struct default_for_each_function { template __host__ __device__ void operator()(T& x) { x = T(); } }; template struct ForEach { Container A; UnaryFunction unary_op; Policy policy; template 
ForEach(Policy policy_, const Range& X, UnaryFunction unary_op = UnaryFunction()) : A(X.begin(), X.end()), unary_op(unary_op), policy(policy_) {} void operator()(void) { thrust::for_each(policy, A.begin(), A.end(), unary_op); } }; thrust-1.9.5/perf_test/gather.h000066400000000000000000000030271344621116200164770ustar00rootroot00000000000000#include template struct Gather { Container1 A; // map Container2 B; // source Container3 C; // output Policy policy; template Gather(Policy policy_, const Range1& X, const Range2& Y, const Range3& Z) : A(X.begin(), X.end()), B(Y.begin(), Y.end()), C(Z.begin(), Z.end()), policy(policy_) {} void operator()(void) { thrust::gather(policy, A.begin(), A.end(), B.begin(), C.begin()); } }; template > struct GatherIf { Container1 A; // map Container2 B; // stencil Container3 C; // source Container4 D; // output Predicate pred; Policy policy; template GatherIf(Policy policy_, const Range1& X, const Range2& Y, const Range3& Z, const Range4& W, Predicate pred = Predicate()) : A(X.begin(), X.end()), B(Y.begin(), Y.end()), C(Z.begin(), Z.end()), D(W.begin(), W.end()), pred(pred), policy(policy_) {} void operator()(void) { thrust::gather_if(policy, A.begin(), A.end(), B.begin(), C.begin(), D.begin(), pred); } }; thrust-1.9.5/perf_test/generate.h000066400000000000000000000022571344621116200170230ustar00rootroot00000000000000#include template struct default_generate_function { __host__ __device__ T operator()(void) { return T(); } }; template > struct Generate { Container A; UnaryFunction unary_op; Policy policy; template Generate(Policy policy_, const Range& X, UnaryFunction unary_op = UnaryFunction()) : A(X.begin(), X.end()), unary_op(unary_op), policy(policy_) {} void operator()(void) { thrust::generate(policy, A.begin(), A.end(), unary_op); } }; template > struct GenerateN { Container A; UnaryFunction unary_op; Policy policy; template GenerateN(Policy policy_, const Range& X, UnaryFunction unary_op = UnaryFunction()) : A(X.begin(), X.end()), 
unary_op(unary_op), policy(policy_) {} void operator()(void) { thrust::generate_n(policy, A.begin(), A.size(), unary_op); } }; thrust-1.9.5/perf_test/inner_product.h000066400000000000000000000017121344621116200200770ustar00rootroot00000000000000#include template , typename BinaryFunction2 = thrust::multiplies > struct InnerProduct { Container1 A; Container2 B; T value; BinaryFunction1 binary_op1; BinaryFunction2 binary_op2; Policy policy; template InnerProduct(Policy policy_, const Range1& X, const Range2& Y, T value = T(0), BinaryFunction1 binary_op1 = BinaryFunction1(), BinaryFunction2 binary_op2 = BinaryFunction2()) : A(X.begin(), X.end()), B(Y.begin(), Y.end()), value(value), binary_op1(binary_op1), binary_op2(binary_op2), policy(policy_) {} void operator()(void) { thrust::inner_product(policy, A.begin(), A.end(), B.begin(), value, binary_op1, binary_op2); } }; thrust-1.9.5/perf_test/logical.h000066400000000000000000000025661344621116200166460ustar00rootroot00000000000000#include template > struct AllOf { Container A; Predicate pred; Policy policy; template AllOf(Policy p_, const Range& X, Predicate pred = Predicate()) : A(X.begin(), X.end()), pred(pred), policy(p_) {} void operator()(void) { thrust::all_of(policy, A.begin(), A.end(), pred); } }; template > struct AnyOf { Container A; Predicate pred; Policy policy; template AnyOf(Policy p_, const Range& X, Predicate pred = Predicate()) : A(X.begin(), X.end()), pred(pred), policy(p_) {} void operator()(void) { thrust::any_of(policy, A.begin(), A.end(), pred); } }; template > struct NoneOf { Container A; Predicate pred; Policy policy; template NoneOf(Policy p_, const Range& X, Predicate pred = Predicate()) : A(X.begin(), X.end()), pred(pred), policy(p_) {} void operator()(void) { thrust::none_of(policy, A.begin(), A.end(), pred); } }; thrust-1.9.5/perf_test/merge.h000066400000000000000000000052461344621116200163310ustar00rootroot00000000000000#include #include #include template > struct Merge { Container1 A; 
  Container2 B;
  Container3 C;
  StrictWeakCompare comp;
  Policy policy;

  template <typename Range1, typename Range2, typename Range3>
  Merge(Policy p_, const Range1& X, const Range2& Y, const Range3& Z,
        StrictWeakCompare comp = StrictWeakCompare())
    : A(X.begin(), X.end()), B(Y.begin(), Y.end()), C(Z.begin(), Z.end()), comp(comp), policy(p_)
  {
    // merge requires both inputs to be sorted
    thrust::stable_sort(policy, A.begin(), A.end(), comp);
    thrust::stable_sort(policy, B.begin(), B.end(), comp);
  }

  void operator()(void)
  {
    thrust::merge(policy, A.begin(), A.end(), B.begin(), B.end(), C.begin(), comp);
  }
};

#if THRUST_VERSION >= 100700
template <class Policy,
          typename Container1,
          typename Container2 = Container1,
          typename Container3 = Container1,
          typename Container4 = Container1,
          typename Container5 = Container1,
          typename Container6 = Container1,
          typename StrictWeakCompare = thrust::less<typename Container1::value_type> >
struct MergeByKey
{
  Container1 keys1;
  Container2 keys2;
  Container3 values1;
  Container4 values2;
  Container5 out_keys;
  Container6 out_values;
  StrictWeakCompare comp;
  Policy policy;

  template <typename Range1, typename Range2, typename Range3,
            typename Range4, typename Range5, typename Range6>
  MergeByKey(Policy p_, const Range1& keys1_, const Range2& keys2_,
             const Range3& values1_, const Range4& values2_,
             Range5& out_keys_, Range6& out_values_,
             StrictWeakCompare comp_ = StrictWeakCompare())
    : keys1(keys1_.begin(), keys1_.end()), keys2(keys2_.begin(), keys2_.end()),
      values1(values1_.begin(), values1_.end()), values2(values2_.begin(), values2_.end()),
      out_keys(out_keys_.begin(), out_keys_.end()), out_values(out_values_.begin(), out_values_.end()),
      comp(comp_), policy(p_)
  {
    thrust::stable_sort(policy, keys1.begin(), keys1.end(), comp);
    thrust::stable_sort(policy, keys2.begin(), keys2.end(), comp);
  }

  void operator()(void)
  {
    thrust::merge_by_key(policy, keys1.begin(), keys1.end(), keys2.begin(), keys2.end(),
                         values1.begin(), values2.begin(), out_keys.begin(), out_values.begin(), comp);
  }
};
#endif // THRUST_VERSION

// thrust-1.9.5/perf_test/mismatch.h
#include <thrust/mismatch.h>
#include <thrust/functional.h>

template <class Policy,
          typename Container1,
          typename Container2 = Container1,
          typename BinaryPredicate = thrust::equal_to<typename Container1::value_type> >
struct Mismatch
{
  Container1 A;
  Container2 B;
  BinaryPredicate binary_pred;
  Policy policy;

  template <typename Range1, typename Range2>
  Mismatch(Policy p_, const Range1& X, const Range2& Y, BinaryPredicate binary_pred = BinaryPredicate())
    : A(X.begin(), X.end()), B(Y.begin(), Y.end()), binary_pred(binary_pred), policy(p_)
  {}

  void
operator()(void)
  {
    thrust::mismatch(policy, A.begin(), A.end(), B.begin(), binary_pred);
  }
};

// thrust-1.9.5/perf_test/partition.h
#include <thrust/partition.h>
#include <thrust/copy.h>
#include <thrust/functional.h>

template <class Policy,
          typename Container,
          typename Predicate = thrust::identity<typename Container::value_type> >
struct Partition
{
  Container A;
  Container B; // copy of initial data
  Predicate pred;
  Policy policy;

  template <typename Range>
  Partition(Policy p_, const Range& X, Predicate pred = Predicate())
    : A(X.begin(), X.end()), B(X.begin(), X.end()), pred(pred), policy(p_)
  {}

  void operator()(void)
  {
    thrust::partition(policy, A.begin(), A.end(), pred);
  }

  void reset(void)
  {
    // restore initial data
    thrust::copy(policy, B.begin(), B.end(), A.begin());
  }
};

template <class Policy,
          typename Container1,
          typename Container2 = Container1,
          typename Container3 = Container1,
          typename Predicate = thrust::identity<typename Container1::value_type> >
struct PartitionCopy
{
  Container1 A;
  Container2 B;
  Container3 C;
  Predicate pred;
  Policy policy;

  template <typename Range1, typename Range2, typename Range3>
  PartitionCopy(Policy p_, const Range1& X, const Range2& Y, const Range3& Z, Predicate pred = Predicate())
    : A(X.begin(), X.end()), B(Y.begin(), Y.end()), C(Z.begin(), Z.end()), pred(pred), policy(p_)
  {}

  void operator()(void)
  {
    thrust::partition_copy(policy, A.begin(), A.end(), B.begin(), C.begin(), pred);
  }
};

template <class Policy,
          typename Container,
          typename Predicate = thrust::identity<typename Container::value_type> >
struct StablePartition
{
  Container A;
  Container B; // copy of initial data
  Predicate pred;
  Policy policy;

  template <typename Range>
  StablePartition(Policy p_, const Range& X, Predicate pred = Predicate())
    : A(X.begin(), X.end()), B(X.begin(), X.end()), pred(pred), policy(p_)
  {}

  void operator()(void)
  {
    thrust::stable_partition(policy, A.begin(), A.end(), pred);
  }

  void reset(void)
  {
    // restore initial data
    thrust::copy(policy, B.begin(), B.end(), A.begin());
  }
};

template <class Policy,
          typename Container1,
          typename Container2 = Container1,
          typename Container3 = Container1,
          typename Predicate = thrust::identity<typename Container1::value_type> >
struct StablePartitionCopy
{
  Container1 A;
  Container2 B;
  Container3 C;
  Predicate pred;
  Policy policy;

  template <typename Range1, typename Range2, typename Range3>
  StablePartitionCopy(Policy p_, const Range1& X, const Range2& Y, const Range3& Z, Predicate pred = Predicate())
    : A(X.begin(), X.end()), B(Y.begin(), Y.end()), C(Z.begin(), Z.end()), pred(pred), policy(p_)
  {}

  void operator()(void)
  {
    thrust::stable_partition_copy(policy, A.begin(), A.end(), B.begin(), C.begin(), pred);
  }
};
template <class Policy,
          typename Container,
          typename Predicate = thrust::identity<typename Container::value_type> >
struct IsPartitioned
{
  Container A;
  Predicate pred;
  Policy policy;

  template <typename Range>
  IsPartitioned(Policy p_, const Range& X, Predicate pred = Predicate())
    : A(X.begin(), X.end()), pred(pred), policy(p_)
  {}

  void operator()(void)
  {
    thrust::is_partitioned(policy, A.begin(), A.end(), pred);
  }
};

template <class Policy,
          typename Container,
          typename Predicate = thrust::identity<typename Container::value_type> >
struct PartitionPoint
{
  Container A;
  Predicate pred;
  Policy policy;

  template <typename Range>
  PartitionPoint(Policy p_, const Range& X, Predicate pred = Predicate())
    : A(X.begin(), X.end()), pred(pred), policy(p_)
  {}

  void operator()(void)
  {
    thrust::partition_point(policy, A.begin(), A.end(), pred);
  }
};

// is_partitioned / partition / stable_partition / partition_copy / stable_partition_copy

//template <typename InputIterator, typename OutputIterator1, typename OutputIterator2, typename Predicate>
//thrust::pair<OutputIterator1, OutputIterator2>
//thrust::partition_copy(InputIterator first, InputIterator last, OutputIterator1 out_true, OutputIterator2 out_false, Predicate pred)

//template <typename ForwardIterator, typename Predicate>
//ForwardIterator thrust::stable_partition(ForwardIterator first, ForwardIterator last, Predicate pred)

//template <typename InputIterator, typename OutputIterator1, typename OutputIterator2, typename Predicate>
//thrust::pair<OutputIterator1, OutputIterator2>
//thrust::stable_partition_copy(InputIterator first, InputIterator last, OutputIterator1 out_true, OutputIterator2 out_false, Predicate pred)

//template <typename ForwardIterator, typename Predicate>
//ForwardIterator thrust::partition_point(ForwardIterator first, ForwardIterator last, Predicate pred)

//template <typename InputIterator, typename Predicate>
//bool thrust::is_partitioned(InputIterator first, InputIterator last, Predicate pred)

// thrust-1.9.5/perf_test/perf_test.cu
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/system/cuda/execution_policy.h>
#include <thrust/detail/type_traits/has_member_function.h>
#include <cassert>
#include <cstdlib>
#include <iostream>
#include <map>
#include <string>
#include <typeinfo>

#include "device_timer.h"
#include "random.h"
#include "demangle.hpp"

// Algos
#include "adjacent_difference.h"
#include "binary_search.h"
#include "copy.h"
#include "count.h"
#include "equal.h"
#include "extrema.h"
#include "fill.h"
#include "find.h"
#include "for_each.h"
#include "gather.h"
#include "generate.h"
#include "inner_product.h"
#include "logical.h"
#include "merge.h"
#include "mismatch.h"
#include "partition.h"
#include "reduce.h"
#include "remove.h"
#include "replace.h"
#include "reverse.h"
#include "scan.h"
#include "scatter.h"
#include "sequence.h"
#include "set_operations.h"
#include "set_operations_by_key.h"
#include "sort.h"
#include "swap.h"
#include "transform.h"
#include "transform_reduce.h"
#include "transform_scan.h"
#include "uninitialized_copy.h"
#include "uninitialized_fill.h"
#include "unique.h"
#if THRUST_VERSION >= 100700
#include "tabulate.h"
#endif

struct caching_device_allocator
{
  typedef char value_type;
  typedef char *allocator_pointer;
  typedef std::multimap<std::ptrdiff_t, void*> free_blocks_type;
  typedef std::map<void*, std::ptrdiff_t>      allocated_blocks_type;

  free_blocks_type      free_blocks;
  allocated_blocks_type allocated_blocks;

  void free_all()
  {
    // deallocate all outstanding blocks in both lists
    for (free_blocks_type::iterator i = free_blocks.begin(); i != free_blocks.end(); ++i)
    {
      cudaError_t status = cudaFree(i->second);
      assert(cudaSuccess == status);
    }

    for (allocated_blocks_type::iterator i = allocated_blocks.begin(); i != allocated_blocks.end(); ++i)
    {
      cudaError_t status = cudaFree(i->first);
      assert(cudaSuccess == status);
    }
  }

  caching_device_allocator() {}

  ~caching_device_allocator()
  {
    // free all allocations when the allocator goes out of scope
    free_all();
  }

  char *allocate(std::ptrdiff_t num_bytes)
  {
    void *result = 0;

    // search the cache for a free block of exactly this size
    free_blocks_type::iterator free_block = free_blocks.find(num_bytes);

    if (free_block != free_blocks.end())
    {
      // reuse the cached pointer
      result = free_block->second;

      // erase it from the free_blocks map
      free_blocks.erase(free_block);
    }
    else
    {
      // no cached block of the right size exists,
      // so allocate a new one with cudaMalloc
      cudaError_t status = cudaMalloc(&result, num_bytes);
      assert(cudaSuccess == status);
    }

    // insert the allocated pointer into the allocated_blocks map
    allocated_blocks.insert(std::make_pair(result, num_bytes));

    return (char*)result;
  }

  void
deallocate(char *ptr, size_t n)
  {
    // erase the allocated block from the allocated blocks map
    allocated_blocks_type::iterator iter = allocated_blocks.find(ptr);
    std::ptrdiff_t num_bytes = iter->second;
    allocated_blocks.erase(iter);

    // return the block to the free blocks map for later reuse
    free_blocks.insert(std::make_pair(num_bytes, ptr));
  }
};

template <typename T>
std::string name_of_type()
{
  return std::string(demangle(typeid(T).name()));
}

template <typename Test>
void report(const Test& test, double time)
{
  std::string test_name = name_of_type<Test>();

  // strip the template arguments from the type name
  if (test_name.find("<") != std::string::npos)
  {
    test_name.resize(test_name.find("<"));
  }

  std::cout << test_name << ", " << time << ", " << std::endl;
}

__THRUST_DEFINE_HAS_MEMBER_FUNCTION(has_reset, reset);

template <typename Test>
typename thrust::detail::enable_if<has_reset<Test, void(void)>::value>::type
benchmark(Test& test, size_t iterations = 20)
{
  // run three iterations (warm up)
  for (int i = 0; i < 3; ++i)
  {
    test();
    test.reset();
  }

  thrust::host_vector<double> times(iterations);

  // the test has a reset function so we have to
  // be careful not to include the time it takes
  for (size_t i = 0; i < iterations; i++)
  {
    cudaDeviceSynchronize();
    device_timer timer;
    test();
    cudaDeviceSynchronize();
    times[i] = timer.elapsed_seconds();
    test.reset();
  }

  double mean = thrust::reduce(times.begin(), times.end()) / times.size();

  report(test, mean);
}

template <typename Test>
typename thrust::detail::disable_if<has_reset<Test, void(void)>::value>::type
benchmark(Test& test, size_t iterations = 20)
{
  // run three iterations (warm up)
  for (int i = 0; i < 3; ++i)
  {
    test();
  }

  // the test doesn't have a reset function so we can
  // just take the average time
  cudaDeviceSynchronize();
  device_timer timer;

  for (size_t i = 0; i < iterations; i++)
  {
    test();
  }

  cudaDeviceSynchronize();

  double time = timer.elapsed_seconds() / iterations;

  report(test, time);
}

template <typename Ty, typename P>
void doit(P p, size_t N, size_t seed)
{
  typedef thrust::device_vector<Ty>     Vector;
  typedef thrust::host_vector<Ty>       hVector;
  typedef testing::random_integers<Ty>  RandomIntegers;
  typedef testing::random_integers<bool>
                                        RandomBooleans;

  RandomIntegers A_(N, 1235630645667);
  RandomIntegers B_(N, 234339572634);
  RandomIntegers C_(N, 345);
  RandomBooleans D(N, 456);

  Vector T(N, 1);
  Vector F(N, 0);
  Vector S(N);
  thrust::sequence(S.begin(), S.end());
  Vector U1(2*N, 0);
  Vector U2(2*N, 0);

  hVector hA(N);
  hVector hB(N);
  hVector hC(N);

  srand48(seed);
  for (size_t i = 0; i < N; ++i)
  {
    hA[i] = drand48()*N;
    hB[i] = drand48()*N;
    hC[i] = drand48()*N;
  }

  Vector A = hA;
  Vector B = hB;
  Vector C = hC;

#ifndef _ALL
  { ComparisonSort<P,Vector> temp(p,A); benchmark(temp); }
  { ComparisonSortByKey<P,Vector> temp(p,A,B); benchmark(temp); }
#else
  thrust::identity<Ty> I;

  { AdjacentDifference<P,Vector> temp(p,A,B); benchmark(temp); } // adjacent_difference

  { LowerBound<P,Vector> temp(p,A,B,C); benchmark(temp); } // binary_search
  { UpperBound<P,Vector> temp(p,A,B,C); benchmark(temp); }
  { BinarySearch<P,Vector> temp(p,A,B,C); benchmark(temp); }

  { Copy<P,Vector> temp(p,A,B); benchmark(temp); } // copy
  { CopyN<P,Vector> temp(p,A,B); benchmark(temp); }
  { CopyIf<P,Vector> temp(p,A,D,B); benchmark(temp); }

  { Count<P,Vector> temp(p,D); benchmark(temp); } // count
  { CountIf<P,Vector> temp(p,D); benchmark(temp); }

  { Equal<P,Vector> temp(p,A,A); benchmark(temp); } // equal

  { MinElement<P,Vector> temp(p,A); benchmark(temp); } // extrema
  { MaxElement<P,Vector> temp(p,A); benchmark(temp); }
  { MinMaxElement<P,Vector> temp(p,A); benchmark(temp); }

  { Fill<P,Vector> temp(p,A); benchmark(temp); } // fill
  { FillN<P,Vector> temp(p,A); benchmark(temp); }

  { Find<P,Vector> temp(p,F,1); benchmark(temp); } // find
  { FindIf<P,Vector> temp(p,F); benchmark(temp); }
  { FindIfNot<P,Vector> temp(p,T); benchmark(temp); }

  { ForEach<P,Vector> temp(p,A); benchmark(temp); } // for_each

  { Gather<P,Vector> temp(p,S,A,B); benchmark(temp); } // gather
  { GatherIf<P,Vector> temp(p,S,D,A,B); benchmark(temp); }

  { Generate<P,Vector> temp(p,A); benchmark(temp); } // generate
  { GenerateN<P,Vector> temp(p,A); benchmark(temp); }

  { InnerProduct<P,Vector> temp(p,A,B); benchmark(temp); } // inner_product

  { AllOf<P,Vector> temp(p,T); benchmark(temp); } // logical
  { AnyOf<P,Vector> temp(p,F); benchmark(temp); }
  { NoneOf<P,Vector> temp(p,F); benchmark(temp); }

  { Merge<P,Vector> temp(p,A,B,U1); benchmark(temp); } // merge

  { Mismatch<P,Vector> temp(p,A,A); benchmark(temp); } // mismatch

  { Partition<P,Vector> temp(p,A); benchmark(temp); } // partition
  { PartitionCopy<P,Vector> temp(p,D,A,B); benchmark(temp); }
  { StablePartition<P,Vector> temp(p,A); benchmark(temp); }
  { StablePartitionCopy<P,Vector> temp(p,D,A,B); benchmark(temp); }
  { IsPartitioned<P,Vector> temp(p,T); benchmark(temp); }
  { PartitionPoint<P,Vector> temp(p,T); benchmark(temp); }

  { Reduce<P,Vector> temp(p,A); benchmark(temp); } // reduce
  { ReduceByKey<P,Vector> temp(p,D,A,B,C); benchmark(temp); }

  { Remove<P,Vector> temp(p,D,0); benchmark(temp); } // remove
  { RemoveCopy<P,Vector> temp(p,D,A,0); benchmark(temp); }
  { RemoveIf<P,Vector> temp(p,A,D); benchmark(temp); }
  { RemoveCopyIf<P,Vector> temp(p,A,D,B); benchmark(temp); }

  { Replace<P,Vector> temp(p,D,0,2); benchmark(temp); } // replace
  { ReplaceCopy<P,Vector> temp(p,D,A,0,2); benchmark(temp); }
  { ReplaceIf<P,Vector> temp(p,A,D,I,0); benchmark(temp); }
  { ReplaceCopyIf<P,Vector> temp(p,A,D,B,I,0); benchmark(temp); }

  { Reverse<P,Vector> temp(p,A); benchmark(temp); } // reverse
  { ReverseCopy<P,Vector> temp(p,A,B); benchmark(temp); }

  { InclusiveScan<P,Vector> temp(p,A,B); benchmark(temp); } // scan
  { ExclusiveScan<P,Vector> temp(p,A,B); benchmark(temp); }
  { InclusiveScanByKey<P,Vector> temp(p,D,A,B); benchmark(temp); }
  { ExclusiveScanByKey<P,Vector> temp(p,D,A,B); benchmark(temp); }

  { Scatter<P,Vector> temp(p,A,S,B); benchmark(temp); } // scatter
  { ScatterIf<P,Vector> temp(p,A,S,D,B); benchmark(temp); }

  { Sequence<P,Vector> temp(p,A); benchmark(temp); } // sequence

  { SetDifference<P,Vector> temp(p,A,B,U1); benchmark(temp); } // set_operations
  { SetIntersection<P,Vector> temp(p,A,B,U1); benchmark(temp); }
  { SetSymmetricDifference<P,Vector> temp(p,A,B,U1); benchmark(temp); }
  { SetUnion<P,Vector> temp(p,A,B,U1); benchmark(temp); }

  { Sort<P,Vector> temp(p,A); benchmark(temp); } // sort
  { SortByKey<P,Vector> temp(p,A,B); benchmark(temp); }
  { StableSort<P,Vector> temp(p,A); benchmark(temp); }
  { StableSortByKey<P,Vector> temp(p,A,B); benchmark(temp); }
  { ComparisonSort<P,Vector> temp(p,A); benchmark(temp); }
  { ComparisonSortByKey<P,Vector> temp(p,A,B); benchmark(temp); }
  { IsSorted<P,Vector> temp(p,S); benchmark(temp); }
  { IsSortedUntil<P,Vector> temp(p,S); benchmark(temp); }

  { SwapRanges<P,Vector> temp(p,A,B); benchmark(temp); } // swap

  { UnaryTransform<P,Vector> temp(p,A,B); benchmark(temp); } // transform
  { BinaryTransform<P,Vector> temp(p,A,B,C); benchmark(temp); }
  { UnaryTransformIf<P,Vector> temp(p,A,D,B); benchmark(temp); }
  { BinaryTransformIf<P,Vector> temp(p,A,B,D,C); benchmark(temp); }

  { TransformReduce<P,Vector> temp(p,A); benchmark(temp); } // transform_reduce

  { TransformInclusiveScan<P,Vector> temp(p,A,B); benchmark(temp); } // transform_scan
  { TransformExclusiveScan<P,Vector> temp(p,A,B); benchmark(temp); }

  { UninitializedCopy<P,Vector> temp(p,A,B); benchmark(temp); } // uninitialized_copy

  { UninitializedFill<P,Vector> temp(p,A); benchmark(temp); } // uninitialized_fill
  { UninitializedFillN<P,Vector> temp(p,A); benchmark(temp); }

  { Unique<P,Vector> temp(p,D); benchmark(temp); } // unique
  { UniqueCopy<P,Vector> temp(p,D,A); benchmark(temp); }
  { UniqueByKey<P,Vector> temp(p,D,A); benchmark(temp); }
  { UniqueByKeyCopy<P,Vector> temp(p,D,A,B,C); benchmark(temp); }

  { MergeByKey<P,Vector> temp(p,A,B,C,D,U1,U2); benchmark(temp); } // merge_by_key

  { SetDifferenceByKey<P,Vector> temp(p,A,B,C,D,U1,U2); benchmark(temp); } // set_operations by_key
  { SetIntersectionByKey<P,Vector> temp(p,A,B,C,U1,U2); benchmark(temp); }
  { SetSymmetricDifferenceByKey<P,Vector> temp(p,A,B,C,D,U1,U2); benchmark(temp); }
  { SetUnionByKey<P,Vector> temp(p,A,B,C,D,U1,U2); benchmark(temp); }

  { Tabulate<P,Vector> temp(p,A); benchmark(temp); } // tabulate
#endif

  // host<->device copy
}

int main(int argc, char **argv)
{
  size_t N = 16 << 20;

  if (argc > 2)
  {
    std::cerr << "usage: driver [datasize]" << std::endl;
    exit(-1);
  }
  else if (argc > 1)
  {
    N = atoi(argv[1]);
  }

  std::cerr << "N= " << N << std::endl;

  size_t seed = 12345;

#if THRUST_DEVICE_SYSTEM == THRUST_DEVICE_SYSTEM_CUDA_BULK
#define _CUDA cuda_bulk
#else
#define _CUDA cuda
#endif

#ifdef USE_CUDA_MALLOC
#define _PAR par
#else
  caching_device_allocator alloc;
#define _PAR par(alloc)
#endif

  {
    std::cout << "Ty = unsigned int" << std::endl;
    std::cout << "-----------------" << std::endl;
    typedef unsigned int Ty;
    doit<Ty>(thrust::_CUDA::_PAR, N, seed);
  }

  {
    std::cout << std::endl;
    std::cout << "Ty = unsigned long long" << std::endl;
    std::cout << "-----------------------" << std::endl;
    typedef unsigned long long Ty;
    doit<Ty>(thrust::_CUDA::_PAR, N, seed);
  }

  return 0;
}

// thrust-1.9.5/perf_test/random.h
/*
 *  Copyright 2008-2009 NVIDIA Corporation
 *
 *  Licensed under the Apache License, Version 2.0 (the "License");
 *  you may not use this file except in compliance with the License.
 *  You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 *  Unless required by applicable law or agreed to in writing, software
 *  distributed under the License is distributed on an "AS IS" BASIS,
 *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 *  See the License for the specific language governing permissions and
 *  limitations under the License.
 */

#pragma once

namespace testing
{

// range containing random integers
template <typename T>
class random_integers;

// range containing random real numbers in [0,1)
template <typename T>
class random_reals;

} // end namespace testing

#include "random.inl"

// thrust-1.9.5/perf_test/random.inl
/*
 *  Copyright 2008-2009 NVIDIA Corporation
 *
 *  Licensed under the Apache License, Version 2.0 (the "License");
 *  you may not use this file except in compliance with the License.
 *  You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 *  Unless required by applicable law or agreed to in writing, software
 *  distributed under the License is distributed on an "AS IS" BASIS,
 *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 *  See the License for the specific language governing permissions and
 *  limitations under the License.
 */

#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/functional.h>
#include <thrust/detail/type_traits.h>

namespace testing
{
namespace detail
{

// Integer hash functions
template <typename IndexType, typename T>
struct random_integer_functor : public thrust::unary_function<IndexType,T>
{
  size_t seed;

  random_integer_functor(const size_t seed)
    : seed(seed) {}

  // source: http://www.concentric.net/~ttwang/tech/inthash.htm
  __host__ __device__
  T hash(const IndexType i, thrust::detail::false_type) const
  {
    unsigned int h = (unsigned int) i ^ (unsigned int) seed;
    h = ~h + (h << 15);
    h =  h ^ (h >> 12);
    h =  h + (h << 2);
    h =  h ^ (h >> 4);
    h =  h + (h << 3) + (h << 11);
    h =  h ^ (h >> 16);
    return T(h);
  }

  __host__ __device__
  T hash(const IndexType i, thrust::detail::true_type) const
  {
    unsigned long long h = (unsigned long long) i ^ (unsigned long long) seed;
    h = ~h + (h << 21);
    h =  h ^ (h >> 24);
    h = (h + (h << 3)) + (h << 8);
    h =  h ^ (h >> 14);
    h = (h + (h << 2)) + (h << 4);
    h =  h ^ (h >> 28);
    h =  h + (h << 31);
    return T(h);
  }

  __host__ __device__
  T operator()(const IndexType i) const
  {
    // use the 64-bit hash when either the index or the result type is 64 bits wide
    return hash(i, typename thrust::detail::integral_constant<bool, sizeof(IndexType) == 8 || sizeof(T) == 8>::type());
  }
};

template <typename UnsignedInteger, typename Real>
struct integer_to_real : public thrust::unary_function<UnsignedInteger,Real>
{
  __host__ __device__
  Real operator()(const UnsignedInteger i) const
  {
    // 2^(8 * sizeof(UnsignedInteger)), built from two half-width shifts to avoid overflow
    const Real integer_bound = Real(UnsignedInteger(1) << (4 * sizeof(UnsignedInteger))) *
                               Real(UnsignedInteger(1) << (4 * sizeof(UnsignedInteger)));

    return Real(i) / integer_bound;
  }
};

template <typename T>
struct random_integer_iterator
{
public:
  typedef ptrdiff_t                                             IndexType;
  typedef thrust::counting_iterator<IndexType>                  CountingIterator;
  typedef random_integer_functor<IndexType,T>                   Functor;
  typedef thrust::transform_iterator<Functor, CountingIterator> TransformIterator;

  typedef TransformIterator type;

  static type make(const size_t seed)
  {
    return type(CountingIterator(0), Functor(seed));
  }
};

template <typename T>
struct random_real_iterator {};

template <>
struct random_real_iterator<float>
{
  typedef random_integer_iterator<unsigned int>::type         RandomIterator;
  typedef integer_to_real<unsigned int, float>                Functor;
  typedef thrust::transform_iterator<Functor, RandomIterator> TransformIterator;

  typedef TransformIterator type;

  static
  type make(const size_t seed)
  {
    return type(random_integer_iterator<unsigned int>::make(seed), Functor());
  }
};

template <>
struct random_real_iterator<double>
{
  typedef random_integer_iterator<unsigned long long>::type   RandomIterator;
  typedef integer_to_real<unsigned long long, double>         Functor;
  typedef thrust::transform_iterator<Functor, RandomIterator> TransformIterator;

  typedef TransformIterator type;

  static type make(const size_t seed)
  {
    return type(random_integer_iterator<unsigned long long>::make(seed), Functor());
  }
};

} // end namespace detail

/////////////////////
// Implicit Ranges //
/////////////////////

template <typename T>
class random_integers
{
  typedef typename detail::random_integer_iterator<T>::type    iterator;
  typedef typename thrust::iterator_difference<iterator>::type difference_type;
  typedef T                                                    value_type;

protected:
  iterator m_begin;
  iterator m_end;

public:
  random_integers(const size_t n, const size_t seed = 0)
    : m_begin(testing::detail::random_integer_iterator<T>::make(seed)),
      m_end  (testing::detail::random_integer_iterator<T>::make(seed) + n)
  {}

  iterator begin(void) const { return m_begin; }
  iterator end  (void) const { return m_end; }
  difference_type size(void) const { return m_end - m_begin; }
};

//template <typename T>
//class random_reals : public cusp::array1d_view<typename detail::random_real_iterator<T>::type>
//{
//  protected:
//    typedef typename detail::random_real_iterator<T>::type Iterator;
//    typedef typename cusp::array1d_view<Iterator> Parent;
//
//  public:
//    random_reals(const size_t n, const size_t seed = 0)
//      : Parent(detail::random_real_iterator<T>::make(seed),
//               detail::random_real_iterator<T>::make(seed) + n)
//    {}
//};

} // end namespace testing

// thrust-1.9.5/perf_test/reduce.h
#include <thrust/reduce.h>
#include <thrust/functional.h>

template <class Policy,
          typename Container,
          typename T = typename Container::value_type,
          typename BinaryFunction = thrust::plus<T> >
struct Reduce
{
  Policy policy;
  Container A;
  T init;
  BinaryFunction binary_op;

  template <typename Range>
  Reduce(Policy policy_, const Range& X, T init = T(0), BinaryFunction binary_op = BinaryFunction())
    : policy(policy_), A(X.begin(), X.end()), init(init), binary_op(binary_op)
  {}

  void operator()(void)
  {
    thrust::reduce(policy, A.begin(), A.end(), init, binary_op);
  }
};

template <class Policy,
          typename Container1,
          typename Container2 = Container1,
          typename Container3 = Container1,
          typename Container4 = Container2,
          typename BinaryPredicate = thrust::equal_to<typename Container1::value_type>,
          typename
          BinaryFunction = thrust::plus<typename Container2::value_type> >
struct ReduceByKey
{
  Policy policy;
  Container1 A;
  Container2 B;
  Container3 C;
  Container4 D;
  BinaryPredicate binary_pred;
  BinaryFunction binary_op;

  template <typename Range1, typename Range2, typename Range3, typename Range4>
  ReduceByKey(Policy policy_, const Range1& X, const Range2& Y, const Range3& Z, const Range4& W,
              BinaryPredicate binary_pred = BinaryPredicate(),
              BinaryFunction binary_op = BinaryFunction())
    : policy(policy_), A(X.begin(), X.end()), B(Y.begin(), Y.end()), C(Z.begin(), Z.end()),
      D(W.begin(), W.end()), binary_pred(binary_pred), binary_op(binary_op)
  {}

  void operator()(void)
  {
    thrust::reduce_by_key(policy, A.begin(), A.end(), B.begin(), C.begin(), D.begin(), binary_pred, binary_op);
  }
};

// thrust-1.9.5/perf_test/remove.h
#include <thrust/remove.h>
#include <thrust/copy.h>
#include <thrust/functional.h>

template <class Policy,
          typename Container,
          typename T = typename Container::value_type>
struct Remove
{
  Container A;
  Container B; // copy of initial data
  T value;
  Policy policy;

  template <typename Range>
  Remove(Policy p_, const Range& X, T value)
    : A(X.begin(), X.end()), B(X.begin(), X.end()), value(value), policy(p_)
  {}

  void operator()(void)
  {
    thrust::remove(policy, A.begin(), A.end(), value);
  }

  void reset(void)
  {
    // restore initial data
    thrust::copy(policy, B.begin(), B.end(), A.begin());
  }
};

template <class Policy,
          typename Container1,
          typename Container2 = Container1,
          typename T = typename Container1::value_type>
struct RemoveCopy
{
  Container1 A; // input
  Container2 B; // output
  T value;
  Policy policy;

  template <typename Range1, typename Range2>
  RemoveCopy(Policy p_, const Range1& X, const Range2& Y, T value)
    : A(X.begin(), X.end()), B(Y.begin(), Y.end()), value(value), policy(p_)
  {}

  void operator()(void)
  {
    thrust::remove_copy(policy, A.begin(), A.end(), B.begin(), value);
  }

  // remove_copy does not modify the input A, so no reset() is needed
  // (copying the output B back into A would change the input between runs)
};

template <class Policy,
          typename Container1,
          typename Container2 = Container1,
          typename Predicate = thrust::identity<typename Container2::value_type> >
struct RemoveIf
{
  Container1 A, A_copy;
  Container2 B; // stencil
  Predicate pred;
  Policy policy;

  template <typename Range1, typename Range2>
  RemoveIf(Policy p_, const Range1& X, const Range2& Y, Predicate pred = Predicate())
    : A(X.begin(), X.end()), A_copy(X.begin(), X.end()), B(Y.begin(), Y.end()), pred(pred), policy(p_)
  {}

  void operator()(void)
  {
    thrust::remove_if(policy, A.begin(), A.end(), B.begin(),
                      pred);
  }

  void reset(void)
  {
    // restore initial data
    thrust::copy(policy, A_copy.begin(), A_copy.end(), A.begin());
  }
};

template <class Policy,
          typename Container1,
          typename Container2 = Container1,
          typename Container3 = Container1,
          typename Predicate = thrust::identity<typename Container2::value_type> >
struct RemoveCopyIf
{
  Container1 A, A_copy;
  Container2 B;
  Container3 C;
  Predicate pred;
  Policy policy;

  template <typename Range1, typename Range2, typename Range3>
  RemoveCopyIf(Policy p_, const Range1& X, const Range2& Y, const Range3& Z, Predicate pred = Predicate())
    : A(X.begin(), X.end()), A_copy(X.begin(), X.end()), B(Y.begin(), Y.end()), C(Z.begin(), Z.end()),
      pred(pred), policy(p_)
  {}

  void operator()(void)
  {
    thrust::remove_copy_if(policy, A.begin(), A.end(), B.begin(), C.begin(), pred);
  }

  void reset(void)
  {
    // restore initial data
    thrust::copy(policy, A_copy.begin(), A_copy.end(), A.begin());
  }
};

// thrust-1.9.5/perf_test/replace.h
#include <thrust/replace.h>
#include <thrust/copy.h>
#include <thrust/functional.h>

template <class Policy,
          typename Container,
          typename T = typename Container::value_type>
struct Replace
{
  Container A, A_copy;
  T old_value, new_value;
  Policy policy;

  template <typename Range>
  Replace(Policy p_, const Range& X, const T& old_value, const T& new_value)
    : A(X.begin(), X.end()), A_copy(X.begin(), X.end()),
      old_value(old_value), new_value(new_value), policy(p_)
  {}

  void operator()(void)
  {
    thrust::replace(policy, A.begin(), A.end(), old_value, new_value);
  }

  void reset(void)
  {
    // restore initial data
    thrust::copy(policy, A_copy.begin(), A_copy.end(), A.begin());
  }
};

template <class Policy,
          typename Container1,
          typename Container2 = Container1,
          typename Predicate = thrust::identity<typename Container2::value_type>,
          typename T = typename Container1::value_type>
struct ReplaceIf
{
  Container1 A, A_copy;
  Container2 B; // stencil
  Predicate pred;
  T new_value;
  Policy policy;

  template <typename Range1, typename Range2>
  ReplaceIf(Policy p_, const Range1& X, const Range2& Y, Predicate pred, const T& new_value)
    : A(X.begin(), X.end()), A_copy(X.begin(), X.end()), B(Y.begin(), Y.end()),
      pred(pred), new_value(new_value), policy(p_)
  {}

  void operator()(void)
  {
    thrust::replace_if(policy, A.begin(), A.end(), B.begin(), pred, new_value);
  }

  void reset(void)
  {
    // restore initial data
    thrust::copy(policy, A_copy.begin(), A_copy.end(), A.begin());
  }
};

template <class Policy,
          typename Container1,
          typename Container2 = Container1,
          typename T = typename Container1::value_type>
struct ReplaceCopy
{
  Container1 A;
  Container2 B;
  T old_value, new_value;
  Policy policy;

  template <typename Range1, typename Range2>
  ReplaceCopy(Policy p_, const Range1& X, const Range2& Y, const T& old_value, const T& new_value)
    : A(X.begin(), X.end()), B(Y.begin(), Y.end()),
      old_value(old_value), new_value(new_value), policy(p_)
  {}

  void operator()(void)
  {
    thrust::replace_copy(policy, A.begin(), A.end(), B.begin(), old_value, new_value);
  }
};

template <class Policy,
          typename Container1,
          typename Container2 = Container1,
          typename Container3 = Container1,
          typename Predicate = thrust::identity<typename Container2::value_type>,
          typename T = typename Container1::value_type>
struct ReplaceCopyIf
{
  Container1 A, A_copy; // input
  Container2 B;         // stencil
  Container3 C;         // output
  Predicate pred;
  T new_value;
  Policy policy;

  template <typename Range1, typename Range2, typename Range3>
  ReplaceCopyIf(Policy p_, const Range1& X, const Range2& Y, const Range3& Z,
                Predicate pred, const T& new_value)
    : A(X.begin(), X.end()), A_copy(X.begin(), X.end()), B(Y.begin(), Y.end()), C(Z.begin(), Z.end()),
      pred(pred), new_value(new_value), policy(p_)
  {}

  void operator()(void)
  {
    thrust::replace_copy_if(policy, A.begin(), A.end(), B.begin(), C.begin(), pred, new_value);
  }
};

// thrust-1.9.5/perf_test/reverse.h
#include <thrust/reverse.h>
#include <thrust/copy.h>

template <class Policy,
          typename Container>
struct Reverse
{
  Container A, A_copy;
  Policy policy;

  template <typename Range>
  Reverse(Policy p_, const Range& X)
    : A(X.begin(), X.end()), A_copy(X.begin(), X.end()), policy(p_)
  {}

  void operator()(void)
  {
    thrust::reverse(policy, A.begin(), A.end());
  }

  void reset(void)
  {
    // restore initial data
    thrust::copy(policy, A_copy.begin(), A_copy.end(), A.begin());
  }
};

template <class Policy,
          typename Container1,
          typename Container2 = Container1>
struct ReverseCopy
{
  Container1 A;
  Container2 B;
  Policy policy;

  template <typename Range1, typename Range2>
  ReverseCopy(Policy p_, const Range1& X, const Range2& Y)
    : A(X.begin(), X.end()), B(Y.begin(), Y.end()), policy(p_)
  {}

  void operator()(void)
  {
    thrust::reverse_copy(policy, A.begin(), A.end(), B.begin());
  }
};

// thrust-1.9.5/perf_test/scan.h
#include <thrust/scan.h>
#include <thrust/functional.h>

template <class Policy,
          typename Container1,
          typename Container2 = Container1,
          typename BinaryFunction = thrust::plus<typename Container1::value_type> >
struct InclusiveScan
{
  Container1 A;
  Container2 B;
  BinaryFunction binary_op;
  Policy policy;

  template <typename Range1, typename Range2>
  InclusiveScan(Policy p_, const Range1& X, const Range2& Y, BinaryFunction binary_op =
                BinaryFunction())
    : A(X.begin(), X.end()), B(Y.begin(), Y.end()), binary_op(binary_op), policy(p_)
  {}

  void operator()(void)
  {
    thrust::inclusive_scan(policy, A.begin(), A.end(), B.begin(), binary_op);
  }
};

template <class Policy,
          typename Container1,
          typename Container2 = Container1,
          typename T = typename Container1::value_type,
          typename BinaryFunction = thrust::plus<T> >
struct ExclusiveScan
{
  Container1 A;
  Container2 B;
  T init;
  BinaryFunction binary_op;
  Policy policy;

  template <typename Range1, typename Range2>
  ExclusiveScan(Policy p_, const Range1& X, const Range2& Y, T init = T(0),
                BinaryFunction binary_op = BinaryFunction())
    : A(X.begin(), X.end()), B(Y.begin(), Y.end()), init(init), binary_op(binary_op), policy(p_)
  {}

  void operator()(void)
  {
    thrust::exclusive_scan(policy, A.begin(), A.end(), B.begin(), init, binary_op);
  }
};

template <class Policy,
          typename Container1,
          typename Container2 = Container1,
          typename Container3 = Container2,
          typename BinaryPredicate = thrust::equal_to<typename Container1::value_type>,
          typename BinaryFunction = thrust::plus<typename Container2::value_type> >
struct InclusiveScanByKey
{
  Container1 A; // keys
  Container2 B; // values
  Container3 C; // output
  BinaryPredicate binary_pred;
  BinaryFunction binary_op;
  Policy policy;

  template <typename Range1, typename Range2, typename Range3>
  InclusiveScanByKey(Policy p_, const Range1& X, const Range2& Y, const Range3& Z,
                     BinaryPredicate binary_pred = BinaryPredicate(),
                     BinaryFunction binary_op = BinaryFunction())
    : A(X.begin(), X.end()), B(Y.begin(), Y.end()), C(Z.begin(), Z.end()),
      binary_pred(binary_pred), binary_op(binary_op), policy(p_)
  {}

  void operator()(void)
  {
    thrust::inclusive_scan_by_key(policy, A.begin(), A.end(), B.begin(), C.begin(), binary_pred, binary_op);
  }
};

template <class Policy,
          typename Container1,
          typename Container2 = Container1,
          typename Container3 = Container2,
          typename T = typename Container2::value_type,
          typename BinaryPredicate = thrust::equal_to<typename Container1::value_type>,
          typename BinaryFunction = thrust::plus<T> >
struct ExclusiveScanByKey
{
  Container1 A; // keys
  Container2 B; // values
  Container3 C; // output
  T init;
  BinaryPredicate binary_pred;
  BinaryFunction binary_op;
  Policy policy;

  template <typename Range1, typename Range2, typename Range3>
  ExclusiveScanByKey(Policy p_, const Range1& X, const Range2& Y, const Range3& Z, T init = T(0),
                     BinaryPredicate binary_pred = BinaryPredicate(),
                     BinaryFunction binary_op = BinaryFunction())
    : A(X.begin(), X.end()), B(Y.begin(), Y.end()), C(Z.begin(), Z.end()), init(init),
      binary_pred(binary_pred), binary_op(binary_op), policy(p_)
  {}

  void operator()(void)
  {
    thrust::exclusive_scan_by_key(policy, A.begin(), A.end(), B.begin(), C.begin(), init, binary_pred, binary_op);
  }
};
// thrust-1.9.5/perf_test/scatter.h
#include <thrust/scatter.h>
#include <thrust/functional.h>

template <class Policy,
          typename Container1,
          typename Container2 = Container1,
          typename Container3 = Container1>
struct Scatter
{
  Container1 A; // source
  Container2 B; // map
  Container3 C; // output
  Policy policy;

  template <typename Range1, typename Range2, typename Range3>
  Scatter(Policy p_, const Range1& X, const Range2& Y, const Range3& Z)
    : A(X.begin(), X.end()), B(Y.begin(), Y.end()), C(Z.begin(), Z.end()), policy(p_)
  {}

  void operator()(void)
  {
    thrust::scatter(policy, A.begin(), A.end(), B.begin(), C.begin());
  }
};

template <class Policy,
          typename Container1,
          typename Container2 = Container1,
          typename Container3 = Container1,
          typename Container4 = Container1,
          typename Predicate = thrust::identity<typename Container3::value_type> >
struct ScatterIf
{
  Container1 A; // source
  Container2 B; // map
  Container3 C; // stencil
  Container4 D; // output
  Predicate pred;
  Policy policy;

  template <typename Range1, typename Range2, typename Range3, typename Range4>
  ScatterIf(Policy p_, const Range1& X, const Range2& Y, const Range3& Z, const Range4& W,
            Predicate pred = Predicate())
    : A(X.begin(), X.end()), B(Y.begin(), Y.end()), C(Z.begin(), Z.end()), D(W.begin(), W.end()),
      pred(pred), policy(p_)
  {}

  void operator()(void)
  {
    thrust::scatter_if(policy, A.begin(), A.end(), B.begin(), C.begin(), D.begin(), pred);
  }
};

// thrust-1.9.5/perf_test/sequence.h
#include <thrust/sequence.h>

template <class Policy,
          typename Container>
struct Sequence
{
  Container A;
  Policy policy;

  template <typename Range>
  Sequence(Policy p_, const Range& X)
    : A(X.begin(), X.end()), policy(p_)
  {}

  void operator()(void)
  {
    thrust::sequence(policy, A.begin(), A.end());
  }
};

// thrust-1.9.5/perf_test/set_operations.h
#include <thrust/set_operations.h>
#include <thrust/sort.h>
#include <thrust/functional.h>
#include <cstdio>

template <class Policy,
          typename Container1,
          typename Container2 = Container1,
          typename Container3 = Container1,
          typename StrictWeakCompare = thrust::less<typename Container1::value_type> >
struct SetDifference
{
  Container1 A;
  Container2 B;
  Container3 C;
  StrictWeakCompare comp;
  Policy policy;

  template <typename Range1, typename Range2, typename Range3>
  SetDifference(Policy p_, const Range1& X, const Range2& Y, const Range3& Z,
                StrictWeakCompare comp = StrictWeakCompare())
    : A(X.begin(), X.end()), B(Y.begin(), Y.end()), C(Z.begin(), Z.end()), comp(comp), policy(p_)
  {
    // set operations require both inputs to be sorted
    thrust::stable_sort(policy, A.begin(), A.end(), comp);
    thrust::stable_sort(policy, B.begin(), B.end(), comp);
  }

  void operator()(void)
  {
    size_t size =
      thrust::set_difference(policy, A.begin(), A.end(), B.begin(), B.end(), C.begin(), comp) - C.begin();

#ifdef _PRINT
    static bool print = true;
#else
    static bool print = false;
#endif
    if (print)
    {
      printf("diff= %d\n", (int)size);
      print = false;
    }
  }
};

template <class Policy,
          typename Container1,
          typename Container2 = Container1,
          typename Container3 = Container1,
          typename StrictWeakCompare = thrust::less<typename Container1::value_type> >
struct SetIntersection
{
  Container1 A;
  Container2 B;
  Container3 C;
  StrictWeakCompare comp;
  Policy policy;

  template <typename Range1, typename Range2, typename Range3>
  SetIntersection(Policy p_, const Range1& X, const Range2& Y, const Range3& Z,
                  StrictWeakCompare comp = StrictWeakCompare())
    : A(X.begin(), X.end()), B(Y.begin(), Y.end()), C(Z.begin(), Z.end()), comp(comp), policy(p_)
  {
    thrust::stable_sort(policy, A.begin(), A.end(), comp);
    thrust::stable_sort(policy, B.begin(), B.end(), comp);
  }

  void operator()(void)
  {
    size_t size =
      thrust::set_intersection(policy, A.begin(), A.end(), B.begin(), B.end(), C.begin(), comp) - C.begin();

#ifdef _PRINT
    static bool print = true;
#else
    static bool print = false;
#endif
    if (print)
    {
      printf("inter= %d\n", (int)size);
      print = false;
    }
  }
};

template <class Policy,
          typename Container1,
          typename Container2 = Container1,
          typename Container3 = Container1,
          typename StrictWeakCompare = thrust::less<typename Container1::value_type> >
struct SetSymmetricDifference
{
  Container1 A;
  Container2 B;
  Container3 C;
  StrictWeakCompare comp;
  Policy policy;

  template <typename Range1, typename Range2, typename Range3>
  SetSymmetricDifference(Policy p_, const Range1& X, const Range2& Y, const Range3& Z,
                         StrictWeakCompare comp = StrictWeakCompare())
    : A(X.begin(), X.end()), B(Y.begin(), Y.end()), C(Z.begin(), Z.end()), comp(comp), policy(p_)
  {
    thrust::stable_sort(policy, A.begin(), A.end(), comp);
    thrust::stable_sort(policy, B.begin(), B.end(), comp);
  }

  void operator()(void)
  {
    size_t size =
      thrust::set_symmetric_difference(policy, A.begin(), A.end(), B.begin(), B.end(), C.begin(), comp) - C.begin();

#ifdef _PRINT
    static bool print = true;
#else
    static bool print = false;
#endif
    if (print)
    {
      printf("sym_dif= %d\n", (int)size);
      print = false;
    }
  }
};

template <class Policy,
          typename Container1,
          typename Container2 = Container1,
          typename Container3 = Container1,
          typename StrictWeakCompare = thrust::less<typename Container1::value_type> >
struct SetUnion
{
  Container1 A;
  Container2 B;
  Container3 C;
  StrictWeakCompare comp;
  Policy policy;

  template <typename Range1, typename Range2, typename Range3>
  SetUnion(Policy p_, const Range1& X, const Range2& Y, const Range3& Z, StrictWeakCompare comp =
StrictWeakCompare()) : A(X.begin(), X.end()), B(Y.begin(), Y.end()), C(Z.begin(), Z.end()), comp(comp), policy(p_) { thrust::stable_sort(policy, A.begin(), A.end(), comp); thrust::stable_sort(policy, B.begin(), B.end(), comp); } void operator()(void) { size_t size = thrust::set_union(policy, A.begin(), A.end(), B.begin(), B.end(), C.begin(), comp) - C.begin(); #ifdef _PRINT static bool print = true; #else static bool print = false; #endif if (print) { printf("union= %d\n", (int)size); print = false; } } }; thrust-1.9.5/perf_test/set_operations_by_key.h000066400000000000000000000161251344621116200216300ustar00rootroot00000000000000#include #include #include #if THRUST_VERSION > 100700 template > struct SetDifferenceByKey { Container1 keys1; Container2 keys2; Container3 values1; Container4 values2; Container5 out_keys; Container6 out_values; StrictWeakCompare comp; Policy policy; template SetDifferenceByKey(Policy p_, const Range1& keys1_, const Range2& keys2_, const Range3& values1_, const Range4& values2_, Range5 &out_keys_, Range6 &out_values_, StrictWeakCompare comp_ = StrictWeakCompare()) : keys1(keys1_.begin(), keys1_.end()), keys2(keys2_.begin(), keys2_.end()), values1(values1_.begin(), values1_.end()), values2(values2_.begin(), values2_.end()), out_keys(out_keys_.begin(), out_keys_.end()), out_values(out_values_.begin(), out_values_.end()), comp(comp_), policy(p_) { thrust::stable_sort(policy, keys1.begin(), keys1.end(), comp); thrust::stable_sort(policy, keys2.begin(), keys2.end(), comp); } void operator()(void) { thrust::set_difference_by_key(policy, keys1.begin(), keys1.end(), keys2.begin(), keys2.end(), values1.begin(), values2.begin(), out_keys.begin(), out_values.begin(), comp); } }; template > struct SetIntersectionByKey { Container1 keys1; Container2 keys2; Container3 values; Container4 out_keys; Container5 out_values; StrictWeakCompare comp; Policy policy; template SetIntersectionByKey(Policy p_, const Range1& keys1_, const Range2& keys2_, const 
Range3& values_, Range4 &out_keys_, Range5 &out_values_, StrictWeakCompare comp_ = StrictWeakCompare()) : keys1(keys1_.begin(), keys1_.end()), keys2(keys2_.begin(), keys2_.end()), values(values_.begin(), values_.end()), out_keys(out_keys_.begin(), out_keys_.end()), out_values(out_values_.begin(), out_values_.end()), comp(comp_), policy(p_) { thrust::stable_sort(policy, keys1.begin(), keys1.end(), comp); thrust::stable_sort(policy, keys2.begin(), keys2.end(), comp); } void operator()(void) { thrust::set_intersection_by_key(policy, keys1.begin(), keys1.end(), keys2.begin(), keys2.end(), values.begin(), out_keys.begin(), out_values.begin(), comp); } }; template > struct SetUnionByKey { Container1 keys1; Container2 keys2; Container3 values1; Container4 values2; Container5 out_keys; Container6 out_values; StrictWeakCompare comp; Policy policy; template SetUnionByKey(Policy p_, const Range1& keys1_, const Range2& keys2_, const Range3& values1_, const Range4& values2_, Range5 &out_keys_, Range6 &out_values_, StrictWeakCompare comp_ = StrictWeakCompare()) : keys1(keys1_.begin(), keys1_.end()), keys2(keys2_.begin(), keys2_.end()), values1(values1_.begin(), values1_.end()), values2(values2_.begin(), values2_.end()), out_keys(out_keys_.begin(), out_keys_.end()), out_values(out_values_.begin(), out_values_.end()), comp(comp_), policy(p_) { thrust::stable_sort(policy, keys1.begin(), keys1.end(), comp); thrust::stable_sort(policy, keys2.begin(), keys2.end(), comp); } void operator()(void) { thrust::set_union_by_key(policy, keys1.begin(), keys1.end(), keys2.begin(), keys2.end(), values1.begin(), values2.begin(), out_keys.begin(), out_values.begin(), comp); } }; template > struct SetSymmetricDifferenceByKey { Container1 keys1; Container2 keys2; Container3 values1; Container4 values2; Container5 out_keys; Container6 out_values; StrictWeakCompare comp; Policy policy; template SetSymmetricDifferenceByKey(Policy p_, const Range1& keys1_, const Range2& keys2_, const Range3& values1_, 
const Range4& values2_, Range5 &out_keys_, Range6 &out_values_, StrictWeakCompare comp_ = StrictWeakCompare()) : keys1(keys1_.begin(), keys1_.end()), keys2(keys2_.begin(), keys2_.end()), values1(values1_.begin(), values1_.end()), values2(values2_.begin(), values2_.end()), out_keys(out_keys_.begin(), out_keys_.end()), out_values(out_values_.begin(), out_values_.end()), comp(comp_), policy(p_) { thrust::stable_sort(policy, keys1.begin(), keys1.end(), comp); thrust::stable_sort(policy, keys2.begin(), keys2.end(), comp); } void operator()(void) { thrust::set_symmetric_difference_by_key(policy, keys1.begin(), keys1.end(), keys2.begin(), keys2.end(), values1.begin(), values2.begin(), out_keys.begin(), out_values.begin(), comp); } }; #endif // THRUST_VERSION thrust-1.9.5/perf_test/sort.h000066400000000000000000000117301344621116200162140ustar00rootroot00000000000000#include template > struct Sort { Container A, A_copy; StrictWeakOrdering comp; Policy policy; template Sort(Policy p_, const Range& X, StrictWeakOrdering comp = StrictWeakOrdering()) : A(X.begin(), X.end()), A_copy(X.begin(), X.end()), comp(comp), policy(p_) {} void operator()(void) { thrust::sort(policy, A.begin(), A.end(), comp); } void reset(void) { thrust::copy(policy, A_copy.begin(), A_copy.end(), A.begin()); } }; template struct MyCompare : private thrust::less { inline __host__ __device__ bool operator()(const T& x, const T &y) const { return thrust::less::operator()(x,y); } }; template struct ComparisonSort : Sort > { typedef Sort > super_t; template ComparisonSort(Policy p_, const Range& X) : super_t(p_, X) {} }; template > struct StableSort { Container A, A_copy; StrictWeakOrdering comp; Policy policy; template StableSort(Policy p_, const Range& X, StrictWeakOrdering comp = StrictWeakOrdering()) : A(X.begin(), X.end()), A_copy(X.begin(), X.end()), comp(comp), policy(p_) {} void operator()(void) { thrust::stable_sort(policy, A.begin(), A.end(), comp); } void reset(void) { thrust::copy(policy, 
A_copy.begin(), A_copy.end(), A.begin()); } }; template > struct SortByKey { Container1 A, A_copy; // keys Container2 B, B_copy; // values StrictWeakOrdering comp; Policy policy; template SortByKey(Policy p_, const Range1& X, const Range2& Y, StrictWeakOrdering comp = StrictWeakOrdering()) : A(X.begin(), X.end()), A_copy(X.begin(), X.end()), B(Y.begin(), Y.end()), B_copy(Y.begin(), Y.end()), comp(comp), policy(p_) {} void operator()(void) { thrust::sort_by_key(policy, A.begin(), A.end(), B.begin(), comp); } void reset(void) { thrust::copy(policy, A_copy.begin(), A_copy.end(), A.begin()); thrust::copy(policy, B_copy.begin(), B_copy.end(), B.begin()); } }; template struct ComparisonSortByKey : SortByKey > { typedef SortByKey > super_t; template ComparisonSortByKey(Policy p_, const Range1& X, const Range2& Y) : super_t(p_, X,Y) {} }; template > struct StableSortByKey { Container1 A, A_copy; // keys Container2 B, B_copy; // values StrictWeakOrdering comp; Policy policy; template StableSortByKey(Policy p_, const Range1& X, const Range2& Y, StrictWeakOrdering comp = StrictWeakOrdering()) : A(X.begin(), X.end()), A_copy(X.begin(), X.end()), B(Y.begin(), Y.end()), B_copy(Y.begin(), Y.end()), comp(comp), policy(p_) {} void operator()(void) { thrust::stable_sort_by_key(policy, A.begin(), A.end(), B.begin(), comp); } void reset(void) { thrust::copy(policy, A_copy.begin(), A_copy.end(), A.begin()); thrust::copy(policy, B_copy.begin(), B_copy.end(), B.begin()); } }; template > struct IsSorted { Container A; StrictWeakOrdering comp; Policy policy; template IsSorted(Policy p_, const Range& X, StrictWeakOrdering comp = StrictWeakOrdering()) : A(X.begin(), X.end()), comp(comp), policy(p_) {} void operator()(void) { thrust::is_sorted(policy, A.begin(), A.end(), comp); } }; template > struct IsSortedUntil { Container A; StrictWeakOrdering comp; Policy policy; template IsSortedUntil(Policy p_, const Range& X, StrictWeakOrdering comp = StrictWeakOrdering()) : A(X.begin(), X.end()), comp(comp), 
policy(p_) {} void operator()(void) { thrust::is_sorted_until(policy, A.begin(), A.end(), comp); } }; thrust-1.9.5/perf_test/swap.h000066400000000000000000000007411344621116200161770ustar00rootroot00000000000000#include template struct SwapRanges { Container1 A; Container2 B; Policy policy; template SwapRanges(Policy p_, const Range1& X, const Range2& Y) : A(X.begin(), X.end()), B(Y.begin(), Y.end()), policy(p_) {} void operator()(void) { thrust::swap_ranges(policy, A.begin(), A.end(), B.begin()); } }; thrust-1.9.5/perf_test/tabulate.h000066400000000000000000000010721344621116200170240ustar00rootroot00000000000000#include #include template > struct Tabulate { Container A; UnaryFunction unary_op; Policy policy; template Tabulate(Policy p_, const Range& X, UnaryFunction unary_op = UnaryFunction()) : A(X.begin(), X.end()), unary_op(unary_op), policy(p_) {} void operator()(void) { thrust::tabulate(policy, A.begin(), A.end(), unary_op); } }; thrust-1.9.5/perf_test/tbb_timer.h000066400000000000000000000004321344621116200171710ustar00rootroot00000000000000#pragma once #include struct tbb_timer { tbb::tick_count start; tbb_timer() { restart(); } void restart() { start = tbb::tick_count::now(); } double elapsed_seconds() { return (tbb::tick_count::now() - start).seconds(); } }; thrust-1.9.5/perf_test/transform.h000066400000000000000000000071501344621116200172410ustar00rootroot00000000000000#include template > struct UnaryTransform { Container1 A; Container2 B; UnaryFunction unary_op; Policy policy; template UnaryTransform(Policy p_, const Range1& X, const Range2& Y, UnaryFunction unary_op = UnaryFunction()) : A(X.begin(), X.end()), B(Y.begin(), Y.end()), unary_op(unary_op), policy(p_) {} void operator()(void) { thrust::transform(policy, A.begin(), A.end(), B.begin(), unary_op); } }; template , typename UnaryFunction = thrust::negate > struct UnaryTransformIf { Container1 A; // input Container2 B; // stencil Container3 C; // output Predicate pred; UnaryFunction unary_op; 
Policy policy; template UnaryTransformIf(Policy p_, const Range1& X, const Range2& Y, const Range3& Z, Predicate pred = Predicate(), UnaryFunction unary_op = UnaryFunction()) : A(X.begin(), X.end()), B(Y.begin(), Y.end()), C(Z.begin(), Z.end()), pred(pred), unary_op(unary_op), policy(p_) {} void operator()(void) { thrust::transform_if(policy, A.begin(), A.end(), B.begin(), C.begin(), unary_op, pred); } }; template > struct BinaryTransform { Container1 A; Container2 B; Container3 C; BinaryFunction binary_op; Policy policy; template BinaryTransform(Policy p_, const Range1& X, const Range2& Y, const Range3& Z, BinaryFunction binary_op = BinaryFunction()) : A(X.begin(), X.end()), B(Y.begin(), Y.end()), C(Z.begin(), Z.end()), binary_op(binary_op), policy(p_) {} void operator()(void) { thrust::transform(policy, A.begin(), A.end(), B.begin(), C.begin(), binary_op); } }; template , typename BinaryFunction = thrust::plus > struct BinaryTransformIf { Container1 A; // input Container2 B; // input Container3 C; // stencil Container4 D; // output Predicate pred; BinaryFunction binary_op; Policy policy; template BinaryTransformIf(Policy p_, const Range1& X, const Range2& Y, const Range3& Z, const Range4& W, Predicate pred = Predicate(), BinaryFunction binary_op = BinaryFunction()) : A(X.begin(), X.end()), B(Y.begin(), Y.end()), C(Z.begin(), Z.end()), D(W.begin(), W.end()), pred(pred), binary_op(binary_op), policy(p_) {} void operator()(void) { thrust::transform_if(policy, A.begin(), A.end(), B.begin(), C.begin(), D.begin(), binary_op, pred); } }; thrust-1.9.5/perf_test/transform_reduce.h000066400000000000000000000014731344621116200205720ustar00rootroot00000000000000#include template , typename T = typename Container::value_type, typename BinaryFunction = thrust::plus > struct TransformReduce { Container A; UnaryFunction unary_op; T init; BinaryFunction binary_op; Policy policy; template TransformReduce(Policy p_, const Range& X, UnaryFunction unary_op = UnaryFunction(), T init = 
T(0), BinaryFunction binary_op = BinaryFunction()) : A(X.begin(), X.end()), unary_op(unary_op), init(init), binary_op(binary_op), policy(p_) {} void operator()(void) { thrust::transform_reduce(policy, A.begin(), A.end(), unary_op, init, binary_op); } }; thrust-1.9.5/perf_test/transform_scan.h000066400000000000000000000037021344621116200202440ustar00rootroot00000000000000#include template , typename BinaryFunction = thrust::plus > struct TransformInclusiveScan { Container1 A; Container2 B; UnaryFunction unary_op; BinaryFunction binary_op; Policy policy; template TransformInclusiveScan(Policy p_, const Range1& X, const Range2& Y, UnaryFunction unary_op = UnaryFunction(), BinaryFunction binary_op = BinaryFunction()) : A(X.begin(), X.end()), B(Y.begin(), Y.end()), unary_op(unary_op), binary_op(binary_op), policy(p_) {} void operator()(void) { thrust::transform_inclusive_scan(policy, A.begin(), A.end(), B.begin(), unary_op, binary_op); } }; template , typename T = typename Container1::value_type, typename BinaryFunction = thrust::plus > struct TransformExclusiveScan { Container1 A; Container2 B; T init; UnaryFunction unary_op; BinaryFunction binary_op; Policy policy; template TransformExclusiveScan(Policy p_, const Range1& X, const Range2& Y, UnaryFunction unary_op = UnaryFunction(), T init = T(0), BinaryFunction binary_op = BinaryFunction()) : A(X.begin(), X.end()), B(Y.begin(), Y.end()), init(init), unary_op(unary_op), binary_op(binary_op), policy(p_) {} void operator()(void) { thrust::transform_exclusive_scan(policy, A.begin(), A.end(), B.begin(), unary_op, init, binary_op); } }; thrust-1.9.5/perf_test/uninitialized_copy.h000066400000000000000000000007671344621116200211370ustar00rootroot00000000000000#include template struct UninitializedCopy { Container1 A; Container2 B; Policy policy; template UninitializedCopy(Policy p_, const Range1& X, const Range2& Y) : A(X.begin(), X.end()), B(Y.begin(), Y.end()), policy(p_) {} void operator()(void) { 
thrust::uninitialized_copy(policy, A.begin(), A.end(), B.begin()); } }; thrust-1.9.5/perf_test/uninitialized_fill.h000066400000000000000000000016471344621116200211110ustar00rootroot00000000000000#include template struct UninitializedFill { Container A; T value; Policy policy; template UninitializedFill(Policy p_, const Range& X, T value = T()) : A(X.begin(), X.end()), value(value), policy(p_) {} void operator()(void) { thrust::uninitialized_fill(policy, A.begin(), A.end(), value); } }; template struct UninitializedFillN { Container A; T value; Policy policy; template UninitializedFillN(Policy p_, const Range& X, T value = T()) : A(X.begin(), X.end()), value(value), policy(p_) {} void operator()(void) { thrust::uninitialized_fill_n(policy, A.begin(), A.size(), value); } }; thrust-1.9.5/perf_test/unique.h000066400000000000000000000061061344621116200165340ustar00rootroot00000000000000#include template > struct Unique { Container A, A_copy; BinaryPredicate pred; Policy policy; template Unique(Policy p_, const Range& X, BinaryPredicate pred = BinaryPredicate()) : A(X.begin(), X.end()), A_copy(X.begin(), X.end()), pred(pred), policy(p_) {} void operator()(void) { thrust::unique(policy, A.begin(), A.end(), pred); } void reset(void) { thrust::copy(policy, A_copy.begin(), A_copy.end(), A.begin()); } }; template > struct UniqueCopy { Container1 A; Container2 B; BinaryPredicate pred; Policy policy; template UniqueCopy(Policy p_, const Range1& X, const Range2& Y, BinaryPredicate pred = BinaryPredicate()) : A(X.begin(), X.end()), B(Y.begin(), Y.end()), pred(pred), policy(p_) {} void operator()(void) { thrust::unique_copy(policy, A.begin(), A.end(), B.begin(), pred); } }; template > struct UniqueByKey { Container1 A, A_copy; // keys Container2 B, B_copy; // values BinaryPredicate pred; Policy policy; template UniqueByKey(Policy p_, const Range1& X, const Range2& Y, BinaryPredicate pred = BinaryPredicate()) : A(X.begin(), X.end()), A_copy(X.begin(), X.end()), B(Y.begin(), 
Y.end()), B_copy(Y.begin(), Y.end()), pred(pred), policy(p_) {} void operator()(void) { thrust::unique_by_key(policy, A.begin(), A.end(), B.begin(), pred); } void reset(void) { thrust::copy(policy, A_copy.begin(), A_copy.end(), A.begin()); thrust::copy(policy, B_copy.begin(), B_copy.end(), B.begin()); } }; template > struct UniqueByKeyCopy { Container1 A; // input keys Container2 B; // input values Container3 C; // output keys Container4 D; // output values BinaryPredicate pred; Policy policy; template UniqueByKeyCopy(Policy p_, const Range1& X, const Range2& Y, const Range3& Z, const Range4& W, BinaryPredicate pred = BinaryPredicate()) : A(X.begin(), X.end()), B(Y.begin(), Y.end()), C(Z.begin(), Z.end()), D(W.begin(), W.end()), pred(pred), policy(p_) {} void operator()(void) { thrust::unique_by_key_copy(policy, A.begin(), A.end(), B.begin(), C.begin(), D.begin(), pred); } }; thrust-1.9.5/performance/000077500000000000000000000000001344621116200153605ustar00rootroot00000000000000thrust-1.9.5/performance/CMakeLists.txt000066400000000000000000000035731344621116200201300ustar00rootroot00000000000000# message(STATUS "Adding \"testing\"") FILE(GLOB SOURCES_TEST *.test) list(LENGTH SOURCES_TEST index) message(STATUS "Found ${index} performance tests") find_package(PythonInterp) if (NOT ${PYTHONINTERP_FOUND}) message("** Python is not found. 
Skipping performance tests") return() endif() set(CMAKE_INCLUDE_CURRENT_DIR ON) cuda_include_directories(${CMAKE_CURRENT_BINARY_DIR} ${CMAKE_CURRENT_SOURCE_DIR}) cuda_include_directories(${CMAKE_SOURCE_DIR}/testing) include_directories(${CMAKE_SOURCE_DIR}/testing) set(compile_source "${CMAKE_CURRENT_BINARY_DIR}/compile_source.py") FILE(WRITE ${compile_source} "import sys\n" "sys.path.append(\"${CMAKE_CURRENT_SOURCE_DIR}\")\n" "from build.perftest import compile_test\n" "compile_test(str(sys.argv[1]),str(sys.argv[2]))\n" ) set(targets "") set(perf_sources "") foreach(src ${SOURCES_TEST}) get_filename_component(exec_name ${src} NAME_WE) set(target perf-${exec_name}) set(dst ${CMAKE_CURRENT_BINARY_DIR}/${exec_name}.cu) add_custom_command( OUTPUT ${dst} DEPENDS ${src} COMMAND "${PYTHON_EXECUTABLE}" ARGS ${compile_source}$ "" ${src} "" ${dst}$ "" ${dst} COMMENT "Generate performance test \"${dst}\" from \"${src}\" " ) set(cuda_src ${dst}) thrust_add_executable(${target} ${cuda_src}) set_target_properties(${target} PROPERTIES OUTPUT_NAME ${exec_name}) install(TARGETS ${target} DESTINATION "performance/${HOST_BACKEND}_host_${DEVICE_BACKEND}_device_${THRUST_MODE}" OPTIONAL COMPONENT performance-bin) list(APPEND targets ${target}) list(APPEND perf_sources ${cuda_src}) endforeach() add_custom_target(performance-bin DEPENDS ${targets}) add_custom_target(install-performance-bin COMMAND "${CMAKE_COMMAND}" -DCMAKE_INSTALL_COMPONENT=performance-bin -P "${CMAKE_BINARY_DIR}/cmake_install.cmake" ) # install(FILES ${perf_sources} DESTINATION "performance" COMPONENT performance) thrust-1.9.5/performance/SConscript000066400000000000000000000037171344621116200174020ustar00rootroot00000000000000import sys # enable python to find the module module_path = Dir('.').srcnode().abspath sys.path.append(module_path) from build.perftest import compile_test import os Import('env') my_env = env.Clone() def cu_build_function(source, target, env): compile_test(str(source[0]), str(target[0])) # 
define a rule to build a .cu from a .test cu_builder = Builder(action = cu_build_function, suffix = '.cu', src_suffix = '.test') my_env.Append(BUILDERS = {'CUFile' : cu_builder}) # define a rule to build a report from an executable xml_builder = Builder(action = os.path.join('"' + str(my_env.Dir('.')), '$SOURCE" > $TARGET'), suffix = '.xml', src_suffix = my_env['PROGSUFFIX']) my_env.Append(BUILDERS = {'XMLFile' : xml_builder}) my_env.Append(CPPPATH = [Dir('.').srcnode(), Dir('#/testing')]) cu_list = [] program_list = [] xml_list = [] build_files = [os.path.join('build', f) for f in ['perftest.py', 'test_function_template.cxx']] # describe dependency graph: # xml -> program -> .cu -> .test for test in my_env.Glob('*.test'): cu = my_env.CUFile(test) my_env.Depends(cu, build_files) cu_list.append(cu) prog = my_env.Program(cu) program_list.append(prog) xml = my_env.XMLFile(prog) xml_list.append(xml) # make aliases for groups of targets run_performance_tests_alias = my_env.Alias("run_performance_tests", xml_list) performance_tests_alias = my_env.Alias("performance_tests", program_list) # when no build target is specified, by default we build the programs my_env.Default(performance_tests_alias) # output a help message my_env.Help(""" Type: 'scons' to build all performance test programs. Type: 'scons run_performance_tests' to run all performance tests and output reports. Type: 'scons ' to build a single performance test program of interest. Type: 'scons .xml' to run a single performance test of interest and output a report in an XML file. 
""") thrust-1.9.5/performance/adjacent_difference.test000066400000000000000000000017171344621116200222120ustar00rootroot00000000000000PREAMBLE = \ """ #include """ INITIALIZE = \ """ thrust::host_vector<$InputType> h_input = unittest::random_integers<$InputType>($InputSize); thrust::device_vector<$InputType> d_input = h_input; thrust::host_vector<$InputType> h_output($InputSize); thrust::device_vector<$InputType> d_output($InputSize); thrust::adjacent_difference(h_input.begin(), h_input.end(), h_output.begin()); thrust::adjacent_difference(d_input.begin(), d_input.end(), d_output.begin()); ASSERT_EQUAL(h_output, d_output); """ TIME = \ """ thrust::adjacent_difference(d_input.begin(), d_input.end(), d_output.begin()); """ FINALIZE = \ """ RECORD_TIME(); RECORD_THROUGHPUT(double($InputSize)); RECORD_BANDWIDTH(2*sizeof($InputType) * double($InputSize)); """ InputTypes = ['int'] InputSizes = [2**24] TestVariables = [('InputType', InputTypes), ('InputSize', InputSizes)] thrust-1.9.5/performance/axpy.test000066400000000000000000000037261344621116200172520ustar00rootroot00000000000000PREAMBLE = \ """ #include #include //#include #include template struct axpy { T a; axpy(T a) : a(a) {} __host__ __device__ T operator()(T x, T y) const { return a * x + y; } }; template void axpy_fast(const typename Vector::value_type a, const Vector& x, Vector& y) { typedef typename Vector::value_type T; thrust::transform(x.begin(), x.end(), y.begin(), y.begin(), axpy(a)); } template void axpy_slow(const typename Vector::value_type a, const Vector& x, Vector& y) { typedef typename Vector::value_type T; // temp <- a Vector temp(x.size(), a); // temp <- a * x thrust::transform(x.begin(), x.end(), temp.begin(), temp.begin(), thrust::multiplies()); // y <- a * x + y thrust::transform(temp.begin(), temp.end(), y.begin(), y.begin(), thrust::plus()); } """ INITIALIZE = \ """ //cublasInit(); thrust::host_vector<$InputType> h_x = unittest::random_samples<$InputType>($InputSize); 
thrust::host_vector<$InputType> h_y = unittest::random_samples<$InputType>($InputSize); thrust::device_vector<$InputType> d_x = h_x; thrust::device_vector<$InputType> d_y = h_y; $InputType a = 2.0; $Method(a, h_x, h_y); $Method(a, d_x, d_y); ASSERT_EQUAL(h_x, d_x); ASSERT_EQUAL(h_y, d_y); """ TIME = \ """ $Method(a, d_x, d_y); """ FINALIZE = \ """ RECORD_TIME(); RECORD_THROUGHPUT(2 * double($InputSize)); RECORD_BANDWIDTH(3* sizeof($InputType) * double($InputSize)); """ InputTypes = ['float', 'double'] InputSizes = [2**24] Methods = ['axpy_fast', 'axpy_slow'] TestVariables = [('InputType', InputTypes), ('InputSize', InputSizes), ('Method', Methods)] thrust-1.9.5/performance/binary_search.test000066400000000000000000000024161344621116200210750ustar00rootroot00000000000000PREAMBLE = \ """ #include #include """ INITIALIZE = \ """ thrust::host_vector<$KeyType> h_keys = unittest::random_integers<$KeyType>($InputSize); thrust::device_vector<$KeyType> d_keys = h_keys; thrust::sort(h_keys.begin(), h_keys.end()); thrust::sort(d_keys.begin(), d_keys.end()); ASSERT_EQUAL(d_keys, h_keys); thrust::host_vector<$KeyType> h_search = unittest::random_integers<$KeyType>($InputSize); thrust::device_vector<$KeyType> d_search = h_search; thrust::host_vector h_output($InputSize); thrust::device_vector d_output($InputSize); thrust::binary_search(h_keys.begin(), h_keys.end(), h_search.begin(), h_search.end(), h_output.begin()); thrust::binary_search(d_keys.begin(), d_keys.end(), d_search.begin(), d_search.end(), d_output.begin()); ASSERT_EQUAL(d_output, h_output); """ TIME = \ """ thrust::binary_search(d_keys.begin(), d_keys.end(), d_search.begin(), d_search.end(), d_output.begin()); """ FINALIZE = \ """ RECORD_TIME(); RECORD_THROUGHPUT(double($InputSize)); """ KeyTypes = ['int'] InputSizes = [2**24] TestVariables = [('KeyType', KeyTypes), ('InputSize', InputSizes)] 
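The `.test` files above (for example `binary_search.test`) are Python fragments, not C++. `build/perftest.py` first pastes their `INITIALIZE`/`TIME`/`FINALIZE` sections into a function skeleton with `safe_substitute`, then emits one C++ function per point of the cartesian product of `TestVariables`. What follows is a minimal Python 3 sketch of that two-stage expansion, not the project's script. It assumes a simplified inline skeleton in place of `test_function_template.cxx`, whose contents are not shown here. The `INITIALIZE`/`TIME` sections and variable values are hypothetical, and `itertools.product` stands in for the hand-written `product` generator.

```python
import itertools
import string

# Simplified stand-in (assumption) for test_function_template.cxx:
# $FUNCTION and $DESCRIPTION are filled per test, the sections once.
FUNCTION_TEMPLATE = string.Template(
    "void $FUNCTION(void)\n"
    "{\n"
    "$DESCRIPTION\n"
    "$INITIALIZE\n"
    "$TIME\n"
    "}\n"
)

# Hypothetical sections in the style of binary_search.test.
INITIALIZE = "thrust::device_vector<$KeyType> d_keys($InputSize);"
TIME = "thrust::sort(d_keys.begin(), d_keys.end());"
TEST_VARIABLES = [('KeyType', ['int', 'unsigned int']),
                  ('InputSize', [2**20, 2**24])]


def sanitize(value):
    """Map a variable value onto characters that are legal in a C++ identifier."""
    text = str(value)
    for ch in ' .<>,:':
        text = text.replace(ch, '_')
    return text


def generate_functions(pname, test_variables, initialize, time):
    # Stage 1: paste the sections in; safe_substitute leaves $KeyType,
    # $InputSize, $FUNCTION and $DESCRIPTION untouched for stage 2.
    skeleton = string.Template(
        FUNCTION_TEMPLATE.safe_substitute(INITIALIZE=initialize, TIME=time))
    names = [name for name, _ in test_variables]
    ranges = [values for _, values in test_variables]
    # Stage 2: one C++ function per combination of variable values.
    for values in itertools.product(*ranges):
        fname = '_'.join([pname] + [sanitize(v) for v in values])
        mapping = {name: str(v) for name, v in zip(names, values)}
        mapping['FUNCTION'] = fname
        mapping['DESCRIPTION'] = '\n'.join(
            'RECORD_VARIABLE("%s","%s");' % (name, v)
            for name, v in zip(names, values))
        yield fname, skeleton.substitute(mapping)


if __name__ == '__main__':
    for fname, code in generate_functions('sort', TEST_VARIABLES, INITIALIZE, TIME):
        print(code)
```

The first stage has to use `safe_substitute`. Plain `substitute` would raise `KeyError` on the `$KeyType`/`$InputSize` placeholders that only the second stage can fill.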
thrust-1.9.5/performance/build/000077500000000000000000000000001344621116200164575ustar00rootroot00000000000000thrust-1.9.5/performance/build/__init__.py000066400000000000000000000001041344621116200205630ustar00rootroot00000000000000from perftest import * from testsuite import * from report import * thrust-1.9.5/performance/build/perftest.h000066400000000000000000000124011344621116200204620ustar00rootroot00000000000000#include #include #include #include //#include //#include #define RECORD_RESULT(name, value, units) { std::cout << " " << std::endl; } #define RECORD_TIME() RECORD_RESULT("Time", best_time, "seconds") #define RECORD_RATE(name, value, units) RECORD_RESULT(name, (double(value)/best_time), units) #define RECORD_BANDWIDTH(bytes) RECORD_RATE("Bandwidth", double(bytes) / 1e9, "GBytes/s") #define RECORD_THROUGHPUT(value) RECORD_RATE("Throughput", double(value) / 1e9, "GOp/s") #define RECORD_SORTING_RATE(size) RECORD_RATE("Sorting", double(size) / 1e6, "MKeys/s") #define RECORD_VARIABLE(name, value) { std::cout << " " << std::endl; } #define RECORD_TEST_STATUS(result, message) { std::cout << " " << std::endl; } #define RECORD_TEST_SUCCESS() RECORD_TEST_STATUS("Success", "") #define RECORD_TEST_FAILURE(message) RECORD_TEST_STATUS("Failure", message) #define BEGIN_TEST(name) { std::cout << "" << std::endl; } #define END_TEST() { std::cout << "" << std::endl; } #define BEGIN_TESTSUITE(name) { std::cout << "" << std::endl << "" << std::endl; } #define END_TESTSUITE() { std::cout << "" << std::endl; } #if defined(__GNUC__) // GCC #define __HOST_COMPILER_NAME__ "GCC" # if defined(__GNUC_PATCHLEVEL__) #define __HOST_COMPILER_VERSION__ (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__) # else #define __HOST_COMPILER_VERSION__ (__GNUC__ * 10000 + __GNUC_MINOR__ * 100) # endif #elif defined(_MSC_VER) // Microsoft Visual C++ #define __HOST_COMPILER_NAME__ "MSVC" #define __HOST_COMPILER_VERSION__ _MSC_VER #elif defined(__INTEL_COMPILER) // Intel Compiler 
#define __HOST_COMPILER_NAME__ "ICC" #define __HOST_COMPILER_VERSION__ __INTEL_COMPILER #else // Unknown #define __HOST_COMPILER_NAME__ "UNKNOWN" #define __HOST_COMPILER_VERSION__ 0 #endif inline void RECORD_PLATFORM_INFO(void) { #if THRUST_DEVICE_SYSTEM==THRUST_DEVICE_SYSTEM_CUDA int deviceCount; cudaGetDeviceCount(&deviceCount); if (deviceCount == 0){ std::cerr << "There is no device supporting CUDA" << std::endl; exit(1); } int dev; cudaGetDevice(&dev); cudaDeviceProp deviceProp; cudaGetDeviceProperties(&deviceProp, dev); if (dev == 0 && deviceProp.major == 9999 && deviceProp.minor == 9999){ std::cerr << "There is no device supporting CUDA" << std::endl; exit(1); } std::cout << "" << std::endl; std::cout << " " << std::endl; std::cout << " " << std::endl; std::cout << " " << std::endl; std::cout << " " << std::endl; std::cout << " " << std::endl; std::cout << " " << std::endl; std::cout << " " << std::endl; std::cout << " " << std::endl; std::cout << " " << std::endl; std::cout << " " << std::endl; std::cout << " " << std::endl; std::cout << " " << std::endl; std::cout << " " << std::endl; std::cout << " " << std::endl; std::cout << " " << std::endl; std::cout << " " << std::endl; std::cout << " " << std::endl; std::cout << "" << std::endl; #endif } inline void PROCESS_ARGUMENTS(int argc, char **argv) { for(int i = 1; i < argc; ++i) { if(std::string(argv[i]) == "--device") { ++i; if(i == argc) { std::cerr << "usage: --device n" << std::endl; exit(-1); } #if THRUST_DEVICE_SYSTEM==THRUST_DEVICE_SYSTEM_CUDA int device_index = atoi(argv[i]); cudaSetDevice(device_index); #endif } } } thrust-1.9.5/performance/build/perftest.py000066400000000000000000000112511344621116200206650ustar00rootroot00000000000000def product(*iterables): """compute the cartesian product of a list of iterables >>> for i in product(['a','b','c'],[1,2]): ... print i ... 
['a', 1] ['a', 2] ['b', 1] ['b', 2] ['c', 1] ['c', 2] """ if iterables: for head in iterables[0]: for remainder in product(*iterables[1:]): yield [head] + remainder else: yield [] #### # Function generators def make_test_function_template(INITIALIZE, TIME, FINALIZE): import string import os function_template_file = os.path.join( os.path.split(__file__)[0], 'test_function_template.cxx') # test_function_template has locations for $PREAMBLE $INITIALIZE etc. test_template = string.Template(open(function_template_file).read()) sections = {'INITIALIZE' : INITIALIZE, 'TIME' : TIME, 'FINALIZE' : FINALIZE} # skeleton has supplied definitions for $INCLUDE and $PREAMBLE # and has locations for $InputType and $InputSize etc. skeleton = test_template.safe_substitute(sections) return string.Template(skeleton) def make_test_function(fname, TestVariablePairs, ftemplate): VariableDescription = '\n'.join(['RECORD_VARIABLE("%s","%s");' % pair for pair in TestVariablePairs]) fmap = dict(TestVariablePairs) fmap['DESCRIPTION'] = VariableDescription fmap['FUNCTION'] = fname return ftemplate.substitute(fmap) def generate_functions(pname, TestVariables, INITIALIZE, TIME, FINALIZE): ftemplate = make_test_function_template(INITIALIZE, TIME, FINALIZE) TestVariableNames = [ pair[0] for pair in TestVariables] TestVariableRanges = [ pair[1] for pair in TestVariables] for n,values in enumerate(product(*TestVariableRanges)): converted_values = [] for v in values: v = str(v) v = v.replace(" ","_") # C++ tokens we don't want v = v.replace(".","_") v = v.replace("<","_") v = v.replace(">","_") v = v.replace(",","_") v = v.replace(":","_") converted_values.append(v) fname = '_'.join( [pname] + converted_values ) TestVariablePairs = zip(TestVariableNames, values) yield (fname, make_test_function(fname, TestVariablePairs, ftemplate)) #### # Program generators def make_test_program(pname, functions, PREAMBLE = ""): parts = [] parts.append("#include ") parts.append(PREAMBLE) for fname,fcode in functions: 
parts.append(fcode) #TODO output TestVariables somewhere parts.append("int main(int argc, char **argv)") parts.append("{") parts.append("PROCESS_ARGUMENTS(argc, argv);") parts.append("BEGIN_TESTSUITE(\"" + pname + "\");") parts.append("RECORD_PLATFORM_INFO();") for fname,fcode in functions: parts.append(fname + "();") parts.append("END_TESTSUITE();") parts.append("}") parts.append("\n") return "\n".join(parts) def generate_program(pname, TestVariables, PREAMBLE, INITIALIZE, TIME, FINALIZE): functions = list(generate_functions(pname, TestVariables, INITIALIZE, TIME, FINALIZE)) return make_test_program(pname, functions, PREAMBLE) ### # Test Input File -> Test Program def process_test_file(filename): import os pname = os.path.splitext(os.path.split(filename)[1])[0] test_env_file = os.path.join( os.path.split(__file__)[0], 'test_env.py') # XXX why does execfile() not give us the right namespace? exec open(test_env_file) exec open(filename) return generate_program(pname, TestVariables, PREAMBLE, INITIALIZE, TIME, FINALIZE) def compile_test(input_name, output_name): """Compiles a .test file into a .cu file""" open(output_name, 'w').write( process_test_file(input_name) ) ## # Simple Driver script if __name__ == '__main__': import os, sys if len(sys.argv) not in [2,3]: print "usage: %s test_input.test [test_output.cu]" % (sys.argv[0],) sys.exit(1) input_name = sys.argv[1] if len(sys.argv) == 2: # reduce.test -> reduce.cu output_name = os.path.splitext(os.path.split(input_name)[1])[0] + '.cu' else: output_name = sys.argv[2] # process_test_file returns a string containing # the whole test program (i.e. the text of a .cu file) compile_test(input_name, output_name) # this is just for show, scons integration would do this differently #import subprocess #subprocess.call('scons') #subprocess.call('./' + pname) #print "collecting data..." 
#output = subprocess.Popen(['./' + pname], stdout=subprocess.PIPE).communicate()[0] #print output thrust-1.9.5/performance/build/report.py000066400000000000000000000075451344621116200203570ustar00rootroot00000000000000from build import parse_testsuite_xml __all__ = ['plot_results','print_results'] #TODO add print_results which outputs a CSV file def full_label(name): known_labels = {'Throughput' : 'Throughput (GOp/s)', 'Sorting' : 'Sorting Rate (MKey/s)', 'Bandwidth' : 'Memory Bandwidth (GByte/s)', 'InputSize' : 'Input Size', 'KeyType' : 'Key Type' } if name in known_labels: return known_labels[name] else: return name def print_results(input_file, series_key, x_axis, y_axis, title=None, format=None, **kwargs): """Print performance data stored in an XML file as comma-separated values on standard output. The output has a title row, x- and y-axis label rows, an x_axis row, and one row per series; x values missing from a series are left empty. The format argument is ignored; it is accepted so that the signature matches plot_results Example ------- input_file = 'reduce.xml' series_key = 'InputType' x_axis = 'InputSize' y_axis = 'Throughput' """ try: fid = open(input_file) except IOError: print "unable to open file '%s'" % input_file return TS = parse_testsuite_xml(fid) series_titles = set([test.variables[series_key] for (testname,test) in TS.tests.items()]) series = dict( zip(series_titles, [list() for s_title in series_titles]) ) for testname,test in TS.tests.items(): if x_axis in test.variables and y_axis in test.results: series[test.variables[series_key]].append( (test.variables[x_axis], test.results[y_axis]) ) print 'title,' + str(title) print 'x_axis_label,' + full_label(x_axis) print 'y_axis_label,' + full_label(y_axis) x_axis = set() for series_title,series_data in series.items(): x_axis.update([t[0] for t in series_data]) x_axis = sorted(x_axis) print ','.join( ['x_axis'] + [str(v) for v in x_axis]) for series_title,series_data in series.items(): series_data = dict(series_data) y_values = [] for x_value in x_axis: if x_value in series_data: y_values.append(str(series_data[x_value])) else: y_values.append('') print 
','.join( [series_title] + [str(v) for v in y_values]) def plot_results(input_file, series_key, x_axis, y_axis, plot='loglog', dpi=72, title=None, format=None): """Plot performance data stored in an XML file if format is None then the figure is shown, otherwise it is written to a file with the specified extension Example ------- input_file = 'reduce.xml' series_key = 'InputType' x_axis = 'InputSize' y_axis = 'Throughput' format = 'pdf' """ try: fid = open(input_file) except IOError: print "unable to open file '%s'" % input_file return TS = parse_testsuite_xml(fid) series_titles = set([test.variables[series_key] for (testname,test) in TS.tests.items()]) series = dict( zip(series_titles, [list() for s_title in series_titles]) ) for testname,test in TS.tests.items(): if x_axis in test.variables and y_axis in test.results: series[test.variables[series_key]].append( (test.variables[x_axis], test.results[y_axis]) ) if title is None: title = TS.name import pylab pylab.figure() pylab.title(title) pylab.xlabel(full_label(x_axis)) pylab.ylabel(full_label(y_axis)) plotter = getattr(pylab, plot) for series_title,series_data in series.items(): series_data.sort() x_values = [val[0] for val in series_data] y_values = [val[1] for val in series_data] plotter(x_values, y_values, label=series_title) if len(series) >= 2: pylab.legend(loc=0) if format is None: pylab.show() else: import os fname = os.path.splitext(input_file)[0] + '.' 
+ format pylab.savefig(fname, dpi=dpi) thrust-1.9.5/performance/build/test_env.py000066400000000000000000000005261344621116200206630ustar00rootroot00000000000000StandardTypes = ['char', 'unsigned char', 'short', 'unsigned short', 'int', 'unsigned int', 'long', 'unsigned long', 'float'] SignedIntegerTypes = ['char', 'short', 'int', 'long'] FloatingPointTypes = ['float','double'] StandardSizes = [2**k for k in range(4, 24)] TestVariables = [] PREAMBLE = "" INITIALIZE = "" TIME = "" FINALIZE = "" thrust-1.9.5/performance/build/test_function_template.cxx000066400000000000000000000044311344621116200237640ustar00rootroot00000000000000void $FUNCTION(void) { BEGIN_TEST(__FUNCTION__); $DESCRIPTION try { /************ BEGIN INITIALIZATION SECTION ************/ $INITIALIZE /************* END INITIALIZATION SECTION *************/ double warmup_time; { timer t; /************ BEGIN TIMING SECTION ************/ $TIME /************* END TIMING SECTION *************/ warmup_time = t.elapsed(); } // only verbose //std::cout << "warmup_time: " << warmup_time << " seconds" << std::endl; static const size_t NUM_TRIALS = 5; static const size_t MAX_ITERATIONS = 1000; static const double MAX_TEST_TIME = 0.5; //TODO allow to be set by user size_t NUM_ITERATIONS; if (warmup_time == 0) NUM_ITERATIONS = MAX_ITERATIONS; else NUM_ITERATIONS = std::min(MAX_ITERATIONS, std::max( (size_t) 1, (size_t) (MAX_TEST_TIME / warmup_time))); double trial_times[NUM_TRIALS]; for(size_t trial = 0; trial < NUM_TRIALS; trial++) { timer t; for(size_t i = 0; i < NUM_ITERATIONS; i++){ /************ BEGIN TIMING SECTION ************/ $TIME /************* END TIMING SECTION *************/ } trial_times[trial] = t.elapsed() / double(NUM_ITERATIONS); } // only verbose //for(size_t trial = 0; trial < NUM_TRIALS; trial++){ // std::cout << "trial[" << trial << "] : " << trial_times[trial] << " seconds\n"; //} double best_time = *std::min_element(trial_times, trial_times + NUM_TRIALS); /************ BEGIN FINALIZE SECTION 
************/ $FINALIZE /************* END FINALIZE SECTION *************/ #if THRUST_DEVICE_SYSTEM==THRUST_DEVICE_SYSTEM_CUDA cudaError_t error = cudaGetLastError(); if(error){ RECORD_TEST_FAILURE(cudaGetErrorString(error)); } else { RECORD_TEST_SUCCESS(); } #else RECORD_TEST_SUCCESS(); #endif } // end try catch (const std::bad_alloc&) { RECORD_TEST_FAILURE("std::bad_alloc"); } catch (unittest::UnitTestException& e) { RECORD_TEST_FAILURE(e); } END_TEST(); } thrust-1.9.5/performance/build/test_program_template.cxx000066400000000000000000000006541344621116200236110ustar00rootroot00000000000000#include /*********** BEGIN PREAMBLE SECTION ***********/ $PREAMBLE /************ END PREAMBLE SECTION ************/ /*********** BEGIN FUNCTIONS SECTION ***********/ $FUNCTIONS /************ END FUNCTIONS SECTION ************/ int main(void) { //TODO process basic arguments /*********** BEGIN FUNCTIONCALLS SECTION ***********/ $FUNCTIONCALLS /************ END FUNCTIONCALLS SECTION ************/ } thrust-1.9.5/performance/build/testsuite.py000066400000000000000000000047111344621116200210650ustar00rootroot00000000000000"""Classes and functions that parse the .xml output of the performance tests into TestSuite and Test objects""" __all__ = ['TestSuite', 'parse_testsuite_xml'] class TestSuite: def __init__(self, name, platform, tests): self.name = name self.platform = platform self.tests = tests def __repr__(self): import pprint return 'TestSuite' + pprint.pformat( (self.name, self.platform, self.tests) ) class Test: def __init__(self, name, variables, results): self.name = name self.variables = variables self.results = results def __repr__(self): return 'Test' + repr( (self.name, self.variables, self.results) ) def scalar_element(element): value = element.get('value') try: return int(value) except: try: return float(value) except: return value def parse_testsuite_platform(et): testsuite_platform = {} platform_element = et.find('platform') device_element = platform_element.find('device') device = {} 
device['name'] = device_element.get('name') for property_element in device_element.findall('property'): device[property_element.get('name')] = scalar_element(property_element) testsuite_platform['device'] = device return testsuite_platform def parse_testsuite_tests(et): testsuite_tests = {} for test_element in et.findall('test'): # test name test_name = test_element.get('name') # test variables: name -> value test_variables = {} for variable_element in test_element.findall('variable'): test_variables[variable_element.get('name')] = scalar_element(variable_element) # test results: name -> (value, units) test_results = {} for result_element in test_element.findall('result'): # TODO make this a thing that can be converted to its first element when treated like a number test_results[result_element.get('name')] = scalar_element(result_element) testsuite_tests[test_name] = Test(test_name, test_variables, test_results) return testsuite_tests def parse_testsuite_xml(filename): import xml.etree.ElementTree as ET et = ET.parse(filename) testsuite_name = et.getroot().get('name') testsuite_platform = parse_testsuite_platform(et) testsuite_tests = parse_testsuite_tests(et) return TestSuite(testsuite_name, testsuite_platform, testsuite_tests) thrust-1.9.5/performance/build/timer.h000066400000000000000000000050271344621116200177540ustar00rootroot00000000000000/* * Copyright 2008-2009 NVIDIA Corporation * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. * You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. 
*/ #pragma once // A simple timer class #ifdef __CUDACC__ // use CUDA's high-resolution timers when possible #include #include #include #include void cuda_safe_call(cudaError_t error, const std::string& message = "") { if(error) throw thrust::system_error(error, thrust::cuda_category(), message); } struct timer { cudaEvent_t start; cudaEvent_t end; timer(void) { cuda_safe_call(cudaEventCreate(&start)); cuda_safe_call(cudaEventCreate(&end)); restart(); } ~timer(void) { cuda_safe_call(cudaEventDestroy(start)); cuda_safe_call(cudaEventDestroy(end)); } void restart(void) { cuda_safe_call(cudaEventRecord(start, 0)); } double elapsed(void) { cuda_safe_call(cudaEventRecord(end, 0)); cuda_safe_call(cudaEventSynchronize(end)); float ms_elapsed; cuda_safe_call(cudaEventElapsedTime(&ms_elapsed, start, end)); return ms_elapsed / 1e3; } double epsilon(void) { return 0.5e-6; } }; #elif defined(__linux__) #include struct timer { timeval start; timeval end; timer(void) { restart(); } ~timer(void) { } void restart(void) { gettimeofday(&start, NULL); } double elapsed(void) { gettimeofday(&end, NULL); return static_cast(end.tv_sec - start.tv_sec) + 1e-6 * static_cast((int)end.tv_usec - (int)start.tv_usec); } double epsilon(void) { return 0.5e-6; } }; #else // fallback to clock() #include struct timer { clock_t start; clock_t end; timer(void) { restart(); } ~timer(void) { } void restart(void) { start = clock(); } double elapsed(void) { end = clock(); return static_cast(end - start) / static_cast(CLOCKS_PER_SEC); } double epsilon(void) { return 1.0 / static_cast(CLOCKS_PER_SEC); } }; #endif thrust-1.9.5/performance/comparison_sort_by_key.test000066400000000000000000000027671344621116200230600ustar00rootroot00000000000000PREAMBLE = \ """ #include #include template struct my_less { __host__ __device__ bool operator()(const T &x, const T& y) const { return x < y; } }; """ INITIALIZE = \ """ thrust::host_vector<$KeyType> h_keys = unittest::random_integers<$KeyType>($InputSize); 
thrust::device_vector<$KeyType> d_keys = h_keys; thrust::host_vector<$ValueType> h_values($InputSize); thrust::device_vector<$ValueType> d_values($InputSize); thrust::sequence(h_values.begin(), h_values.end()); thrust::sequence(d_values.begin(), d_values.end()); thrust::device_vector<$KeyType> d_keys_copy = d_keys; // test sort thrust::stable_sort_by_key(h_keys.begin(), h_keys.end(), h_values.begin()); thrust::stable_sort_by_key(d_keys.begin(), d_keys.end(), d_values.begin(), my_less<$KeyType>()); ASSERT_EQUAL(d_keys, h_keys); ASSERT_EQUAL(d_values, h_values); """ TIME = \ """ thrust::copy(d_keys_copy.begin(), d_keys_copy.end(), d_keys.begin()); thrust::stable_sort_by_key(d_keys.begin(), d_keys.end(), d_values.begin(), my_less<$KeyType>()); """ FINALIZE = \ """ RECORD_TIME(); RECORD_SORTING_RATE(double($InputSize)); """ KeyTypes = ['char', 'short', 'int', 'long long', 'float', 'double'] ValueTypes = ['unsigned int'] InputSizes = StandardSizes TestVariables = [('KeyType', KeyTypes), ('ValueType', ValueTypes), ('InputSize', InputSizes)] thrust-1.9.5/performance/copy_if.test000066400000000000000000000026751344621116200177230ustar00rootroot00000000000000PREAMBLE = \ """ #include #include #include #include #include struct pred { __host__ __device__ bool operator()(int x) { return bool(x); } }; """ INITIALIZE = \ """ thrust::host_vector h_input($InputSize); thrust::sequence(h_input.begin(), h_input.end()); thrust::host_vector h_stencil = unittest::random_integers($InputSize); thrust::host_vector h_output($InputSize, -1); thrust::device_vector d_input = h_input; thrust::device_vector d_stencil = h_stencil; thrust::device_vector d_output = h_output; size_t h_count = thrust::copy_if(h_input.begin(), h_input.end(), h_stencil.begin(), h_output.begin(), pred()) - h_output.begin(); size_t d_count = thrust::copy_if(d_input.begin(), d_input.end(), d_stencil.begin(), d_output.begin(), pred()) - d_output.begin(); ASSERT_EQUAL(h_output, d_output); ASSERT_EQUAL(h_count, d_count); """ 
TIME = \ """ thrust::copy_if(d_input.begin(), d_input.end(), d_stencil.begin(), d_output.begin(), pred()); """ FINALIZE = \ """ RECORD_TIME(); RECORD_THROUGHPUT(double($InputSize)); RECORD_BANDWIDTH((2*sizeof(int) + 2*sizeof(float)) * double($InputSize)); """ InputSizes = [2**N for N in range(20, 27)] TestVariables = [('InputSize', InputSizes)] thrust-1.9.5/performance/fill.test000066400000000000000000000013751344621116200172150ustar00rootroot00000000000000PREAMBLE = \ """ #include """ INITIALIZE = \ """ thrust::host_vector<$InputType> h_input($InputSize); thrust::device_vector<$InputType> d_input($InputSize); thrust::fill(h_input.begin(), h_input.end(), $InputType(13)); thrust::fill(d_input.begin(), d_input.end(), $InputType(13)); ASSERT_EQUAL(h_input, d_input); """ TIME = \ """ thrust::fill(d_input.begin(), d_input.end(), $InputType(13)); """ FINALIZE = \ """ RECORD_TIME(); RECORD_THROUGHPUT(double($InputSize)); RECORD_BANDWIDTH(sizeof($InputType) * double($InputSize)); """ InputTypes = SignedIntegerTypes InputSizes = StandardSizes TestVariables = [('InputType', InputTypes), ('InputSize', InputSizes)] thrust-1.9.5/performance/fill_optimization.test000066400000000000000000000023271344621116200220210ustar00rootroot00000000000000PREAMBLE = \ """ #include #include template struct constant_functor { T x; constant_functor(T x) : x(x) {} __host__ __device__ T operator()(void) const {return x;} }; template void generate_fill(Iterator first, Iterator last, T x) { thrust::generate(first, last, constant_functor(x)); } """ INITIALIZE = \ """ thrust::host_vector<$InputType> h_input($InputSize); thrust::device_vector<$InputType> d_input($InputSize); thrust::fill(h_input.begin(), h_input.end(), $InputType(13)); $Method(d_input.begin(), d_input.end(), $InputType(13)); ASSERT_EQUAL(h_input, d_input); """ TIME = \ """ $Method(d_input.begin(), d_input.end(), $InputType(13)); """ FINALIZE = \ """ RECORD_TIME(); RECORD_THROUGHPUT(double($InputSize)); 
RECORD_BANDWIDTH(sizeof($InputType) * double($InputSize)); """ InputTypes = ['char', 'short', 'int', 'long'] InputSizes = [2**24] Methods = ['thrust::fill', 'generate_fill'] TestVariables = [('InputType', InputTypes), ('InputSize', InputSizes), ('Method', Methods)] thrust-1.9.5/performance/find.test000066400000000000000000000026261344621116200172070ustar00rootroot00000000000000PREAMBLE = \ """ #include #include #include template void find_partial(const Vector& v) { thrust::find(v.begin(), v.end(), 1); } template void find_full(const Vector& v) { thrust::max_element(v.begin(), v.end()); } template void reduce_full(const Vector& v) { thrust::max_element(v.begin(), v.end()); } """ INITIALIZE = \ """ thrust::host_vector<$InputType> h_input($InputSize, 0); thrust::device_vector<$InputType> d_input($InputSize, 0); size_t pos = $Fraction * $InputSize; if (pos < $InputSize) { h_input[pos] = 1; d_input[pos] = 1; } size_t h_index = thrust::find(h_input.begin(), h_input.end(), 1) - h_input.begin(); size_t d_index = thrust::find(d_input.begin(), d_input.end(), 1) - d_input.begin(); ASSERT_EQUAL(h_index, d_index); """ TIME = \ """ $Method(d_input); """ FINALIZE = \ """ RECORD_TIME(); RECORD_BANDWIDTH(sizeof($InputType) * double($InputSize)); """ InputTypes = ['int'] InputSizes = [2**23] Fractions = [0.01, 0.99] Methods = ['find_partial', 'find_full', 'reduce_full'] TestVariables = [('InputType', InputTypes), ('InputSize', InputSizes), ('Fraction', Fractions), ('Method', Methods)] thrust-1.9.5/performance/float3_optimization.test000066400000000000000000000057371344621116200222730ustar00rootroot00000000000000PREAMBLE = \ """ #include #include #include template struct rotate_tuple { template __host__ __device__ thrust::tuple operator()(const Tuple& t) const { T x = thrust::get<0>(t); T y = thrust::get<1>(t); T z = thrust::get<2>(t); T rx = 0.36f*x + 0.48f*y + -0.80f*z; T ry =-0.80f*x + 0.60f*y + 0.00f*z; T rz = 0.48f*x + 0.64f*y + 0.60f*z; return thrust::make_tuple(rx, ry, rz); } 
}; struct rotate_float3 { __host__ __device__ float3 operator()(const float3& t) const { float x = t.x; float y = t.y; float z = t.z; float3 rt; rt.x = 0.36f*x + 0.48f*y + -0.80f*z; rt.y =-0.80f*x + 0.60f*y + 0.00f*z; rt.z = 0.48f*x + 0.64f*y + 0.60f*z; return rt; } }; template void rotate_fast(Vector& x, Vector& y, Vector& z, Vector3& v) { typedef typename Vector::value_type T; size_t N = x.size(); thrust::transform(thrust::make_zip_iterator(thrust::make_tuple(x.begin(), y.begin(), z.begin())), thrust::make_zip_iterator(thrust::make_tuple(x.begin(), y.begin(), z.begin())) + N, thrust::make_zip_iterator(thrust::make_tuple(x.begin(), y.begin(), z.begin())), rotate_tuple()); } template void rotate_slow(Vector& x, Vector& y, Vector& z, Vector3& v) { thrust::transform(v.begin(), v.end(), v.begin(), rotate_float3()); } """ INITIALIZE = \ """ thrust::host_vector<$InputType> h_x = unittest::random_samples<$InputType>($InputSize); thrust::host_vector<$InputType> h_y = unittest::random_samples<$InputType>($InputSize); thrust::host_vector<$InputType> h_z = unittest::random_samples<$InputType>($InputSize); thrust::device_vector<$InputType> d_x = h_x; thrust::device_vector<$InputType> d_y = h_y; thrust::device_vector<$InputType> d_z = h_z; thrust::host_vector h_v($InputSize, make_float3(1.0,0.4,0.2)); thrust::device_vector d_v = h_v; $Method(h_x, h_y, h_z, h_v); $Method(d_x, d_y, d_z, d_v); ASSERT_ALMOST_EQUAL(h_x, d_x); ASSERT_ALMOST_EQUAL(h_y, d_y); ASSERT_ALMOST_EQUAL(h_z, d_z); """ TIME = \ """ $Method(d_x, d_y, d_z, d_v); """ FINALIZE = \ """ RECORD_TIME(); RECORD_THROUGHPUT(2*9*double($InputSize)); RECORD_BANDWIDTH(2*3*sizeof($InputType) * double($InputSize)); """ InputTypes = ['float'] InputSizes = [2**24] Methods = ['rotate_fast','rotate_slow'] TestVariables = [('InputType', InputTypes), ('InputSize', InputSizes), ('Method', Methods)] thrust-1.9.5/performance/gather.test000066400000000000000000000024541344621116200175400ustar00rootroot00000000000000PREAMBLE = \ """ 
#include #include #include """ INITIALIZE = \ """ thrust::host_vector<$InputType> h_input = unittest::random_integers<$InputType>($InputSize); thrust::host_vector h_map(thrust::make_counting_iterator(0), thrust::make_counting_iterator($InputSize)); std::random_shuffle(h_map.begin(), h_map.end()); thrust::host_vector<$InputType> h_result($InputSize); thrust::device_vector<$InputType> d_input = h_input; thrust::device_vector d_map = h_map; thrust::device_vector<$InputType> d_result($InputSize); thrust::gather(h_map.begin(), h_map.end(), h_input.begin(), h_result.begin()); thrust::gather(d_map.begin(), d_map.end(), d_input.begin(), d_result.begin()); ASSERT_EQUAL(h_result, d_result); """ TIME = \ """ thrust::gather(d_map.begin(), d_map.end(), d_input.begin(), d_result.begin()); """ FINALIZE = \ """ RECORD_TIME(); RECORD_THROUGHPUT(double($InputSize)); RECORD_BANDWIDTH(sizeof($InputType) * double($InputSize)); """ InputTypes = SignedIntegerTypes InputSizes = StandardSizes TestVariables = [('InputType', InputTypes), ('InputSize', InputSizes)] thrust-1.9.5/performance/host_sort.test000066400000000000000000000015221344621116200203050ustar00rootroot00000000000000PREAMBLE = \ """ #include #include """ INITIALIZE = \ """ thrust::host_vector<$KeyType> h_keys = unittest::random_integers<$KeyType>($InputSize); thrust::host_vector<$KeyType> h_keys_copy(h_keys); // test sort $Sort(h_keys.begin(), h_keys.end()); ASSERT_EQUAL(thrust::is_sorted(h_keys.begin(), h_keys.end()), true); """ TIME = \ """ thrust::copy(h_keys_copy.begin(), h_keys_copy.end(), h_keys.begin()); $Sort(h_keys.begin(), h_keys.end()); """ FINALIZE = \ """ RECORD_TIME(); RECORD_SORTING_RATE(double($InputSize)); """ KeyTypes = ['int'] InputSizes = [2**20] Sorts = ['thrust::sort', 'thrust::stable_sort', 'std::sort', 'std::stable_sort'] TestVariables = [('KeyType', KeyTypes), ('InputSize', InputSizes), ('Sort', Sorts)] 
thrust-1.9.5/performance/host_sort_by_key.test000066400000000000000000000016021344621116200216460ustar00rootroot00000000000000PREAMBLE = \ """ #include """ INITIALIZE = \ """ thrust::host_vector<$KeyType> h_keys = unittest::random_integers<$KeyType>($InputSize); thrust::host_vector<$KeyType> h_keys_copy(h_keys); thrust::host_vector<$KeyType> h_values($InputSize); // test sort $Sort(h_keys.begin(), h_keys.end(), h_values.begin()); ASSERT_EQUAL(thrust::is_sorted(h_keys.begin(), h_keys.end()), true); """ TIME = \ """ thrust::copy(h_keys_copy.begin(), h_keys_copy.end(), h_keys.begin()); $Sort(h_keys.begin(), h_keys.end(), h_values.begin()); """ FINALIZE = \ """ RECORD_TIME(); RECORD_SORTING_RATE(double($InputSize)); """ KeyTypes = ['int'] InputSizes = [2**20] Sorts = ['thrust::sort_by_key', 'thrust::stable_sort_by_key'] TestVariables = [('KeyType', KeyTypes), ('InputSize', InputSizes), ('Sort', Sorts)] thrust-1.9.5/performance/inclusive_scan.test000066400000000000000000000017101344621116200212650ustar00rootroot00000000000000PREAMBLE = \ """ #include """ INITIALIZE = \ """ thrust::host_vector<$InputType> h_input = unittest::random_integers<$InputType>($InputSize); thrust::device_vector<$InputType> d_input = h_input; thrust::host_vector<$InputType> h_output($InputSize); thrust::device_vector<$InputType> d_output($InputSize); thrust::inclusive_scan(h_input.begin(), h_input.end(), h_output.begin()); thrust::inclusive_scan(d_input.begin(), d_input.end(), d_output.begin()); ASSERT_EQUAL(h_output, d_output); """ TIME = \ """ thrust::inclusive_scan(d_input.begin(), d_input.end(), d_output.begin()); """ FINALIZE = \ """ RECORD_TIME(); RECORD_THROUGHPUT(double($InputSize)); RECORD_BANDWIDTH(4*sizeof($InputType)*double($InputSize)); """ InputTypes = SignedIntegerTypes InputSizes = [2**24] #StandardSizes TestVariables = [('InputType', InputTypes), ('InputSize', InputSizes)] 
thrust-1.9.5/performance/inclusive_scan_by_key.test000066400000000000000000000031411344621116200226270ustar00rootroot00000000000000PREAMBLE = \ """ #include """ INITIALIZE = \ """ thrust::host_vector<$ValueType> h_values = unittest::random_integers<$ValueType>($InputSize); thrust::device_vector<$ValueType> d_values = h_values; thrust::host_vector<$ValueType> h_output($InputSize); thrust::device_vector<$ValueType> d_output($InputSize); srand(13); thrust::host_vector<$KeyType> h_keys($InputSize); for(size_t i = 0, k = 0; i < $InputSize; i++) { h_keys[i] = k; if (rand() % 50 == 0) k++; } thrust::device_vector<$KeyType> d_keys = h_keys; thrust::inclusive_scan_by_key(h_keys.begin(), h_keys.end(), h_values.begin(), h_output.begin()); thrust::inclusive_scan_by_key(d_keys.begin(), d_keys.end(), d_values.begin(), d_output.begin()); ASSERT_EQUAL(h_output, d_output); """ TIME = \ """ thrust::inclusive_scan_by_key(d_keys.begin(), d_keys.end(), d_values.begin(), d_output.begin()); """ FINALIZE = \ """ RECORD_TIME(); RECORD_THROUGHPUT(double($InputSize)); RECORD_BANDWIDTH(4*(sizeof($KeyType) + sizeof($ValueType))*double($InputSize)); """ KeyTypes = ['int'] #SignedIntegerTypes ValueTypes = SignedIntegerTypes InputSizes = [2**24] #StandardSizes TestVariables = [('KeyType', KeyTypes), ('ValueType', ValueTypes), ('InputSize', InputSizes)] thrust-1.9.5/performance/indirect_sort.test000066400000000000000000000050211344621116200211270ustar00rootroot00000000000000PREAMBLE = \ """ #include template struct indirect_comp { RandomAccessIterator first; StrictWeakOrdering comp; indirect_comp(RandomAccessIterator first, StrictWeakOrdering comp) : first(first), comp(comp) {} template __host__ __device__ bool operator()(IndexType a, IndexType b) { return comp(thrust::raw_reference_cast(first[a]), thrust::raw_reference_cast(first[b])); } }; template void indirect_sort(RandomAccessIterator first, RandomAccessIterator last, StrictWeakOrdering comp) { typedef typename 
thrust::iterator_traits::value_type T; // todo initialize vector in one step thrust::device_vector permutation(last - first); thrust::sequence(permutation.begin(), permutation.end()); thrust::stable_sort(permutation.begin(), permutation.end(), indirect_comp(first, comp)); thrust::device_vector temp(first, last); thrust::gather(permutation.begin(), permutation.end(), temp.begin(), first); } """ INITIALIZE = \ """ typedef FixedVector KeyType; const size_t N = $InputSize / sizeof(KeyType); thrust::host_vector h_keys(N); for(size_t i = 0; i < h_keys.size(); i++) h_keys[i] = KeyType(rand()); thrust::device_vector d_keys = h_keys; thrust::device_vector d_keys_copy = d_keys; thrust::less comp; // test sort thrust::stable_sort(h_keys.begin(), h_keys.end()); $Sort(d_keys.begin(), d_keys.end(), comp); ASSERT_EQUAL_QUIET(h_keys, d_keys); """ TIME = \ """ thrust::copy(d_keys_copy.begin(), d_keys_copy.end(), d_keys.begin()); $Sort(d_keys.begin(), d_keys.end(), comp); """ FINALIZE = \ """ RECORD_TIME(); RECORD_SORTING_RATE(double($InputSize)); """ VectorLengths = [2**N for N in range(1,14)] Sorts = ['indirect_sort'] #VectorLengths = range(1,9) #Sorts = ['indirect_sort', 'thrust::stable_sort'] InputSizes = [2**24] TestVariables = [('VectorLength', VectorLengths), ('Sort', Sorts), ('InputSize', InputSizes)] thrust-1.9.5/performance/inner_product.test000066400000000000000000000021221344621116200211310ustar00rootroot00000000000000PREAMBLE = \ """ #include """ INITIALIZE = \ """ thrust::host_vector<$InputType> h_input1 = unittest::random_integers<$InputType>($InputSize); thrust::host_vector<$InputType> h_input2 = unittest::random_integers<$InputType>($InputSize); thrust::device_vector<$InputType> d_input1 = h_input1; thrust::device_vector<$InputType> d_input2 = h_input2; $InputType init = 13; $InputType h_result = thrust::inner_product(h_input1.begin(), h_input1.end(), h_input2.begin(), init); $InputType d_result = thrust::inner_product(d_input1.begin(), d_input1.end(), 
d_input2.begin(), init); ASSERT_EQUAL(h_result, d_result); """ TIME = \ """ thrust::inner_product(d_input1.begin(), d_input1.end(), d_input2.begin(), init); """ FINALIZE = \ """ RECORD_TIME(); RECORD_THROUGHPUT(2 * double($InputSize)); RECORD_BANDWIDTH(2 * sizeof($InputType) * double($InputSize)); """ InputTypes = SignedIntegerTypes InputSizes = StandardSizes TestVariables = [('InputType', InputTypes), ('InputSize', InputSizes)] thrust-1.9.5/performance/merge.test000066400000000000000000000024001344621116200173540ustar00rootroot00000000000000PREAMBLE = \ """ #include #include """ INITIALIZE = \ """ thrust::device_vector<$InputType> d_a = unittest::random_integers<$InputType>($InputSize); thrust::device_vector<$InputType> d_b = unittest::random_integers<$InputType>($InputSize); thrust::sort(d_a.begin(), d_a.end()); thrust::sort(d_b.begin(), d_b.end()); thrust::device_vector<$InputType> d_sorted; d_sorted.insert(d_sorted.end(), d_a.begin(), d_a.end()); d_sorted.insert(d_sorted.end(), d_b.begin(), d_b.end()); thrust::stable_sort(d_sorted.begin(), d_sorted.end()); thrust::device_vector<$InputType> d_result(d_a.size() + d_b.size()); thrust::merge(d_a.begin(), d_a.end(), d_b.begin(), d_b.end(), d_result.begin()); ASSERT_EQUAL(d_sorted, d_result); """ TIME = \ """ thrust::merge(d_a.begin(), d_a.end(), d_b.begin(), d_b.end(), d_result.begin()); """ FINALIZE = \ """ RECORD_TIME(); RECORD_BANDWIDTH(4 * sizeof($InputType) * double($InputSize)); RECORD_SORTING_RATE(2 * double($InputSize)); """ InputTypes = ['char', 'short', 'int', 'long', 'float', 'double'] InputSizes = [2**N for N in range(10, 25)] TestVariables = [('InputType', InputTypes), ('InputSize', InputSizes)] thrust-1.9.5/performance/merge_sort.test000066400000000000000000000021241344621116200204260ustar00rootroot00000000000000PREAMBLE = \ """ #include template struct my_less { __host__ __device__ bool operator()(const T &x, const T &y) const { return x < y; } }; """ INITIALIZE = \ """ thrust::host_vector<$KeyType> 
h_keys = unittest::random_integers<$KeyType>($InputSize); thrust::device_vector<$KeyType> d_keys = h_keys; thrust::device_vector<$KeyType> d_keys_copy = d_keys; // test sort thrust::stable_sort(h_keys.begin(), h_keys.end()); thrust::stable_sort(d_keys.begin(), d_keys.end(), my_less<$KeyType>()); ASSERT_EQUAL(d_keys, h_keys); """ TIME = \ """ thrust::copy(d_keys_copy.begin(), d_keys_copy.end(), d_keys.begin()); thrust::stable_sort(d_keys.begin(), d_keys.end(), my_less<$KeyType>()); """ FINALIZE = \ """ RECORD_TIME(); RECORD_SORTING_RATE(double($InputSize)); """ KeyTypes = ['char', 'short', 'int', 'long', 'float', 'double'] InputSizes = [2**N for N in range(18, 25)] TestVariables = [('KeyType', KeyTypes), ('InputSize', InputSizes)] thrust-1.9.5/performance/min_index.test000066400000000000000000000040161344621116200202340ustar00rootroot00000000000000PREAMBLE = \ """ #include #include #include #include using namespace thrust; struct smaller_tuple { __host__ __device__ tuple operator()(tuple a, tuple b) { if (a < b) return a; else return b; } }; int min_index_slow(device_vector& values) { device_vector indices(values.size()); sequence(indices.begin(), indices.end()); tuple init(values[0],0); tuple smallest = reduce(make_zip_iterator(make_tuple(values.begin(), indices.begin())), make_zip_iterator(make_tuple(values.end(), indices.end())), init, smaller_tuple()); return get<1>(smallest); } int min_index_fast(device_vector& values) { counting_iterator begin(0); counting_iterator end(values.size()); tuple init(values[0],0); tuple smallest = reduce(make_zip_iterator(make_tuple(values.begin(), begin)), make_zip_iterator(make_tuple(values.end(), end)), init, smaller_tuple()); return get<1>(smallest); } """ INITIALIZE = \ """ thrust::host_vector h_input = unittest::random_integers($InputSize); thrust::device_vector d_input = h_input; """ TIME = \ """ $Function(d_input); """ FINALIZE = \ """ RECORD_TIME(); RECORD_THROUGHPUT(double($InputSize)); RECORD_BANDWIDTH(sizeof(float) * 
double($InputSize)); """ Functions = ['min_index_slow','min_index_fast'] InputSizes = [2**22] TestVariables = [('Function',Functions), ('InputSize', InputSizes)] thrust-1.9.5/performance/nrm2.test000066400000000000000000000033371344621116200171450ustar00rootroot00000000000000PREAMBLE = \ """ #include #include #include #include #include template struct square { __host__ __device__ T operator()(T x) const { return x * x; } }; template typename Vector::value_type nrm2_fast(const Vector& x) { typedef typename Vector::value_type T; return std::sqrt( thrust::transform_reduce(x.begin(), x.end(), square(), T(0), thrust::plus()) ); } template typename Vector::value_type nrm2_slow(const Vector& x) { typedef typename Vector::value_type T; Vector temp(x.size()); // temp <- x * x thrust::transform(x.begin(), x.end(), temp.begin(), square()); return std::sqrt( thrust::reduce(temp.begin(), temp.end()) ); } """ INITIALIZE = \ """ thrust::host_vector<$InputType> h_input = unittest::random_integers($InputSize); thrust::device_vector<$InputType> d_input = h_input; $InputType h_result = $Method(h_input); $InputType d_result = $Method(d_input); ASSERT_EQUAL(std::abs(h_result - d_result) / std::abs(h_result + d_result) < 1e-3, true); """ TIME = \ """ $Method(d_input); """ FINALIZE = \ """ RECORD_TIME(); RECORD_THROUGHPUT(double($InputSize)); RECORD_BANDWIDTH(sizeof($InputType) * double($InputSize)); """ InputTypes = ['float', 'double'] InputSizes = [2**24] Methods = ['nrm2_fast', 'nrm2_slow'] TestVariables = [('InputType', InputTypes), ('InputSize', InputSizes), ('Method', Methods)] thrust-1.9.5/performance/radix_sort.test000066400000000000000000000015721344621116200204440ustar00rootroot00000000000000PREAMBLE = \ """ #include """ INITIALIZE = \ """ thrust::host_vector<$KeyType> h_keys = unittest::random_integers<$KeyType>($InputSize); thrust::device_vector<$KeyType> d_keys = h_keys; thrust::device_vector<$KeyType> d_keys_copy = d_keys; // test sort thrust::stable_sort(h_keys.begin(), 
                    h_keys.end());
thrust::stable_sort(d_keys.begin(), d_keys.end());

ASSERT_EQUAL(d_keys, h_keys);
"""

TIME = \
"""
thrust::copy(d_keys_copy.begin(), d_keys_copy.end(), d_keys.begin());
thrust::stable_sort(d_keys.begin(), d_keys.end());
"""

FINALIZE = \
"""
RECORD_TIME();
RECORD_SORTING_RATE(double($InputSize));
"""

KeyTypes = ['char', 'short', 'int', 'long', 'float', 'double']
InputSizes = [2**N for N in range(18, 25)]

TestVariables = [('KeyType', KeyTypes), ('InputSize', InputSizes)]

thrust-1.9.5/performance/radix_sort_bits.test

PREAMBLE = \
"""
#include
"""

INITIALIZE = \
"""
const size_t InputSize = 1 << 24;
thrust::host_vector<$KeyType> h_keys = unittest::random_integers<$KeyType>(InputSize);

// set upper bits to zero
for(size_t i = 0; i < InputSize; i++)
  h_keys[i] >>= (32 - $KeyBits);

thrust::device_vector<$KeyType> d_keys = h_keys;
thrust::device_vector<$KeyType> d_keys_copy = d_keys;

// test sort
thrust::stable_sort(h_keys.begin(), h_keys.end());
thrust::stable_sort(d_keys.begin(), d_keys.end());

ASSERT_EQUAL(d_keys, h_keys);
"""

TIME = \
"""
thrust::copy(d_keys_copy.begin(), d_keys_copy.end(), d_keys.begin());
thrust::stable_sort(d_keys.begin(), d_keys.end());
"""

FINALIZE = \
"""
RECORD_TIME();
RECORD_SORTING_RATE(double(InputSize));
"""

KeyTypes = ['unsigned int']
KeyBits = range(1, 33)

TestVariables = [('KeyType', KeyTypes), ('KeyBits',KeyBits)]

thrust-1.9.5/performance/radix_sort_by_key.test

PREAMBLE = \
"""
#include
#include
"""

INITIALIZE = \
"""
thrust::host_vector<$KeyType> h_keys = unittest::random_integers<$KeyType>($InputSize);
thrust::device_vector<$KeyType> d_keys = h_keys;

thrust::host_vector<$ValueType> h_values($InputSize);
thrust::device_vector<$ValueType> d_values($InputSize);
thrust::sequence(h_values.begin(), h_values.end());
thrust::sequence(d_values.begin(), d_values.end());
thrust::device_vector<$KeyType> d_keys_copy = d_keys;

// test sort
thrust::stable_sort_by_key(h_keys.begin(), h_keys.end(), h_values.begin());
thrust::stable_sort_by_key(d_keys.begin(), d_keys.end(), d_values.begin());

ASSERT_EQUAL(d_keys, h_keys);
ASSERT_EQUAL(d_values, h_values);
"""

TIME = \
"""
thrust::copy(d_keys_copy.begin(), d_keys_copy.end(), d_keys.begin());
thrust::stable_sort_by_key(d_keys.begin(), d_keys.end(), d_values.begin());
"""

FINALIZE = \
"""
RECORD_TIME();
RECORD_SORTING_RATE(double($InputSize));
"""

KeyTypes = ['char', 'short', 'int', 'long long', 'float', 'double']
ValueTypes = ['unsigned int']
InputSizes = StandardSizes

TestVariables = [('KeyType', KeyTypes), ('ValueType', ValueTypes), ('InputSize', InputSizes)]

thrust-1.9.5/performance/reduce.test

PREAMBLE = \
"""
#include
"""

INITIALIZE = \
"""
thrust::host_vector<$InputType> h_input = unittest::random_integers<$InputType>($InputSize);
thrust::device_vector<$InputType> d_input = h_input;

$InputType init = 13;

$InputType h_result = thrust::reduce(h_input.begin(), h_input.end(), init);
$InputType d_result = thrust::reduce(d_input.begin(), d_input.end(), init);

ASSERT_EQUAL(h_result, d_result);
"""

TIME = \
"""
thrust::reduce(d_input.begin(), d_input.end(), init);
"""

FINALIZE = \
"""
RECORD_TIME();
RECORD_THROUGHPUT(double($InputSize));
RECORD_BANDWIDTH(sizeof($InputType) * double($InputSize));
"""

InputTypes = SignedIntegerTypes
InputSizes = StandardSizes

TestVariables = [('InputType', InputTypes), ('InputSize', InputSizes)]

thrust-1.9.5/performance/reduce_by_key.test

PREAMBLE = \
"""
#include
#include
"""

INITIALIZE = \
"""
thrust::host_vector<$ValueType> h_values = unittest::random_integers<$ValueType>($InputSize);
thrust::device_vector<$ValueType> d_values = h_values;

thrust::host_vector<$KeyType> h_keys_result($InputSize);
thrust::host_vector<$ValueType> h_values_result($InputSize);
thrust::device_vector<$KeyType> d_keys_result($InputSize);
thrust::device_vector<$ValueType> d_values_result($InputSize);

thrust::default_random_engine rng(13);
thrust::host_vector<$KeyType> h_keys($InputSize);
for(size_t i = 0, k = 0; i < $InputSize; i++)
{
  h_keys[i] = k;
  if(rng() % 50 == 0)
    k++;
}
thrust::device_vector<$KeyType> d_keys = h_keys;

thrust::pair< thrust::host_vector<$KeyType>::iterator, thrust::host_vector<$ValueType>::iterator > h_end =
  thrust::reduce_by_key(h_keys.begin(), h_keys.end(), h_values.begin(), h_keys_result.begin(), h_values_result.begin());
h_keys_result.erase(h_end.first, h_keys_result.end());

thrust::pair< thrust::device_vector<$KeyType>::iterator, thrust::device_vector<$ValueType>::iterator > d_end =
  thrust::reduce_by_key(d_keys.begin(), d_keys.end(), d_values.begin(), d_keys_result.begin(), d_values_result.begin());
d_keys_result.erase(d_end.first, d_keys_result.end());

ASSERT_EQUAL(h_keys_result, d_keys_result);
ASSERT_EQUAL(h_values_result, d_values_result);
"""

TIME = \
"""
thrust::reduce_by_key(d_keys.begin(), d_keys.end(), d_values.begin(), d_keys_result.begin(), d_values_result.begin());
"""

FINALIZE = \
"""
RECORD_TIME();
RECORD_THROUGHPUT(double($InputSize));
RECORD_BANDWIDTH(sizeof($KeyType) * double(d_keys.size() + d_keys_result.size()) +
                 sizeof($ValueType) * double(d_values.size() + d_values_result.size()));
"""

KeyTypes = ['int'] #SignedIntegerTypes
ValueTypes = SignedIntegerTypes
InputSizes = [2**24] #StandardSizes

TestVariables = [('KeyType', KeyTypes), ('ValueType', ValueTypes),('InputSize', InputSizes)]

thrust-1.9.5/performance/reduce_float.test

PREAMBLE = \
"""
#include
"""

INITIALIZE = \
"""
thrust::host_vector<$InputType> h_input = unittest::random_samples<$InputType>($InputSize);
thrust::device_vector<$InputType> d_input = h_input;

$InputType init = 13;
"""

TIME = \
"""
thrust::reduce(d_input.begin(), d_input.end(), init);
"""

FINALIZE = \
"""
RECORD_TIME();
RECORD_THROUGHPUT(double($InputSize));
RECORD_BANDWIDTH(sizeof($InputType) * double($InputSize));
"""

InputTypes = ['float']
InputSizes = [int(2**(k/2.0)) for k in range(42,56)]

TestVariables = [('InputType', InputTypes), ('InputSize', InputSizes)]

thrust-1.9.5/performance/report.py

from build import plot_results, print_results

#valid formats are png, pdf, ps, eps and svg
#if format=None the plot will be displayed
format = 'png'

output = print_results
#output = plot_results

for function in ['fill', 'reduce', 'inner_product', 'gather', 'merge']:
    output(function + '.xml', 'InputType', 'InputSize', 'Bandwidth', format=format)

for function in ['inclusive_scan', 'inclusive_segmented_scan', 'unique']:
    output(function + '.xml', 'InputType', 'InputSize', 'Throughput', format=format)

for method in ['indirect_sort']:
    output(method + '.xml', 'Sort', 'VectorLength', 'Time', plot='semilogx', title='Indirect Sorting', format=format)

for method in ['sort', 'comparison_sort', 'radix_sort']:
    output(method + '.xml', 'KeyType', 'InputSize', 'Sorting', title='thrust::' + method, format=format)
    output(method + '_by_key.xml', 'KeyType', 'InputSize', 'Sorting', title='thrust::' + method + '_by_key', format=format)

for method in ['set_difference', 'set_intersection', 'set_symmetric_difference', 'set_union']:
    output(method + '.xml', 'InputType', 'InputSize', 'Sorting', title='thrust::' + method, format=format)

output('stl_sort.xml', 'KeyType', 'InputSize', 'Sorting', title='std::sort', format=format)

for method in ['radix_sort']:
    output(method + '_bits.xml', 'KeyType', 'KeyBits', 'Sorting', title='thrust::' + method, plot='plot', dpi=72, format=format)

for format in ['png', 'pdf']:
    output('reduce_float.xml', 'InputType', 'InputSize', 'Bandwidth', dpi=120, plot='semilogx', title='thrust::reduce()', format=format)
    output('sort_large.xml', 'KeyType', 'InputSize', 'Sorting', dpi=120, plot='semilogx', title='thrust::sort()', format=format)

thrust-1.9.5/performance/set_difference.test

PREAMBLE = \
"""
#include
#include
#include
"""

INITIALIZE = \
"""
thrust::host_vector<$InputType> h_a = unittest::random_integers<$InputType>($InputSize);
thrust::host_vector<$InputType> h_b = unittest::random_integers<$InputType>($InputSize);
thrust::sort(h_a.begin(), h_a.end());
thrust::sort(h_b.begin(), h_b.end());

thrust::host_vector<$InputType> h_result(h_a.size());
thrust::host_vector<$InputType>::iterator new_end =
  thrust::set_difference(h_a.begin(), h_a.end(), h_b.begin(), h_b.end(), h_result.begin());
h_result.resize(new_end - h_result.begin());

thrust::device_vector<$InputType> d_a = h_a, d_b = h_b;
thrust::device_vector<$InputType> d_result(h_result.size());
thrust::set_difference(d_a.begin(), d_a.end(), d_b.begin(), d_b.end(), d_result.begin());

ASSERT_EQUAL(h_result, d_result);
"""

TIME = \
"""
thrust::set_difference(d_a.begin(), d_a.end(), d_b.begin(), d_b.end(), d_result.begin());
"""

FINALIZE = \
"""
RECORD_TIME();
RECORD_BANDWIDTH((2 * double($InputSize) + d_result.size()) * sizeof($InputType));
RECORD_SORTING_RATE(2 * double($InputSize));
"""

InputTypes = ['char', 'short', 'int', 'long', 'float', 'double']
InputSizes = [2**N for N in range(10, 25)]

TestVariables = [('InputType', InputTypes), ('InputSize', InputSizes)]

thrust-1.9.5/performance/set_intersection.test

PREAMBLE = \
"""
#include
#include
#include
"""

INITIALIZE = \
"""
thrust::host_vector<$InputType> h_a = unittest::random_integers<$InputType>($InputSize);
thrust::host_vector<$InputType> h_b = unittest::random_integers<$InputType>($InputSize);
thrust::sort(h_a.begin(), h_a.end());
thrust::sort(h_b.begin(), h_b.end());

thrust::host_vector<$InputType>
  h_result(h_a.size());
thrust::host_vector<$InputType>::iterator new_end =
  thrust::set_intersection(h_a.begin(), h_a.end(), h_b.begin(), h_b.end(), h_result.begin());
h_result.resize(new_end - h_result.begin());

thrust::device_vector<$InputType> d_a = h_a, d_b = h_b;
thrust::device_vector<$InputType> d_result(h_result.size());
thrust::set_intersection(d_a.begin(), d_a.end(), d_b.begin(), d_b.end(), d_result.begin());

ASSERT_EQUAL(h_result, d_result);
"""

TIME = \
"""
thrust::set_intersection(d_a.begin(), d_a.end(), d_b.begin(), d_b.end(), d_result.begin());
"""

FINALIZE = \
"""
RECORD_TIME();
RECORD_BANDWIDTH((2 * double($InputSize) + d_result.size()) * sizeof($InputType));
RECORD_SORTING_RATE(2 * double($InputSize));
"""

InputTypes = ['char', 'short', 'int', 'long', 'float', 'double']
InputSizes = [2**N for N in range(10, 25)]

TestVariables = [('InputType', InputTypes), ('InputSize', InputSizes)]

thrust-1.9.5/performance/set_symmetric_difference.test

PREAMBLE = \
"""
#include
#include
#include
"""

INITIALIZE = \
"""
thrust::host_vector<$InputType> h_a = unittest::random_integers<$InputType>($InputSize);
thrust::host_vector<$InputType> h_b = unittest::random_integers<$InputType>($InputSize);
thrust::sort(h_a.begin(), h_a.end());
thrust::sort(h_b.begin(), h_b.end());

// the symmetric difference can contain up to h_a.size() + h_b.size() elements
thrust::host_vector<$InputType> h_result(h_a.size() + h_b.size());
thrust::host_vector<$InputType>::iterator new_end =
  thrust::set_symmetric_difference(h_a.begin(), h_a.end(), h_b.begin(), h_b.end(), h_result.begin());
h_result.resize(new_end - h_result.begin());

thrust::device_vector<$InputType> d_a = h_a, d_b = h_b;
thrust::device_vector<$InputType> d_result(h_result.size());
thrust::set_symmetric_difference(d_a.begin(), d_a.end(), d_b.begin(), d_b.end(), d_result.begin());

ASSERT_EQUAL(h_result, d_result);
"""

TIME = \
"""
thrust::set_symmetric_difference(d_a.begin(), d_a.end(), d_b.begin(), d_b.end(), d_result.begin());
"""

FINALIZE = \
"""
RECORD_TIME();
RECORD_BANDWIDTH((2 * double($InputSize) + d_result.size()) * sizeof($InputType));
RECORD_SORTING_RATE(2 * double($InputSize));
"""

InputTypes = ['char', 'short', 'int', 'long', 'float', 'double']
InputSizes = [2**N for N in range(10, 25)]

TestVariables = [('InputType', InputTypes), ('InputSize', InputSizes)]

thrust-1.9.5/performance/set_union.test

PREAMBLE = \
"""
#include
#include
"""

INITIALIZE = \
"""
thrust::host_vector<$InputType> h_a = unittest::random_integers<$InputType>($InputSize);
thrust::host_vector<$InputType> h_b = unittest::random_integers<$InputType>($InputSize);
thrust::sort(h_a.begin(), h_a.end());
thrust::sort(h_b.begin(), h_b.end());

thrust::host_vector<$InputType> h_result(h_a.size() + h_b.size());
thrust::host_vector<$InputType>::iterator h_new_end =
  thrust::set_union(h_a.begin(), h_a.end(), h_b.begin(), h_b.end(), h_result.begin());
h_result.resize(h_new_end - h_result.begin());

thrust::device_vector<$InputType> d_a = h_a, d_b = h_b;
thrust::device_vector<$InputType> d_result(d_a.size() + d_b.size());
thrust::device_vector<$InputType>::iterator d_new_end =
  thrust::set_union(d_a.begin(), d_a.end(), d_b.begin(), d_b.end(), d_result.begin());
d_result.resize(d_new_end - d_result.begin());

ASSERT_EQUAL(h_result, d_result);
"""

TIME = \
"""
thrust::set_union(d_a.begin(), d_a.end(), d_b.begin(), d_b.end(), d_result.begin());
"""

FINALIZE = \
"""
RECORD_TIME();
RECORD_BANDWIDTH(sizeof($InputType) * double(d_a.size() + d_b.size() + d_result.size()));
RECORD_SORTING_RATE(2 * double($InputSize));
"""

InputTypes = ['char', 'short', 'int', 'long', 'float', 'double']
InputSizes = [2**N for N in range(10, 25)]

TestVariables = [('InputType', InputTypes), ('InputSize', InputSizes)]

thrust-1.9.5/performance/sort.test

PREAMBLE = \
"""
#include
"""

INITIALIZE = \
"""
thrust::host_vector<$KeyType> h_keys = unittest::random_integers<$KeyType>($InputSize);
thrust::device_vector<$KeyType> d_keys = h_keys;
thrust::device_vector<$KeyType> d_keys_copy = d_keys;

// test sort
thrust::sort(h_keys.begin(), h_keys.end());
thrust::sort(d_keys.begin(), d_keys.end());

ASSERT_EQUAL(d_keys, h_keys);
"""

TIME = \
"""
thrust::copy(d_keys_copy.begin(), d_keys_copy.end(), d_keys.begin());
thrust::sort(d_keys.begin(), d_keys.end());
"""

FINALIZE = \
"""
RECORD_TIME();
RECORD_SORTING_RATE(double($InputSize));
"""

KeyTypes = SignedIntegerTypes
InputSizes = StandardSizes

TestVariables = [('KeyType', KeyTypes), ('InputSize', InputSizes)]

thrust-1.9.5/performance/sort_by_key.test

PREAMBLE = \
"""
#include
#include
"""

INITIALIZE = \
"""
thrust::host_vector<$KeyType> h_keys = unittest::random_integers<$KeyType>($InputSize);
thrust::device_vector<$KeyType> d_keys = h_keys;

thrust::host_vector<$ValueType> h_values($InputSize);
thrust::device_vector<$ValueType> d_values($InputSize);
thrust::sequence(h_values.begin(), h_values.end());
thrust::sequence(d_values.begin(), d_values.end());

thrust::device_vector<$KeyType> d_keys_copy = d_keys;

// test sort
thrust::sort_by_key(h_keys.begin(), h_keys.end(), h_values.begin());
thrust::sort_by_key(d_keys.begin(), d_keys.end(), d_values.begin());

ASSERT_EQUAL(d_keys, h_keys);
ASSERT_EQUAL(d_values, h_values);
"""

TIME = \
"""
thrust::copy(d_keys_copy.begin(), d_keys_copy.end(), d_keys.begin());
thrust::sort_by_key(d_keys.begin(), d_keys.end(), d_values.begin());
"""

FINALIZE = \
"""
RECORD_TIME();
RECORD_SORTING_RATE(double($InputSize));
"""

KeyTypes = ['char', 'short', 'int', 'long long', 'float', 'double']
ValueTypes = ['unsigned int']
InputSizes = StandardSizes

TestVariables = [('KeyType', KeyTypes), ('ValueType', ValueTypes), ('InputSize', InputSizes)]
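Each `.test` file above is a template: the `$KeyType`, `$ValueType`, `$InputSize` and similar placeholders in `PREAMBLE`, `INITIALIZE`, `TIME` and `FINALIZE` are filled in from `TestVariables`. The harness that performs this substitution is not part of this excerpt. The sketch below therefore rests on two assumptions: the harness produces one benchmark instance per combination of the listed values, and the `$Name` syntax behaves like Python's `string.Template`. The `expand` helper and the reduced variable lists are hypothetical.

```python
import itertools
from string import Template

# Hypothetical excerpt of an INITIALIZE block in the style of sort_by_key.test,
# with reduced variable lists so that the expansion stays small.
INITIALIZE = """
thrust::host_vector<$KeyType> h_keys = unittest::random_integers<$KeyType>($InputSize);
thrust::host_vector<$ValueType> h_values($InputSize);
"""

KeyTypes = ['char', 'int']
ValueTypes = ['unsigned int']
InputSizes = [2**20]

TestVariables = [('KeyType', KeyTypes), ('ValueType', ValueTypes), ('InputSize', InputSizes)]


def expand(template_text, test_variables):
    """Yield (bindings, source) for every combination of the test variables.

    Assumes one instance per element of the Cartesian product of the value
    lists, iterated in the order in which the variables are listed.
    """
    names = [name for name, _ in test_variables]
    value_lists = [values for _, values in test_variables]
    for combo in itertools.product(*value_lists):
        bindings = dict(zip(names, combo))
        # substitute() converts each value with str() and raises KeyError
        # if a placeholder in the template has no binding
        yield bindings, Template(template_text).substitute(bindings)
```

With these inputs, `list(expand(INITIALIZE, TestVariables))` yields two instances: one with `KeyType` bound to `char` and one with it bound to `int`. `$InputSize` becomes `1048576` in both.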
thrust-1.9.5/performance/sort_large.test

PREAMBLE = \
"""
#include

template
struct my_less : public thrust::binary_function
{
  __host__ __device__
  bool operator()(const T& a, const T& b) const
  {
    return a < b;
  }
};
"""

INITIALIZE = \
"""
thrust::host_vector<$KeyType> h_keys = unittest::random_integers<$KeyType>($InputSize);
thrust::device_vector<$KeyType> d_keys = h_keys;
thrust::device_vector<$KeyType> d_keys_copy = d_keys;

typedef my_less<$KeyType> Comp;

// test sort
thrust::sort(h_keys.begin(), h_keys.end(), Comp());
thrust::sort(d_keys.begin(), d_keys.end(), Comp());

ASSERT_EQUAL(d_keys, h_keys);
"""

TIME = \
"""
thrust::copy(d_keys_copy.begin(), d_keys_copy.end(), d_keys.begin());
thrust::sort(d_keys.begin(), d_keys.end(), Comp());
"""

FINALIZE = \
"""
RECORD_TIME();
RECORD_SORTING_RATE(double($InputSize));
"""

KeyTypes = ['int']
InputSizes = [2**24]

TestVariables = [('KeyType', KeyTypes), ('InputSize', InputSizes)]

thrust-1.9.5/performance/stl_sort.test

PREAMBLE = \
"""
#include
"""

INITIALIZE = \
"""
thrust::host_vector<$KeyType> h_keys = unittest::random_integers<$KeyType>($InputSize);
thrust::host_vector<$KeyType> h_keys_copy = h_keys;
"""

TIME = \
"""
std::copy(h_keys_copy.begin(), h_keys_copy.end(), h_keys.begin());
std::sort(h_keys.begin(), h_keys.end());
"""

FINALIZE = \
"""
RECORD_TIME();
RECORD_SORTING_RATE(double($InputSize));
"""

KeyTypes = ['char', 'short', 'int', 'long', 'float', 'double']
InputSizes = [2**N for N in range(10, 25)]

TestVariables = [('KeyType', KeyTypes), ('InputSize', InputSizes)]

thrust-1.9.5/performance/unique.test

PREAMBLE = \
"""
#include
"""

INITIALIZE = \
"""
thrust::host_vector<$InputType> h_input = unittest::random_integers<$InputType>($InputSize);

// increase likelihood of equal consecutive elements
for(size_t i = 0; i < $InputSize; i++)
  h_input[i] %= 4;

thrust::device_vector<$InputType> d_input = h_input;
thrust::device_vector<$InputType> d_copy = d_input;

thrust::host_vector<$InputType>::iterator h_end = thrust::unique(h_input.begin(), h_input.end());
thrust::device_vector<$InputType>::iterator d_end = thrust::unique(d_input.begin(), d_input.end());

thrust::host_vector<$InputType> h_result(h_input.begin(), h_end);
thrust::device_vector<$InputType> d_result(d_input.begin(), d_end);

ASSERT_EQUAL(h_result, d_result);
"""

TIME = \
"""
thrust::copy(d_copy.begin(), d_copy.end(), d_input.begin());
thrust::unique(d_input.begin(), d_input.end());
"""

FINALIZE = \
"""
RECORD_TIME();
RECORD_THROUGHPUT(double($InputSize));
"""

InputTypes = SignedIntegerTypes
InputSizes = StandardSizes

TestVariables = [('InputType', InputTypes), ('InputSize', InputSizes)]

thrust-1.9.5/site_scons/

thrust-1.9.5/site_scons/site_tools/

thrust-1.9.5/site_scons/site_tools/clang.py

"""SCons.Tool.clang

Tool-specific initialization for Clang as CUDA Compiler.

There normally shouldn't be any need to import this module directly.
It will usually be imported through the generic SCons.Tool.Tool()
selection method.

"""

import SCons.Tool
import SCons.Scanner.C
import SCons.Defaults
import os
import platform


def get_cuda_paths(env):
    """Determines CUDA {bin,lib,include} paths

    returns (cuda_path,bin_path,lib_path,inc_path)
    """

    cuda_path = env['cuda_path']

    # determine defaults
    if os.name == 'posix':
        bin_path = cuda_path + '/bin'
        lib_path = cuda_path + '/lib'
        inc_path = cuda_path + '/include'
    else:
        raise ValueError, 'Error: unknown OS. Where is CUDA installed?'
    if platform.machine()[-2:] == '64':
        lib_path += '64'

    # override with environment variables
    if 'CUDA_BIN_PATH' in os.environ:
        bin_path = os.path.abspath(os.environ['CUDA_BIN_PATH'])
    if 'CUDA_LIB_PATH' in os.environ:
        lib_path = os.path.abspath(os.environ['CUDA_LIB_PATH'])
    if 'CUDA_INC_PATH' in os.environ:
        inc_path = os.path.abspath(os.environ['CUDA_INC_PATH'])

    return (cuda_path,bin_path,lib_path,inc_path)


CUDASuffixes = ['.cu']

# make a CUDAScanner for finding #includes
# cuda uses the c preprocessor, so we can use the CScanner
CUDAScanner = SCons.Scanner.C.CScanner()


def add_common_clang_variables(env):
    """
    Add underlying common clang variables that
    are used by multiple builders.
    """

    # "CLANG common command line"
    if not env.has_key('_CLANGCOMCOM'):
        # clang needs '-I' prepended before each include path, regardless of platform
        env['_CLANG_CPPPATH'] = '${_concat("-I ", CPPPATH, "", __env__)}'
        env['_CLANG_CFLAGS'] = '${_concat("", CFLAGS, "", __env__)}'
        env['_CLANG_SHCFLAGS'] = '${_concat("", SHCFLAGS, "", __env__)}'
        env['_CLANG_CCFLAGS'] = '${_concat("", CCFLAGS, "", __env__)}'
        env['_CLANG_SHCCFLAGS'] = '${_concat("", SHCCFLAGS, "", __env__)}'
        env['_CLANG_CPPFLAGS'] = '${_concat("", CPPFLAGS, "", __env__)}'

        # assemble the common command line
        env['_CLANGCOMCOM'] = '$_CLANG_CPPFLAGS $_CPPDEFFLAGS $_CLANG_CPPPATH'


def generate(env):
    """
    Add Builders and construction variables for CUDA compilers to an Environment.
    """

    # create a builder that makes PTX files from .cu files
    ptx_builder = SCons.Builder.Builder(action = '$CLANG -S --cuda-path=$cuda_path --cuda-device-only $CLANGFLAGS $_CLANG_CFLAGS $_CLANG_CCFLAGS $_CLANGCOMCOM $SOURCES -o $TARGET',
                                        emitter = {},
                                        suffix = '.ptx',
                                        src_suffix = CUDASuffixes)
    env['BUILDERS']['PTXFile'] = ptx_builder

    # create builders that make static & shared objects from .cu files
    static_obj, shared_obj = SCons.Tool.createObjBuilders(env)

    for suffix in CUDASuffixes:
        # Add this suffix to the list of things buildable by Object
        static_obj.add_action('$CUDAFILESUFFIX', '$CLANGCOM')
        shared_obj.add_action('$CUDAFILESUFFIX', '$SHCLANGCOM')
        static_obj.add_emitter(suffix, SCons.Defaults.StaticObjectEmitter)
        shared_obj.add_emitter(suffix, SCons.Defaults.SharedObjectEmitter)

        # Add this suffix to the list of things scannable
        SCons.Tool.SourceFileScanner.add_scanner(suffix, CUDAScanner)

    add_common_clang_variables(env)

    (cuda_path, bin_path,lib_path,inc_path) = get_cuda_paths(env)

    # set the "CUDA Compiler Command" environment variable
    # windows is picky about getting the full filename of the executable
    env['CLANG'] = 'clang++'
    env['SHCLANG'] = 'clang++'

    # set the include path, and pass both c compiler flags and c++ compiler flags
    env['CLANGFLAGS'] = SCons.Util.CLVar('')
    env['SHCLANGFLAGS'] = SCons.Util.CLVar('') + ' -shared'

    # 'CLANG Command'
    env['CLANGCOM'] = '$CLANG -o $TARGET --cuda-path=$cuda_path -c $CLANGFLAGS $_CLANG_CFLAGS $_CLANG_CCFLAGS $_CLANGCOMCOM $SOURCES'
    env['SHCLANGCOM'] = '$SHCLANG -o $TARGET --cuda-path=$cuda_path -c $SHCLANGFLAGS $_CLANG_SHCFLAGS $_CLANG_SHCCFLAGS $_CLANGCOMCOM $SOURCES'

    # the suffix of CUDA source files is '.cu'
    env['CUDAFILESUFFIX'] = '.cu'

    env.PrependENVPath('PATH', bin_path)
    if 'CLANG_PATH' in os.environ:
        env.PrependENVPath('PATH', os.path.abspath(os.environ['CLANG_PATH']))


def exists(env):
    return env.Detect('clang++')
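`get_cuda_paths` in `clang.py` resolves paths in two stages. It first derives defaults from `env['cuda_path']`, then lets `CUDA_BIN_PATH`, `CUDA_LIB_PATH` and `CUDA_INC_PATH` override them. The tool itself is Python 2 code (note `raise ValueError, ...` and `env.has_key`). As a rough illustration only, the same precedence can be written as a pure Python 3 function that receives the environment, OS name and machine string as arguments, so it can be exercised without SCons. `resolve_cuda_paths` is a hypothetical name and is not part of the tool.

```python
import os
from typing import Mapping, Tuple


def resolve_cuda_paths(cuda_path: str,
                       environ: Mapping[str, str],
                       os_name: str,
                       machine: str) -> Tuple[str, str, str]:
    """Return (bin_path, lib_path, inc_path), mirroring clang.py's get_cuda_paths.

    Hypothetical Python 3 restatement: the SCons tool reads os.name,
    platform.machine() and os.environ directly instead of taking them as
    arguments, and it also returns cuda_path as the first element.
    """
    # defaults derived from the CUDA root; only POSIX layouts are handled
    if os_name != 'posix':
        raise ValueError('Error: unknown OS. Where is CUDA installed?')
    bin_path = cuda_path + '/bin'
    lib_path = cuda_path + '/lib'
    inc_path = cuda_path + '/include'

    # machine names ending in '64' (for example 'x86_64') use lib64
    if machine[-2:] == '64':
        lib_path += '64'

    # environment variables take precedence over the defaults
    if 'CUDA_BIN_PATH' in environ:
        bin_path = os.path.abspath(environ['CUDA_BIN_PATH'])
    if 'CUDA_LIB_PATH' in environ:
        lib_path = os.path.abspath(environ['CUDA_LIB_PATH'])
    if 'CUDA_INC_PATH' in environ:
        inc_path = os.path.abspath(environ['CUDA_INC_PATH'])

    return bin_path, lib_path, inc_path
```

The suffix test has one consequence that applies equally to the original: a machine string such as `'ppc64le'` does not end in `'64'`, so the function picks `lib` rather than `lib64`.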
thrust-1.9.5/site_scons/site_tools/nvcc.py

"""SCons.Tool.nvcc

Tool-specific initialization for NVIDIA CUDA Compiler.

There normally shouldn't be any need to import this module directly.
It will usually be imported through the generic SCons.Tool.Tool()
selection method.

"""

import SCons.Tool
import SCons.Scanner.C
import SCons.Defaults
import os
import platform


def get_cuda_paths(env):
    """Determines CUDA {bin,lib,include} paths

    returns (bin_path,lib_path,inc_path)
    """

    cuda_path = env['cuda_path']

    bin_path = cuda_path + '/bin'
    lib_path = cuda_path + '/lib'
    inc_path = cuda_path + '/include'

    # fix up the name of the lib directory on 64b platforms
    if platform.machine()[-2:] == '64':
        if os.name == 'posix' and platform.system() != 'Darwin':
            lib_path += '64'
        elif os.name == 'nt':
            lib_path += '/x64'

    # override with environment variables
    if 'CUDA_BIN_PATH' in os.environ:
        bin_path = os.path.abspath(os.environ['CUDA_BIN_PATH'])
    if 'CUDA_LIB_PATH' in os.environ:
        lib_path = os.path.abspath(os.environ['CUDA_LIB_PATH'])
    if 'CUDA_INC_PATH' in os.environ:
        inc_path = os.path.abspath(os.environ['CUDA_INC_PATH'])

    return (bin_path,lib_path,inc_path)


CUDASuffixes = ['.cu']

# make a CUDAScanner for finding #includes
# cuda uses the c preprocessor, so we can use the CScanner
CUDAScanner = SCons.Scanner.C.CScanner()


def add_common_nvcc_variables(env):
    """
    Add underlying common "NVIDIA CUDA compiler" variables that
    are used by multiple builders.
    """

    # "NVCC common command line"
    if not env.has_key('_NVCCCOMCOM'):
        # nvcc needs '-I' prepended before each include path, regardless of platform
        env['_NVCC_CPPPATH'] = '${_concat("-I ", CPPPATH, "", __env__)}'

        # prepend -Xcompiler before each flag which needs it; some do not
        disallowed_flags = ['-std=c++03']

        need_no_prefix = ['-std=c++03', '-std=c++11']
        def flags_which_need_no_prefix(flags):
            # first filter out flags which nvcc doesn't allow
            flags = [flag for flag in flags if flag not in disallowed_flags]
            result = [flag for flag in flags if flag in need_no_prefix]
            return result

        def flags_which_need_prefix(flags):
            # first filter out flags which nvcc doesn't allow
            flags = [flag for flag in flags if flag not in disallowed_flags]
            result = [flag for flag in flags if flag not in need_no_prefix]
            return result

        env['_NVCC_BARE_FLAG_FILTER'] = flags_which_need_no_prefix
        env['_NVCC_PREFIXED_FLAG_FILTER'] = flags_which_need_prefix

        env['_NVCC_BARE_CFLAGS'] = '${_concat("", CFLAGS, "", __env__, _NVCC_BARE_FLAG_FILTER)}'
        env['_NVCC_PREFIXED_CFLAGS'] = '${_concat("-Xcompiler ", CFLAGS, "", __env__, _NVCC_PREFIXED_FLAG_FILTER)}'
        env['_NVCC_CFLAGS'] = '$_NVCC_BARE_CFLAGS $_NVCC_PREFIXED_CFLAGS'

        env['_NVCC_BARE_SHCFLAGS'] = '${_concat("", SHCFLAGS, "", __env__, _NVCC_BARE_FLAG_FILTER)}'
        env['_NVCC_PREFIXED_SHCFLAGS'] = '${_concat("-Xcompiler ", SHCFLAGS, "", __env__, _NVCC_PREFIXED_FLAG_FILTER)}'
        env['_NVCC_SHCFLAGS'] = '$_NVCC_BARE_SHCFLAGS $_NVCC_PREFIXED_SHCFLAGS'

        env['_NVCC_BARE_CCFLAGS'] = '${_concat("", CCFLAGS, "", __env__, _NVCC_BARE_FLAG_FILTER)}'
        env['_NVCC_PREFIXED_CCFLAGS'] = '${_concat("-Xcompiler ", CCFLAGS, "", __env__, _NVCC_PREFIXED_FLAG_FILTER)}'
        env['_NVCC_CCFLAGS'] = '$_NVCC_BARE_CCFLAGS $_NVCC_PREFIXED_CCFLAGS'

        env['_NVCC_BARE_SHCCFLAGS'] = '${_concat("", SHCCFLAGS, "", __env__, _NVCC_BARE_FLAG_FILTER)}'
        env['_NVCC_PREFIXED_SHCCFLAGS'] = '${_concat("-Xcompiler ", SHCCFLAGS, "", __env__, _NVCC_PREFIXED_FLAG_FILTER)}'
        env['_NVCC_SHCCFLAGS'] = '$_NVCC_BARE_SHCCFLAGS $_NVCC_PREFIXED_SHCCFLAGS'

        env['_NVCC_BARE_CPPFLAGS'] = '${_concat("", CPPFLAGS, "", __env__, _NVCC_BARE_FLAG_FILTER)}'
        env['_NVCC_PREFIXED_CPPFLAGS'] = '${_concat("-Xcompiler ", CPPFLAGS, "", __env__, _NVCC_PREFIXED_FLAG_FILTER)}'
        env['_NVCC_CPPFLAGS'] = '$_NVCC_BARE_CPPFLAGS $_NVCC_PREFIXED_CPPFLAGS'

        # assemble the common command line
        env['_NVCCCOMCOM'] = '$_NVCC_CPPFLAGS $_CPPDEFFLAGS $_NVCC_CPPPATH'


def generate(env):
    """
    Add Builders and construction variables for CUDA compilers to an Environment.
    """

    # create a builder that makes PTX files from .cu files
    ptx_builder = SCons.Builder.Builder(action = '$NVCC -ptx $NVCCFLAGS $_NVCC_CFLAGS $_NVCC_CCFLAGS $_NVCCCOMCOM $SOURCES -o $TARGET',
                                        emitter = {},
                                        suffix = '.ptx',
                                        src_suffix = CUDASuffixes)
    env['BUILDERS']['PTXFile'] = ptx_builder

    # create builders that make static & shared objects from .cu files
    static_obj, shared_obj = SCons.Tool.createObjBuilders(env)

    for suffix in CUDASuffixes:
        # Add this suffix to the list of things buildable by Object
        static_obj.add_action('$CUDAFILESUFFIX', '$NVCCCOM')
        shared_obj.add_action('$CUDAFILESUFFIX', '$SHNVCCCOM')
        static_obj.add_emitter(suffix, SCons.Defaults.StaticObjectEmitter)
        shared_obj.add_emitter(suffix, SCons.Defaults.SharedObjectEmitter)

        # Add this suffix to the list of things scannable
        SCons.Tool.SourceFileScanner.add_scanner(suffix, CUDAScanner)

    add_common_nvcc_variables(env)

    # set the "CUDA Compiler Command" environment variable
    # windows is picky about getting the full filename of the executable
    if os.name == 'nt':
        env['NVCC'] = 'nvcc.exe'
        env['SHNVCC'] = 'nvcc.exe'
    else:
        env['NVCC'] = 'nvcc'
        env['SHNVCC'] = 'nvcc'

    # set the include path, and pass both c compiler flags and c++ compiler flags
    env['NVCCFLAGS'] = SCons.Util.CLVar('')
    env['SHNVCCFLAGS'] = SCons.Util.CLVar('') + ' -shared'

    # 'NVCC Command'
    env['NVCCCOM'] = '$NVCC -o $TARGET -c $NVCCFLAGS $_NVCC_CFLAGS $_NVCC_CCFLAGS $_NVCCCOMCOM $SOURCES'
    env['SHNVCCCOM'] = '$SHNVCC -o $TARGET -c $SHNVCCFLAGS $_NVCC_SHCFLAGS $_NVCC_SHCCFLAGS $_NVCCCOMCOM $SOURCES'

    # the suffix of CUDA source files is '.cu'
    env['CUDAFILESUFFIX'] = '.cu'

    # XXX add code to generate builders for other miscellaneous
    # CUDA files here, such as .gpu, etc.

    (bin_path,lib_path,inc_path) = get_cuda_paths(env)

    env.PrependENVPath('PATH', bin_path)


def exists(env):
    return env.Detect('nvcc')

thrust-1.9.5/site_scons/site_tools/zip.py

"""SCons.Tool.zip

Tool-specific initialization for zip.

There normally shouldn't be any need to import this module directly.
It will usually be imported through the generic SCons.Tool.Tool()
selection method.

This version applies the patch from scons.tigris.org/issues/show_bug.cgi?id=2575

"""

#
# Copyright (c) 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010 The SCons Foundation
#
# Permission is hereby granted, free of charge, to any person obtaining
# a copy of this software and associated documentation files (the
# "Software"), to deal in the Software without restriction, including
# without limitation the rights to use, copy, modify, merge, publish,
# distribute, sublicense, and/or sell copies of the Software, and to
# permit persons to whom the Software is furnished to do so, subject to
# the following conditions:
#
# The above copyright notice and this permission notice shall be included
# in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY
# KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE
# WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
# LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
# OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
# WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
#

__revision__ = "src/engine/SCons/Tool/zip.py 5134 2010/08/16 23:02:40 bdeegan"

import os.path

import SCons.Builder
import SCons.Defaults
import SCons.Node.FS
import SCons.Util

try:
    import zipfile
    internal_zip = 1
except ImportError:
    internal_zip = 0

if internal_zip:
    zipcompression = zipfile.ZIP_DEFLATED
    def zip(target, source, env):
        compression = env.get('ZIPCOMPRESSION', 0)
        zf = zipfile.ZipFile(target[0].abspath, 'w', compression)
        for s in source:
            if s.isdir():
                for dirpath, dirnames, filenames in os.walk(os.path.relpath(s.abspath)):
                    for fname in filenames:
                        path = os.path.join(dirpath, fname)
                        if os.path.isfile(path):
                            zf.write(path)
            else:
                zf.write(os.path.relpath(s.abspath))
        zf.close()
else:
    zipcompression = 0
    zip = "$ZIP $ZIPFLAGS ${TARGET.abspath} $SOURCES"


zipAction = SCons.Action.Action(zip, varlist=['ZIPCOMPRESSION'])

ZipBuilder = SCons.Builder.Builder(action = SCons.Action.Action('$ZIPCOM', '$ZIPCOMSTR'),
                                   source_factory = SCons.Node.FS.Entry,
                                   source_scanner = SCons.Defaults.DirScanner,
                                   suffix = '$ZIPSUFFIX',
                                   multi = 1)


def generate(env):
    """Add Builders and construction variables for zip to an Environment."""
    try:
        bld = env['BUILDERS']['Zip']
    except KeyError:
        bld = ZipBuilder
        env['BUILDERS']['Zip'] = bld

    env['ZIP'] = 'zip'
    env['ZIPFLAGS'] = SCons.Util.CLVar('')
    env['ZIPCOM'] = zipAction
    env['ZIPCOMPRESSION'] = zipcompression
    env['ZIPSUFFIX'] = '.zip'


def exists(env):
    return internal_zip or env.Detect('zip')

# Local Variables:
# tab-width:4
# indent-tabs-mode:nil
# End:
# vim: set expandtab tabstop=4 shiftwidth=4:

thrust-1.9.5/testing/

thrust-1.9.5/testing/CMakeLists.txt

set(DRIVER "${CMAKE_CURRENT_SOURCE_DIR}/testframework.cpp")

FILE(GLOB SOURCES_CU *.cu)
FILE(GLOB SOURCES_CPP *.cpp)
set(SOURCES ${SOURCES_CU} ${SOURCES_CPP})

list(FIND SOURCES ${DRIVER} index)
if (${index} EQUAL -1)
  MESSAGE(FATAL_ERROR
    "${DRIVER} was not found in source list. Something went wrong")
endif()
list(REMOVE_AT SOURCES ${index})
list(LENGTH SOURCES index)
message(STATUS "Found ${index} tests in testing")

set(CMAKE_INCLUDE_CURRENT_DIR ON)
cuda_include_directories(${CMAKE_CURRENT_BINARY_DIR} ${CMAKE_CURRENT_SOURCE_DIR})

add_subdirectory(backend)

cuda_add_library(test_driver ${DRIVER} STATIC EXCLUDE_FROM_ALL)

set(targets "")
foreach(src ${SOURCES})
  get_filename_component(exec_name ${src} NAME_WE)
  set(target testing-${exec_name})
  thrust_add_executable(${target} ${src})
  target_link_libraries(${target} test_driver)
  set_target_properties(${target} PROPERTIES EXCLUDE_FROM_ALL TRUE)
  add_test(NAME ${target} COMMAND ${target})
  list(APPEND targets ${target})
endforeach()

string(TOLOWER ${DEVICE_BACKEND} backend)
set(targets-backend "")
foreach(src ${SOURCES_BACKEND})
  get_filename_component(exec_name ${src} NAME_WE)
  set(target testing-${backend}-${exec_name})
  thrust_add_executable(${target} ${src})
  target_link_libraries(${target} test_driver)
  set_target_properties(${target} PROPERTIES EXCLUDE_FROM_ALL TRUE)
  add_test(NAME ${target} COMMAND ${target})
  list(APPEND targets-backend ${target})
endforeach()

add_custom_target(testing DEPENDS ${targets} ${targets-backend})

add_custom_target(check COMMAND ${CMAKE_CTEST_COMMAND})
add_dependencies(check testing)

thrust-1.9.5/testing/SConscript

Import('env')

# clone the parent's env so that we do not modify it
my_env = env.Clone()

vars = Variables()

# add a variable to filter source files by a regex
vars.Add('tests', 'Filter test files using a regex', '.')

# update variables
my_env.Help(vars.GenerateHelpText(env))
vars.Update(my_env)

# populate the environment

# with cl we have to do /bigobj
if my_env.subst('$CXX') == 'cl':
    my_env.Append(CPPFLAGS = '/bigobj')

# #include the current directory
my_env.Append(CPPPATH = Dir('.').srcnode())

# find all .cus & .cpps
sources = []
extensions = ['*.cu', '*.cpp']

# gather sources in the current directory
for ext in extensions:
  sources.extend(my_env.Glob(ext))

# gather sources from directories
sources.extend(SConscript('backend/SConscript', exports='env'))

# filter sources
import re
filter_exp = 'int main|driver_instance|{0}'.format(my_env['tests'])
pattern = re.compile(filter_exp)
def test_filter(src):
  return pattern.search(src.get_contents())

sources = filter(test_filter, sources)

tester = my_env.Program('tester', sources)

# create a 'unit_tests' alias
unit_tests_alias = my_env.Alias('unit_tests', [tester])

# add the verbose tester to the 'run_unit_tests' alias
run_unit_tests_alias = my_env.Alias('run_unit_tests', [tester], tester[0].abspath + ' --verbose')

# always build the 'run_unit_tests' target whether or not it needs it
my_env.AlwaysBuild(run_unit_tests_alias)

# add the unit tests alias to the 'run_tests' alias
my_env.Alias('run_tests', [tester], tester[0].abspath)

# build children
SConscript('trivial_tests/SConscript', exports='env')

thrust-1.9.5/testing/adjacent_difference.cu
#include
#include
#include
#include

template <typename Vector>
void TestAdjacentDifferenceSimple(void)
{
  typedef typename Vector::value_type T;

  Vector input(3);
  Vector output(3);
  input[0] = 1; input[1] = 4; input[2] = 6;

  typename Vector::iterator result;

  result = thrust::adjacent_difference(input.begin(), input.end(), output.begin());

  ASSERT_EQUAL(result - output.begin(), 3);
  ASSERT_EQUAL(output[0], T(1));
  ASSERT_EQUAL(output[1], T(3));
  ASSERT_EQUAL(output[2], T(2));

  result = thrust::adjacent_difference(input.begin(), input.end(), output.begin(), thrust::plus<T>());

  ASSERT_EQUAL(result - output.begin(), 3);
  ASSERT_EQUAL(output[0], T( 1));
  ASSERT_EQUAL(output[1], T( 5));
  ASSERT_EQUAL(output[2], T(10));

  // test in-place operation, result and first are permitted to be the same
  result = thrust::adjacent_difference(input.begin(), input.end(), input.begin());

  ASSERT_EQUAL(result - input.begin(), 3);
  ASSERT_EQUAL(input[0], T(1));
  ASSERT_EQUAL(input[1], T(3));
  ASSERT_EQUAL(input[2], T(2));
}
DECLARE_VECTOR_UNITTEST(TestAdjacentDifferenceSimple);


template <typename T>
void TestAdjacentDifference(const size_t n)
{
  thrust::host_vector<T> h_input = unittest::random_samples<T>(n);
  thrust::device_vector<T> d_input = h_input;

  thrust::host_vector<T> h_output(n);
  thrust::device_vector<T> d_output(n);

  typename thrust::host_vector<T>::iterator h_result;
  typename thrust::device_vector<T>::iterator d_result;

  h_result = thrust::adjacent_difference(h_input.begin(), h_input.end(), h_output.begin());
  d_result = thrust::adjacent_difference(d_input.begin(), d_input.end(), d_output.begin());

  ASSERT_EQUAL(std::size_t(h_result - h_output.begin()), n);
  ASSERT_EQUAL(std::size_t(d_result - d_output.begin()), n);
  ASSERT_EQUAL(h_output, d_output);

  h_result = thrust::adjacent_difference(h_input.begin(), h_input.end(), h_output.begin(), thrust::plus<T>());
  d_result = thrust::adjacent_difference(d_input.begin(), d_input.end(), d_output.begin(), thrust::plus<T>());

  ASSERT_EQUAL(std::size_t(h_result - h_output.begin()), n);
  ASSERT_EQUAL(std::size_t(d_result - d_output.begin()), n);
  ASSERT_EQUAL(h_output, d_output);

  // in-place operation
  h_result = thrust::adjacent_difference(h_input.begin(), h_input.end(), h_input.begin(), thrust::plus<T>());
  d_result = thrust::adjacent_difference(d_input.begin(), d_input.end(), d_input.begin(), thrust::plus<T>());

  ASSERT_EQUAL(std::size_t(h_result - h_input.begin()), n);
  ASSERT_EQUAL(std::size_t(d_result - d_input.begin()), n);
  ASSERT_EQUAL(h_input, h_output); //computed previously
  ASSERT_EQUAL(d_input, d_output); //computed previously
}
DECLARE_VARIABLE_UNITTEST(TestAdjacentDifference);


template <typename T>
void TestAdjacentDifferenceInPlaceWithRelatedIteratorTypes(const size_t n)
{
  thrust::host_vector<T> h_input = unittest::random_samples<T>(n);
  thrust::device_vector<T> d_input = h_input;

  thrust::host_vector<T> h_output(n);
  thrust::device_vector<T> d_output(n);

  typename thrust::host_vector<T>::iterator h_result;
  typename thrust::device_vector<T>::iterator d_result;

  h_result = thrust::adjacent_difference(h_input.begin(), h_input.end(), h_output.begin(), thrust::plus<T>());
  d_result = thrust::adjacent_difference(d_input.begin(), d_input.end(), d_output.begin(), thrust::plus<T>());

  // in-place operation with different iterator types
  h_result = thrust::adjacent_difference(h_input.cbegin(), h_input.cend(), h_input.begin(), thrust::plus<T>());
  d_result = thrust::adjacent_difference(d_input.cbegin(), d_input.cend(), d_input.begin(), thrust::plus<T>());

  ASSERT_EQUAL(std::size_t(h_result - h_input.begin()), n);
  ASSERT_EQUAL(std::size_t(d_result - d_input.begin()), n);
  ASSERT_EQUAL(h_output, h_input); // reference computed previously
  ASSERT_EQUAL(d_output, d_input); // reference computed previously
}
DECLARE_VARIABLE_UNITTEST(TestAdjacentDifferenceInPlaceWithRelatedIteratorTypes);


template <typename T>
void TestAdjacentDifferenceDiscardIterator(const size_t n)
{
  thrust::host_vector<T> h_input = unittest::random_samples<T>(n);
  thrust::device_vector<T> d_input = h_input;

  thrust::discard_iterator<> h_result =
    thrust::adjacent_difference(h_input.begin(), h_input.end(), thrust::make_discard_iterator());
  thrust::discard_iterator<> d_result =
    thrust::adjacent_difference(d_input.begin(), d_input.end(), thrust::make_discard_iterator());

  thrust::discard_iterator<> reference(n);

  ASSERT_EQUAL_QUIET(reference, h_result);
  ASSERT_EQUAL_QUIET(reference, d_result);
}
DECLARE_VARIABLE_UNITTEST(TestAdjacentDifferenceDiscardIterator);


template <typename InputIterator, typename OutputIterator>
OutputIterator adjacent_difference(my_system &system, InputIterator, InputIterator, OutputIterator result)
{
  system.validate_dispatch();
  return result;
}

void TestAdjacentDifferenceDispatchExplicit()
{
  thrust::device_vector d_input(1);

  my_system sys(0);
  thrust::adjacent_difference(sys, d_input.begin(), d_input.end(), d_input.begin());

  ASSERT_EQUAL(true, sys.is_valid());
}
DECLARE_UNITTEST(TestAdjacentDifferenceDispatchExplicit);


template <typename InputIterator, typename OutputIterator>
OutputIterator
adjacent_difference(my_tag, InputIterator, InputIterator, OutputIterator result)
{
  *result = 13;
  return result;
}

void TestAdjacentDifferenceDispatchImplicit()
{
  thrust::device_vector d_input(1);

  thrust::adjacent_difference(thrust::retag<my_tag>(d_input.begin()),
                              thrust::retag<my_tag>(d_input.end()),
                              thrust::retag<my_tag>(d_input.begin()));

  ASSERT_EQUAL(13, d_input.front());
}
DECLARE_UNITTEST(TestAdjacentDifferenceDispatchImplicit);

thrust-1.9.5/testing/advance.cu
#include
#include
#include

// TODO expand this with other iterator types (forward, bidirectional, etc.)

template <typename Vector>
void TestAdvance()
{
  typedef typename Vector::value_type T;
  typedef typename Vector::iterator Iterator;

  Vector v(10);
  thrust::sequence(v.begin(), v.end());

  Iterator i = v.begin();

  thrust::advance(i, 1);

  ASSERT_EQUAL(*i, T(1));

  thrust::advance(i, 8);

  ASSERT_EQUAL(*i, T(9));

  thrust::advance(i, -4);

  ASSERT_EQUAL(*i, T(5));
}
DECLARE_VECTOR_UNITTEST(TestAdvance);

template <typename Vector>
void TestNext()
{
  typedef typename Vector::value_type T;
  typedef typename Vector::iterator Iterator;

  Vector v(10);
  thrust::sequence(v.begin(), v.end());

  Iterator const i0 = v.begin();

  Iterator const i1 = thrust::next(i0);

  ASSERT_EQUAL(*i0, T(0));
  ASSERT_EQUAL(*i1, T(1));

  Iterator const i2 = thrust::next(i1, 8);

  ASSERT_EQUAL(*i0, T(0));
  ASSERT_EQUAL(*i1, T(1));
  ASSERT_EQUAL(*i2, T(9));

  Iterator const i3 = thrust::next(i2, -4);

  ASSERT_EQUAL(*i0, T(0));
  ASSERT_EQUAL(*i1, T(1));
  ASSERT_EQUAL(*i2, T(9));
  ASSERT_EQUAL(*i3, T(5));
}
DECLARE_VECTOR_UNITTEST(TestNext);

template <typename Vector>
void TestPrev()
{
  typedef typename Vector::value_type T;
  typedef typename Vector::iterator Iterator;

  Vector v(10);
  thrust::sequence(v.begin(), v.end());

  Iterator const i0 = v.end();

  Iterator const i1 = thrust::prev(i0);

  ASSERT_EQUAL_QUIET(i0, v.end());
  ASSERT_EQUAL(*i1, T(9));

  Iterator const i2 = thrust::prev(i1, 8);

  ASSERT_EQUAL_QUIET(i0, v.end());
  ASSERT_EQUAL(*i1, T(9));
  ASSERT_EQUAL(*i2, T(1));

  Iterator const i3 = thrust::prev(i2, -4);

  ASSERT_EQUAL_QUIET(i0, v.end());
  ASSERT_EQUAL(*i1, T(9));
  ASSERT_EQUAL(*i2, T(1));
  ASSERT_EQUAL(*i3, T(5));
}
DECLARE_VECTOR_UNITTEST(TestPrev);

thrust-1.9.5/testing/alignment.cu
#include
#include

struct alignof_mock_0
{
  char a;
  char b;
}; // size: 2 * sizeof(char), alignment: sizeof(char)

struct alignof_mock_1
{
  int n;
  char c;
  // sizeof(int) - sizeof(char) bytes of padding
}; // size: 2 * sizeof(int), alignment: sizeof(int)

struct alignof_mock_2
{
  int n;
  char c;
  // sizeof(int) - sizeof(char) bytes of padding
}; // size: 2 * sizeof(int), alignment: sizeof(int)

struct alignof_mock_3
{
  char c;
  // sizeof(int) - sizeof(char) bytes of padding
  int n;
}; // size: 2 * sizeof(int), alignment: sizeof(int)

struct alignof_mock_4
{
  char c0;
  // sizeof(int) - sizeof(char) bytes of padding
  int n;
  char c1;
  // sizeof(int) - sizeof(char) bytes of padding
}; // size: 3 * sizeof(int), alignment: sizeof(int)

struct alignof_mock_5
{
  char c0;
  char c1;
  // sizeof(int) - 2 * sizeof(char) bytes of padding
  int n;
}; // size: 2 * sizeof(int), alignment: sizeof(int)

struct alignof_mock_6
{
  int n;
  char c0;
  char c1;
  // sizeof(int) - 2 * sizeof(char) bytes of padding
}; // size: 2 * sizeof(int), alignment: sizeof(int)

void test_alignof_mocks_sizes()
{
  ASSERT_EQUAL(sizeof(alignof_mock_0), 2 * sizeof(char));
  ASSERT_EQUAL(sizeof(alignof_mock_1), 2 * sizeof(int));
  ASSERT_EQUAL(sizeof(alignof_mock_2), 2 * sizeof(int));
  ASSERT_EQUAL(sizeof(alignof_mock_3), 2 * sizeof(int));
  ASSERT_EQUAL(sizeof(alignof_mock_4), 3 * sizeof(int));
  ASSERT_EQUAL(sizeof(alignof_mock_5), 2 * sizeof(int));
  ASSERT_EQUAL(sizeof(alignof_mock_6), 2 * sizeof(int));
}
DECLARE_UNITTEST(test_alignof_mocks_sizes);

void test_alignof()
{
  ASSERT_EQUAL(THRUST_ALIGNOF(bool), sizeof(bool));
  ASSERT_EQUAL(THRUST_ALIGNOF(signed char), sizeof(signed char));
  ASSERT_EQUAL(THRUST_ALIGNOF(unsigned char), sizeof(unsigned char));
  ASSERT_EQUAL(THRUST_ALIGNOF(char), sizeof(char));
  ASSERT_EQUAL(THRUST_ALIGNOF(short int), sizeof(short int));
  ASSERT_EQUAL(THRUST_ALIGNOF(unsigned short int), sizeof(unsigned short int));
  ASSERT_EQUAL(THRUST_ALIGNOF(int), sizeof(int));
  ASSERT_EQUAL(THRUST_ALIGNOF(unsigned int), sizeof(unsigned int));
  ASSERT_EQUAL(THRUST_ALIGNOF(long int), sizeof(long int));
  ASSERT_EQUAL(THRUST_ALIGNOF(unsigned long int), sizeof(unsigned long int));
  ASSERT_EQUAL(THRUST_ALIGNOF(long long int), sizeof(long long int));
  ASSERT_EQUAL(THRUST_ALIGNOF(unsigned long long int), sizeof(unsigned long long int));
  ASSERT_EQUAL(THRUST_ALIGNOF(float), sizeof(float));
  ASSERT_EQUAL(THRUST_ALIGNOF(double), sizeof(double));
  ASSERT_EQUAL(THRUST_ALIGNOF(long double), sizeof(long double));

  ASSERT_EQUAL(THRUST_ALIGNOF(alignof_mock_0), sizeof(char));
  ASSERT_EQUAL(THRUST_ALIGNOF(alignof_mock_1), sizeof(int));
  ASSERT_EQUAL(THRUST_ALIGNOF(alignof_mock_2), sizeof(int));
  ASSERT_EQUAL(THRUST_ALIGNOF(alignof_mock_3), sizeof(int));
  ASSERT_EQUAL(THRUST_ALIGNOF(alignof_mock_4), sizeof(int));
  ASSERT_EQUAL(THRUST_ALIGNOF(alignof_mock_5), sizeof(int));
  ASSERT_EQUAL(THRUST_ALIGNOF(alignof_mock_6), sizeof(int));
}
DECLARE_UNITTEST(test_alignof);

void test_alignment_of()
{
  ASSERT_EQUAL(thrust::detail::alignment_of<bool>::value, sizeof(bool));
  ASSERT_EQUAL(thrust::detail::alignment_of<signed char>::value, sizeof(signed char));
  ASSERT_EQUAL(thrust::detail::alignment_of<unsigned char>::value, sizeof(unsigned char));
  ASSERT_EQUAL(thrust::detail::alignment_of<char>::value, sizeof(char));
  ASSERT_EQUAL(thrust::detail::alignment_of<short int>::value, sizeof(short int));
  ASSERT_EQUAL(thrust::detail::alignment_of<unsigned short int>::value, sizeof(unsigned short int));
  ASSERT_EQUAL(thrust::detail::alignment_of<int>::value, sizeof(int));
  ASSERT_EQUAL(thrust::detail::alignment_of<unsigned int>::value, sizeof(unsigned int));
  ASSERT_EQUAL(thrust::detail::alignment_of<long int>::value, sizeof(long int));
  ASSERT_EQUAL(thrust::detail::alignment_of<unsigned long int>::value, sizeof(unsigned long int));
  ASSERT_EQUAL(thrust::detail::alignment_of<long long int>::value, sizeof(long long int));
  ASSERT_EQUAL(thrust::detail::alignment_of<unsigned long long int>::value, sizeof(unsigned long long int));
  ASSERT_EQUAL(thrust::detail::alignment_of<float>::value, sizeof(float));
  ASSERT_EQUAL(thrust::detail::alignment_of<double>::value, sizeof(double));
  ASSERT_EQUAL(thrust::detail::alignment_of<long double>::value, sizeof(long double));

  ASSERT_EQUAL(thrust::detail::alignment_of<alignof_mock_0>::value, sizeof(char));
  ASSERT_EQUAL(thrust::detail::alignment_of<alignof_mock_1>::value, sizeof(int));
  ASSERT_EQUAL(thrust::detail::alignment_of<alignof_mock_2>::value, sizeof(int));
  ASSERT_EQUAL(thrust::detail::alignment_of<alignof_mock_3>::value, sizeof(int));
  ASSERT_EQUAL(thrust::detail::alignment_of<alignof_mock_4>::value, sizeof(int));
  ASSERT_EQUAL(thrust::detail::alignment_of<alignof_mock_5>::value, sizeof(int));
  ASSERT_EQUAL(thrust::detail::alignment_of<alignof_mock_6>::value, sizeof(int));
}
DECLARE_UNITTEST(test_alignment_of);

template <std::size_t Align>
void test_aligned_type_instantiation()
{
  typedef typename thrust::detail::aligned_type<Align>::type type;
  ASSERT_GEQUAL(sizeof(type), 1lu);
  ASSERT_EQUAL(THRUST_ALIGNOF(type), Align);
  ASSERT_EQUAL(thrust::detail::alignment_of<type>::value, Align);
}

void test_aligned_type()
{
  test_aligned_type_instantiation<1>();
  test_aligned_type_instantiation<2>();
  test_aligned_type_instantiation<4>();
  test_aligned_type_instantiation<8>();
  test_aligned_type_instantiation<16>();
  test_aligned_type_instantiation<32>();
  test_aligned_type_instantiation<64>();
  test_aligned_type_instantiation<128>();
}
DECLARE_UNITTEST(test_aligned_type);

template <std::size_t Len, std::size_t Align>
void test_aligned_storage_instantiation()
{
  typedef typename thrust::detail::aligned_storage<Len, Align>::type type;
  ASSERT_GEQUAL(sizeof(type), Len);
  ASSERT_EQUAL(THRUST_ALIGNOF(type), Align);
  ASSERT_EQUAL(thrust::detail::alignment_of<type>::value, Align);
}

template void test_aligned_storage_size()
{
  test_aligned_storage_instantiation();
  test_aligned_storage_instantiation();
  test_aligned_storage_instantiation();
  test_aligned_storage_instantiation();
  test_aligned_storage_instantiation();
  test_aligned_storage_instantiation();
  test_aligned_storage_instantiation();
  test_aligned_storage_instantiation();
}

void test_aligned_storage()
{
  test_aligned_storage_size<1>();
  test_aligned_storage_size<2>();
  test_aligned_storage_size<4>();
  test_aligned_storage_size<8>();
  test_aligned_storage_size<16>();
  test_aligned_storage_size<32>();
  test_aligned_storage_size<64>();
  test_aligned_storage_size<128>();
  test_aligned_storage_size<256>();
  test_aligned_storage_size<512>();
  test_aligned_storage_size<1024>();
  test_aligned_storage_size<2048>();
  test_aligned_storage_size<4096>();
  test_aligned_storage_size<8192>();
  test_aligned_storage_size<16384>();

  test_aligned_storage_size<3>();
  test_aligned_storage_size<5>();
  test_aligned_storage_size<7>();

  test_aligned_storage_size<17>();
  test_aligned_storage_size<42>();

  test_aligned_storage_size<10000>();
}
DECLARE_UNITTEST(test_aligned_storage);

void test_max_align_t()
{
  ASSERT_GEQUAL(THRUST_ALIGNOF(thrust::detail::max_align_t), THRUST_ALIGNOF(bool));
  ASSERT_GEQUAL(THRUST_ALIGNOF(thrust::detail::max_align_t), THRUST_ALIGNOF(signed char));
  ASSERT_GEQUAL(THRUST_ALIGNOF(thrust::detail::max_align_t), THRUST_ALIGNOF(unsigned char));
  ASSERT_GEQUAL(THRUST_ALIGNOF(thrust::detail::max_align_t), THRUST_ALIGNOF(char));
  ASSERT_GEQUAL(THRUST_ALIGNOF(thrust::detail::max_align_t), THRUST_ALIGNOF(short int));
  ASSERT_GEQUAL(THRUST_ALIGNOF(thrust::detail::max_align_t), THRUST_ALIGNOF(unsigned short int));
  ASSERT_GEQUAL(THRUST_ALIGNOF(thrust::detail::max_align_t), THRUST_ALIGNOF(int));
  ASSERT_GEQUAL(THRUST_ALIGNOF(thrust::detail::max_align_t), THRUST_ALIGNOF(unsigned int));
  ASSERT_GEQUAL(THRUST_ALIGNOF(thrust::detail::max_align_t), THRUST_ALIGNOF(long int));
  ASSERT_GEQUAL(THRUST_ALIGNOF(thrust::detail::max_align_t), THRUST_ALIGNOF(unsigned long int));
  ASSERT_GEQUAL(THRUST_ALIGNOF(thrust::detail::max_align_t), THRUST_ALIGNOF(long long int));
  ASSERT_GEQUAL(THRUST_ALIGNOF(thrust::detail::max_align_t), THRUST_ALIGNOF(unsigned long long int));
  ASSERT_GEQUAL(THRUST_ALIGNOF(thrust::detail::max_align_t), THRUST_ALIGNOF(float));
  ASSERT_GEQUAL(THRUST_ALIGNOF(thrust::detail::max_align_t), THRUST_ALIGNOF(double));
  ASSERT_GEQUAL(THRUST_ALIGNOF(thrust::detail::max_align_t), THRUST_ALIGNOF(long double));
}
DECLARE_UNITTEST(test_max_align_t);

void test_aligned_reinterpret_cast()
{
  thrust::detail::aligned_type<1>* a1 = 0;

  thrust::detail::aligned_type<2>* a2 = 0;

  // Cast to type with stricter (larger) alignment requirement.
  a2 = thrust::detail::aligned_reinterpret_cast<
    thrust::detail::aligned_type<2>*
  >(a1);

  // Cast to type with less strict (smaller) alignment requirement.
  a1 = thrust::detail::aligned_reinterpret_cast<
    thrust::detail::aligned_type<1>*
  >(a2);
}
DECLARE_UNITTEST(test_aligned_reinterpret_cast);

thrust-1.9.5/testing/allocator.cu
#include
#include
#include
#include

template <typename T>
struct my_allocator_with_custom_construct1
  : thrust::device_malloc_allocator<T>
{
  __host__ __device__
  my_allocator_with_custom_construct1()
  {}

  __host__ __device__
  void construct(T *p)
  {
    *p = 13;
  }
};

template <typename T>
void TestAllocatorCustomDefaultConstruct(size_t n)
{
  thrust::device_vector<T> ref(n, 13);
  thrust::device_vector<T, my_allocator_with_custom_construct1<T> > vec(n);

  ASSERT_EQUAL_QUIET(ref, vec);
}
DECLARE_VARIABLE_UNITTEST(TestAllocatorCustomDefaultConstruct);

template <typename T>
struct my_allocator_with_custom_construct2
  : thrust::device_malloc_allocator<T>
{
  __host__ __device__
  my_allocator_with_custom_construct2()
  {}

  template <typename Arg>
  __host__ __device__
  void construct(T *p, const Arg &)
  {
    *p = 13;
  }
};

template <typename T>
void TestAllocatorCustomCopyConstruct(size_t n)
{
  thrust::device_vector<T> ref(n, 13);
  thrust::device_vector<T> copy_from(n, 7);
  thrust::device_vector<T, my_allocator_with_custom_construct2<T> >
    vec(copy_from.begin(), copy_from.end());

  ASSERT_EQUAL_QUIET(ref, vec);
}
DECLARE_VARIABLE_UNITTEST(TestAllocatorCustomCopyConstruct);

template <typename T>
struct my_allocator_with_custom_destroy
{
  typedef T value_type;
  typedef T & reference;
  typedef const T & const_reference;

  static bool g_state;

  __host__
  my_allocator_with_custom_destroy(){}

  __host__
  my_allocator_with_custom_destroy(const my_allocator_with_custom_destroy &other)
    : use_me_to_alloc(other.use_me_to_alloc)
  {}

  __host__
  ~my_allocator_with_custom_destroy(){}

  __host__ __device__
  void destroy(T *)
  {
#if !__CUDA_ARCH__
    g_state = true;
#endif
  }

  value_type *allocate(std::ptrdiff_t n)
  {
    return use_me_to_alloc.allocate(n);
  }

  void deallocate(value_type *ptr, std::ptrdiff_t n)
  {
    use_me_to_alloc.deallocate(ptr,n);
  }

  bool operator==(const my_allocator_with_custom_destroy &) const
  {
    return true;
  }

  bool operator!=(const my_allocator_with_custom_destroy &other) const
  {
    return !(*this == other);
  }

  typedef thrust::detail::true_type is_always_equal;

  // use composition rather than inheritance
  // to avoid inheriting std::allocator's member
  // function destroy
  std::allocator<T> use_me_to_alloc;
};

template <typename T>
bool my_allocator_with_custom_destroy<T>::g_state = false;

template <typename T>
void TestAllocatorCustomDestroy(size_t n)
{
  {
    thrust::cpp::vector<T, my_allocator_with_custom_destroy<T> > vec(n);
  } // destroy everything

  if (0 < n)
    ASSERT_EQUAL(true, my_allocator_with_custom_destroy<T>::g_state);
}
DECLARE_VARIABLE_UNITTEST(TestAllocatorCustomDestroy);

template <typename T>
struct my_minimal_allocator
{
  typedef T value_type;

  // XXX ideally, we shouldn't require
  //     these two typedefs
  typedef T & reference;
  typedef const T & const_reference;

  __host__
  my_minimal_allocator(){}

  __host__
  my_minimal_allocator(const my_minimal_allocator &other)
    : use_me_to_alloc(other.use_me_to_alloc)
  {}

  __host__
  ~my_minimal_allocator(){}

  value_type *allocate(std::ptrdiff_t n)
  {
    return use_me_to_alloc.allocate(n);
  }

  void deallocate(value_type *ptr, std::ptrdiff_t n)
  {
    use_me_to_alloc.deallocate(ptr,n);
  }

  std::allocator<T> use_me_to_alloc;
};

template <typename T>
void TestAllocatorMinimal(size_t n)
{
  thrust::cpp::vector<T, my_minimal_allocator<T> > vec(n, 13);

  // XXX copy to h_vec because ASSERT_EQUAL doesn't know about cpp::vector
  thrust::host_vector<T> h_vec(vec.begin(), vec.end());
  thrust::host_vector<T> ref(n, 13);

  ASSERT_EQUAL(ref, h_vec);
}
DECLARE_VARIABLE_UNITTEST(TestAllocatorMinimal);

void TestAllocatorTraitsRebind()
{
  ASSERT_EQUAL(
    (thrust::detail::is_same<
      typename thrust::detail::allocator_traits<
        thrust::device_malloc_allocator
      >::template rebind_traits::other,
      typename thrust::detail::allocator_traits<
        thrust::device_malloc_allocator
      >
    >::value),
    true
  );

  ASSERT_EQUAL(
    (thrust::detail::is_same<
      typename thrust::detail::allocator_traits<
        my_minimal_allocator
      >::template rebind_traits::other,
      typename thrust::detail::allocator_traits<
        my_minimal_allocator
      >
    >::value),
    true
  );
}
DECLARE_UNITTEST(TestAllocatorTraitsRebind);

#if __cplusplus >= 201103L
void TestAllocatorTraitsRebindCpp11()
{
  ASSERT_EQUAL(
    (thrust::detail::is_same<
      typename thrust::detail::allocator_traits<
        thrust::device_malloc_allocator
      >::template rebind_alloc,
      thrust::device_malloc_allocator
    >::value),
    true
  );

  ASSERT_EQUAL(
    (thrust::detail::is_same<
      typename thrust::detail::allocator_traits<
        my_minimal_allocator
      >::template rebind_alloc,
      my_minimal_allocator
    >::value),
    true
  );

  ASSERT_EQUAL(
    (thrust::detail::is_same<
      typename thrust::detail::allocator_traits<
        thrust::device_malloc_allocator
      >::template rebind_traits,
      typename thrust::detail::allocator_traits<
        thrust::device_malloc_allocator
      >
    >::value),
    true
  );

  ASSERT_EQUAL(
    (thrust::detail::is_same<
      typename thrust::detail::allocator_traits<
        my_minimal_allocator
      >::template rebind_traits,
      typename thrust::detail::allocator_traits<
        my_minimal_allocator
      >
    >::value),
    true
  );
}
DECLARE_UNITTEST(TestAllocatorTraitsRebindCpp11);
#endif

thrust-1.9.5/testing/allocator_aware_policies.cu
#include
#include
#include
#include
#include
#include

template struct test_allocator_t
{
};

test_allocator_t test_allocator = test_allocator_t();
const test_allocator_t const_test_allocator = test_allocator_t();

struct test_memory_resource_t THRUST_FINAL :
  thrust::mr::memory_resource<>
{
  void * do_allocate(std::size_t, std::size_t) THRUST_OVERRIDE
  {
    return NULL;
  }

  void do_deallocate(void *, std::size_t, std::size_t) THRUST_OVERRIDE
  {
  }
} test_memory_resource;

template class CRTPBase>
struct policy_info
{
  typedef Policy policy;

  template