    #include <cub/cub.cuh>

    // Block-sorting CUDA kernel
    __global__ void BlockSortKernel(int *d_in, int *d_out)
    {
        using namespace cub;

        // Specialize BlockRadixSort, BlockLoad, and BlockStore for 128 threads
        // owning 16 integer items each
        typedef BlockRadixSort<int, 128, 16>                    BlockRadixSort;
        typedef BlockLoad<int, 128, 16, BLOCK_LOAD_TRANSPOSE>   BlockLoad;
        typedef BlockStore<int, 128, 16, BLOCK_STORE_TRANSPOSE> BlockStore;

        // Allocate shared memory as a union of the three TempStorage types
        __shared__ union {
            typename BlockRadixSort::TempStorage sort;
            typename BlockLoad::TempStorage      load;
            typename BlockStore::TempStorage     store;
        } temp_storage;

        int block_offset = blockIdx.x * (128 * 16);  // Offset for this block's segment

        // Obtain a segment of 2048 consecutive keys that are blocked across threads
        int thread_keys[16];
        BlockLoad(temp_storage.load).Load(d_in + block_offset, thread_keys);
        __syncthreads();

        // Collectively sort the keys
        BlockRadixSort(temp_storage.sort).Sort(thread_keys);
        __syncthreads();

        // Store the sorted segment
        BlockStore(temp_storage.store).Store(d_out + block_offset, thread_keys);
    }

Each thread block uses cub::BlockRadixSort to collectively sort its own segment of keys. The class is specialized by the data type being sorted, by the number of threads per block, by the number of keys per thread, and implicitly by the targeted compilation architecture.

The cub::BlockLoad and cub::BlockStore classes are similarly specialized. Furthermore, to provide coalesced accesses to device memory, these primitives are configured to access memory using a striped access pattern (where consecutive threads simultaneously access consecutive items) and then transpose the keys into a blocked arrangement of elements across threads.

Once specialized, these classes expose opaque TempStorage member types. The thread block uses these storage types to statically allocate the union of shared memory needed by the thread block. (Alternatively, these storage types could be aliased to global memory allocations.)

CUB is regularly tested using the specified versions of the following compilers.
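The striped access pattern and the transpose into a blocked arrangement, as described for cub::BlockLoad and cub::BlockStore, can be hard to picture from prose alone. The following is a minimal host-side sketch in plain C++, not CUB code: the names `ThreadItems`, `load_striped`, and `striped_to_blocked` are hypothetical and exist only for this illustration, and a nested vector stands in for the per-thread registers. On the device, the exchange would presumably go through the shared memory reserved by TempStorage; here it is modeled as a simple index remapping.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative model only: items[t][i] is item i held by (simulated) thread t.
using ThreadItems = std::vector<std::vector<int>>;

// Striped arrangement: thread t's item i comes from segment[i * threads + t].
// For a fixed i, consecutive threads read consecutive addresses, which is
// the property that makes the device-memory access coalesced.
ThreadItems load_striped(const std::vector<int>& segment,
                         std::size_t threads,
                         std::size_t items_per_thread)
{
    assert(segment.size() == threads * items_per_thread);
    ThreadItems striped(threads, std::vector<int>(items_per_thread));
    for (std::size_t i = 0; i < items_per_thread; ++i)
        for (std::size_t t = 0; t < threads; ++t)
            striped[t][i] = segment[i * threads + t];
    return striped;
}

// Transpose striped -> blocked: afterwards thread t holds the contiguous
// run segment[t * items_per_thread + i] for i = 0 .. items_per_thread - 1.
ThreadItems striped_to_blocked(const ThreadItems& striped)
{
    const std::size_t threads = striped.size();
    const std::size_t items_per_thread = striped.empty() ? 0 : striped[0].size();
    ThreadItems blocked(threads, std::vector<int>(items_per_thread));
    for (std::size_t t = 0; t < threads; ++t) {
        for (std::size_t i = 0; i < items_per_thread; ++i) {
            // Position of this item within the block's segment.
            const std::size_t rank = i * threads + t;
            blocked[rank / items_per_thread][rank % items_per_thread] = striped[t][i];
        }
    }
    return blocked;
}
```

With 4 simulated threads and 2 items per thread over the segment 0..7, `load_striped` gives thread 0 the items {0, 4}, and `striped_to_blocked` then leaves thread 0 holding {0, 1} and thread 3 holding {6, 7}. In the kernel above the same idea applies with 128 threads and 16 items per thread.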