This is an old revision of the document!
cuda
pycuda
See http://wiki.tiker.net/PyCuda/Installation/Linux/Ubuntu
tar xzf Downloads/pycuda-2015.1.3.tar.gz
cd pycuda-2015.1.3/
export PATH=/local/apps/cuda-7.5/bin:$PATH
export LD_LIBRARY_PATH=/local/apps/cuda-7.5/lib64:$LD_LIBRARY_PATH
./configure.py --python-exe=/usr/bin/python3 \
    --cuda-root=/local/apps/cuda-7.5 \
    --cudadrv-lib-dir=/usr/lib/x86_64-linux-gnu \
    --boost-inc-dir=/usr/include \
    --boost-lib-dir=/usr/lib \
    --boost-python-libname=boost_python-py34 \
    --boost-thread-libname=boost_thread \
    --no-use-shipped-boost
Usage
export PATH=/local/apps/cuda-7.5/bin:$PATH
export LD_LIBRARY_PATH=/local/apps/cuda-7.5/lib64:$LD_LIBRARY_PATH
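To confirm that both exports took effect in the current shell, the environment can be inspected from Python. This is only a sketch: the helper name `missing_cuda_paths` is hypothetical, and the default root simply reuses the /local/apps/cuda-7.5 prefix from the exports above. It compares strings only and does not check that the directories exist.

```python
import os

CUDA_ROOT = "/local/apps/cuda-7.5"  # install prefix used on this page


def missing_cuda_paths(environ, cuda_root=CUDA_ROOT):
    """Return (variable, directory) pairs that are not set as expected.

    Hypothetical helper: it only looks at the strings in `environ`
    and never touches the file system.
    """
    expected = {
        "PATH": cuda_root + "/bin",
        "LD_LIBRARY_PATH": cuda_root + "/lib64",
    }
    missing = []
    for var, directory in expected.items():
        # Split the colon-separated list and ignore trailing slashes.
        entries = [e.rstrip("/") for e in environ.get(var, "").split(":")]
        if directory not in entries:
            missing.append((var, directory))
    return missing


if __name__ == "__main__":
    for var, directory in missing_cuda_paths(os.environ):
        print(f"{var} does not contain {directory}")
```

An empty result means both variables contain the CUDA 7.5 directories.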
The examples are in /local/admin1/NVIDIA_CUDA-7.5_Samples/
- To copy them:
sh cuda-install-samples-7.5.sh ~
- Example with nbody:
./nbody -bench
GPU Device 0: "Quadro K620" with compute capability 5.0
> Compute 5.0 CUDA device: [Quadro K620]
3072 bodies, total time for 10 iterations: 4.661 ms
= 20.246 billion interactions per second
= 404.929 single-precision GFLOP/s at 20 flops per interaction
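The rates printed by nbody can be checked by hand from the body count, the iteration count and the total time. The sketch below assumes an all-pairs simulation, that is, bodies × bodies interactions per iteration; this assumption is consistent with the figures above but is not stated in the output itself. The function name `nbody_rates` is made up for this illustration.

```python
def nbody_rates(bodies, iterations, total_ms, flops_per_interaction=20):
    """Return (billion interactions/s, GFLOP/s) for an all-pairs run.

    Assumption: every body interacts with every body in each iteration.
    """
    interactions = bodies * bodies * iterations
    seconds = total_ms / 1000.0
    interactions_per_s = interactions / seconds
    giga_interactions = interactions_per_s / 1e9
    gflops = interactions_per_s * flops_per_interaction / 1e9
    return giga_interactions, gflops


# Values taken from the nbody output above.
giga_interactions, gflops = nbody_rates(bodies=3072, iterations=10, total_ms=4.661)
print(f"{giga_interactions:.3f} billion interactions per second")
print(f"{gflops:.3f} single-precision GFLOP/s")
```

The results differ from the printed 20.246 and 404.929 only in the last digits, because the total time is displayed rounded to 4.661 ms.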
K620 card on a Dell 7100
In the folder ~/NVIDIA_CUDA-7.5_Samples/1_Utilities/
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Quadro K620"
  CUDA Driver Version / Runtime Version          7.5 / 7.5
  CUDA Capability Major/Minor version number:    5.0
  Total amount of global memory:                 2047 MBytes (2146762752 bytes)
  ( 3) Multiprocessors, (128) CUDA Cores/MP:     384 CUDA Cores
  GPU Max Clock rate:                            1124 MHz (1.12 GHz)
  Memory Clock rate:                             900 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 2097152 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.5, CUDA Runtime Version = 7.5, NumDevs = 1, Device0 = Quadro K620
Result = PASS
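The deviceQuery figures give a rough upper bound on single-precision throughput, which puts the nbody result in context. The sketch below assumes each CUDA core retires one fused multiply-add (counted as 2 flops) per cycle at the reported maximum clock; that factor is an assumption, not something deviceQuery prints. The function name `peak_gflops` is made up for this illustration.

```python
# Values read from the deviceQuery output above.
MULTIPROCESSORS = 3
CORES_PER_MP = 128
MAX_CLOCK_GHZ = 1.124

# Assumption: one fused multiply-add (2 flops) per core per cycle.
FLOPS_PER_CORE_PER_CYCLE = 2


def peak_gflops(multiprocessors, cores_per_mp, clock_ghz,
                flops_per_cycle=FLOPS_PER_CORE_PER_CYCLE):
    """Estimate peak single-precision GFLOP/s under the assumption above."""
    return multiprocessors * cores_per_mp * clock_ghz * flops_per_cycle


peak = peak_gflops(MULTIPROCESSORS, CORES_PER_MP, MAX_CLOCK_GHZ)

# GFLOP/s reported by ./nbody -bench on the same card.
NBODY_GFLOPS = 404.929
fraction = NBODY_GFLOPS / peak

print(f"estimated peak: {peak:.1f} GFLOP/s")
print(f"nbody reaches {fraction:.0%} of that estimate")
```

Under these assumptions the nbody benchmark runs at a little under half of the estimated peak.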
[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: Quadro K620
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     6417.3

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     6471.0

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     26349.3

Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
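The bandwidth figures translate directly into copy times for the 33554432-byte (32 MiB) test buffer. The sketch below assumes that "MB" in the output means 10^6 bytes, which the output does not state; the ratio between directions does not depend on that choice. The function name `transfer_ms` is made up for this illustration.

```python
TRANSFER_BYTES = 33554432  # 32 MiB buffer used by the quick-mode test

# Bandwidths in MB/s, copied from the output above.
BANDWIDTH_MB_S = {
    "host_to_device": 6417.3,
    "device_to_host": 6471.0,
    "device_to_device": 26349.3,
}

BYTES_PER_MB = 1e6  # assumption: the output does not say 10^6 or 2^20


def transfer_ms(num_bytes, mb_per_s, bytes_per_mb=BYTES_PER_MB):
    """Time in milliseconds to move num_bytes at the given bandwidth."""
    return num_bytes / (mb_per_s * bytes_per_mb) * 1000.0


for direction, bandwidth in BANDWIDTH_MB_S.items():
    ms = transfer_ms(TRANSFER_BYTES, bandwidth)
    print(f"{direction}: {ms:.2f} ms for one 32 MiB copy")

# Unit-independent comparison: on-card copies versus copies over the PCIe bus.
ratio = BANDWIDTH_MB_S["device_to_device"] / BANDWIDTH_MB_S["host_to_device"]
print(f"device-to-device is about {ratio:.1f}x faster than host-to-device")
```

The ratio shows why data should stay on the card between kernels where possible: each trip over the bus costs roughly four times as much as an on-card copy of the same size.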