**cuda**

====== pycuda ======

  * see [[http://wiki.tiker.net/PyCuda/Installation/Linux/Ubuntu]]
  * also drew on [[https://wiki.calculquebec.ca/w/Python/fr]]

Unpack the sources and run ''configure.py'' against the CUDA 7.5 install:

  tar xzf Downloads/pycuda-2015.1.3.tar.gz
  cd pycuda-2015.1.3/
  export PATH=/local/apps/cuda-7.5/bin:$PATH
  export LD_LIBRARY_PATH=/local/apps/cuda-7.5/lib64:$LD_LIBRARY_PATH
  ./configure.py --python-exe=/usr/bin/python3 --cuda-root=/local/apps/cuda-7.5 --cudadrv-lib-dir=/usr/lib/x86_64-linux-gnu --boost-inc-dir=/usr/include --boost-lib-dir=/usr/lib --boost-python-libname=boost_python-py34 --boost-thread-libname=boost_thread --no-use-shipped-boost

Error: ''ImportError: No module named 'setuptools''' — fixed by installing:

  apt-get install python3-setuptools

then ''ImportError: No module named 'numpy''':

  # apt-get install python3-scipy
  Reading package lists... Done
  Building dependency tree
  Reading state information... Done
  The following extra packages will be installed:
    python3-decorator python3-numpy

plus:

  apt-get install libpython3.4-dev

Once that is done, install into a virtualenv so the system files are left untouched:

  * add an alias in ''.bashrc'' so that ''python'' launches ''python3''

===== python3 venv =====

Create the python3 venv:

  pyvenv-3.4 env_pycuda

which creates the ''env_pycuda'' folder; activate it:

  source ~/env_pycuda/bin/activate

Inside this venv, add numpy:

  pip install numpy

and finally:

  export PATH=/local/apps/cuda-7.5/bin:$PATH
  export LD_LIBRARY_PATH=/local/apps/cuda-7.5/lib64:$LD_LIBRARY_PATH
  python setup.py install

which gives:

  $ pip list
  appdirs (1.4.0)
  decorator (4.0.6)
  numpy (1.10.4)
  pip (1.5.4)
  py (1.4.31)
  pycuda (2015.1.3)
  pytest (2.8.7)
  pytools (2016.1)
  setuptools (3.3)
  six (1.10.0)

Fetch the examples with the script:

  ./pycuda-2015.1.3/examples/download-examples-from-wiki.py

The examples land in the ''wiki-examples/'' folder.

===== with python2 =====

  virtualenv env_pycuda_py2
  source env_pycuda_py2/bin/activate
  pip install numpy
  cd pycuda-2015.1.3/
  rm siteconf.py
  ./configure.py --cuda-root=/local/apps/cuda-7.5/ --cudadrv-lib-dir=/usr/lib/x86_64-linux-gnu --boost-inc-dir=/usr/include --boost-lib-dir=/usr/lib --boost-python-libname=boost_python --boost-thread-libname=boost_thread --no-use-shipped-boost
  python setup.py install
  pip install .

and check:

  pip list
  appdirs (1.4.0)
  argparse (1.2.1)
  decorator (4.0.6)
  numpy (1.10.4)
  pip (1.5.4)
  py (1.4.31)
  pycuda (2015.1.3)
  pytest (2.8.7)
  pytools (2016.1)
  setuptools (2.2)
  six (1.10.0)
  wsgiref (0.1.2)

====== Usage ======

  export PATH=/local/apps/cuda-7.5/bin:$PATH
  export LD_LIBRARY_PATH=/local/apps/cuda-7.5/lib64:$LD_LIBRARY_PATH

The samples are in ''/local/admin1/NVIDIA_CUDA-7.5_Samples/''.

  * to copy them: ''sh cuda-install-samples-7.5.sh ~''
  * example with nbody:

  ./nbody -bench
  GPU Device 0: "Quadro K620" with compute capability 5.0

  > Compute 5.0 CUDA device: [Quadro K620]
  3072 bodies, total time for 10 iterations: 4.661 ms
  = 20.246 billion interactions per second
  = 404.929 single-precision GFLOP/s at 20 flops per interaction

====== K620 card on a Dell 7100 ======

In the folder ''~/NVIDIA_CUDA-7.5_Samples/1_Utilities/'':

  ./deviceQuery
  ./deviceQuery Starting...
   CUDA Device Query (Runtime API) version (CUDART static linking)

  Detected 1 CUDA Capable device(s)

  Device 0: "Quadro K620"
    CUDA Driver Version / Runtime Version          7.5 / 7.5
    CUDA Capability Major/Minor version number:    5.0
    Total amount of global memory:                 2047 MBytes (2146762752 bytes)
    ( 3) Multiprocessors, (128) CUDA Cores/MP:     384 CUDA Cores
    GPU Max Clock rate:                            1124 MHz (1.12 GHz)
    Memory Clock rate:                             900 Mhz
    Memory Bus Width:                              128-bit
    L2 Cache Size:                                 2097152 bytes
    Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
    Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
    Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
    Total amount of constant memory:               65536 bytes
    Total amount of shared memory per block:       49152 bytes
    Total number of registers available per block: 65536
    Warp size:                                     32
    Maximum number of threads per multiprocessor:  2048
    Maximum number of threads per block:           1024
    Max dimension size of a thread block (x,y,z):  (1024, 1024, 64)
    Max dimension size of a grid size    (x,y,z):  (2147483647, 65535, 65535)
    Maximum memory pitch:                          2147483647 bytes
    Texture alignment:                             512 bytes
    Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
    Run time limit on kernels:                     Yes
    Integrated GPU sharing Host Memory:            No
    Support host page-locked memory mapping:       Yes
    Alignment requirement for Surfaces:            Yes
    Device has ECC support:                        Disabled
    Device supports Unified Addressing (UVA):      Yes
    Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
    Compute Mode:
       < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

  deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.5, CUDA Runtime Version = 7.5, NumDevs = 1, Device0 = Quadro K620
  Result = PASS

  [CUDA Bandwidth Test] - Starting...
  Running on...
  Device 0: Quadro K620
  Quick Mode

  Host to Device Bandwidth, 1 Device(s)
  PINNED Memory Transfers
    Transfer Size (Bytes)        Bandwidth(MB/s)
    33554432                     6417.3

  Device to Host Bandwidth, 1 Device(s)
  PINNED Memory Transfers
    Transfer Size (Bytes)        Bandwidth(MB/s)
    33554432                     6471.0

  Device to Device Bandwidth, 1 Device(s)
  PINNED Memory Transfers
    Transfer Size (Bytes)        Bandwidth(MB/s)
    33554432                     26349.3

  Result = PASS

  NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

====== Installation ======
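As a side note, the performance figures quoted in the sample runs above can be cross-checked with a few lines of plain Python. All inputs (body count, timing, flops per interaction, multiprocessor and core counts, memory size) are taken verbatim from the ''nbody'' and ''deviceQuery'' output; the variable names are of course just for illustration:

```python
# Cross-check of the figures printed by the CUDA samples above.
bodies, iterations, elapsed_ms = 3072, 10, 4.661   # from the nbody -bench run

# nbody evaluates all body-pair interactions every iteration (bodies^2)
# and counts 20 flops per interaction.
interactions_per_s = bodies * bodies * iterations / (elapsed_ms * 1e-3)
gflops = interactions_per_s * 20 / 1e9
print("%.3f billion interactions/s" % (interactions_per_s / 1e9))  # ~20.25
print("%.1f single-precision GFLOP/s" % gflops)                    # ~404.9

# deviceQuery: 3 multiprocessors x 128 CUDA cores/MP = 384 cores,
# and 2146762752 bytes of global memory is 2047 MiB.
print(3 * 128)                # 384
print(2146762752 // 2**20)    # 2047
```

Both derived numbers agree with what the samples print (20.246 billion interactions/s and 404.929 GFLOP/s), so the benchmark output is internally consistent.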