====== UMFPACK ======

  * [[http://www.cise.ufl.edu/research/sparse/umfpack/]]

====== Installation ======

^ Machine ^ Version ^ Directory ^ Notes ^
| nemo | 4.4 | /usr/local/UMFPACKv4.4 | compiled with sunperflib |
| shrek | | /usr/local/UMFPACKv4.4 | compiled with [[http://www.cs.utexas.edu/users/flame/goto|K. Goto's BLAS]] |
| octopus | 4.4 | /usr/local/UMFPACKv4.4 | compiled with sunperflib |

====== Usage ======

Libraries, include directories and demo sources:

  * /usr/local/UMFPACKv4.4/UMFPACK/Lib/libumfpack.a
  * /usr/local/UMFPACKv4.4/AMD/Lib/libamd.a
  * /usr/local/UMFPACKv4.4/UMFPACK/Include
  * /usr/local/UMFPACKv4.4/AMD/Include
  * /local/apps/src/UMFPACKv4.4/UMFPACK/Demo/

===== Using UMFPACK from Fortran =====

  * UMFPACK is written in C.
  * It provides a Fortran 77 interface, which can also be used from Fortran 90.
  * Use Demo/umf4hb64.f in /local/apps/src/UMFPACKv4.4/UMFPACK/Demo/ on nemo as a model.
  * Add the following calls to your own code (control, info, symbolic and numeric declared as in umf4hb64.f):

  call umf4def (control)                                       ! set default control parameters
  control (1) = 1                                              ! print level (Control entry 0)
  call umf4pcon (control)                                      ! print the control parameters
  call umf4sym (N, N, Ap, Ai, Ax, symbolic, control, info)     ! pre-order and symbolic analysis
  call umf4num (Ap, Ai, Ax, symbolic, numeric, control, info)  ! numeric LU factorization
  call umf4fsym (symbolic)                                     ! free the symbolic object
  call umf4sol (sys, x, RHSV, numeric, control, info)          ! solve (sys = 0: A x = RHSV)
  call umf4fnum (numeric)                                      ! free the numeric object
  call umf4pinf (control, info)                                ! print statistics

Your program must also be linked with the C wrapper umf4_f77wrapper.c. Compiled with -DDLONG, the wrapper uses long (8-byte) integers, so Ap, Ai and the symbolic/numeric handles must be 8-byte integers on the Fortran side, as in umf4hb64.f. For example:

  cc -o umf4_f77wrapper.o -DDLONG -m64 -I/usr/local/UMFPACKv4.4/UMFPACK/Include -c umf4_f77wrapper.c
  f90 -o poisson3d_umfpack.o -g -fast -C -e -fpp -stackvar -xcheck=init_local -fpover -ftrap=%none -Xlist -fsimple=0 -fns=no -dalign -O4 -KPIC -xmodel=medium -m64 -c poisson3d_umfpack.f90
  f90 -g -fast -C -e -fpp -stackvar -xcheck=init_local -fpover -ftrap=%none -Xlist -fsimple=0 -fns=no -dalign -O4 -KPIC -xmodel=medium -m64 -o poisson3d_umfpack poisson3d_umfpack.o umf4_f77wrapper.o /usr/local/UMFPACKv4.4/UMFPACK/Lib/libumfpack.a /usr/local/UMFPACKv4.4/AMD/Lib/libamd.a -xlic_lib=sunperf

====== Configuration ======

  * Before compiling, edit Make.include and Make.solaris (see the link to Make.solaris_amd64). In Make.include, removing -DNBLAS enables the BLAS, and Make.solaris is included:

  diff /local/apps/src/UMFPACKv4.4/UMFPACK/Make/Make.include-ori /local/apps/src/UMFPACKv4.4/UMFPACK/Make/Make.include
  50c50
  < CONFIG = -DNBLAS
  ---
  > CONFIG =
  63c63
  < # include ../Make/Make.solaris
  ---
  > include ../Make/Make.solaris

and in Make.solaris:

  diff /local2/fboyer/UMFPACKv4.4/UMFPACK/Make/Make.solaris /local/apps/src/UMFPACKv4.4/UMFPACK/Make/Make.solaris
  5a6
  >
  11,13c12,14
  < CC = cc
  < CFLAGS = -Xc -xO5 -KPIC -dalign -xtarget=generic64
  < F77FLAGS = -xO5 -KPIC -dalign -m64
  ---
  > CC = cc
  > CFLAGS = -xO5 -xdepend -DLP64 -xprefetch=auto -xprefetch_level=3 -xipo=2 -m64 -xmodel=medium
  > F77FLAGS = -xO5 -xdepend -DLP64 -xprefetch=auto -xprefetch_level=3 -xipo=2 -m64 -xmodel=medium
  22d22
  < #LIB = -xlic_lib=sunperf -lfai -lfsu -lfui -lsunperf -lm -lsunmath
  30c30
  < LIB = -xlic_lib=sunperf -lfai -lfsu -lfui -lm
  ---
  > LIB = -L/opt/studio12/SUNWspro/lib/amd64 -R/opt/studio12/SUNWspro/lib/amd64 -lsunperf -lm -lpicl -lmtsk

  * 64-bit only.
  * The test programs contain a bug, fixed in umf4hb64.f: the row-index array Ai holds nz entries, so it must be declared Ai (nz), not Ai (n):

  diff /local/apps/src/UMFPACKv4.4/UMFPACK/Demo/umf4hb64.f-ori /local/apps/src/UMFPACKv4.4/UMFPACK/Demo/umf4hb64.f
  331c331
  < $ n, nz, Ap (n+1), Ai (n), j, i, p
  ---
  > $ n, nz, Ap (n+1), Ai (nz), j, i, p

====== Tests ======

===== In C =====

  * Get the source (umfpack_simple.c) [[http://iusti.polytech.univ-mrs.fr/~jobic/dokuwiki/doku.php?id=librairies_installees&#umfpack|here]], then compile it:

  module load ss12
  cc -o umfpack_simple -m64 umfpack_simple.c -I/usr/local/UMFPACKv4.4/UMFPACK/Include -R/usr/local/UMFPACKv4.4/UMFPACK/Lib -L/usr/local/UMFPACKv4.4/UMFPACK/Lib -R/usr/local/UMFPACKv4.4/AMD/Lib -L/usr/local/UMFPACKv4.4/AMD/Lib -lumfpack -lamd -xlic_lib=sunperf

or, with the ss12u1 module (note the additional -lm):

  module load ss12u1
  cc -o umfpack_simple -m64 umfpack_simple.c -I/usr/local/UMFPACKv4.4/UMFPACK/Include -R/usr/local/UMFPACKv4.4/UMFPACK/Lib -L/usr/local/UMFPACKv4.4/UMFPACK/Lib -R/usr/local/UMFPACKv4.4/AMD/Lib -L/usr/local/UMFPACKv4.4/AMD/Lib -lumfpack -lamd -xlic_lib=sunperf -lm

===== In Fortran =====

  * Use Demo/umf4hb64.f in /local/apps/src/UMFPACKv4.4/UMFPACK/Demo/ on nemo as the example:

  f90 -o umf4hb64.o -g -fast -C -e -fpp -stackvar -xcheck=init_local -fpover -ftrap=%none -Xlist -fsimple=0 -fns=no -xmodel=medium -dalign -m64 -c umf4hb64.f
  cc -o umf4_f77wrapper.o -DDLONG -m64 -I/usr/local/UMFPACKv4.4/UMFPACK/Include -c umf4_f77wrapper.c
  f90 -g -fast -C -e -fpp -stackvar -xcheck=init_local -fpover -ftrap=%none -Xlist -fsimple=0 -fns=no -xmodel=medium -dalign -m64 -o umf4hb64 umf4hb64.o umf4_f77wrapper.o /usr/local/UMFPACKv4.4/UMFPACK/Lib/libumfpack.a /usr/local/UMFPACKv4.4/AMD/Lib/libamd.a -xlic_lib=sunperf

Then run it (the test matrices are in the Demo/HB directory):

  ./umf4hb64 < arc130.rua
  Matrix key: ARC130
  UMFPACK V4.4 (Jan. 28, 2005), Control:
      Matrix entry defined as: double
      Int (generic integer) defined as: long
      0: print level: 2
      1: dense row parameter: 0.2
          "dense" rows have > max (16, (0.2)*16*sqrt(n_col) entries)
      2: dense column parameter: 0.2
          "dense" columns have > max (16, (0.2)*16*sqrt(n_row) entries)
      3: pivot tolerance: 0.1
      4: block size for dense matrix kernels: 32
      5: strategy: 0 (auto)
      6: initial allocation ratio: 0.7
      7: max iterative refinement steps: 2
      12: 2-by-2 pivot tolerance: 0.01
      13: Q fixed during numerical factorization: 0 (auto)
      14: AMD dense row/col parameter: 10
          "dense" rows/columns have > max (16, (10)*sqrt(n)) entries
          Only used if the AMD ordering is used.
      15: diagonal pivot tolerance: 0.001
          Only used if diagonal pivoting is attempted.
      16: scaling: 1 (divide each row by sum of abs. values in each row)
      17: frontal matrix allocation ratio: 0.5
      18: drop tolerance: 0
      19: AMD and COLAMD aggressive absorption: 1 (yes)
      The following options can only be changed at compile-time:
      8: BLAS library used: Sun Performance Library BLAS.
      9: compiled for ANSI C (uses malloc, free, realloc, and printf)
      10: CPU timer is POSIX times ( ) routine.
      11: compiled for normal operation (debugging disabled)
      computer/operating system: Sun Solaris
      size of int: 4 long: 8 Int: 8 pointer: 8 double: 8 Entry: 8 (in bytes)
  symbolic analysis:
     status: 0.
     time: 0.00E+00 (sec)
     estimates (upper bound) for numeric LU:
     size of LU: 0.14 (MB)
     memory needed: 0.29 (MB)
     flop count: 0.94E+05
     nnz (L): 1009.
     nnz (U): 7849.
  numeric factorization:
     status: 0.
     time: 0.00E+00
     actual numeric LU statistics:
     size of LU: 0.02 (MB)
     memory needed: 0.11 (MB)
     flop count: 0.42E+04
     nnz (L): 417.
     nnz (U): 787.
  UMFPACK V4.4 (Jan. 28, 2005), Info:
      matrix entry defined as: double
      Int (generic integer) defined as: long
      BLAS library used: Sun Performance Library BLAS.
      MATLAB: no.
      CPU timer: POSIX times ( ) routine.
      number of rows in matrix A: 130
      number of columns in matrix A: 130
      entries in matrix A: 1282
      memory usage reported in: 16-byte Units
      size of int: 4 bytes
      size of long: 8 bytes
      size of pointer: 8 bytes
      size of numerical entry: 8 bytes
      strategy used: symmetric
      ordering used: amd on A+A'
      modify Q during factorization: no
      prefer diagonal pivoting: yes
      pivots with zero Markowitz cost: 6
      submatrix S after removing zero-cost pivots:
          number of "dense" rows: 7
          number of "dense" columns: 0
          number of empty rows: 0
          number of empty columns 0
          submatrix S square and diagonal preserved
      pattern of square submatrix S:
          number rows and columns 124
          symmetry of nonzero pattern: 0.841193
          nz in S+S' (excl. diagonal): 1204
          nz on diagonal of matrix S: 124
          fraction of nz on diagonal: 1.000000
      AMD statistics, for strict diagonal pivoting:
          est. flops for LU factorization: 8.27000e+03
          est. nz in L+U (incl. diagonal): 1336
          est. largest front (# entries): 324
          est. max nz in any column of L: 18
          number of "dense" rows/columns in S+S': 2
      symbolic factorization defragmentations: 0
      symbolic memory usage (Units): 4690
      symbolic memory usage (MBytes): 0.1
      Symbolic size (Units): 633
      Symbolic size (MBytes): 0
      symbolic factorization CPU time (sec): 0.00
      symbolic factorization wallclock time(sec): 0.00
      matrix scaled: yes (divided each row by sum of abs values in each row)
      minimum sum (abs (rows of A)): 7.94859e-01
      maximum sum (abs (rows of A)): 1.08460e+06
      symbolic/numeric factorization:     upper bound       actual     %
      variable-sized part of Numeric object:
          initial size (Units)                   4013         3870   96%
          peak size (Units)                     16281         4884   30%
          final size (Units)                     8566          596    7%
      Numeric final size (Units)                 9317         1282   14%
      Numeric final size (MBytes)                 0.1          0.0   14%
      peak memory usage (Units)                 18734         7337   39%
      peak memory usage (MBytes)                  0.3          0.1   39%
      numeric factorization flops         9.41610e+04  4.20900e+03    4%
      nz in L (incl diagonal)                    1009          417   41%
      nz in U (incl diagonal)                    7849          787   10%
      nz in L+U (incl diagonal)                  8728         1074   12%
      largest front (# entries)                  2337          270   12%
      largest # rows in front                      19           18   95%
      largest # columns in front                  123           15   12%
      initial allocation ratio used: 0.36
      # of forced updates due to frontal growth: 0
      number of off-diagonal pivots: 0
      nz in L (incl diagonal), if none dropped 417
      nz in U (incl diagonal), if none dropped 796
      number of small entries dropped 9
      nonzeros on diagonal of U: 130
      min abs. value on diagonal of U: 9.22e-07
      max abs. value on diagonal of U: 1.00e+00
      estimate of reciprocal of condition number: 9.22e-07
      indices in compressed pattern: 74
      numerical values stored in Numeric object: 979
      numeric factorization defragmentations: 1
      numeric factorization reallocations: 1
      costly numeric factorization reallocations: 0
      numeric factorization CPU time (sec): 0.00
      numeric factorization wallclock time (sec): 0.05
      numeric factorization mflops (wallclock): 0.08
      symbolic + numeric CPU time (sec): 0.00
      symbolic + numeric wall clock time (sec): 0.05
      symbolic + numeric mflops (wall clock): 0.08
      solve flops: 2.14800e+03
      iterative refinement steps taken: 0
      iterative refinement steps attempted: 0
      solve CPU time (sec): 0.00
      solve wall clock time (sec): 0.00
      total symbolic + numeric + solve flops: 6.35700e+03
      total symbolic + numeric + solve CPU time: 0.00
      total symbolic+numeric+solve wall clock time: 0.05
      total symbolic+numeric+solve mflops(wallclock) 0.13
  norm (A*x-b): 1.8917489796876907E-10
  norm (A*x-b): 5.838675376512725E-10
  norm (A*x-b): 5.838675376512725E-10