Differences
Below are the differences between two revisions of the page.
umfpack [2009/11/18 18:56] – gerard | umfpack [2009/12/04 09:54] – gerard | ||
---|---|---|---|
Line 12: | Line 12: | ||
* / | * / | ||
* / | * / | ||
+ | * / | ||
+ | | ||
+ | |||
+ | Using umfpack from Fortran code: | ||
+ | * umfpack is written in C | ||
+ | * a Fortran 77 interface exists and can also be used from Fortran 90 | ||
+ | * for an example, see the file Demo/ | ||
+ | * in your own source file, add the following lines: | ||
+ | |||
+ | < | ||
+ | call umf4def (control) ! set the default parameters | ||
+ | control (1) = 1 | ||
+ | call umf4pcon (control) | ||
+ | call umf4sym (N, N, Ap, Ai, Ax, symbolic, control, info) ! | ||
+ | call umf4num (Ap, Ai, Ax, symbolic, numeric, control, info) | ||
+ | call umf4fsym (symbolic) | ||
+ | call umf4sol (sys, x, RHSV, numeric, control, info) | ||
+ | call umf4fnum (numeric) | ||
+ | call umf4pinf (control, info) | ||
+ | </ | ||
+ | For this to work, you must of course link your program with the wrapper umf4_f77wrapper.c, | ||
+ | < | ||
+ | cc -o umf4_f77wrapper.o -DDLONG -m64 -I/ | ||
+ | f90 -o poisson3d_umfpack.o -g -fast -C -e -fpp -stackvar -xcheck=init_local -fpover -ftrap=%none -Xlist -fsimple=0 -fns=no -dalign -O4 -KPIC -xmodel=medium -m64 -c poisson3d_umfpack.f90 | ||
+ | f90 -g -fast -C -e -fpp -stackvar -xcheck=init_local -fpover -ftrap=%none -Xlist -fsimple=0 -fns=no -dalign -O4 -KPIC -xmodel=medium -m64 -o poisson3d_umfpack poisson3d_umfpack.o umf4_f77wrapper.o / | ||
+ | |||
+ | </ | ||
====== Config ====== | ====== Config ====== | ||
  * edit Make.include and Make.solaris (see the link to Make.solaris_amd64) before compiling | * edit Make.include and Make.solaris (see the link to Make.solaris_amd64) before compiling | ||
+ | < | ||
+ | diff / | ||
+ | 50c50 | ||
+ | < CONFIG = -DNBLAS | ||
+ | --- | ||
+ | > CONFIG = | ||
+ | 63c63 | ||
+ | < # include ../ | ||
+ | --- | ||
+ | > include ../ | ||
+ | |||
+ | </ | ||
+ | and | ||
+ | < | ||
+ | diff / | ||
+ | 5a6 | ||
+ | > | ||
+ | 11,13c12,14 | ||
+ | < CC = cc | ||
+ | < CFLAGS = -Xc -xO5 -KPIC -dalign -xtarget=generic64 | ||
+ | < F77FLAGS = -xO5 -KPIC -dalign -m64 | ||
+ | --- | ||
+ | > CC = cc | ||
+ | > CFLAGS = -xO5 -xdepend -DLP64 -xprefetch=auto -xprefetch_level=3 -xipo=2 -m64 -xmodel=medium | ||
+ | > F77FLAGS = -xO5 -xdepend -DLP64 -xprefetch=auto -xprefetch_level=3 -xipo=2 -m64 -xmodel=medium | ||
+ | 22d22 | ||
+ | < #LIB = -xlic_lib=sunperf -lfai -lfsu -lfui -lsunperf -lm -lsunmath | ||
+ | 30c30 | ||
+ | < LIB = -xlic_lib=sunperf -lfai -lfsu -lfui -lm | ||
+ | --- | ||
+ | > LIB = -L/ | ||
+ | |||
+ | </ | ||
+ | |||
  * 64-bit only | * 64-bit only | ||
+ | * there is a bug in the test programs; it is fixed in umf4hb64.f | ||
+ | < | ||
+ | diff / | ||
+ | 331c331 | ||
+ | < $ n, nz, Ap (n+1), Ai (n), j, i, p | ||
+ | --- | ||
+ | > $ n, nz, Ap (n+1), Ai (nz), j, i, p | ||
+ | </ | ||
+ | |||
+ | ====== Tests ====== | ||
+ | ===== in C ===== | ||
+ | |||
+ | * get the source [[http:// | ||
+ | < | ||
+ | module load ss12 | ||
+ | cc -o umfpack_simple -m64 umfpack_simple.c -I/ | ||
+ | </ | ||
+ | or | ||
+ | < | ||
+ | module load ss12u1 | ||
+ | cc -o umfpack_simple -m64 umfpack_simple.c -I/ | ||
+ | </ | ||
+ | |||
+ | ===== in Fortran ===== | ||
+ | * for an example, see the file Demo/ | ||
+ | < | ||
+ | f90 -o umf4hb64.o -g -fast -C -e -fpp -stackvar -xcheck=init_local -fpover -ftrap=%none -Xlist -fsimple=0 -fns=no -xmodel=medium -dalign -m64 -c umf4hb64.f | ||
+ | cc -o umf4_f77wrapper.o -DDLONG -m64 -I/ | ||
+ | f90 -g -fast -C -e -fpp -stackvar -xcheck=init_local -fpover -ftrap=%none -Xlist -fsimple=0 -fns=no -xmodel=medium -dalign -m64 -o umf4hb64 umf4hb64.o umf4_f77wrapper.o / | ||
+ | </ | ||
+ | and (the test matrices are in the Demo/HB directory): | ||
+ | < | ||
+ | ./umf4hb64 < arc130.rua | ||
+ | | ||
+ | |||
+ | UMFPACK V4.4 (Jan. 28, 2005), Control: | ||
+ | |||
+ | Matrix entry defined as: double | ||
+ | Int (generic integer) defined as: long | ||
+ | |||
+ | 0: print level: 2 | ||
+ | 1: dense row parameter: | ||
+ | " | ||
+ | 2: dense column parameter: 0.2 | ||
+ | " | ||
+ | 3: pivot tolerance: 0.1 | ||
+ | 4: block size for dense matrix kernels: 32 | ||
+ | 5: strategy: 0 (auto) | ||
+ | 6: initial allocation ratio: 0.7 | ||
+ | 7: max iterative refinement steps: 2 | ||
+ | 12: 2-by-2 pivot tolerance: 0.01 | ||
+ | 13: Q fixed during numerical factorization: | ||
+ | 14: AMD dense row/col parameter: | ||
+ | " | ||
+ | Only used if the AMD ordering is used. | ||
+ | 15: diagonal pivot tolerance: 0.001 | ||
+ | Only used if diagonal pivoting is attempted. | ||
+ | 16: scaling: 1 (divide each row by sum of abs. values in each row) | ||
+ | 17: frontal matrix allocation ratio: 0.5 | ||
+ | 18: drop tolerance: 0 | ||
+ | 19: AMD and COLAMD aggressive absorption: 1 (yes) | ||
+ | |||
+ | The following options can only be changed at compile-time: | ||
+ | 8: BLAS library used: Sun Performance Library BLAS. | ||
+ | 9: compiled for ANSI C (uses malloc, free, realloc, and printf) | ||
+ | 10: CPU timer is POSIX times ( ) routine. | ||
+ | 11: compiled for normal operation (debugging disabled) | ||
+ | computer/ | ||
+ | size of int: 4 long: 8 Int: 8 pointer: 8 double: 8 Entry: 8 (in bytes) | ||
+ | |||
+ | symbolic analysis: | ||
+ | | ||
+ | | ||
+ | | ||
+ | size of LU: 0.14 (MB) | ||
+ | | ||
+ | flop count: | ||
+ | nnz (L): 1009. | ||
+ | nnz (U): 7849. | ||
+ | numeric factorization: | ||
+ | | ||
+ | | ||
+ | | ||
+ | size of LU: 0.02 (MB) | ||
+ | | ||
+ | flop count: | ||
+ | nnz (L): 417. | ||
+ | nnz (U): 787. | ||
+ | |||
+ | UMFPACK V4.4 (Jan. 28, 2005), Info: | ||
+ | matrix entry defined as: double | ||
+ | Int (generic integer) defined as: long | ||
+ | BLAS library used: Sun Performance Library BLAS. | ||
+ | MATLAB: | ||
+ | CPU timer: | ||
+ | number of rows in matrix A: 130 | ||
+ | number of columns in matrix A: 130 | ||
+ | entries in matrix A: 1282 | ||
+ | memory usage reported in: | ||
+ | size of int: 4 bytes | ||
+ | size of long: 8 bytes | ||
+ | size of pointer: | ||
+ | size of numerical entry: | ||
+ | |||
+ | strategy used: symmetric | ||
+ | ordering used: amd on A+A' | ||
+ | modify Q during factorization: | ||
+ | prefer diagonal pivoting: | ||
+ | pivots with zero Markowitz cost: 6 | ||
+ | submatrix S after removing zero-cost pivots: | ||
+ | number of " | ||
+ | number of " | ||
+ | number of empty rows: 0 | ||
+ | number of empty columns | ||
+ | submatrix S square and diagonal preserved | ||
+ | pattern of square submatrix S: | ||
+ | number rows and columns | ||
+ | symmetry of nonzero pattern: | ||
+ | nz in S+S' (excl. diagonal): | ||
+ | nz on diagonal of matrix S: 124 | ||
+ | fraction of nz on diagonal: | ||
+ | AMD statistics, for strict diagonal pivoting: | ||
+ | est. flops for LU factorization: | ||
+ | est. nz in L+U (incl. diagonal): | ||
+ | est. largest front (# entries): | ||
+ | est. max nz in any column of L: 18 | ||
+ | number of " | ||
+ | symbolic factorization defragmentations: | ||
+ | symbolic memory usage (Units): | ||
+ | symbolic memory usage (MBytes): | ||
+ | Symbolic size (Units): | ||
+ | Symbolic size (MBytes): | ||
+ | symbolic factorization CPU time (sec): | ||
+ | symbolic factorization wallclock time(sec): | ||
+ | |||
+ | matrix scaled: yes (divided each row by sum of abs values in each row) | ||
+ | minimum sum (abs (rows of A)): 7.94859e-01 | ||
+ | maximum sum (abs (rows of A)): 1.08460e+06 | ||
+ | |||
+ | symbolic/ | ||
+ | variable-sized part of Numeric object: | ||
+ | initial size (Units) | ||
+ | peak size (Units) | ||
+ | final size (Units) | ||
+ | Numeric final size (Units) | ||
+ | Numeric final size (MBytes) | ||
+ | peak memory usage (Units) | ||
+ | peak memory usage (MBytes) | ||
+ | numeric factorization flops 9.41610e+04 | ||
+ | nz in L (incl diagonal) | ||
+ | nz in U (incl diagonal) | ||
+ | nz in L+U (incl diagonal) | ||
+ | largest front (# entries) | ||
+ | largest # rows in front | ||
+ | largest # columns in front | ||
+ | |||
+ | initial allocation ratio used: 0.36 | ||
+ | # of forced updates due to frontal growth: | ||
+ | number of off-diagonal pivots: | ||
+ | nz in L (incl diagonal), if none dropped | ||
+ | nz in U (incl diagonal), if none dropped | ||
+ | number of small entries dropped | ||
+ | nonzeros on diagonal of U: 130 | ||
+ | min abs. value on diagonal of U: | ||
+ | max abs. value on diagonal of U: | ||
+ | estimate of reciprocal of condition number: | ||
+ | indices in compressed pattern: | ||
+ | numerical values stored in Numeric object: | ||
+ | numeric factorization defragmentations: | ||
+ | numeric factorization reallocations: | ||
+ | costly numeric factorization reallocations: | ||
+ | numeric factorization CPU time (sec): | ||
+ | numeric factorization wallclock time (sec): | ||
+ | numeric factorization mflops (wallclock): | ||
+ | symbolic + numeric CPU time (sec): | ||
+ | symbolic + numeric wall clock time (sec): | ||
+ | symbolic + numeric mflops (wall clock): | ||
+ | solve flops: | ||
+ | iterative refinement steps taken: | ||
+ | iterative refinement steps attempted: | ||
+ | solve CPU time (sec): | ||
+ | solve wall clock time (sec): | ||
+ | total symbolic + numeric + solve flops: | ||
+ | total symbolic + numeric + solve CPU time: 0.00 | ||
+ | total symbolic+numeric+solve wall clock time: 0.05 | ||
+ | total symbolic+numeric+solve mflops(wallclock) 0.13 | ||
+ | norm (A*x-b): | ||
+ | norm (A*x-b): | ||
+ | norm (A*x-b): | ||
+ | </ |