====== UMFPACK ======
* [[http://www.cise.ufl.edu/research/sparse/umfpack/]]
====== Installation ======
^ Machine ^ Version ^ Directory ^ Notes ^
| nemo | 4.4 | /usr/local/UMFPACKv4.4 | compiled with sunperflib |
| shrek | | /usr/local/UMFPACKv4.4 | compiled with [[http://www.cs.utexas.edu/users/flame/goto|K. Goto's BLAS]] |
| octopus | 4.4 | /usr/local/UMFPACKv4.4 | compiled with sunperflib |
====== Usage ======
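* /usr/local/UMFPACKv4.4/UMFPACK/Lib/libumfpack.a (UMFPACK library)
* /usr/local/UMFPACKv4.4/AMD/Lib/libamd.a (AMD ordering library, also needed at link time)
* /usr/local/UMFPACKv4.4/UMFPACK/Include (UMFPACK headers)
* /usr/local/UMFPACKv4.4/AMD/Include (AMD headers)
* /local/apps/src/UMFPACKv4.4/UMFPACK/Demo/ (demo programs)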
===== Using UMFPACK from Fortran code =====
* UMFPACK is written in C
* it comes with a Fortran 77 interface, which can also be used from Fortran 90
* use Demo/umf4hb64.f in /local/apps/src/UMFPACKv4.4/UMFPACK/Demo/ on nemo as an example
* in your own code, add the following lines:
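call umf4def (control) ! set the default control parameters
control (1) = 1 ! control (1) is the print level
call umf4pcon (control) ! print the control parameters
call umf4sym (N, N, Ap, Ai, Ax, symbolic, control, info) ! pre-ordering and symbolic analysis
call umf4num (Ap, Ai, Ax, symbolic, numeric, control, info) ! numeric LU factorization
call umf4fsym (symbolic) ! free the symbolic object, no longer needed
sys = 0 ! sys = 0 selects the system A x = b
call umf4sol (sys, x, RHSV, numeric, control, info) ! solve for the right-hand side RHSV
call umf4fnum (numeric) ! free the numeric factorization
call umf4pinf (control, info) ! print the statistics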
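The arrays Ap, Ai and Ax passed to umf4sym and umf4num describe the matrix in compressed-column form. The C sketch below is our own illustration, not part of UMFPACK: it shows what these arrays encode by computing A*x and a residual norm, the same kind of check the demo prints as norm (A*x-b). The helper names csc_matvec and csc_residual_inf_norm are hypothetical, and the indices are 0-based as in the C interface; the demo's own choice of norm may differ.

```c
#include <stdlib.h>

/* Compressed-column (CSC) storage as used by UMFPACK, 0-based (C side):
   - Ap has n+1 entries, Ap[0] == 0 and Ap[n] == nz (number of nonzeros);
   - the entries of column j sit at positions Ap[j] .. Ap[j+1]-1 of
     Ai (their row indices) and Ax (their numerical values). */

/* y = A*x for an n-by-n matrix stored in CSC form. */
static void csc_matvec(long n, const long *Ap, const long *Ai, const double *Ax,
                       const double *x, double *y)
{
    for (long i = 0; i < n; i++)
        y[i] = 0.0;
    for (long j = 0; j < n; j++)                 /* loop over columns */
        for (long p = Ap[j]; p < Ap[j + 1]; p++) /* nonzeros of column j */
            y[Ai[p]] += Ax[p] * x[j];
}

/* Infinity norm max_i |(A*x - b)_i|; returns -1.0 if allocation fails. */
static double csc_residual_inf_norm(long n, const long *Ap, const long *Ai,
                                    const double *Ax, const double *x,
                                    const double *b)
{
    if (n <= 0)
        return 0.0;
    double *y = malloc((size_t) n * sizeof *y);
    if (y == NULL)
        return -1.0;
    csc_matvec(n, Ap, Ai, Ax, x, y);
    double rmax = 0.0;
    for (long i = 0; i < n; i++) {
        double r = y[i] - b[i];
        if (r < 0.0)
            r = -r;
        if (r > rmax)
            rmax = r;
    }
    free(y);
    return rmax;
}
```

For example, with the small 5-by-5 quick-start matrix of the UMFPACK demos (to our knowledge the one in umfpack_simple.c: Ap = {0, 2, 5, 9, 10, 12}, Ai = {0, 1, 0, 2, 4, 1, 2, 3, 4, 2, 1, 4}, Ax = {2, 3, 3, -1, 4, 4, -3, 1, 2, 2, 6, 1}), x = (1, 2, 3, 4, 5) gives A*x = (8, 45, -3, 3, 19) and therefore a zero residual against that right-hand side.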
Of course, you must also compile the C wrapper umf4_f77wrapper.c and link its object file with your program, as follows:
cc -o umf4_f77wrapper.o -DDLONG -m64 -I/usr/local/UMFPACKv4.4/UMFPACK/Include -c umf4_f77wrapper.c
f90 -o poisson3d_umfpack.o -g -fast -C -e -fpp -stackvar -xcheck=init_local -fpover -ftrap=%none -Xlist -fsimple=0 -fns=no -dalign -O4 -KPIC -xmodel=medium -m64 -c poisson3d_umfpack.f90
f90 -g -fast -C -e -fpp -stackvar -xcheck=init_local -fpover -ftrap=%none -Xlist -fsimple=0 -fns=no -dalign -O4 -KPIC -xmodel=medium -m64 -o poisson3d_umfpack poisson3d_umfpack.o umf4_f77wrapper.o /usr/local/UMFPACKv4.4/UMFPACK/Lib/libumfpack.a /usr/local/UMFPACKv4.4/AMD/Lib/libamd.a -xlic_lib=sunperf
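The wrapper above is compiled with -DDLONG and -m64. As we read umf4_f77wrapper.c, this makes every UMFPACK integer a C long, which is 8 bytes under -m64, matching the log line "size of int: 4 long: 8 Int: 8 pointer: 8" further down. The Fortran variables symbolic and numeric are opaque handles in which the C side stores a pointer, so they, like N, Ap, Ai and sys, must be 8-byte integers (integer*8) in the calling code. The sketch below only illustrates that handle-passing pattern: demo_create_, demo_free_ and demo_object are hypothetical names (the trailing underscore follows the usual Unix Fortran naming convention), and this is not the wrapper's actual code.

```c
#include <stdlib.h>

/* Sketch only (hypothetical names): how a Fortran-callable C routine built with
   -DDLONG can hand an opaque C object back to Fortran through an integer handle. */
typedef long Int; /* assumption: the integer type that -DDLONG selects in the wrapper */

/* The handle must be able to hold a pointer: true for long under -m64 (LP64). */
_Static_assert(sizeof(Int) >= sizeof(void *), "Int cannot hold a pointer: build with -m64");

typedef struct {
    Int n; /* placeholder content for the opaque object */
} demo_object;

/* Fortran passes the address of its INTEGER*8 handle; C stores a pointer there. */
void demo_create_(const Int *n, void **handle)
{
    demo_object *obj = malloc(sizeof *obj);
    if (obj != NULL)
        obj->n = *n;
    *handle = obj; /* NULL signals an allocation failure */
}

/* Release the object and clear the handle, analogous to umf4fsym / umf4fnum. */
void demo_free_(void **handle)
{
    free(*handle);
    *handle = NULL;
}
```

Declaring the handles as default 4-byte integers in a 64-bit build would let the C side overwrite neighbouring memory, which is why the 64-bit demo is the one to copy.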
====== Config ======
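* edit Make.include and Make.solaris before compiling (see the link to Make.solaris_amd64). In Make.include, removing -DNBLAS lets UMFPACK use the BLAS, and uncommenting the include line selects the Solaris settings: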
diff /local/apps/src/UMFPACKv4.4/UMFPACK/Make/Make.include-ori /local/apps/src/UMFPACKv4.4/UMFPACK/Make/Make.include
50c50
< CONFIG = -DNBLAS
---
> CONFIG =
63c63
< # include ../Make/Make.solaris
---
> include ../Make/Make.solaris
and
diff /local2/fboyer/UMFPACKv4.4/UMFPACK/Make/Make.solaris /local/apps/src/UMFPACKv4.4/UMFPACK/Make/Make.solaris
5a6
>
11,13c12,14
< CC = cc
< CFLAGS = -Xc -xO5 -KPIC -dalign -xtarget=generic64
< F77FLAGS = -xO5 -KPIC -dalign -m64
---
> CC = cc
> CFLAGS = -xO5 -xdepend -DLP64 -xprefetch=auto -xprefetch_level=3 -xipo=2 -m64 -xmodel=medium
> F77FLAGS = -xO5 -xdepend -DLP64 -xprefetch=auto -xprefetch_level=3 -xipo=2 -m64 -xmodel=medium
22d22
< #LIB = -xlic_lib=sunperf -lfai -lfsu -lfui -lsunperf -lm -lsunmath
30c30
< LIB = -xlic_lib=sunperf -lfai -lfsu -lfui -lm
---
> LIB = -L/opt/studio12/SUNWspro/lib/amd64 -R/opt/studio12/SUNWspro/lib/amd64 -lsunperf -lm -lpicl -lmtsk
* 64-bit only
* there is a bug in the test programs; it is fixed in umf4hb64.f, where the row-index array Ai must be declared with nz entries instead of n:
diff /local/apps/src/UMFPACKv4.4/UMFPACK/Demo/umf4hb64.f-ori /local/apps/src/UMFPACKv4.4/UMFPACK/Demo/umf4hb64.f
331c331
< $ n, nz, Ap (n+1), Ai (n), j, i, p
---
> $ n, nz, Ap (n+1), Ai (nz), j, i, p
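The fix above changes the declared size of Ai from n to nz: Ap has n+1 entries, but Ai (like Ax) holds one entry per nonzero, and the last entry of Ap equals nz. With the original declaration, array-bound checking (the -C option used in the compile lines on this page) would presumably stop the program as soon as an index beyond n is accessed. The hypothetical C helper below checks these invariants on a 0-based compressed-column pattern before it is handed to UMFPACK; the sorted, duplicate-free row indices it also checks are, to our understanding, a requirement stated in the UMFPACK documentation.

```c
/* Returns 1 if (Ap, Ai) is a valid n-by-n compressed-column pattern with nz
   entries, 0 otherwise. Checks: Ap[0] == 0, Ap nondecreasing, Ap[n] == nz,
   row indices in 0 .. n-1 and strictly increasing within each column. */
static int csc_pattern_ok(long n, long nz, const long *Ap, const long *Ai)
{
    if (n < 0 || nz < 0 || Ap[0] != 0 || Ap[n] != nz)
        return 0;
    for (long j = 0; j < n; j++) {
        if (Ap[j + 1] < Ap[j])
            return 0; /* column pointers must not decrease */
        /* Ai is indexed by entry p (0 .. nz-1), not by column: it needs nz slots */
        for (long p = Ap[j]; p < Ap[j + 1]; p++) {
            if (Ai[p] < 0 || Ai[p] >= n)
                return 0; /* row index out of range */
            if (p > Ap[j] && Ai[p] <= Ai[p - 1])
                return 0; /* unsorted or duplicate row index */
        }
    }
    return 1;
}
```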
====== Tests ======
===== In C =====
* get the source [[http://iusti.polytech.univ-mrs.fr/~jobic/dokuwiki/doku.php?id=librairies_installeesumfpack|here]]
module load ss12
cc -o umfpack_simple -m64 umfpack_simple.c -I/usr/local/UMFPACKv4.4/UMFPACK/Include -R/usr/local/UMFPACKv4.4/UMFPACK/Lib -L/usr/local/UMFPACKv4.4/UMFPACK/Lib -R/usr/local/UMFPACKv4.4/AMD/Lib -L/usr/local/UMFPACKv4.4/AMD/Lib -lumfpack -lamd -xlic_lib=sunperf
or, with the ss12u1 module (note the extra -lm):
module load ss12u1
cc -o umfpack_simple -m64 umfpack_simple.c -I/usr/local/UMFPACKv4.4/UMFPACK/Include -R/usr/local/UMFPACKv4.4/UMFPACK/Lib -L/usr/local/UMFPACKv4.4/UMFPACK/Lib -R/usr/local/UMFPACKv4.4/AMD/Lib -L/usr/local/UMFPACKv4.4/AMD/Lib -lumfpack -lamd -xlic_lib=sunperf -lm
===== In Fortran =====
* use Demo/umf4hb64.f in /local/apps/src/UMFPACKv4.4/UMFPACK/Demo/ on nemo as an example
f90 -o umf4hb64.o -g -fast -C -e -fpp -stackvar -xcheck=init_local -fpover -ftrap=%none -Xlist -fsimple=0 -fns=no -xmodel=medium -dalign -m64 -c umf4hb64.f
cc -o umf4_f77wrapper.o -DDLONG -m64 -I/usr/local/UMFPACKv4.4/UMFPACK/Include -c umf4_f77wrapper.c
f90 -g -fast -C -e -fpp -stackvar -xcheck=init_local -fpover -ftrap=%none -Xlist -fsimple=0 -fns=no -xmodel=medium -dalign -m64 -o umf4hb64 umf4hb64.o umf4_f77wrapper.o /usr/local/UMFPACKv4.4/UMFPACK/Lib/libumfpack.a /usr/local/UMFPACKv4.4/AMD/Lib/libamd.a -xlic_lib=sunperf
then run it (the test matrices are in the Demo/HB directory):
./umf4hb64 < arc130.rua
Matrix key: ARC130
UMFPACK V4.4 (Jan. 28, 2005), Control:
Matrix entry defined as: double
Int (generic integer) defined as: long
0: print level: 2
1: dense row parameter: 0.2
"dense" rows have > max (16, (0.2)*16*sqrt(n_col) entries)
2: dense column parameter: 0.2
"dense" columns have > max (16, (0.2)*16*sqrt(n_row) entries)
3: pivot tolerance: 0.1
4: block size for dense matrix kernels: 32
5: strategy: 0 (auto)
6: initial allocation ratio: 0.7
7: max iterative refinement steps: 2
12: 2-by-2 pivot tolerance: 0.01
13: Q fixed during numerical factorization: 0 (auto)
14: AMD dense row/col parameter: 10
"dense" rows/columns have > max (16, (10)*sqrt(n)) entries
Only used if the AMD ordering is used.
15: diagonal pivot tolerance: 0.001
Only used if diagonal pivoting is attempted.
16: scaling: 1 (divide each row by sum of abs. values in each row)
17: frontal matrix allocation ratio: 0.5
18: drop tolerance: 0
19: AMD and COLAMD aggressive absorption: 1 (yes)
The following options can only be changed at compile-time:
8: BLAS library used: Sun Performance Library BLAS.
9: compiled for ANSI C (uses malloc, free, realloc, and printf)
10: CPU timer is POSIX times ( ) routine.
11: compiled for normal operation (debugging disabled)
computer/operating system: Sun Solaris
size of int: 4 long: 8 Int: 8 pointer: 8 double: 8 Entry: 8 (in bytes)
symbolic analysis:
status: 0.
time: 0.00E+00 (sec)
estimates (upper bound) for numeric LU:
size of LU: 0.14 (MB)
memory needed: 0.29 (MB)
flop count: 0.94E+05
nnz (L): 1009.
nnz (U): 7849.
numeric factorization:
status: 0.
time: 0.00E+00
actual numeric LU statistics:
size of LU: 0.02 (MB)
memory needed: 0.11 (MB)
flop count: 0.42E+04
nnz (L): 417.
nnz (U): 787.
UMFPACK V4.4 (Jan. 28, 2005), Info:
matrix entry defined as: double
Int (generic integer) defined as: long
BLAS library used: Sun Performance Library BLAS.
MATLAB: no.
CPU timer: POSIX times ( ) routine.
number of rows in matrix A: 130
number of columns in matrix A: 130
entries in matrix A: 1282
memory usage reported in: 16-byte Units
size of int: 4 bytes
size of long: 8 bytes
size of pointer: 8 bytes
size of numerical entry: 8 bytes
strategy used: symmetric
ordering used: amd on A+A'
modify Q during factorization: no
prefer diagonal pivoting: yes
pivots with zero Markowitz cost: 6
submatrix S after removing zero-cost pivots:
number of "dense" rows: 7
number of "dense" columns: 0
number of empty rows: 0
number of empty columns 0
submatrix S square and diagonal preserved
pattern of square submatrix S:
number rows and columns 124
symmetry of nonzero pattern: 0.841193
nz in S+S' (excl. diagonal): 1204
nz on diagonal of matrix S: 124
fraction of nz on diagonal: 1.000000
AMD statistics, for strict diagonal pivoting:
est. flops for LU factorization: 8.27000e+03
est. nz in L+U (incl. diagonal): 1336
est. largest front (# entries): 324
est. max nz in any column of L: 18
number of "dense" rows/columns in S+S': 2
symbolic factorization defragmentations: 0
symbolic memory usage (Units): 4690
symbolic memory usage (MBytes): 0.1
Symbolic size (Units): 633
Symbolic size (MBytes): 0
symbolic factorization CPU time (sec): 0.00
symbolic factorization wallclock time(sec): 0.00
matrix scaled: yes (divided each row by sum of abs values in each row)
minimum sum (abs (rows of A)): 7.94859e-01
maximum sum (abs (rows of A)): 1.08460e+06
symbolic/numeric factorization: upper bound actual %
variable-sized part of Numeric object:
initial size (Units) 4013 3870 96%
peak size (Units) 16281 4884 30%
final size (Units) 8566 596 7%
Numeric final size (Units) 9317 1282 14%
Numeric final size (MBytes) 0.1 0.0 14%
peak memory usage (Units) 18734 7337 39%
peak memory usage (MBytes) 0.3 0.1 39%
numeric factorization flops 9.41610e+04 4.20900e+03 4%
nz in L (incl diagonal) 1009 417 41%
nz in U (incl diagonal) 7849 787 10%
nz in L+U (incl diagonal) 8728 1074 12%
largest front (# entries) 2337 270 12%
largest # rows in front 19 18 95%
largest # columns in front 123 15 12%
initial allocation ratio used: 0.36
# of forced updates due to frontal growth: 0
number of off-diagonal pivots: 0
nz in L (incl diagonal), if none dropped 417
nz in U (incl diagonal), if none dropped 796
number of small entries dropped 9
nonzeros on diagonal of U: 130
min abs. value on diagonal of U: 9.22e-07
max abs. value on diagonal of U: 1.00e+00
estimate of reciprocal of condition number: 9.22e-07
indices in compressed pattern: 74
numerical values stored in Numeric object: 979
numeric factorization defragmentations: 1
numeric factorization reallocations: 1
costly numeric factorization reallocations: 0
numeric factorization CPU time (sec): 0.00
numeric factorization wallclock time (sec): 0.05
numeric factorization mflops (wallclock): 0.08
symbolic + numeric CPU time (sec): 0.00
symbolic + numeric wall clock time (sec): 0.05
symbolic + numeric mflops (wall clock): 0.08
solve flops: 2.14800e+03
iterative refinement steps taken: 0
iterative refinement steps attempted: 0
solve CPU time (sec): 0.00
solve wall clock time (sec): 0.00
total symbolic + numeric + solve flops: 6.35700e+03
total symbolic + numeric + solve CPU time: 0.00
total symbolic+numeric+solve wall clock time: 0.05
total symbolic+numeric+solve mflops(wallclock) 0.13
norm (A*x-b): 1.8917489796876907E-10
norm (A*x-b): 5.838675376512725E-10
norm (A*x-b): 5.838675376512725E-10
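Two lines of the Info report above can be reproduced by hand. With scaling set to 1, UMFPACK divides each row by the sum of the absolute values in that row and reports the smallest and largest such sums (minimum/maximum sum (abs (rows of A))). The estimate of the reciprocal condition number is the ratio of the smallest to the largest absolute value on the diagonal of U: here 9.22e-07 / 1.00e+00 = 9.22e-07, as printed. The C sketch below recomputes both quantities on a 0-based compressed-column matrix; csc_row_abs_sums and rcond_from_diag are hypothetical names, and this is an illustration rather than UMFPACK's own code.

```c
/* R[i] = sum over j of |a_ij| for an n-by-n CSC matrix; the row scaling
   reported in the log ("divided each row by sum of abs values") divides
   row i by R[i]. */
static void csc_row_abs_sums(long n, const long *Ap, const long *Ai,
                             const double *Ax, double *R)
{
    for (long i = 0; i < n; i++)
        R[i] = 0.0;
    for (long j = 0; j < n; j++)
        for (long p = Ap[j]; p < Ap[j + 1]; p++)
            R[Ai[p]] += Ax[p] < 0.0 ? -Ax[p] : Ax[p];
}

/* Crude reciprocal condition estimate: min |u_ii| / max |u_ii| over the
   diagonal of U; returns 0.0 for an empty or all-zero diagonal. */
static double rcond_from_diag(long n, const double *Udiag)
{
    double dmin = 0.0, dmax = 0.0;
    for (long i = 0; i < n; i++) {
        double d = Udiag[i] < 0.0 ? -Udiag[i] : Udiag[i];
        if (i == 0 || d < dmin)
            dmin = d;
        if (i == 0 || d > dmax)
            dmax = d;
    }
    return dmax > 0.0 ? dmin / dmax : 0.0;
}
```

A value of the order of 1e-06, as for arc130, means the factorization is usable but the solution can lose about six significant digits.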