Differences

Below are the differences between two revisions of the page.

--- umfpack [2009/11/18 18:56] – gerard
+++ umfpack [2009/12/04 09:54] – gerard
@@ Line 12: / Line 12: @@
   * /usr/local/UMFPACKv4.4/UMFPACK/Include
   * /usr/local/UMFPACKv4.4/AMD/Include
+  * /local/apps/src/UMFPACKv4.4/UMFPACK/Demo/
+Using umfpack from Fortran code:
+  * umfpack is written in C
+  * a Fortran 77 interface exists and can be used from Fortran 90
+  * use the file Demo/umf4hb64.f in /local/apps/src/UMFPACKv4.4/UMFPACK/Demo/ on nemo as an example
+  * in your own source file, add the following lines:
+<code>
+  call umf4def (control) ! set the default control parameters
+  control (1) = 1 ! print level
+  call umf4pcon (control) ! print the control parameters
+  call umf4sym (N, N, Ap, Ai, Ax, symbolic, control, info) ! pre-order and symbolic analysis
+  call umf4num (Ap, Ai, Ax, symbolic, numeric, control, info) ! numeric factorization
+  call umf4fsym (symbolic) ! free the Symbolic object
+  call umf4sol (sys, x, RHSV, numeric, control, info) ! solve; sys = 0 solves A*x = b
+  call umf4fnum (numeric) ! free the Numeric object
+  call umf4pinf (control, info) ! print the statistics stored in info
+</code>
+For this to work, you must of course link your program with the compiled wrapper umf4_f77wrapper.c, as follows:
+<code>
+cc -o umf4_f77wrapper.o -DDLONG -m64 -I/usr/local/UMFPACKv4.4/UMFPACK/Include -c umf4_f77wrapper.c
+f90 -o poisson3d_umfpack.o -g -fast -C -e -fpp -stackvar -xcheck=init_local -fpover -ftrap=%none -Xlist -fsimple=0 -fns=no -dalign -O4 -KPIC -xmodel=medium -m64 -c poisson3d_umfpack.f90
+f90 -g -fast -C -e -fpp -stackvar -xcheck=init_local -fpover -ftrap=%none -Xlist -fsimple=0 -fns=no -dalign -O4 -KPIC -xmodel=medium -m64 -o poisson3d_umfpack poisson3d_umfpack.o umf4_f77wrapper.o /usr/local/UMFPACKv4.4/UMFPACK/Lib/libumfpack.a /usr/local/UMFPACKv4.4/AMD/Lib/libamd.a -xlic_lib=sunperf
+</code>
 ====== Config ======
   * edit Make.include and Make.solaris (see the link on Make.solaris_amd64) before compiling
+<code>
+diff /local/apps/src/UMFPACKv4.4/UMFPACK/Make/Make.include-ori /local/apps/src/UMFPACKv4.4/UMFPACK/Make/Make.include
+c50
+< CONFIG = -DNBLAS
+---
+> CONFIG =
+c63
+< # include ../Make/Make.solaris
+---
+> include ../Make/Make.solaris
+</code>
+and
+<code>
+diff /local2/fboyer/UMFPACKv4.4/UMFPACK/Make/Make.solaris /local/apps/src/UMFPACKv4.4/UMFPACK/Make/Make.solaris
+a6
+>
+,13c12,14
+<  CC = cc
+<  CFLAGS = -Xc -xO5 -KPIC -dalign -xtarget=generic64
+<  F77FLAGS =   -xO5 -KPIC -dalign -m64
+---
+> CC = cc
+> CFLAGS = -xO5 -xdepend -DLP64 -xprefetch=auto -xprefetch_level=3 -xipo=2 -m64 -xmodel=medium
+> F77FLAGS =   -xO5 -xdepend -DLP64 -xprefetch=auto -xprefetch_level=3 -xipo=2 -m64 -xmodel=medium
+d22
+< #LIB = -xlic_lib=sunperf -lfai -lfsu -lfui -lsunperf -lm -lsunmath
+c30
+<  LIB =  -xlic_lib=sunperf -lfai -lfsu -lfui -lm
+---
+> LIB = -L/opt/studio12/SUNWspro/lib/amd64 -R/opt/studio12/SUNWspro/lib/amd64 -lsunperf -lm -lpicl -lmtsk
+</code>
   * 64-bit only
+  * there is a bug in the test programs; it is fixed in umf4hb64.f
+<code>
+diff  /local/apps/src/UMFPACKv4.4/UMFPACK/Demo/umf4hb64.f-ori  /local/apps/src/UMFPACKv4.4/UMFPACK/Demo/umf4hb64.f
+c331
+<      $      n, nz, Ap (n+1), Ai (n), j, i, p
+---
+>      $      n, nz, Ap (n+1), Ai (nz), j, i, p
+</code>
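The fix above changes the declared length of Ai from n to nz. This matches the compressed-column storage that UMFPACK takes as input: Ap holds n+1 column pointers, while Ai and Ax hold one entry per nonzero. The following C sketch illustrates those sizes; `dense_to_csc` is a hypothetical helper written for this page, not part of UMFPACK, and it assumes 0-based indices and a row-major dense input.

```c
#include <assert.h>

/* Hypothetical helper (NOT part of UMFPACK): convert a dense n-by-n matrix,
 * stored row-major in `dense`, to 0-based compressed-column form.
 * Ap must have room for n+1 entries; Ai and Ax must have room for nz
 * entries, where nz is the number of nonzeros. Returns nz. */
long dense_to_csc(long n, const double *dense,
                  long *Ap, long *Ai, double *Ax)
{
    long nz = 0;
    assert(n >= 0);
    for (long j = 0; j < n; j++) {
        Ap[j] = nz;                      /* column j starts at position nz */
        for (long i = 0; i < n; i++) {
            double v = dense[i * n + j];
            if (v != 0.0) {
                Ai[nz] = i;              /* row index of this nonzero */
                Ax[nz] = v;              /* its numerical value */
                nz++;
            }
        }
    }
    Ap[n] = nz;                          /* last pointer equals nz */
    return nz;
}
```

For a 3-by-3 matrix with 5 nonzeros, Ap needs 4 entries but Ai and Ax need 5 each, so an Ai array declared with only n entries would be too short.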
+====== Tests ======
+===== in C =====
+  * get the source [[http://iusti.polytech.univ-mrs.fr/~jobic/dokuwiki/doku.php?id=librairies_installees&#umfpack|here]]
+<code>
+module load ss12
+cc -o umfpack_simple -m64 umfpack_simple.c -I/usr/local/UMFPACKv4.4/UMFPACK/Include -R/usr/local/UMFPACKv4.4/UMFPACK/Lib -L/usr/local/UMFPACKv4.4/UMFPACK/Lib -R/usr/local/UMFPACKv4.4/AMD/Lib -L/usr/local/UMFPACKv4.4/AMD/Lib -lumfpack -lamd -xlic_lib=sunperf
+</code>
+or
+<code>
+module load ss12u1
+cc -o umfpack_simple -m64 umfpack_simple.c -I/usr/local/UMFPACKv4.4/UMFPACK/Include -R/usr/local/UMFPACKv4.4/UMFPACK/Lib -L/usr/local/UMFPACKv4.4/UMFPACK/Lib -R/usr/local/UMFPACKv4.4/AMD/Lib -L/usr/local/UMFPACKv4.4/AMD/Lib -lumfpack -lamd -xlic_lib=sunperf -lm
+</code>
+===== in Fortran =====
+  * use the file Demo/umf4hb64.f in /local/apps/src/UMFPACKv4.4/UMFPACK/Demo/ on nemo as an example
+<code>
+f90 -o umf4hb64.o -g -fast -C -e -fpp -stackvar -xcheck=init_local -fpover -ftrap=%none -Xlist -fsimple=0 -fns=no -xmodel=medium -dalign -m64 -c umf4hb64.f
+cc -o umf4_f77wrapper.o -DDLONG -m64 -I/usr/local/UMFPACKv4.4/UMFPACK/Include -c umf4_f77wrapper.c
+f90 -g -fast -C -e -fpp -stackvar -xcheck=init_local -fpover -ftrap=%none -Xlist -fsimple=0 -fns=no -xmodel=medium -dalign -m64 -o umf4hb64 umf4hb64.o umf4_f77wrapper.o /usr/local/UMFPACKv4.4/UMFPACK/Lib/libumfpack.a /usr/local/UMFPACKv4.4/AMD/Lib/libamd.a -xlic_lib=sunperf
+</code>
+and run it (the test matrices are in the Demo/HB directory):
+<code>
+./umf4hb64 < arc130.rua
+ Matrix key: ARC130
+UMFPACK V4.4 (Jan. 28, 2005), Control:
+    Matrix entry defined as: double
+    Int (generic integer) defined as: long
+: print level: 2
+: dense row parameter:    0.2
+        "dense" rows have    > max (16, (0.2)*16*sqrt(n_col) entries)
+: dense column parameter: 0.2
+        "dense" columns have > max (16, (0.2)*16*sqrt(n_row) entries)
+: pivot tolerance: 0.1
+: block size for dense matrix kernels: 32
+: strategy: 0 (auto)
+: initial allocation ratio: 0.7
+: max iterative refinement steps: 2
+: 2-by-2 pivot tolerance: 0.01
+: Q fixed during numerical factorization: 0 (auto)
+: AMD dense row/col parameter:    10
+       "dense" rows/columns have > max (16, (10)*sqrt(n)) entries
+        Only used if the AMD ordering is used.
+: diagonal pivot tolerance: 0.001
+        Only used if diagonal pivoting is attempted.
+: scaling: 1 (divide each row by sum of abs. values in each row)
+: frontal matrix allocation ratio: 0.5
+: drop tolerance: 0
+: AMD and COLAMD aggressive absorption: 1 (yes)
+    The following options can only be changed at compile-time:
+: BLAS library used:  Sun Performance Library BLAS.
+: compiled for ANSI C (uses malloc, free, realloc, and printf)
+: CPU timer is POSIX times ( ) routine.
+: compiled for normal operation (debugging disabled)
+    computer/operating system: Sun Solaris
+    size of int: 4 long: 8 Int: 8 pointer: 8 double: 8 Entry: 8 (in bytes)
+symbolic analysis:
+   status:     0.
+   time:      0.00E+00 (sec)
+   estimates (upper bound) for numeric LU:
+   size of LU:          0.14 (MB)
+   memory needed:       0.29 (MB)
+   flop count:      0.94E+05
+   nnz (L):            1009.
+   nnz (U):            7849.
+numeric factorization:
+   status:     0.
+   time:      0.00E+00
+   actual numeric LU statistics:
+   size of LU:          0.02 (MB)
+   memory needed:       0.11 (MB)
+   flop count:      0.42E+04
+   nnz (L):             417.
+   nnz (U):             787.
+UMFPACK V4.4 (Jan. 28, 2005), Info:
+    matrix entry defined as:          double
+    Int (generic integer) defined as: long
+    BLAS library used:                Sun Performance Library BLAS.
+    MATLAB:                           no.
+    CPU timer:                        POSIX times ( ) routine.
+    number of rows in matrix A:       130
+    number of columns in matrix A:    130
+    entries in matrix A:              1282
+    memory usage reported in:         16-byte Units
+    size of int:                      4 bytes
+    size of long:                     8 bytes
+    size of pointer:                  8 bytes
+    size of numerical entry:          8 bytes
+    strategy used:                    symmetric
+    ordering used:                    amd on A+A'
+    modify Q during factorization:    no
+    prefer diagonal pivoting:         yes
+    pivots with zero Markowitz cost:               6
+    submatrix S after removing zero-cost pivots:
+        number of "dense" rows:                    7
+        number of "dense" columns:                 0
+        number of empty rows:                      0
+        number of empty columns                    0
+        submatrix S square and diagonal preserved
+    pattern of square submatrix S:
+        number rows and columns                    124
+        symmetry of nonzero pattern:               0.841193
+        nz in S+S' (excl. diagonal):               1204
+        nz on diagonal of matrix S:                124
+        fraction of nz on diagonal:                1.000000
+    AMD statistics, for strict diagonal pivoting:
+        est. flops for LU factorization:           8.27000e+03
+        est. nz in L+U (incl. diagonal):           1336
+        est. largest front (# entries):            324
+        est. max nz in any column of L:            18
+        number of "dense" rows/columns in S+S':    2
+    symbolic factorization defragmentations:       0
+    symbolic memory usage (Units):                 4690
+    symbolic memory usage (MBytes):                0.1
+    Symbolic size (Units):                         633
+    Symbolic size (MBytes):                        0
+    symbolic factorization CPU time (sec):         0.00
+    symbolic factorization wallclock time(sec):    0.00
+    matrix scaled: yes (divided each row by sum of abs values in each row)
+    minimum sum (abs (rows of A)):              7.94859e-01
+    maximum sum (abs (rows of A)):              1.08460e+06
+    symbolic/numeric factorization:      upper bound               actual      %
+    variable-sized part of Numeric object:
+        initial size (Units)                    4013                 3870    96%
+        peak size (Units)                      16281                 4884    30%
+        final size (Units)                      8566                  596     7%
+    Numeric final size (Units)                  9317                 1282    14%
+    Numeric final size (MBytes)                  0.1                  0.0    14%
+    peak memory usage (Units)                  18734                 7337    39%
+    peak memory usage (MBytes)                   0.3                  0.1    39%
+    numeric factorization flops          9.41610e+04          4.20900e+03     4%
+    nz in L (incl diagonal)                     1009                  417    41%
+    nz in U (incl diagonal)                     7849                  787    10%
+    nz in L+U (incl diagonal)                   8728                 1074    12%
+    largest front (# entries)                   2337                  270    12%
+    largest # rows in front                       19                   18    95%
+    largest # columns in front                   123                   15    12%
+    initial allocation ratio used:                 0.36
+    # of forced updates due to frontal growth:     0
+    number of off-diagonal pivots:                 0
+    nz in L (incl diagonal), if none dropped       417
+    nz in U (incl diagonal), if none dropped       796
+    number of small entries dropped                9
+    nonzeros on diagonal of U:                     130
+    min abs. value on diagonal of U:               9.22e-07
+    max abs. value on diagonal of U:               1.00e+00
+    estimate of reciprocal of condition number:    9.22e-07
+    indices in compressed pattern:                 74
+    numerical values stored in Numeric object:     979
+    numeric factorization defragmentations:        1
+    numeric factorization reallocations:           1
+    costly numeric factorization reallocations:    0
+    numeric factorization CPU time (sec):          0.00
+    numeric factorization wallclock time (sec):    0.05
+    numeric factorization mflops (wallclock):      0.08
+    symbolic + numeric CPU time (sec):             0.00
+    symbolic + numeric wall clock time (sec):      0.05
+    symbolic + numeric mflops (wall clock):        0.08
+    solve flops:                                   2.14800e+03
+    iterative refinement steps taken:              0
+    iterative refinement steps attempted:          0
+    solve CPU time (sec):                          0.00
+    solve wall clock time (sec):                   0.00
+    total symbolic + numeric + solve flops:        6.35700e+03
+    total symbolic + numeric + solve CPU time:     0.00
+    total symbolic+numeric+solve wall clock time:  0.05
+    total symbolic+numeric+solve mflops(wallclock) 0.13
+ norm (A*x-b):  1.8917489796876907E-10
+ norm (A*x-b):  5.838675376512725E-10
+ norm (A*x-b):  5.838675376512725E-10
+</code>