UMFPACK

Installation

Host     Version  Location                 Notes
nemo     4.4      /usr/local/UMFPACKv4.4   compiled with sunperflib
shrek             /usr/local/UMFPACKv4.4   compiled with K. Goto's BLAS
octopus  4.4      /usr/local/UMFPACKv4.4   compiled with sunperflib

Usage

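  • /usr/local/UMFPACKv4.4/UMFPACK/Lib/libumfpack.a (UMFPACK library)
  • /usr/local/UMFPACKv4.4/AMD/Lib/libamd.a (AMD ordering library, also needed at link time)
  • /usr/local/UMFPACKv4.4/UMFPACK/Include (UMFPACK headers)
  • /usr/local/UMFPACKv4.4/AMD/Include (AMD headers)
  • /local/apps/src/UMFPACKv4.4/UMFPACK/Demo/ (demo programs)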


  • UMFPACK is written in C
  • there is a Fortran 77 interface, which can also be used from Fortran 90
  • use Demo/umf4hb64.f in /local/apps/src/UMFPACKv4.4/UMFPACK/Demo/ on nemo as an example
  • in your own source file, add the following lines:
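  call umf4def (control)          ! set the default control parameters
  control (1) = 1                 ! control (1) is the print level
  call umf4pcon (control)         ! print the control parameters (subject to the print level)
  call umf4sym (N, N, Ap, Ai, Ax, symbolic, control, info)    ! pre-ordering and symbolic analysis
  call umf4num (Ap, Ai, Ax, symbolic, numeric, control, info) ! numeric LU factorization
  call umf4fsym (symbolic)        ! free the Symbolic object, no longer needed
  call umf4sol (sys, x, RHSV, numeric, control, info)         ! solve; sys = 0 solves A x = RHSV
  call umf4fnum (numeric)         ! free the Numeric object
  call umf4pinf (control, info)   ! print the statistics stored in info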
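umf4sym and umf4num take the matrix in compressed-column form: Ap holds the n+1 column pointers, Ai the row index of each of the nz nonzeros and Ax their values. The C library provides umfpack_di_triplet_to_col / umfpack_dl_triplet_to_col to build these arrays from (row, column, value) triplets. For illustration only, here is a stand-alone C sketch of the same idea; triplets_to_csc is a hypothetical helper, not a UMFPACK routine. It assumes 0-based indices and no duplicate entries (UMFPACK's own routine sums duplicates), and it sorts the row indices within each column, since, as far as we know, UMFPACK rejects unsorted columns. The Fortran demos appear to convert their 1-based Harwell-Boeing arrays to 0-based before calling umf4sym; check Demo/umf4hb64.f before relying on this.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not part of UMFPACK): convert nz 0-based triplets
 * (Ti[k], Tj[k], Tx[k]) without duplicates into compressed-column arrays
 * Ap (n_col+1 entries), Ai (nz entries) and Ax (nz entries).
 * Row indices end up in ascending order within each column.
 * Returns 0 on success, -1 on an out-of-range index or allocation failure. */
int triplets_to_csc(long n_row, long n_col, long nz,
                    const long *Ti, const long *Tj, const double *Tx,
                    long *Ap, long *Ai, double *Ax)
{
    long *next = malloc((size_t)(n_col > 0 ? n_col : 1) * sizeof *next);
    if (next == NULL) return -1;

    /* Count the entries of each column (stored shifted by one). */
    for (long j = 0; j <= n_col; j++) Ap[j] = 0;
    for (long k = 0; k < nz; k++) {
        if (Ti[k] < 0 || Ti[k] >= n_row || Tj[k] < 0 || Tj[k] >= n_col) {
            free(next);
            return -1;
        }
        Ap[Tj[k] + 1]++;
    }
    /* Prefix sum: Ap[j] is now the start of column j and Ap[n_col] == nz. */
    for (long j = 0; j < n_col; j++) Ap[j + 1] += Ap[j];

    /* Scatter every triplet into the next free slot of its column. */
    for (long j = 0; j < n_col; j++) next[j] = Ap[j];
    for (long k = 0; k < nz; k++) {
        long p = next[Tj[k]]++;
        Ai[p] = Ti[k];
        Ax[p] = Tx[k];
    }

    /* Insertion sort of row indices (with their values) inside each column. */
    for (long j = 0; j < n_col; j++) {
        for (long p = Ap[j] + 1; p < Ap[j + 1]; p++) {
            long i = Ai[p];
            double x = Ax[p];
            long q = p - 1;
            while (q >= Ap[j] && Ai[q] > i) {
                Ai[q + 1] = Ai[q];
                Ax[q + 1] = Ax[q];
                q--;
            }
            Ai[q + 1] = i;
            Ax[q + 1] = x;
        }
    }
    free(next);
    return 0;
}
```

The indices are declared long to match a -DDLONG / -m64 build, where UMFPACK's integer type is long; a counting sort by column keeps the conversion linear in nz apart from the per-column sort.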

To do this, your program must of course also be linked with umf4_f77wrapper.c: compile the wrapper, then link its object file together with libumfpack.a and libamd.a, as follows:

cc -o umf4_f77wrapper.o -DDLONG -m64 -I/usr/local/UMFPACKv4.4/UMFPACK/Include -c umf4_f77wrapper.c
f90 -o poisson3d_umfpack.o -g -fast -C -e -fpp -stackvar -xcheck=init_local -fpover -ftrap=%none -Xlist -fsimple=0 -fns=no -dalign -O4 -KPIC -xmodel=medium -m64 -c poisson3d_umfpack.f90
f90 -g -fast -C -e -fpp -stackvar -xcheck=init_local -fpover -ftrap=%none -Xlist -fsimple=0 -fns=no -dalign -O4 -KPIC -xmodel=medium -m64 -o poisson3d_umfpack poisson3d_umfpack.o umf4_f77wrapper.o /usr/local/UMFPACKv4.4/UMFPACK/Lib/libumfpack.a /usr/local/UMFPACKv4.4/AMD/Lib/libamd.a -xlic_lib=sunperf
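The wrapper is compiled with -DDLONG, which makes it use UMFPACK's long-integer routines (umfpack_dl_*); the demo output further down confirms this with "Int (generic integer) defined as: long". Under -m64 (the LP64 data model) long is 8 bytes, so the integer arrays and handles passed from Fortran must be 8-byte integers, which is presumably why the 64-bit demo umf4hb64.f is used. A minimal C sketch to check the data model, compiled with the same -m64 flag as the wrapper (is_lp64 and print_data_model are hypothetical helpers):

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if this translation unit uses the LP64 data model
 * (int: 4 bytes, long and pointers: 8 bytes), as with cc -m64 on Solaris. */
int is_lp64(void)
{
    return sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void *) == 8;
}

/* Prints the same sizes UMFPACK reports on its "size of int / long / pointer" lines. */
void print_data_model(void)
{
    printf("size of int: %zu long: %zu pointer: %zu\n",
           sizeof(int), sizeof(long), sizeof(void *));
}
```

If is_lp64() returns 0 with the flags you use, the -DDLONG wrapper and an integer*8 Fortran caller will not agree on argument sizes.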

Config

  • edit Make.include and Make.solaris (see the link to Make.solaris_amd64) before compiling:
diff /local/apps/src/UMFPACKv4.4/UMFPACK/Make/Make.include-ori /local/apps/src/UMFPACKv4.4/UMFPACK/Make/Make.include
50c50
< CONFIG = -DNBLAS
---
> CONFIG = 
63c63
< # include ../Make/Make.solaris
---
> include ../Make/Make.solaris

and

diff /local2/fboyer/UMFPACKv4.4/UMFPACK/Make/Make.solaris /local/apps/src/UMFPACKv4.4/UMFPACK/Make/Make.solaris
5a6
> 
11,13c12,14
<  CC = cc
<  CFLAGS = -Xc -xO5 -KPIC -dalign -xtarget=generic64
<  F77FLAGS =   -xO5 -KPIC -dalign -m64
---
> CC = cc
> CFLAGS = -xO5 -xdepend -DLP64 -xprefetch=auto -xprefetch_level=3 -xipo=2 -m64 -xmodel=medium
> F77FLAGS =   -xO5 -xdepend -DLP64 -xprefetch=auto -xprefetch_level=3 -xipo=2 -m64 -xmodel=medium
22d22
< #LIB = -xlic_lib=sunperf -lfai -lfsu -lfui -lsunperf -lm -lsunmath
30c30
<  LIB =  -xlic_lib=sunperf -lfai -lfsu -lfui -lm
---
> LIB = -L/opt/studio12/SUNWspro/lib/amd64 -R/opt/studio12/SUNWspro/lib/amd64 -lsunperf -lm -lpicl -lmtsk
  • 64-bit only
  • the test programs contain a bug, fixed in umf4hb64.f: Ai holds one row index per nonzero, so it must be dimensioned nz, not n
diff  /local/apps/src/UMFPACKv4.4/UMFPACK/Demo/umf4hb64.f-ori  /local/apps/src/UMFPACKv4.4/UMFPACK/Demo/umf4hb64.f
331c331
<      $      n, nz, Ap (n+1), Ai (n), j, i, p
---
>      $      n, nz, Ap (n+1), Ai (nz), j, i, p
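The fix matters because, in compressed-column form, Ap has n+1 entries and its last entry equals nz, while Ai (like Ax) has one entry per nonzero, i.e. nz entries; declaring Ai (n) under-allocates it whenever nz > n. A minimal C sketch of a structural check on 0-based arrays; check_csc is a hypothetical helper, not a UMFPACK routine, and the "sorted, no duplicates" rule reflects our reading of the UMFPACK input requirements:

```c
#include <assert.h>

/* Hypothetical checker (not a UMFPACK routine): returns 0 if Ap/Ai describe a
 * valid n_row-by-n_col compressed-column pattern, i.e.
 *   Ap has n_col+1 entries, Ap[0] == 0 and Ap is non-decreasing,
 *   Ai holds nz = Ap[n_col] row indices,
 *   row indices lie in [0, n_row) and are strictly ascending in each column.
 * Otherwise returns a negative code identifying the first violated rule. */
int check_csc(long n_row, long n_col, const long *Ap, const long *Ai)
{
    if (n_row < 0 || n_col < 0) return -1;
    if (Ap[0] != 0) return -2;
    for (long j = 0; j < n_col; j++) {
        if (Ap[j + 1] < Ap[j]) return -3;            /* pointers must not decrease */
        for (long p = Ap[j]; p < Ap[j + 1]; p++) {   /* p reaches nz-1: Ai needs nz slots */
            long i = Ai[p];
            if (i < 0 || i >= n_row) return -4;      /* row index out of range */
            if (p > Ap[j] && Ai[p - 1] >= i) return -5; /* unsorted or duplicate row */
        }
    }
    return 0;
}
```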

Tests

  • get the source (umfpack_simple.c) here
module load ss12
cc -o umfpack_simple -m64 umfpack_simple.c -I/usr/local/UMFPACKv4.4/UMFPACK/Include -R/usr/local/UMFPACKv4.4/UMFPACK/Lib -L/usr/local/UMFPACKv4.4/UMFPACK/Lib -R/usr/local/UMFPACKv4.4/AMD/Lib -L/usr/local/UMFPACKv4.4/AMD/Lib -lumfpack -lamd -xlic_lib=sunperf

or, with the ss12u1 module (note the extra -lm):

module load ss12u1
cc -o umfpack_simple -m64 umfpack_simple.c -I/usr/local/UMFPACKv4.4/UMFPACK/Include -R/usr/local/UMFPACKv4.4/UMFPACK/Lib -L/usr/local/UMFPACKv4.4/UMFPACK/Lib -R/usr/local/UMFPACKv4.4/AMD/Lib -L/usr/local/UMFPACKv4.4/AMD/Lib -lumfpack -lamd -xlic_lib=sunperf -lm
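As we recall it from the UMFPACK distribution (compare with your copy of umfpack_simple.c), the demo solves a 5-by-5 system stored as Ap = {0, 2, 5, 9, 10, 12}, Ai = {0, 1, 0, 2, 4, 1, 2, 3, 4, 2, 1, 4}, Ax = {2, 3, 3, -1, 4, 4, -3, 1, 2, 2, 6, 1} with b = {8, 45, -3, 3, 19}, and should print x = 1, 2, 3, 4, 5. A minimal C sketch for checking a computed solution by its residual, in the spirit of the "norm (A*x-b)" lines printed by the Fortran demo; csc_residual_inf is a hypothetical helper, and the infinity norm is our choice (the demo may use another norm):

```c
#include <assert.h>

/* Computes r = b - A*x for an n-by-n matrix in 0-based compressed-column
 * form (Ap, Ai, Ax) and returns the infinity norm max_i |r_i|.
 * r must have room for n entries. */
double csc_residual_inf(int n, const int *Ap, const int *Ai, const double *Ax,
                        const double *x, const double *b, double *r)
{
    for (int i = 0; i < n; i++) r[i] = b[i];
    for (int j = 0; j < n; j++)                  /* walk column j of A */
        for (int p = Ap[j]; p < Ap[j + 1]; p++)
            r[Ai[p]] -= Ax[p] * x[j];            /* subtract A(i,j) * x(j) */
    double norm = 0.0;
    for (int i = 0; i < n; i++) {
        double a = r[i] < 0 ? -r[i] : r[i];
        if (a > norm) norm = a;
    }
    return norm;
}
```

With the data above and x = {1, 2, 3, 4, 5} the residual is exactly zero, since every product is an exactly representable small integer; for real problems, compare the residual against the size of b rather than against zero.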
  • use Demo/umf4hb64.f in /local/apps/src/UMFPACKv4.4/UMFPACK/Demo/ on nemo as an example:
f90 -o umf4hb64.o -g -fast -C -e -fpp -stackvar -xcheck=init_local -fpover -ftrap=%none -Xlist -fsimple=0 -fns=no -xmodel=medium -dalign -m64 -c umf4hb64.f
cc -o umf4_f77wrapper.o -DDLONG -m64 -I/usr/local/UMFPACKv4.4/UMFPACK/Include -c umf4_f77wrapper.c
f90 -g -fast -C -e -fpp -stackvar -xcheck=init_local -fpover -ftrap=%none -Xlist -fsimple=0 -fns=no -xmodel=medium -dalign -m64 -o umf4hb64 umf4hb64.o umf4_f77wrapper.o /usr/local/UMFPACKv4.4/UMFPACK/Lib/libumfpack.a /usr/local/UMFPACKv4.4/AMD/Lib/libamd.a -xlic_lib=sunperf

then run it (the test matrices are in the Demo/HB directory):

./umf4hb64 < arc130.rua
 Matrix key: ARC130                        

UMFPACK V4.4 (Jan. 28, 2005), Control:

    Matrix entry defined as: double
    Int (generic integer) defined as: long

    0: print level: 2
    1: dense row parameter:    0.2
        "dense" rows have    > max (16, (0.2)*16*sqrt(n_col) entries)
    2: dense column parameter: 0.2
        "dense" columns have > max (16, (0.2)*16*sqrt(n_row) entries)
    3: pivot tolerance: 0.1
    4: block size for dense matrix kernels: 32
    5: strategy: 0 (auto)
    6: initial allocation ratio: 0.7
    7: max iterative refinement steps: 2
    12: 2-by-2 pivot tolerance: 0.01
    13: Q fixed during numerical factorization: 0 (auto)
    14: AMD dense row/col parameter:    10
       "dense" rows/columns have > max (16, (10)*sqrt(n)) entries
        Only used if the AMD ordering is used.
    15: diagonal pivot tolerance: 0.001
        Only used if diagonal pivoting is attempted.
    16: scaling: 1 (divide each row by sum of abs. values in each row)
    17: frontal matrix allocation ratio: 0.5
    18: drop tolerance: 0
    19: AMD and COLAMD aggressive absorption: 1 (yes)

    The following options can only be changed at compile-time:
    8: BLAS library used:  Sun Performance Library BLAS.
    9: compiled for ANSI C (uses malloc, free, realloc, and printf)
    10: CPU timer is POSIX times ( ) routine.
    11: compiled for normal operation (debugging disabled)
    computer/operating system: Sun Solaris
    size of int: 4 long: 8 Int: 8 pointer: 8 double: 8 Entry: 8 (in bytes)

symbolic analysis:
   status:     0.
   time:      0.00E+00 (sec)
   estimates (upper bound) for numeric LU:
   size of LU:          0.14 (MB)
   memory needed:       0.29 (MB)
   flop count:      0.94E+05
   nnz (L):            1009.
   nnz (U):            7849.
numeric factorization:
   status:     0.
   time:      0.00E+00
   actual numeric LU statistics:
   size of LU:          0.02 (MB)
   memory needed:       0.11 (MB)
   flop count:      0.42E+04
   nnz (L):             417.
   nnz (U):             787.

UMFPACK V4.4 (Jan. 28, 2005), Info:
    matrix entry defined as:          double
    Int (generic integer) defined as: long
    BLAS library used:                Sun Performance Library BLAS.
    MATLAB:                           no.
    CPU timer:                        POSIX times ( ) routine.
    number of rows in matrix A:       130
    number of columns in matrix A:    130
    entries in matrix A:              1282
    memory usage reported in:         16-byte Units
    size of int:                      4 bytes
    size of long:                     8 bytes
    size of pointer:                  8 bytes
    size of numerical entry:          8 bytes

    strategy used:                    symmetric
    ordering used:                    amd on A+A'
    modify Q during factorization:    no
    prefer diagonal pivoting:         yes
    pivots with zero Markowitz cost:               6
    submatrix S after removing zero-cost pivots:
        number of "dense" rows:                    7
        number of "dense" columns:                 0
        number of empty rows:                      0
        number of empty columns                    0
        submatrix S square and diagonal preserved
    pattern of square submatrix S:
        number rows and columns                    124
        symmetry of nonzero pattern:               0.841193
        nz in S+S' (excl. diagonal):               1204
        nz on diagonal of matrix S:                124
        fraction of nz on diagonal:                1.000000
    AMD statistics, for strict diagonal pivoting:
        est. flops for LU factorization:           8.27000e+03
        est. nz in L+U (incl. diagonal):           1336
        est. largest front (# entries):            324
        est. max nz in any column of L:            18
        number of "dense" rows/columns in S+S':    2
    symbolic factorization defragmentations:       0
    symbolic memory usage (Units):                 4690
    symbolic memory usage (MBytes):                0.1
    Symbolic size (Units):                         633
    Symbolic size (MBytes):                        0
    symbolic factorization CPU time (sec):         0.00
    symbolic factorization wallclock time(sec):    0.00

    matrix scaled: yes (divided each row by sum of abs values in each row)
    minimum sum (abs (rows of A)):              7.94859e-01
    maximum sum (abs (rows of A)):              1.08460e+06

    symbolic/numeric factorization:      upper bound               actual      %
    variable-sized part of Numeric object:
        initial size (Units)                    4013                 3870    96%
        peak size (Units)                      16281                 4884    30%
        final size (Units)                      8566                  596     7%
    Numeric final size (Units)                  9317                 1282    14%
    Numeric final size (MBytes)                  0.1                  0.0    14%
    peak memory usage (Units)                  18734                 7337    39%
    peak memory usage (MBytes)                   0.3                  0.1    39%
    numeric factorization flops          9.41610e+04          4.20900e+03     4%
    nz in L (incl diagonal)                     1009                  417    41%
    nz in U (incl diagonal)                     7849                  787    10%
    nz in L+U (incl diagonal)                   8728                 1074    12%
    largest front (# entries)                   2337                  270    12%
    largest # rows in front                       19                   18    95%
    largest # columns in front                   123                   15    12%

    initial allocation ratio used:                 0.36
    # of forced updates due to frontal growth:     0
    number of off-diagonal pivots:                 0
    nz in L (incl diagonal), if none dropped       417
    nz in U (incl diagonal), if none dropped       796
    number of small entries dropped                9
    nonzeros on diagonal of U:                     130
    min abs. value on diagonal of U:               9.22e-07
    max abs. value on diagonal of U:               1.00e+00
    estimate of reciprocal of condition number:    9.22e-07
    indices in compressed pattern:                 74
    numerical values stored in Numeric object:     979
    numeric factorization defragmentations:        1
    numeric factorization reallocations:           1
    costly numeric factorization reallocations:    0
    numeric factorization CPU time (sec):          0.00
    numeric factorization wallclock time (sec):    0.05
    numeric factorization mflops (wallclock):      0.08
    symbolic + numeric CPU time (sec):             0.00
    symbolic + numeric wall clock time (sec):      0.05
    symbolic + numeric mflops (wall clock):        0.08

    solve flops:                                   2.14800e+03
    iterative refinement steps taken:              0
    iterative refinement steps attempted:          0
    solve CPU time (sec):                          0.00
    solve wall clock time (sec):                   0.00

    total symbolic + numeric + solve flops:        6.35700e+03
    total symbolic + numeric + solve CPU time:     0.00
    total symbolic+numeric+solve wall clock time:  0.05
    total symbolic+numeric+solve mflops(wallclock) 0.13

 norm (A*x-b):  1.8917489796876907E-10
 norm (A*x-b):  5.838675376512725E-10
 norm (A*x-b):  5.838675376512725E-10
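In the statistics above, the "estimate of reciprocal of condition number" (9.22e-07) is the minimum absolute value on the diagonal of U divided by the maximum (9.22e-07 / 1.00). The UMFPACK documentation describes this Info entry as exactly that ratio: a cheap, rough indicator of ill-conditioning, not a true condition-number estimate. A minimal C sketch of the computation; rcond_from_diag_u is a hypothetical helper that takes the diagonal of U as a plain array:

```c
#include <assert.h>

/* Rough reciprocal condition estimate as UMFPACK reports it:
 * min |U(k,k)| / max |U(k,k)| over the n diagonal entries d[0..n-1] of U.
 * Returns 0.0 when a diagonal entry is zero (singular U) or when n <= 0. */
double rcond_from_diag_u(int n, const double *d)
{
    if (n <= 0) return 0.0;
    double dmin = d[0] < 0 ? -d[0] : d[0];
    double dmax = dmin;
    for (int k = 1; k < n; k++) {
        double a = d[k] < 0 ? -d[k] : d[k];
        if (a < dmin) dmin = a;
        if (a > dmax) dmax = a;
    }
    return dmax == 0.0 ? 0.0 : dmin / dmax;
}
```

A value near machine epsilon (about 1e-16 in double precision) suggests the computed solution may have few correct digits; 9.22e-07 here is small but, together with the residuals printed above, does not indicate a failed solve.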
  • Last modified: 2017/08/25 09:56