Interfacing Python, C and Fortran

Motivations

  • Why ? we want :
    • fast development time, and doing experiments must be easy (usability and expressiveness)
    • fast execution time → we wait less, we can do more experiments (efficiency, speed)
  • How ?
    • high level languages and/or "interpreted languages" : faster development, but execution could be slow
    • low-level languages and/or compiled languages : faster execution, but
  • Solutions :
    • use a «compiled language» with has an REPL : D, Ocaml, (C++ with cling ?)
    • use a «dynamic langage» with an JIT compiler : Julia, LuaJit, Pyhton with Numba, Javascript
    • write interfaces between a pair of languages : e.g. Python/C

Gratuitous polemic : summing integers

  • C :wc -c → 110 bytes, CPU time : .001995s
    #include <stdio.h>
    int main() { double s = 0; for(int i = 0; i < 1000000 ; ++i)  s+= i ; printf("%f\n",s) ; }
    
  • Fortran with loop : wc -c → 110 bytes, CPU time : .001972s
    program z 
      real(8) :: x = 0 
      do concurrent (i=1:1000000)
       x = x + i
      end do
      print *, x
    end
    
  • Fortran without loop:wc -c → 50 bytes, CPU time : .004939s
    program z 
    print*,sum([(i*1.d0,i=1,1000000)])
    end
    
  • Python : wc -c → 27 bytes, CPU time : .033916s or 0.0158s
    print(sum(range(1000001)))
    
  • Julia : wc -c → 23 bytes, CPU time : .143400s or 3.416 μs (in the REPL)
    print(sum(1:1000_000))
    
  • D : wc -c → 110 bytes, CPU time : .005992s
    import std.algorithm;
    import std.range;
    import std.stdio;
    void main(){
      writeln(sum(iota(1.,1000000.)));
    }
    

Outline

  • Python C bindings:
    • cffi
    • Swig
    • cython
  • F2py
  • write interfaces between Fortran and C using ISO C bindings

CFFI

  • simple , lightweight and dynamic (using a dlopen )
  • interface between C and other langages
  • no C++ support
  • ABI mode use binary library module
  • API mode need a compiler
  • each C function must be declared in Python (cdef)

CFFI example

  • c file
    double mysum(int m, int n, double *x)
    {
      double s = 0.;
      int i, j;
      for(i = 0; i < m; ++i) for(j = 0; j < n; ++j) s += x[i * n + j];
      return s;
    }
  • Python
(...)
ffi = FFI()
ffi.cdef("double mysum(int m, int n, double * x);")
ffi_h = ffi.dlopen("./mysum.so")
hilb = 1. / (arange(1., 5.)[:, None] + arange(1., 5.)[None, :] - 1.)
p_hilb = ffi.cast("double *", hilb.ctypes.data) # or use ffi.from_buffer(hilb))
print(ffi_h.mysum(*hilb.shape, p_hilb))
  • in code/cffi, run the file compile_and_use_mysum.py

SWIG

  • generic language interface generator : c, ocaml, c++, D, ocaml, lua, R, guile
  • we must write interfaces files ".i" in a macro language
  • used in the PyTrilinos project
  • we can include a header in a global manner

SWIG Example

  • c file
int sum_int(const int n) { (...) }

double sum_exp(const int n) { (...)  }

double norm_vect(const int n, const double * x) { (...) }
  • interface file
%module my_funcs
%{
   #define SWIG_FILE_WITH_INIT
   #include "my_funcs.h"
%}
%include "numpy.i"
%init %{
import_array();
%}

%apply( int DIM1,  double * IN_ARRAY1 ) {(const int n, const double * x) };
%include "my_funcs.h"
  • to use Numpy, we need of the file numpy.i - which is generally included in your numpy distribution - : this header defines macros like IN_ARRAY1, INPLACE_ARRAY1, ARGOUT_ARRAY1

SWIG Example II

  • calling Python code
import sys, os
sys.path.append(os.getcwd()+"/build")
from my_funcs import *
sqr_int = sum_int(10)
print(sum_exp(10))
from numpy import *
x = arange(10, dtype=float)
norm_x=norm_vect(x)
assert abs(norm_x**2-sqr_int) < 1e-12
  • we use CMake for building the interface. SWIG will generate a wrapper C file and a python module file : the CMake FindPython module defines a useful macro Python_add_library to compile the source and the wrapper
mkdir -p build && cd build && cmake .. && make && cd .. 
./use_my_func.py

Cython

  • fork of the Pyrex project : convert Python code to C
  • main usage : used to accelerate your code
  • write directly your code in an extended Python with some new keywords : cdef, decorators
  • Cython can annotate your code to see the interactions with Python
  • you can use it directly the notebook with %%cython (do no fortget to load extension %load_ext Cython

REPL example

In [6]:
def f(x):
    s = 0
    for i in range(x):
        for j in range(x):
            for k in range(x):
                s+= i+j+k
    return s
%timeit f(100)
54.6 ms ± 1.27 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [9]:
%%cython
def f2(x):
    s = 0
    for i in range(x):
        for j in range(x):
            for k in range(x):
                s+= i+j+k
    return s
In [10]:
%timeit f2(100)
34.8 ms ± 925 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [15]:
%%cython
def f3(x):
    cdef int i,j,k, s = 0
    for i in range(x):
        for j in range(x):
            for k in range(x):
                s+= i+j+k
    return s
In [16]:
%timeit f3(100)
243 µs ± 9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Heat example

  • Python kernel
    def K2D_loop_slow(X):
      m, n = X.shape
      H = zeros((m, n), dtype=float64)
      for i in range(1, m-1):
          for j in range(1, n-1): H[i, j] = 4. * X[i, j] - X[i-1,  j] - X[i+1, j] - X[i, j-1] - X[i, j+1]
      return H
    
  • cython kernel
@cython.boundscheck(False)
@cython.wraparound(False)
def K2D_para(c_np.ndarray[double, ndim=2] X):

    cdef Py_ssize_t m = X.shape[0]
    cdef Py_ssize_t n = X.shape[1]
    cdef double[:,:] H_view
    cdef int i, j
    H = np.zeros((m,n))
    H_view = H
    for i in prange(1,m-1,nogil=True):
        for j in range(1,n-1):
            H_view[i,j] = 4. * X[i,j] - X[i-1, j] -X[i+1,j] - X[i, j-1] -X[i, j+1]
    return H
  • notice the use of prange with nogil keywords. It is a openmp range based iteration

Heat Example II

  • to build the heat example, you could use cmake
    mkdir -p build && cd build && cmake .. && make && cd ..
    LD_PRELOAD="build/heat.so" PYTHONPATH="./build" ./use_heat.py
    
  • On my computer , I obtained
algorithm time(s)
Cython // 5.953e-05s
Numpy 1.031e-04s
Cython 1.166e-04s
loop 1.254e-02s
  • The cython version is clearly better than the Python loop version, but with a vectorial numpy version we do better :-(

F2py

  • interface generator for Fortran code
  • generate also documentation
  • %%fortranmagic command in the REPL (pip install --user fortran-magic)
  • could generate signature file (-h option)

REPL Example

In [22]:
%load_ext fortranmagic
The fortranmagic extension is already loaded. To reload it, use:
  %reload_ext fortranmagic
In [34]:
%%fortran
subroutine sumf(f, s, x, n)
    real, intent(out) ::s
    real, dimension( 1:n ), intent(in) :: x
    external f
    integer, intent(in) :: n
    s=0.0
    do i=1,n
    s=s + f( x(i) )
    end do
end subroutine sumf
In [37]:
from numpy import arange
sumf(lambda x : x, arange(11.))
Out[37]:
55.0

Heat Example

  • compile the heat kernel, rename it and run
    f2py -m heat -c heat.f90
    ln -s *.so heat.so
    PYTHONPATH="."  ./use_heat.py
    
  • on my laptop, it gives the following
subroutine time(s)
kernel 5.271e-03s
kernel_loop 5.109e-03s
kernel_numpy 1.131e-02s
In [9]:
! cat codes/f2py/heat.f90 
subroutine kernel(u_in, u_out, m, n ) 

    implicit none
    integer, intent(in) :: m, n
    real(8), dimension( 1:m, 1:n ), intent(in) :: u_in
    real(8), dimension( 1:m, 1:n ), intent(out) :: u_out

    u_out(2:m-1,2:n-1) = 4.d0 * u_in(2:m-1, 2:n-1) &
                              - u_in(1:m-2, 2:n-1) &
                              - u_in(3:m, 2:n-1)   &
                              - u_in(2:m-1,1:n-2)  &
                              - u_in(2:m-1,3:n)

end subroutine kernel

subroutine kernel_loop(u_in,  u_out,  m, n ) 

    implicit none
    integer, intent(in) :: m, n
    real(8), dimension( 1:m, 1:n ), intent(in) :: u_in
    real(8), dimension( 1:m, 1:n ), intent(out) :: u_out
    integer :: i,j

    do concurrent(j=2:n-1)
       do concurrent(i=2:m-1)
    u_out(i, j) = 4.d0 * u_in(i, j) - u_in(i-1, j) - u_in(i, j-1) &
                                       - u_in(i+1, j) - u_in(i, j+1)
       end do
    end do

end subroutine kernel_loop

Interfaces between C and fortran

  • Why ?
    • lots of good and robust math libraries are written in fortran (lapack, blas, odepack)
    • fortran lacks of system functions
  • How ? With modern fortran versiosn(95, 2003, 2008), interfacing between C and fortran has been standardize
  • In a rough way :
    • use bind(C, name="name_in_C_of_the function")
    • do not forget to put a value attribute in fortran signature if the argument is passed by value
    • pointers arguments are normal for Fortran

Heat Example (Not Again !!!) : when Fortran calls C

  • in the Fortan calling program we need a interface
interface 
     subroutine heat_c(m, n, X_in, X_out) bind(c, name='heat_c')
         use, intrinsic :: iso_c_binding
         integer(c_int), value :: m,n
         real(c_double)  :: X_in(*), X_out(*)
     end subroutine heat_c
   end interface
  • for the following C function
void heat_c( int m, int n, const double *u_in, double *u_out)
  • Please, notice
    • the value attribute for m and n
    • the c_int and c_double kind qualifiers which are defined in the iso_c_binding Fortran module

C calls Fortran

  • the following Fortran subroutine

    subroutine kernel(m, n, u_in,  u_out) bind( C, name="heat_fortran" )
    
          implicit none
          integer(c_int32_t), intent(in), value :: m, n
          real(c_double), dimension( 1:m, 1:n ), intent(in) :: u_in
          real(c_double), dimension( 1:m, 1:n ), intent(out) :: u_out
    
          u_out(2:m-1,2:n-1) = 4.d0 * u_in(2:m-1, 2:n-1)  - u_in(1:m-2, 2:n-1)  - u_in(3:m, 2:n-1) &
                                                          - u_in(2:m-1, 1:n-2)  - u_in(2:m-1,3:n)
    
      end subroutine kernel
    
  • will be called by a simple C call
    extern void heat_fortran(int m, int n, double * u_in, double * u_out);
    (...)
    heat_fortran(m, n, X_in, X_out);
    (...)
    
  • Please, notice again the value attribute for m,n !

Conclusion and References

  • for C :
    • use ffi for a simple solution
    • use SWIG if your have lots of headers and functios
    • use cython to accelerate your code, particularly if you have loops
  • for Fortran use f2py
  • References
    • Cython → Python High Performance Programming by Gabriele Larano
    • F2py → Interface avec le langage Fortran (fr), Pierre Navaro (on line)
    • ISO C bindings → Modern Fortran explained, Metcalf, Reid and Cohen