Cython Wrappers

Nicholas McKibben edited this page Jul 31, 2020 · 5 revisions

A Few Notes On Cython

Written 2020/03/28, Nicholas McKibben

Motivation

So you found a C/C++ library that you think would be really nice to call from Python. You have a few options:

  1. Rewrite the fool thing in Python
  2. Wrap the library using ctypes
  3. Wrap the library with SWIG
  4. Use the Boost Python library
  5. Use pybind11
  6. Wrap the library using Cython

There are more ways to do this, but the ones listed above are the most common.

Now, you should always prefer 1. Compatibility issues almost disappear if you write it in Python. Obviously, if everything were as easy as rewriting it in Python, you wouldn't be reading this document. The problem arises when the library is large and/or complicated enough that it is not worth rewriting and maintaining it yourself.

ctypes (see docs) comes packaged with Python and can be great for wrapping libraries written in C (for example, see scikit-glpk). I am of the opinion that it is almost always the wrong thing to do (TM) when dealing with C++; there's just not great support. If it is really as simple as a quick ctypes wrapper, by all means use it, but it probably won't be.
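To give a sense of what a "quick ctypes wrapper" involves, here is a minimal sketch. It calls `Py_GetVersion` from the CPython C API through `ctypes.pythonapi`, chosen only because that library is always loaded when running under CPython (an assumption of this example); a real third-party library would be opened with `ctypes.CDLL` and a path instead. The point to notice is that every C signature is declared by hand, which is manageable for plain C functions and quickly becomes awkward for C++ classes.

```python
import ctypes
import sys

# ctypes.pythonapi is the running interpreter's own C library, already loaded.
# A third-party C library would instead be opened with ctypes.CDLL("<path>").
get_version = ctypes.pythonapi.Py_GetVersion

# The C signature has to be declared by hand: const char *Py_GetVersion(void)
get_version.argtypes = []
get_version.restype = ctypes.c_char_p

version = get_version().decode("utf-8")

# The C function and sys.version report the same interpreter version.
print(version.split()[0], sys.version.split()[0])
```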

I've never used SWIG, so if you want to go down that route please consult their documentation.

The Boost Python library is full featured and lets you do some neat things. Boost is not included in scipy, so currently it's a nonstarter.

From this issue it appears that pybind11 has portability problems and is generally not the way to go in the context of scipy.

Enter Cython. It says about itself: "[Cython] makes writing C extensions for Python as easy as Python itself." Of course you should never trust what anyone or anything says about itself. It is a powerful tool and thus has some bloat and a learning curve. In the context of scipy, the bloat is already there (it's used extensively in many modules) so the only thing left is learning how to use it. ctypes will be an untenable solution in the context of scientific computing where working efficiently with numpy arrays and dynamically allocated arrays is important. Fortunately, this is where Cython shines with lots of functionality right out of the box.

Working with Cython

Please see the Cython docs. These should be your gospel texts when developing with Cython. There are often ways, varied and sundry, of working around core Cython mechanics to achieve some end, but there are no guarantees that a hack won't break in a future release or that behavior will be consistent across the code base. There is almost always a way to do what you want that is sanctioned by the docs. In the case that there isn't, resort to StackOverflow and hope for the best.

There are lots of ways to think about Cython. I think most people approach it as a way to speed their Python programs up. Often you'll find a raw increase in performance just by adding a few typed variables, especially when you have a lot of looping. Performance improvement is not the focus of this document. We are approaching Cython as a tool to call highly performant C/C++ libraries from Python.

I also find it helpful to treat Cython as another programming language itself. Python is (mostly) a subset of that language -- most things that work in Python will work in Cython. Sometimes you'll run into things that won't, for example, namedtuple doesn't work (currently). As a rule, anything that you would like to be very Python-like should be done in Python. Cython also has elements of C, but anything you'd like to be very C/C++ should be done in C/C++. Cython has deeper roots in C than C++, but C++ support is catching up. As mentioned, Cython is very well suited for the things you want to tie together.

This guide will be opinionated. Here are some guiding principles:

  • Make a PXD file for every header you want to use
  • You never have to manage memory if you don't want to -- and you probably don't want to
  • Use your PXD files to write a single PYX file that will contain the wrapper

Headers

The external library you want to call should have its interfaces defined in header files. These should include all classes and functions.

There is a simple formula for making a PXD:

  1. Make a PXD file with the same name (doesn't have to be, but makes it easier to organize). So [header].h -> [header].pxd.
  2. Place all the PXD files in the same directory with a blank __init__.pxd file. You can put them in separate directories, but pain and agony await those who try this.
  3. Enumerate the things in the header that you need.

You don't need to copy every single thing from the headers into the PXD files, only the things you need. Cython is not able to read header files directly because parsing C++ is notoriously hard. It's for this reason that automated PXD creation tools are no good and probably unmaintained. I've had some success using Doxygen to create XML representations of classes and namespaces and parsing these to programmatically construct the PXD files. This is brittle and error prone because of some Cython oddities and some C++ oddities. As scipy does not use Doxygen, I would recommend just doing it by hand.

To wrap up: it's your job to copy down anything you need into a PXD file.

Here's an example to look at and make some observations about:

from libcpp cimport bool
from libcpp.string cimport string

from AnotherPXDFile cimport CppClass

ctypedef double ret_type

cdef extern from "[header].[h|hpp]" namespace "NAMESPACE::INNER::INNER_INNER" nogil:
    ret_type function_name(size_t arg, int * another, const CppClass & thing)

    cdef cppclass CppClass2:
        int member
        CppClass2(string & arg) except +
        void method(const bool flag) const

Some observations:

  • cimport is used to import Cython things
  • Some types need to be cimported from the Cython includes (see https://github.com/cython/cython/tree/master/Cython/Includes)
  • There is apparently another cdef cppclass definition in AnotherPXDFile.pxd that defines CppClass. We can cimport this class definition and use it as a type in our own declarations
  • There are analogs to Python things, like def -> cdef and class -> cppclass
  • There are analogs to C and C++ things, like typedef -> ctypedef and class -> cppclass
  • cdef things cannot be used from Python; read up on the usage of def and cpdef here for when you can and can't use things
  • The function or class you will import into Python should be defined as def or class, respectively
  • You can define references and pointers
  • All the things in the extern declaration have counterparts in the header file it says it's from
  • There's almost no reason not to release the GIL (nogil directive) when working with an external C++ library. So why not? If you are writing your own Cython functions, you might need the GIL (if you're instantiating Python objects).
  • If you're using C++, use cppclass instead of struct whenever you have structs. I've run into issues trying to define structs and have never run into problems calling them cppclasses.
  • To get exception forwarding for constructors, use except + after constructors (see here)
  • Don't list destructors, Cython doesn't want them and doesn't care

Quirks

  • You don't have to have namespace (same as typing namespace "")
  • You can alias things if it helps:
cdef extern from "header.hpp" nogil: # <- no namespace if there isn't any
    cdef cppclass AliasName "Name": # <- C++ code will see it as Name, but Cython will call it as AliasName
        pass

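You may need to alias things to get the desired effect. For example, consider a header file, header.h, that declares an enum MyEnum whose members FIRST, SECOND, and THIRD have to be written with their qualified names in C++ (MyEnum::FIRST and so on, as for an enum class). To do "the right thing", you'll have to make the PXD as follows: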

cdef extern from "header.h" nogil:
    cdef enum MyEnum:
        FIRST "MyEnum::FIRST"
        SECOND "MyEnum::SECOND"
        THIRD "MyEnum::THIRD"

Another instance is where you need to call the correct specialization of a function. For example, consider a header file, header.hpp, in which MyNamespace::MyClass declares method twice, once taking a string & and once taking an int *.

To get the desired behavior, we need to recognize that Cython is limited by its Python roots and we need to do this:

from libcpp.string cimport string
cdef extern from "header.hpp" namespace "MyNamespace" nogil:
    cdef cppclass MyClass:
        void method(string & str)
        void method_int "MyNamespace::MyClass::method" (int * not_str)

We could have renamed both of them, but only one needed renaming so that Cython knows how to call each function.

Also note that a class can only be allocated on the stack if it has a nullary constructor (a constructor with no arguments). This is different than in C++ and is due to the way that Cython generates its code. You'll need to allocate a pointer instead if you need a class without a nullary constructor, i.e.:

...
cdef HasNullaryConstructor a # OK: will work
cdef NoNullaryConstructor a(arg1, arg2) # ERROR: will not work
...

See the memory section on how to allocate pointers.

Memory

Often C/C++ interfaces will require allocating memory. In scipy, it would be strange to restrict yourself to malloc and free (and their CPython counterparts) since you have a C++ compiler with modern standard features.

Always prefer unique_ptr, ideally created with make_unique.

This may not always work (because make_unique needs to be able to accept the constructor args as const &), in which case use the unique_ptr constructor directly. You may also need to use an allocator to create an array, for example an array of double.

If you use a smart pointer, the memory will garbage collect itself. In case the library you are wrapping likes to control the memory, use shared_ptr. This will avoid deallocation twice if the library is trying to do it too.

In fact, most of the time you will probably not need to use smart pointers if all you need are arrays of basic types. Use and prefer memoryviews instead of unique_ptr and malloc:

cimport numpy as np
import numpy as np
cdef double[:, ::1] arr = np.empty((50, 100), dtype='double')
  • Make sure to cimport numpy to use efficient C interfaces when possible (Cython will figure this out for you)
  • Notice the [:, ::1] notation -- it means that we are in RowMajor order. [::1, :] would be Fortran order. Use np.ascontiguousarray(arr) to make sure that you are RowMajor before passing to the Cython wrapper (if you expect RowMajor order)
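The layout notation can be checked from plain Python, because Cython's typed memoryviews are built on the same buffer protocol as the built-in `memoryview`. The sketch below uses only the standard library (no NumPy) to build a 2 x 3 row-major view of doubles and inspect its strides; the variable names are illustrative and not part of any wrapper.

```python
from array import array

# Six doubles stored back to back in one flat buffer.
flat = memoryview(array("d", [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]))

# Reinterpret the buffer as 2 rows x 3 columns. cast() produces a
# C-contiguous (row-major) view, which is what double[:, ::1] asks for.
grid = flat.cast("B").cast("d", shape=[2, 3])

# In row-major order the last axis has unit stride (one item), and moving
# down one row skips a whole row of three items.
row_stride, col_stride = grid.strides
print(grid.c_contiguous, grid.f_contiguous)
print(row_stride // grid.itemsize, col_stride // grid.itemsize)
```

A Fortran-ordered array would show the opposite pattern (unit stride on the first axis), which is the case that `[::1, :]` describes.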

If you are using numpy arrays, you can get a memory view of any array. If you need the underlying pointer (to pass to a C function, for example), you have to do something a little funky because of the way Cython tries to adopt Python-first syntax:

cimport numpy as np
import numpy as np
cdef double[::1] arr = np.empty((n,), dtype='double')
cdef double * arr_ptr = &arr[0]

If you need to dereference pointers, prepending * will not work; use either the [0] convention or cython.operator.dereference().

The Wrapper

Now that we have the functions and classes we need, we can make the wrapper. The wrapper will be a pyx file:

from libcpp cimport bool
from libcpp.string cimport string

from PYXFILE cimport Thing
from PYXFile2 cimport method

def wrapper(double[::1] arr, bool flag=True, pyarg=None):

    cdef string arg
    if pyarg is None:
        arg = b'Default arg' # <-- expects bytes (could use encode())
    else:
        arg = <string?>pyarg # <-- <type> casts as `type`, `?` does explicit type check

    cdef Thing thing = method(&arr[0], flag, arg)

    cdef int res = thing.get_result()

    return res
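On the Python side, the caller has to hand the wrapper a bytes object wherever a C++ string is expected: Cython converts bytes to std::string automatically, but by default it does not convert str. A minimal sketch of a helper that normalizes the argument before the call is shown here; `as_bytes` is a hypothetical name, not part of Cython or of the wrapper above.

```python
def as_bytes(value, encoding="utf-8"):
    """Return `value` as bytes, encoding it first if it is a str."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode(encoding)
    raise TypeError(f"expected str or bytes, got {type(value).__name__}")


# Both spellings end up as the same bytes object.
print(as_bytes("Default arg"))
print(as_bytes(b"Default arg"))
```

The result can then be passed as pyarg to a wrapper like the one above.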

Memory-safe C++ Class Wrappers

Consider the case where you want to make a Python wrapper around a C++ class, say this one:

#ifndef MyClass_h
#define MyClass_h
namespace MyNamespace {

class MyClass {
public:
    MyClass(bool flag);
    int method_I_want(int arg);
}; // MyClass
} // MyNamespace
#endif

So you might try to do the following in a Cython definition file:

# MyClass.pxd

from libcpp cimport bool
cdef extern from "MyClass.h" namespace "MyNamespace" nogil:
    cdef cppclass MyClass:
        MyClass(bool flag) except +
        int method_I_want(int arg)

But how are you going to call it from Python? You can't import a cppclass from a pxd file in the Python interpreter, so you need to make a pyx file with a class on top of it. How should you do this? A simple pattern for creating these sorts of wrappers consists of modifying the definition file to consist of a C++ version, Python version, and an owner/ptr model for memory management. A pyx file will then simply implement the interfaces we desire to use in Python.

# MyClass.pxd using owner/ptr model

from libcpp cimport bool
from libcpp.memory cimport unique_ptr
cdef extern from "MyClass.h" namespace "MyNamespace" nogil:
    cdef cppclass CppMyClass "MyNamespace::MyClass":
        CppMyClass(bool flag) except +
        int method_I_want(int arg)

# We must declare the cdef'd attributes/methods of classes in the PXD file
cdef class MyClass:
    cdef unique_ptr[CppMyClass] _owner # each instance "owns" its own memory, auto garbage collect
    cdef CppMyClass * _ptr # how to access the underlying C++ object

###################################
# MyClass.pyx
from libcpp.memory cimport unique_ptr
from .MyClass cimport CppMyClass, MyClass

cdef class MyClass:
    def __cinit__(self, bool flag):
        # Here's the owner/ptr memory model:
        self._owner = unique_ptr[CppMyClass](new CppMyClass(flag)) # prefer make_unique, but since the argument isn't const, we need to call the constructor; could have used allocate here
        self._ptr = self._owner.get()
    def method_I_want(self, int arg): # <- here's the wrapper
        return self._ptr.method_I_want(arg)

MyClass can now be imported by Python. Since Cython requires a nullary constructor to create objects on the stack and we don't have one, we need to allocate on the heap, i.e., use a pointer. In the owner/ptr model, we use a unique_ptr (or shared_ptr if we absolutely need to) which owns a pointer. We use the pointer to access the underlying C++ methods. Using smart pointers means we never have to worry about freeing memory -- it will happen automagically. Using a unique_ptr, we also have a clear ownership model. If we just use this pattern, it will take care of classes with nullary constructors as well (at the cost of heap allocations, of course).

Building the Wrapper

Use a distutils-style setup.py file. There are good examples of these in the Cython doc as well as out in the wild. I always refer to this one when I need a simple one.
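As a starting point, a minimal sketch of such a setup.py for the MyClass wrapper above might look like the following. It assumes setuptools and Cython are installed (setuptools stands in for distutils, which was removed from the standard library in Python 3.12), and the project name, source paths, and include directory are hypothetical placeholders to be replaced with your own layout.

```python
# setup.py
from setuptools import Extension, setup
from Cython.Build import cythonize

extensions = [
    Extension(
        name="MyClass",
        # Hypothetical paths: the Cython wrapper plus the C++ source it wraps.
        sources=["MyClass.pyx", "src/MyClass.cpp"],
        # Hypothetical directory containing MyClass.h.
        include_dirs=["include"],
        # Compile the generated code as C++ rather than C.
        language="c++",
    )
]

setup(
    name="myclass-wrapper",  # hypothetical project name
    ext_modules=cythonize(
        extensions,
        compiler_directives={"language_level": "3"},
    ),
)
```

Running `python setup.py build_ext --inplace` would then compile the extension next to the sources.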

You may also need to add the library you are wrapping as extensions in order to use them as shared libraries. An example of that can be found here.

The scipy build system will do something similar to this. Use the conventions of the modules you are developing for.