Thursday, January 8, 2015

Rvalue references and function overloading.

No matter how many articles get written about rvalue references, I still get the impression that many people don't understand them fully. Some believe that rvalue references work differently as template parameters than they do in non-template contexts, which would require special rules in overload resolution. I intend to show that rvalue references work consistently with the rest of the language. In the course of writing this article, I even straightened out my own misconceptions about rvalues.

I assume that the reader has some level of familiarity with rvalue references; understanding perfect forwarding helps. For either an introduction or a refresher, I recommend "C++ Rvalue References Explained", a very thorough introduction and a well-indexed reference.

Today, I want to break out of my normal pedagogical style into a Q&A. I ask that the reader take a moment to think about what they expect each code example to do, and why.

Non-template rvalue overloading.

Typically, discussion of rvalues is limited to template functions and move constructors, but consider the following code example:
void f(int) { }
void f(int&) { }
void f(int&&) { }

int main() {
  f(0);  // (1)
  int x;
  f(x);  // (2)
}
What overload(s) of f get chosen at (1) and (2)? Please take a moment to compare (1) and (2) to each overload before continuing.

(1) is ambiguous because it can refer to f(int) and f(int&&), while (2) is ambiguous because it can be f(int) or f(int&).

The C++ standard describes a sort of "best fit" algorithm for overload resolution by first determining the viable overloads, then ranking them based on the conversions required of the arguments being passed. The highest rank, considered an "exact match," includes identity (or: "no conversion required") and considers reference binding a member of that category. Because of that, f(int) has the same rank as f(int&) and f(int&&) in overloading.
struct T {
  T() { }
  T(const T&) { }
  T(T&&) { }
};

void f(T&) { }
void f(const T&) { }
void f(T&&) { }

int main() {
  T t;
  T t2(t);    // (1)
  T t3(T{});  // (2)
  f(t);       // (3)
  f(T{});     // (4)
}
Does this example suffer from the same ambiguity? Why or why not?

Being familiar with copy and move constructors, we know that (1) will invoke T(const T&) and that (2) will invoke T(T&&), but shouldn't there be an ambiguity because we can bind temporaries to const references? Every overload has the "exact match" rank, but the standard also defines a sub-ranking for "implicit conversion sequences", which covers reference parameters (including rvalue references) and "qualification adjustment", such as converting an int& to a const int&. Given two overloads on reference parameters, the rvalue reference overload is preferred to the lvalue one, thus T(T&&) is chosen for (2) and f(T&&) for (4). (3) splits between f(T&) and f(const T&), but the former wins because the argument, t, is not const and would require adjustment.

Note that f would be ambiguous if we had defined an f(T) overload because every other overload would have the same relative ranking. While f(T&) matches more precisely than f(const T&), they both have the same rank compared to f(T). 
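To see that ambiguity concretely, here is a minimal sketch reusing the T above (g is an illustrative name):

void g(T) { }
void g(T&) { }

int main() {
  T t;
  g(t);  // error: ambiguous--g(T) and g(T&) are both exact matches
}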

Before considering the impact of rvalues on template functions, let's ponder what we already know about perfect forwarding.
template<typename T> void f(T&&) { }
We know that this function can be called with any kind of value; such a parameter is often referred to as a "universal reference", although it has more recently been redubbed "forwarding reference". Let's try to reproduce this behaviour without templates.
void f(int&&) { }
void f(int& &&) { }
void f(const int& &&) { }
This code is not valid. I usually like clang's diagnostics the best, but here it complains, rather mundanely, "type name declared as a reference to a reference". Gcc gives us a very nice hint about this code: "cannot declare reference to ‘int&’, which is not a typedef or a template type argument".
using ref = int&;
using cref = const int&;

void f(int&&) { }
void f(ref&&) { }
void f(cref&&) { }

int main() {
  int x = 0;
  const int y = 0;
  f(0);  // calls f(int&&)
  f(x);  // calls f(ref&&)
  f(y);  // calls f(cref&&)
}
Neither clang nor gcc has any difficulty with this version. The above set of overloads looks fairly similar to the familiar perfect-forwarding pattern.
using rval = int&&;
using rrval = rval&&;    // (1)
using rrrval = rrval&&;  // (2)

void f(rval) { }
void f(rrval) { }   // (3)
void f(rrrval) { }  // (4)
For (1) and (2), are they valid? For (3) and (4), what values would these overloads accept?

You may be surprised, but (1) and (2) are completely valid; however, (3) and (4) are not. Again, the diagnostics of gcc and clang give us some good insight. They complain that f(rrval) redefines f(rval), and ditto for f(rrrval). That is to say: rval, rrval, and rrrval all refer to the same type! The same goes for normal references--(int&) & is the same as int&.

While int and int&& are distinct types, the standard states that there "shall be no references to references," which includes rvalue references to references. This is one context where lvalue references always take precedence.
using ref = int&;
using rval = int&&;

using T1 = ref&&;  // T1 = int&
using T2 = rval&;  // T2 = int&
This explains why perfect forwarding works the way it does. You see T&&, but after reference collapsing it becomes the exact type of the argument.
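As a quick illustration of the collapse in action (g and wrapper are illustrative names):

#include <utility>

void g(int&)  { }  // chosen for lvalues
void g(int&&) { }  // chosen for rvalues

template<typename T>
void wrapper(T&& x) {
  g(std::forward<T>(x));
}

int main() {
  int x = 0;
  wrapper(x);  // T = int&, so T&& collapses to int&: calls g(int&)
  wrapper(0);  // T = int, so T&& is just int&&: calls g(int&&)
}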

Rvalues in template functions:

One might consider rvalue references in the context of template functions as forwarding references, previously called universal references, because they seem to apply to every situation.
template<typename T> void f(T) { }
template<typename T> void f(T&) { }
template<typename T> void f(T&&) { }

int main() {
  int x = 0;
  f(0);  // (1)
  f(x);  // (2)
}
Which overload of f gets called at (1) and (2)?

As a rule of thumb, one might assume that f(T&&) gets chosen for both because it can be (with T = int and T = int&, respectively), but actually the very same rules apply as in the non-template version. (1) is ambiguous between f(T) and f(T&&), and (2) is ambiguous between all three overloads (again, because they all fit into the "exact match" rank).
template<typename T> void f(T&&) { }

void f(const int&) { }

int main() {
  int x = 0;
  f(x);
}
Which overload of f gets called? 

This time, f(T&&) gets instantiated with T = int&, and f(const int&) requires a qualification adjustment from int& to const int&, so f(T&&) gets chosen.
template<typename T> struct X { };

template<typename T> void f(T&&) { }
template<typename T> void f(X<T&&>&) { }

int main() {
  X<int> x;
  X<int&> xref;
  f(x);     // (1)
  f(xref);  // (2)
}
Which overload of f gets chosen at (1) and (2)?

It's f(T&&) for both. Sure, T&& can bind to anything, but the T in X<T&&> can only be deduced from actual rvalue references. The compiler does not try to fit T&& to int or int&; it looks at the whole type, X<T&&> versus X<int> and X<int&>, and cannot deduce T. Ironically, if we wanted a signature that would bind to an X of anything, it would have to be X<T>.
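A minimal sketch of that catch-all signature, reusing X from above (h is an illustrative name):

template<typename T>
void h(X<T>&) { }

int main() {
  X<int> x;
  X<int&> xref;
  h(x);     // T deduced as int
  h(xref);  // T deduced as int&
}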

Conclusions

Many seem to believe that rvalue references in templates are inconsistent with normal overload resolution in C++, and find it helpful to call them "universal" or "forwarding" references in order to reason about why they seem to work differently. I hope that through the above examples, the reader has learned something about rvalue references and the consistency of the rules surrounding them.

Wednesday, November 26, 2014

The Python API and C++

Recently, for a job interview task, I was asked to write a Python module with a C or C++ implementation to solve an otherwise simple task. Obviously, I chose C++. While I had never used the Python API before, I found the existing information on extending Python with C quite sufficient. What surprised me, however, is how little information existed for using C++. A few libraries exist, like Boost.Python, PyCXX, and some utilities that parse C++ to create Python bindings, but I didn't find much in the way of actual information without examining the sources of these libraries.

I will not discuss at length why someone would want to implement a Python module in another language (efficiency? better library support for certain tasks? preference?), but why C++? The Python API has basically no type safety--everything is a PyObject *, whether it represents a string, number, or tuple. It requires a considerable amount of boilerplate--something we can reduce by using the C++ type system. It presents some interesting technical challenges, which are what I will focus on. I will assume some knowledge of the Python API.

Note: I will be basing this off Python 2.7. Yes, Python 3 is newer, but due to incompatibilities, not a replacement, and also not my system default. Also, I have little experience with the Python API, so do not take this article as authoritative. It represents a journal of my experiments.

I have started working on a little utility library (https://github.com/splinterofchaos/py-cxx) for personal use, but for a listing of the code for this article, see the gist: https://gist.github.com/splinterofchaos/b099149a701edfa5948f


Writing a Python Module: The Basics

First, we will want to create a Python module, which alone is rather uninteresting. For a more in-depth study, one should refer to https://docs.python.org/2/extending/extending.html.

Every module requires an init function which communicates to the interpreter what functions, types, and objects this module offers. For now, let's consider a module that counts how many times a certain function gets called.
#include <Python.h>

PyObject *count(PyObject *self, PyObject *args)
{
  static int i = 0;
  PySys_WriteStdout("%i\n", ++i);  // Just like printf.
  return PyInt_FromLong(i);
}

static PyMethodDef countMethods[] = {
  {"count", count, METH_VARARGS, "Returns the number of times called."},
  {NULL, NULL, 0, NULL}
};

PyMODINIT_FUNC initcount()
{
  PyObject *m = Py_InitModule("count", countMethods);
}
See setup.py for building this example.

Here, countMethods contains the defined functions in a {name, c-function, function-type, __doc__-string} structure. count must be a PyCFunction, a function taking self (probably null) and args (argument tuple) parameters and returning an object. METH_VARARGS lets the interpreter know this is a regular function--other types of functions do exist, but more on that later.

The PyMODINIT_FUNC macro tells Python that, obviously, this function initializes the module. Note that even Py_InitModule() returns a regular Python object!

There are several improvements we can make. First, we could write an overloaded function, to_py_int(), that could dispatch between PyInt_FromLong(), PyInt_FromSsize_t(), and friends, but that's rather mundane so I'll be skipping it. More interesting: we can write a function to define methods.
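(For the curious, the skipped overload set might look something like this--a sketch; which PyInt_*/PyLong_* constructors to dispatch to is a matter of taste.)

inline PyObject *to_py_int(long x)          { return PyInt_FromLong(x); }
inline PyObject *to_py_int(unsigned long x) { return PyLong_FromUnsignedLong(x); }
inline PyObject *to_py_int(long long x)     { return PyLong_FromLongLong(x); }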

Aside from METH_VARARGS, a function can be a METH_KEYWORDS function, which takes an additional parameter, a dictionary, and thus is not a PyCFunction; a METH_NOARGS function, which must still be a PyCFunction and may receive a self argument, but always gets NULL for args; or a METH_O function, which receives its single argument directly rather than packed in a tuple. It may be convenient to write a function that takes a pointer to a specific type instead of the generic PyObject, but by casting we lose certain safety guarantees and it becomes easy to do something stupid, like writing a function with the wrong number of arguments or the wrong METH_* variant.
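To make those variants concrete, here is a sketch of a METH_NOARGS function and a METH_KEYWORDS function (the names and behaviour are illustrative):

// METH_NOARGS: still a PyCFunction, but args is always NULL.
PyObject *ping(PyObject *self, PyObject *)
{
  Py_RETURN_NONE;
}

// METH_KEYWORDS: receives a third parameter, the keyword dictionary.
PyObject *greet(PyObject *self, PyObject *args, PyObject *kwds)
{
  const char *name = "world";
  static char *kwlist[] = { (char *) "name", NULL };
  if (!PyArg_ParseTupleAndKeywords(args, kwds, "|s", kwlist, &name))
    return nullptr;
  return PyString_FromFormat("hello, %s", name);
}

With that in mind, we can deduce the correct METH_* flag from a function's own signature: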
#include <type_traits>
template<typename R, typename...X>
constexpr int arity(R(*)(X...)) {
  return sizeof...(X);
}

template<typename R, typename...X>
constexpr bool returns_PyObject(R(*)(X...)) {
  // Result is either a PyObject, or a subclass of one.
  return std::is_convertible<R, PyObject *>::value;
}

template<typename R, typename...X>
constexpr bool is_PyCFunction(R(*)(X...)) {
  return false;
}

template<>
constexpr bool is_PyCFunction(PyCFunction) {
  return true;
}

template<typename F>
constexpr int method_type(F f) {
  return arity(f) == 3     ? METH_KEYWORDS
       : is_PyCFunction(f) ? METH_VARARGS
                           : METH_O;
}

template<typename F>
constexpr PyMethodDef method_def(const char *name, const char *doc,
                                 int type, F f)
{
  static_assert(arity(F()) == 2 || arity(F()) == 3,
                "Methods must have an arity of 2 or 3");
  static_assert(returns_PyObject(F()), "Methods must return a PyObject *.");
  return {name, (PyCFunction)f, type, doc};
}

template<typename F>
constexpr PyMethodDef method_def(const char *name, const char *doc, F f)
{
  return method_def(name, doc, method_type(f), f);
}

static PyMethodDef countMethods[] = {
  method_def("count", "Returns the number of times called.", count),
  {NULL, NULL, 0, NULL}
};
Note that in order to use static_asserts, we construct an F instead of passing f because f, as a parameter, may not be a constexpr.

Now, we can declare methods in a type-safe manner without having to specify METH_* or lose any safety. While it may be a little limiting (for example, we can't use a lambda to define the method), one can always revert to not using method_def as well.

Note: It may be safe to define a function that takes no arguments and cast it to a PyCFunction, however I don't know that this would be true across all architectures and ABI calling conventions. 

One thing lacking from this example is actually using the args parameter. For that, we will need to use PyArg_ParseTuple().


A Type-Safe PyArg_ParseTuple().

Let's use the example of finding the cross product of two vectors.
#include <Python.h>

#include "Py.h"  // includes MethodDef()

PyObject *cross(PyObject *self, PyObject *args)
{
  float a, b, c;
  float x, y, z;

  if (!PyArg_ParseTuple(args, "(fff)(fff)", &a, &b, &c, &x, &y, &z))
    return nullptr;

  float i = b*z - c*y;
  float j = c*x - a*z;
  float k = a*y - b*x;

  return Py_BuildValue("fff", i, j, k);
}

static PyMethodDef vecMethods[] = {
  MethodDef("cross", "Returns the cross product of two 3D vectors.", cross),
  {NULL, NULL, 0, NULL}
};

PyMODINIT_FUNC initvec()
{
  PyObject *m = Py_InitModule("vec", vecMethods);
}
This lets us write, in Python, cross((a,b,c), (x,y,z)). Even simple functions like this benefit from being written in statically typed languages since, in Python, when one wants to do many operations on some variables, their types must be checked every time, lest you try to add a string to an integer. Here, we do nine operations, but only check the types of the initial six arguments.

PyArg_ParseTuple() is really quite simple; you pass in args and a format string (in this case, using f for float), and pointers to the variables you want to fill. If the tuple doesn't fit the expected format, it sets an error so we can just return NULL. We do our calculation and call Py_BuildValue(), which creates a tuple when given more than one value. Unfortunately, it's very verbose and not type-safe. We can fix that, but first, we must build a format string, preferably at compile time, to pass in.

First, for convenience, we can use an alias of std::integer_sequence to build a list of chars.
template<char...cs>
using CharList = std::integer_sequence<char, cs...>;
Then, define mappings from C++ types to PyArg_ParseTuple() format characters.
template<typename...T>
struct CharListConcat;

template<typename T>
struct CharListConcat<T> {
  using type = T;
};

template<typename...U, char...cs, char...cs2>
struct CharListConcat<CharList<cs...>, CharList<cs2...>, U...> {
  using type = typename CharListConcat<CharList<cs..., cs2...>, U...>::type;
};

template<typename...T>
using CharListConcat_t = typename CharListConcat<T...>::type;

template<typename T>
struct PTCharListOf;

template<> struct PTCharListOf<float> {
  using type = CharList<'f'>;
};

template<typename...Ts>
struct PTCharListOf<std::tuple<Ts...>> {
  using type = CharListConcat_t<CharList<'('>,
                                typename PTCharListOf<std::decay_t<Ts>>::type...,
                                CharList<')'>>;
};

template<typename T>
using PTCharListOf_t = typename PTCharListOf<T>::type;
Unfortunately, this strategy is a bit limited--we couldn't pass in an std::vector to get the desired effect because we wouldn't know how many elements go into it. A better option would be to add a PyObject * specialization for PTCharListOf and manually check that the result is a list.

template<> struct PTCharListOf<PyObject *> {
  using type = CharList<'O'>;
};
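Manually checking could then look something like this (a sketch, following the error-handling style of the surrounding code):

PyObject *list = nullptr;
if (!PyArg_ParseTuple(args, "O", &list))
  return nullptr;
if (!PyList_Check(list)) {
  PyErr_SetString(PyExc_TypeError, "expected a list");
  return nullptr;
}
Py_ssize_t n = PyList_Size(list);  // Now safe to walk with PyList_GetItem().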
Next, we define a type to build the format:

template<typename...Ts>
struct ParseTupleBuilder { };

template<typename CL, typename T, typename...Ts>
struct ParseTupleBuilder<CL, T, Ts...> {
  using type = ParseTupleBuilder<CharListConcat_t<CL, PTCharListOf_t<T>>,
                                 Ts...>;
  constexpr static const char *fmt = type::fmt;
};

template<char...cs>
struct ParseTupleBuilder<CharList<cs...>> {
  using type = CharList<cs...>;

  static const char fmt[sizeof...(cs) + 1];
};

template<char...cs>
const char ParseTupleBuilder<CharList<cs...>>::fmt[] = { cs..., '\0' };

template<typename...Ts>
constexpr const char *ParseTupleFormat(Ts...) {
  return ParseTupleBuilder<CharList<>, std::decay_t<Ts>...>::fmt;
}
One interesting thing: When I defined fmt inside ParseTupleBuilder, I got an error from inside Python on typing "import vec" claiming that fmt's constructor had not been defined. The Python docs warn that static global variables with constructors may not be used if Python was built with a C compiler, but defining fmt outside the struct seems to fix this.

Finally, we can start defining ParseTuple(). The strategy I chose was to build an std::tuple of arguments to send to PyArg_ParseTuple() and examine each argument in a helper function. This will require two helpers, defined below, apply_tuple() and map_tuple().
template<typename F, typename T, size_t...Is>
decltype(auto) apply_tuple(F&& f, T&& t, std::index_sequence<Is...>) {
  return std::forward<F>(f)(std::get<Is>(std::forward<T>(t))...);
}

template<typename F, typename T, size_t...Is>
decltype(auto) map_tuple(F&& f, T&& t, std::index_sequence<Is...>) {
  return std::make_tuple(std::forward<F>(f)(std::get<Is>(std::forward<T>(t)))...);
}

template<typename F, typename...Ts,
         typename Is = std::make_index_sequence<sizeof...(Ts)>>
decltype(auto) map_tuple(F&& f, std::tuple<Ts...> &t) {
  return map_tuple(std::forward<F>(f), t, Is());
}

template<typename...Bound,
         typename Indicies = std::make_index_sequence<sizeof...(Bound)>>
bool ParseTuple_impl(std::tuple<Bound...> &&bound) {
  return apply_tuple(PyArg_ParseTuple, bound, Indicies());
}

template<typename...Bound, typename Arg, typename...Args>
bool ParseTuple_impl(std::tuple<Bound...> &&bound, Arg &a, Args &...as) {
  return ParseTuple_impl(std::tuple_cat(std::move(bound), std::make_tuple(&a)),
                          as...);
}

template<typename...Bound, typename...Args>
bool ParseTuple_impl(std::tuple<Bound...> &&bound, Optional, Args &...as) {
  return ParseTuple_impl(std::move(bound), as...);
}

template<typename...Bound, typename...Ts, typename...Args>
bool ParseTuple_impl(std::tuple<Bound...> &&bound, std::tuple<Ts &...> &t,
                     Args &...as) {
  auto &&mapped = map_tuple([](auto &x) { return &x; }, t);
  return ParseTuple_impl(std::tuple_cat(bound, std::move(mapped)),
                         as...);
}

template<typename...Args>
bool ParseTuple(PyObject *args, Args &&...as) {
  return ParseTuple_impl(std::make_tuple(args, ParseTupleFormat(as...)),
                          as...);
}
Before getting back to our cross product function, we will also want a BuildValue() function. Please excuse the repetitive nature of this code.
template<typename...Bound,
         typename Indicies = std::make_index_sequence<sizeof...(Bound)>>
PyObject *BuildValue_impl(std::tuple<Bound...> &&bound) {
  return apply_tuple(Py_BuildValue, bound, Indicies());
}

template<typename...Bound, typename Arg, typename...Args>
PyObject *BuildValue_impl(std::tuple<Bound...> &&bound, Arg a, Args ...as) {
  return BuildValue_impl(std::tuple_cat(std::move(bound), std::make_tuple(a)),
                         as...);
}

template<typename...Bound, typename...Args>
PyObject *BuildValue_impl(std::tuple<Bound...> &&bound, Optional, Args &...as) {
  return BuildValue_impl(std::move(bound), as...);
}

template<typename...Bound, typename...Ts, typename...Args>
PyObject *BuildValue_impl(std::tuple<Bound...> &&bound, std::tuple<Ts...> &t,
                          Args &...as) {
  return BuildValue_impl(std::tuple_cat(bound, std::move(t)), as...);
}

template<typename...Args>
PyObject *BuildValue(Args &...as) {
  return BuildValue_impl(std::make_tuple(ParseTupleFormat(as...)),
                          as...);
}
And finally, getting back to our cross product...
PyObject *cross(PyObject *self, PyObject *args)
{
  float a, b, c;
  float x, y, z;

  if (!ParseTuple(args, std::tie(a,b,c), std::tie(x,y,z)))
    return nullptr;

  float i = b*z - c*y;
  float j = c*x - a*z;
  float k = a*y - b*x;

  return BuildValue(i, j, k);
}
That sure was a lot of work, but it created a simple interface that's hard to use improperly.

Extending Python Types

Probably the cornerstone of extending Python itself would be to define new types that interact well with the existing Python infrastructure. For efficiency's sake, the more variables we can statically type and hold in a struct, the better. The Python docs suggest extending a type this way:
typedef struct {
    PyObject_HEAD
    ...
} MyType;
The macro, PyObject_HEAD, contains fields common to every Python object and ensures that casting a MyType pointer to a PyObject pointer is valid. This is a common technique for representing inheritance in C; however, we can get the same effect in C++ by using actual inheritance.

Also, every Python type requires an accompanying PyTypeObject, which is itself a PyObject. The PyTypeObject stores lots of runtime information about a type, including what function to use for allocation, how to convert it to a string, its methods, its base class, how to deallocate it, and more.

We could use a constructor for our extension type, but it may be wisest not to. One of the fields of the type object, tp_alloc, defines how to allocate memory for this type, including setting the reference count to one, setting the ob_type field (a member of PyObject), and a few other things. It must work even for classes that inherit from our custom type. It relates closely enough to Python's internals that I think it is best left alone, and it can be left as NULL in the PyTypeObject without trouble.

More interesting is tp_new, which must point to a function that calls tp_alloc and initializes the memory; it must be defined in order to create instances of our new type. We can define tp_new to use placement new for objects in our type that require construction.

We can generalize that an extension of PyObject will look like this:
template<typename T>
struct Extention : PyObject
{
  static PyTypeObject type;

  T ext;

  T       &get()       & { return this->ext; }
  const T &get() const & { return this->ext; }

  T       *ptr()       & { return &this->ext; }
  const T *ptr() const & { return &this->ext; }
};
We can define a default tp_new and tp_dealloc and initialize type like so:
template<typename T,
         typename = std::enable_if_t<std::is_default_constructible<T>::value>>
newfunc default_new()
{
  return [](PyTypeObject *type, PyObject *args, PyObject *kwds)
  {
    using Self = Extention<T>;
    Self *self = (Self *) type->tp_alloc(type, 0);
    if (self)
      new (self->ptr()) T();
    return (PyObject *) self;
  };
}

template<typename T,
         typename = std::enable_if_t<!std::is_default_constructible<T>::value>>
auto default_new() {
  return [](PyTypeObject *type, PyObject *args, PyObject *kwds)
  {
    return type->tp_alloc(type, 0);
  };
}

template<typename T>
PyTypeObject Extention<T>::type = {
  PyObject_HEAD_INIT(NULL)
  0,                         // ob_size
  0,                         // tp_name
  sizeof(Extention<T>),      // tp_basicsize
  0,                         // tp_itemsize
  destructor([](PyObject *self) {
    ((Extention *) self)->get().T::~T();
    self->ob_type->tp_free(self);
  }),
  0,                         // tp_print
  0,                         // tp_getattr
  0,                         // tp_setattr
  0,                         // tp_compare
  0,                         // tp_repr
  0,                         // tp_as_number
  0,                         // tp_as_sequence
  0,                         // tp_as_mapping
  0,                         // tp_hash 
  0,                         // tp_call
  0,                         // tp_str
  0,                         // tp_getattro
  0,                         // tp_setattro
  0,                         // tp_as_buffer
  Py_TPFLAGS_DEFAULT,        // tp_flags
  0,                         // tp_doc 
  0,                         // tp_traverse 
  0,                         // tp_clear 
  0,                         // tp_richcompare 
  0,                         // tp_weaklistoffset 
  0,                         // tp_iter 
  0,                         // tp_iternext 
  0,                         // tp_methods 
  0,                         // tp_members 
  0,                         // tp_getset 
  0,                         // tp_base 
  0,                         // tp_dict 
  0,                         // tp_descr_get 
  0,                         // tp_descr_set 
  0,                         // tp_dictoffset 
  0,                         // tp_init 
  0,                         // tp_alloc 
  default_new<T>(),          // tp_new
};
PyTypeObject does have a few more fields, but the compiler sets them to 0 for us. We do, however, have to set tp_basicsize in order for the right amount of memory to be allocated. Since a type in C++ may not be default-constructible, default_new() may return a function that does not construct the object; this must be done in tp_init.

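For illustration, a hypothetical type without a default constructor might be initialized like so (Point and init_point are made-up names; ParseTuple is the helper from earlier, and placement new needs <new>):

struct Point {
  Point(float x, float y) : x(x), y(y) { }
  float x, y;
};

using PyPoint = Extention<Point>;

int init_point(PyPoint *self, PyObject *args, PyObject *)
{
  float x, y;
  if (!ParseTuple(args, x, y))
    return -1;
  new (self->ptr()) Point(x, y);  // tp_new left this memory unconstructed.
  return 0;
}
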
Now, returning to the cross product example, consider this:
struct Vec {
  float x, y, z;
};

using PyVec = Extention<Vec>;

int init_vec(PyVec *self, PyObject *args, PyObject *)
{
  Vec &v = self->get();
  if (!ParseTuple(args, v.x, v.y, v.z))
    return -1;
  return 0;
}

PyObject *vec_str(PyVec *self)
{
  return PyString_FromString(("<"  + std::to_string(self->get().x) +
                              ", " + std::to_string(self->get().y) +
                              ", " + std::to_string(self->get().z) +
                              ">").c_str());
}

PyMODINIT_FUNC initvec()
{
  PyVec::type.tp_name = "vec.Vec";
  PyVec::type.tp_init = (initproc) init_vec;
  PyVec::type.tp_repr = PyVec::type.tp_str = (reprfunc) vec_str;
  if (PyType_Ready(&PyVec::type) < 0)
    return;

  PyObject *m = Py_InitModule("vec", vecMethods);
  if (!m)
    return;

  Py_INCREF(&PyVec::type);
  PyModule_AddObject(m, "Vec", (PyObject *) &PyVec::type);
}
Note that tp_repr is used to display the result of evaluating an expression, and tp_str is used for printing. tp_init is used to construct our value and relates to Vec.__init__() in Python. PyType_Ready() finalizes the type and fills in some of the missing tp_* fields. We add the type to the module as a global object and increment its reference count so Python doesn't try to destruct it. For brevity, I decided not to include functions to check the type safety of the initproc and reprfunc casts.

Since Vec is default constructible, we only need to worry about assigning the members in the init function.

And now, cross looks like this:
PyObject *cross(PyObject *self, PyObject *args)
{
  PyObject *o1, *o2;
  if (!ParseTuple(args, o1, o2))
    return nullptr;

  // Ensure o1 and o2 are the right types.
  // (A friendlier version might set a TypeError here before returning.)
  if (!PyType_IsSubtype(o1->ob_type, &PyVec::type) ||
      !PyType_IsSubtype(o2->ob_type, &PyVec::type))
    return nullptr;
  
  Vec &v = ((PyVec *) o1)->get(), &w = ((PyVec *) o2)->get();
  float i = v.y*w.z - v.z*w.y;
  float j = v.z*w.x - v.x*w.z;
  float k = v.x*w.y - v.y*w.x;

  PyObject *ret = PyVec::type.tp_new(&PyVec::type, nullptr, nullptr);

  PyObject *val = BuildValue(i, j, k);
  init_vec((PyVec *) ret, val, nullptr);
  Py_DECREF(val);

  return ret;
}


Conclusions

Despite this being quite a long article, it has only touched the surface of how the Python API can be extended. There are many restrictions and it certainly puts a cramp on C++'s style, but the moral of this story is that just because you need to work with a C API doesn't mean you can't use modern C++ techniques.

Tuesday, February 4, 2014

Clang 3.4 and C++14

With each new release, gcc and clang add more C++11 and C++14 features. While clang has been behind on some features, and ahead on others, its developers now claim to have C++1y all worked out.

This article is not comprehensive.
Clang's 3.4 C++ release notes:  http://llvm.org/releases/3.4/tools/clang/docs/ReleaseNotes.html#id1
libc++'s C++1y status: http://libcxx.llvm.org/cxx1y_status.html

Note: compiling these examples requires the flags "-stdlib=libc++" and "-std=c++1y".

Variable templates.


This feature, from N3651, took me by surprise, but it also seems quite obvious in retrospect. In the simplest example, let def<T> be a variable that represents the default-constructed value of any type, T.

template<typename T>
constexpr T def = T();
 
auto x = def<int>; // x = int()
auto y = def<char>; // y = char() 

The proposal uses the example of pi, where it may be more useful to store it as a float, double, or even long double. By defining it as a template, one can have precision when needed and faster, but less precise, operations otherwise.
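Its definition, per the proposal, is along these lines:

template<typename T>
constexpr T pi = T(3.1415926535897932385);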

For another example, consider storing a few prime numbers, but not specifying the type of their container.

template<template<typename...> class Seq>
Seq<int> primes = { 2, 3, 5, 7, 11, 13, 17, 19 };

auto vec = primes<std::vector>;
auto list = primes<std::list>;
(gist)

Also, the standard library contains many template meta-functions, some with a static value member. Variable templates help there, too.

template<typename T, typename U>
constexpr bool is_same = std::is_same<T,U>::value;

bool t = is_same<int,int>;   // true
bool f = is_same<int,float>; // false
(std::is_same)

But since variable templates can be specialized just like template functions, it makes as much sense to define it this way:

template<typename T, typename U>
constexpr bool is_same = false;

template<typename T>
constexpr bool is_same<T,T> = true;
(gist)

Except when one requires that is_same refer to an integral_constant.

One thing worries me about this feature: how do we tell the difference between template meta-functions, template functions, template function objects, and variable templates? What naming conventions will be invented? Consider the above definition of is_same and the following:

// A template lambda that looks like a template function.
template<typename T>
auto f = [](T t){ ... };

// A template meta-function that might be better as a variable template.
template<typename T>
struct Func { static constexpr bool value = ...; };

They each have subtly different syntaxes. For example, N3545 adds an operator() overload to std::integral_constant, which enables syntax like bool b = std::is_same<T,U>(), while N3655 adds std::is_same_t<T,U> as a synonym for std::is_same<T,U>::value. (Note: libc++ is missing std::is_same_t.) Even without variable templates, we now have three ways to refer to the same thing.

Finally, one problem I did run into: I wrote a function like so:

template<typename T>
auto area( T r ) {
    return pi<T> * r * r;
};

and found that clang thought pi<T> was undefined at link time and clang's diagnostics did little to point that out.

/tmp/main-3487e1.o: In function `auto $_1::operator()<Circle<double> >(Circle<double>) const':
main.cpp:(.text+0x5e3d): undefined reference to `_ZL2piIdE'
clang: error: linker command failed with exit code 1 (use -v to see invocation)

I solved this by explicitly instantiating pi for the types I needed by adding this to main:

pi<float>;
pi<double>;

Why in main and not in global scope? When I tried it right below the definition of pi, clang thought I wanted to specialize the type. Finally, attempting template<> pi<float>; left the value uninitialized. This is a bug in clang, and has been fixed. Until the next release, variable templates work as long as only non-template functions refer to them.

Generic lambdas and generalized capture.


Hey, didn't I already do an article about this? Well, that one covers Faisal Vali's fork of clang based off of N3418, which has many more features than this iteration, based off the more conservative N3559. Unfortunately, it lacks the terseness and explicit template syntax (i.e. []<class T>(T t) f(t)), but it maintains automatic types for parameters ([](auto t){return f(t);}).

Defining lambdas as variable templates helps, but variable templates lack some abilities of functions, like implicit template parameters. For situations where an explicit template parameter is what's wanted, though, the combination works nicely.

template<typename T>
auto convert = [](const auto& x) {
    return T(x);
};
(gist)

Also, previously, clang couldn't capture values by move or forward into lambdas, which prohibited capturing move-only types by anything other than a reference. Transitively, that meant many perfect forwarding functions couldn't return lambdas.

Now, initialization is "general", to some degree.

std::unique_ptr<int> p = std::make_unique<int>(5);
auto add_p = [p=std::move(p)](int x){ return x + *p; };
std::cout << "5 + 5 = " << add_p(5) << std::endl;
(See also: std::make_unique)

Values can also be copied into a lambda using this syntax, but check out Scott Meyers's article for why [x] or [=] does not mean the same thing as [x=x] for mutable lambdas. (http://scottmeyers.blogspot.de/2014/02/capture-quirk-in-c14.html)

Values can also be defined and initialized in the capture expression.

std::vector<int> nums{ 5, 6, 7, 2, 9, 1 };
 
auto count = [i=0]( auto seq ) mutable {
    for( const auto& e : seq )
        i++; // Would error without "mutable".
    return i;
};

gcc has had at least partial support for this since 4.5, but should fully support it in 4.9.

Auto function return types.


This is also a feature gcc has had since 4.8 (and one I wrote about as well), but that was based off of N3386, whereas gcc 4.9 and clang 3.4 base theirs off of N3638. I will not say much here because this is not an entirely new feature, not much has changed, and it's easy to grok.

Most notably, the syntax, decltype(auto), has been added to overcome some of the shortcomings of auto. For example, if we try to write a function that returns a reference with auto, a value is returned. But if we write it...

decltype(auto) ref(int& x) {
    return x;
}

decltype(auto) copy(int x) {
    return x;
} 
(gist)

Then a reference is returned when a reference is given, and a copy when a value is given. (Alternatively, the return type of ref could be auto&.)

More generalized constexprs.


The requirement that a constexpr function be a single return statement worked well enough, but simple functions that required more than one line could not be constexpr. It sometimes forced inefficient implementations in order to have at least some of their results generated at compile time, but not always all. The factorial function serves as a good example.

constexpr unsigned long long fact( unsigned long long x ) {
    return x <= 1 ? 1ull : x * fact(x-1);
}

but now we can write...

constexpr auto fact2( unsigned long long x ) {  // Note: assumes x >= 1.
    auto product = x;
    while( --x ) // Branching.
        product *= x; // Variable mutation.
    return product;
}
(gist)

This version may be more efficient, both at compile time and run time.
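Both remain usable in constant expressions; for example (cache is an illustrative name):

static_assert( fact(10) == fact2(10), "" );
int cache[fact2(4)];  // 24 elements; the size is a constant expression.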

The accompanying release of libc++ now labels many standard functions as constexpr thanks to N3469 (chrono), 3470 (containers), 3471 (utility), 3302 (std::complex), and 3789 (functional).

Note: gcc 4.9 does not yet implement branching and mutation in constexprs, but does include some of the library enhancements.

std::integer_sequence for working with tuples.


Although this library addition may not be of use to everyone, anyone who has attempted to unpack a tuple into a function (like this guy or that guy or this one or ...) will appreciate N3658's "compile-time integer sequences". Thus far, no standard solution has existed. N3658 adds the template class std::integer_sequence<T,t0,t1,...,tn> and the alias std::index_sequence<t0,...,tn>, which is an integer_sequence with T=size_t. This lets us write an apply_tuple function like so:


template<typename F, typename Tup, size_t ...I>
auto apply_tuple( F&& f, Tup&& t, std::index_sequence<I...> ) {
    return std::forward<F>(f) (
         std::get<I>( std::forward<Tup>(t) )... 
    );
}
(See also: std::get)

For those who have not seen a function like this, the point of this function is just to capture the indexes from the index_sequence and call std::get variadically. It requires another function to create the index_sequence.

N3658 also supplies std::make_integer_sequence<T,N>, which expands to std::integer_sequence<T,0,1,...,N-1>, std::make_index_sequence<N>, and std::index_sequence_for<T...>, which expands to std::make_index_sequence<sizeof...(T)>.


// The auto return type especially helps here.
template<typename F, typename Tup >
auto apply_tuple( F&& f, Tup&& t ) {
    using T = std::decay_t<Tup>; // Thanks, N3655, for decay_t.

    constexpr auto size = std::tuple_size<T>(); // N3545 for the use of operator().
    using indicies = std::make_index_sequence<size>; 

    return apply_tuple( std::forward<F>(f), std::forward<Tup>(t), indicies() ); 
}
(See also: std::decay, std::tuple_size, gist)

Unfortunately, even though the proposal uses a similar function as an example, there still exists no standard apply_tuple function, nor a standard way to extract an index_sequence from a tuple. Still, there may exist several conventions for applying tuples. For example, the function may be the first element or an outside component. The tuple may have an incomplete argument set and require additional arguments for apply_tuple to work.

Update: Two library proposals in the works address this issue: N3802 (apply), and N3416 (language extension: parameter packs).

experimental::optional.


While not accepted into C++14, libc++ has an implementation of N3672's optional hidden away in the experimental folder. Boost fans may think of it as the standard's answer to boost::optional, while functional programmers may think of it as Haskell's Maybe.

Basically, some operations may not have a value to return. For example, a square root cannot be taken of a negative number, so one might want to write a "safe" square root function that returns a value only when x >= 0.


#include <cmath>
#include <experimental/optional>

template<typename T>
using optional = std::experimental::optional<T>;

optional<float> square_root( float x ) {
    return x >= 0 ? std::sqrt(x) : optional<float>();
}
(gist)

Using an optional is simple because they implicitly convert to bools and act like a pointer, but with value semantics (which is incidentally how libc++ implements it). Without optional, one might use a unique_ptr, but value semantics on initialization and assignment make optional more convenient.


auto qroot( float a, float b, float c ) 
    -> optional< std::tuple<float,float> >
{
    // Optionals implicitly convert to bools.
    if( auto root = square_root(b*b - 4*a*c) ) {
        float x1 = (-b + *root) / (2*a);
        float x2 = (-b - *root) / (2*a);
        return {{ x1, x2 }}; // Like optional{tuple{}}.
    }
    return {}; // An empty optional.
}  
(gist)

Misc. improvements.


This version of libc++ allows one to retrieve a tuple's elements by type using std::get<T>.


std::tuple<int,char> t1{1,'a'};
std::tuple<int,int>  t2{1,2}; 
int x = std::get<int>(t1); // Fine.
int y = std::get<int>(t2); // Error, t2 contains two ints.


Clang now allows the use of single-quotes (') to separate digits. 1'000'000 becomes 1000000, and 1'0'0 becomes 100. (example) (It doesn't require that the separations make sense, but one cannot write 1''0 or '10.)

libc++ implements N3655, which adds several template aliases of the form std::*_t = std::*::type to <type_traits>, such as std::result_of_t, std::decay_t, and many more. Unfortunately, while N3655 also adds std::is_same_t (see the top of the 7th page), libc++ does not define it. I do not know, but I believe this may be an oversight that will be fixed soon, as it requires only one line.

N3421 adds specializations to the members of <functional>. If one wanted to send an addition function into another function, one might write f(std::plus<int>(),args...), but we no longer need to specify the type and can instead write std::plus<>(). This instantiates a function object that can accept two values of any type and add them. Similarly, std::greater<>, std::less<>, std::equal_to<>, etc...
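For example (assuming <functional> and <string> are included):

std::string a = "foo", b = "bar";
auto ab  = std::plus<>()(a, b);    // "foobar"; the operand types are deduced.
auto mix = std::plus<>()(1, 2.5);  // 3.5; mixed operand types work, too.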

Conclusions.


This may not be the most ground-breaking release, but C++14 expands on the concepts from C++11, improves the library, and adds a few missing features, and I find it impressive that the clang team has achieved this so preemptively. I chose to talk about the features I thought were most interesting, but I did not talk about, for example, sized deallocation, std::dynarray (<experimental/dynarray>), some additional overloads in <algorithm>, or null forward iterators, to name a few. See the bottom for links to the full lists.

The GNU team still needs to do more work to catch up to clang. If one wanted to write code for both gcc 4.9 and clang 3.4, they could use generic lambdas and auto return types, but not variable templates or generalized constexprs. For the library, gcc 4.9 includes std::make_unique (as did 4.8), the N3421 specializations in <functional>, integer sequences, constexpr library improvements, even experimental::optional (though I'm not sure where), and much more. It may be worth noting that it does not seem to include the <type_traits> template aliases, like result_of_t.

See clang's full release notes related to C++14 here: http://llvm.org/releases/3.4/tools/clang/docs/ReleaseNotes.html#id1
For libc++'s improvements, see: http://libcxx.llvm.org/cxx1y_status.html
gcc 4.9's C++14 features: http://gcc.gnu.org/projects/cxx1y.html
And gcc's libstdc++ improvements:  http://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html#status.iso.2014

The code I wrote to test these features: https://gist.github.com/splinterofchaos/8810949

Friday, December 28, 2012

Clang and Generic (Polymorphic) Lambdas.

Recently, Faisal Vali put forth an implementation of N3418, which he co-authored with Herb Sutter and Dave Abrahams, allowing generic lambdas using a fork of clang. It also includes auto type deduction, which I wrote about being implemented in gcc 4.8. There are a few caveats before continuing: this has not been merged into the mainline; it has a few bugs, but Vali is quick to fix them if you point one out; the implementation itself is a proof of concept (similar to automatic type deduction), so it isn't unreasonable to assume some things might change; and section 2.4 of the proposal (named lambdas) has not yet been implemented. And while this doesn't allow us to do many things that were previously impossible, the previously possible used to be so verbose that no one would want to do it!

Generic lambdas are profound and may have a great impact on the style of code written in C++. Consider this a light (and lacking) overview of what is possible.

Before continuing, I want to note that I found evidence that some GCC developers had begun working on generic lambdas (from the mailing list: Latest experimental polymorphic lambda patches); however, I cannot find anything more recent than 2009 discussing this, and code using auto and lambdas together does not compile.


Being terse.

This patch allows the writing of incredibly terse polymorphic functions, such as these:

auto add = [](auto x, auto y) x + y;
auto times = [](auto x, auto y) x * y; 
auto print_int = [](int x) void(std::cout << x);

No {}, no return, auto-deduced types, and void can even be used to throw away the value of stateful operations. x and y can be anything and the return type is entirely dependent on them. Why is this interesting? Say you want to find the product of a vector.

auto prod = std::accumulate ( 
    v.begin(), v.end(), 1,
    []( int x, int y ){ return x * y; }
);

Bit of a mouthful, right? v might store ints today, but tomorrow, maybe it will store long long ints to avoid overflow or just unsigned ints to avoid negatives. When the vector's declaration changes, the lambda's argument types need to change, which is a maintenance problem. I currently solve this by writing polymorphic function objects.

constexpr struct {
    template< class X, class Y >
    auto operator () ( X x, Y y )
        -> decltype(x*y)
    {
        return x * y;
    }
} timesObj{}; 

But the above and times are mostly equivalent. (A lambda can be thought of as an automatic function object. It even has a unique type. (see below: Overloading))

auto prod = std::accumulate ( 
    v.begin(), v.end(), 1,
    times
);

This never needs to change unless the underlying operation (multiplication) changes.

Sometimes, an underlying operation is consistent across types. Using zipTuple as an example from my article "Zipping and Mapping Tuples", one could write:

std::tuple<int,std::string> a{1,"hello "}, 
                            b{2,"world!"};

// r = {3,"hello world"}
auto r = zipTuple( ([](auto x, auto y) x + y), a, b );

Because of the comma operator, we must put the lambda in parentheses to make clear where it ends. 

Up until now, things like composition could not be lambdas.

template< class F, class G > struct Composition {
    F f;
    G g;

    template< class ...X >
    auto operator () ( X&& ...x ) 
        -> decltype( f(g(std::forward<X>(x)...)) )
    {
        return f( g( std::forward<X>(x)... ) );
    }
};

template< class F, class G, class C = Composition<F,G> >
C compose( const F& f, const G& g ) {
    return C{f,g};
}

int main () {
    auto add1 = []( auto x ) x + 1;
    auto Char = []( auto c ) char(c);
    // Prints "a + 1 = b"
    std::cout << "a + 1 = " << compose(Char,add1)('a') << std::endl;
}

compose is generic and it returns a generic function object. Generic lambdas make the same possible by returning another generic lambda that captures f and g.

auto compose = [](auto f, auto g) 
    [=](auto x) f(g(x));

However, this version of compose only allows one argument. Luckily, generic lambdas can be fully templated, variadic, and perfect forwarding.

auto compose = 
    []( auto f, auto g )
        [=]<class ...X>( X&& ...x ) 
            f( g(std::forward<X>(x)...) );

However, the syntax for these lambdas is so convenient, one might as well drop the functional programming theory and write

auto f = [](char c) char(c+1);

For an example of the power of nested lambdas, consider currying:

auto curry3 = 
    []( auto f )
        [=](auto x) [=](auto y) [=](auto z) 
            f(x,y,z);

auto sum3 = [](auto x, auto y, auto z) x + y + z;
auto ten = curry3(sum3)(2)(3)(5);

Nested lambdas especially aid in the use of monads, as I have written about previously ("Monads in C++").

auto justThree = Just(1) >>= [](auto x)
                 Just(2) >>= [](auto y)
                 mreturn<std::unique_ptr>( x + y ); // Or Just(x+y).

This also takes care of the return mreturn problem I discussed in that article.

Overloading

Overloading functions is obviously useful, but impossible with lambdas alone. To fully take advantage of their brevity, we must have a way to overload them. In the proposal, Mathias Gaunard is attributed with the following:

template<class F1, class F2> struct overloaded : F1, F2
{
    overloaded(F1 x1, F2 x2) : F1(x1), F2(x2) {}
    using F1::operator();
    using F2::operator();
};

template<class F1, class F2>
overloaded<F1, F2> overload(F1 f1, F2 f2)
{ return overloaded<F1, F2>(f1, f2); } 

(See also: "Unifying Generic Functions and Function Objects")

This works because lambdas are function objects with a unique type, and can therefore act as base classes for overloaded. This is an unlikely solution because this fact is so rarely taken advantage of; however, there is much advantage to take!

Unfortunately, one cannot inherit from function pointers, so overloading lambdas and regular functions together requires a little more work. First, we must define a base type that can handle both function pointers and function objects. Its job is to just forward the arguments to its function.

template< class F > struct Forwarder : F {
    constexpr Forwarder( const F& f ) : F(f) { }
};

template< class R, class ...X > struct Forwarder<R(*)(X...)> {
    using type = R(*)(X...);
    type f;

    constexpr Forwarder( type f ) : f(f) { }

    constexpr R operator () ( X... x ) {
        return f(x...);
    }
};

template< class R, class ...X > 
struct Forwarder<R(X...)> : Forwarder<R(*)(X...)>
{
    using type = R(*)(X...);
    constexpr Forwarder( type f ) 
        : Forwarder<R(*)(X...)>(f)
    {
    }
};

Function pointers get two specializations because decltype(f)=R(X) and decltype(&f)=R(*)(X). It makes the most sense to specialize for pointers, but only doing so would require we take the address of our functions when we call overload.

Next, Overloaded inherits from two Forwarders.

template< class F, class G >
struct Overloaded : Forwarder<F>, Forwarder<G> {
    constexpr Overloaded( const F& f, const G& g )
        : Forwarder<F>(f), Forwarder<G>(g)
    {
    }
};

template< class F > F overload( F&& f ) {
    return std::forward<F>(f);
}

template< class F, class G, class ...H,
          class O1 = Overloaded<F,G> > 
auto overload( const F& f, const G& g, const H& ...h ) 
    -> decltype( overload(O1(f,g),h...) )
{
    return overload( O1(f,g), h... );
}

Overloads can differ in arity and domain (or argument type). The simplest example, for demonstration purposes, is a printing function.

auto prnt = overload (
    // Anything cout is already defined for.
    []( const auto& x ) 
        -> decltype( void(std::cout << x) )
    { std::cout << x; },

    // Any STL sequence.
    []<class Sequence>( const Sequence& s )
        -> decltype( void(std::begin(s)) )
    {
        std::cout << "[ ";
        for( const auto& x : s ) 
            std::cout << x << ' ';
        std::cout << ']';
    },

    // These are both sequences for which cout is defined.
    // Specializing disambiguates this.
    []( const char* const s ) { std::cout << s; },
    []( const std::string& s ) { std::cout << s; }
);

// ...
prnt("abc"); // Prints abc.
prnt(std::vector<int>{1,2,3}); // Prints [ 1 2 3 ]. 

Although defining all overloads in a single statement is an annoyance, they are grouped together, they don't require a template<...> line, and the visual clutter is overall less than if prnt were defined as the equivalent (explicit) function object.

Perhaps a function must be overloaded, but decltype or std::enable_if is too accepting and specializing for each type is redundant. For example, one might be annoyed by the last two string specializations of prnt. One solution is to define yet another overload type.

template< class X, class F >
struct UnaryOverload {
    F f;
    UnaryOverload( const F& f ) : f(f) { }

    using result_type = typename std::result_of< F(X) >::type;

    result_type operator () ( X x ) const {
        return f(x);
    }
};

template< class ...X, class F >
auto unary_overload_set( const F& f ) 
    -> decltype( overload(UnaryOverload<X,F>(f)...) )
{
    return overload( UnaryOverload<X,F>(f)... );
}

auto prnt = overload (
    // ...

    unary_overload_set<const char* const,
                       const std::string&>(
        []( auto&& s ) { std::cout << s; }
    )
);

One might write an overloading class to specialize for specific types, or a category of types, or more generally, a class might be written to do type conversion before calling the inner function, to prepare the output, or whatever one's needs may be. An overloading type might even select one of two functions based on an enable_if.

    // Where pred is a templated type defining pred<X>::value.
    auto h = enable_when<pred>( f, g );

The downsides of overloading function objects include that each overload must be defined all at once and none can be added. That isn't too bad since the point of lambdas is to be brief, but one should be mindful of extensibility when writing generic functions. (In other words, if an overload must be added, is it OK to modify the function object, or must the user be able to add overloads later?)


Recursion.

Even without generic lambdas, recursion is possible.

std::function<int(int)> fact = [&]( int x ) not x ? 1 : x * fact(x-1);

Or, with function pointers, which a lambda may implicitly convert to.

// In global scope:
using IntToInt = int(*)(int);
IntToInt fact = []( auto x ) not x ? 1 : x * fact(x-1);

With generic lambdas, we could write it like this:

auto fact1 = []( auto f, int n ) -> int 
    not n ? 1 : f(f,n-1) * n;
auto fact = []( int x ) -> int 
    fact1( fact1, x );

One might notice that the Fibonacci sequence could be implemented in a similar fashion. In researching recursive lambdas, I came across the fixed-point combinator. Haskell has fix, which can be implemented like this:

auto fix = []( auto f )
    [=]( auto x ) -> decltype(x) f( f, x );

auto fact = fix (
    []( auto f, int n ) -> int
    not n ? 1 : f(f,n-1) * n
);

auto fib = fix (
    []( auto f, int n ) -> int
    n == 0 ? 0 :
    n == 1 ? 1 :
    f(f,n-1) + f(f,n-2)
); 

fix is a generalization of a certain type of recursion. For an idea of how one would implement fix without lambdas, see this Stack Overflow post.

Making prnt above variadic requires a different kind of recursion.

// Variadic void unary.
auto vvu_impl = overload (
    [] (auto,auto) {},
    []<class X, class ...Y>( const auto& self, const auto& u, 
                             X&& x, Y&& ...y ) 
    {
        u( std::forward<X>(x) );
        self( self, u, std::forward<Y>(y)... );
    }
);

auto vvu = []( const auto& u )
    [&]<class ...X>( const X& ...x )
        vvu_impl( vvu_impl, u, x... );

// Variadic print.
// vprint(x,y...) = prnt(x); prnt(y)...
auto vprint = vvu( prnt );

auto print_line = []<class ...X>( const X& ...x ) 
    vprint( x..., '\n' );
 
print_line( "abc ", 123, std::vector<int>{1} ); // prints "abc 123 [1]\n" 

We can generalize left-associativity as well.

auto chainl_impl = overload (
    []( auto self, auto b, auto r ) { return r; },
    []<class ...Z>( auto self, auto b, auto x, auto y, Z ...z ) 
        self( self, b, b(x,y), z... )
);

auto chainl = []( auto b )
    [=]<class ...X>( const X& ...x )
        chainl_impl( chainl_impl, b, x... );

auto sum = chainl(add);
auto three = sum(1,1,1);

// Variadic compose.
auto vcompose = chainl( compose );

auto inc = []( auto x ) ++x;
auto addThree = vcompose( inc, inc, inc );

A good exercise might be to (a) write a variadic version of fix and (b) use that version to reimplement chainl and vprint.

There are, of course, many types of recursion. Implementing recursive lambdas is more complicated than for regular functions, but not by too much.


Conclusions.

Polymorphic (generic) lambdas are very powerful indeed, but it may take a while before GCC, MSVC, and others catch up, much less before Faisal Vali's branch is merged back into Clang. Still, they may have a strong impact on code written in C++ in the future. Some thought that templates revealed a sort of functional language in C++, and others thought the same of constexpr. Generic lambdas reveal another, but more flexible, one.

Lambdas cannot be marked constexpr. In terms of efficiency, I do not think this matters. They are implicitly inlined, so the compiler may still take advantage of any compile-time information it can gather. However, the result of a lambda expression could never be used in a template parameter, for example, which means they don't replace generic constexpr function objects.

Also, explicitly specifying a type is more verbose because the rules are the same as for template member functions, so lambdas can't replace template functions that require explicit parameters.

    auto f = []<class X>() { return X(); };
    f.operator()<int>(); // bad

The proposal to add polymorphic lambdas to C++ is not finalized and a few things are up in the air. For example, can we elide auto and just write [](x) f(x)? Should we be allowed to elide the enclosing braces and return? Are the implemented parts of this proposal useful? Remember that the standardization process is open to the public and that we can make our voices heard about the features that impact us.

Personally, I like the proposal as implemented currently. (Requiring auto, but eliding { return ... }.) I would go a step further and say that auto should be extended to allow variadic parameters. (i.e. [](auto ...x) f(x...)) And named lambdas (section 2.4) will be a very nice addition.

What are your thoughts?




Source for this article: https://gist.github.com/4347130 and https://gist.github.com/4381376
Another (google translated) article on generic lambdas: http://translate.google.com/translate?hl=en&sl=ja&u=http://d.hatena.ne.jp/osyo-manga/20121225/1356368196&prev=/search%3Fq%3Dgeneric%2Blambda%2Bc%252B%252B%2Bclang%26start%3D10%26hl%3Den%26safe%3Doff%26client%3Dubuntu%26sa%3DN%26tbo%3Dd%26channel%3Dfs%26biw%3D1535%26bih%3D870&sa=X&ei=v-LbUOG5BOmQ2gXw7ICABg&ved=0CFsQ7gEwBjgK
A long and thorough article on the fixed-point combinator: http://mvanier.livejournal.com/2897.html

Friday, December 14, 2012

Quick and Easy -- Manipulating C++ Containers Functionally.

Update: Added examples for dup and foldMap.

Probably the most useful parts of the standard C++ library are its containers and algorithms. Who has worked in C++ for any non-trivial amount of time without using std::vector, list, map, or any of the others? <algorithm>, on the other hand, is something more of us should know well. It solves many of the problems that C++ developers encounter on a daily basis (a couple are shown in use just after this list).

    "How do I test if there exists an element, x, where p(x) is true?" : std::any_of
    "How do I copy each element, x, where p(x)?" : std::copy_if
    "How do I removed each element, x, where p(x)?" : std::remove_if
    "How do I move elements from one container to another?" : std::move, <algorithm> version.
    "How do I find a subsequence?" : std::search
    "How do I sort an array?" std::sort
    "How do I find the sum of an array?" : std::accumulate

Any programmer worth half their salt could write any of these functions in their sleep--they're basic--and yet these algorithms do get written, over and over and over again. Either because one does not realize a specific <algorithm> function exists, or because one is thinking on too low a level to see the higher-level abstraction.

What I like most about the STL is that the only requirements for adapting any data type to a sequence are (1) define an iterator, and (2) define begin() and end(). After that, most (if not all) of <algorithm> becomes instantly usable with that type. (As well as the range-based for loop.) This makes it incredibly generic and useful, as the sketch below shows.
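As a minimal sketch of that point, a hypothetical Slice type holding two pointers already satisfies <algorithm> and the range-based for loop (the Reader pseudo-container further down does the same with stream iterators):

#include <iterator>
#include <numeric>

template< class X > struct Slice {
    using iterator = X*;

    X* b;
    X* e;

    iterator begin() const { return b; }
    iterator end()   const { return e; }
};

int raw[] = { 3, 1, 2 };
Slice<int> s = { raw, raw + 3 };
int total = std::accumulate( std::begin(s), std::end(s), 0 ); // 6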

What I dislike is its verbosity. For example:

    std::transform( xs.begin(), xs.end(), xs.begin(), f );

Wouldn't this be more clear if written...


    xs = std::transform(xs,f);

And this allows us to compose functions.

    std::transform( xs.begin(), xs.end(), xs.begin(), f );
    std::transform( xs.begin(), xs.end(), xs.begin(), g );

    // vs
    xs = std::transform( std::transform(xs,f), g );

    // Or, using actual composition:
    xs = std::transform( xs, compose(g,f) );

That's what this article will be about. An abstraction over the STL that lends itself to writing more terse, concise code without losing any clarity. This abstraction is less general, by design, because it works on entire containers, not iterators. I am not writing about a replacement for any <algorithm> functions, but an alternative inspired by functional programming.

However, I do go over many <algorithm> functions, so this can also be thought of as a review.


Filtering, Taking, and Dropping: Collecting data.

I've always found the erase-remove idiom an unintuitive solution to such a common problem. I certainly would not have figured it out on my own without the help of the C++ community to point it out. Requiring containers to define a predicated erase wouldn't be generic, and <algorithm> knows only of iterators, not containers, so the standard library can't offer anything simpler. filter fills this gap by combining its knowledge of containers and iterators.

template< class P, class S >
S filter( const P& p, S s ) {
    using F = std::function< bool(typename S::value_type) >;

    s.erase (
        std::remove_if( std::begin(s), std::end(s), std::not1(F(p)) ),
        std::end(s)
    );

    return s;
}

// ...

std::vector<int> naturals = {1,2,3,4,5,6,7,8,9/*,...*/};
auto evens = filter( [](int x){ return x%2==0; }, naturals );

See also: std::not1.

It does two things: first, it inverts the predicate, meaning we can use positive logic (defining what we want to keep, rather than what to throw away); second, it abstracts the erase-remove idiom.

Using filter, we can write a rather quick-and-dirty qsort.

// Returns pair( all x in s where p(x), all x where not p(x) ).
template< class P, class S >
std::pair<S,S> partition( const P& p, S s ) {
    using F = std::function< bool(typename S::value_type) >;

    // There does exist std::partition_copy, 
    // however this function demonstrates a use of filter.
    return std::make_pair ( 
        filter( p,    s ),
        filter( std::not1(F(p)), s )
    );
}

// Fake Quick-Sort: A non-in-place, qsort-inspired function.
template< class S >
S fake_qsort( S s ) {
    using X = typename S::value_type;

    if( s.size() < 2 )
        return s;

    X pivot = s.back();
    s.pop_back();

    S low, high; 
    std::tie(low,high) = partition (
        [&]( const X& x ) { return x <= pivot; },
        std::move(s)
    );

    low = fake_qsort( std::move(low) );
    low.push_back( pivot );
    
    // Append the sorted high to the sorted low.
    high = fake_qsort( std::move(high) );
    std::move( std::begin(high), std::end(high), 
               std::back_inserter(low) );

    return low;
}
See also: std::partition, std::partition_copy, and std::sort.

take is a function that may seem entirely trivial, at least at first.

template< class S, class _X = decltype( *std::begin(std::declval<S>()) ),
          class X = typename std::decay<_X>::type >
std::vector<X> take( size_t n, const S& s ) {
    std::vector<X> r;
    std::copy_n( std::begin(s), n, std::back_inserter(r) );
    return r;
}

template< class P, class S, 
          class _X = decltype( *std::begin(std::declval<S>()) ), 
          class X  = typename std::decay<_X>::type >
std::vector<X> takeWhile( const P& p, const S& s ) {
    std::vector<X> r;
    std::copy( std::begin(s), 
               std::find_if_not( std::begin(s), std::end(s), p ),
               std::back_inserter(r) );
    return r;
}
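Both are used like so, reusing naturals from the filter example:

// {1,2,3}
auto firstThree = take( 3, naturals );

// {1,2,3,4}: keeps elements while the predicate holds.
auto belowFive = takeWhile( []( int x ){ return x < 5; }, naturals );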

It also breaks the convention of returning s's type. There's a reason for that: infinite lists. Consider this Haskell code:

    take 10 [1..] == [1,2,3,4,5,6,7,8,9,10]

[1..] is an infinite list, starting at one. Obviously, it doesn't actually exist in memory. take returns a finite list that does.

The concept of iterators that represent infinite ranges in C++ isn't new, but neither is it common. std::insert_iterator could insert a theoretically infinite number of elements into a container. std::istream_ and ostream_iterator may read from or write to a file infinitely.

We can create pseudo-containers to represent infinite ranges and plug them into take.

template< class X > struct Reader {
    using iterator = std::istream_iterator<X>;

    iterator b;
    iterator e;

    Reader( iterator b = iterator( std::cin ), 
            iterator e = iterator() )
        : b(b), e(e)
    {
    }

    iterator begin() const { return b; }
    iterator end()   const { return e; }
};

// Probably ill-advised, 
// but this is /one/ way of doing IO before main().
std::vector<int> three = take( 3, Reader<int>() );

Sometimes we want to take the contents of an entire container, so dup may be helpful.

template< class S, class _X = decltype( *std::begin(std::declval<S>()) ),
          class X = typename std::decay<_X>::type >
std::vector<X> dup( const S& s ) {
    std::vector<X> r;
    std::copy( std::begin(s), 
               std::end(s),
               std::back_inserter(r) );
    return r;
}

std::ifstream in( "in.txt" );
// Reader's constructor implicitly converts in to an iterator.
// Some may consider this bad style and require the constructor be "explicit".
std::vector<int> contents = dup( Reader<int>(in) );

The counterpart to take is drop, but it does not have take's quirks.

template< class S >
S drop( size_t n, const S& s ) {
    return S (
        std::next( std::begin(s), n ),
        std::end(s)
    );
}

// Predicate version
template< class P, class S >
S dropWhile( const P& p, const S& s ) {
    return S (
        std::find_if_not( std::begin(s), std::end(s), p ),
        std::end(s)
    );
}

Reader<int> r = drop( 2, Reader<int>() );

drop makes no promises about infinite lists, but unlike most container- or range-based algorithms, it can work on them. In the above example, two integers are read from std::cin, and their values lost.
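dropWhile behaves analogously on ordinary containers:

// {5,6,7,8,9}: drops the leading run where the predicate holds.
auto fromFive = dropWhile( []( int x ){ return x < 5; }, naturals );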

For another example of the use of pseudo-containers, consider this solution to the first Project Euler problem using boost::irange.

#include <boost/range/irange.hpp>
void euler1() {
    // multiples of...
    auto three = boost::irange( 3, 1000, 3 );
    auto five  = boost::irange( 5, 1000, 5 );

    // Ensure that the final sum has no duplicates.
    std::vector<int> all;
    std::set_union( std::begin(three), std::end(three),
                    std::begin(five),  std::end(five),
                    std::back_inserter(all) );

    std::cout << "The sum of every multiple of 3 or 5 bellow 1000 :" 
        << std::accumulate( std::begin(all), std::end(all), 0 ) 
        << std::endl;
}


Folding: Reducing a list from many to one. (std::accumulate)

Accumulating is the "imperative" description of folding. Historically, you'd call the variable you update with the results of each calculation the accumulator. To accumulate, then, is to iterate through a sequence, updating the accumulator with each iteration.

Folding is another way to think of it. A fold is a transformation from a list of values to just one value. Haskell defines foldl and foldr, meaning "fold left" and "right".

template< class F, class X, class S >
constexpr X foldl( F&& f, X x, const S& s ) {
    return std::accumulate (
        std::begin(s), std::end(s),
        std::move(x), std::forward<F>(f) 
    );
}

int main() {
    std::vector<int> v = { 5, 3, 2 };
    std::cout << "((10 - 5) - 3) - 2) = " << foldl( std::minus<int>(), 10, v ) << std::endl;
}

foldl is really just another name for accumulate. The accumulation function (here, std::minus) expects the accumulator as its left argument and the value to accumulate as its right. foldr is reversed: not only does it iterate in reverse, it expects the accumulator in the right-hand argument.

// A function wrapper that flips the argument order.
template< class F > struct Flip {
    F f = F();

    constexpr Flip( F f ) : f(std::move(f)) { }

    template< class X, class Y >
    constexpr auto operator () ( X&& x, Y&& y )
        -> typename std::result_of< F(Y,X) >::type
    {
        return f( std::forward<Y>(y), std::forward<X>(x) );
    }
};
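A quick sanity check of Flip, not from the original gist:

Flip< std::minus<int> > fminus{ std::minus<int>() };
int eight = fminus( 2, 10 ); // Calls minus(10,2); the arguments are flipped.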

template< class F, class X, class S >
constexpr X foldr( F&& f, X x, const S& s ) {
    using It = decltype(std::begin(s));
    using RIt = std::reverse_iterator<It>;
    return std::accumulate (
        // Just foldl in reverse.
        RIt(std::end(s)), RIt(std::begin(s)),
        std::move(x), 
        Flip<F>(std::forward<F>(f))
    );
}

int main() {
    std::vector<int> v = { 5, 3, 2 };
    std::cout << "(2 - (3 - (5-10))) = " << foldr( std::minus<int>(), 10, v ) << std::endl;
}

Folding is great for monoids: types that have a binary operation with, often, the signature "X(const X&, const X&)".

std::vector<std::string> strs = { "st", "ri", "ng" };
// std::string associated with (+) is a monoid.
std::cout << "'st' + 'ri' + 'ng' = " << 
    foldl( std::plus<std::string>(), std::string(), strs ) << std::endl;

using Func = std::function< int(int) >;

auto comp = []( Func f, Func g ) {
    return [f,g]( int x ){ return f(g(x)); };
};

auto inc = []( int x ) { return x+1; };
auto id  = []( int x ) { return x;   };

std::vector<Func> incs = { inc, inc, inc };
// Functions can be monoids under composition.
std::cout << "(inc . inc . inc)(1) = " << 
    foldl( comp, Func(id), incs )(1) << std::endl;

Functional programmers also like to build lists using fold. They build lists starting at the tail, so they typically prefer foldr to foldl. std::forward_list works like [] in Haskell and linked lists in other functional languages. This snippet simply copies the values from the std::vector, v.

using List = std::forward_list<int>;
auto cons = []( int x, List l ) {
    l.push_front( x );
    return std::move(l);
};

List l = foldr( cons, List(), v );

Note: This one is not an example of a monoid.


Zip and Map: many to many. (std::transform)

To zip two sequences together by some function is the same as calling std::transform. Transform implies modifying each member by some function. Zip implies the same, but with the visual metaphor of combining two lists into one, starting at one end and working up.

template< class F, template<class...>class S, class X, class Y,
          class Res = typename std::result_of< F(X,Y) >::type >
S<Res> zip( F&& f, const S<X>& v, const S<Y>& w ) {
    S<Res> r;
    std::transform( std::begin(v), std::end(v),
                    std::begin(w), 
                    std::back_inserter(r),
                    std::forward<F>(f) );
    return r;
}

int main() {
    std::vector<int> v = { 5, 3, 2 };
    auto doubleV = zip( std::plus<int>(), v, v );
}

Note: The only way I have discovered to write zip variadically is with tuples. Since this article is not on tuples, refer to the definition of transform in "Zipping and Mapping Tuples".

Note2: An in-place version of this function is possible, but showing both general and optimized versions of each function would be redundant, and the topic of optimization is worth discussing on its own.

Mapping is similar to zipping--in fact the two-argument forms of zip(f,xs) and map(f,xs) should be equivalent. The three argument form, like map(f,xs,ys), applies f to every combination of x and y.

    map(f,{x,y},{a,b}) == { f(x,a), f(x,b), f(y,a), f(y,b) }

If xs is size N and ys is of size M, then map(f,xs,ys) returns a sequence of size N x M.

template< class F, template<class...>class S, class X,
          class Res = typename std::result_of< F(X) >::type >
S<Res> map( const F& f, const S<X>& s ) {
    S<Res> r;
    std::transform( std::begin(s), std::end(s),
                    std::back_inserter(r),
                    f );
    return r;
}

template< class F, template<class...>class S, class X, class Y,
          class Res = typename std::result_of< F(X,Y) >::type >
S<Res> map( const F& f, const S<X>& xs, const S<Y>& ys ) {
    S<Res> r;

    for( const X& x : xs ) 
        for( const Y& y : ys )
            r.emplace_back( f(x,y) );

    return r;
}

int main() {
    std::vector<int> v = { 5, 3, 2 };
    std::vector<int> w = { 9, 8, 7 };
    auto sums = map( std::plus<int>(), v, w );
}

map is a bread and butter function in functional programming.

    // Convert a sequence from one type to another:
    auto ints = map( toInt, floats );

    // In a game loop:
    actors = map( update, actors ); 

    // A deck of cards (four suits with thirteen values).
    auto deck = map( make_card, suits, values );

    // Making variants of the same thing from simpler data.
    auto inits = { 1, 2, 3, 4 };
    auto cs = map (
        []( int i ) { return std::complex<float>(i,0.1); },
        inits
    );

    // Checking for collisions:
    auto collisions = map( make_collision, actors, actors );

    // AI:
    states = map( successor, actions, states );

One drawback of map is that it may create redundant elements, which makes filter useful in conjunction with it.

    states = filter (
        state_is_valid,
        map( successor, actions, states )
    );

While this may turn an algorithm from one-pass (update and add if valid) into two-pass (update all states, then filter), it also yields simpler loops that the compiler can sometimes optimize more easily. For example,

    std::vector<int> r;
    for( auto x : xs ) {
        for( auto y : ys ) {
            auto z = x * y;
            if( pred(z) ) r.push_back(z);
        }
    }

    // or:
    auto r = filter( pred, map(std::multiplies<int>(),xs,ys) );
    
While only profiling can tell in any given instance, the second example may be faster under some circumstances. The compiler may be able to vectorize the call to map, but have difficulties applying the same optimization to the first because it cannot evaluate both the multiplication and predicate in one vectorized step.

Sometimes, the goal is to calculate something given the data, rather than map it. Naively, one might write something like

    auto r = fold( f, map(g,xs) );

But isn't creating the new container inefficient? Even with an in-place version of map, wouldn't transforming xs before folding still be inefficient? Thus, foldMap is useful.
  
template< class Fold, class Map, class X, class S >
X foldMap( const Fold& f, const Map& m, X x, const S& s ) {
    for( auto& y : s )
        x = f( std::move(x), m(y) );
    return x;
}

#include <cctype>
int main() {
    const char* names[] = { "john", "mary", "cary" };
    auto appendNames = []( std::string x, std::string y ) {
        return x + " " + y; 
    };
    auto capitalizeName = []( std::string name ) {
        name[0] = std::toupper( name[0] );
        return name;
    };
    std::cout << "Names : " 
        << foldMap (
            appendNames,
            capitalizeName,
            std::string(),
            names
        ) << std::endl;
}



Conclusions.

Haskell's Data.List is actually a lot like <algorithm>, though at a higher level of abstraction. There are some things that can only be done with iterators, and many that can only be done with whole containers. Data.List gives some good inspiration for helpful algorithms, even in C++.

But unlike in C++, Haskell uses simple linked lists by default, and all of Data.List's functions work only on linked lists. This gives both Haskell and functional programming a bad name when people compare Haskell code using linked lists to C++ code using std::vector. (See "C++ Benchmark -- std::vector vs. std::list vs. std::deque".) When libraries are written to optimize around the linked list's inefficiencies, like Data.Text, they re-implement Data.List's interface and often achieve efficiency equivalent to well-optimized C, but not without plenty of redundancy.

In C++, we can write one static interface that is both generic and efficient. Writing functional code does not mean writing slow code. The mathematical nature of these operations can even help the compiler optimize. The high-level interface of Data.List fits snugly atop the low-level interface of iterators.


Source for this article: https://gist.github.com/4290166