Monday, September 24, 2012

Range-based for loop on a int range? Enumerate!

Update: The enumerate implementation now allows one to specify a custom step, or stride.  

C++’s new range-based for loop offers a very convenient syntax compared to the old.

   // #include <stdio.h>
   for( int x : {1,2,3,4,5,6,7,8,9,10} )
       printf( “%d “, x );

Here, everything in the {...} gets initialized as an std::initializer_list<int>. This is convenient and all, but what if I don’t what to have to write one though ten? Maybe this...

   #include <stdio.h>
   #include <vector>
   #include <algorithm>

   // Create an inclusive range [b,e].
   std::vector<int> enumerate( int b, int e ) {
       // but store it in the range [b,e).
       std::vector<int> v( e-b + 1 );
       std::iota( v.begin(), v.end(), b ); // C++11
       return v;

   int main() {
       for( int x : enumerate(1,10) )
           printf( "%d ", x );

To compile without c++11, just replace std::iota with a while loop.

This works, but in a worrying way. Even with full optimizations, gcc cannot realize that the output would be equivalent to a regular for loop. (i.e. for(i=1;i<=10;i++)). We want a truly zero-overhead abstraction.

One solution is that we can invent a phony container type with a phony iterator type! The container will hold a special iterator type that will do nothing but abstract an int as an iterator type. How does that work?

An iterator has several requirements one might not immediately consider. Many STL algorithms require looking into a class called std::iterator_traits which define many things like what type the iterator will dereference to, the category of the iterator (random access, forward, etc.), and a few other useful things. Our iterator type needs to meet these constrains. In order to do so, we merely inherit from std::iterator, which defines these for us. Here is the implementation in full:

   #include <stdio.h>
   #include <vector>
   #include <algorithm>

   template< class I > struct XRange {
       typedef I value_type;
       typedef I difference_type;
        struct iterator
            : std::iterator<std::random_access_iterator_tag,I,I>
            value_type i;
            value_type stride;

            constexpr iterator( value_type i, value_type stride )
                : i(i), stride(stride) { }
            constexpr iterator( iterator it, value_type stride )
                : i(*it), stride(stride) { }

            constexpr value_type operator* () { return i; }
            iterator operator++ () { ++i; return *this; }
            iterator operator-- () { --i; return *this; }
            iterator operator++ (int) { auto cpy = *this; ++(*this); return cpy; }

            iterator operator-- (int) { auto cpy = *this; --(*this); return cpy; }

            constexpr iterator operator+ ( difference_type n ) {
                return iterator( (i + n) / stride, stride );
            constexpr iterator operator- ( difference_type n ) {
                return iterator( (i - n) / stride, stride );
            constexpr value_type operator- ( iterator other ) {
                return (i - *other) / stride;

            constexpr bool operator== ( iterator other ) { return i == *other; }
            constexpr bool operator!= ( iterator other ) { return i != *other; }

            iterator& operator+= ( difference_type other ) {
                i += other * stride;
                return *this;

            iterator& operator-= ( difference_type other ) {
                i -= other * stride;
                return *this;

        value_type stride;
        iterator b, e;

        constexpr XRange( value_type b, value_type e, value_type stride=1 )
            : stride(stride), b(b,stride), e(e,stride)  { }
        constexpr XRange( iterator b, iterator e, value_type stride=1 )
            : stride(stride), b(b,stride), e(e,stride)  { }

        constexpr iterator begin() { return b; }
        constexpr iterator end()   { return e; }

    typedef XRange<unsigned int> IRange;

     * enumerate [b,e] = XRange( [b,e+1) )
     * Creates an inclusive range from b to e.
    constexpr IRange enumerate( unsigned int b, unsigned int e,
                                unsigned int stride=1 ) {
        // Adding one to the en
        return IRange( b, e + 1, stride );

   int main() {
       for( unsigned int x : enumerate(1,10) )
           printf( "%d ", x );

Now, how well does gcc optimize this? Here, compiled with “g++ enumerate.cpp -std=c++11 -O3 -S”:

       .file "enumerate.cpp"
       .section .rodata.str1.1,"aMS",@progbits,1
       .string "%d " // The string we’re printing
       .string ""
       .section .text.startup,"ax",@progbits
       .p2align 4,,15
       .globl main
       .type main, @function
       subq $8, %rsp
       .cfi_def_cfa_offset 16
       movl $1, %edx      // Load 1
       movl $.LC0, %esi // Load “%d “
       movl $1, %edi     
       xorl %eax, %eax
       call __printf_chk // print “1 “
       movl $2, %edx    // Load 2
       movl $.LC0, %esi
       movl $1, %edi
       xorl %eax, %eax
       call __printf_chk // print “2 “
       movl $3, %edx    // Load 3...

GCC has optimized the loop entirely out! It is now equivalent to writing:

   printf( “%d “, 1 );
   printf( “%d “, 2 );
   // ...

We have found a truly zero-overhead abstraction with nicer syntax than in the past, and all with just a small amount of work. We can even use this with STL functions.

   std::vector<int> toVector( const IRange& r ) {
       std::vector<int> v;
       std::copy( r.begin(), r.end(), std::back_inserter(v) );
       return v;
       // Alternatively: return std::vector<int>( r.begin(), r.end() );
   int main() {
       auto v = toVector( enumerate(1,10) );
       for( unsigned int x : v )
           printf( "%d ", x );

We cannot insert into an IRange, take an element out, or any modifying action, but we can easily create a vector or list from one and then modify that. We can save the creation of a vector or initializer_list by using this instead, and not calling std::iota. 

In conclusion, I want to note that the above implementation is under-tested and could possibly be more standard compliant; but I thought the trick was so neat that I’d try and share it anyway. Feedback is welcome.

Monday, September 3, 2012

Function Composition in C++

Function composition can be achieved in C++ very similarly to partial application. First, what is it?

In Haskell, one can write something like f . g, meaning the composition of f and g. It produces a function, h, such that h(x) = f( g(x) ). Since Haskell uses curried notation, if f takes two arguments, then h would be defined h(x,y) = f( g(x), y ). Composition becomes especially useful when we can express a chain of function calls with it. For example this...

    auto y = f(x);
    auto z = g(y);
    auto a = h(z);

...could be rewritten like so, using Haskell notation:

    auto a = (h . g . f)(x);

As with partial application, one might first think of writing compose as a lambda function, but we don't know the type of the arguments. Instead, we create a type that holds f and g.

    template< class F, class G > struct Composition {
        F f;
        G g;

        template< class _F, class _G >
        constexpr Composition( _F&& f, _G&& g ) : f( forward<_F>(f) ), g( forward<_G>(g) ) { }

        template< class X, class ...Y >
        decltype( f( g(declval<X>()), declval<Y>()... ) )
        operator() ( X&& x, Y&& ...y ) {
            return f( g(forward<X>(x)), forward<Y>(y)... );

    template< class F, class G, class C = Composition<F,G> >
    constexpr C compose( F f, G g ) {
        return C( move(f), move(g) );

We can now very simply define a:

    auto a = compose( h, compose(g,f) )(x);
    auto b = compose( compose(h,g) f )(x); // Equivalently

Composition really doesn't get more complicated than that. We can implement a few different versions for slightly different purposes, but that's about it. Still, this adds flexibility, so let's look at some others. Each requires writing its own Composition class, but i'll just briefly go over the different ways to look at compositions.

First, i should note that the GCC extends the standard by adding compose1 and compose2. compose1 acts on unary functions (like f(x)), however it is unlike the the above compose in that it expects f to take only one argument (the return of g(x)). compose2 does something more interesting. Consider this:

    typedef std::pair<int,int> vec;
    int& vec_x( vec& v ) { return v.first; }
    int& vec_y( vec& v ) { return v.second; }

    auto sum_of_elements = __gnu_cxx::compose2( std::plus<int>(), vec_x, vec_y );
    int five = sum_of_elements( std::make_pair(2,3) );

compose2 takes one binary function (in this case, +), and two unary functions and produces a unary function. sum_of_elements(v) = vec_x(v) + vec_y(v). This is not standard, however, so implementing our own version makes sense for portability. I might rename this function to bcompose for "binary composition".

Our regular compose function expects a n-ary f and a unary g. Wouldn't a version expecting a unary f and n-ary g be helpful? I would call this mcompose.

At the end of the day, if a composition does not exists that perfectly meats one's needs, it can be trivially implemented. 

Partial Application in C++

All too often, one finds that std::for_each or std::transform are not enough and falls back to the plain old for loop. Or perhaps C++11's range-based for loop. Some examples:

    // Game loop:
    while( dont_quit() ) {
        t = update_time();
        for( auto& c : charactors 
            c = update( t , c );

    // Vector multiplication (x,y) * k = (x*k, y*k).
    vec operator* ( vec v, float k ) {
        for( auto& x : v ) 
            x = x * k;
        return v;

Partial application is one way of dealing with this. In Haskell I could write the last one as such:

    updateCharactors t cs = map (update t) cs
    mult xs k = map (* k) xs

where (*k) is the partial application of (*) with k. (*k) returns a function, f, such that f(x) = x*k.Haskell, using curried notation by default, allows a natural expression of partial application by simply declaring the function with less arguments than it takes. In C++ we have a little work to do.

Of coarse, std::bind exists, but it suffers from two problems: the horrid std::placeholders::_1 syntax and the requirement of the caller to explicitly state all of the unbound arguments with a _1 or _2. I find this syntax unfriendly, but generic. It allows one to switch arguments around, duplicate arguments, bind nested arguments, and all sorts of neat things. It does what it does well, but I want something easier to use.

Ones first thought will probably be to implement a function taking a function, f, and an argument, x, and return a lambda that will take a y. This assumes, however, that f only takes one more argument (it may take many or none) and that y will be the same type as x, which is not true in the about the update example.

A less obvious solution is to define a type that holds a function and one argument and overloads the () (function call) operator.

    #include <utility>

    using std::forward;
    using std::declval;

    template< class F, class X >
    struct Partial {
        F f;
        X x;

        // perfect forwarding
        template< class _F, class _X >
        constexpr Partial( _F&& f, _X&& x ) : f(forward<_F>(f)), x(forward<_X>(x)) { }

        // decltype declval
        template< class ...Y >
        decltype( f( x, declval<Y>()... ) )
        operator() ( Y&& ...y ) {
            return f( x, forward<Y>(y)... );

Now, we can make simple partial applications manually. Before trying to tackle the above examples of multiply and update, let's try a simpler one: plusOne. We'll use std::plus for simplicity and apply an int to it.

    auto addBy( int x ) -> Partial< std::plus<int>, int > {
        return Partial< std::plus<int>, int >( std::plus<int>(), x );

    auto plusOne = addBy( 1 );
    cout << plusOne(3) << '\n'; // 4.

While this works, the syntax is highly inconvenient and overly verbose. We can write a generic function that applies an argument to a function with syntax like papply(f,x), but we have to decide how it passes x along. If we pass by reference, then when addBy returns, it will return a Partial with a reference to its argument, which will no longer exist (undefined behaviour.). If we copy x, then types with expensive copies will be inefficient to partially apply.

To have our cake and eat it too, we can define two functions, closure and closet. Closures carry references to arguments and a function object, but in languages like C++, they suffer from the upward funarg problem (as described above, with addBy). Languages that use reference counting allow variables to outlast the scope of their functions, but stay alive since another object references them. They usually rely on some form of garbage collection. In C++, we can instead define who owns the garbage. I use the word "closet" as a sort of mnemonic for "closed-in closure". It copies its arguments.

    template< class F, class X, class P = Partial<F,X> >
constexpr P closure( F&& f, X&& x ) {
        return P( forward<F>(f), forward<X>(x) );

    template< class F, class X, class P = Partial<F,X> >
constexpr P closet( F f, X x ) {
        return P( move(f), move(x) );

Now, we can revisit addBy and the examples at the top.

    auto addBy( int x ) -> decltype( closet(std::plus<int>(),x) ) {
        return closet( std::plus<int>(), x );

    auto plusOne = addBy( 1 );

    // Game loop
    while( dont_quit() ) {
        std::transform( begin(charactors), end(charactors), begin(charactors),

            closure( update, update_time() ) );

    vec operator* ( vec v, float k ) {
        std::transform( begin(v), end(v), begin(v), closure(std::multiplies<float>(),k) ); 
        return v;

As you can see, we usually can use closure over closet, but they both have their uses. closure for passing down, closet for passing up.

I hope this has been useful, however it is still incomplete. closure and closet can be given variadic counter-parts which bind multiple arguments. We can also write versions to bind the last arguments (rclosure and rcloset?). I leave these as exercises for the reader. rclosure and -closet will require a new class, RPartial, but should be otherwise the same.

One could also use these techniques to write function transformations that do all sorts of weird things. For example, in stack-based languages, rot rotates the top three elements on the stack such that the bottom element (the first) becomes the top (last). Thinking of function arguments as a stack, one could write this:

    template< class F >
    struct Rot
        F f;

        template< class _F >
        constexpr Rot( _F&& f ) : f( forward<_F>(f) ) { }

        template< class Arg1, class ...Args >
        constexpr auto operator() ( Arg1&& arg1, Args&& ...args )
            -> decltype( f(declval<Args>()..., declval<Arg1>()) )
            return f( forward<Args>(args)..., forward<Arg1>(arg1) );

    template< class F >
    constexpr Rot<F> rot( F f )
        return Rot<F>( move(f) );

rot used with closure can partially apply the second or third argument or be used to flip the arguments of a binary function.