Friday, October 26, 2012

fmap in C++

Previously, I went over an introduction and explanation of fmap and a plausible implementation in C++, but couldn't achieve the same level of usefulness as in Haskell. Today I want to implement a much more generally useful fmap. This post assumes a working knowledge of fmap, but not of Haskell.

Just to review: fmap is a general way of saying "apply f to the value(s) of F(x)", where F, or Functor, might be an std::vector or std::unique_ptr or some user-defined type. For example fmap(f,ptr) means "apply f to the value ptr contains".

The problem was that we wanted one fmap that worked on all STL-like containers, and one that worked on all smart pointers, but the two functions had the same signature and it wouldn't compile.

template< class F, template<class...>class S, class X,
          class R = typename std::result_of<F(X)>::type >
S<R> fmap( F&& f, const S<X>& s ) {
    S<R> r;
    std::transform( std::begin(s), std::end(s),
                    std::back_inserter(r),
                    std::forward<F>(f) );
    return r;
}

template< class F, template<class...>class Ptr, class X,
          class R = typename std::result_of<F(X)>::type >
Ptr<R> fmap( F&& f, const Ptr<X>& p )
{
    return p != nullptr
        ? Ptr<R>( new R( std::forward<F>(f)(*p) ) )
        : nullptr;
}

A C++03 or TR1 programmer might first think to use std::enable_if and invent some solution that deduces to std::true_type on sequences and std::false_type on non-sequences; and ditto for pointers. This works, but we also want fmap to work on functions.

template< class F, class G, class C = Composition<F,G> >
C fmap( F f, G g ) {
    C( std::move(f), std::move(g) );
}

The std::enable_if for this would have to check that G is not a sequence nor a pointer. If one added another definition of fmap, the general case would again have to check for this. So I will not go over this solution.

A partial solution is to use decltype, which can be used as an std::enable_if at times.

template< class F, template<class...>class S, class X,
          class R = typename std::result_of<F(X)>::type >
auto fmap( F&& f, const S<X>& s )
    // Enable if std::begin(s) is defined.
    -> decltype( std::begin(s), std::declval<S<R>>() )
{
    S<R> r;
    std::transform( std::begin(s), std::end(s),
                    std::back_inserter(r),
                    std::forward<F>(f) );
    return r;
}

template< class F, template<class...>class Ptr, class X,
          class R = typename std::result_of<F(X)>::type >
Ptr<R> fmap( F&& f, const Ptr<X>& p )
    // Enable if p can be checked for null and dereferenced.
    -> decltype( *p, p==nullptr, std::declval<Ptr<R>>() )
{
    return p != nullptr
        ? Ptr<R>( new R( std::forward<F>(f)(*p) ) )
        : nullptr;
}


The problem still remains with the base case, composition. For that, even with decltype, one would have to fall back on disabling it for sequences and pointers, and any further types you specialize.


Tag Dispatch

Before std::enable_if, there was tag dispatch. The idea was you start with your tags.

struct sequence_tag {};
struct pointer_tag {};
struct other_tag {};

And then you define a traits class that defines the category of that type as a tag.

template< class X > struct fmap_traits {
    typedef other_tag category;
};

template< class X > struct fmap_traits< std::vector<X> > {
    typedef sequence_tag category;
};

template< class X > struct fmap_traits< std::unique_ptr<X> > {
    typedef pointer_tag category;
};

Now, rather than specializing fmap, we specialize fmap_impl which takes an extra argument, the category

template< class F, template<class...>class S, class X,
          class R = typename std::result_of<F(X)>::type >
S<R> fmap_impl( F&& f, const S<X>& s, sequence_tag ) {
    S<R> r;
    std::transform (
        std::begin(s), std::end(s),
        std::back_inserter(r),
        std::forward<F>(f)
    );
    return r;
}

template< class F, template<class...>class Ptr, class X,
    class R = typename std::result_of<F(X)>::type >
Ptr<R> fmap_impl( F&& f, const Ptr<X>& p, pointer_tag )
{
    return p != nullptr
        ? Ptr<R>( new R( std::forward<F>(f)(*p) ) )
        : nullptr;
}

The job of fmap is now to dispatch to the appropriate fmap_impl.

template< class F, class Functor,
          class C = typename fmap_traits<Functor<X>>::category >
auto fmap( F&& f, const Functor& fnct )
    -> decltype( fmap_impl( std::declval<F>(), fnct, C() ) );
{
    return fmap_impl( std::forward<F>(f), fnct, C() );
}

This technique originally allowed STL algorithms to choose the most efficient implementation based on whether an iterator supported random access (it+n) or whether it allowed for assignment (it2=it1) or not. The only problem is that we have to specialized fmap_traits for every single type on top of fmap_impl for each tag, though this is significantly less difficult than specializing fmap_impl for every type. Still, we can do better.


Type class dispatch.

First, instead of writing an fmap_traits class, we can use the decltype trick above to overload a function, category, that returns the correct tag, and just echos the type otherwise. We don't need to actually define it; a declaration will do.

struct sequence_tag {};
struct pointer_tag {};

template< class X >
X category( ... );

template< class S >
auto category( const S& s ) -> decltype( std::begin(s), sequence_tag() );

template< class Ptr >
auto category( const Ptr& p ) -> decltype( *p, p==nullptr, pointer_tag() );

Notice that if we supply category and int, it'll return an int, but if we give it a function pointer, it'll return pointer_tag! Why is that? Well, a function pointer is a pointer! You can dereference it and test it against null, so for this to work we have to add one extra layer of specialization.

template< class T > struct Category {
    using type = decltype( category<T>(std::declval<T>()) );
};

template< class R, class ... X > struct Category< R(&)(X...) > {
    using type = R(&)(X...);
};

template< class T >
using Cat = typename Category<T>::type;

category called on a function might return pointer_tag, but Cat<F>::type will be F.

Finally, instead of writing fmap_impl we will make a class called Functor that will implement fmap as a static member function. All we are doing is moving fmap_impl to Functor::fmap. fmap will then just call Functor::fmap.

template< class... > struct Functor;

template< class F, class FX, class Fun=Functor< Cat<FX> > >
auto fmap( F&& f, FX&& fx )
    -> decltype( Fun::fmap( std::declval<F>(), std::declval<FX>() ) )
{
    return Fun::fmap( std::forward<F>(f), std::forward<FX>(fx) );
}

// General case: composition
template< class Function > struct Functor<Function> {
    template< class F, class G, class C = Composition<F,G> >
    static C fmap( F f, G g ) {
        C( std::move(f), std::move(g) );
    }
};

template<> struct Functor< sequence_tag > {
    template< class F, template<class...>class S, class X,
              class R = typename std::result_of<F(X)>::type >
    static S<R> fmap( F&& f, const S<X>& s ) {
        S<R> r;
        std::transform( std::begin(s), std::end(s),
                        std::back_inserter(r),
                        std::forward<F>(f) );
        return r;
    }
};

emplate<> struct Functor< pointer_tag > {
    template< class F, template<class...>class Ptr, class X,
              class R = typename std::result_of<F(X)>::type >
    static Ptr<R> fmap( F&& f, const Ptr<X>& p )
    {
        return p != nullptr
            ? Ptr<R>( new R( std::forward<F>(f)(*p) ) )
            : nullptr;
    }
};

struct UserDefined { /* ... */ };
template<> struct Functor< UserDefined > {
    /* ... */
};

int main() {
    auto neg = [](int x){return -x;};
    std::unique_ptr<int> p( new int(5) );
    p = fmap( neg, fmap( neg, p ) );
    std::cout << "-5 = " << *p << std::endl;

    std::vector<int> w = { 1, 2, 3 };
    w = fmap( neg, w );
    std::copy( std::begin(w), std::end(w),
               std::ostream_iterator<int>(std::cout," ") );
    std::cout << std::endl;
}

It is very important that Functor<T>::fmap is static, or this will not work. One advantage is that we can still further specialize fmap for different types. For example, we can't call our fmap on an std::array since it has no member function push_back(). Instead, we can specialize fmap for std::array inside Functor<sequence>. A Functor specialization can overload as many or as few versions of fmap as it pleases.

At last, we not only have an fmap that works generically on STL containers, and all smart pointers, we have a technique that brings a different kind of polymorphism to C++. One that allows us to add specializations without modifying the previous ones. It's also surprisingly similar to the Haskell definition of Functor.

    class Functor f where
        fmap :: (a->b) -> f a -> f b

This is as if we had declared fmap like so:


template< class F, template<class...>class Fnct, class X,
          class R = typename std::result_of<F(X)>,
          class Fun=Functor< Cat<Fnct<X>> > >
Fnct<R> fmap( F&& f, const Fnct<X>& fx ) {
    return Fun::fmap( std::forward<F>(f), fx );
}



This is slightly less generic, however. A given fmap implementation might decide not to return a Fnct<R>. But just like how we instantiate template specializations, Haskellers create fmap instances, too!

    instance Functor Maybe where
        fmap f (Just x) = Just (f x)
        fmap f Nothing = Nothing

Our Functor<pointer_tag> is defined quite similarly to this! This form of specialization works equally well for other Haskell type classes like Monad (next article), Monoid, Applicative, Alternative, you name it! 


The complete final source code for this article can be found here: https://gist.github.com/3960343
Learn more about type classes: http://en.wikipedia.org/wiki/Type_class
Take a tour of some of Haskell's type classes: http://www.haskell.org/haskellwiki/Typeclassopedia

I've been using the same reference for tag dispatch for five years: http://www.generic-programming.org/languages/cpp/techniques.php