Lecture 11: Advanced iterators

The typical use for iterators is as positions into a container located somewhere in memory—a pointer into an array, or an iterator into a vector or list. But another purpose of this class is to show you interesting and important programming patterns you might not have seen before. This will give you more tools for your future life in programming and generally broaden your mind.

The three patterns for today: iterators that build up data structures, iterators without an underlying collection, and iterators that are themselves containers.


Constructor iterators

Here are two algorithms that eliminate undesired elements from a collection. First, here’s a simple one that overwrites a destination array.

template <typename T, typename P>
T* remove_copy_if(T* first, T* last, T* result, P predicate) {
   while (first != last) {
      if (!predicate(*first)) {
         *result = *first;
         ++result;
      }
      ++first;
   }
   return result;
}

Here’s how it would work when passed an input array containing [4,5,7,0,8] and the following predicate:

bool is_odd(int x) {
  return (x % 2) != 0;
}

![Figure 1](l11-removecopy.gif)

Note how result moves forward only when predicate returns false. The function returns the final value of result so the caller knows how many items were added. The caller had better make sure they pass an array containing enough space!
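To make the trace concrete, here is a small sketch that runs the pointer version on the [4,5,7,0,8] input. The driver function remove_copy_if_example is a hypothetical name introduced only for this illustration; the asserts record what the walk-through above predicts.

```cpp
#include <cassert>

// The pointer-based algorithm from the text.
template <typename T, typename P>
T* remove_copy_if(T* first, T* last, T* result, P predicate) {
   while (first != last) {
      if (!predicate(*first)) {
         *result = *first;
         ++result;
      }
      ++first;
   }
   return result;
}

bool is_odd(int x) {
   return (x % 2) != 0;
}

// Hypothetical driver for the [4,5,7,0,8] example.
void remove_copy_if_example() {
   int in[] = {4, 5, 7, 0, 8};
   int out[5] = {0, 0, 0, 0, 0};   // the caller provides enough space
   int* end = remove_copy_if(in, in + 5, out, is_odd);
   assert(end - out == 3);          // three elements were kept
   assert(out[0] == 4 && out[1] == 0 && out[2] == 8);
}
```

The difference between the returned pointer and the start of the output array is the number of elements written.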

A little tweak makes it work for any iterators, including lists, as long as result’s dereference operator (*result) returns a modifiable reference:

template <typename INPUT, typename OUTPUT, typename P>
OUTPUT remove_copy_if(INPUT first, INPUT last, OUTPUT result,
                      P predicate) {
   while (first != last) {
      if (!predicate(*first)) {
         *result = *first;
         ++result;
      }
      ++first;
   }
   return result;
}

The code is the same, but it applies to more types, just like we wanted. However, what if we wanted to initialize a new container, rather than overwrite the contents of an existing container? We might first write code like this:

template <typename INPUT, typename CONTAINER, typename P>
void copy_into_container_unless(INPUT first, INPUT last,
                                CONTAINER &c, P predicate) {
   while (first != last) {
      if (!predicate(*first))
         c.push_back(*first);
      ++first;
   }
}

The pattern is almost the same, except we call the container’s push_back() function to append new items to the container’s end.

It’s a shame to have to remember two functions that do almost the same thing. Can we design an algorithm that works equally well in both situations?

Yes, if we use C++’s ability to overload assignment. Normally when we perform an assignment, like x = y, the C++ compiler simply copies bits from one place to another. But users can overload this, just like users can overload operator== (comparison) or operator* (dereference).
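As a warm-up, here is a minimal sketch of an overloaded assignment that does something other than copy bits. The class accumulating_slot and the function assign_three_times are hypothetical names made up for this illustration; they are not part of the lecture's code.

```cpp
#include <cassert>

// Hypothetical type: assigning an int adds it to a running total
// instead of replacing the stored value.
class accumulating_slot {
public:
   accumulating_slot& operator=(int value) {
      total_ += value;
      return *this;
   }
   int total() const {
      return total_;
   }
private:
   int total_ = 0;
};

// Three assignments, each of which runs our operator= above.
int assign_three_times() {
   accumulating_slot slot;
   slot = 4;
   slot = 5;
   slot = 6;
   assert(slot.total() == 15);
   return slot.total();
}
```

The syntax at the call site is ordinary assignment; only the type of the left-hand side decides what actually happens.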

A back_inserter iterator is an iterator that abstracts the position at the end of a container. When dereferenced, it returns a proxy object. The proxy is useless for anything but assignment. When assigned to, the proxy inserts a new element into the container.

We’ll write the proxy first:

template <typename T>
class back_insert_proxy {
public:
   /** Construct a back_insert_proxy for container @a c. */
   back_insert_proxy(T &c)
      : container_(c) {
   }
   /** Insert @a value into the proxy’s container using
       container.push_back(@a value). */
   template <typename V>
   back_insert_proxy<T>& operator=(V value) {
      container_.push_back(value);
      return *this;
   }
private:
   T &container_;
};

Here’s how this might work:

vector<int> v;
back_insert_proxy<vector<int>> proxy(v);    

v.push_back(4);
proxy = 5;
v.push_back(6);
proxy = 7;    

std::cout << v[0] << ',' << v[1] << ',' << v[2] << ',' << v[3];
    // prints "4,5,6,7"

Now, we complete our work by hooking the back_insert_proxy up to an iterator, the back_insert_iterator. Dereferencing a back_insert_iterator returns a back_insert_proxy. All other iterator operations do nothing! The abstract position that “inserts into the back of a container” never changes, so operator++ does nothing.

template <typename T>
class back_insert_iterator {
public:
   /** Construct a back_insert_iterator for container @a c. */
   back_insert_iterator(T &c)
      : container_(c) {
   }
   back_insert_proxy<T> operator*() const {
      return back_insert_proxy<T>(container_);
   }
   back_insert_iterator<T> &operator++() {
      return *this;
   }
   /** Test if @a x is a back_insert_iterator for the same
       container. */
   bool operator==(const back_insert_iterator<T> &x) const {
      // Two iterators are equal if they refer to the same container
      // object, so compare the containers' addresses
      return &container_ == &x.container_;
   }
private:
   T &container_;
};

We can make back_insert_iterators a little easier to use with a helper function.

template <typename T>
back_insert_iterator<T> back_inserter(T &container) {
   return back_insert_iterator<T>(container);
}

This is useful because the function “knows” the right type for a container’s back insert iterator, which in turn lets us use “auto”:

vector<int> container;
auto inserter = back_inserter(container);

// Which is much easier to type than:
back_insert_iterator<vector<int>> inserter(container);

The magic: Now this

remove_copy_if(first, last, back_inserter(container), is_odd);

does exactly the same thing as

copy_into_container_unless(first, last, container, is_odd);

The iterator pattern, suitably understood and generalized, lets us write a more generic algorithm, and our code is just as easy to read! Furthermore, it’s easy to write variants of the back_inserter pattern that insert at the front, or at a specific position in the middle. We can drop copy_into_container_unless altogether; it is redundant.
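Here is one way the front-inserting variant might look, as a sketch that mirrors the proxy and iterator classes above with push_front in place of push_back. The namespace sketch and the function evens_reversed are assumptions added for this example (the namespace only keeps these names from colliding with the standard library's).

```cpp
#include <cassert>
#include <deque>

namespace sketch {   // hypothetical namespace, avoids clashes with std::

template <typename T>
class front_insert_proxy {
public:
   explicit front_insert_proxy(T& c) : container_(c) {}
   // Assignment inserts at the front of the container.
   template <typename V>
   front_insert_proxy<T>& operator=(V value) {
      container_.push_front(value);
      return *this;
   }
private:
   T& container_;
};

template <typename T>
class front_insert_iterator {
public:
   explicit front_insert_iterator(T& c) : container_(c) {}
   front_insert_proxy<T> operator*() const {
      return front_insert_proxy<T>(container_);
   }
   // "The front of the container" is always the same abstract position.
   front_insert_iterator<T>& operator++() {
      return *this;
   }
private:
   T& container_;
};

template <typename T>
front_insert_iterator<T> front_inserter(T& container) {
   return front_insert_iterator<T>(container);
}

// The generic algorithm from the text, unchanged.
template <typename INPUT, typename OUTPUT, typename P>
OUTPUT remove_copy_if(INPUT first, INPUT last, OUTPUT result, P predicate) {
   while (first != last) {
      if (!predicate(*first)) {
         *result = *first;
         ++result;
      }
      ++first;
   }
   return result;
}

}  // namespace sketch

bool is_odd(int x) {
   return (x % 2) != 0;
}

// Hypothetical driver: keeps the even elements of [4,5,7,0,8].
std::deque<int> evens_reversed() {
   int in[] = {4, 5, 7, 0, 8};
   std::deque<int> d;
   sketch::remove_copy_if(in, in + 5, sketch::front_inserter(d), is_odd);
   assert(d.size() == 3);
   return d;
}
```

Because each kept element goes to the front, the output comes out in reverse order: 8, 0, 4. The algorithm itself did not change at all.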

To recap:

The C++ standard library includes back_inserter and back_insert_iterator, as well as remove_copy_if.
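For comparison, a short sketch using only the standard library versions. The wrapper function evens_of is a hypothetical name for this example; std::remove_copy_if and std::back_inserter are the real library facilities.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical wrapper: returns the elements of input that are not odd.
std::vector<int> evens_of(const std::vector<int>& input) {
   std::vector<int> output;
   std::remove_copy_if(input.begin(), input.end(),
                       std::back_inserter(output),
                       [](int x) { return (x % 2) != 0; });
   return output;
}

void standard_library_example() {
   std::vector<int> evens = evens_of(std::vector<int>{4, 5, 7, 0, 8});
   assert(evens.size() == 3);
   assert(evens[0] == 4 && evens[1] == 0 && evens[2] == 8);
}
```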

(Think question: Do you really need a separate type for back_insert_proxy?)

Implicit collections: Enumerators

The back_insert_iterator abstracts a somewhat surprising position in a collection. We now turn to iterators that represent nonexistent collections—collections that would be too big or expensive to store directly in memory. This works because, as we’ve seen throughout, algorithms represent collections as iterator pairs. This lets them handle sub-collections transparently. It also means the algorithms don’t actually need a collection to exist separately in memory—they just need to iterate over it!

Here’s a particularly simple example: iterating over the integers. Since the iterator returns values from a collection that doesn’t exist, we will call it an enumerator.

class int_enumerator {
public:
   /** Construct an int_enumerator pointing at @a x. */
   int_enumerator(int x)
      : x_(x) {
   }
   int operator*() const {
      return x_;
   }
   int_enumerator &operator++() {
      ++x_;
      return *this;
   }
   bool operator==(const int_enumerator &x) const {
      return x_ == x.x_;
   }
   bool operator<(const int_enumerator &x) const {
      return x_ < x.x_;
   }
   // !=, <=, >=, > too
private:
   int x_;
};

Why is this useful? Well, here’s a simple way to create a vector that contains all the integers between 0 and 100000 that don’t satisfy some predicate:

vector<int> v;
remove_copy_if(int_enumerator(0), int_enumerator(100000),
               back_inserter(v), predicate);

We don’t explicitly construct a vector with all the values 0, 1, 2, …, 99999. The pair of int_enumerators will generate those values, in that order. The iterator pair defines an implicit collection. 8 bytes stand in for 400000—not a bad tradeoff.

Of course, the simple “C-like” code for this is easy too.

for (int i = 0; i < 100000; ++i)
   if (!predicate(i))
      v.push_back(i);

I would write this version in practice. The implicit iterator pattern provides more benefit with more complex algorithms, or more complex collections. We turn to such a collection next.
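As a small illustration of reusing an existing algorithm, the sketch below fills in the pieces int_enumerator needs to work with the standard library (operator!=, post-increment, and the iterator typedefs) and hands a pair of enumerators to std::accumulate. The function sum_range is a hypothetical name, and the typedef choices are one reasonable assumption for an input iterator that returns values rather than references.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>

class int_enumerator {
public:
   // Typedefs that std::iterator_traits looks for.
   typedef std::input_iterator_tag iterator_category;
   typedef int value_type;
   typedef std::ptrdiff_t difference_type;
   typedef const int* pointer;
   typedef int reference;

   explicit int_enumerator(int x) : x_(x) {}
   int operator*() const { return x_; }
   int_enumerator& operator++() { ++x_; return *this; }
   int_enumerator operator++(int) {
      int_enumerator old(*this);
      ++x_;
      return old;
   }
   bool operator==(const int_enumerator& o) const { return x_ == o.x_; }
   bool operator!=(const int_enumerator& o) const { return x_ != o.x_; }
private:
   int x_;
};

// Sum of the integers in [lo, hi), computed without storing them.
long long sum_range(int lo, int hi) {
   return std::accumulate(int_enumerator(lo), int_enumerator(hi), 0LL);
}

void sum_range_example() {
   assert(sum_range(1, 101) == 5050);   // 1 + 2 + ... + 100
}
```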

Abstract collections: Generators

The subset sum problem is defined as follows: Given a set s of integers s[0], s[1], …, s[n–1], return a nonempty subset x ⊆ s so that the sum of the elements of x equals zero.

Subset sum is an interesting and difficult problem. Several cryptographic algorithms use it to hide information from snoopers. It is NP-complete: no known algorithm solves it in general in polynomial time, and most of us believe that no polynomial-time algorithm exists.

Before going further, think about how you would implement subset sum in a generic way, without making assumptions about how your integers are stored, or even whether they are integers at all (perhaps they are complex numbers). Perhaps you would simply enumerate all subsets, and store each subset in a separate vector or something? But there are 2^n subsets of n elements! Will you have room for them all? And why copy the elements, anyway? Can you do it with less memory?
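One hint toward an answer, sketched under the assumption that the set has fewer than 64 elements: a subset can be named by an n-bit number whose 1 bits pick out its members, so counting from 1 up to 2^n – 1 visits every nonempty subset without copying anything. The functions masked_sum and count_zero_subsets are hypothetical helpers written for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sum of the elements of s selected by the 1 bits of mask.
long long masked_sum(const std::vector<int>& s, std::uint64_t mask) {
   long long sum = 0;
   for (std::size_t i = 0; i < s.size(); ++i)
      if (mask & (std::uint64_t(1) << i))
         sum += s[i];
   return sum;
}

// Count the nonempty subsets of s that sum to zero.
// Assumes s.size() < 64 so that every subset fits in one 64-bit mask.
std::uint64_t count_zero_subsets(const std::vector<int>& s) {
   assert(s.size() < 64);
   const std::uint64_t limit = std::uint64_t(1) << s.size();
   std::uint64_t count = 0;
   for (std::uint64_t mask = 1; mask < limit; ++mask)
      if (masked_sum(s, mask) == 0)
         ++count;
   return count;
}
```

The only state is the current mask, so the extra memory is constant even though the running time is still exponential in n.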


Here’s a generic implementation of subset sum that uses a constant amount of space.

struct subset_sum_predicate {
   template <typename T>
   bool operator()(T subset) {
      typename T::element_type sum = 0;
      for (auto it = subset.begin(); it != subset.end(); ++it)
         sum += *it;
      return sum == 0 && subset.begin() != subset.end();
   }
};

/** Return a nonempty subset of [first, last) that sums to 0.
 * If no nonempty subset exists, return an empty subset. */
template <typename IT>
subset_generator<IT> subset_sum(IT first, IT last) {
   return find_if(subsets_begin(first, last),
                  subsets_end(first, last),
                  subset_sum_predicate());
}

This extremely simple code relies on some pretty interesting technology, namely functions and types that know how to enumerate all subsets of an input range.

/** Generates all subsets of elements in a range [first, last). */
template <typename IT>
class subset_generator { ... };

/** Return the begin subset_generator for [first, last).
 * @return subset_generator x where *x is an empty subset
 *
 * The following loop will enumerate all the subsets of
 * [first,last) in some order, starting with the empty subset:
 *
 * for (auto it = subsets_begin(first, last);
 *      it != subsets_end(first, last);
 *      ++it)
 *   // do something with *it
 *
 * If it is a subset_generator, then (*it).begin() and (*it).end()
 * are iterators over some subset of [first, last).
 */
template <typename IT>
subset_generator<IT> subsets_begin(IT first, IT last) {
   ???
}

/** Return the end subset_generator for [first, last).
 * @return subset_generator x
 * @post *x is equivalent to an empty subset. */
template <typename IT>
subset_generator<IT> subsets_end(IT first, IT last) {
   ???
}

Things to note:

Thus, subsets_begin and subsets_end define an implicit collection of collections. We thus call the subset_generator class a generator to emphasize that it is generating collections.

It’s best to pause now to point out how awesome this is. A simple predicate and a simple standard function solve a very complicated problem. All we need is to define the subset_generator code itself. And that subset_generator code will be useful in many more contexts—wherever subsets are required—rather than remaining specific to the subset sum problem.

(Of course, a fast implementation of subset sum would likely use a more complex algorithm than simply checking all subsets. But the software techniques above would still be useful in a more advanced subset sum. It’d be an interesting problem to figure out how!)

The code for subset_sum is now presented without further comment (and with minimal source comments). It is compilable with a C++11 compiler, and therefore has a bit more boilerplate and C++isms than our examples usually do (typename and std::iterator, for example). It uses 64-bit integers to represent subsets and therefore requires the original set contain at most 63 elements; extending it to larger sets wouldn’t be difficult. It’s highly recommended that you look at the code and think about how it works. The fix() pattern in subset_iterator, for example, will likely be useful in your own code.

#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t subset_chooser_type;

/** @brief Iterates over a subset of the range [first, last).
 *
 * Only elements indicated by 1 bits in @a subset are part of the
 * range. */
template <typename IT>
class subset_iterator {
public:
   typedef typename std::iterator_traits<IT>::value_type value_type;

   subset_iterator(IT first, IT last, subset_chooser_type chooser)
      : first_(first), last_(last), chooser_(chooser) {
      fix();
   }
   value_type operator*() const {
      return *first_;
   }
   subset_iterator<IT>& operator++() {
      ++first_, chooser_ >>= 1;
      fix();
      return *this;
   }
   bool operator==(const subset_iterator<IT>& x) const {
      return first_ == x.first_;
   }
   bool operator!=(const subset_iterator<IT>& x) const {
      return !(*this == x);
   }

private:
   IT first_;
   IT last_;
   subset_chooser_type chooser_;

   void fix() {
      while (first_ != last_ && !(chooser_ & 1))
         ++first_, chooser_ >>= 1;
   }
};

/** @brief Generates all subsets of elements in a range
 * [first, last). */
template <typename IT>
class subset_generator
    : public std::iterator<std::input_iterator_tag,
                           subset_generator<IT> > {
public:
   typedef typename std::iterator_traits<IT>::value_type
       element_type;

   static subset_generator<IT> begin(IT first, IT last) {
      return subset_generator<IT>(first, last, 0);
   }
   static subset_generator<IT> end(IT first, IT last) {
      // We enumerate over subsets by repeatedly incrementing a
      // number with (last - first) bits. We're done when we hit
      // the number 2**(last - first).
      subset_chooser_type end_chooser = subset_chooser_type(1)
            << std::distance(first, last);
      return subset_generator<IT>(first, last, end_chooser);
   }

   const subset_generator<IT>& operator*() const {
      return *this;
   }
   subset_generator<IT>& operator++() {
      ++current_subset_;
      return *this;
   }
   bool operator==(const subset_generator<IT>& x) const {
      return first_ == x.first_ && last_ == x.last_
          && current_subset_ == x.current_subset_;
   }
   bool operator!=(const subset_generator<IT>& x) const {
      return !(*this == x);
   }

   /** Return an iterator to the beginning of the current subset. */
   subset_iterator<IT> begin() const {
      return subset_iterator<IT>(first_, last_,
           current_subset_);
   }
   /** Return an iterator to the end of the current subset. */
   subset_iterator<IT> end() const {
      return subset_iterator<IT>(last_, last_,
           current_subset_);
   }

private:
   IT first_;
   IT last_;
   subset_chooser_type current_subset_;

   subset_generator(IT first, IT last, subset_chooser_type sid)
      : first_(first), last_(last), current_subset_(sid) {
   }
};

template <typename IT>
subset_generator<IT> subsets_begin(IT first, IT last) {
   return subset_generator<IT>::begin(first, last);
}

template <typename IT>
subset_generator<IT> subsets_end(IT first, IT last) {
   return subset_generator<IT>::end(first, last);
}

struct subset_sum_predicate {
   template <typename T>
   bool operator()(T subset) {
      typename T::element_type sum = 0;
      for (auto it = subset.begin(); it != subset.end(); ++it)
         sum += *it;
      return sum == 0 && subset.begin() != subset.end();
   }
};

/** Return a nonempty subset of [first, last) that sums to 0.
 * If no nonempty subset exists, return an empty subset. */
template <typename IT>
subset_generator<IT> subset_sum(IT first, IT last) {
   return std::find_if(subsets_begin(first, last),
         subsets_end(first, last),
         subset_sum_predicate());
}

/** Run, for example, "./subset_sum 1 40 30 -100 3 75 -2 -4" */
int main(int argc, char **argv) {
   std::vector<int> v;
   for (int i = 1; i < argc; ++i)
      v.push_back(strtol(argv[i], 0, 0));

   auto subset = subset_sum(v.begin(), v.end());
   if (subset.begin() == subset.end()) {
      std::cout << "No subset sum\n";
      exit(1);
   } else {
      std::cout << "Subset:";
      for (auto it = subset.begin(); it != subset.end(); ++it)
         std::cout << ' ' << *it;
      std::cout << '\n';
      exit(0);
   }
}
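
The fix() pattern carries over to other iterators that skip elements. The sketch below applies it to a filtering iterator that visits only the elements accepted by a predicate. The names filter_iterator, is_even, and sum_evens are hypothetical and not part of the program above.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical iterator over the elements of [first, last) for which
// pred returns true.
template <typename IT, typename P>
class filter_iterator {
public:
   typedef typename std::iterator_traits<IT>::value_type value_type;

   filter_iterator(IT first, IT last, P pred)
      : first_(first), last_(last), pred_(pred) {
      fix();
   }
   value_type operator*() const {
      return *first_;
   }
   filter_iterator<IT, P>& operator++() {
      ++first_;
      fix();
      return *this;
   }
   bool operator==(const filter_iterator<IT, P>& x) const {
      return first_ == x.first_;
   }
   bool operator!=(const filter_iterator<IT, P>& x) const {
      return !(*this == x);
   }

private:
   IT first_;
   IT last_;
   P pred_;

   // Restore the invariant after construction or increment: first_ is
   // either last_ or points at an element the predicate accepts.
   void fix() {
      while (first_ != last_ && !pred_(*first_))
         ++first_;
   }
};

bool is_even(int x) {
   return (x % 2) == 0;
}

// Sum the even elements of v by walking a pair of filter_iterators.
int sum_evens(const std::vector<int>& v) {
   typedef filter_iterator<std::vector<int>::const_iterator,
                           bool (*)(int)> iter;
   iter end(v.end(), v.end(), is_even);
   int sum = 0;
   for (iter it(v.begin(), v.end(), is_even); it != end; ++it)
      sum += *it;
   return sum;
}
```

As in subset_iterator, every operation that moves the position calls fix() afterward, so the rest of the class can assume the iterator never rests on a skipped element.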

Posted on February 18, 2012