String Tokenizer Iterator Class





A string tokenizer iterator class that works with std::string
Introduction
As a part of a larger project I had to write some basic string utility functions and classes. One of the things needed was a flexible way of splitting strings into separate tokens.
As is often the case in programming, there are several ways to handle a problem like this. After reviewing my options, I decided that an iterator-based solution would be flexible enough for my needs.
Non-iterator-based solutions to this particular problem often have the disadvantage of tying the user to a certain container type. With an iterator-based tokenizer the programmer is free to choose any type of container (or no container at all). Many STL containers, such as std::list and std::vector, offer constructors that populate the container from a pair of iterators. This feature makes the tokenizer very easy to use.
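To make the contrast concrete, here is a minimal sketch of what a non-iterator splitter might look like. The function name split_to_vector is hypothetical and is not part of this article's code; it is shown only to illustrate how the return type fixes the container choice.

```cpp
#include <string>
#include <vector>

// Hypothetical non-iterator splitter, shown only for contrast.
// The return type ties every caller to std::vector<std::string>.
std::vector<std::string> split_to_vector(const std::string & text,
                                         const char * separators = " ")
{
    std::vector<std::string> tokens;

    // Position of the first character that is not a separator.
    std::string::size_type start = text.find_first_not_of(separators);
    while (start != std::string::npos)
    {
        // End of the current token, or npos if it runs to the end of text.
        std::string::size_type end = text.find_first_of(separators, start);
        tokens.push_back(text.substr(start, end - start));

        // Skip the separator run; a search starting at npos returns npos.
        start = text.find_first_not_of(separators, end);
    }
    return tokens;
}
```

A caller who wants a std::list, or who only needs to count the tokens, still has to build the vector first. That coupling is what the iterator-based design avoids.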
Example usage
std::vector<std::string> s(string_token_iterator("one two three"),
                           string_token_iterator());
std::copy(s.begin(), s.end(),
          std::ostream_iterator<std::string>(std::cout, "\n"));
// output:
// one
// two
// three

std::copy(string_token_iterator("one,two..,..three", ",."),
          string_token_iterator(),
          std::ostream_iterator<std::string>(std::cout, "\n"));
// same output as above
The code has been tested with Visual C++.NET and GCC 3.
The Code
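#include <cstddef>
#include <iterator>
#include <string>

// Input iterator that yields the tokens of a std::string one at a time.
// It stores a pointer to the source string, so that string must outlive
// every iterator created from it. Runs of separators never produce
// empty tokens.
struct string_token_iterator
{
public:
    // Member typedefs take the place of the std::iterator base class,
    // which is deprecated since C++17.
    typedef std::input_iterator_tag iterator_category;
    typedef std::string             value_type;
    typedef std::ptrdiff_t          difference_type;
    typedef const std::string *     pointer;
    typedef std::string             reference; // operator* returns by value

    // Default constructor: the end-of-sequence iterator.
    string_token_iterator() : separator(0), str(0), start(0), end(0) {}

    // separator_ is a set of delimiter characters, not a delimiter string.
    string_token_iterator(const std::string & str_, const char * separator_ = " ")
        : separator(separator_), str(&str_), start(0), end(0)
    {
        find_next();
    }

    string_token_iterator(const string_token_iterator & rhs)
        : separator(rhs.separator), str(rhs.str), start(rhs.start), end(rhs.end)
    {
    }

    string_token_iterator & operator++()
    {
        find_next();
        return *this;
    }

    string_token_iterator operator++(int)
    {
        string_token_iterator temp(*this);
        ++(*this);
        return temp;
    }

    // Returns a copy of the current token.
    std::string operator*() const
    {
        return std::string(*str, start, end - start);
    }

    bool operator==(const string_token_iterator & rhs) const
    {
        return (rhs.str == str && rhs.start == start && rhs.end == end);
    }

    bool operator!=(const string_token_iterator & rhs) const
    {
        return !(rhs == *this);
    }

private:
    // Advances to the next token, or turns this into the end iterator.
    void find_next()
    {
        start = str->find_first_not_of(separator, end);
        if (start == std::string::npos)
        {
            // No tokens left: match the default-constructed end iterator.
            start = end = 0;
            str = 0;
            return;
        }
        // npos here means the token runs to the end of the string.
        end = str->find_first_of(separator, start);
    }

    const char * separator;
    const std::string * str;
    std::string::size_type start;
    std::string::size_type end;
};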
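The scanning step in find_next can be easier to follow when the offsets are made visible. The sketch below is a hypothetical helper (trace_token_spans and the token_span typedef are names invented for illustration, not part of the class) that repeats the same two std::string searches and records the start and end values each step would produce.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical (start, end) record for one token.
typedef std::pair<std::string::size_type, std::string::size_type> token_span;

// Hypothetical helper: collects the offsets that each scanning step
// produces, using the same searches as find_next.
std::vector<token_span> trace_token_spans(const std::string & text,
                                          const char * separators = " ")
{
    std::vector<token_span> spans;
    std::string::size_type end = 0;
    for (;;)
    {
        // Skip any separators; npos means no tokens remain.
        std::string::size_type start = text.find_first_not_of(separators, end);
        if (start == std::string::npos)
            break;

        // First separator after the token, or npos for a final token
        // that runs to the end of the text.
        end = text.find_first_of(separators, start);
        spans.push_back(token_span(start, end));
    }
    return spans;
}
```

For the input "one,two..,..three" with separators ",." this yields (0, 3), (4, 7) and (12, npos). The last pair shows why operator* can pass end - start straight to the std::string constructor: a count that exceeds the remaining length is clamped to the end of the string.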