Effective STL
Item2 当心与容器无关(container-independent)的代码这个错觉
STL是基于泛型思想的,数组泛化为container,并根据它们所包含的对象类型而进行参数化。函数泛化为algorithms,并根据它们所使用的iterators类型而进行参数化。指针泛化为iterators,并根据它们所指向的对象类型而进行参数化。
但这只是个开始。单个container类型泛化为序列容器(sequence container)和关联容器(associative container),相似的container具有相似的功能。标准的连续内存container(见Item 1)提供了随机存取iterators;而标准的基于结点的container(见Item 1)提供了双向iterators。Sequence containers支持push_front和(或)push_back,而Associative containers则不支持。Associative containers提供了操作复杂度为时间的(译注:这是我的理解,不知道对不对)对数的lower_bound,upper_bound和equal_range成员函数,而Sequence containers则不提供。
随着整个这样的泛化过程的进行,很自然地我们也想加入进来。这种情绪是值得赞赏的,而且你在写自己的container、iterators和algorithms时,你当然也想要追求这种泛化过程。可惜的是,很多程序员是以一种不同的方式来进行的。
通常,他们不是在他们的软件中实现一个特定类型的container,而是试图泛化一个container(比如说,一个vector)的概念以便于使用,仍然保留了将来可以替代它的选择(比如一个deque或是一个list)——所有的这一切不用改变使用这个container的代码。换句话说,他们力图编写与container无关的代码。
这种类型的泛化,尽管是一种出于善意的考虑,终究还是走错了方向。即使是最为热心于写与container无关的代码的提倡者,很快他就会认识到,力图去写一个可以让Sequence containers和Associative containers同时运作的软件几乎是没有意义的。
许多的成员函数是只存在于一类container中的。例如,只有Sequence containers支持push_front或push_back,而只有Associative containers支持count和 lower_bound等等。即使是像insert和erase这种具有基本特征和语义的操作,也是随着container类型的不同而不同的。例如,当你插入一个对象到一个Sequence containers中时,它是处于插入位置的。但当你插入一个对象到一个Associative containers中时,container将把这个对象移动到一个按这个对象在container中的索引顺序排列的位置。另一个例子是,以一个iterators为输入的这种erase操作,在一个Sequence containers中调用时返回一个新的iterators,但在Associative containers中调用时则什么也不返回(Item 9给了一个例子,解释了这将会怎样影响你所写的代码)。
接着我们假定你很想写这样的代码,它可以使用最为常用的Sequence containers,如vector、deque和list。很明显,你必须要编程实现它们的功能的交集,这意味着不能使用reserve或capacity(见Item 14),因为deque和list没有提供这样的成员函数。list的存在也意味着你要放弃operator[],而且你要将自己的代码限制到双向iterators的所提供的能力。反过来说,这意味着你必须将你的代码不能使用要求随机存取iterators的algorithms,这些algorithms包括sort, stable_sort,partial_sort和nth_element (见Item 31)。
另一方面来说,你想要vector不使用push_front和pop_front,并且vector和deque不能使用splice和sort形式的成员函数。上述两种限制中,后者意味着在你的泛化了的Sequence containers中将没有sort形式的调用。
这是明显的情形,如果你违犯了上述的任何一条限制,你的代码会因为你想使用的至少其中一个container而不能编译通过。将要编译的代码将会有更多的隐患。
主犯是应用于不同的Sequence containers,使得iterators、指针和引用失效的规则是不同的。为了使所写的代码同vector、deque和list正确地运作,你必须假定对于任何一个使得在这样的任何一个container里的iterators、指针或是引用失效的操作,在你正在使用的container里也是失效的。从而你必须假定每一个对insert的调用总是失效的,因为deque::insert使得所有的iterators失效,而且由于缺乏调用capacity的能力,必须认为vector::insert操作使所有的指针和引用失效(Item 1解释了deque在某些时候是很独特的,它会让它的iterators失效而不使它的指针和引用实效)。同样的原因可以得出一个结论,必须认为每一个对erase的调用将使一切东西失效。
还想要更多的例子?好的,你不能传递container里的数据给C的接口,因为只有vector支持这个特性(见Item 16);你不能用bool作为所存储对象的类型实体化你的container,因为,正如Item 18所解释的那样,vector<bool>并不总是像一个vector那样表现,并且它从不实际存储bools;你不能认为list的插入和删除操作(的复杂度)是常量,因为vector和deque在做这些操作时(的复杂度)是线性的。
等该说的说完了该做的做完了以后,留给你的是这样的一个泛化了的Sequence containers:你不能调用reserve,capacity,operator[],push_front,pop_front,splice或者任何一个需要随机存取iterators的algorithms;这个container的每一个insert和erase调用的操作复杂度是线性的,并且它使所有的iterators、指针和引用失效;并且这个container不能同C兼容,不能存储bools。这样的container是你在想在你的应用程序中使用的吗?我想不是。
如果你压制住你的野心,决定放弃对list的支持。但是你仍然要放弃reserve,capacity,push_front和pop_front;你仍然必须假定所有对insert和erase的调用(的操作复杂度)是线性的并且使一切失效;你仍然要失去同C兼容的内存布局(译注:我想应该是内存布局吧);并且你仍然不能存储bools
如果你要放弃Sequence containers,而准备改为用不同的Associative containers来进行编码,情况也好不到那里去。为set和map写这样的代码基本上是不可能的,因为sets存储单个对象而maps存储一对对象。虽然可以为set和multiset(或map和multimap)做这样的强制编码,但是,同其它形式的形式相比较,只有一个值(译注:不好意思,没看源代码,我猜是参数)的insert成员函数对于sets/maps的返回值是不一样的。并且你必须谨慎地避免对在container中存储了多少份值的拷贝作任何的假设。对于map和multimap而言,你必须避免使用operator[],因为这个成员函数只有map才有。
面对现实:这样做是不值得的。不同的container在不同的情形有它们自己的优点和缺点。它们不是设计为可互换的,你基本上是不可能涵盖它们的。如果你想试的话,那你只不过是做春秋大梦,但是这个梦却不是那么的甜美。尽管如此,当你认识到你该选择一个什么样的container的时候,也是黎明破晓之时。嗯,尽管不是最好的,你需要使用一个不同的container类型。现在你明白了,当你改变container类型时,你不仅要修正你的编译器诊断出的所有问题。你也需要检验所用使用container的代码,一方面看看它是否需要根据新container的性能特征作出改变;一方面看看它使iterators、指针和引用失效的规则。
如果你不用vector而转为使用其它的container,你也必须要确定你不再依赖于vetor的与C兼容的内存布局;如果你由其它container转为使用vector,你必须确定你不使用它来存储bools。
尽管不可避免地要时常改变container类型,你也可以以通常的方式方便地作出这种改变,封装、封装,还是封装。其中,最简单的方式就是对container和iterators类型随意地使用typedefs,从而,不要这样编码:
class Widget { ... };
vector<Widget> vw;
Widget bestWidget;
... // give bestWidget a value
vector<Widget>::iterator i = // find a Widget with the
find(vw.begin(), vw.end(), bestWidget); // same value as bestWidget
应该这样写:
class Widget { ... };
typedef vector<Widget> WidgetContainer;
typedef WidgetContainer::iterator WCIterator;
WidgetContainer vw;
Widget bestWidget;
...
WCIterator i = find(vw.begin(), vw.end(), bestWidget);
这将会使改变container类型要容易得多,这样做将会给你带来很大的方便,如果问题改变了,只需简单地加入一个自定义的allocator即可。(这样的改变不影响使iterators、指针和引用失效的规则)
class Widget { ... };
template<typename T> // see Item 10 for why this
SpecialAllocator { ... }; // needs to be a template
typedef vector<Widget, SpecialAllocator<Widget> > WidgetContainer;
typedef WidgetContainer::iterator WCIterator;
WidgetContainer vw; // still works
Widget bestWidget;
...
WCIterator i = find(vw.begin(), vw.end(), bestWidget); // still works
如果typedefs的封装层面对你而言没有意义,你还是可能认可这样做所节省的工作量。例如,如果你有这样的一个对象类型
map<string,
vector<Widget>::iterator,
CIStringCompare> // CIStringCompare is “case-
// insensitive string compare;”
// Item 19 describes it
而且你想要用const_iterators遍历整个map,你真的想不止一次地这样拼写吗?
map<string, vector<Widget>::iterator, CIStringCompare>::const_iterator
在你用过STL一段时间以后,你会认识到typedefs是一个不错的朋友。一个typedef只是一些其它类型的同义词,所以它所提供的完全是字面上的封装。但是一个typedef不能阻止它的用户做(或是依赖)任何它们没有准备好的(或是依赖的)事。如果你想要限制你暴露给用户的container的选择的话,你需要加强封装性,你需要用类。
如果你想用另一个container来代替现有的一个,为了限制需要修改的代码,可将container封装到一个类里,并且限制可通过类的接口访问的与类相关的信息的数量。例如,如果你需要创建一个自定义的list,不要直接使用list。你应该创建一个CustomerList类,并封装一个list在它的私有部分:
class CustomerList {
private:
typedef list<Customer> CustomerContainer;
typedef CustomerContainer::iterator CCIterator;
CustomerContainer customers;
public: // limit the amount of list-specific
... // information visible through
}; // this interface
咋一看,这可能显得有点笨拙,毕竟一个自定义的list还是一个list,是吧?嗯,可能是的。随后你会发现你不必像你通常预期的那样在list的中部插入或是删除客户,你只是需要快速找出你的客户最前面20%的信息——一个对nth_elementalgorithms(见Item 31)的一个特制的任务。但是nth_elementalgorithms需要随机存取iterators,它不能用在一个list中。在这种情况里,你的自定义“list”可能用vector或是deque实现更好一些。
你在考虑这类变化时,你仍然必须检查每一个CustomerList的成员函数和每一个友元,看看它们是怎么被影响的(根据性能和iterators/指针/引用的失效规则,等等)。但是如果你已经对CustomerList的实现细节作了很好的封装以后,那么对于CustomerList用户的影响将会很小。你不能写与container无关的代码,但是这些代码可能能做到与container无关。
(译注:这是Scott Meyers的新书 《Effective STL: 提高STL使用技术的50招》其中一条,在Addison-Wesley网站上公布了其中几条,好东东不敢独享,特此让大家看看,希望有帮助,作这样的翻译是我的第一次尝试,很多地方我自己都觉得不满意,贴上去让大家指教一下好了)
以下是原文:
Item 2: Beware the illusion of container-independent code.
The STL is based on generalization. Arrays are generalized into containers and parameterized on the types of objects they contain. Functions are generalized into algorithms and parameterized on the types of iterators they use. Pointers are generalized into iterators and parameterized on the type of objects they point to.
That’s just the beginning. Individual container types are generalized into sequence and associative containers, and similar containers are given similar functionality. Standard contiguous-memory containers (see Item 1) offer random-aclearcase/" target="_blank" >ccess iterators, while standard node-based containers (again, see Item 1) provide bidirectional iterators. Sequence containers support push_front and/or push_back, while associative containers don’t. Associative containers offer logarithmic-time lower_bound, upper_bound, and equal_range member functions, but sequence containers don’t.
With all this generalization going on, it’s natural to want to join the movement. This sentiment is laudable, and when you write your own containers, iterators, and algorithms, you’ll certainly want to pursue it. Alas, many programmers try to pursue it in a different manner.
Instead of committing to particular types of containers in their software, they try to generalize the notion of a container so that they can use, say, a vector, but still preserve the option of replacing it with something like a deque or a list later — all without changing the code that uses it. That is, they strive to write container-independent code.
This kind of generalization, well-intentioned though it is, is almost always misguided. Even the most ardent advocate of container-independent code soon realizes that it makes little sense to try to write software that will work with both sequence and associative containers.
Many member functions exist for only one category of container, e.g., only sequence containers support push_front or push_back, and only associative containers support count and lower_bound, etc. Even such basics as insert and erase have signatures and semantics that vary from category to category. For example, when you insert an object into a sequence container, it stays where you put it, but if you insert an object into an associative container, the container moves the object to where it belongs in the container’s sort order. For another example, the form of erase taking an iterator returns a new iterator when invoked on a sequence container, but it returns nothing when invoked on an associative container. (Item 9 gives an example of how this can affect the code you write.)
Suppose, then, you aspire to write code that can be used with the most common sequence containers: vector, deque, and list. Clearly, you must program to the intersection of their capabilities, and that means no uses of reserve or capacity (see Item 14), because deque and list don’t offer them. The presence of list also means you give up operator[], and you limit yourself to the capabilities of bidirectional iterators. That, in turn, means you must stay away from algorithms that demand random access iterators, including sort, stable_sort, partial_sort, and nth_element (see Item 31).
On the other hand, your desire to support vector rules out use of push_front and pop_front, and both vector and deque put the kibosh on splice and the member form of sort. In conjunction with the constraints above, this latter prohibition means that there is no form of sort you can call on your “generalized sequence container.”
That’s the obvious stuff. If you violate any of those restrictions, your code will fail to compile with at least one of the containers you want to be able to use. The code that will compile is more insidious.
The main culprit is the different rules for invalidation of iterators, pointers, and references that apply to different sequence containers. To write code that will work correctly with vector, deque, and list, you must assume that any operation invalidating iterators, pointers, or references in any of those containers invalidates them in the container you’re using. Thus, you must assume that every call to insert invalidates everything, because deque::insert invalidates all iterators and, lacking the ability to call capacity, vector::insert must be assumed to invalidate all pointers and references. (Item 1 explains that deque is unique in sometimes invalidating its iterators without invalidating its pointers and references.) Similar reasoning leads to the conclusion that every call to erase must be assumed to invalidate everything.
Want more? You can’t pass the data in the container to a C interface, because only vector supports that (see Item 16). You can’t instantiate your container with bool as the type of objects to be stored, because, as Item 18 explains, vector<bool> doesn’t always behave like a vector, and it never actually stores bools. You can’t assume list’s constant-time insertions and erasures, because vector and deque take linear time to perform those operations.
When all is said and done, you’re left with a “generalized sequence container” where you can’t call reserve, capacity, operator[], push_front, pop_front, splice, or any algorithm requiring random access iterators; a container where every call to insert and erase takes linear time and invalidates all iterators, pointers, and references; and a container incompatible with C where bools can’t be stored. Is that really the kind of container you want to use in your applications? I suspect not.
If you rein in your ambition and decide you’re willing to drop support for list, you still give up reserve, capacity, push_front, and pop_front; you still must assume that all calls to insert and erase take linear time and invalidate everything; you still lose layout compatibility with C; and you still can’t store bools.
If you abandon the sequence containers and shoot instead for code that can work with different associative containers, the situation isn’t much better. Writing for both set and map is close to impossible, because sets store single objects while maps store pairs of objects. Even writing for both set and multiset (or map and multimap) is tough. The insert member function taking only a value has different return types for sets/maps than for their multi cousins, and you must religiously avoid making any assumptions about how many copies of a value are stored in a container. With map and multimap, you must avoid using operator[], because that member function exists only for map.
Face the truth: it’s not worth it. The different containers are different, and they have strengths and weaknesses that vary in significant ways. They’re not designed to be interchangeable, and there’s little you can do to paper that over. If you try, you’re merely tempting fate, and fate doesn’t like to be tempted. Still, the day will dawn when you’ll realize that a container choice you made was, er, suboptimal, and you’ll need to use a different container type. You now know that when you change container types, you’ll not only need to fix whatever problems your compilers diagnose, you’ll also need to examine all the code using the container to see what needs to be changed in light of the new container’s performance characteristics and rules for invalidation of iterators, pointers, and references.
If you switch from a vector to something else, you’ll also have to make sure you’re no longer relying on vector’s C-compatible memory layout, and if you switch to a vector, you’ll have to ensure that you’re not using it to store bools.
Given the inevitability of having to change container types from time to time, you can facilitate such changes in the usual manner: by encapsulating, encapsulating, encapsulating. One of the easiest ways to do this is through the liberal use of typedefs for container and iterator types. Hence, instead of writing this,
class Widget { ... };
vector<Widget> vw;
Widget bestWidget;
... // give bestWidget a value
vector<Widget>::iterator i = // find a Widget with the
find(vw.begin(), vw.end(), bestWidget); // same value as bestWidget
write this:
class Widget { ... };
typedef vector<Widget> WidgetContainer;
typedef WidgetContainer::iterator WCIterator;
WidgetContainer vw;
Widget bestWidget;
...
WCIterator i = find(vw.begin(), vw.end(), bestWidget);
This makes it a lot easier to change container types, something that’s especially convenient if the change in question is simply to add a custom allocator. (Such a change doesn’t affect the rules for iterator/pointer/reference invalidation.)
class Widget { ... };
template<typename T> // see Item 10 for why this
SpecialAllocator { ... }; // needs to be a template
typedef vector<Widget, SpecialAllocator<Widget> > WidgetContainer;
typedef WidgetContainer::iterator WCIterator;
WidgetContainer vw; // still works
Widget bestWidget;
...
WCIterator i = find(vw.begin(), vw.end(), bestWidget); // still works
If the encapsulating aspects of typedefs mean nothing to you, you’re still likely to appreciate the work they can save. For example, if you have an object of type
map<string,
vector<Widget>::iterator,
CIStringCompare> // CIStringCompare is “case-
// insensitive string compare;”
// Item 19 describes it
and you want to walk through the map using const_iterators, do you really want to spell out
map<string, vector<Widget>::iterator, CIStringCompare>::const_iterator
more than once? Once you’ve used the STL a little while, you’ll realize that typedefs are your friends. A typedef is just a synonym for some other type, so the encapsulation it affords is purely lexical. A typedef doesn’t prevent a client from doing (or depending on) anything they couldn’t already do (or depend on). You need bigger ammunition if you want to limit client exposure to the container choices you’ve made. You need classes.
To limit the code that may require modification if you replace one container type with another, hide the container in a class, and limit the amount of container-specific information visible through the class interface. For example, if you need to create a customer list, don’t use a list directly. Instead, create a CustomerList class, and hide a list in its private section:
class CustomerList {
private:
typedef list<Customer> CustomerContainer;
typedef CustomerContainer::iterator CCIterator;
CustomerContainer customers;
public: // limit the amount of list-specific
... // information visible through
}; // this interface
At first, this may seem silly. After all a customer list is a list, right? Well, maybe. Later you may discover that you don’t need to insert or erase customers from the middle of the list as often as you’d anticipated, but you do need to quickly identify the top 20% of your customers — a task tailor-made for the nth_element algorithm (see Item 31). But nth_element requires random access iterators. It won’t work with a list. In that case, your customer “list” might be better implemented as a vector or a deque.
When you consider this kind of change, you still have to check every CustomerList member function and every friend to see how they’ll be affected (in terms of performance and iterator/pointer/reference invalidation, etc.), but if you’ve done a good job of encapsulating Cus-tomerList’s implementation details, the impact on CustomerList clients should be small. You can’t write container-independent code, but they might be able to.