Featured image of post C++ vector 翻倍扩容策略是错误的认知

C++ vector 翻倍扩容策略是错误的认知

很多人认为 C++ 的 `std::vector` 扩容每次都是翻倍发生,但标准并没有强制这一点。本文对比了 GCC/Clang 与 MSVC 的实际行为,通过源码与实验数据分析它们各自的增长策略,以及在少量插入情况下如何预分配来优化性能。

C++ 中,std::vector 是一个动态数组,它能够在运行时根据需要自动调整其容量。许多初学者会认为 std::vector 的容量增长策略是简单的“翻倍”(即每次扩容为原来的 2 倍),但事实并非如此。

事实是,C++ 标准并没有规定 std::vector 的具体扩容策略。不同编译器(如 MSVC, GCC, Clang)的标准库实现可以自由选择它们认为最优的扩容策略。

我们通过代码验证一下,最简例子:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> vec;

    for (int i = 0; i < 100; i++) {
        vec.emplace_back(i);
        std::cout << "i:" << i << " size:" << vec.size() << " capacity:" << vec.capacity() << std::endl;
    }
    return 0;
}

这段代码在 MSVC 上的输出(输出简化):

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
i:0 size:1 capacity:1
i:1 size:2 capacity:2
i:2 size:3 capacity:3
i:3 size:4 capacity:4
i:4 size:5 capacity:6
i:5 size:6 capacity:6
i:6 size:7 capacity:9
i:8 size:9 capacity:9
i:9 size:10 capacity:13
i:12 size:13 capacity:13
i:13 size:14 capacity:19
i:18 size:19 capacity:19
i:19 size:20 capacity:28
i:27 size:28 capacity:28
i:28 size:29 capacity:42
i:41 size:42 capacity:42
i:42 size:43 capacity:63
i:62 size:63 capacity:63
i:63 size:64 capacity:94
i:93 size:94 capacity:94
i:94 size:95 capacity:141
i:99 size:100 capacity:141

可以看到,std::vector 实例化后,容量(capacity)默认为 0,前四次追加都是扩容 1,第五次时,扩容 2,第七次时,扩容 3;4,6,9,14,21,31,47,非 2 倍扩容策略。

在 GCC 或 Clang 上输出:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
i:0 size:1 capacity:1
i:1 size:2 capacity:2
i:2 size:3 capacity:4
i:3 size:4 capacity:4
i:4 size:5 capacity:8
i:7 size:8 capacity:8
i:8 size:9 capacity:16
i:15 size:16 capacity:16
i:16 size:17 capacity:32
i:31 size:32 capacity:32
i:32 size:33 capacity:64
i:63 size:64 capacity:64
i:64 size:65 capacity:128
i:99 size:100 capacity:128

GCC/Clang 为 2 倍扩容策略。

以上为观测现象,接下来从源码实现角度分析。

MSVC 源码:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
    _CONSTEXPR20 pointer _Emplace_reallocate(const pointer _Whereptr, _Valty&&... _Val) {
        // reallocate and insert by perfectly forwarding _Val at _Whereptr
        _Alty& _Al        = _Getal();
        auto& _My_data    = _Mypair._Myval2;
        pointer& _Myfirst = _My_data._Myfirst;
        pointer& _Mylast  = _My_data._Mylast;

        _STL_INTERNAL_CHECK(_Mylast == _My_data._Myend); // check that we have no unused capacity

        const auto _Whereoff = static_cast<size_type>(_Whereptr - _Myfirst);
        const auto _Oldsize  = static_cast<size_type>(_Mylast - _Myfirst);

        if (_Oldsize == max_size()) {
            _Xlength();
        }

        const size_type _Newsize = _Oldsize + 1;
        size_type _Newcapacity   = _Calculate_growth(_Newsize);

        const pointer _Newvec           = _STD _Allocate_at_least_helper(_Al, _Newcapacity);
        ...
    }

    _CONSTEXPR20 size_type _Calculate_growth(const size_type _Newsize) const {
        // given _Oldcapacity and _Newsize, calculate geometric growth
        const size_type _Oldcapacity = capacity();
        const auto _Max              = max_size();

        if (_Oldcapacity > _Max - _Oldcapacity / 2) {
            return _Max; // geometric growth would overflow
        }

        // 计算新容量
        const size_type _Geometric = _Oldcapacity + _Oldcapacity / 2;

        if (_Geometric < _Newsize) {
            return _Newsize; // geometric growth would be insufficient
        }

        return _Geometric; // geometric growth is sufficient
    }

GCC 源码:

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
#if __cplusplus >= 201103L
  template<typename _Tp, typename _Alloc>
    template<typename... _Args>
#if __cplusplus > 201402L
      _GLIBCXX20_CONSTEXPR
      typename vector<_Tp, _Alloc>::reference
#else
      void
#endif
      vector<_Tp, _Alloc>::
      emplace_back(_Args&&... __args)
      {
	if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage)
	  {
	    _GLIBCXX_ASAN_ANNOTATE_GROW(1);
	    _Alloc_traits::construct(this->_M_impl, this->_M_impl._M_finish,
				     std::forward<_Args>(__args)...);
	    ++this->_M_impl._M_finish;
	    _GLIBCXX_ASAN_ANNOTATE_GREW(1);
	  }
	else
	  _M_realloc_insert(end(), std::forward<_Args>(__args)...);
#if __cplusplus > 201402L
	return back();
#endif
      }
#endif

#if __cplusplus >= 201103L
  template<typename _Tp, typename _Alloc>
    template<typename... _Args>
      _GLIBCXX20_CONSTEXPR
      void
      vector<_Tp, _Alloc>::
      _M_realloc_insert(iterator __position, _Args&&... __args)
#else
  template<typename _Tp, typename _Alloc>
    void
    vector<_Tp, _Alloc>::
    _M_realloc_insert(iterator __position, const _Tp& __x)
#endif
    {
      // 计算新容量  
      const size_type __len =
	_M_check_len(size_type(1), "vector::_M_realloc_insert");
      pointer __old_start = this->_M_impl._M_start;
      pointer __old_finish = this->_M_impl._M_finish;
      const size_type __elems_before = __position - begin();
      pointer __new_start(this->_M_allocate(__len));
      pointer __new_finish(__new_start);
      __try
	{
	  // The order of the three operations is dictated by the C++11
	  // case, where the moves could alter a new element belonging
	  // to the existing vector.  This is an issue only for callers
	  // taking the element by lvalue ref (see last bullet of C++11
	  // [res.on.arguments]).
	  _Alloc_traits::construct(this->_M_impl,
				   __new_start + __elems_before,
#if __cplusplus >= 201103L
				   std::forward<_Args>(__args)...);
#else
				   __x);
#endif
	  __new_finish = pointer();

#if __cplusplus >= 201103L
	  if _GLIBCXX17_CONSTEXPR (_S_use_relocate())
	    {
	      __new_finish = _S_relocate(__old_start, __position.base(),
					 __new_start, _M_get_Tp_allocator());

	      ++__new_finish;

	      __new_finish = _S_relocate(__position.base(), __old_finish,
					 __new_finish, _M_get_Tp_allocator());
	    }
	  else
#endif
	    {
	      __new_finish
		= std::__uninitialized_move_if_noexcept_a
		(__old_start, __position.base(),
		 __new_start, _M_get_Tp_allocator());

	      ++__new_finish;

	      __new_finish
		= std::__uninitialized_move_if_noexcept_a
		(__position.base(), __old_finish,
		 __new_finish, _M_get_Tp_allocator());
	    }
	}
      __catch(...)
	{
	  if (!__new_finish)
	    _Alloc_traits::destroy(this->_M_impl,
				   __new_start + __elems_before);
	  else
	    std::_Destroy(__new_start, __new_finish, _M_get_Tp_allocator());
	  _M_deallocate(__new_start, __len);
	  __throw_exception_again;
	}
#if __cplusplus >= 201103L
      if _GLIBCXX17_CONSTEXPR (!_S_use_relocate())
#endif
	std::_Destroy(__old_start, __old_finish, _M_get_Tp_allocator());
      _GLIBCXX_ASAN_ANNOTATE_REINIT;
      _M_deallocate(__old_start,
		    this->_M_impl._M_end_of_storage - __old_start);
      this->_M_impl._M_start = __new_start;
      this->_M_impl._M_finish = __new_finish;
      this->_M_impl._M_end_of_storage = __new_start + __len;
    }

      _GLIBCXX20_CONSTEXPR
      size_type
      _M_check_len(size_type __n, const char* __s) const
      {
	if (max_size() - size() < __n)
	  __throw_length_error(__N(__s));

    // 新容量 = 当前大小 + max(当前大小, 1)
	const size_type __len = size() + (std::max)(size(), __n);
	return (__len < size() || __len > max_size()) ? max_size() : __len;
      }

从源码得知,MSVC 采用近似 1.5 倍 的增长策略,GCC 和 Clang 通常采用**翻倍(2 倍)**策略。

综上,std::vector 的扩容策略完全取决于编译器实现,对于 GCC/Clang 来说,翻倍策略是正确的,但在 MSVC 上,则不是如此。

此外,与 Modern C++ 标准无关, 11/23 均是如此。

由此还可得出,在只插入少量数据时,对性能影响反而更大,内存重新分配频次高,最佳实践是对于少量数据,提前分配空间,减少扩容导致的内存分配开销。

Licensed under CC BY-NC-SA 4.0
最后更新于 Sep 20, 2025 22:48 +0800