In-depth Analysis of C++ std::async Asynchronous Programming

Introduction

Asynchronous programming is an essential part of modern C++ and can significantly improve a program's performance. std::async, a function template in the C++ standard library, gives developers a concise way to run tasks asynchronously. This article examines what std::async does, how to use it, and how its implementation differs across GCC, LLVM, and MSVC, to help you understand the tool and use it safely.

Basic Usage of std::async

std::async is defined in the <future> header file. Its basic functionality is to run a function asynchronously and return a std::future object that holds the result of the function call.

The declaration of std::async:

template <class Fn, class... ArgTypes>
future<typename result_of<Fn(ArgTypes...)>::type>
    async(Fn&& fn, ArgTypes&&... args);

template <class Fn, class... ArgTypes>
future<typename result_of<Fn(ArgTypes...)>::type>
    async(launch policy, Fn&& fn, ArgTypes&&... args);

The second overload takes a launch policy. std::launch is a scoped enumeration (enum class) whose values can be combined as a bitmask:

  • launch::deferred: Indicates that the function call is deferred until the wait() or get() function is called.
  • launch::async: Indicates that the function is executed on a new, independent thread. (This new thread may be obtained from a thread pool or newly created, depending on the compiler implementation.)
  • launch::deferred | launch::async: The default policy for std::async. The implementation decides whether to run the function asynchronously (on another thread) or lazily on the calling thread (without creating a thread).
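The difference between the two explicit policies can be observed by comparing thread IDs. The following is a minimal sketch, not a definitive test: `ran_on_caller` and `check_policies` are names made up for this illustration. It relies on the fact that a deferred task is evaluated lazily inside get(), on the thread that calls it.

```cpp
#include <cassert>
#include <future>
#include <thread>

// Hypothetical helper: returns true if a task launched with `policy`
// executed on the same thread that later called get().
bool ran_on_caller(std::launch policy) {
  const std::thread::id caller = std::this_thread::get_id();
  std::future<std::thread::id> f =
      std::async(policy, [] { return std::this_thread::get_id(); });
  return f.get() == caller;
}

// Hypothetical self-check: a deferred task runs inside get(),
// so it reports the calling thread's ID.
void check_policies() {
  assert(ran_on_caller(std::launch::deferred));
}
```

With std::launch::async the helper is expected to return false, because the task runs on another thread. With the default policy the result depends on the implementation.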

Basic usage example:

#include <future>
#include <print>  // std::println requires C++23

int foo(int a) {
  return a;
}

int main() {
  // Default policy
  std::future<int> f = std::async(&foo, 10);

  // Launch with new thread
  std::future<int> f1 = std::async(std::launch::async, []() { return 0; });

  // Deferred call
  std::future<int> f2 = std::async(std::launch::deferred, []() { return 0; });
  
  std::println("result is: {}", f.get());
  std::println("result is: {}", f1.get());
  std::println("result is: {}", f2.get());
  return 0;
}

I won’t elaborate further on the basic usage. Let’s focus on analyzing the detailed aspects of std::async.

In-depth Analysis of std::async Policies

The C++ standard specifies that the overload without a policy argument behaves as if it were called with std::launch::async | std::launch::deferred, and GCC, LLVM, and MSVC all implement it that way. What the standard leaves to the implementation is which of the two policies is actually chosen. So, what policy is actually executed on each platform?
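Which policy was actually selected can also be checked at run time: wait_for() with a zero timeout returns std::future_status::deferred for a deferred task and never for an asynchronous one. The sketch below illustrates the idea; `is_deferred`, `default_policy_is_deferred`, and `check_explicit_policies` are hypothetical names introduced here.

```cpp
#include <cassert>
#include <chrono>
#include <future>

// Hypothetical helper: true if the task behind `f` was deferred,
// i.e. it will only run lazily inside get() or wait().
template <class T>
bool is_deferred(const std::future<T>& f) {
  return f.wait_for(std::chrono::seconds(0)) == std::future_status::deferred;
}

// Reports what the default policy resolved to on this implementation.
bool default_policy_is_deferred() {
  std::future<int> f = std::async([] { return 0; });
  const bool deferred = is_deferred(f);
  f.get();  // run (or wait for) the task before the future goes away
  return deferred;
}

// Self-check with the two explicit policies, whose outcome is fixed.
void check_explicit_policies() {
  std::future<int> d = std::async(std::launch::deferred, [] { return 1; });
  std::future<int> a = std::async(std::launch::async, [] { return 2; });
  assert(is_deferred(d));   // a deferred task always reports deferred
  assert(!is_deferred(a));  // an async task reports ready or timeout
  assert(d.get() == 1 && a.get() == 2);
}
```

Calling default_policy_is_deferred() on a given toolchain shows empirically which branch the library took, without reading its source.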

GCC Platform

In GCC, the default option is launch::async|launch::deferred:

/// async, potential overload
template<typename _Fn, typename... _Args>
  _GLIBCXX_NODISCARD inline future<__async_result_of<_Fn, _Args...>>
  async(_Fn&& __fn, _Args&&... __args)
  {
    return std::async(launch::async|launch::deferred,
                      std::forward<_Fn>(__fn),
                      std::forward<_Args>(__args)...);
  }

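In practice, launch::async is tried first. The deferred state is used only as a fallback, when thread creation fails with resource_unavailable_try_again and the policy also contains launch::deferred: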

/// async
template<typename _Fn, typename... _Args>
  _GLIBCXX_NODISCARD future<__async_result_of<_Fn, _Args...>>
  async(launch __policy, _Fn&& __fn, _Args&&... __args)
  {
    std::shared_ptr<__future_base::_State_base> __state;
    if ((__policy & launch::async) == launch::async)
      {
        __try
          {
            __state = __future_base::_S_make_async_state(
                std::thread::__make_invoker(std::forward<_Fn>(__fn),
                                            std::forward<_Args>(__args)...)
            );
          }
#if __cpp_exceptions
        catch(const system_error& __e)
          {
            if (__e.code() != errc::resource_unavailable_try_again
                || (__policy & launch::deferred) != launch::deferred)
              throw;
          }
#endif
      }
    if (!__state)
      {
        __state = __future_base::_S_make_deferred_state(
            std::thread::__make_invoker(std::forward<_Fn>(__fn),
                                        std::forward<_Args>(__args)...));
      }
    return future<__async_result_of<_Fn, _Args...>>(__state);
  }

LLVM

LLVM has a special launch policy for default options called launch::any:

template <class _Fp, class... _Args>
_LIBCPP_NODISCARD_AFTER_CXX17 inline _LIBCPP_INLINE_VISIBILITY
future<typename __invoke_of<typename decay<_Fp>::type, typename decay<_Args>::type...>::type>
async(_Fp&& __f, _Args&&... __args)
{
    return _VSTD::async(launch::any, _VSTD::forward<_Fp>(__f),
                                    _VSTD::forward<_Args>(__args)...);
}

In essence, it’s a combination of launch::async and launch::deferred.

enum class launch
{
    async = 1,
    deferred = 2,
    any = async | deferred
};

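In practice, LLVM also tries launch::async first. It falls back to launch::deferred only if the asynchronous attempt throws and the policy is not exactly launch::async: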

template <class _Fp, class... _Args>
_LIBCPP_NODISCARD_AFTER_CXX17
future<typename __invoke_of<typename decay<_Fp>::type, typename decay<_Args>::type...>::type>
async(launch __policy, _Fp&& __f, _Args&&... __args)
{
    typedef __async_func<typename decay<_Fp>::type, typename decay<_Args>::type...> _BF;
    typedef typename _BF::_Rp _Rp;

#ifndef _LIBCPP_NO_EXCEPTIONS
    try
    {
#endif
        if (__does_policy_contain(__policy, launch::async))
            return _VSTD::__make_async_assoc_state<_Rp>(_BF(__decay_copy(_VSTD::forward<_Fp>(__f)),
                                                         __decay_copy(_VSTD::forward<_Args>(__args))...));
#ifndef _LIBCPP_NO_EXCEPTIONS
    }
    catch ( ... ) { if (__policy == launch::async) throw ; }
#endif

    if (__does_policy_contain(__policy, launch::deferred))
        return _VSTD::__make_deferred_assoc_state<_Rp>(_BF(__decay_copy(_VSTD::forward<_Fp>(__f)),
                                                        __decay_copy(_VSTD::forward<_Args>(__args))...));
    return future<_Rp>{};
}
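Both GCC's `(__policy & launch::async) == launch::async` test and LLVM's `__does_policy_contain` treat std::launch as a bitmask. A user-level equivalent might look like the sketch below. `policy_contains` and `check_policy_bits` are hypothetical names; only the standard `&` and `|` operators on std::launch are assumed.

```cpp
#include <cassert>
#include <future>

// Hypothetical helper mirroring the checks in the library excerpts:
// true if every bit of `flag` is set in `policy`.
inline bool policy_contains(std::launch policy, std::launch flag) {
  return (policy & flag) == flag;
}

// Self-check: the default policy contains both flags,
// while a pure deferred policy does not contain async.
void check_policy_bits() {
  const std::launch both = std::launch::async | std::launch::deferred;
  assert(policy_contains(both, std::launch::async));
  assert(policy_contains(both, std::launch::deferred));
  assert(!policy_contains(std::launch::deferred, std::launch::async));
}
```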

MSVC

For MSVC, the default option is also launch::async | launch::deferred:

_EXPORT_STD template <class _Fty, class... _ArgTypes>
_NODISCARD_ASYNC future<_Invoke_result_t<decay_t<_Fty>, decay_t<_ArgTypes>...>> async(
    _Fty&& _Fnarg, _ArgTypes&&... _Args) {
    // manages a callable object launched with default policy
    return _STD async(launch::async | launch::deferred, _STD forward<_Fty>(_Fnarg), _STD forward<_ArgTypes>(_Args)...);
}

Every policy other than a pure launch::deferred, including the default, ends up in the launch::async branch:

template <class _Ret, class _Fty>
_Associated_state<typename _P_arg_type<_Ret>::type>* _Get_associated_state(launch _Psync, _Fty&& _Fnarg) {
    // construct associated asynchronous state object for the launch type
    switch (_Psync) { // select launch type
    case launch::deferred:
        return new _Deferred_async_state<_Ret>(_STD forward<_Fty>(_Fnarg));
    case launch::async: // TRANSITION, fixed in vMajorNext, should create a new thread here
    default:
        return new _Task_async_state<_Ret>(_STD forward<_Fty>(_Fnarg));
    }
}

In-depth Analysis of std::launch::async

We know that std::launch::async indicates the function is executed on a new, independent thread. However, the C++ standard does not specify whether the thread is a new thread or a thread reused from a thread pool.

GCC

GCC calls __future_base::_S_make_async_state, which creates an instance of _Async_state_impl. Its constructor starts a new std::thread:

// Shared state created by std::async().
// Starts a new thread that runs a function and makes the shared state ready.
template<typename _BoundFn, typename _Res>
  class __future_base::_Async_state_impl final
  : public __future_base::_Async_state_commonV2
  {
  public:
    explicit
    _Async_state_impl(_BoundFn&& __fn)
    : _M_result(new _Result<_Res>()), _M_fn(std::move(__fn))
    {
      _M_thread = std::thread{ [this] {
          __try
            {
              _M_set_result(_S_task_setter(_M_result, _M_fn));
            }
          __catch (const __cxxabiv1::__forced_unwind&)
            {
              // make the shared state ready on thread cancellation
              if (static_cast<bool>(_M_result))
                this->_M_break_promise(std::move(_M_result));
              __throw_exception_again;
            }
        } };
    }

LLVM

LLVM calls _VSTD::__make_async_assoc_state, which also starts a new std::thread:

template <class _Rp, class _Fp>
future<_Rp>
#ifndef _LIBCPP_HAS_NO_RVALUE_REFERENCES
__make_async_assoc_state(_Fp&& __f)
#else
__make_async_assoc_state(_Fp __f)
#endif
{
    unique_ptr<__async_assoc_state<_Rp, _Fp>, __release_shared_count>
        __h(new __async_assoc_state<_Rp, _Fp>(_VSTD::forward<_Fp>(__f)));
    _VSTD::thread(&__async_assoc_state<_Rp, _Fp>::__execute, __h.get()).detach();
    return future<_Rp>(__h.get());
}

MSVC

Here’s where it gets interesting! MSVC creates an instance of _Task_async_state, which creates a concurrent task and passes a callable function:

// CLASS TEMPLATE _Task_async_state
template <class _Rx>
class _Task_async_state : public _Packaged_state<_Rx()> {
    // class for managing associated synchronous state for asynchronous execution from async
public:
    using _Mybase     = _Packaged_state<_Rx()>;
    using _State_type = typename _Mybase::_State_type;

    template <class _Fty2>
    _Task_async_state(_Fty2&& _Fnarg) : _Mybase(_STD forward<_Fty2>(_Fnarg)) {
        _Task = ::Concurrency::create_task([this]() { // do it now
            this->_Call_immediate();
        });

        this->_Running = true;
    }

::Concurrency::create_task is part of Microsoft’s Parallel Patterns Library. According to MSDN documentation, the task class gets threads from the Windows ThreadPool instead of creating a new thread.

Note that a thread-pool-based implementation cannot guarantee that thread_local variables are destroyed when a task completes, because threads borrowed from a pool are returned to it rather than destroyed. As a result, the thread that ran a std::async task is still alive after the task finishes. It was borrowed from the system thread pool and counts toward the process's thread count, so the more std::async is used, the more threads appear to accumulate.
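Thread reuse can be made visible with a thread_local counter. In the sketch below, every task increments a thread_local variable and returns the new value. `calls_on_this_thread` and `run_sequential_tasks` are hypothetical names chosen for this illustration. If each task gets a fresh thread, every result is 1. If a pooled thread is reused, the counter survives between tasks and larger values can appear.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical per-thread counter; on a freshly created thread it starts at 0.
thread_local int calls_on_this_thread = 0;

// Runs `n` tasks one after another and records the per-thread counter
// value that each task observed.
std::vector<int> run_sequential_tasks(std::size_t n) {
  std::vector<int> seen;
  for (std::size_t i = 0; i < n; ++i) {
    std::future<int> f =
        std::async(std::launch::async, [] { return ++calls_on_this_thread; });
    seen.push_back(f.get());  // wait for the task before starting the next one
  }
  assert(seen.size() == n);
  return seen;
}
```

On the GCC and LLVM code paths shown earlier, which start a new std::thread per call, all values would be expected to be 1. On a thread-pool-based implementation, values above 1 are possible.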

The number of threads that std::async tasks can run on concurrently is capped by the Windows thread pool's default limit of 500 threads.

In-depth Analysis of std::future Returned by std::async

According to cppreference:

If a std::future obtained from std::async is not moved from or bound to a reference, the destructor of the std::future will block at the end of the full expression until the asynchronous computation completes, essentially making the following code synchronous:

std::async(std::launch::async, []{ f(); }); // destructor of the temporary waits for f()
std::async(std::launch::async, []{ g(); }); // does not start until f() completes

Note: The destructor of a std::future obtained by means other than a call to std::async does not block.

That is, the destructor of a std::future returned by std::async behaves differently from that of a std::future obtained from a std::promise. When a std::future for an asynchronously launched std::async task is destroyed, its destructor effectively calls wait(), so the destroying thread blocks until the task's thread has finished.
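The contrast can be reproduced with a small experiment. This is a sketch under stated assumptions: the function names are invented for this example and the 50 ms delay is arbitrary. The first function lets a std::async future go out of scope and then checks whether the task has finished. The second destroys a future obtained from a std::promise whose value is never set.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Returns the flag value observed right after a std::async future is destroyed.
bool async_future_blocks_on_destruction() {
  std::atomic<bool> done{false};
  {
    std::future<void> f = std::async(std::launch::async, [&done] {
      std::this_thread::sleep_for(std::chrono::milliseconds(50));
      done = true;
    });
  }  // f is destroyed here; its destructor waits for the task to finish
  return done.load();
}

// A future obtained from a std::promise does not wait when it is destroyed.
bool promise_future_does_not_block() {
  std::promise<int> p;
  {
    std::future<int> f = p.get_future();
  }  // returns immediately although no value was ever set
  return true;
}

// Self-check: the task is complete once the std::async future is gone.
void check_destructor_behaviour() {
  assert(async_future_blocks_on_destruction());
  assert(promise_future_does_not_block());
}
```

A named future is used here instead of a discarded temporary to avoid nodiscard warnings. The blocking is the same, it just happens at the end of the scope rather than at the end of the full expression.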

Using MSVC code as an example:

~_Task_async_state() noexcept override {
    _Wait();
}

void _Wait() override { // wait for completion
    _Task.wait();
}

void WaitUntilStateChangedTo(_TaskCollectionState _State)
{
    ::std::unique_lock<::std::mutex> _Lock(_M_Cs);

    while(_M_State < _State)
    {
        _M_StateChanged.wait(_Lock);
    }
}

When _Task_async_state is destroyed, it calls _Wait(), which calls _Task.wait() and ultimately reaches _M_StateChanged.wait(_Lock), the wait() function of a condition variable.

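The mechanism varies across implementations, although the observable blocking is the same. The excerpt below is the destructor of GCC's _Async_state_impl. LLVM detaches its thread, as shown earlier, so it has to wait on the shared state rather than join: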

~_Async_state_impl()
{
  if (_M_thread.joinable())
    _M_thread.join();
}

During destruction, it calls join() and therefore waits for the thread to finish executing.
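Because the destructor blocks, one common way to keep several tasks running concurrently is to store their futures and call get() only after all tasks have been launched. The sketch below shows that pattern; `sum_of_squares` and `check_sum_of_squares` are hypothetical names and the workload is a placeholder.

```cpp
#include <cassert>
#include <future>
#include <vector>

// Hypothetical helper: squares each input in its own task and sums the results.
int sum_of_squares(const std::vector<int>& values) {
  std::vector<std::future<int>> pending;
  pending.reserve(values.size());
  for (int v : values) {
    // Storing the future keeps it alive, so the loop does not block here.
    pending.push_back(std::async(std::launch::async, [v] { return v * v; }));
  }
  int total = 0;
  for (std::future<int>& f : pending) {
    total += f.get();  // blocks only until this particular task is done
  }
  return total;
}

// Self-check with a small input and with an empty input.
void check_sum_of_squares() {
  assert(sum_of_squares({1, 2, 3}) == 14);
  assert(sum_of_squares({}) == 0);
}
```

Had each future been discarded inside the first loop, every iteration would have waited for its task, and the tasks would have run one after another.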

Conclusion

std::async is a high-level thread abstraction tool in the C++ standard library that simplifies the implementation of asynchronous operations and makes code more concise. However, due to differences between compiler implementations, developers need to carefully consider these factors when using it to avoid potential issues. Special attention should be paid to thread_local variables and the returned std::future.

Implementations of std::async and how they might Affect Applications | Dmitry Danilov

functions | Microsoft Learn

std::async - cppreference.com

《Asynchronous Programming with C++》

Licensed under CC BY-NC-SA 4.0