• Solving thundering herd with glibc...

    From Chris M. Thomasson@3:633/280.2 to All on Fri Apr 25 06:39:27 2025
    Well, I don't have any more time to mess around with this, but is Bonita right? Does glibc 100% solve _all_ thundering herd problems? I know
    about wait morphing; however, it is not a 100% solution.

    Thanks.

    --- MBSE BBS v1.1.1 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bonita Montero@3:633/280.2 to All on Fri Apr 25 07:10:46 2025
    On 24.04.2025 at 22:39, Chris M. Thomasson wrote:
    Well, I don't have any more time to mess around with this, but is Bonita right? Does glibc 100% solve _all_ thundering herd problems? I know
    about wait morphing; however, it is not a 100% solution.

    With wait morphing, a thundering herd is impossible.
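
    (Wait morphing: when a thread signals a condition variable while it
    holds the associated mutex, the implementation can move the waiters
    from the condvar's wait queue directly onto the mutex's wait queue
    instead of waking them, so they are released one at a time as the
    mutex becomes free rather than all waking up to contend for it.)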


    --- MBSE BBS v1.1.1 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Chris M. Thomasson@3:633/280.2 to All on Fri Apr 25 07:33:40 2025
    On 4/24/2025 2:10 PM, Bonita Montero wrote:
    On 24.04.2025 at 22:39, Chris M. Thomasson wrote:
    Well, I don't have any more time to mess around with this, but is
    Bonita right? Does glibc 100% solve _all_ thundering herd problems? I
    know about wait morphing; however, it is not a 100% solution.

    With wait morphing, a thundering herd is impossible.


    "there’s no thundering herd, ever!" because a controlled test didn't
    "show it" is like saying race conditions do not exist because your code "worked fine this time."? Fair enough?

    --- MBSE BBS v1.1.1 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Scott Lurndal@3:633/280.2 to All on Fri Apr 25 07:35:30 2025
    Reply-To: slp53@pacbell.net

    Bonita Montero <Bonita.Montero@gmail.com> writes:
    On 24.04.2025 at 22:39, Chris M. Thomasson wrote:
    Well, I don't have any more time to mess around with this, but is Bonita
    right? Does glibc 100% solve _all_ thundering herd problems? I know
    about wait morphing; however, it is not a 100% solution.

    With wait morphing, a thundering herd is impossible.

    Wait morphing was removed from glibc in 2016, if I recall correctly.

    In any case, a programmer should never assume that the
    run-time system will support wait morphing when writing
    portable code. Release the mutex before signaling the
    condition variable and avoid pthread_cond_broadcast unless
    absolutely necessary.
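
    A minimal sketch of that pattern with std::condition_variable; the
    queue, the names, and the single-item handoff are illustrative, not
    from anyone's posted code:

    #include <condition_variable>
    #include <iostream>
    #include <mutex>
    #include <queue>
    #include <thread>

    std::mutex mtx;
    std::condition_variable cv;
    std::queue<int> work;   // protected by mtx

    void produce( int item )
    {
        {
            std::lock_guard<std::mutex> lock( mtx );
            work.push( item );
        }                   // release the mutex first ...
        cv.notify_one();    // ... then signal, waking only one waiter

    }

    int consume()
    {
        std::unique_lock<std::mutex> lock( mtx );
        cv.wait( lock, [] { return !work.empty(); } );
        int item = work.front();
        work.pop();
        return item;
    }

    int main()
    {
        std::thread t( [] { std::cout << consume() << '\n'; } );
        produce( 42 );
        t.join();
    }

    Without wait morphing, notifying while the mutex is still held can
    wake a waiter that immediately blocks again on the mutex; notifying
    after the unlock avoids that extra round trip, and notify_one avoids
    waking waiters that would find nothing to do.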

    --- MBSE BBS v1.1.1 (Linux-x86_64)
    * Origin: UsenetServer - www.usenetserver.com (3:633/280.2@fidonet)
  • From Bonita Montero@3:633/280.2 to All on Fri Apr 25 16:28:08 2025
    On 24.04.2025 at 23:35, Scott Lurndal wrote:

    Wait morphing was removed from glibc in 2016, if I recall correctly.

    I've shown with my first ping-pong code that there are no wakeups
    beyond one per loop iteration. Try to find the bug; there isn't one.

    --- MBSE BBS v1.1.1 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bonita Montero@3:633/280.2 to All on Fri Apr 25 18:37:04 2025
    On 24.04.2025 at 23:33, Chris M. Thomasson wrote:

    "there’s no thundering herd, ever!" because a controlled test didn't
    "show it" is like saying race conditions do not exist because your code "worked fine this time."? Fair enough?

    Yes, controlled test with 10'000 iterations.
    The code is correct and trivial, but too much for you.


    --- MBSE BBS v1.1.1 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bonita Montero@3:633/280.2 to All on Fri Apr 25 21:27:28 2025
    On 25.04.2025 at 10:37, Bonita Montero wrote:

    Yes, controlled test with 10'000 iterations.
    The code is correct and trivial, but too much for you.

    A thundering herd problem with a condvar should occur if I notify more
    than one thread and then unlock the mutex, but it actually doesn't
    happen; never. So the glibc condvar is optimal.


    --- MBSE BBS v1.1.1 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Chris M. Thomasson@3:633/280.2 to All on Sat Apr 26 06:01:21 2025
    On 4/25/2025 4:27 AM, Bonita Montero wrote:
    On 25.04.2025 at 10:37, Bonita Montero wrote:

    Yes, controlled test with 10'000 iterations.
    The code is correct and trivial, but too much for you.

    A thundering herd problem with a condvar should occur if I notify more
    than one thread and then unlock the mutex, but it actually doesn't
    happen; never. So the glibc condvar is optimal.


    If you say so... Too busy right now. Perhaps sometime later on tonight.

    --- MBSE BBS v1.1.1 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bonita Montero@3:633/280.2 to All on Sat Apr 26 16:26:21 2025
    On 25.04.2025 at 22:01, Chris M. Thomasson wrote:

    If you say so... Too busy right now. Perhaps sometime later on tonight.


    If there were a thundering herd problem with glibc's condvar, it
    would happen very often with my code, since I wake 31 threads at once
    on my machine.


    --- MBSE BBS v1.1.1 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bonita Montero@3:633/280.2 to All on Sat Apr 26 17:25:22 2025
    On 26.04.2025 at 08:26, Bonita Montero wrote:
    On 25.04.2025 at 22:01, Chris M. Thomasson wrote:

    If you say so... Too busy right now. Perhaps sometime later on tonight.


    If there were a thundering herd problem with glibc's condvar, it
    would happen very often with my code, since I wake 31 threads at once
    on my machine.

    I just tried waking all 31 threads from outside the mutex, i.e. after
    unlocking it, rather than from inside:

    for( size_t r = N; r; --r )
    {
        unique_lock lock( mtx );
        signalled = nClients;
        ai.store( nClients, memory_order_relaxed );
        lock.unlock();
        if( argc < 2 )
            cv.notify_all();
        else
            for( int c = nClients; c; cv.notify_one(), --c );
        bs.acquire();
    }

    The result: about 7,500 context switches per thread, not 3,000.

    10000 rounds,
    7498.06 context switches per thread

    So never signal a condvar to multiple threads from outside!
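
    For contrast, the "inside" variant, notifying before the unlock as
    in the earlier ping-pong test, presumably looked like this
    (reconstructed from the ordering in the full program posted later
    in this thread):

    for( size_t r = N; r; --r )
    {
        unique_lock lock( mtx );
        signalled = nClients;
        if( argc < 2 )
            cv.notify_all();    // notify while still holding the mutex
        else
            for( int c = nClients; c; cv.notify_one(), --c );
        ai.store( nClients, memory_order_relaxed );
        lock.unlock();
        bs.acquire();
    }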

    --- MBSE BBS v1.1.1 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Chris M. Thomasson@3:633/280.2 to All on Sun Apr 27 07:41:31 2025
    On 4/26/2025 12:25 AM, Bonita Montero wrote:
    On 26.04.2025 at 08:26, Bonita Montero wrote:
    On 25.04.2025 at 22:01, Chris M. Thomasson wrote:

    If you say so... Too busy right now. Perhaps sometime later on
    tonight.


    If there were a thundering herd problem with glibc's condvar, it
    would happen very often with my code, since I wake 31 threads at once
    on my machine.

    I just tried waking all 31 threads from outside the mutex, i.e. after
    unlocking it, rather than from inside:

    for( size_t r = N; r; --r )
    {
        unique_lock lock( mtx );
        signalled = nClients;
        ai.store( nClients, memory_order_relaxed );
        lock.unlock();
        if( argc < 2 )
            cv.notify_all();
        else
            for( int c = nClients; c; cv.notify_one(), --c );
        bs.acquire();
    }

    The result: about 7,500 context switches per thread, not 3,000.

    10000 rounds,
    7498.06 context switches per thread

    So never signal a condvar to multiple threads from outside!

    So, do that. It's your software. Do what you like. This is a very old
    debate. Take your contrived test and just roll with it. Whatever.

    --- MBSE BBS v1.1.1 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bonita Montero@3:633/280.2 to All on Sun Apr 27 13:18:02 2025
    On 26.04.2025 at 23:41, Chris M. Thomasson wrote:

    On 4/26/2025 12:25 AM, Bonita Montero wrote:


    I just tried waking all 31 threads from outside the mutex, i.e. after
    unlocking it, rather than from inside:

    for( size_t r = N; r; --r )
    {
        unique_lock lock( mtx );
        signalled = nClients;
        ai.store( nClients, memory_order_relaxed );
        lock.unlock();
        if( argc < 2 )
            cv.notify_all();
        else
            for( int c = nClients; c; cv.notify_one(), --c );
        bs.acquire();
    }

    The result: about 7,500 context switches per thread, not 3,000.

    10000 rounds,
    7498.06 context switches per thread

    So never signal a condvar to multiple threads from outside!

    So, do that. It's your software. Do what you like. This is a very old debate. Take your contrived test and just roll with it. Whatever.

    There's nothing to debate; I've measured it: if you wake a single
    thread, it doesn't matter whether you wake it from inside or outside,
    but if you wake multiple threads, waking them from inside is several
    times faster.


    --- MBSE BBS v1.1.1 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Chris M. Thomasson@3:633/280.2 to All on Wed Apr 30 05:26:41 2025
    On 4/26/2025 8:18 PM, Bonita Montero wrote:
    On 26.04.2025 at 23:41, Chris M. Thomasson wrote:

    On 4/26/2025 12:25 AM, Bonita Montero wrote:


    I just tried waking all 31 threads from outside the mutex, i.e. after
    unlocking it, rather than from inside:

    for( size_t r = N; r; --r )
    {
        unique_lock lock( mtx );
        signalled = nClients;
        ai.store( nClients, memory_order_relaxed );
        lock.unlock();
        if( argc < 2 )
            cv.notify_all();
        else
            for( int c = nClients; c; cv.notify_one(), --c );
        bs.acquire();
    }

    The result: about 7,500 context switches per thread, not 3,000.

    10000 rounds,
    7498.06 context switches per thread

    So never signal a condvar to multiple threads from outside!

    So, do that. It's your software. Do what you like. This is a very old
    debate. Take your contrived test and just roll with it. Whatever.

    There's nothing to debate; I've measured it: if you wake a single
    thread, it doesn't matter whether you wake it from inside or outside,
    but if you wake multiple threads, waking them from inside is several
    times faster.


    Yawn.

    --- MBSE BBS v1.1.1 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Vir Campestris@3:633/280.2 to All on Thu May 15 01:05:02 2025
    On 25/04/2025 09:37, Bonita Montero wrote:
    On 24.04.2025 at 23:33, Chris M. Thomasson wrote:

    "there’s no thundering herd, ever!" because a controlled test didn't
    "show it" is like saying race conditions do not exist because your
    code "worked fine this time."? Fair enough?

    Yes, controlled test with 10'000 iterations.
    The code is correct and trivial, but too much for you.

    Once upon a time I put a race in a bit of code.

    It took us 3 years to track down why our customers were reporting
    occasional faults :(

    (It turned out the trick to reproduce it was a combination of lots of
    CPU load and disk transfers not more than once every 30 seconds)

    --
    Do not listen to rumour, but, if you do, do not believe it.
    Gandhi.

    --- MBSE BBS v1.1.1 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Chris M. Thomasson@3:633/280.2 to All on Thu May 15 09:22:16 2025
    On 5/14/2025 8:05 AM, Vir Campestris wrote:
    On 25/04/2025 09:37, Bonita Montero wrote:
    On 24.04.2025 at 23:33, Chris M. Thomasson wrote:

    "there’s no thundering herd, ever!" because a controlled test didn't
    "show it" is like saying race conditions do not exist because your
    code "worked fine this time."? Fair enough?

    Yes, controlled test with 10'000 iterations.
    The code is correct and trivial, but too much for you.

    Once upon a time I put a race in a bit of code.

    It took us 3 years to track down why our customers were reporting
    occasional faults :(

    Oh shit. Damn. Did it "totally" deadlock once in a blue moon or just
    give out some bugged data to the customers? I hope it just deadlocked...
    Wow. Shit man. ;^o


    (It turned out the trick to reproduce it was a combination of lots of
    CPU load and disk transfers not more than once every 30 seconds)

    Yup. I have been there, done that with some of my tests. I even put a
    bug in on purpose just to see how long it would take to trigger it...
    Decades ago. ;^o Sigh.

    --- MBSE BBS v1.1.1 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Thu May 15 16:49:28 2025
    On 14/05/2025 17:05, Vir Campestris wrote:
    On 25/04/2025 09:37, Bonita Montero wrote:
    On 24.04.2025 at 23:33, Chris M. Thomasson wrote:

    "there’s no thundering herd, ever!" because a controlled test didn't
    "show it" is like saying race conditions do not exist because your
    code "worked fine this time."? Fair enough?

    Yes, controlled test with 10'000 iterations.
    The code is correct and trivial, but too much for you.

    Once upon a time I put a race in a bit of code.

    It took us 3 years to track down why our customers were reporting
    occasional faults :(

    (It turned out the trick to reproduce it was a combination of lots of
    CPU load and disk transfers not more than once every 30 seconds)


    It's always fun dealing with a bug that only triggers in rare timing situations. We once had a mistake in a timing table in a program that
    could sometimes result in intermittent faults in the system if
    particular events occurred at the same time, on the 30th of September.
    Finding an issue that occurred at most once per year was a challenge!


    --- MBSE BBS v1.1.1 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From candycanearter07@3:633/280.2 to All on Fri May 16 22:30:03 2025
    David Brown <david.brown@hesbynett.no> wrote at 06:49 this Thursday (GMT):
    On 14/05/2025 17:05, Vir Campestris wrote:
    On 25/04/2025 09:37, Bonita Montero wrote:
    On 24.04.2025 at 23:33, Chris M. Thomasson wrote:

    "there’s no thundering herd, ever!" because a controlled test didn't >>>> "show it" is like saying race conditions do not exist because your
    code "worked fine this time."? Fair enough?

    Yes, controlled test with 10'000 iterations.
    The code is correct and trivial, but too much for you.

    Once upon a time I put a race in a bit of code.

    It took us 3 years to track down why our customers were reporting
    occasional faults :(

    (It turned out the trick to reproduce it was a combination of lots of
    CPU load and disk transfers not more than once every 30 seconds)


    It's always fun dealing with a bug that only triggers in rare timing situations. We once had a mistake in a timing table in a program that
    could sometimes result in intermittent faults in the system if
    particular events occurred at the same time, on the 30th of September. Finding an issue that occurred at most once per year was a challenge!


    If you knew what date the issue was happening on, could you force the
    system clock to be on that day?
    --
    user <candycane> is generated from /dev/urandom

    --- MBSE BBS v1.1.1 (Linux-x86_64)
    * Origin: the-candyden-of-code (3:633/280.2@fidonet)
  • From Bonita Montero@3:633/280.2 to All on Fri May 16 23:24:17 2025
    On 14.05.2025 at 17:05, Vir Campestris wrote:

    It took us 3 years to track down why our customers were reporting
    occasional faults :(

    I guess that your code wasn't a 101-LOC test program but something
    with much more code.

    Here's the code where I stopped development:

    #if defined(_WIN32)
    #include <Windows.h>
    #endif
    #include <iostream>
    #include <thread>
    #include <mutex>
    #include <condition_variable>
    #include <atomic>
    #include <semaphore>
    #include <vector>
    #include <string_view>
    #include <cstdint>
    #if defined(__unix__)
    #include <sys/resource.h>
    #endif

    using namespace std;

    struct params   // declared but never used
    {
        params( unsigned argc, char **argv );
        bool outside, add, all;
    };

    int main( int argc, char **argv )
    {
        constexpr size_t N = 10'000;
        cout << N << " rounds" << endl;
        int hc = thread::hardware_concurrency(), nClients = hc - 1;
        int64_t tLast = 0;
        for( unsigned outside = 0; outside <= 1; ++outside )
        {
            cout << (outside ? "outside:" : "inside:") << endl;
            for( unsigned all = 0; all <= 1; ++all )
            {
                cout << (all ? "\tall:" : "\tone:") << endl;
                mutex mtx;
                int signalled = 0;
                condition_variable cv;
                atomic_int ai( 0 );
                binary_semaphore bs( false );
                vector<jthread> threads;
                atomic_int64_t nVoluntary( 0 );
                atomic_bool stop( false );  // set below, never read
                // client threads: wait to be signalled; the last one
                // through releases the semaphore for the main thread
                for( int c = nClients; c; --c )
                    threads.emplace_back( [&]
                    {
                        for( size_t r = N; r; --r )
                        {
                            unique_lock lock( mtx );
                            cv.wait( lock, [&] { return (bool)signalled; } );
                            --signalled;
                            lock.unlock();
                            if( ai.fetch_sub( 1, memory_order_relaxed ) == 1 )
                                bs.release( 1 );
                        }
    #if defined(__unix__)
                        rusage ru;
                        getrusage( RUSAGE_THREAD, &ru );
                        nVoluntary.fetch_add( ru.ru_nvcsw, memory_order_relaxed );
    #endif
                    } );
                for( size_t r = N; r; --r )
                {
                    auto notify = [&]
                    {
                        if( all )
                            cv.notify_all();
                        else
                            for( int c = nClients; c; cv.notify_one(), --c );
                    };
                    unique_lock lock( mtx );
                    signalled = nClients;
                    if( !outside )
                        notify();   // notify while holding the mutex
                    ai.store( nClients, memory_order_relaxed );
                    lock.unlock();
                    if( outside )
                        notify();   // notify after releasing the mutex
                    bs.acquire();
                }
                stop.store( true, memory_order_relaxed );
                threads.resize( 0 );    // join all jthreads
    #if defined(_WIN32)
                FILETIME ftDummy, ftKernel, ftUser;
                GetProcessTimes( GetCurrentProcess(), &ftDummy, &ftDummy,
                    &ftKernel, &ftUser );
                auto ftToU64 = []( FILETIME const &ft )
                    { return (uint64_t)ft.dwHighDateTime << 32 | ft.dwLowDateTime; };
                int64_t t = ftToU64( ftKernel ) + ftToU64( ftUser );
                cout << "\t\t" << (double)(t - tLast) / 1.0e7 << " seconds" << endl;
                tLast = t;
    #elif defined(__unix__)
                rusage ru;
                getrusage( RUSAGE_SELF, &ru );
                auto tvToU64 = []( timeval const &tv )
                    { return (uint64_t)tv.tv_sec * 1'000'000u + tv.tv_usec; };
                int64_t t = tvToU64( ru.ru_utime ) + tvToU64( ru.ru_stime );
                cout << "\t\t" << (double)nVoluntary.load( memory_order_relaxed ) /
                    nClients << " context switches per thread" << endl;
                cout << "\t\t" << (double)(t - tLast) / 1.0e6 << " seconds" << endl;
                tLast = t;
    #endif
            }
        }
    }
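
    For reference: this needs C++20 (jthread and <semaphore>), so on
    Linux something like

    g++ -std=c++20 -O2 -pthread cond_bench.cpp -o cond_bench

    should build it (the file name here is just a placeholder). Unlike
    the earlier fragment, which picked notify_one vs. notify_all from
    argc, this version iterates over all four inside/outside and one/all
    combinations itself.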

    --- MBSE BBS v1.1.1 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Fri May 16 23:33:10 2025
    On 16/05/2025 14:30, candycanearter07 wrote:
    David Brown <david.brown@hesbynett.no> wrote at 06:49 this Thursday (GMT):
    On 14/05/2025 17:05, Vir Campestris wrote:
    On 25/04/2025 09:37, Bonita Montero wrote:
    On 24.04.2025 at 23:33, Chris M. Thomasson wrote:

    "there’s no thundering herd, ever!" because a controlled test didn't >>>>> "show it" is like saying race conditions do not exist because your
    code "worked fine this time."? Fair enough?

    Yes, controlled test with 10'000 iterations.
    The code is correct and trivial, but too much for you.

    Once upon a time I put a race in a bit of code.

    It took us 3 years to track down why our customers were reporting
    occasional faults :(

    (It turned out the trick to reproduce it was a combination of lots of
    CPU load and disk transfers not more than once every 30 seconds)


    It's always fun dealing with a bug that only triggers in rare timing
    situations. We once had a mistake in a timing table in a program that
    could sometimes result in intermittent faults in the system if
    particular events occurred at the same time, on the 30th of September.
    Finding an issue that occurred at most once per year was a challenge!


    If you knew what date the issue was happening on, could you force the
    system clock to be on that day?

    Yes, once we figured out that the issue was date-dependent. For the
    first few years, all we knew was that we were getting occasional rare
    bug reports and no one saw the coincidence. (This was a program on DOS
    - changing the system clock was easy.)


    --- MBSE BBS v1.1.1 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From candycanearter07@3:633/280.2 to All on Mon May 19 05:20:05 2025
    David Brown <david.brown@hesbynett.no> wrote at 13:33 this Friday (GMT):
    On 16/05/2025 14:30, candycanearter07 wrote:
    David Brown <david.brown@hesbynett.no> wrote at 06:49 this Thursday (GMT):
    On 14/05/2025 17:05, Vir Campestris wrote:
    On 25/04/2025 09:37, Bonita Montero wrote:
    On 24.04.2025 at 23:33, Chris M. Thomasson wrote:

    "there’s no thundering herd, ever!" because a controlled test didn't >>>>>> "show it" is like saying race conditions do not exist because your >>>>>> code "worked fine this time."? Fair enough?

    Yes, controlled test with 10'000 iterations.
    The code is correct and trivial, but too much for you.

    Once upon a time I put a race in a bit of code.

    It took us 3 years to track down why our customers were reporting
    occasional faults :(

    (It turned out the trick to reproduce it was a combination of lots of
    CPU load and disk transfers not more than once every 30 seconds)


    It's always fun dealing with a bug that only triggers in rare timing
    situations. We once had a mistake in a timing table in a program that
    could sometimes result in intermittent faults in the system if
    particular events occurred at the same time, on the 30th of September.
    Finding an issue that occurred at most once per year was a challenge!


    If you knew what date the issue was happening on, could you force the
    system clock to be on that day?

    Yes, once we figured out that the issue was date-dependent. For the
    first few years, all we knew was that we were getting occasional rare
    bug reports and no one saw the coincidence. (This was a program on DOS
    - changing the system clock was easy.)


    Ah, fair enough.
    --
    user <candycane> is generated from /dev/urandom

    --- MBSE BBS v1.1.1 (Linux-x86_64)
    * Origin: the-candyden-of-code (3:633/280.2@fidonet)