• Re: signalling a condvar from inside vs. signalling a condvar von outs

    From Chris M. Thomasson@3:633/280.2 to All on Sun Apr 13 05:33:57 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    On 4/12/2025 8:23 AM, Bonita Montero wrote:
    [...]
    It nearly doesn't matter in terms of the number of context switches
    whether you signal a condvar from inside or outside. The above program,
    run on a Zen4 CPU with WSL2:

        inside: 20130
        outside: 19811

    On a 28 core Skylake-CPU with Ubuntu:

        inside: 19997
        outside: 19888


    There is a scalability problem wrt signalling inside the critical
    section. Does your condvar impl use wait morphing?

    --- MBSE BBS v1.1.1 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bonita Montero@3:633/280.2 to All on Mon Apr 14 01:38:19 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    Am 12.04.2025 um 21:33 schrieb Chris M. Thomasson:

    There is a scalability problem wrt signalling inside the critical
    section. Does your condvar impl use wait morphing?

    There's no scalability problem with that since the kernel call to
    release a thread happens only when the mutex is accessible *and*
    the cv is signalled.



  • From Chris M. Thomasson@3:633/280.2 to All on Mon Apr 14 05:32:45 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    On 4/13/2025 8:38 AM, Bonita Montero wrote:
    Am 12.04.2025 um 21:33 schrieb Chris M. Thomasson:

    There is a scalability problem wrt signalling inside the critical
    section. Does your condvar impl use wait morphing?

    There's no scalability problem with that since the kernel call to
    release a thread happens only when the mutex is accessible *and*
    the cv is signalled.



    No. When you signal a condvar while holding the lock it means that
    waiters can wake and just instantly wait on the lock. This is why wait morphing was created. It helps, but only so much...

  • From Bonita Montero@3:633/280.2 to All on Mon Apr 14 06:08:15 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    Am 13.04.2025 um 21:32 schrieb Chris M. Thomasson:
    On 4/13/2025 8:38 AM, Bonita Montero wrote:
    Am 12.04.2025 um 21:33 schrieb Chris M. Thomasson:

    There is a scalability problem wrt signalling inside the critical
    section. Does your condvar impl use wait morphing?

    There's no scalability problem with that since the kernel call to
    release a thread happens only when the mutex is accessible *and*
    the cv is signalled.



    No. ...

    The number of context switches my code shows says that there's only
    one context switch per wait.

    When you signal a condvar while holding the lock it means that waiters can wake and just instantly wait on the lock. This is why wait
    morphing was created. It helps, but only so much...


    idiot.


  • From Chris M. Thomasson@3:633/280.2 to All on Mon Apr 14 07:40:40 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    On 4/13/2025 1:08 PM, Bonita Montero wrote:
    Am 13.04.2025 um 21:32 schrieb Chris M. Thomasson:
    On 4/13/2025 8:38 AM, Bonita Montero wrote:
    Am 12.04.2025 um 21:33 schrieb Chris M. Thomasson:

    There is a scalability problem wrt signalling inside the critical
    section. Does your condvar impl use wait morphing?

    There's no scalability problem with that since the kernel call to
    release a thread happens only when the mutex is accessible *and*
    the cv is signalled.



    No. ...

    The number of context switches my code shows says that there's only
    one context switch per wait.

    Your code is hard to read. Sigh. Signalling while locked or unlocked was
    an old debate. Think of signalling while holding the lock. A thread gets
    woken and immediately sees that the lock is held. Oh well. Wait morphing
    can help with that. However, signal outside when you can...


    When you signal a condvar while holding the lock it means that
    waiters can wake and just instantly wait on the lock. This is why wait
    morphing was created. It helps, but only so much...


    idiot.


    Whatever.

  • From Bonita Montero@3:633/280.2 to All on Tue Apr 15 19:41:06 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    Am 13.04.2025 um 23:40 schrieb Chris M. Thomasson:

    Your code is hard to read. ...

    The code is beautiful.

    Signalling while locked or unlocked was an old debate. Think of signalling while holding the lock. A thread gets woken and immediately sees that the lock is held. Oh well. Wait morphing can help with that. However, signal outside when you can...

    The number of context switches determined via getrusage() is two per
    loop iteration, i.e. one switch to the thread and one switch from the
    thread; so everything is optimal with glibc.
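
    A minimal sketch of how such a count could be taken with getrusage().
    It is not the test program from this thread: the helper names are
    hypothetical, and RUSAGE_SELF (whole process) is an assumption, since
    the post does not say which scope was measured.

```cpp
#include <sys/resource.h>   // getrusage(), struct rusage (POSIX)
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper: total context switches (voluntary + involuntary)
// charged to the whole process so far, or -1 if getrusage() fails.
long context_switches()
{
    rusage ru{};
    if( getrusage( RUSAGE_SELF, &ru ) != 0 )
        return -1;
    return ru.ru_nvcsw + ru.ru_nivcsw;
}

// Hypothetical helper: context switches observed while fn() runs.
// Other threads of the process contribute to the count as well.
template<typename Fn>
long switches_during( Fn fn )
{
    long before = context_switches();
    fn();
    return context_switches() - before;
}
```

    Wrapping the wait/notify loop of each variant in switches_during() and
    dividing by the iteration count would give a per-iteration figure of
    the kind quoted above.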


  • From Chris M. Thomasson@3:633/280.2 to All on Wed Apr 16 05:07:23 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    On 4/15/2025 2:41 AM, Bonita Montero wrote:
    Am 13.04.2025 um 23:40 schrieb Chris M. Thomasson:

    Your code is hard to read. ...

    The code is beautiful.

    I can understand it, but, well, shit happens. I would not say it's beautiful... But, that is just me. Again, shit happens.


    Signalling while locked or unlocked was an old debate. Think of
    signalling while holding the lock. A thread gets woken and immediately
    sees that the lock is held. Oh well. Wait morphing can help with that.
    However, signal outside when you can...

    The number of context switches determined via getrusage() is two per
    loop iteration, i.e. one switch to the thread and one switch from the
    thread; so everything is optimal with glibc.

    In real applications there is generally more going on in those critical sections vs your test... Well, I have seen some horror shows in my life.

    Again, think of a scenario where the lock is held. The thread signals... Another thread wakes up and has to instantly block on a wait morphing
    queue in the kernel. This is not "ideal". A signal outside of the mutex
    can be beneficial. Signalling outside can give a signaled thread a
    possible fast-path into the critical section, completely eliminating the
    need for kernel interaction. Now, an adaptive mutex can try to help with
    this via limited spinning... However, try to signal outside when you can.
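
    The two placements under discussion can be sketched as follows. This
    is only an illustration with hypothetical names (flag_box, run_once);
    it shows where the notify call sits relative to the lock and makes no
    claim about which variant is faster on a given library.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical shared state for the illustration.
struct flag_box
{
    std::mutex mtx;
    std::condition_variable cv;
    bool ready = false;
};

// Signal while still holding the mutex ("inside").
void set_ready_inside( flag_box &b )
{
    std::lock_guard lock( b.mtx );
    b.ready = true;
    b.cv.notify_one();      // a woken waiter may find the mutex still held
}

// Signal after releasing the mutex ("outside").
void set_ready_outside( flag_box &b )
{
    {
        std::lock_guard lock( b.mtx );
        b.ready = true;
    }                       // mutex released here
    b.cv.notify_one();      // a woken waiter can usually take the mutex at once
}

// Waiter: the predicate makes it safe against spurious and early wakeups.
void wait_ready( flag_box &b )
{
    std::unique_lock lock( b.mtx );
    b.cv.wait( lock, [&] { return b.ready; } );
    assert( b.ready );
}

// Hypothetical driver: one waiter against the given signalling function.
template<typename Signal>
bool run_once( Signal signal )
{
    flag_box b;
    std::thread waiter( [&] { wait_ready( b ); } );
    signal( b );
    waiter.join();
    return b.ready;
}
```

    Both variants are correct because the flag is always changed under the
    mutex; they differ only in what the woken thread finds when it tries
    to reacquire the lock.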


  • From Bonita Montero@3:633/280.2 to All on Wed Apr 16 14:12:20 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    Am 15.04.2025 um 21:07 schrieb Chris M. Thomasson:

    In real applications there is generally more going on in those critical sections vs your test... Well, I have seen some horror shows in my life.

    Again, think of a scenario where the lock is held. The thread signals... Another thread wakes up and has to instantly block on a wait morphing
    queue in the kernel.

    Not with glibc.


  • From Chris M. Thomasson@3:633/280.2 to All on Thu Apr 17 08:59:09 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    On 4/15/2025 9:12 PM, Bonita Montero wrote:
    Am 15.04.2025 um 21:07 schrieb Chris M. Thomasson:

    In real applications there is generally more going on in those
    critical sections vs your test... Well, I have seen some horror shows
    in my life.

    Again, think of a scenario where the lock is held. The thread
    signals... Another thread wakes up and has to instantly block on a
    wait morphing queue in the kernel.

    Not with glibc.


    Sigh. I would need to see how glibc works internally. But that is
    besides the point. Try to signal/broadcast outside when possible.

  • From Bonita Montero@3:633/280.2 to All on Thu Apr 17 13:58:31 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    Am 17.04.2025 um 00:59 schrieb Chris M. Thomasson:

    Sigh. I would need to see how glibc works internally. But that is
    besides the point. Try to signal/broadcast outside when possible.

    As I've shown that's not necessary with glibc; the number of context
    switches and the CPU time is nearly the same for both cases.


  • From Chris M. Thomasson@3:633/280.2 to All on Thu Apr 17 15:26:21 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    On 4/16/2025 8:58 PM, Bonita Montero wrote:
    Am 17.04.2025 um 00:59 schrieb Chris M. Thomasson:

    Sigh. I would need to see how glibc works internally. But that is
    besides the point. Try to signal/broadcast outside when possible.

    As I've shown that's not necessary with glibc; the number of context
    switches and the CPU time is nearly the same for both cases.


    So, signal wherever you like! I don't care. I will signal outside when I
    can. That's that.

  • From Bonita Montero@3:633/280.2 to All on Thu Apr 17 21:20:40 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    Am 17.04.2025 um 07:26 schrieb Chris M. Thomasson:

    So, signal wherever you like! I don't care. I will signal outside when I can. That's that.

    Of course you can, but it doesn't matter if you signal from outside or
    inside.


  • From Chris M. Thomasson@3:633/280.2 to All on Fri Apr 18 03:51:58 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    On 4/17/2025 4:20 AM, Bonita Montero wrote:
    Am 17.04.2025 um 07:26 schrieb Chris M. Thomasson:

    So, signal wherever you like! I don't care. I will signal outside when
    I can. That's that.

    Of course you can, but it doesn't matter if you signal from outside or inside.


    The only thing I can say is that signalling, especially broadcasting,
    from the outside is ideal no matter what lib's you are using. If the lib
    has a very clever wait morph, so be it. We are talking about scaling mutexes, so, well, ugggh.

  • From Bonita Montero@3:633/280.2 to All on Fri Apr 18 15:56:19 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    Am 17.04.2025 um 19:51 schrieb Chris M. Thomasson:

    The only thing I can say is that signalling, especially broadcasting,
    from the outside is ideal no matter what lib's you are using. ..

    With broadcasting it also doesn't matter if you broadcast from inside
    or outside.


  • From Chris M. Thomasson@3:633/280.2 to All on Sat Apr 19 05:42:15 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    On 4/17/2025 10:56 PM, Bonita Montero wrote:
    Am 17.04.2025 um 19:51 schrieb Chris M. Thomasson:

    The only thing I can say is that signalling, especially broadcasting,
    from the outside is ideal no matter what lib's you are using. ..

    With broadcasting it also doesn't matter if you broadcast from inside
    or outside.


    We have to agree to disagree? Fair enough?

  • From Bonita Montero@3:633/280.2 to All on Sat Apr 19 06:05:09 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    Am 18.04.2025 um 21:42 schrieb Chris M. Thomasson:
    On 4/17/2025 10:56 PM, Bonita Montero wrote:
    Am 17.04.2025 um 19:51 schrieb Chris M. Thomasson:

    The only thing I can say is that signalling, especially broadcasting,
    from the outside is ideal no matter what lib's you are using. ..

    With broadcasting it also doesn't matter if you broadcast from inside
    or outside.


    We have to agree to disagree? Fair enough?

    But there's one interesting fact to learn at last: broadcasting is more
    efficient than unicasting. That's why I keep a counter of waiting
    threads in my thread-queue; depending on how many items I've inserted
    in one run, I either broadcast (N >= waiting) or unicast multiple
    times. An n-cast would be nice to have for that.

    template<typename Entity>
    template<std::forward_iterator ForwardIt>
    void thread_queue<Entity>::enqueue( ForwardIt &begin, ForwardIt end )
        requires std::convertible_to<std::iter_value_t<ForwardIt>, Entity>
    {
        using namespace std;
        // return if there's nothing to do
        if( begin == end ) [[unlikely]]
            return;
        lock_guard lock( m_mtx );
        // throw if there are emergencies
        if( m_producerEmergencies.size() ) [[unlikely]]
            throw thread_queue_emergency( *this, m_producerEmergencies );
        // number of items pushed so far
        size_t n = 0;
        // notify threads while unrolling
        defer notify( [&]
            {
                // no threads to be awakened ?
                if( !m_nWaiting )
                    // yes: nothing to do
                    return;
                // more items pushed than waiting ?
                if( n >= m_nWaiting )
                    // yes: notify them all (doesn't throw)
                    m_cv.notify_all();
                else
                    // no: notify n threads
                    do
                        m_cv.notify_one();
                    while( --n );
            } );
        do [[likely]]
        {
            // push and increment n
            m_queue.emplace_back( move( *begin ) );
            ++n;
        } while( ++begin != end );
    }

    defer is something like std::experimental::scope_exit.
    Usually more items are inserted than there are threads listening, so
    I can do a broadcast, which is much more efficient.
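
    A self-contained sketch of the same wake-up policy, for readers who
    want to experiment with it. It is a simplified, hypothetical stand-in
    (int_queue, consume_sum), not the thread_queue above: it has no
    emergency handling and no scope-exit helper, and it notifies while the
    mutex is held, as the posted code does. It also returns early when
    nothing was pushed, so the notify_one() loop never starts from a count
    of zero.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical, simplified queue of ints with a waiter counter.
class int_queue
{
public:
    // Push all items, then wake consumers: one broadcast if there are at
    // least as many new items as waiters, otherwise one signal per item.
    void enqueue( std::vector<int> const &items )
    {
        std::lock_guard lock( m_mtx );
        for( int v : items )
            m_queue.push_back( v );
        std::size_t n = items.size();
        if( !m_nWaiting || !n )
            return;                     // nobody to wake or nothing pushed
        if( n >= m_nWaiting )
            m_cv.notify_all();
        else
            for( ; n; --n )
                m_cv.notify_one();
    }

    // Block until an item is available and return it.
    int dequeue()
    {
        std::unique_lock lock( m_mtx );
        while( m_queue.empty() )
        {
            ++m_nWaiting;               // counted only while actually waiting
            m_cv.wait( lock );
            --m_nWaiting;
        }
        assert( !m_queue.empty() );
        int v = m_queue.front();
        m_queue.pop_front();
        return v;
    }

private:
    std::mutex m_mtx;
    std::condition_variable m_cv;
    std::deque<int> m_queue;
    std::size_t m_nWaiting = 0;
};

// Hypothetical driver: `consumers` threads each take one item; returns the sum.
long consume_sum( int_queue &q, std::size_t consumers )
{
    std::vector<int> got( consumers );
    std::vector<std::thread> threads;
    for( std::size_t i = 0; i != consumers; ++i )
        threads.emplace_back( [&q, &got, i] { got[i] = q.dequeue(); } );
    for( auto &t : threads )
        t.join();
    long sum = 0;
    for( int v : got )
        sum += v;
    return sum;
}
```

    Whether the broadcast branch is actually cheaper than repeated
    notify_one() calls depends on the condvar implementation and is the
    point being argued in this thread; the sketch only fixes the policy
    so that it can be measured.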


  • From Chris M. Thomasson@3:633/280.2 to All on Sat Apr 19 19:25:05 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    On 4/18/2025 1:05 PM, Bonita Montero wrote:
    Am 18.04.2025 um 21:42 schrieb Chris M. Thomasson:
    On 4/17/2025 10:56 PM, Bonita Montero wrote:
    Am 17.04.2025 um 19:51 schrieb Chris M. Thomasson:

    The only thing I can say is that signalling, especially
    broadcasting, from the outside is ideal no matter what lib's you are
    using. ..

    With broadcasting it also doesn't matter if you broadcast from inside
    or outside.


    We have to agree to disagree? Fair enough?

    But there's one interesting fact to learn at last: broadcasting is more efficient than unicasting.

    Ugggg... Only broadcast when you absolutely have to! Not willy nilly.
    Argh! Anyway...

    [...]


  • From Bonita Montero@3:633/280.2 to All on Sat Apr 19 19:29:39 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    Am 19.04.2025 um 11:25 schrieb Chris M. Thomasson:

    But there's one interesting fact to learn at last: broadcasting is more
    efficient than unicasting.

    Ugggg... Only broadcast when you absolutely have to! Not willy nilly.
    Argh! Anyway...

    That's not true. As you can see from my source, I'm broadcasting when
    there are at least as many elements as waiting threads. That's much
    more efficient.
    And I didn't mean to say that broadcasting should be preferred in
    general, but I've measured on Windows and Linux that when broadcasting
    is eligible in the way mentioned, it's more efficient even when there's
    only a single waiting thread.
    You would have dropped your objection if you had read my source first.


  • From Chris M. Thomasson@3:633/280.2 to All on Sat Apr 19 20:10:10 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    On 4/19/2025 2:29 AM, Bonita Montero wrote:
    Am 19.04.2025 um 11:25 schrieb Chris M. Thomasson:

    But there's one interesting fact to learn at last: broadcasting is more
    efficient than unicasting.

    Ugggg... Only broadcast when you absolutely have to! Not willy nilly.
    Argh! Anyway...

    That's not true.

    Uggg... A broadcast is a special case. Well, when would you use a
    broadcast vs a single signal? Ugggg...



    As you can see from my source, I'm broadcasting when
    there are at least as many elements as waiting threads. That's much
    more efficient.
    And I didn't mean to say that broadcasting should be preferred in
    general, but I've measured on Windows and Linux that when broadcasting
    is eligible in the way mentioned, it's more efficient even when there's
    only a single waiting thread.
    You would have dropped your objection if you had read my source first.



  • From Chris M. Thomasson@3:633/280.2 to All on Sat Apr 19 20:12:49 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    On 4/19/2025 3:10 AM, Chris M. Thomasson wrote:
    On 4/19/2025 2:29 AM, Bonita Montero wrote:
    Am 19.04.2025 um 11:25 schrieb Chris M. Thomasson:

    But there's one interesting fact to learn at last: broadcasting is more
    efficient than unicasting.

    Ugggg... Only broadcast when you absolutely have to! Not willy nilly.
    Argh! Anyway...

    That's not true.

    Uggg... A broadcast is a special case. Well, when would you use a
    broadcast vs a single signal? Ugggg...



    As you can see from my source, I'm broadcasting when
    there are at least as many elements as waiting threads. That's much
    more efficient.
    And I didn't mean to say that broadcasting should be preferred in
    general, but I've measured on Windows and Linux that when broadcasting
    is eligible in the way mentioned, it's more efficient even when there's
    only a single waiting thread.
    You would have dropped your objection if you had read my source first.



    I have had to debug horror shows, where it mostly seems to work if we
    use a broadcast. Trying to cover up a race in some sense. Ahh shit....

  • From Bonita Montero@3:633/280.2 to All on Sat Apr 19 22:09:50 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    Am 19.04.2025 um 12:10 schrieb Chris M. Thomasson:

    Uggg... A broadcast is a special case. Well, when would you use a
    broadcast vs a single signal? Ugggg...

    If you have more items in the queue than there are waiting threads
    a broadcast is much more efficient. I suspect you have ADHD or a
    similar mental disorder that leads to hasty conclusions

  • From Chris M. Thomasson@3:633/280.2 to All on Sun Apr 20 00:34:14 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    On 4/19/2025 5:09 AM, Bonita Montero wrote:
    Am 19.04.2025 um 12:10 schrieb Chris M. Thomasson:

    Uggg... A broadcast is a special case. Well, when would you use a
    broadcast vs a single signal? Ugggg...

    If you have more items in the queue than there are waiting threads
    a broadcast is much more efficient. I suspect you have ADHD or a
    similar mental disorder that leads to hasty conclusions


    Depends on the nature of the queue, and the nature of the wakes. A
    broadcast should be only when you _really_ need it!

  • From Chris M. Thomasson@3:633/280.2 to All on Sun Apr 20 00:35:34 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    On 4/19/2025 7:34 AM, Chris M. Thomasson wrote:
    On 4/19/2025 5:09 AM, Bonita Montero wrote:
    Am 19.04.2025 um 12:10 schrieb Chris M. Thomasson:

    Uggg... A broadcast is a special case. Well, when would you use a
    broadcast vs a single signal? Ugggg...

    If you have more items in the queue than there are waiting threads
    a broadcast is much more efficient. I suspect you have ADHD or a
    similar mental disorder that leads to hasty conclusions


    Depends on the nature of the queue, and the nature of the wakes. A
    broadcast should be only when you _really_ need it!

    why broadcast inside of a locked mutex? all of those threads are going
    to fight, and/or go into the morph wrt the kernel. Yawn.


  • From Bonita Montero@3:633/280.2 to All on Sun Apr 20 01:24:26 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    Am 19.04.2025 um 16:34 schrieb Chris M. Thomasson:

    Depends on the nature of the queue, and the nature of the wakes.
    A broadcast should be only when you _really_ need it!

    You've got a problem with hasty conclusions.
    If you wake all waiting threads at once, this is more efficient
    than waking them individually. And even if you have a single
    waiting thread, a notify_all() is more efficient on Linux and
    Windows.


  • From Bonita Montero@3:633/280.2 to All on Sun Apr 20 01:25:46 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    Am 19.04.2025 um 16:55 schrieb Scott Lurndal:

    That fails on logical grounds, and is very dependent upon how
    many waiters exist at the time of the broadcast and how many
    processing elements are available.

    I do that only if the number of waiters is equal or smaller than
    the number of enqueued items.


  • From Chris M. Thomasson@3:633/280.2 to All on Sun Apr 20 12:33:49 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    On 4/19/2025 8:24 AM, Bonita Montero wrote:
    Am 19.04.2025 um 16:34 schrieb Chris M. Thomasson:

    Depends on the nature of the queue, and the nature of the wakes.
    A broadcast should be only when you _really_ need it!

    You've got a problem with hasty conclusions.
    If you wake all waiting threads at once, this is more efficient
    than waking them individually. And even if you have a single
    waiting thread, a notify_all() is more efficient on Linux and
    Windows.


    Yawn.

  • From Bonita Montero@3:633/280.2 to All on Sun Apr 20 14:19:05 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    Am 20.04.2025 um 00:59 schrieb Scott Lurndal:


    N-1 of the waiters will attempt to acquire the mutex and fail; ...

    With glibc only one thread is scheduled.

    causing unnecessary coherency traffic and unnecessary context
    switches.

    Silly.


  • From Bonita Montero@3:633/280.2 to All on Sun Apr 20 14:29:25 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    That's what I do after pushing n items to the queue:

        if( !m_nWaiting )
            return;
        if( n >= m_nWaiting )
            m_cv.notify_all();
        else
            do
                m_cv.notify_one();
            while( --n );


    What's more efficient, notifying all threads individually or
    notifying all threads at once ?

  • From Chris M. Thomasson@3:633/280.2 to All on Sun Apr 20 15:43:44 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    On 4/19/2025 9:29 PM, Bonita Montero wrote:
    That's what I do after pushing n items to the queue:

        if( !m_nWaiting )
            return;
        if( n >= m_nWaiting )
            m_cv.notify_all();
        else
            do
                m_cv.notify_one();
            while( --n );


    What's more efficient, notifying all threads individually or
    notifying all threads at once ?

    again, only broadcast when you have to! wow.

  • From Bonita Montero@3:633/280.2 to All on Sun Apr 20 15:45:58 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    Am 20.04.2025 um 07:43 schrieb Chris M. Thomasson:
    On 4/19/2025 9:29 PM, Bonita Montero wrote:
    That's what I do after pushing n items to the queue:

        if( !m_nWaiting )
            return;
        if( n >= m_nWaiting )
            m_cv.notify_all();
        else
            do
                m_cv.notify_one();
            while( --n );


    What's more efficient, notifying all threads individually or
    notifying all threads at once ?

    again, only broadcast when you have to! wow.

    You're simply silly. If I have more items in the queue
    than waiting threads a broadcast is more efficient.

  • From Chris M. Thomasson@3:633/280.2 to All on Sun Apr 20 15:49:56 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    On 4/19/2025 10:45 PM, Bonita Montero wrote:
    Am 20.04.2025 um 07:43 schrieb Chris M. Thomasson:
    On 4/19/2025 9:29 PM, Bonita Montero wrote:
    That's what I do after pushing n items to the queue:

        if( !m_nWaiting )
            return;
        if( n >= m_nWaiting )
            m_cv.notify_all();
        else
            do
                m_cv.notify_one();
            while( --n );


    What's more efficient, notifying all threads individually or
    notifying all threads at once ?

    again, only broadcast when you have to! wow.

    You're simply silly. If I have more items in the queue
    than waiting threads a broadcast is more efficient.

    Barf! Are you daft? Only broadcast when you need to and well, try to
    strive to do it outside of the locked region, when you can. wow. Daft Punk?

  • From Bonita Montero@3:633/280.2 to All on Sun Apr 20 17:07:28 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    Am 20.04.2025 um 07:49 schrieb Chris M. Thomasson:

    Barf! Are you daft? Only broadcast when you need to and well, try to
    strive to do it outside of the locked region, when you can. wow. Daft Punk?

    If I have just inserted N items in the queue and I have M waiting
    threads and N >= M, a broadcast is more efficient since you have
    only one wakeup call and not N.
    This doesn't lead to more context switches or coherency traffic,
    as Scott claimed.
    I measured the number of context switches and the overall CPU
    time on Linux, but you are only talking about things which are
    not thought through to the end.



  • From Chris M. Thomasson@3:633/280.2 to All on Mon Apr 21 07:17:15 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    On 4/20/2025 12:07 AM, Bonita Montero wrote:
    Am 20.04.2025 um 07:49 schrieb Chris M. Thomasson:

    Barf! Are you daft? Only broadcast when you need to and well, try to
    strive to do it outside of the locked region, when you can. wow. Daft
    Punk?

    If I have just inserted N items in the queue and I have M waiting
    threads and N >= M, a broadcast is more efficient since you have
    only one wakeup call and not N.
    This doesn't lead to more context switches or coherency traffic,
    as Scott claimed.
    I measured the number of context switches and the overall CPU
    time on Linux, but you are only talking about things which are
    not thought through to the end.



    Like I said, only broadcast when you have to! Simple.

  • From Chris M. Thomasson@3:633/280.2 to All on Mon Apr 21 07:18:50 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    On 4/19/2025 3:59 PM, Scott Lurndal wrote:
    Bonita Montero <Bonita.Montero@gmail.com> writes:
    Am 19.04.2025 um 16:55 schrieb Scott Lurndal:

    That fails on logical grounds, and is very dependent upon how
    many waiters exist at the time of the broadcast and how many
    processing elements are available.

    I do that only if the number of waiters is equal or smaller than
    the number of enqueued items.


    Doesn't matter. With broadcast, they're all scheduled and have
    you heard the term 'thundering herd'?

    Perhaps he does not know about it? Not totally sure.



    N-1 of the waiters will attempt to acquire the mutex and fail;
    causing unnecessary coherency traffic and unnecessary context
    switches.


  • From Bonita Montero@3:633/280.2 to All on Mon Apr 21 14:55:05 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    Am 20.04.2025 um 23:18 schrieb Chris M. Thomasson:

    Doesn't matter. With broadcast, they're all scheduled and have
    you heard the term 'thundering herd'?

    Perhaps he does not know about it? Not totally sure.

    There's no thundering herd with current condvar implementations.


  • From Chris M. Thomasson@3:633/280.2 to All on Tue Apr 22 10:28:43 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    On 4/20/2025 9:55 PM, Bonita Montero wrote:
    Am 20.04.2025 um 23:18 schrieb Chris M. Thomasson:

    Doesn't matter. With broadcast, they're all scheduled and have
    you heard the term 'thundering herd'?

    Perhaps he does not know about it? Not totally sure.

    There's no thundering herd with current condvar implementations.


    That makes no sense? Wait Morphing Doesn't Guarantee Elimination of the Thundering Herd...

  • From Bonita Montero@3:633/280.2 to All on Tue Apr 22 15:16:38 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    On 22.04.2025 at 02:28, Chris M. Thomasson wrote:

    That makes no sense? Wait Morphing Doesn't Guarantee Elimination of the Thundering Herd...

    Of course wait morphing helps here because with WM only one thread
    sees the mutex unlocked and not n threads.

  • From Chris M. Thomasson@3:633/280.2 to All on Wed Apr 23 07:36:44 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    On 4/21/2025 10:16 PM, Bonita Montero wrote:
    On 22.04.2025 at 02:28, Chris M. Thomasson wrote:

    That makes no sense? Wait Morphing Doesn't Guarantee Elimination of
    the Thundering Herd...

    Of course wait morphing helps here because with WM only one thread
    sees the mutex unlocked and not n threads.

    It was created to try to _help_ with thundering herd. There can be
    some interesting issues... Think of pushing 12 items into the queue,
    broadcasting inside the mutex, and then 42 threads going onto the
    morph list. Depending on the implementation, any new threads will not
    be able to acquire the mutex until those 42 threads are out of the
    morph. This can be bad for several reasons... If a thread is in the
    morph, it's in line waiting for the mutex... So, it's a bit screwed
    and can't do anything else. If another thread can acquire the mutex
    before the morph is done, then it's basically thundering herd all
    over again. Fwiw, trying to scale mutexes is sort of odd anyway. We
    can make the best mutex ever, but they have trouble scaling.

    Now, there is an interesting usage pattern I made a while back wrt
    letting a thread fall back to doing something else instead of waiting
    on a locked mutex. I posted some info about it before. It's akin to an
    adaptive mutex; however, instead of spinning, we can choose to do other
    work, if it's available...
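
    A minimal sketch of what such a fallback could look like, assuming nothing about the earlier posting that is referred to. lock_or_work and its callback parameters are hypothetical names. The idea shown is only this: try the mutex, and while it is busy run a unit of other work instead of spinning or blocking.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical helper: runs `critical` under `mtx`. While the mutex is busy,
// it calls `other_work` instead of blocking. `other_work` returns false when
// there is nothing else to do, in which case we block on the mutex after all.
// Returns how many units of other work were done before the lock was taken.
template <class Critical, class Other>
int lock_or_work(std::mutex& mtx, Critical critical, Other other_work)
{
    int fallbacks = 0;
    while (!mtx.try_lock()) {
        if (!other_work()) {
            mtx.lock(); // no other work available: wait like a plain mutex
            break;
        }
        ++fallbacks;
    }
    // The mutex is owned here on both paths; adopt it for exception safety.
    std::lock_guard<std::mutex> guard(mtx, std::adopt_lock);
    critical();
    return fallbacks;
}
```

    The design choice is that the caller decides what "other work" means; a work-stealing queue or a batch of deferred frees would be typical candidates.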

  • From Bonita Montero@3:633/280.2 to All on Wed Apr 23 15:13:13 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    On 22.04.2025 at 23:36, Chris M. Thomasson wrote:

    It was created to try to _help_ with thundering herd. There can be
    some interesting issues... Think of pushing 12 items into the queue,
    broadcasting inside the mutex, and then 42 threads going onto the
    morph list. Depending on the implementation, any new threads will not
    be able to acquire the mutex until those 42 threads are out of the
    morph. This can be bad for several reasons... If a thread is in the
    morph, it's in line waiting for the mutex... So, it's a bit screwed
    and can't do anything else. If another thread can acquire the mutex
    before the morph is done, then it's basically thundering herd all
    over again. Fwiw, trying to scale mutexes is sort of odd anyway. We
    can make the best mutex ever, but they have trouble scaling.

    Do you have any source for your assumptions?


  • From Bonita Montero@3:633/280.2 to All on Wed Apr 23 15:46:34 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    Now I wrote a little program to test whether there's a thundering herd
    problem with glibc's mutex / condvar. Here it is:

    #include <iostream>
    #include <thread>
    #include <mutex>
    #include <condition_variable>
    #include <atomic>
    #include <semaphore>
    #include <vector>
    #include <sys/resource.h>

    using namespace std;

    int main()
    {
        constexpr size_t N = 10'000;
        // one client per hardware thread, minus one for the main thread
        int nClients = thread::hardware_concurrency() - 1;
        mutex mtx;
        int signalled = 0; // wake-up tokens, guarded by mtx
        condition_variable cv;
        atomic_int ai( 0 ); // tokens not yet consumed in the current round
        binary_semaphore bs( false );
        vector<jthread> clients;
        atomic_int64_t nVoluntary( 0 );
        for( int c = nClients; c; --c )
            clients.emplace_back( [&]
                {
                    for( size_t r = N; r; --r )
                    {
                        unique_lock lock( mtx );
                        cv.wait( lock, [&] { return (bool)signalled; } );
                        --signalled;
                        lock.unlock();
                        // whoever consumes the last token wakes the main thread
                        if( ai.fetch_sub( 1, memory_order_relaxed ) == 1 )
                            bs.release( 1 );
                    }
                    // add this thread's voluntary context switches to the total
                    rusage ru;
                    getrusage( RUSAGE_THREAD, &ru );
                    nVoluntary.fetch_add( ru.ru_nvcsw, memory_order_relaxed );
                } );
        for( size_t r = N; r; --r )
        {
            unique_lock lock( mtx );
            signalled = nClients;
            cv.notify_all();
            ai.store( nClients, memory_order_relaxed );
            lock.unlock();
            bs.acquire();
        }
        clients.resize( 0 ); // joins all jthreads
        cout << N << " rounds," << endl;
        cout << (double)nVoluntary.load( memory_order_relaxed ) / nClients
            << " context switches per thread" << endl;
    }

    It spawns one fewer thread than there are hardware threads. These
    all wait on a condvar and a counter that is set to the number of
    threads in each round and that must be > 0 for the wait to succeed.
    This counter is decremented by each thread. Then the threads decrement
    an atomic, and when it becomes zero the last thread raises a semaphore,
    thereby waking up the main thread.
    These are the results for 10'000 rounds on a 32-thread machine:

        10000 rounds,
        2777.29 context switches per thread

    So there are fewer context switches than rounds, and there's no
    thundering herd with glibc.
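
    The program above sums only ru_nvcsw, the voluntary context switches. A small sketch, assuming Linux (RUSAGE_THREAD is a Linux extension), that reads both the voluntary and the involuntary counter of the calling thread. The Switches struct and the thread_switches function are hypothetical names, not part of the posted program.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE // RUSAGE_THREAD is only declared with this on glibc
#endif
#include <cassert>
#include <sys/resource.h>

// Hypothetical helper type: context-switch counters of one thread.
struct Switches {
    long voluntary;   // ru_nvcsw: the thread blocked or yielded by itself
    long involuntary; // ru_nivcsw: the thread was preempted
};

// Reads the counters of the calling thread; returns {-1, -1} on failure.
Switches thread_switches()
{
    rusage ru{};
    if (getrusage(RUSAGE_THREAD, &ru) != 0)
        return {-1, -1};
    return {ru.ru_nvcsw, ru.ru_nivcsw};
}
```

    Sampling thread_switches() at the start and end of each client thread and reporting both differences would show whether woken threads are also being preempted, which the voluntary counter alone does not reveal.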


  • From Chris M. Thomasson@3:633/280.2 to All on Thu Apr 24 05:08:54 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    On 4/22/2025 10:13 PM, Bonita Montero wrote:
    On 22.04.2025 at 23:36, Chris M. Thomasson wrote:

    It was created to try to _help_ with thundering herd. There can be
    some interesting issues... Think of pushing 12 items into the queue,
    broadcasting inside the mutex, and then 42 threads going onto the
    morph list. Depending on the implementation, any new threads will not
    be able to acquire the mutex until those 42 threads are out of the
    morph. This can be bad for several reasons... If a thread is in the
    morph, it's in line waiting for the mutex... So, it's a bit screwed
    and can't do anything else. If another thread can acquire the mutex
    before the morph is done, then it's basically thundering herd all
    over again. Fwiw, trying to scale mutexes is sort of odd anyway. We
    can make the best mutex ever, but they have trouble scaling.

    Do you have any source for your assumptions?


    For trying to scale mutexes? Look up clever mutex solutions vs, say,
    RCU. They bite the dust.

  • From Bonita Montero@3:633/280.2 to All on Thu Apr 24 14:30:09 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    On 23.04.2025 at 21:08, Chris M. Thomasson wrote:
    On 4/22/2025 10:13 PM, Bonita Montero wrote:
    On 22.04.2025 at 23:36, Chris M. Thomasson wrote:

    It was created to try to _help_ with thundering herd. There can be
    some interesting issues... Think of pushing 12 items into the queue,
    broadcasting inside the mutex, and then 42 threads going onto the
    morph list. Depending on the implementation, any new threads will not
    be able to acquire the mutex until those 42 threads are out of the
    morph. This can be bad for several reasons... If a thread is in the
    morph, it's in line waiting for the mutex... So, it's a bit screwed
    and can't do anything else. If another thread can acquire the mutex
    before the morph is done, then it's basically thundering herd all
    over again. Fwiw, trying to scale mutexes is sort of odd anyway. We
    can make the best mutex ever, but they have trouble scaling.

    Do you have any source for your assumptions?


    For trying to scale mutexes? Look up clever mutex solutions vs, say,
    RCU. They bite the dust.

    You make random assumptions. I've measured that there's no
    thundering herd problem with condvars and glibc; look at the
    parallel posting in this thread.

  • From Bonita Montero@3:633/280.2 to All on Fri Apr 25 03:02:52 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    On 23.04.2025 at 21:08, Chris M. Thomasson wrote:

    For trying to scale mutexes? Look up clever mutex solutions vs, say,
    RCU. They bite the dust.

    A mutex and a condvar are for producer-consumer relationships;
    there's no way to handle that with RCU.
    And I've written a shared_obj<> class, which is similar to shared_ptr<>,
    and a tshared_obj<>, which is similar to an atomic<shared_ptr<>>. The
    latter uses a mutex, but the pointer to the actual object is an atomic
    pointer. Before doing any locking while assigning a tshared_obj<> to
    a shared_obj<>, I simply compare the pointer in the shared_obj<> with
    the pointer in the tshared_obj<>, where the latter is loaded lazily
    with memory_order_relaxed. As with RCU-like patterns, the central
    tshared_obj<> is rarely updated but frequently compared in the
    mentioned way. Only when the compare fails and the central tshared_obj<>
    has come to point to a different object is the mutex locked. The most
    likely case, that both pointers are equal, takes only 1.5 nanoseconds on
    my computer; no need for tricks like URCU, and I guess my solution is
    more efficient since the update is just several instructions and all
    participating cachelines stay in shared mode across all cores, so
    there's almost no interconnect traffic.
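
    A minimal sketch of the fast path described above, assuming std::shared_ptr in place of the author's classes, whose source is not shown in this thread. CentralPtr stands in for tshared_obj<> and a plain shared_ptr for shared_obj<>; both the class name and its members are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical stand-in for the central, rarely updated object holder.
template <class T>
class CentralPtr {
public:
    // Writer side: rare, always takes the mutex.
    void store(std::shared_ptr<T> p)
    {
        std::lock_guard<std::mutex> guard(mtx_);
        raw_.store(p.get(), std::memory_order_relaxed);
        owner_ = std::move(p);
    }

    // Reader side: brings `local` up to date and returns true if it changed.
    // The common case is one relaxed load and one compare, with no locking.
    // Comparing addresses is safe because `local` keeps its object alive,
    // so no different object can sit at that address in the meantime.
    bool refresh(std::shared_ptr<T>& local)
    {
        if (raw_.load(std::memory_order_relaxed) == local.get())
            return false;
        std::shared_ptr<T> fresh;
        {
            std::lock_guard<std::mutex> guard(mtx_);
            fresh = owner_; // the mutex orders this copy after the store
        }
        local = std::move(fresh); // old object is released outside the lock
        return true;
    }

private:
    std::mutex mtx_;
    std::shared_ptr<T> owner_;     // guarded by mtx_
    std::atomic<T*> raw_{nullptr}; // mirror of owner_.get() for the fast path
};
```

    Readers call refresh() on their own local copy before each use. As long as the central pointer is unchanged, the cache line holding raw_ is only read, which is the property the post relies on.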


  • From Chris M. Thomasson@3:633/280.2 to All on Fri Apr 25 05:59:17 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    On 4/24/2025 10:02 AM, Bonita Montero wrote:
    On 23.04.2025 at 21:08, Chris M. Thomasson wrote:

    For trying to scale mutexes? Look up clever mutex solutions vs, say,
    RCU. They bite the dust.

    A mutex and a condvar is for producer-consumer-relationships;
    there's no way to handle that with RCU.

    Are you 100% sure about that?


    And I've written a shared_obj<> class, which is similar to shared_ptr<>,
    and a tshared_obj<>, which is similar to an atomic<shared_ptr<>>. The
    latter uses a mutex, but the pointer to the actual object is an atomic
    pointer. Before doing any locking while assigning a tshared_obj<> to
    a shared_obj<>, I simply compare the pointer in the shared_obj<> with
    the pointer in the tshared_obj<>, where the latter is loaded lazily
    with memory_order_relaxed. As with RCU-like patterns, the central
    tshared_obj<> is rarely updated but frequently compared in the
    mentioned way. Only when the compare fails and the central tshared_obj<>
    has come to point to a different object is the mutex locked. The most
    likely case, that both pointers are equal, takes only 1.5 nanoseconds on
    my computer; no need for tricks like URCU, and I guess my solution is
    more efficient since the update is just several instructions and all
    participating cachelines stay in shared mode across all cores, so
    there's almost no interconnect traffic.


    Are you delusional? Well, you did say "I guess". So, perhaps not.

    Are you familiar with differential counting? No mutex involved. Btw, if
    you think that URCU needs to be reference counted in any way, you are
    wrong. I have no time right now to get into it, but shit happens.

    Sigh.

  • From Chris M. Thomasson@3:633/280.2 to All on Fri Apr 25 06:00:51 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    On 4/23/2025 9:30 PM, Bonita Montero wrote:
    On 23.04.2025 at 21:08, Chris M. Thomasson wrote:
    On 4/22/2025 10:13 PM, Bonita Montero wrote:
    On 22.04.2025 at 23:36, Chris M. Thomasson wrote:

    It was created to try to _help_ with thundering herd. There can be
    some interesting issues... Think of pushing 12 items into the queue,
    broadcasting inside the mutex, and then 42 threads going onto the
    morph list. Depending on the implementation, any new threads will not
    be able to acquire the mutex until those 42 threads are out of the
    morph. This can be bad for several reasons... If a thread is in the
    morph, it's in line waiting for the mutex... So, it's a bit screwed
    and can't do anything else. If another thread can acquire the mutex
    before the morph is done, then it's basically thundering herd all
    over again. Fwiw, trying to scale mutexes is sort of odd anyway. We
    can make the best mutex ever, but they have trouble scaling.

    Do you have any source for your assumptions?


    For trying to scale mutexes? Look up clever mutex solutions vs, say,
    RCU. They bite the dust.

    You make random assumptions.

    LOL! You are projecting yourself on me now.


    I've measured that there's not
    a thundering herd problem with condvars and glibc; look at the
    parallel posting in this thread.

    Whatever you say man, whatever you say. wow.

  • From Chris M. Thomasson@3:633/280.2 to All on Fri Apr 25 06:05:08 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    On 4/22/2025 10:46 PM, Bonita Montero wrote:
    Now I wrote a little program to test whether there's a thundering herd
    problem with glibc's mutex / condvar. Here it is:

    #include <iostream>
    #include <thread>
    #include <mutex>
    #include <condition_variable>
    #include <atomic>
    #include <semaphore>
    #include <vector>
    #include <sys/resource.h>

    using namespace std;

    int main()
    {
        constexpr size_t N = 10'000;
        int nClients = thread::hardware_concurrency() - 1;
        mutex mtx;
        int signalled = 0;
        condition_variable cv;
        atomic_int ai( 0 );
        binary_semaphore bs( false );
        vector<jthread> clients;
        atomic_int64_t nVoluntary( 0 );
        for( int c = nClients; c; --c )
            clients.emplace_back( [&]
                {
                    for( size_t r = N; r; --r )
                    {
                        unique_lock lock( mtx );
                        cv.wait( lock, [&] { return (bool)signalled; } );
                        --signalled;
                        lock.unlock();
                        if( ai.fetch_sub( 1, memory_order_relaxed ) == 1 )
                            bs.release( 1 );
                    }
                    rusage ru;
                    getrusage( RUSAGE_THREAD, &ru );
                    nVoluntary.fetch_add( ru.ru_nvcsw, memory_order_relaxed );
                } );
        for( size_t r = N; r; --r )
        {
            unique_lock lock( mtx );
            signalled = nClients;
            cv.notify_all();
            ai.store( nClients, memory_order_relaxed );
            lock.unlock();
            bs.acquire();
        }
        clients.resize( 0 );
        cout << N << " rounds," << endl;
        cout << (double)nVoluntary.load( memory_order_relaxed ) / nClients
            << " context switches per thread" << endl;
    }

    It spawns one fewer thread than there are hardware threads. These
    all wait on a condvar and a counter that is set to the number of
    threads in each round and that must be > 0 for the wait to succeed.
    This counter is decremented by each thread. Then the threads decrement
    an atomic, and when it becomes zero the last thread raises a semaphore,
    thereby waking up the main thread.
    These are the results for 10'000 rounds on a 32-thread machine:

        10000 rounds,
        2777.29 context switches per thread

    So there are fewer context switches than rounds, and there's no
    thundering herd with glibc.


    Sigh... Here is a challenge for you. Get it working in a race detector,
    say, Relacy?

  • From Chris M. Thomasson@3:633/280.2 to All on Fri Apr 25 06:15:57 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    On 4/24/2025 1:05 PM, Chris M. Thomasson wrote:
    On 4/22/2025 10:46 PM, Bonita Montero wrote:
    Now I wrote a little program to test whether there's a thundering herd
    problem with glibc's mutex / condvar. Here it is:

    #include <iostream>
    #include <thread>
    #include <mutex>
    #include <condition_variable>
    #include <atomic>
    #include <semaphore>
    #include <vector>
    #include <sys/resource.h>

    using namespace std;

    int main()
    {
        constexpr size_t N = 10'000;
        int nClients = thread::hardware_concurrency() - 1;
        mutex mtx;
        int signalled = 0;
        condition_variable cv;
        atomic_int ai( 0 );
        binary_semaphore bs( false );
        vector<jthread> clients;
        atomic_int64_t nVoluntary( 0 );
        for( int c = nClients; c; --c )
            clients.emplace_back( [&]
                {
                    for( size_t r = N; r; --r )
                    {
                        unique_lock lock( mtx );
                        cv.wait( lock, [&] { return (bool)signalled; } );

    I must be missing something here. Where is your predicate for your
    cv.wait?

                        --signalled;

    [...]


  • From Chris M. Thomasson@3:633/280.2 to All on Fri Apr 25 06:17:58 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    On 4/24/2025 1:15 PM, Chris M. Thomasson wrote:
    [...]
                    for( size_t r = N; r; --r )
                    {
                        unique_lock lock( mtx );
                        cv.wait( lock, [&] { return (bool)signalled; } );

    I must be missing something here. Where is your predicate for your
    cv.wait?

    Oh shit. I see. { return (bool)signalled; } is it, right?

    Sorry.



                        --signalled;

    [...]



  • From Bonita Montero@3:633/280.2 to All on Fri Apr 25 07:10:00 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    On 24.04.2025 at 22:05, Chris M. Thomasson wrote:

    Sigh... Here is a challenge for you. Get it working in a race detector,
    say, Relacy?

    Not necessary, the code is trivial.


  • From Bonita Montero@3:633/280.2 to All on Fri Apr 25 07:11:57 2025
    Subject: Re: signalling a condvar from inside vs. signalling a condvar von
    outside

    On 24.04.2025 at 21:59, Chris M. Thomasson wrote:
    On 4/24/2025 10:02 AM, Bonita Montero wrote:
    On 23.04.2025 at 21:08, Chris M. Thomasson wrote:

    For trying to scale mutexes? Look up clever mutex solutions vs, say,
    RCU. They bite the dust.

    A mutex and a condvar is for producer-consumer-relationships;
    there's no way to handle that with RCU.

    Are you 100% sure about that?


    And I've written a shared_obj<> class, which is similar to shared_ptr<>,
    and a tshared_obj<>, which is similar to an atomic<shared_ptr<>>. The
    latter uses a mutex, but the pointer to the actual object is an atomic
    pointer. Before doing any locking while assigning a tshared_obj<> to
    a shared_obj<>, I simply compare the pointer in the shared_obj<> with
    the pointer in the tshared_obj<>, where the latter is loaded lazily
    with memory_order_relaxed. As with RCU-like patterns, the central
    tshared_obj<> is rarely updated but frequently compared in the
    mentioned way. Only when the compare fails and the central tshared_obj<>
    has come to point to a different object is the mutex locked. The most
    likely case, that both pointers are equal, takes only 1.5 nanoseconds on
    my computer; no need for tricks like URCU, and I guess my solution is
    more efficient since the update is just several instructions and all
    participating cachelines stay in shared mode across all cores, so
    there's almost no interconnect traffic.


    Are you delusional? Well, you did say "I guess". So, perhaps not.

    1.5 ns is only a few instructions; I guess that makes nearly no or
    absolutely no difference compared to a URCU solution.


    Are you familiar with differential counting? No mutex involved. Btw, if
    you think that URCU needs to be reference counted in any way, you are
    wrong. I have no time right now to get into it, but shit happens.

    Sigh.

