It hardly matters in terms of the number of context switches whether
you signal a condvar from inside or outside. The above program, run
on a Zen4 CPU under WSL2:
    inside: 20130
    outside: 19811
On a 28-core Skylake CPU with Ubuntu:
    inside: 19997
    outside: 19888
There is a scalability problem wrt signalling inside the critical
section. Does your condvar impl use wait morphing?
On 12.04.2025 at 21:33, Chris M. Thomasson wrote:
There is a scalability problem wrt signalling inside the critical
section. Does your condvar impl use wait morphing?
There's no scalability problem with that since the kernel call to
release a thread happens only when the mutex is accessible *and*
the cv is signalled.
On 4/13/2025 8:38 AM, Bonita Montero wrote:
On 12.04.2025 at 21:33, Chris M. Thomasson wrote:
There is a scalability problem wrt signalling inside the critical
section. Does your condvar impl use wait morphing?
There's no scalability problem with that since the kernel call to
release a thread happens only when the mutex is accessible *and*
the cv is signalled.
No. ...
When you signal a condvar while holding the lock it means that waiters can wake and just instantly wait on the lock. This is why wait
morphing was created. It helps, but only so much...
On 13.04.2025 at 21:32, Chris M. Thomasson wrote:
On 4/13/2025 8:38 AM, Bonita Montero wrote:
On 12.04.2025 at 21:33, Chris M. Thomasson wrote:
There is a scalability problem wrt signalling inside the critical
section. Does your condvar impl use wait morphing?
There's no scalability problem with that since the kernel call to
release a thread happens only when the mutex is accessible *and*
the cv is signalled.
No. ...
The number of context switches my code shows says that there's only
one context switch per wait.
When you signal a condvar while holding the lock it means that
waiters can wake and just instantly wait on the lock. This is why wait
morphing was created. It helps, but only so much...
idiot.
Your code is hard to read. ...
Signalling while locked or unlocked was an old debate. Think of signalling while holding the lock. A thread gets woken and immediately sees that the lock is held. Oh well. Wait morphing can help with that. However, signal outside when you can...
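To make that concrete, here's a minimal sketch of the signal-outside pattern, assuming a trivial counter-based producer/consumer (the names are invented for illustration, not taken from anybody's posted code):

    // Hypothetical sketch: mutate shared state under the lock, but
    // notify after unlocking, so a woken waiter doesn't immediately
    // block on a still-held mutex.
    #include <condition_variable>
    #include <mutex>

    std::mutex g_mtx;
    std::condition_variable g_cv;
    int g_items = 0;                 // protected by g_mtx

    void produce_one()
    {
        {
            std::lock_guard<std::mutex> lock( g_mtx );
            ++g_items;               // shared state changes while locked
        }                            // unlock first ...
        g_cv.notify_one();           // ... then signal outside the lock
    }

    void consume_one()
    {
        std::unique_lock<std::mutex> lock( g_mtx );
        g_cv.wait( lock, [] { return g_items > 0; } );
        --g_items;
    }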
On 13.04.2025 at 23:40, Chris M. Thomasson wrote:
Your code is hard to read. ...
The code is beautiful.
Signalling while locked or unlocked was an old debate. Think of
signalling while holding the lock. A thread gets woken and immediately
sees that the lock is held. Oh well. Wait morphing can help with that.
However, signal outside when you can...
The number of context switches determined via getrusage() is twice per
loop iteration, i.e. one switch to the thread and one switch from the
thread; so everything is optimal with glibc.
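For reference, a minimal sketch of how such per-thread figures can be read on Linux (RUSAGE_THREAD is a Linux-specific extension to getrusage(); the helper function name is invented):

    #include <sys/resource.h>

    // voluntary context switches of the calling thread so far;
    // ru_nivcsw would give the involuntary ones
    long voluntarySwitches()
    {
        rusage ru;
        getrusage( RUSAGE_THREAD, &ru );
        return ru.ru_nvcsw;
    }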
In real applications there is generally more going on in those critical sections vs your test... Well, I have seen some horror shows in my life.
Again, think of a scenario where the lock is held. The thread signals... Another thread wakes up and has to instantly block on a wait morphing
queue in the kernel.
On 15.04.2025 at 21:07, Chris M. Thomasson wrote:
In real applications there is generally more going on in those
critical sections vs your test... Well, I have seen some horror shows
in my life.
Again, think of a scenario where the lock is held. The thread
signals... Another thread wakes up and has to instantly block on a
wait morphing queue in the kernel.
Not with glibc.
Sigh. I would need to see how glibc works internally. But that is
beside the point. Try to signal/broadcast outside when possible.
On 17.04.2025 at 00:59, Chris M. Thomasson wrote:
Sigh. I would need to see how glibc works internally. But that is
beside the point. Try to signal/broadcast outside when possible.
As I've shown, that's not necessary with glibc; the number of context
switches and the CPU time are nearly the same for both cases.
So, signal wherever you like! I don't care. I will signal outside when I can. That's that.
On 17.04.2025 at 07:26, Chris M. Thomasson wrote:
So, signal wherever you like! I don't care. I will signal outside when
I can. That's that.
Of course you can, but it doesn't matter if you signal from outside or inside.
The only thing I can say is that signalling, especially broadcasting,
from the outside is ideal no matter what libs you are using. ..
On 17.04.2025 at 19:51, Chris M. Thomasson wrote:
The only thing I can say is that signalling, especially broadcasting,
from the outside is ideal no matter what libs you are using. ..
With broadcasting it also doesn't matter if you broadcast from inside
or outside.
On 4/17/2025 10:56 PM, Bonita Montero wrote:
On 17.04.2025 at 19:51, Chris M. Thomasson wrote:
The only thing I can say is that signalling, especially broadcasting,
from the outside is ideal no matter what libs you are using. ..
With broadcasting it also doesn't matter if you broadcast from inside
or outside.
We have to agree to disagree? Fair enough?
On 18.04.2025 at 21:42, Chris M. Thomasson wrote:
On 4/17/2025 10:56 PM, Bonita Montero wrote:
On 17.04.2025 at 19:51, Chris M. Thomasson wrote:
The only thing I can say is that signalling, especially
broadcasting, from the outside is ideal no matter what libs you are
using. ..
With broadcasting it also doesn't matter if you broadcast from inside
or outside.
We have to agree to disagree? Fair enough?
But there's one interesting fact to learn at last: broadcasting is more efficient than unicasting.
But there's one interesting fact to learn at last: broadcasting is more
efficient than unicasting.
Ugggg... Only broadcast when you absolutely have to! Not willy nilly.
Argh! Anyway...
On 19.04.2025 at 11:25, Chris M. Thomasson wrote:
But there's one interesting fact to learn at last: broadcasting is more
efficient than unicasting.
Ugggg... Only broadcast when you absolutely have to! Not willy nilly.
Argh! Anyway...
That's not true.
As you can see from my source, I'm broadcasting when there are at
least as many elements as waiting threads. That's much more
efficient.
And I didn't want to say that broadcasting should generally be
preferred, but I've measured on Windows and Linux that if broadcasting
is applicable in the mentioned way, it's more efficient even when
there's only a single waiting thread.
You would have dropped your objection if you had first read my source.
On 4/19/2025 2:29 AM, Bonita Montero wrote:
On 19.04.2025 at 11:25, Chris M. Thomasson wrote:
But there's one interesting fact to learn at last: broadcasting is more
efficient than unicasting.
Ugggg... Only broadcast when you absolutely have to! Not willy nilly.
Argh! Anyway...
That's not true.
Uggg... A broadcast is a special case. Well, when would you use a
broadcast vs a single signal? Ugggg...
As you can see from my source, I'm broadcasting when there are at
least as many elements as waiting threads. That's much more
efficient.
And I didn't want to say that broadcasting should generally be
preferred, but I've measured on Windows and Linux that if broadcasting
is applicable in the mentioned way, it's more efficient even when
there's only a single waiting thread.
You would have dropped your objection if you had first read my source.
Uggg... A broadcast is a special case. Well, when would you use a
broadcast vs a single signal? Ugggg...
On 19.04.2025 at 12:10, Chris M. Thomasson wrote:
Uggg... A broadcast is a special case. Well, when would you use a
broadcast vs a single signal? Ugggg...
If you have more items in the queue than there are waiting threads,
a broadcast is much more efficient. I suspect you have ADHD or a
similar mental disorder that leads to hasty conclusions.
On 4/19/2025 5:09 AM, Bonita Montero wrote:
On 19.04.2025 at 12:10, Chris M. Thomasson wrote:
Uggg... A broadcast is a special case. Well, when would you use a
broadcast vs a single signal? Ugggg...
If you have more items in the queue than there are waiting threads,
a broadcast is much more efficient. I suspect you have ADHD or a
similar mental disorder that leads to hasty conclusions.
Depends on the nature of the queue, and the nature of the wakes. A
broadcast should be used only when you _really_ need it!
Depends on the nature of the queue, and the nature of the wakes.
A broadcast should be used only when you _really_ need it!
That fails on logical grounds, and is very dependent upon how
many waiters exist at the time of the broadcast and how many
processing elements are available.
On 19.04.2025 at 16:34, Chris M. Thomasson wrote:
Depends on the nature of the queue, and the nature of the wakes.
A broadcast should be used only when you _really_ need it!
You've got a problem with hasty conclusions.
If you wake all waiting threads at once, this is more efficient
than waking them individually. And even if you have a single
waiting thread, a notify_all() is more efficient with Linux and
Windows.
N-1 of the waiters will attempt to acquire the mutex and fail; ...
causing unnecessary coherency traffic and unnecessary context
switches.
That's what I do after pushing n items to the queue:
            if( !m_nWaiting )
                return;
            if( n >= m_nWaiting )
                m_cv.notify_all();
            else
                do
                    m_cv.notify_one();
                while( --n );
What's more efficient, notifying all threads individually or
notifying all threads at once?
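Put into a self-contained form, the heuristic looks roughly like this; the member names follow the snippet above, while the container and the pop side are filled in by assumption:

    #include <condition_variable>
    #include <deque>
    #include <mutex>

    // hypothetical queue around the quoted notify heuristic: broadcast
    // when at least as many items were pushed as there are waiters,
    // otherwise one notify_one() per pushed item
    template<typename T>
    struct notify_n_queue
    {
        std::mutex              m_mtx;
        std::condition_variable m_cv;
        std::deque<T>           m_queue;
        size_t                  m_nWaiting = 0;

        void pushN( std::deque<T> items )
        {
            std::lock_guard lock( m_mtx );
            size_t n = items.size();
            if( !n )
                return;
            for( T &item : items )
                m_queue.emplace_back( std::move( item ) );
            if( !m_nWaiting )
                return;
            if( n >= m_nWaiting )
                m_cv.notify_all();
            else
                do
                    m_cv.notify_one();
                while( --n );
        }

        T pop()
        {
            std::unique_lock lock( m_mtx );
            // counts threads inside wait(); close enough for the
            // heuristic even if a thread never actually blocks
            ++m_nWaiting;
            m_cv.wait( lock, [&] { return !m_queue.empty(); } );
            --m_nWaiting;
            T item = std::move( m_queue.front() );
            m_queue.pop_front();
            return item;
        }
    };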
On 4/19/2025 9:29 PM, Bonita Montero wrote:
That's what I do after pushing n items to the queue:
            if( !m_nWaiting )
                return;
            if( n >= m_nWaiting )
                m_cv.notify_all();
            else
                do
                    m_cv.notify_one();
                while( --n );
What's more efficient, notifying all threads individually or
notifying all threads at once?
Again, only broadcast when you have to! wow.
On 20.04.2025 at 07:43, Chris M. Thomasson wrote:
On 4/19/2025 9:29 PM, Bonita Montero wrote:
That's what I do after pushing n items to the queue:
            if( !m_nWaiting )
                return;
            if( n >= m_nWaiting )
                m_cv.notify_all();
            else
                do
                    m_cv.notify_one();
                while( --n );
What's more efficient, notifying all threads individually or
notifying all threads at once?
Again, only broadcast when you have to! wow.
You're simply silly. If I have more items in the queue
than waiting threads, a broadcast is more efficient.
Barf! Are you daft? Only broadcast when you need to and well, try to
strive to do it outside of the locked region, when you can. wow. Daft Punk?
On 20.04.2025 at 07:49, Chris M. Thomasson wrote:
Barf! Are you daft? Only broadcast when you need to and well, try to
strive to do it outside of the locked region, when you can. wow. Daft
Punk?
If I have just inserted N items in the queue and I have M waiting
threads and N >= M, a broadcast is more efficient since you have
only one wakeup call and not N.
This doesn't lead to more context switches or coherency traffic,
as Scott claimed it would.
I measured the number of context switches and the overall CPU
time on Linux, but you are only talking of things which are
not thought through to the end.
Bonita Montero <Bonita.Montero@gmail.com> writes:
On 19.04.2025 at 16:55, Scott Lurndal wrote:
That fails on logical grounds, and is very dependent upon how
many waiters exist at the time of the broadcast and how many
processing elements are available.
I do that only if the number of waiters is equal to or smaller than
the number of enqueued items.
Doesn't matter. With broadcast, they're all scheduled and have
you heard the term 'thundering herd'?
N-1 of the waiters will attempt to acquire the mutex and fail;
causing unnecessary coherency traffic and unnecessary context
switches.
Doesn't matter. With broadcast, they're all scheduled and have
you heard the term 'thundering herd'?
Perhaps he does not know about it? Not totally sure.
On 20.04.2025 at 23:18, Chris M. Thomasson wrote:
Doesn't matter. With broadcast, they're all scheduled and have
you heard the term 'thundering herd'?
Perhaps he does not know about it? Not totally sure.
There's no thundering herd with current condvar implementations.
That makes no sense? Wait Morphing Doesn't Guarantee Elimination of the Thundering Herd...
On 22.04.2025 at 02:28, Chris M. Thomasson wrote:
That makes no sense? Wait Morphing Doesn't Guarantee Elimination of
the Thundering Herd...
Of course wait morphing helps here because with WM only one thread
sees the mutex unlocked and not n threads.
It was created to try to _help_ with thundering herd. There can be some
interesting issues... Think of pushing 12 items in the queue, broadcast
inside the mutex, then 42 threads go onto the morph list. Depending on
implementation, any new threads will not be able to acquire the mutex
until those 42 threads are out of the morph. This can be bad for several
reasons... If a thread is in the morph, it's in line waiting for it...
So, it's a bit screwed and can't do anything else. If another thread can
acquire the mutex before the morph is done, then it's basically
thundering herd all over again. Fwiw, trying to scale mutexes is sort
of odd anyway. We can make the best mutex ever, but they have trouble
scaling.
On 22.04.2025 at 23:36, Chris M. Thomasson wrote:
It was created to try to _help_ with thundering herd. There can be
some interesting issues... Think of pushing 12 items in the queue,
broadcast inside the mutex, then 42 threads go onto the morph list.
Depending on implementation, any new threads will not be able to
acquire the mutex until those 42 threads are out of the morph. This
can be bad for several reasons... If a thread is in the morph, it's in
line waiting for it... So, it's a bit screwed and can't do anything
else. If another thread can acquire the mutex before the morph is
done, then it's basically thundering herd all over again. Fwiw,
trying to scale mutexes is sort of odd anyway. We can make the best
mutex ever, but they have trouble scaling.
Do you have any source for your assumptions?
On 4/22/2025 10:13 PM, Bonita Montero wrote:
On 22.04.2025 at 23:36, Chris M. Thomasson wrote:
It was created to try to _help_ with thundering herd. There can be
some interesting issues... Think of pushing 12 items in the queue,
broadcast inside the mutex, then 42 threads go onto the morph list.
Depending on implementation, any new threads will not be able to
acquire the mutex until those 42 threads are out of the morph. This
can be bad for several reasons... If a thread is in the morph, it's in
line waiting for it... So, it's a bit screwed and can't do anything
else. If another thread can acquire the mutex before the morph is
done, then it's basically thundering herd all over again. Fwiw,
trying to scale mutexes is sort of odd anyway. We can make the best
mutex ever, but they have trouble scaling.
Do you have any source for your assumptions?
For trying to scale mutexes? Look up clever mutex solutions vs, say,
RCU. They bite the dust.
For trying to scale mutexes? Look up clever mutex solutions vs, say,
RCU. They bite the dust.
On 23.04.2025 at 21:08, Chris M. Thomasson wrote:
For trying to scale mutexes? Look up clever mutex solutions vs, say,
RCU. They bite the dust.
A mutex and a condvar are for producer-consumer relationships;
there's no way to handle that with RCU.
And I've written a shared_obj<> class, which is similar to shared_ptr<>,
and a tshared_obj<>, which is similar to an atomic<shared_ptr<>>. The
latter uses a mutex, but the pointer to the actual object is an atomic
pointer. Before doing any locking while assigning a tshared_obj<> to
a shared_obj<>, I simply compare the pointer in the shared_obj<> with
the pointer in the tshared_obj<>, where the latter is loaded lazily
with memory_order_relaxed. As with RCU-like patterns, the central
tshared_obj<> is rarely updated but frequently compared in the
mentioned way. Only when the compare fails and the central tshared_obj<>
has come to point to a different object is the mutex locked. The most
likely case, that both pointers are equal, takes only 1.5 nanoseconds on
my computer; no need for tricks like URCU, and I guess my solution is
more efficient since the update is just several instructions and all
participating cachelines stay in shared mode across all cores, so
there's almost no interconnect traffic.
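Reduced to a sketch, the fast path is just a relaxed pointer compare before any locking. The following is an illustrative reconstruction of that idea; the layout and names are guesses, not the actual shared_obj<> / tshared_obj<> code:

    #include <atomic>
    #include <memory>
    #include <mutex>

    // guessed sketch of the described fast path: the central object
    // pointer is mirrored in an atomic, and readers take the mutex
    // only when the central pointer has actually changed
    template<typename T>
    struct tsharedSketch
    {
        std::mutex         m_mtx;
        std::shared_ptr<T> m_obj;               // protected by m_mtx
        std::atomic<T *>   m_raw { nullptr };   // mirrors m_obj.get()

        void store( std::shared_ptr<T> p )
        {
            std::lock_guard lock( m_mtx );
            m_obj = std::move( p );
            m_raw.store( m_obj.get(), std::memory_order_relaxed );
        }

        // refresh a thread-local copy; the common case is a plain
        // relaxed load and compare without touching the mutex
        void refresh( std::shared_ptr<T> &local )
        {
            if( local.get() == m_raw.load( std::memory_order_relaxed ) )
                return;                         // unchanged: no locking
            std::lock_guard lock( m_mtx );      // rare: pointer changed
            local = m_obj;
        }
    };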
On 23.04.2025 at 21:08, Chris M. Thomasson wrote:
On 4/22/2025 10:13 PM, Bonita Montero wrote:
On 22.04.2025 at 23:36, Chris M. Thomasson wrote:
It was created to try to _help_ with thundering herd. There can be
some interesting issues... Think of pushing 12 items in the queue,
broadcast inside the mutex, then 42 threads go onto the morph list.
Depending on implementation, any new threads will not be able to
acquire the mutex until those 42 threads are out of the morph. This
can be bad for several reasons... If a thread is in the morph, it's in
line waiting for it... So, it's a bit screwed and can't do anything
else. If another thread can acquire the mutex before the morph is
done, then it's basically thundering herd all over again. Fwiw,
trying to scale mutexes is sort of odd anyway. We can make the best
mutex ever, but they have trouble scaling.
Do you have any source for your assumptions?
For trying to scale mutexes? Look up clever mutex solutions vs, say,
RCU. They bite the dust.
You make random assumptions.
I've measured that there's no thundering herd problem with condvars
and glibc; look at the parallel posting in this thread.
Now I wrote a little program to test if there's a thundering herd
problem with glibc's mutex / condvar. Here it is:
#include <iostream>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <atomic>
#include <semaphore>
#include <vector>
#include <sys/resource.h>
using namespace std;
int main()
{
    constexpr size_t N = 10'000;
    int nClients = thread::hardware_concurrency() - 1;
    mutex mtx;
    int signalled = 0;
    condition_variable cv;
    atomic_int ai( 0 );
    binary_semaphore bs( false );
    vector<jthread> clients;
    atomic_int64_t nVoluntary( 0 );
    for( int c = nClients; c; --c )
        clients.emplace_back( [&]
            {
                for( size_t r = N; r; --r )
                {
                    unique_lock lock( mtx );
                    cv.wait( lock, [&] { return (bool)signalled; } );
                    --signalled;
                    lock.unlock();
                    // the last client of a round wakes the main thread
                    if( ai.fetch_sub( 1, memory_order_relaxed ) == 1 )
                        bs.release( 1 );
                }
                rusage ru;
                getrusage( RUSAGE_THREAD, &ru );
                nVoluntary.fetch_add( ru.ru_nvcsw, memory_order_relaxed );
            } );
    for( size_t r = N; r; --r )
    {
        unique_lock lock( mtx );
        signalled = nClients;
        cv.notify_all();
        ai.store( nClients, memory_order_relaxed );
        lock.unlock();
        bs.acquire();
    }
    clients.resize( 0 );
    cout << N << " rounds," << endl;
    cout << (double)nVoluntary.load( memory_order_relaxed ) / nClients
        << " context switches per thread" << endl;
}
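The program needs C++20 (jthread, binary_semaphore) and the Linux-specific RUSAGE_THREAD; a plausible build line, assuming g++ and a source file named herd.cpp:

    g++ -std=c++20 -O2 -pthread herd.cpp -o herd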
It spawns one thread fewer than there are hardware threads. These
all wait on a condvar and a counter which is initially the number
of threads and must be > 0 for the wait to succeed. Each thread
decrements this counter. Then the threads decrement an atomic, and
if it becomes zero, the last thread raises a semaphore, thereby
waking up the main thread.
These are the results for 10'000 rounds on a 32-thread machine:
    10000 rounds,
    2777.29 context switches per thread
So there are fewer context switches than rounds and there's no
thundering herd with glibc.
On 4/22/2025 10:46 PM, Bonita Montero wrote:
Now I wrote a little program to test if there's a thundering herd
problem with glibc's mutex / condvar. Here it is:
#include <iostream>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <atomic>
#include <semaphore>
#include <vector>
#include <sys/resource.h>
using namespace std;
int main()
{
    constexpr size_t N = 10'000;
    int nClients = thread::hardware_concurrency() - 1;
    mutex mtx;
    int signalled = 0;
    condition_variable cv;
    atomic_int ai( 0 );
    binary_semaphore bs( false );
    vector<jthread> clients;
    atomic_int64_t nVoluntary( 0 );
    for( int c = nClients; c; --c )
        clients.emplace_back( [&]
            {
                for( size_t r = N; r; --r )
                {
                    unique_lock lock( mtx );
                    cv.wait( lock, [&] { return (bool)signalled; } );
I must be missing something here. Where is your predicate for your
cv.wait?
                    --signalled;
[...]
Sigh... Here is a challenge for you. Get it working in a race detector,
say, Relacy?
On 4/24/2025 10:02 AM, Bonita Montero wrote:
On 23.04.2025 at 21:08, Chris M. Thomasson wrote:
For trying to scale mutexes? Look up clever mutex solutions vs, say,
RCU. They bite the dust.
A mutex and a condvar are for producer-consumer relationships;
there's no way to handle that with RCU.
Are you 100% sure about that?
And I've written a shared_obj<> class, which is similar to shared_ptr<>,
and a tshared_obj<>, which is similar to an atomic<shared_ptr<>>. The
latter uses a mutex, but the pointer to the actual object is an atomic
pointer. Before doing any locking while assigning a tshared_obj<> to
a shared_obj<>, I simply compare the pointer in the shared_obj<> with
the pointer in the tshared_obj<>, where the latter is loaded lazily
with memory_order_relaxed. As with RCU-like patterns, the central
tshared_obj<> is rarely updated but frequently compared in the
mentioned way. Only when the compare fails and the central tshared_obj<>
has come to point to a different object is the mutex locked. The most
likely case, that both pointers are equal, takes only 1.5 nanoseconds on
my computer; no need for tricks like URCU, and I guess my solution is
more efficient since the update is just several instructions and all
participating cachelines stay in shared mode across all cores, so
there's almost no interconnect traffic.
Are you delusional? Well, you did say "I guess". So, perhaps not.
Are you familiar with differential counting? No mutex involved. Btw, if
you think that URCU needs to be reference counted in any way, you are
wrong. I have no time right now to get into it, but shit happens.
Sigh.