and asymmetric memory barriers of course.
https://threadnought.wordpress.com/2025/05/26/wait-free-hazard-pointers- using-std-atomics/
Looks like it runs faster without the conditional branch.
Not sure it's a new idea. The only wait-free version of hazard pointers
I've seen so far involves a storage to storage move instruction which
might not be available on all platforms.
On 5/26/2025 2:58 PM, jseigh wrote:
and asymmetric memory barriers of course.
https://threadnought.wordpress.com/2025/05/26/wait-free-hazard-
pointers- using-std-atomics/
Looks like it runs faster without the conditional branch.
Not sure it's a new idea. The only wait-free version of hazard pointers
I've seen so far involves a storage to storage move instruction which
might not be available on all platforms.
Interesting, thanks Joe! Fwiw, I think, something like FlushProcessWriteBuffers() should be in a new C++ std? Perhaps, allow a system to use DWCAS without resorting to not lock-free wrt the checks?
Damn it. ;^o Ouch.
https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/ nf-processthreadsapi-flushprocesswritebuffers
Sometimes I think that function was specifically designed for exotic
RCU, ect...
and asymmetric memory barriers of course.
https://threadnought.wordpress.com/2025/05/26/wait-free-hazard-pointers- using-std-atomics/
and asymmetric memory barriers of course.
https://threadnought.wordpress.com/2025/05/26/wait-free-hazard-pointers- using-std-atomics/
Looks like it runs faster without the conditional branch.
Not sure it's a new idea. The only wait-free version of hazard pointers
I've seen so far involves a storage to storage move instruction which
might not be available on all platforms.
Joe Seigh
On 5/26/2025 2:58 PM, jseigh wrote:
and asymmetric memory barriers of course.
https://threadnought.wordpress.com/2025/05/26/wait-free-hazard-
pointers- using-std-atomics/
Looks like it runs faster without the conditional branch.
Not sure it's a new idea. The only wait-free version of hazard pointers
I've seen so far involves a storage to storage move instruction which
might not be available on all platforms.
Joe Seigh
Actually wrt:
_______________
hz_ptr.store(INDETERMINATE, std::memory_order_relaxed);
T* local = ptr.load(std::memory_order_relaxed);
hz_ptr.store(local, std::memory_order_relaxed);
_______________
For some damn reason, it kind of reminds me of an older way to use only single word exchange to push into a lock-free stack.
Iirc, something like this pseudo code for push:
_____________
cur->next = WAIT_STATE;
node* prev = exchange(head, cur);
cur->next = prev;
_____________
There is a "window" in there where cur->next will be WAIT_STATE. So,
when a thread iterating the list, say after pop all, flush, it can do a couple of spins or do something else during iteration on a cur->next
being in WAIT_STATE. It's a way to get exchange on push and pop of a stack.
The window is very small. I am having trouble finding on this group
where I posted about it before in a working program...
On 5/28/25 18:34, Chris M. Thomasson wrote:
On 5/26/2025 2:58 PM, jseigh wrote:
and asymmetric memory barriers of course.
https://threadnought.wordpress.com/2025/05/26/wait-free-hazard-
pointers- using-std-atomics/
Looks like it runs faster without the conditional branch.
Not sure it's a new idea. The only wait-free version of hazard pointers >>> I've seen so far involves a storage to storage move instruction which
might not be available on all platforms.
Joe Seigh
Actually wrt:
_______________
hz_ptr.store(INDETERMINATE, std::memory_order_relaxed);
T* local = ptr.load(std::memory_order_relaxed);
hz_ptr.store(local, std::memory_order_relaxed);
_______________
For some damn reason, it kind of reminds me of an older way to use
only single word exchange to push into a lock-free stack.
Iirc, something like this pseudo code for push:
_____________
cur->next = WAIT_STATE;
node* prev = exchange(head, cur);
cur->next = prev;
_____________
There is a "window" in there where cur->next will be WAIT_STATE. So,
when a thread iterating the list, say after pop all, flush, it can do
a couple of spins or do something else during iteration on a cur->next
being in WAIT_STATE. It's a way to get exchange on push and pop of a
stack.
The window is very small. I am having trouble finding on this group
where I posted about it before in a working program...
Push may wait-free but pop isn't event lock-free.
Anyway, after some consideration, I'm off on this
wait-free hazard pointer. The lock-free version is
more than performant.
Joe Seigh
On 5/28/25 18:34, Chris M. Thomasson wrote:
On 5/26/2025 2:58 PM, jseigh wrote:
and asymmetric memory barriers of course.
https://threadnought.wordpress.com/2025/05/26/wait-free-hazard-
pointers- using-std-atomics/
Looks like it runs faster without the conditional branch.
Not sure it's a new idea. The only wait-free version of hazard pointers >>> I've seen so far involves a storage to storage move instruction which
might not be available on all platforms.
Joe Seigh
Actually wrt:
_______________
hz_ptr.store(INDETERMINATE, std::memory_order_relaxed);
T* local = ptr.load(std::memory_order_relaxed);
hz_ptr.store(local, std::memory_order_relaxed);
_______________
For some damn reason, it kind of reminds me of an older way to use
only single word exchange to push into a lock-free stack.
Iirc, something like this pseudo code for push:
_____________
cur->next = WAIT_STATE;
node* prev = exchange(head, cur);
cur->next = prev;
_____________
There is a "window" in there where cur->next will be WAIT_STATE. So,
when a thread iterating the list, say after pop all, flush, it can do
a couple of spins or do something else during iteration on a cur->next
being in WAIT_STATE. It's a way to get exchange on push and pop of a
stack.
The window is very small. I am having trouble finding on this group
where I posted about it before in a working program...
Push may wait-free but pop isn't event lock-free.
Anyway, after some consideration, I'm off on this
wait-free hazard pointer. The lock-free version is
more than performant.
Joe Seigh
On 5/29/2025 2:17 PM, jseigh wrote:>>>
Push may wait-free but pop isn't event lock-free.
Actually, I like my futex stack experiment as is. normal CAS for push,
SWAP for pop. It needs to have multiple stacks and distribute across
them because popping all from a single stack can mean a thread gets too
much work. Push lock-free, pop wait-free.
Anyway, after some consideration, I'm off on this
wait-free hazard pointer. The lock-free version is
more than performant.
On 6/1/25 16:31, Chris M. Thomasson wrote:
On 5/29/2025 2:17 PM, jseigh wrote:>>>
Push may wait-free but pop isn't event lock-free.
Well you actually have a lot of leeway in concurrent queue semantics.
Actually, I like my futex stack experiment as is. normal CAS for push,
SWAP for pop. It needs to have multiple stacks and distribute across
them because popping all from a single stack can mean a thread gets
too much work. Push lock-free, pop wait-free.
Drop the FIFO requirements since they're not really observable for
all practical purposes and just concentrate on throughput forward
progress.
And wait-free is highly overrated. It seems to be important only in
theses, technical papers and obscure blogs, not so much is real world applications which is one reason I not going to implement it.
Anyway, after some consideration, I'm off on this
wait-free hazard pointer. The lock-free version is
more than performant.
I did post on my blog how to improve reclaim forward progress for the wait-free hazard pointer so it's equivalent to a proxy collector.
On 6/2/2025 3:07 PM, jseigh wrote:
On 6/1/25 16:31, Chris M. Thomasson wrote:
On 5/29/2025 2:17 PM, jseigh wrote:>>>Well you actually have a lot of leeway in concurrent queue semantics.
Push may wait-free but pop isn't event lock-free.
Actually, I like my futex stack experiment as is. normal CAS for
push, SWAP for pop. It needs to have multiple stacks and distribute
across them because popping all from a single stack can mean a thread
gets too much work. Push lock-free, pop wait-free.
Drop the FIFO requirements since they're not really observable for
all practical purposes and just concentrate on throughput forward
progress.
And wait-free is highly overrated. It seems to be important only in
theses, technical papers and obscure blogs, not so much is real world
applications which is one reason I not going to implement it.
Perhaps. However, using a loop-free exchange is "faster" than a loop
based CAS? On Intel, XCHG vs a CMPXCHG loop?
Anyway, after some consideration, I'm off on this
wait-free hazard pointer. The lock-free version is
more than performant.
I did post on my blog how to improve reclaim forward progress for the
wait-free hazard pointer so it's equivalent to a proxy collector.
On 6/2/2025 3:07 PM, jseigh wrote:
On 6/1/25 16:31, Chris M. Thomasson wrote:
On 5/29/2025 2:17 PM, jseigh wrote:>>>Well you actually have a lot of leeway in concurrent queue semantics.
Push may wait-free but pop isn't event lock-free.
Actually, I like my futex stack experiment as is. normal CAS for
push, SWAP for pop. It needs to have multiple stacks and distribute
across them because popping all from a single stack can mean a thread
gets too much work. Push lock-free, pop wait-free.
Drop the FIFO requirements since they're not really observable for
all practical purposes and just concentrate on throughput forward
progress.
And wait-free is highly overrated. It seems to be important only in
theses, technical papers and obscure blogs, not so much is real world
applications which is one reason I not going to implement it.
Perhaps. However, using a loop-free exchange is "faster" than a loop
based CAS? On Intel, XCHG vs a CMPXCHG loop?
Anyway, after some consideration, I'm off on this
wait-free hazard pointer. The lock-free version is
more than performant.
I did post on my blog how to improve reclaim forward progress for the
wait-free hazard pointer so it's equivalent to a proxy collector.
On 6/3/25 15:09, Chris M. Thomasson wrote:
On 6/2/2025 3:07 PM, jseigh wrote:
On 6/1/25 16:31, Chris M. Thomasson wrote:
On 5/29/2025 2:17 PM, jseigh wrote:>>>Well you actually have a lot of leeway in concurrent queue semantics.
Push may wait-free but pop isn't event lock-free.
Actually, I like my futex stack experiment as is. normal CAS for
push, SWAP for pop. It needs to have multiple stacks and distribute
across them because popping all from a single stack can mean a
thread gets too much work. Push lock-free, pop wait-free.
Drop the FIFO requirements since they're not really observable for
all practical purposes and just concentrate on throughput forward
progress.
And wait-free is highly overrated. It seems to be important only in
theses, technical papers and obscure blogs, not so much is real world
applications which is one reason I not going to implement it.
Perhaps. However, using a loop-free exchange is "faster" than a loop
based CAS? On Intel, XCHG vs a CMPXCHG loop?
Anyway, after some consideration, I'm off on this
wait-free hazard pointer. The lock-free version is
more than performant.
I did post on my blog how to improve reclaim forward progress for the
wait-free hazard pointer so it's equivalent to a proxy collector.
I've been doing reader lock-free too long. I just realized how
trivially simple it is to do read write lock-free for almost any data structure. That's slightly embarrassing.
On 6/23/2025 4:03 AM, jseigh wrote:
I've been doing reader lock-free too long. I just realized how
trivially simple it is to do read write lock-free for almost any data
structure. That's slightly embarrassing.
:^) Indeed. But, its not a bad skill to have. Well, at least to me.
On 6/27/25 01:26, Chris M. Thomasson wrote:
On 6/23/2025 4:03 AM, jseigh wrote:
I've been doing reader lock-free too long. I just realized how
trivially simple it is to do read write lock-free for almost any data
structure. That's slightly embarrassing.
:^) Indeed. But, its not a bad skill to have. Well, at least to me.
I'm not talking about skill but a general lock-free technique for
adding or removing nodes from an arbitrary linked data structures.
It might be extendable to multiple nodes but I think there would
be restrictions.
I'm doing everything as though exercises these days. Looks good
so far. :)
How does the technique adapt to different methods of creating linked
data structures?
On 6/28/25 18:41, Chris M. Thomasson wrote:
How does the technique adapt to different methods of creating linked
data structures?
Working through that. It's low priority work. There's a bunch of
Java stuff I need to get to.
On 6/29/2025 9:16 AM, jseigh wrote:
On 6/28/25 18:41, Chris M. Thomasson wrote:
How does the technique adapt to different methods of creating linked
data structures?
Working through that. It's low priority work. There's a bunch of
Java stuff I need to get to.
On 7/4/25 17:56, Chris M. Thomasson wrote:
On 6/29/2025 9:16 AM, jseigh wrote:
On 6/28/25 18:41, Chris M. Thomasson wrote:
How does the technique adapt to different methods of creating linked
data structures?
Working through that. It's low priority work. There's a bunch of
Java stuff I need to get to.
I did figure out how to do certain operations on linked lists. It could
work on other linked data structures provided the operations met certain criteria. Something to use if I can think of any interesting
applications.
On 7/20/2025 8:25 AM, jseigh wrote:
On 7/4/25 17:56, Chris M. Thomasson wrote:
On 6/29/2025 9:16 AM, jseigh wrote:
On 6/28/25 18:41, Chris M. Thomasson wrote:
How does the technique adapt to different methods of creating
linked data structures?
Working through that. It's low priority work. There's a bunch of
Java stuff I need to get to.
I did figure out how to do certain operations on linked lists. It could
work on other linked data structures provided the operations met certain
criteria. Something to use if I can think of any interesting
applications.
Iirc, you made an old post on c.p.t many years ago about a somewhat
general case using RCU and/or your atomic-ptr-plus proxy collector, for linked node based tree algorithms in the past? Is this related to it?
and asymmetric memory barriers of course.
https://threadnought.wordpress.com/2025/05/26/wait-free-hazard-pointers- using-std-atomics/
Looks like it runs faster without the conditional branch.
Not sure it's a new idea. The only wait-free version of hazard pointers
I've seen so far involves a storage to storage move instruction which
might not be available on all platforms.
On 5/26/25 17:58, jseigh wrote:
and asymmetric memory barriers of course.
https://threadnought.wordpress.com/2025/05/26/wait-free-hazard-
pointers- using-std-atomics/
Looks like it runs faster without the conditional branch.
Not sure it's a new idea. The only wait-free version of hazard pointers
I've seen so far involves a storage to storage move instruction which
might not be available on all platforms.
For some reason I though restartable sequences (RSEQ) couldn't be
inlined but apparently it can. I did a quick and dirty perf test
of hp protect and it looks like it runs 3x faster. But this will
only work on linux and only with asymmetric memory barriers.
On 7/27/2025 10:13 AM, jseigh wrote:
On 5/26/25 17:58, jseigh wrote:
and asymmetric memory barriers of course.
https://threadnought.wordpress.com/2025/05/26/wait-free-hazard-
pointers- using-std-atomics/
Looks like it runs faster without the conditional branch.
Not sure it's a new idea. The only wait-free version of hazard pointers >>> I've seen so far involves a storage to storage move instruction which
might not be available on all platforms.
For some reason I though restartable sequences (RSEQ) couldn't be
inlined but apparently it can. I did a quick and dirty perf test
of hp protect and it looks like it runs 3x faster. But this will
only work on linux and only with asymmetric memory barriers.
Please excuse my ignorance, by why only on Linux? windows has asymmetric memory barriers...
Sysop: | Tetrazocine |
---|---|
Location: | Melbourne, VIC, Australia |
Users: | 11 |
Nodes: | 8 (0 / 8) |
Uptime: | 50:06:32 |
Calls: | 166 |
Files: | 21,502 |
Messages: | 77,727 |