Forum: d0p3 BBS

Wait-free Hazard Pointers Using Std Atomics

From jseigh@3:633/280.2 to All on Tue May 27 07:58:58 2025

and asymmetric memory barriers of course.

https://threadnought.wordpress.com/2025/05/26/wait-free-hazard-pointers-using-std-atomics/

Looks like it runs faster without the conditional branch.

Not sure it's a new idea. The only wait-free version of hazard pointers
I've seen so far involves a storage to storage move instruction which
might not be available on all platforms.

Joe Seigh

--- MBSE BBS v1.1.1 (Linux-x86_64)
* Origin: A noiseless patient Spider (3:633/280.2@fidonet)

From Chris M. Thomasson@3:633/280.2 to All on Tue May 27 09:17:24 2025

On 5/26/2025 2:58 PM, jseigh wrote:

and asymmetric memory barriers of course.

https://threadnought.wordpress.com/2025/05/26/wait-free-hazard-pointers- using-std-atomics/

Looks like it runs faster without the conditional branch.

Not sure it's a new idea.� The only wait-free version of hazard pointers
I've seen so far involves a storage to storage move instruction which
might not be available on all platforms.

Interesting, thanks Joe! Fwiw, I think, something like FlushProcessWriteBuffers() should be in a new C++ std? Perhaps, allow a
system to use DWCAS without resorting to not lock-free wrt the checks?
Damn it. ;^o Ouch.

https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-flushprocesswritebuffers

Sometimes I think that function was specifically designed for exotic
RCU, ect...

--- MBSE BBS v1.1.1 (Linux-x86_64)
* Origin: A noiseless patient Spider (3:633/280.2@fidonet)

From jseigh@3:633/280.2 to All on Wed May 28 02:26:10 2025

On 5/26/25 19:17, Chris M. Thomasson wrote:

On 5/26/2025 2:58 PM, jseigh wrote:

and asymmetric memory barriers of course.

https://threadnought.wordpress.com/2025/05/26/wait-free-hazard-
pointers- using-std-atomics/

Looks like it runs faster without the conditional branch.

Not sure it's a new idea.� The only wait-free version of hazard pointers
I've seen so far involves a storage to storage move instruction which
might not be available on all platforms.

Interesting, thanks Joe! Fwiw, I think, something like FlushProcessWriteBuffers() should be in a new C++ std? Perhaps, allow a system to use DWCAS without resorting to not lock-free wrt the checks?
Damn it. ;^o Ouch.

https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/ nf-processthreadsapi-flushprocesswritebuffers

Sometimes I think that function was specifically designed for exotic
RCU, ect...

It's basically the equivalent of linux membarrier(). I don't know
why they just focus on writes, it's a full membar AFAIK.

As far as the wait-free hazard pointer goes, I though there would
be forward progress issues with the reader threads but it's not
an issue other schemes like URCU has with quiescent states.

Joe Seigh

--- MBSE BBS v1.1.1 (Linux-x86_64)
* Origin: A noiseless patient Spider (3:633/280.2@fidonet)

From jseigh@3:633/280.2 to All on Thu May 29 04:09:05 2025

On 5/26/25 17:58, jseigh wrote:

and asymmetric memory barriers of course.

https://threadnought.wordpress.com/2025/05/26/wait-free-hazard-pointers- using-std-atomics/

Added an additional post with a bit more explanation.

Joe Seigh

--- MBSE BBS v1.1.1 (Linux-x86_64)
* Origin: A noiseless patient Spider (3:633/280.2@fidonet)

From Chris M. Thomasson@3:633/280.2 to All on Thu May 29 08:34:05 2025

On 5/26/2025 2:58 PM, jseigh wrote:

and asymmetric memory barriers of course.

https://threadnought.wordpress.com/2025/05/26/wait-free-hazard-pointers- using-std-atomics/

Looks like it runs faster without the conditional branch.

Not sure it's a new idea.� The only wait-free version of hazard pointers
I've seen so far involves a storage to storage move instruction which
might not be available on all platforms.

Joe Seigh

Actually wrt:
_______________
hz_ptr.store(INDETERMINATE, std::memory_order_relaxed);
T* local = ptr.load(std::memory_order_relaxed);
hz_ptr.store(local, std::memory_order_relaxed);
_______________

For some damn reason, it kind of reminds me of an older way to use only
single word exchange to push into a lock-free stack.

Iirc, something like this pseudo code for push:
_____________
cur->next = WAIT_STATE;
node* prev = exchange(head, cur);
cur->next = prev;
_____________

There is a "window" in there where cur->next will be WAIT_STATE. So,
when a thread iterating the list, say after pop all, flush, it can do a
couple of spins or do something else during iteration on a cur->next
being in WAIT_STATE. It's a way to get exchange on push and pop of a stack.

The window is very small. I am having trouble finding on this group
where I posted about it before in a working program...

--- MBSE BBS v1.1.1 (Linux-x86_64)
* Origin: A noiseless patient Spider (3:633/280.2@fidonet)

From jseigh@3:633/280.2 to All on Fri May 30 07:17:08 2025

On 5/28/25 18:34, Chris M. Thomasson wrote:

On 5/26/2025 2:58 PM, jseigh wrote:

and asymmetric memory barriers of course.

https://threadnought.wordpress.com/2025/05/26/wait-free-hazard-
pointers- using-std-atomics/

Looks like it runs faster without the conditional branch.

Not sure it's a new idea.� The only wait-free version of hazard pointers
I've seen so far involves a storage to storage move instruction which
might not be available on all platforms.

Joe Seigh

Actually wrt:
_______________
hz_ptr.store(INDETERMINATE, std::memory_order_relaxed);
T* local = ptr.load(std::memory_order_relaxed);
hz_ptr.store(local, std::memory_order_relaxed);
_______________

For some damn reason, it kind of reminds me of an older way to use only single word exchange to push into a lock-free stack.

Iirc, something like this pseudo code for push:
_____________
cur->next = WAIT_STATE;
node* prev = exchange(head, cur);
cur->next = prev;
_____________

There is a "window" in there where cur->next will be WAIT_STATE. So,
when a thread iterating the list, say after pop all, flush, it can do a couple of spins or do something else during iteration on a cur->next
being in WAIT_STATE. It's a way to get exchange on push and pop of a stack.

The window is very small. I am having trouble finding on this group
where I posted about it before in a working program...

Push may wait-free but pop isn't event lock-free.

Anyway, after some consideration, I'm off on this
wait-free hazard pointer. The lock-free version is
more than performant.

Joe Seigh

--- MBSE BBS v1.1.1 (Linux-x86_64)
* Origin: A noiseless patient Spider (3:633/280.2@fidonet)

From Chris M. Thomasson@3:633/280.2 to All on Mon Jun 2 06:19:32 2025

On 5/29/2025 2:17 PM, jseigh wrote:

On 5/28/25 18:34, Chris M. Thomasson wrote:

On 5/26/2025 2:58 PM, jseigh wrote:

and asymmetric memory barriers of course.

https://threadnought.wordpress.com/2025/05/26/wait-free-hazard-
pointers- using-std-atomics/

Looks like it runs faster without the conditional branch.

Not sure it's a new idea.� The only wait-free version of hazard pointers >>> I've seen so far involves a storage to storage move instruction which
might not be available on all platforms.

Joe Seigh

Actually wrt:
_______________
hz_ptr.store(INDETERMINATE, std::memory_order_relaxed);
T* local = ptr.load(std::memory_order_relaxed);
hz_ptr.store(local, std::memory_order_relaxed);
_______________

For some damn reason, it kind of reminds me of an older way to use
only single word exchange to push into a lock-free stack.

Iirc, something like this pseudo code for push:
_____________
cur->next = WAIT_STATE;
node* prev = exchange(head, cur);
cur->next = prev;
_____________

There is a "window" in there where cur->next will be WAIT_STATE. So,
when a thread iterating the list, say after pop all, flush, it can do
a couple of spins or do something else during iteration on a cur->next
being in WAIT_STATE. It's a way to get exchange on push and pop of a
stack.

The window is very small. I am having trouble finding on this group
where I posted about it before in a working program...

Push may wait-free but pop isn't event lock-free.

Well, the fast path of the "special" stack pop is a simple exchange, so
that action should be wait-free? I need to code this up, but I am
thinking along the lines of, wrt integrating the single exchange push
with the futex, might be something like this crude pseudo-code. I just
need to implement it.

cur->next = WAIT_STATE;

node* prev = head.exchange(cur);

if (! prev || prev == WAIT_FUTEX)
{
cur->next = nullptr;
}

else
{
cur->next = prev;
}

// now, wait state is cleared.

if (prev == WAIT_FUTEX)
signal_futex;

Now during iteration on flush, we can detect a WAIT_STATE while
iterating through the flushed stack. It might work. Just is just off the
top of my head.

Anyway, after some consideration, I'm off on this
wait-free hazard pointer.� The lock-free version is
more than performant.

Joe Seigh

--- MBSE BBS v1.1.1 (Linux-x86_64)
* Origin: A noiseless patient Spider (3:633/280.2@fidonet)

From Chris M. Thomasson@3:633/280.2 to All on Mon Jun 2 06:31:20 2025

On 5/29/2025 2:17 PM, jseigh wrote:

On 5/28/25 18:34, Chris M. Thomasson wrote:

On 5/26/2025 2:58 PM, jseigh wrote:

and asymmetric memory barriers of course.

https://threadnought.wordpress.com/2025/05/26/wait-free-hazard-
pointers- using-std-atomics/

Looks like it runs faster without the conditional branch.

Not sure it's a new idea.� The only wait-free version of hazard pointers >>> I've seen so far involves a storage to storage move instruction which
might not be available on all platforms.

Joe Seigh

Actually wrt:
_______________
hz_ptr.store(INDETERMINATE, std::memory_order_relaxed);
T* local = ptr.load(std::memory_order_relaxed);
hz_ptr.store(local, std::memory_order_relaxed);
_______________

For some damn reason, it kind of reminds me of an older way to use
only single word exchange to push into a lock-free stack.

Iirc, something like this pseudo code for push:
_____________
cur->next = WAIT_STATE;
node* prev = exchange(head, cur);
cur->next = prev;
_____________

There is a "window" in there where cur->next will be WAIT_STATE. So,
when a thread iterating the list, say after pop all, flush, it can do
a couple of spins or do something else during iteration on a cur->next
being in WAIT_STATE. It's a way to get exchange on push and pop of a
stack.

The window is very small. I am having trouble finding on this group
where I posted about it before in a working program...

Push may wait-free but pop isn't event lock-free.

Well, yeah in a sense, keep in mind that the fast path for the pop is a
single exchange. I guess it depends on how we process the nodes with a
stack that can have a window for a next pointer to be in a WAIT_STATE
wrt the fun goal (experiment) of having exchange for push and exchange
for pop, no CAS, let alone DWCAS. One way, is instead of a worker thread spinning on a WAIT_STATE during iteration, the node has another next
pointer so the thread can link the waits into a per-thread stack. Then
try something else, or switch processing to another stack. It pushes the
node with a WAIT_STATE into a per-thread stack, then tries another stack
in the distributed multi-stack. If it does execute a unit of work, that
right there can be a natural backoff to trigger it to try the per-thread
stack of wait states. However, this would make a node have two node
links instead of one. It just might be fun to mock up and run.

Actually, I like my futex stack experiment as is. normal CAS for push,
SWAP for pop. It needs to have multiple stacks and distribute across
them because popping all from a single stack can mean a thread gets too
much work. Push lock-free, pop wait-free.

Anyway, after some consideration, I'm off on this
wait-free hazard pointer.� The lock-free version is
more than performant.

Joe Seigh

--- MBSE BBS v1.1.1 (Linux-x86_64)
* Origin: A noiseless patient Spider (3:633/280.2@fidonet)

From jseigh@3:633/280.2 to All on Tue Jun 3 08:07:22 2025

On 6/1/25 16:31, Chris M. Thomasson wrote:

On 5/29/2025 2:17 PM, jseigh wrote:>>>

Push may wait-free but pop isn't event lock-free.

Actually, I like my futex stack experiment as is. normal CAS for push,
SWAP for pop. It needs to have multiple stacks and distribute across
them because popping all from a single stack can mean a thread gets too
much work. Push lock-free, pop wait-free.

Well you actually have a lot of leeway in concurrent queue semantics.
Drop the FIFO requirements since they're not really observable for
all practical purposes and just concentrate on throughput forward
progress.

And wait-free is highly overrated. It seems to be important only in
theses, technical papers and obscure blogs, not so much is real world applications which is one reason I not going to implement it.

Anyway, after some consideration, I'm off on this
wait-free hazard pointer.� The lock-free version is
more than performant.

I did post on my blog how to improve reclaim forward progress for the
wait-free hazard pointer so it's equivalent to a proxy collector.

--- MBSE BBS v1.1.1 (Linux-x86_64)
* Origin: A noiseless patient Spider (3:633/280.2@fidonet)

From Chris M. Thomasson@3:633/280.2 to All on Wed Jun 4 05:09:09 2025

On 6/2/2025 3:07 PM, jseigh wrote:

On 6/1/25 16:31, Chris M. Thomasson wrote:

On 5/29/2025 2:17 PM, jseigh wrote:>>>

Push may wait-free but pop isn't event lock-free.

Actually, I like my futex stack experiment as is. normal CAS for push,
SWAP for pop. It needs to have multiple stacks and distribute across
them because popping all from a single stack can mean a thread gets
too much work. Push lock-free, pop wait-free.

Well you actually have a lot of leeway in concurrent queue semantics.
Drop the FIFO requirements since they're not really observable for
all practical purposes and just concentrate on throughput forward
progress.

And wait-free is highly overrated. It seems to be important only in
theses, technical papers and obscure blogs, not so much is real world applications which is one reason I not going to implement it.

Perhaps. However, using a loop-free exchange is "faster" than a loop
based CAS? On Intel, XCHG vs a CMPXCHG loop?

Anyway, after some consideration, I'm off on this
wait-free hazard pointer.� The lock-free version is
more than performant.

I did post on my blog how to improve reclaim forward progress for the wait-free hazard pointer so it's equivalent to a proxy collector.

--- MBSE BBS v1.1.1 (Linux-x86_64)
* Origin: A noiseless patient Spider (3:633/280.2@fidonet)

From Chris M. Thomasson@3:633/280.2 to All on Tue Jun 10 18:59:21 2025

On 6/3/2025 12:09 PM, Chris M. Thomasson wrote:

On 6/2/2025 3:07 PM, jseigh wrote:

On 6/1/25 16:31, Chris M. Thomasson wrote:

On 5/29/2025 2:17 PM, jseigh wrote:>>>

Push may wait-free but pop isn't event lock-free.

Actually, I like my futex stack experiment as is. normal CAS for
push, SWAP for pop. It needs to have multiple stacks and distribute
across them because popping all from a single stack can mean a thread
gets too much work. Push lock-free, pop wait-free.

Well you actually have a lot of leeway in concurrent queue semantics.
Drop the FIFO requirements since they're not really observable for
all practical purposes and just concentrate on throughput forward
progress.

And wait-free is highly overrated. It seems to be important only in
theses, technical papers and obscure blogs, not so much is real world
applications which is one reason I not going to implement it.

Perhaps. However, using a loop-free exchange is "faster" than a loop
based CAS? On Intel, XCHG vs a CMPXCHG loop?

Anyway, after some consideration, I'm off on this
wait-free hazard pointer.� The lock-free version is
more than performant.

I did post on my blog how to improve reclaim forward progress for the
wait-free hazard pointer so it's equivalent to a proxy collector.

Ummm. This is a working lock-free stack from an AI, grok. Oh wow.

// Lock-free stack for fallback
template<typename T>
class LockFreeStack {
private:
struct Node {
T data;
std::atomic<Node*> next;
Node(const T& d) : data(d), next(nullptr) {}
};
std::atomic<Node*> head;

public:
LockFreeStack() : head(nullptr) {}

void Push(T data) {
Node* new_node = new Node(data);
Node* old_head = head.load();
do {
new_node->next = old_head;
} while (!head.compare_exchange_weak(old_head, new_node));
}

bool Pop(T& data) {
Node* old_head = head.load();
while (old_head && !head.compare_exchange_weak(old_head, old_head->next)) {}
if (old_head) {
data = old_head->data;
delete old_head;
return true;
}
return false;
}
};

WTF!!!!!

--- MBSE BBS v1.1.1 (Linux-x86_64)
* Origin: A noiseless patient Spider (3:633/280.2@fidonet)

From jseigh@3:633/280.2 to All on Mon Jun 23 21:03:13 2025

On 6/3/25 15:09, Chris M. Thomasson wrote:

On 6/2/2025 3:07 PM, jseigh wrote:

On 6/1/25 16:31, Chris M. Thomasson wrote:

On 5/29/2025 2:17 PM, jseigh wrote:>>>

Push may wait-free but pop isn't event lock-free.

Actually, I like my futex stack experiment as is. normal CAS for
push, SWAP for pop. It needs to have multiple stacks and distribute
across them because popping all from a single stack can mean a thread
gets too much work. Push lock-free, pop wait-free.

Well you actually have a lot of leeway in concurrent queue semantics.
Drop the FIFO requirements since they're not really observable for
all practical purposes and just concentrate on throughput forward
progress.

And wait-free is highly overrated. It seems to be important only in
theses, technical papers and obscure blogs, not so much is real world
applications which is one reason I not going to implement it.

Perhaps. However, using a loop-free exchange is "faster" than a loop
based CAS? On Intel, XCHG vs a CMPXCHG loop?

Anyway, after some consideration, I'm off on this
wait-free hazard pointer.� The lock-free version is
more than performant.

I did post on my blog how to improve reclaim forward progress for the
wait-free hazard pointer so it's equivalent to a proxy collector.

I've been doing reader lock-free too long. I just realized how
trivially simple it is to do read write lock-free for almost any data structure. That's slightly embarrassing.

Joe Seigh

--- MBSE BBS v1.1.1 (Linux-x86_64)
* Origin: A noiseless patient Spider (3:633/280.2@fidonet)

From Chris M. Thomasson@3:633/280.2 to All on Fri Jun 27 15:26:53 2025

On 6/23/2025 4:03 AM, jseigh wrote:

On 6/3/25 15:09, Chris M. Thomasson wrote:

On 6/2/2025 3:07 PM, jseigh wrote:

On 6/1/25 16:31, Chris M. Thomasson wrote:

On 5/29/2025 2:17 PM, jseigh wrote:>>>

Push may wait-free but pop isn't event lock-free.

Actually, I like my futex stack experiment as is. normal CAS for
push, SWAP for pop. It needs to have multiple stacks and distribute
across them because popping all from a single stack can mean a
thread gets too much work. Push lock-free, pop wait-free.

Well you actually have a lot of leeway in concurrent queue semantics.
Drop the FIFO requirements since they're not really observable for
all practical purposes and just concentrate on throughput forward
progress.

And wait-free is highly overrated. It seems to be important only in
theses, technical papers and obscure blogs, not so much is real world
applications which is one reason I not going to implement it.

Perhaps. However, using a loop-free exchange is "faster" than a loop
based CAS? On Intel, XCHG vs a CMPXCHG loop?

Anyway, after some consideration, I'm off on this
wait-free hazard pointer.� The lock-free version is
more than performant.

I did post on my blog how to improve reclaim forward progress for the
wait-free hazard pointer so it's equivalent to a proxy collector.

I've been doing reader lock-free too long.� I just realized how
trivially simple it is to do read write lock-free for almost any data structure.� That's slightly embarrassing.

:^) Indeed. But, its not a bad skill to have. Well, at least to me.

--- MBSE BBS v1.1.1 (Linux-x86_64)
* Origin: A noiseless patient Spider (3:633/280.2@fidonet)

Who's Online
Recent Visitors
- Hiro
  Wed May 21 00:36:55 2025
  from Pennsylvania via Telnet
- Hiro
  Wed May 21 00:36:35 2025
  from Pennsylvania via Telnet
- Guest
  Wed Jun 4 14:17:37 2025
  from Sadf via Telnet
- Guest
  Wed Jun 4 14:05:02 2025
  from Lijhjkl via Telnet

System Info

Sysop:	Tetrazocine
Location:	Melbourne, VIC, Australia
Users:	8
Nodes:	8 (0 / 8)
Uptime:	106:24:10
Calls:	161
Files:	21,502
Messages:	78,622

Wait-free Hazard Pointers Using Std Atomics

Who's Online

Recent Visitors

System Info