• Wait-free Hazard Pointers Using Std Atomics

    From jseigh@3:633/280.2 to All on Tue May 27 07:58:58 2025
    and asymmetric memory barriers of course.

    https://threadnought.wordpress.com/2025/05/26/wait-free-hazard-pointers-using-std-atomics/

    Looks like it runs faster without the conditional branch.

    Not sure it's a new idea. The only wait-free version of hazard pointers
    I've seen so far involves a storage to storage move instruction which
    might not be available on all platforms.

    Joe Seigh

    --- MBSE BBS v1.1.1 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Chris M. Thomasson@3:633/280.2 to All on Tue May 27 09:17:24 2025
    On 5/26/2025 2:58 PM, jseigh wrote:
    and asymmetric memory barriers of course.

    https://threadnought.wordpress.com/2025/05/26/wait-free-hazard-pointers- using-std-atomics/

    Looks like it runs faster without the conditional branch.

    Not sure it's a new idea.ÿ The only wait-free version of hazard pointers
    I've seen so far involves a storage to storage move instruction which
    might not be available on all platforms.

    Interesting, thanks Joe! Fwiw, I think, something like FlushProcessWriteBuffers() should be in a new C++ std? Perhaps, allow a
    system to use DWCAS without resorting to not lock-free wrt the checks?
    Damn it. ;^o Ouch.

    https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-flushprocesswritebuffers

    Sometimes I think that function was specifically designed for exotic
    RCU, ect...

    --- MBSE BBS v1.1.1 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From jseigh@3:633/280.2 to All on Wed May 28 02:26:10 2025
    On 5/26/25 19:17, Chris M. Thomasson wrote:
    On 5/26/2025 2:58 PM, jseigh wrote:
    and asymmetric memory barriers of course.

    https://threadnought.wordpress.com/2025/05/26/wait-free-hazard-
    pointers- using-std-atomics/

    Looks like it runs faster without the conditional branch.

    Not sure it's a new idea.ÿ The only wait-free version of hazard pointers
    I've seen so far involves a storage to storage move instruction which
    might not be available on all platforms.

    Interesting, thanks Joe! Fwiw, I think, something like FlushProcessWriteBuffers() should be in a new C++ std? Perhaps, allow a system to use DWCAS without resorting to not lock-free wrt the checks?
    Damn it. ;^o Ouch.

    https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/ nf-processthreadsapi-flushprocesswritebuffers

    Sometimes I think that function was specifically designed for exotic
    RCU, ect...


    It's basically the equivalent of linux membarrier(). I don't know
    why they just focus on writes, it's a full membar AFAIK.

    As far as the wait-free hazard pointer goes, I though there would
    be forward progress issues with the reader threads but it's not
    an issue other schemes like URCU has with quiescent states.

    Joe Seigh

    --- MBSE BBS v1.1.1 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From jseigh@3:633/280.2 to All on Thu May 29 04:09:05 2025
    On 5/26/25 17:58, jseigh wrote:
    and asymmetric memory barriers of course.

    https://threadnought.wordpress.com/2025/05/26/wait-free-hazard-pointers- using-std-atomics/

    Added an additional post with a bit more explanation.

    Joe Seigh

    --- MBSE BBS v1.1.1 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Chris M. Thomasson@3:633/280.2 to All on Thu May 29 08:34:05 2025
    On 5/26/2025 2:58 PM, jseigh wrote:
    and asymmetric memory barriers of course.

    https://threadnought.wordpress.com/2025/05/26/wait-free-hazard-pointers- using-std-atomics/

    Looks like it runs faster without the conditional branch.

    Not sure it's a new idea.ÿ The only wait-free version of hazard pointers
    I've seen so far involves a storage to storage move instruction which
    might not be available on all platforms.

    Joe Seigh

    Actually wrt:
    _______________
    hz_ptr.store(INDETERMINATE, std::memory_order_relaxed);
    T* local = ptr.load(std::memory_order_relaxed);
    hz_ptr.store(local, std::memory_order_relaxed);
    _______________

    For some damn reason, it kind of reminds me of an older way to use only
    single word exchange to push into a lock-free stack.

    Iirc, something like this pseudo code for push:
    _____________
    cur->next = WAIT_STATE;
    node* prev = exchange(head, cur);
    cur->next = prev;
    _____________

    There is a "window" in there where cur->next will be WAIT_STATE. So,
    when a thread iterating the list, say after pop all, flush, it can do a
    couple of spins or do something else during iteration on a cur->next
    being in WAIT_STATE. It's a way to get exchange on push and pop of a stack.

    The window is very small. I am having trouble finding on this group
    where I posted about it before in a working program...


    --- MBSE BBS v1.1.1 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From jseigh@3:633/280.2 to All on Fri May 30 07:17:08 2025
    On 5/28/25 18:34, Chris M. Thomasson wrote:
    On 5/26/2025 2:58 PM, jseigh wrote:
    and asymmetric memory barriers of course.

    https://threadnought.wordpress.com/2025/05/26/wait-free-hazard-
    pointers- using-std-atomics/

    Looks like it runs faster without the conditional branch.

    Not sure it's a new idea.ÿ The only wait-free version of hazard pointers
    I've seen so far involves a storage to storage move instruction which
    might not be available on all platforms.

    Joe Seigh

    Actually wrt:
    _______________
    hz_ptr.store(INDETERMINATE, std::memory_order_relaxed);
    T* local = ptr.load(std::memory_order_relaxed);
    hz_ptr.store(local, std::memory_order_relaxed);
    _______________

    For some damn reason, it kind of reminds me of an older way to use only single word exchange to push into a lock-free stack.

    Iirc, something like this pseudo code for push:
    _____________
    cur->next = WAIT_STATE;
    node* prev = exchange(head, cur);
    cur->next = prev;
    _____________

    There is a "window" in there where cur->next will be WAIT_STATE. So,
    when a thread iterating the list, say after pop all, flush, it can do a couple of spins or do something else during iteration on a cur->next
    being in WAIT_STATE. It's a way to get exchange on push and pop of a stack.

    The window is very small. I am having trouble finding on this group
    where I posted about it before in a working program...


    Push may wait-free but pop isn't event lock-free.

    Anyway, after some consideration, I'm off on this
    wait-free hazard pointer. The lock-free version is
    more than performant.

    Joe Seigh





    --- MBSE BBS v1.1.1 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Chris M. Thomasson@3:633/280.2 to All on Mon Jun 2 06:19:32 2025
    On 5/29/2025 2:17 PM, jseigh wrote:
    On 5/28/25 18:34, Chris M. Thomasson wrote:
    On 5/26/2025 2:58 PM, jseigh wrote:
    and asymmetric memory barriers of course.

    https://threadnought.wordpress.com/2025/05/26/wait-free-hazard-
    pointers- using-std-atomics/

    Looks like it runs faster without the conditional branch.

    Not sure it's a new idea.ÿ The only wait-free version of hazard pointers >>> I've seen so far involves a storage to storage move instruction which
    might not be available on all platforms.

    Joe Seigh

    Actually wrt:
    _______________
    hz_ptr.store(INDETERMINATE, std::memory_order_relaxed);
    T* local = ptr.load(std::memory_order_relaxed);
    hz_ptr.store(local, std::memory_order_relaxed);
    _______________

    For some damn reason, it kind of reminds me of an older way to use
    only single word exchange to push into a lock-free stack.

    Iirc, something like this pseudo code for push:
    _____________
    cur->next = WAIT_STATE;
    node* prev = exchange(head, cur);
    cur->next = prev;
    _____________

    There is a "window" in there where cur->next will be WAIT_STATE. So,
    when a thread iterating the list, say after pop all, flush, it can do
    a couple of spins or do something else during iteration on a cur->next
    being in WAIT_STATE. It's a way to get exchange on push and pop of a
    stack.

    The window is very small. I am having trouble finding on this group
    where I posted about it before in a working program...


    Push may wait-free but pop isn't event lock-free.

    Well, the fast path of the "special" stack pop is a simple exchange, so
    that action should be wait-free? I need to code this up, but I am
    thinking along the lines of, wrt integrating the single exchange push
    with the futex, might be something like this crude pseudo-code. I just
    need to implement it.

    cur->next = WAIT_STATE;

    node* prev = head.exchange(cur);

    if (! prev || prev == WAIT_FUTEX)
    {
    cur->next = nullptr;
    }

    else
    {
    cur->next = prev;
    }

    // now, wait state is cleared.

    if (prev == WAIT_FUTEX)
    signal_futex;


    Now during iteration on flush, we can detect a WAIT_STATE while
    iterating through the flushed stack. It might work. Just is just off the
    top of my head.



    Anyway, after some consideration, I'm off on this
    wait-free hazard pointer.ÿ The lock-free version is
    more than performant.

    Joe Seigh






    --- MBSE BBS v1.1.1 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Chris M. Thomasson@3:633/280.2 to All on Mon Jun 2 06:31:20 2025
    On 5/29/2025 2:17 PM, jseigh wrote:
    On 5/28/25 18:34, Chris M. Thomasson wrote:
    On 5/26/2025 2:58 PM, jseigh wrote:
    and asymmetric memory barriers of course.

    https://threadnought.wordpress.com/2025/05/26/wait-free-hazard-
    pointers- using-std-atomics/

    Looks like it runs faster without the conditional branch.

    Not sure it's a new idea.ÿ The only wait-free version of hazard pointers >>> I've seen so far involves a storage to storage move instruction which
    might not be available on all platforms.

    Joe Seigh

    Actually wrt:
    _______________
    hz_ptr.store(INDETERMINATE, std::memory_order_relaxed);
    T* local = ptr.load(std::memory_order_relaxed);
    hz_ptr.store(local, std::memory_order_relaxed);
    _______________

    For some damn reason, it kind of reminds me of an older way to use
    only single word exchange to push into a lock-free stack.

    Iirc, something like this pseudo code for push:
    _____________
    cur->next = WAIT_STATE;
    node* prev = exchange(head, cur);
    cur->next = prev;
    _____________

    There is a "window" in there where cur->next will be WAIT_STATE. So,
    when a thread iterating the list, say after pop all, flush, it can do
    a couple of spins or do something else during iteration on a cur->next
    being in WAIT_STATE. It's a way to get exchange on push and pop of a
    stack.

    The window is very small. I am having trouble finding on this group
    where I posted about it before in a working program...


    Push may wait-free but pop isn't event lock-free.

    Well, yeah in a sense, keep in mind that the fast path for the pop is a
    single exchange. I guess it depends on how we process the nodes with a
    stack that can have a window for a next pointer to be in a WAIT_STATE
    wrt the fun goal (experiment) of having exchange for push and exchange
    for pop, no CAS, let alone DWCAS. One way, is instead of a worker thread spinning on a WAIT_STATE during iteration, the node has another next
    pointer so the thread can link the waits into a per-thread stack. Then
    try something else, or switch processing to another stack. It pushes the
    node with a WAIT_STATE into a per-thread stack, then tries another stack
    in the distributed multi-stack. If it does execute a unit of work, that
    right there can be a natural backoff to trigger it to try the per-thread
    stack of wait states. However, this would make a node have two node
    links instead of one. It just might be fun to mock up and run.

    Actually, I like my futex stack experiment as is. normal CAS for push,
    SWAP for pop. It needs to have multiple stacks and distribute across
    them because popping all from a single stack can mean a thread gets too
    much work. Push lock-free, pop wait-free.




    Anyway, after some consideration, I'm off on this
    wait-free hazard pointer.ÿ The lock-free version is
    more than performant.

    Joe Seigh






    --- MBSE BBS v1.1.1 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From jseigh@3:633/280.2 to All on Tue Jun 3 08:07:22 2025
    On 6/1/25 16:31, Chris M. Thomasson wrote:
    On 5/29/2025 2:17 PM, jseigh wrote:>>>

    Push may wait-free but pop isn't event lock-free.



    Actually, I like my futex stack experiment as is. normal CAS for push,
    SWAP for pop. It needs to have multiple stacks and distribute across
    them because popping all from a single stack can mean a thread gets too
    much work. Push lock-free, pop wait-free.


    Well you actually have a lot of leeway in concurrent queue semantics.
    Drop the FIFO requirements since they're not really observable for
    all practical purposes and just concentrate on throughput forward
    progress.

    And wait-free is highly overrated. It seems to be important only in
    theses, technical papers and obscure blogs, not so much is real world applications which is one reason I not going to implement it.


    Anyway, after some consideration, I'm off on this
    wait-free hazard pointer.ÿ The lock-free version is
    more than performant.

    I did post on my blog how to improve reclaim forward progress for the
    wait-free hazard pointer so it's equivalent to a proxy collector.

    --- MBSE BBS v1.1.1 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Chris M. Thomasson@3:633/280.2 to All on Wed Jun 4 05:09:09 2025
    On 6/2/2025 3:07 PM, jseigh wrote:
    On 6/1/25 16:31, Chris M. Thomasson wrote:
    On 5/29/2025 2:17 PM, jseigh wrote:>>>

    Push may wait-free but pop isn't event lock-free.



    Actually, I like my futex stack experiment as is. normal CAS for push,
    SWAP for pop. It needs to have multiple stacks and distribute across
    them because popping all from a single stack can mean a thread gets
    too much work. Push lock-free, pop wait-free.


    Well you actually have a lot of leeway in concurrent queue semantics.
    Drop the FIFO requirements since they're not really observable for
    all practical purposes and just concentrate on throughput forward
    progress.

    And wait-free is highly overrated. It seems to be important only in
    theses, technical papers and obscure blogs, not so much is real world applications which is one reason I not going to implement it.

    Perhaps. However, using a loop-free exchange is "faster" than a loop
    based CAS? On Intel, XCHG vs a CMPXCHG loop?




    Anyway, after some consideration, I'm off on this
    wait-free hazard pointer.ÿ The lock-free version is
    more than performant.

    I did post on my blog how to improve reclaim forward progress for the wait-free hazard pointer so it's equivalent to a proxy collector.


    --- MBSE BBS v1.1.1 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Chris M. Thomasson@3:633/280.2 to All on Tue Jun 10 18:59:21 2025
    On 6/3/2025 12:09 PM, Chris M. Thomasson wrote:
    On 6/2/2025 3:07 PM, jseigh wrote:
    On 6/1/25 16:31, Chris M. Thomasson wrote:
    On 5/29/2025 2:17 PM, jseigh wrote:>>>

    Push may wait-free but pop isn't event lock-free.



    Actually, I like my futex stack experiment as is. normal CAS for
    push, SWAP for pop. It needs to have multiple stacks and distribute
    across them because popping all from a single stack can mean a thread
    gets too much work. Push lock-free, pop wait-free.


    Well you actually have a lot of leeway in concurrent queue semantics.
    Drop the FIFO requirements since they're not really observable for
    all practical purposes and just concentrate on throughput forward
    progress.

    And wait-free is highly overrated. It seems to be important only in
    theses, technical papers and obscure blogs, not so much is real world
    applications which is one reason I not going to implement it.

    Perhaps. However, using a loop-free exchange is "faster" than a loop
    based CAS? On Intel, XCHG vs a CMPXCHG loop?




    Anyway, after some consideration, I'm off on this
    wait-free hazard pointer.ÿ The lock-free version is
    more than performant.

    I did post on my blog how to improve reclaim forward progress for the
    wait-free hazard pointer so it's equivalent to a proxy collector.



    Ummm. This is a working lock-free stack from an AI, grok. Oh wow.

    // Lock-free stack for fallback
    template<typename T>
    class LockFreeStack {
    private:
    struct Node {
    T data;
    std::atomic<Node*> next;
    Node(const T& d) : data(d), next(nullptr) {}
    };
    std::atomic<Node*> head;

    public:
    LockFreeStack() : head(nullptr) {}

    void Push(T data) {
    Node* new_node = new Node(data);
    Node* old_head = head.load();
    do {
    new_node->next = old_head;
    } while (!head.compare_exchange_weak(old_head, new_node));
    }

    bool Pop(T& data) {
    Node* old_head = head.load();
    while (old_head && !head.compare_exchange_weak(old_head, old_head->next)) {}
    if (old_head) {
    data = old_head->data;
    delete old_head;
    return true;
    }
    return false;
    }
    };

    WTF!!!!!

    --- MBSE BBS v1.1.1 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From jseigh@3:633/280.2 to All on Mon Jun 23 21:03:13 2025
    On 6/3/25 15:09, Chris M. Thomasson wrote:
    On 6/2/2025 3:07 PM, jseigh wrote:
    On 6/1/25 16:31, Chris M. Thomasson wrote:
    On 5/29/2025 2:17 PM, jseigh wrote:>>>

    Push may wait-free but pop isn't event lock-free.



    Actually, I like my futex stack experiment as is. normal CAS for
    push, SWAP for pop. It needs to have multiple stacks and distribute
    across them because popping all from a single stack can mean a thread
    gets too much work. Push lock-free, pop wait-free.


    Well you actually have a lot of leeway in concurrent queue semantics.
    Drop the FIFO requirements since they're not really observable for
    all practical purposes and just concentrate on throughput forward
    progress.

    And wait-free is highly overrated. It seems to be important only in
    theses, technical papers and obscure blogs, not so much is real world
    applications which is one reason I not going to implement it.

    Perhaps. However, using a loop-free exchange is "faster" than a loop
    based CAS? On Intel, XCHG vs a CMPXCHG loop?




    Anyway, after some consideration, I'm off on this
    wait-free hazard pointer.ÿ The lock-free version is
    more than performant.

    I did post on my blog how to improve reclaim forward progress for the
    wait-free hazard pointer so it's equivalent to a proxy collector.


    I've been doing reader lock-free too long. I just realized how
    trivially simple it is to do read write lock-free for almost any data structure. That's slightly embarrassing.

    Joe Seigh

    --- MBSE BBS v1.1.1 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Chris M. Thomasson@3:633/280.2 to All on Fri Jun 27 15:26:53 2025
    On 6/23/2025 4:03 AM, jseigh wrote:
    On 6/3/25 15:09, Chris M. Thomasson wrote:
    On 6/2/2025 3:07 PM, jseigh wrote:
    On 6/1/25 16:31, Chris M. Thomasson wrote:
    On 5/29/2025 2:17 PM, jseigh wrote:>>>

    Push may wait-free but pop isn't event lock-free.



    Actually, I like my futex stack experiment as is. normal CAS for
    push, SWAP for pop. It needs to have multiple stacks and distribute
    across them because popping all from a single stack can mean a
    thread gets too much work. Push lock-free, pop wait-free.


    Well you actually have a lot of leeway in concurrent queue semantics.
    Drop the FIFO requirements since they're not really observable for
    all practical purposes and just concentrate on throughput forward
    progress.

    And wait-free is highly overrated. It seems to be important only in
    theses, technical papers and obscure blogs, not so much is real world
    applications which is one reason I not going to implement it.

    Perhaps. However, using a loop-free exchange is "faster" than a loop
    based CAS? On Intel, XCHG vs a CMPXCHG loop?




    Anyway, after some consideration, I'm off on this
    wait-free hazard pointer.ÿ The lock-free version is
    more than performant.

    I did post on my blog how to improve reclaim forward progress for the
    wait-free hazard pointer so it's equivalent to a proxy collector.


    I've been doing reader lock-free too long.ÿ I just realized how
    trivially simple it is to do read write lock-free for almost any data structure.ÿ That's slightly embarrassing.

    :^) Indeed. But, its not a bad skill to have. Well, at least to me.

    --- MBSE BBS v1.1.1 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)