https://gitlab.synchro.net/main/sbbs/-/issues/1098
# Bug Report: SEGV in JS_IsRunning when using load(true,...) from terminal sessions
## Summary
Background threads spawned via `load(true, ...)` from interactive terminal sessions crash with a segfault in `JS_IsRunning()`. The background thread's operation callback reads the parent thread's JSContext activation chain without synchronization, causing a use-after-free when the parent thread is actively executing JS code.
## Environment
- Synchronet v3.21a (compiled master/869dac47e, Sep 15 2025)
- Linux x86_64 (Debian)
- SpiderMonkey 1.8.5 (bundled)
## What We Were Trying To Do
We added an RSS headline ticker to our custom terminal shell. The ticker needs to periodically fetch an RSS feed via HTTP. Since `HTTPRequest.Get()` is synchronous and blocking, and there is no async HTTP mechanism in the Synchronet JS environment, we used `load(true, ...)` to spawn a background thread that performs the HTTP fetch and writes the parsed result back via the parent queue:
```javascript
// In the main terminal session thread:
var queue = load(true, 'ticker_fetch.js', feedUrl);

// ticker_fetch.js (background thread):
var url = argv[0];            // feedUrl passed as the extra load() argument
var http = new HTTPRequest();
http.timeout = 5;
var doc = http.Get(url);
// ... parse RSS ...
parent_queue.write(result);
```
This is the documented pattern for non-blocking operations in Synchronet JS.
## What Happens
The entire `sbbs` process crashes with a segfault. `dmesg` output:
```
sbbs/jsBackgrnd[PID]: segfault at fff8800000000001 ip 00007fXXXXXd1d41 sp 00007fXXXXXfbaf8 error 5 in libsbbs.so[...]
```
The crash always occurs at the same instruction: `JS_IsRunning + 0x21` (ELF virtual address `0x2d1d41` in libsbbs.so).
## Frequency
With the ticker fetching every 5 minutes during idle terminal sessions (screensaver running), we observed 10 crashes over 5 days. The crash is non-deterministic but highly reproducible under load.
## Root Cause Analysis
The crash occurs in `js_global.c`, line ~190, in the background thread's operation callback:
```c
static JSBool
js_OperationCallback(JSContext *cx)
{
    // ...
    if (bg->parent_cx != NULL && !JS_IsRunning(bg->parent_cx)) { /* die when parent dies */
        JS_SetOperationCallback(cx, js_OperationCallback);
        return JS_FALSE;
    }
    // ...
}
```
`JS_IsRunning()` walks the parent context's activation record chain — a linked list of `JSStackFrame` structures accessed via `cx->fp`:
```c
// From jsapi.cpp (SpiderMonkey 1.8.5):
JS_IsRunning(JSContext *cx)
{
    JSStackFrame *fp = cx->fp();   // Read parent's current frame pointer
    while (fp) {
        if (!(fp->flags & JSFRAME_DUMMY))
            return true;
        fp = fp->down;             // Walk the linked list
    }
    return false;
}
```
The disassembly at the crash site:
```
JS_IsRunning:
    mov    0x68(%rdi),%rax    ; cx->fp()
    test   %rax,%rax
    je     return_false
    mov    0x10(%rax),%rax    ; fp->down (or similar offset)
    test   %rax,%rax
    jne    loop_entry
    jmp    return_false
loop:
    mov    0x20(%rax),%rax    ; walk fp->down
    test   %rax,%rax
    je     return_false
loop_entry:
    testb  $0x4,(%rax)        ; <-- CRASH HERE: dereference fp->flags
    jne    loop
    mov    $0x1,%eax
    ret
The instruction `testb $0x4, (%rax)` dereferences `%rax`, which contains a NaN-boxed JS value instead of a valid `JSStackFrame*`. The faulting addresses across multiple crashes confirm this:
| Crash | Faulting address | NaN-box decode |
|-------|-----------------|----------------|
| Mar 5 12:41 | `0xfff9000000000000` | NaN-box tag |
| Mar 8 01:08 | `0xfff8800000000001` | Integer 1 |
| Mar 9 12:34 | `0xfff8800000000001` | Integer 1 |
| Mar 9 20:48 | `0xfff8800000000001` | Integer 1 |
| Mar 9 22:41 | `0xfff8800000000001` | Integer 1 |
| Mar 10 01:23 | `0xfff8800000000002` | Integer 2 |
The parent thread's main loop runs screensaver animations, Frame operations, and timer callbacks — deeply nested JS function calls that push and pop activation records rapidly. The background thread walks this chain concurrently. When a stack frame is popped by the parent and its memory is reused for JS value storage, the background thread dereferences it as a `JSStackFrame*` and hits NaN-boxed data instead.
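To make the decode column in the table reproducible, here is a minimal C sketch that splits a faulting address into tag and payload. It assumes the 64-bit jsval layout as we recall it from SpiderMonkey 1.8.5's `jsval.h` (a 17-bit tag of `0x1FFF0 | type` above a 47-bit payload); that layout is an assumption and should be checked against the bundled headers. The names `box_type_t`, `box_decode_t`, and `decode_box` are hypothetical and do not exist in SpiderMonkey or Synchronet.

```c
#include <stdint.h>

/* ASSUMPTION: 64-bit "punbox" jsval layout, top 17 bits = tag,
 * low 47 bits = payload, tag = 0x1FFF0 | type. Verify against jsval.h. */
#define BOX_TAG_SHIFT      47
#define BOX_TAG_MAX_DOUBLE 0x1FFF0u
#define BOX_PAYLOAD_MASK   ((UINT64_C(1) << BOX_TAG_SHIFT) - 1)

/* Hypothetical names; ordering mirrors the assumed type codes 0..7. */
typedef enum {
    BOX_DOUBLE, BOX_INT32, BOX_UNDEFINED, BOX_BOOLEAN,
    BOX_MAGIC, BOX_STRING, BOX_NULL, BOX_OBJECT, BOX_UNKNOWN
} box_type_t;

typedef struct {
    box_type_t type;
    uint64_t   payload;   /* low 47 bits (or the raw bits for a double) */
} box_decode_t;

/* Classify a raw 64-bit word as if it were a boxed JS value. */
static box_decode_t decode_box(uint64_t bits)
{
    box_decode_t d;
    uint32_t tag = (uint32_t)(bits >> BOX_TAG_SHIFT);

    if (tag <= BOX_TAG_MAX_DOUBLE) {
        /* Anything at or below the max-double tag is a plain double. */
        d.type = BOX_DOUBLE;
        d.payload = bits;
    } else {
        uint32_t type = tag & 0xFu;
        d.type = (type <= 7) ? (box_type_t)type : BOX_UNKNOWN;
        d.payload = bits & BOX_PAYLOAD_MASK;
    }
    return d;
}
```

Under this assumed layout, `0xfff8800000000001` decodes to an int32 with payload 1, `0xfff880000000000c` (from the dmesg log) to an int32 with payload 12, and `0xfffbff011a07c548` to an object tag with payload `0x7f011a07c548`, which has the shape of a userspace heap pointer. `0xfff9000000000000` would carry the tag for `undefined` with a zero payload.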
## Why It Doesn't Crash in json-service
`json-service.js` also uses `load(true, ...)` but runs in a service thread context, not a terminal session. Service threads sit in tight event loops with minimal activation chain depth changes, making the race window vanishingly small. Terminal sessions running UI code (frame animations, timer callbacks, nested function calls) churn the activation chain constantly.
## Suggested Fix
Replace the unsafe `JS_IsRunning(bg->parent_cx)` call with a thread-safe mechanism. For example, a shared flag:
```c
// In background_data_t:
volatile int parent_alive;   // Set to 1 at creation, 0 when parent exits

// In js_OperationCallback:
if (bg->parent_cx != NULL && !bg->parent_alive) {   // No cross-thread JSContext access
    JS_SetOperationCallback(cx, js_OperationCallback);
    return JS_FALSE;
}

// In parent cleanup (when the parent script finishes/exits):
bg->parent_alive = 0;
```
This preserves the "die when parent dies" semantics without accessing another thread's JSContext internals.
The same pattern would apply to `js_log()` at line ~213, which calls `JS_GetContextPrivate(bg->parent_cx)` and `JS_CallFunctionName()` using the parent's context private data. That is also unsafe if the parent has exited.
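One caveat on the flag itself: `volatile` in C does not give cross-thread ordering guarantees, so a C11 atomic may be the safer way to express it. The following is a minimal standalone sketch of that idea, not a patch against `js_global.c`. The type `bg_liveness_t` and the three helper functions are hypothetical names for illustration, and the sketch assumes the structure holding the flag outlives both threads.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the liveness part of background_data_t.
 * ASSUMPTION: this struct stays allocated until both threads are done. */
typedef struct {
    atomic_bool parent_alive;
} bg_liveness_t;

/* Called once when the background thread is created. */
static void bg_liveness_init(bg_liveness_t *l)
{
    atomic_init(&l->parent_alive, true);
}

/* Called from the parent's cleanup path when its script finishes. */
static void bg_parent_exiting(bg_liveness_t *l)
{
    atomic_store_explicit(&l->parent_alive, false, memory_order_release);
}

/* Called from the background thread's operation callback.
 * Reads only the shared flag, never the parent's JSContext. */
static bool bg_parent_is_alive(bg_liveness_t *l)
{
    return atomic_load_explicit(&l->parent_alive, memory_order_acquire);
}
```

In the operation callback, the check would then read `!bg_parent_is_alive(...)` in place of `!JS_IsRunning(bg->parent_cx)`. Where the parent clears the flag, and who frees the shared structure, are design decisions for the maintainers.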
## Workaround
We moved the HTTP fetch inline (synchronous in the main thread with a short timeout) and stopped using `load(true, ...)` from terminal sessions entirely. This eliminates the crash but means the UI blocks briefly during the fetch.
## Steps to Reproduce
1. From a terminal node session (telnet/SSH login, not jsexec), spawn a background thread:
```javascript
var q = load(true, 'bg_worker.js');
```
2. Have the background script loop doing lightweight work (so operation callbacks fire):
```javascript
// bg_worker.js
while (!js.terminated) {
    for (var i = 0; i < 50000; i++) Math.random();
    mswait(1);
}
parent_queue.write('done');
```
3. In the main thread, keep the JS activation chain churning with nested function calls:
```javascript
function a() { return b(); }
function b() { return c(); }
function c() { return Math.random(); }

var start = Date.now();
while (Date.now() - start < 60000) {
    for (var i = 0; i < 100000; i++) a();
    mswait(1);
}
```
4. The process will segfault within seconds to minutes. `dmesg` will show the crash in `jsBackgrnd` thread at `JS_IsRunning+0x21` in `libsbbs.so`.
## Crash Log (dmesg)
```
Mar 05 12:41:21 kernel: sbbs/jsBackgrnd[3073645]: segfault at fff9000000000000 ip 00007fbc7fed1d41 sp 00007fbc10dfbaf8 error 5 in libsbbs.so[7fbc7fcb2000+5f8000]
Mar 05 13:28:19 kernel: sbbs/jsBackgrnd[3094933]: segfault at fff880000000000c ip 00007f3b8a0d1d41 sp 00007f3b0dbf8af8 error 5 in libsbbs.so[7f3b89eb2000+5f8000]
Mar 06 02:31:20 kernel: sbbs/jsBackgrnd[3558994]: segfault at fffbff011a07c548 ip 00007f01736d1d41 sp 00007f00e7ffeaf8 error 5 in libsbbs.so[7f01734b2000+5f8000]
Mar 08 00:50:36 kernel: sbbs/jsBackgrnd[1235283]: segfault at fff880000000000c ip 00007f052f4d1d41 sp 00007f0488dfeaf8 error 5 in libsbbs.so[7f052f2b2000+5f8000]
Mar 08 01:08:48 kernel: sbbs/jsBackgrnd[1246731]: segfault at fff8800000000001 ip 00007f8a6bed1d41 sp 00007f8a055feaf8 error 5 in libsbbs.so[7f8a6bcb2000+5f8000]
Mar 09 10:49:45 kernel: sbbs/jsBackgrnd[2289335]: segfault at fffbff715f4f4630 ip 00007f71aacd1d41 sp 00007f7139bfaaf8 error 5 in libsbbs.so[7f71aaab2000+5f8000]
Mar 09 12:34:59 kernel: sbbs/jsBackgrnd[2358142]: segfault at fff8800000000001 ip 00007f034e0d1d41 sp 00007f02d7ffeaf8 error 5 in libsbbs.so[7f034deb2000+5f8000]
Mar 09 20:48:26 kernel: sbbs/jsBackgrnd[2655112]: segfault at fff8800000000001 ip 00007f334f4d1d41 sp 00007f334f2b2000+5f8000]
Mar 09 22:41:57 kernel: sbbs/jsBackgrnd[2708621]: segfault at fff8800000000001 ip 00007f1761ed1d41 sp 00007f17067fbaf8 error 5 in libsbbs.so[7f1761cb2000+5f8000]
Mar 10 01:23:49 kernel: sbbs/jsBackgrnd[2790105]: segfault at fff8800000000002 ip 00007f2e8aed1d41 sp 00007f2e115fcaf8 error 5 in libsbbs.so[7f2e8acb2000+5f8000]
```
Note: Every crash has an instruction pointer ending in `...d1d41`, i.e. the identical instruction within `JS_IsRunning`. The timing of the crash is non-deterministic, but the faulting code path is always the same.
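The "same instruction" claim can be checked mechanically by subtracting the mapping start from the instruction pointer on each line. Below is a small C sketch of that check; `parse_segfault_line` is a hypothetical helper written for this report, and it assumes the kernel line format shown in the log above.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extract the faulting address, instruction pointer,
 * and mapping start from one kernel "segfault at" line.
 * Returns 1 on success, 0 if the line does not match the expected shape. */
static int parse_segfault_line(const char *line, uint64_t *fault,
                               uint64_t *ip, uint64_t *map_start)
{
    const char *p = strstr(line, "segfault at ");
    if (p == NULL)
        return 0;
    if (sscanf(p, "segfault at %" SCNx64 " ip %" SCNx64, fault, ip) != 2)
        return 0;

    /* The mapping appears as "in libsbbs.so[START+SIZE]". */
    p = strstr(p, " in ");
    if (p == NULL)
        return 0;
    p = strchr(p, '[');
    if (p == NULL || sscanf(p, "[%" SCNx64, map_start) != 1)
        return 0;
    return 1;
}
```

For the complete lines in the log, `ip - map_start` comes out as `0x21fd41` each time. That value is the offset from the start of the executable mapping reported by the kernel, which is not necessarily the ELF virtual address; the `0x2d1d41` quoted earlier would be consistent with that mapping beginning `0xb2000` into the image, but that is an inference rather than something the log states. A line that is missing the `in libsbbs.so[` part, such as the Mar 09 20:48 entry as captured here, is rejected by the parser.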