Forum: d0p3 BBS

Re: a MSVC and glibc-compatible fmod()

From Bonita Montero@3:633/280.2 to All on Wed Mar 12 00:08:04 2025

Am 11.03.2025 um 13:55 schrieb Michael S:

But not when library routine does not use FPU. Or uses FPU only for comparison ops.

Then the library routine shout use fesetexcept() as you do it yourself.

The point is, it does not sound right if SNAN is *silently* converted
to QNAN. That type of conversion has to be loud i.e. accompanied by
setting of FE_INVALID.

Interestingly even conversion-operations from double to float do that.
That's not what I epected.
And there a some operations that should do that but actually keep the signalling bit zeroed like a sign-change since this is usually done with
a XOR-operation.

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: A noiseless patient Spider (3:633/280.2@fidonet)

From Michael S@3:633/280.2 to All on Wed Mar 12 00:46:47 2025

On Tue, 11 Mar 2025 14:55:43 +0200
Michael S <already5chosen@yahoo.com> wrote:

On Tue, 11 Mar 2025 13:10:31 +0100
Bonita Montero <Bonita.Montero@gmail.com> wrote:

Am 11.03.2025 um 12:34 schrieb Michael S:

Exactly. Both options are legal. MS's decision to not set
FE_INVALID is as good as glibc decision to set it.

If I do a SSE-/AVX-operation where either operand is a signalling
NaN I get a FE_INVALID; since the FPU behaves this way the MSVC
runtime should do that also.

BTW, what is the output of MS library in that case? SNAN or QNAN?

Results with SNaN parameters are always QNaN, that shoud be common
with any FPU.

But not when library routine does not use FPU. Or uses FPU only for comparison ops.
The point is, it does not sound right if SNAN is *silently* converted
to QNAN. That type of conversion has to be loud i.e. accompanied by
setting of FE_INVALID.

I tested. It appears that MSVC implementation made a mistake in
cases of fmod(snan, qnan):

MSVC gcc
x y result FE_INVALID) result FE_INVALID
snan 1 snan 0 qnan 1
snan 0 snan 0 qnan 1
snan inf snan 0 qnan 1
snan qnan qnan 0 !!! qnan 1
1 snan snan 0 qnan 1
0 snan snan 0 qnan 1
inf snan snan 0 qnan 1
qnan snan snan 0 qnan 1

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: A noiseless patient Spider (3:633/280.2@fidonet)

From James Kuyper@3:633/280.2 to All on Wed Mar 12 05:21:02 2025

On 3/11/25 06:29, Michael S wrote:
....

Pay attention that fmod() has no requirements w.r.t. to such exceptions
as FE_INEXACT, FE_UNDERFLOW and non-standard FE_DENORMAL.
Strictly speaking, even raising FE_OVERFLOW is not illegal, but doing
so would be bad quality of implementation.
Also spec does not say what happens to FE_INVALID when one of the
inputs is signalling NAN.

I've got Bonita killfiled, so the oldest message I can see on this
thread is one posted by you that indicated you were interested in IEEE
754 (==IOS/IEC 60559) conformance.

The C++ standard cross-references the C standard for such issues. The C standard specifies that, for an implementation which pre#defines __STDC__IEC_60559__BFP__, floating point exception handling is very
tightly specified for conformance with ISO/IEC 60559:

"The double version of fmod behaves as though implemented by
#include <math.h>
#include <fenv.h>
#pragma STDC FENV_ACCESS ON
double fmod(double x, double y)
{
double result;
result = remainder(fabs(x), (y = fabs(y)));
if (signbit(result)) result += y;
return copysign(result, x);
}
" (F10.7.1)

"Operations defined in 6.5 and functions and macros defined for the
standard libraries change floating-point status flags and control modes
just as indicated by their specifications (including conformance to IEC
60559). They do not change flags or modes (so as to be detectable by the
user) in any other cases." (F8.6)

"... signbit ... raise[s] no floating-point exceptions, even if an
argument is a signaling NaN." (F3p6)

"fabs(x) raises no floating-point exceptions, even if x is a signaling
NaN." (F10.4.3)

"— remainder(x, y) returns a NaN and raises the "invalid" floating-point exception for x infinite or y zero (and neither is a NaN)." (F.10.7.2)

"copysign(x, y) raises no floating-point exceptions, even if x or y is a signaling NaN." (F10.8.1)

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: A noiseless patient Spider (3:633/280.2@fidonet)

From Michael S@3:633/280.2 to All on Wed Mar 12 05:28:12 2025

On Mon, 10 Mar 2025 20:38:18 +0200
Michael S <already5chosen@yahoo.com> wrote:

On Mon, 10 Mar 2025 19:00:06 +0100
Bonita Montero <Bonita.Montero@gmail.com> wrote:

Your idea is really elegant

I'd rather call it "simple" or "straightforward". "Elegant" in my book
is something else. For example, the code above is closer to what I
consider elegant.
May be, later today or tomorrow, I'll show you solution that I
consider bright. Bright, but impractical.

Here, here!
A bright part is in lines 18 to 29. The rest are hopefully competent technicalities.

#include <string.h>
#include <stdint.h>
#include <math.h>
#include <fenv.h>

static uint64_t
umulrem(uint64_t x, uint64_t y, uint64_t den) {
#ifdef _MSC_VER
uint64_t hi, lo = _umul128(x, y, &hi);
uint64_t rem;
_udiv128(hi, lo, den, &rem);
return rem;
#else
return ((unsigned __int128)x * y) % den;
#endif
}

// Calculate mod(2**e, y) where y < 2**53
static uint64_t pow2_mod(uint64_t y, unsigned e) {
if (e < 64) {
uint64_t x = (uint64_t)1 << e;
if (x < y)
return x;
return x % y;
}
uint64_t x1 = pow2_mod(y, e/2);
uint64_t x2 = x1 << (e & 1);
return umulrem(x1, x2, y);
}

static double u2d(uint64_t u) {
double d;
memcpy(&d, &u, sizeof(d));
return d;
}

static uint64_t d2u(double d) {
uint64_t u;
memcpy(&u, &d, sizeof(u));
return u;
}

// raise FE_INVALID and return nan
static double raise_fe_invalid_ret_nan(double x)
{
const uint64_t SNAN_BITS = ~(1ull << 51);
double snan = u2d(SNAN_BITS);
#ifndef __clang__
return snan + x;
#else
volatile double v_snan = snan;
return v_snan + x;
#endif
}

double my_fmod(double x, double y)
{
const uint64_t INF_EXP = 2047;
const uint64_t INF2 = INF_EXP << 53;
const uint64_t HIDDEN_BIT = (uint64_t)1 << 52;
const uint64_t MANT_MASK = HIDDEN_BIT - 1;
const uint64_t SIGN_BIT = (uint64_t)1 << 63;

uint64_t ux = d2u(x);
uint64_t uy = d2u(y);
uint64_t ux2 = ux*2;
uint64_t uy2 = uy*2;
uint64_t sx = ux & SIGN_BIT;

// process non-finite x
if (ux2 >= INF2) { // x is inf or nan
if (ux2 > INF2) // x is nan
return x + y; // raises FE_INVALID when either x or y is sNAN
// x is inf
if (uy2 > INF2) // y is nan
return x + y;
// y is finite or inf
return raise_fe_invalid_ret_nan(x);
}
// x is finite

// process non-finite and zero y
if (uy2-1 >= INF2-1) { // y is inf or nan or 0
if (uy2 > INF2) // y is nan
return y;
// x is inf
if (uy2 == INF2) // y is inf
return x;
// y is 0
return raise_fe_invalid_ret_nan(x);
}

// y is finite non-zero
if (ux2 < uy2)
return x; // abs(x) < abs(y)

// extract mantissa and exponent
uint64_t mantX = (ux2 >> 1) & MANT_MASK;
uint64_t mantY = (uy2 >> 1) & MANT_MASK;
int expX = ux2 >> 53;
if (expX == 0) { // X subnormal
// Y is also subnormal, so we can use simple integer reduction
return u2d((mantX % mantY) | sx);
}

int expY = uy2 >> 53;
if (expY == 0) { // Y subnormal
mantY |= HIDDEN_BIT; // removed below
expY = 1;
}

mantY ^= HIDDEN_BIT;
mantX ^= HIDDEN_BIT;
if (mantX >= mantY) {
mantX -= mantY;
if (mantX >= mantY) // can happen when y is subnormal
mantX %= mantY;
}

int dExp = expX - expY;
uint64_t f = (dExp <= 63) ?
(uint64_t)1 << dExp : // quick path
pow2_mod(mantY, dExp); // slow path
mantX = umulrem(mantX, f, mantY);

// apply exponent of Y to mantX
uint64_t ures0 = ((uint64_t)expY << 52) | sx;
uint64_t ures = ures0 | (mantX & MANT_MASK);
if (mantX & HIDDEN_BIT)
ures0 = 0;
return u2d(ures) - u2d(ures0);
}

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: A noiseless patient Spider (3:633/280.2@fidonet)

From Michael S@3:633/280.2 to All on Sun Mar 16 22:48:03 2025

On Tue, 11 Mar 2025 20:28:12 +0200
Michael S <already5chosen@yahoo.com> wrote:

On Mon, 10 Mar 2025 20:38:18 +0200
Michael S <already5chosen@yahoo.com> wrote:

On Mon, 10 Mar 2025 19:00:06 +0100
Bonita Montero <Bonita.Montero@gmail.com> wrote:

Your idea is really elegant

I'd rather call it "simple" or "straightforward". "Elegant" in my
book is something else. For example, the code above is closer to
what I consider elegant.
May be, later today or tomorrow, I'll show you solution that I
consider bright. Bright, but impractical.

Here, here!
A bright part is in lines 18 to 29. The rest are hopefully competent technicalities.

And here is non-recursive implementation of the same algorithm that has following potentially useful properties:
1. It does not use compiler-specific extensions, only standard C.
2. It does not use FMA, so gives correct results on implementations
with broken fam(), like MSVC on pre-AVX computers.

#include <string.h>
#include <stdint.h>
#include <math.h>

static double u2d(uint64_t u) {
double d;
memcpy(&d, &u, sizeof(d));
return d;
}

static uint64_t d2u(double d) {
uint64_t u;
memcpy(&u, &d, sizeof(u));
return u;
}

// raise FE_INVALID and return nan
static double raise_fe_invalid_ret_nan(double x)
{
const uint64_t SNAN_BITS = ~(1ull << 51);
double snan = u2d(SNAN_BITS);
#ifndef __clang__
return snan + x;
#else
volatile double v_snan = snan;
return v_snan + x;
#endif
}

double my_fmod(double x, double y)
{
const uint64_t INF_EXP = 2047;
const uint64_t INF2 = INF_EXP << 53;
const uint64_t HIDDEN_BIT = (uint64_t)1 << 52;
const uint64_t MANT_MASK = HIDDEN_BIT - 1;
const uint64_t SIGN_BIT = (uint64_t)1 << 63;

uint64_t ux = d2u(x);
uint64_t uy = d2u(y);
uint64_t ux2 = ux*2;
uint64_t uy2 = uy*2;
uint64_t sx = ux & SIGN_BIT;

// process non-finite x
if (ux2 >= INF2) { // x is inf or nan
if (ux2 > INF2) // x is nan
return x + y; // raises FE_INVALID when either x or y is sNAN
// x is inf
if (uy2 > INF2) // y is nan
return x + y;
// y is finite or inf
return raise_fe_invalid_ret_nan(x);
}
// x is finite

// process non-finite and zero y
if (uy2-1 >= INF2-1) { // y is inf or nan or 0
if (uy2 > INF2) // y is nan
return y;
// x is inf
if (uy2 == INF2) // y is inf
return x;
// y is 0
return raise_fe_invalid_ret_nan(x);
}

// y is finite non-zero
if (ux2 < uy2)
return x; // abs(x) < abs(y)

// extract mantissa and exponent
int64_t mantX = ((ux2 >> 1) & MANT_MASK)+HIDDEN_BIT;
int64_t mantY = ((uy2 >> 1) & MANT_MASK)+HIDDEN_BIT;
int expX = ux2 >> 53;
int expY = uy2 >> 53;
unsigned dExp = expX - expY;
double ax = fabs(x);
double ay = fabs(y);
if (ax*0x1p-53 <= ay) {
// Quick path
int64_t d = (int64_t)(ax/ay);
if (expY == 0) { // Y subnormal
// don't normalize
mantY -= HIDDEN_BIT;
expY = 1;
if (expX == 0) { // X subnormal
mantX -= HIDDEN_BIT;
expX = 1;
}
dExp = expX - expY;
}
mantX = (mantX << dExp) - mantY*(uint64_t)d;
} else {
if (expY == 0) { // Y subnormal
// Normalize
uy = d2u(y*0x1p52);
mantY = (uy & MANT_MASK) | HIDDEN_BIT;
expY = ((int)(uy >> 52) & 2047) - 52;
dExp = expX - expY;
}
if (mantY == (int64_t)HIDDEN_BIT)
return u2d(sx); // Y is power of 2

// Calculate rem(2**dExp, mantY)
unsigned e0, n_steps;
for (n_steps = 0, e0 = dExp; e0 > 105; ++n_steps)
e0 /= 2;

double ry = 1.0/mantY;
uint64_t mantRy = (d2u(ry) & MANT_MASK) | HIDDEN_BIT;
uint64_t d = mantRy >> (105-e0);
int64_t f = (((uint64_t)1 << 52) << (e0-52)) - mantY*d;
// f = rem(2**e0, mantY) + a*mantY where -1 <= a <= 1
if (n_steps > 0) {
int next_bit = n_steps-1;
const uint64_t F_MAX = (uint64_t)3 << 54;
do {
double df = (double)f;
d = (int64_t)(df*df*ry);
f = (uint64_t)f*(uint64_t)f - mantY*d;
f <<= (dExp >> next_bit) & 1;
if (f+F_MAX > F_MAX*2)
f -= (int64_t)(f*ry)*mantY;
--next_bit;
} while (next_bit >= 0);
}
if (mantX >= mantY)
mantX -= mantY;
d = (int64_t)((double)f*(int64_t)mantX*ry);
mantX = (uint64_t)f*(uint64_t)mantX - mantY*(uint64_t)d;
}
while ((uint64_t)mantX >= (uint64_t)mantY) {
if (mantX < 0)
mantX += mantY;
else
mantX -= mantY;
}

if (expY <= 1) { // Y subnormal
mantX >>= 1 - expY;
return u2d(mantX | sx);
}

// Apply exponent of Y to mantX
uint64_t ures0 = ((uint64_t)expY << 52) | sx;
uint64_t ures = ures0 | (mantX & MANT_MASK);
if (mantX & HIDDEN_BIT)
ures0 = 0;
return u2d(ures) - u2d(ures0);
}

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: A noiseless patient Spider (3:633/280.2@fidonet)

Who's Online
Recent Visitors
- vivise39
  Fri Aug 1 19:51:01 2025
  from Бахчисарай via HTTP
- brokenmind
  Mon Aug 11 04:03:15 2025
  from The Insane Asylum BBS via RLogin
- Guest
  Sat Aug 23 02:20:31 2025
  from Qc, Canada via Raw
- Guest
  Sun Aug 24 17:06:06 2025
  from Qc, Canada via Telnet

System Info

Sysop:	Tetrazocine
Location:	Melbourne, VIC, Australia
Users:	11
Nodes:	8 (0 / 8)
Uptime:	50:13:10
Calls:	166
Files:	21,502
Messages:	77,728

Re: a MSVC and glibc-compatible fmod()

Who's Online

Recent Visitors

System Info