• Re: _BitInt(N)

    From Philipp Klaus Krause@3:633/10 to All on Sun Nov 23 12:46:10 2025
On 22.10.25 at 14:45, Thiago Adams wrote:


    Is anyone using or planning to use this new C23 feature?
    What could be the motivation?



    Saving memory by using the smallest multiple-of-8 N that will do. Also
    being able to use bit-fields wider than int.

    Saving memory for two reasons:

    * On small embedded systems where there is very little memory
    * For code that needs to be very fast on big systems to make data
    structures fit into cache

    Philipp


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Sun Nov 23 13:59:59 2025
    On 23/11/2025 13:32, Waldek Hebisch wrote:
    Philipp Klaus Krause <pkk@spth.de> wrote:
On 22.10.25 at 14:45, Thiago Adams wrote:


    Is anyone using or planning to use this new C23 feature?
    What could be the motivation?



    Saving memory by using the smallest multiple-of-8 N that will do.

IIUC nothing in the standard says that it is the smallest multiple-of-8.
Using gcc-15.1 on AMD-64 I get 'sizeof(_BitInt(22))' equal to 4,
while the number could fit in 3 bytes.

The rationale mentions a use-case where there is a custom processor that
might actually have a 22-bit hardware type.

Implementing such odd-size types on regular 8/16/32/64-bit hardware is
full of problems if you want to do it without padding (in order to get
the savings). Or even with padding (to get the desired overflow
semantics).

    Such as working out how pointers to them will work.


    Also
    being able to use bit-fields wider than int.

For me the main gain is reasonably standard syntax for integers bigger
than 64 bits.

    Standard syntax I guess would be something like int128_t and int256_t.
    Such wider integers tend to be powers of two.

    But there are two problems with _BitInt:

    * Any odd sizes are allowed, such as _BitInt(391)

    * There appears to be no upper limit on size, so _BitInt(2997901) is a
    valid type

    So what is the result type of multiplying values of those two types?

    Integer sizes greater than 1K or 2K bits should use an arbitrary
    precision type (which is how large _BitInts will likely be implemented anyway), where the precision is a runtime attribute.




    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Sun Nov 23 17:06:54 2025
    On Sun, 23 Nov 2025 13:59:59 +0000
    bart <bc@freeuk.com> wrote:

    On 23/11/2025 13:32, Waldek Hebisch wrote:
    Philipp Klaus Krause <pkk@spth.de> wrote:
On 22.10.25 at 14:45, Thiago Adams wrote:


    Is anyone using or planning to use this new C23 feature?
    What could be the motivation?



    Saving memory by using the smallest multiple-of-8 N that will do.

IIUC nothing in the standard says that it is the smallest multiple-of-8.
Using gcc-15.1 on AMD-64 I get 'sizeof(_BitInt(22))' equal to 4,
while the number could fit in 3 bytes.

The rationale mentions a use-case where there is a custom processor
that might actually have a 22-bit hardware type.

Implementing such odd-size types on regular 8/16/32/64-bit hardware
is full of problems if you want to do it without padding (in order to
get the savings). Or even with padding (to get the desired overflow
semantics).

    Such as working out how pointers to them will work.


    Also
    being able to use bit-fields wider than int.

For me the main gain is reasonably standard syntax for integers bigger
than 64 bits.

    Standard syntax I guess would be something like int128_t and
    int256_t. Such wider integers tend to be powers of two.

    But there are two problems with _BitInt:

    * Any odd sizes are allowed, such as _BitInt(391)

    * There appears to be no upper limit on size, so _BitInt(2997901) is
    a valid type


The upper limit is implementation-defined.
On both existing implementations the limit (on 64-bit targets) appears
to be 2**16 or 2**16-1; I don't remember which.


    So what is the result type of multiplying values of those two types?


I think the traditional C rules for integer types apply here as well: the
type of the result is the same as the type of the wider operand. It is
arithmetically unsatisfactory, but consistent with the rest of the language.
And practically sufficient, because C programmers are already accustomed
to writing statements like:
uint64_t foo(uint32_t x, uint16_t y) { return (uint64_t)x*y; }

So it would be natural for them to write:
_BitInt(1536) foo(_BitInt(1024) x, _BitInt(512) y) {
    return (_BitInt(1536))x * y;
}

Since the pattern is so common already, an optimizing compiler is likely
to understand the meaning and generate only the necessary calculations.
Or, at least, not to generate too many unnecessary calculations.

    Integer sizes greater than 1K or 2K bits should use an arbitrary
    precision type (which is how large _BitInts will likely be
    implemented anyway), where the precision is a runtime attribute.


I think the Standard is written in such a way that implementing _BitInt
as arbitrary-precision numbers, i.e. with the number of bits held as part
of the data, is not allowed. Of course, the Language Support Library can
be (and hopefully is, at least for gcc; clang is messy a.t.m.) based on
arbitrary-precision core routines, but the API used by the compiler should
be similar to GMP's mpn_xxx family of functions rather than GMP's
mpz_xxx family, i.e. the number of bits passed as parameters separate
from the data arrays rather than combined.



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Keith Thompson@3:633/10 to All on Sun Nov 23 14:38:17 2025
    bart <bc@freeuk.com> writes:
    On 23/11/2025 13:32, Waldek Hebisch wrote:
    Philipp Klaus Krause <pkk@spth.de> wrote:
On 22.10.25 at 14:45, Thiago Adams wrote:
    Is anyone using or planning to use this new C23 feature?
    What could be the motivation?

    Saving memory by using the smallest multiple-of-8 N that will do.
IIUC nothing in the standard says that it is the smallest multiple-of-8.
Using gcc-15.1 on AMD-64 I get 'sizeof(_BitInt(22))' equal to 4,
while the number could fit in 3 bytes.

The rationale mentions a use-case where there is a custom processor
that might actually have a 22-bit hardware type.

    What rationale are you referring to? There hasn't been an official ISO
    C Rationale document since C99.

Implementing such odd-size types on regular 8/16/32/64-bit hardware is
full of problems if you want to do it without padding (in order to get
the savings). Or even with padding (to get the desired overflow
semantics).

    Such as working out how pointers to them will work.

    Why would pointers to _BitInt types be a problem? A _BitInt object is
    a fixed-size chunk of memory, similar to a struct object.

    Also being able to use bit-fields wider than int.
For me the main gain is reasonably standard syntax for integers bigger
than 64 bits.

    Standard syntax I guess would be something like int128_t and
    int256_t. Such wider integers tend to be powers of two.

    But there are two problems with _BitInt:

    * Any odd sizes are allowed, such as _BitInt(391)

    Why is that a problem? If you don't want odd-sized types, don't use them.

    * There appears to be no upper limit on size, so _BitInt(2997901) is a
    valid type

    The upper limit is specified by the implementation as BITINT_MAXWIDTH, a
    macro defined in <limits.h>.

    For gcc 15.2.0 on x86_64, BITINT_MAXWIDTH is 65535 (2**16-1).
    For clang 21.1.5 it's 8388608 (2**23 bits, 1048576 bytes).

    clang seems to have some problems with _BitInt(8388608). For example,
    this program:

    #include <limits.h>

    _BitInt(BITINT_MAXWIDTH) n = 42;

int main(void) {
    n *= n;
}

    takes a *long* time to compile with clang. I believe it's generating
    inline code to do the 8388608 by 8388608 bit multiplication.

    So what is the result type of multiplying values of those two types?

    _BitInt types are exempt from the integer promotion rules (so _BitInt(3) doesn't promote to int), but the usual arithmetic conversions apply.
    If you multiply values of two _BitInt types, the result is the wider of
    the two types.

    N3220 is a draft of C23.

    https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf

    Integer sizes greater than 1K or 2K bits should use an arbitrary
    precision type (which is how large _BitInts will likely be implemented anyway), where the precision is a runtime attribute.

    _BitInt(n) objects are fixed-size. Addition and subtraction should be
    fairly straightforward. For multiplication and division, gcc generates
    calls to __mulbitint3 and __divmodbitint4, and clang generates huge
    amounts of inline code. My guess is that future llvm/clang releases
    will handle _BitInt types more efficiently.

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Mon Nov 24 00:30:46 2025
    On 23/11/2025 22:38, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    On 23/11/2025 13:32, Waldek Hebisch wrote:
    Philipp Klaus Krause <pkk@spth.de> wrote:
On 22.10.25 at 14:45, Thiago Adams wrote:
    Is anyone using or planning to use this new C23 feature?
    What could be the motivation?

    Saving memory by using the smallest multiple-of-8 N that will do.
IIUC nothing in the standard says that it is the smallest multiple-of-8.
Using gcc-15.1 on AMD-64 I get 'sizeof(_BitInt(22))' equal to 4,
while the number could fit in 3 bytes.

The rationale mentions a use-case where there is a custom processor
that might actually have a 22-bit hardware type.

    What rationale are you referring to? There hasn't been an official ISO
    C Rationale document since C99.

    See Introduction and Rationale here:

    https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2709.pdf




Implementing such odd-size types on regular 8/16/32/64-bit hardware is
full of problems if you want to do it without padding (in order to get
the savings). Or even with padding (to get the desired overflow
semantics).

    Such as working out how pointers to them will work.

    Why would pointers to _BitInt types be a problem? A _BitInt object is
    a fixed-size chunk of memory, similar to a struct object.

Saving memory was mentioned. Achieving that means having bit-fields that
may not start at bit 0 of a byte, and may cross byte or word boundaries.

    For example, an array of 1M 5-bit values would occupy 1M 8-bit bytes,
    but storing packed values means it would use only 625K bytes.

    Anyway, pointers to individual values, or to some arbitrary element or
    slice of such an array, would need some extra info.


    Also being able to use bit-fields wider than int.
For me the main gain is reasonably standard syntax for integers bigger
than 64 bits.

    Standard syntax I guess would be something like int128_t and
    int256_t. Such wider integers tend to be powers of two.

    But there are two problems with _BitInt:

    * Any odd sizes are allowed, such as _BitInt(391)

    Why is that a problem? If you don't want odd-sized types, don't use them.

It is an unnecessary complication. There will be a lot of extra rules
that may be partly 'implementation defined', so behaviour may vary. And
people WILL use those types because they are there, and likely they
will be inefficient.

    What happens when a 391-bit type, even unsigned, overflows? These larger
    types are likely to use a multiple of 64-bits, and for 391 bits will
    need 7 x 64 bits, of which the last word will have 57 bits of padding.
    It's very messy.

    Specifying a multiple of 64 bits is better; a power of two even better.


    * There appears to be no upper limit on size, so _BitInt(2997901) is a
    valid type

    The upper limit is specified by the implementation as BITINT_MAXWIDTH, a macro defined in <limits.h>.

    For gcc 15.2.0 on x86_64, BITINT_MAXWIDTH is 65535 (2**16-1).
    For clang 21.1.5 it's 8388608 (2**23 bits, 1048576 bytes).

    clang seems to have some problems with _BitInt(8388608). For example,
    this program:

    #include <limits.h>

    _BitInt(BITINT_MAXWIDTH) n = 42;

int main(void) {
    n *= n;
}

    takes a *long* time to compile with clang. I believe it's generating
    inline code to do the 8388608 by 8388608 bit multiplication.

    Now try it with two disparate sizes.

    So what is the result type of multiplying values of those two types?

    _BitInt types are exempt from the integer promotion rules (so _BitInt(3) doesn't promote to int), but the usual arithmetic conversions apply.
    If you multiply values of two _BitInt types, the result is the wider of
    the two types.

    So multiplying even two one-million-bit types could overflow!

    Such limits for /fixed-width/ integers are ridiculous.

    You might say this is no different from defining an array of exactly
    123,456 elements. But the use-cases are very different.

I started going into detail, but I guess you don't care about such
matters or whether the feature makes much sense.



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From BGB@3:633/10 to All on Sun Nov 23 21:39:59 2025
    On 11/23/2025 7:59 AM, bart wrote:
    On 23/11/2025 13:32, Waldek Hebisch wrote:
    Philipp Klaus Krause <pkk@spth.de> wrote:
On 22.10.25 at 14:45, Thiago Adams wrote:


    Is anyone using or planning to use this new C23 feature?
    What could be the motivation?



    Saving memory by using the smallest multiple-of-8 N that will do.

IIUC nothing in the standard says that it is the smallest multiple-of-8.
Using gcc-15.1 on AMD-64 I get 'sizeof(_BitInt(22))' equal to 4,
while the number could fit in 3 bytes.

The rationale mentions a use-case where there is a custom processor that might actually have a 22-bit hardware type.

Implementing such odd-size types on regular 8/16/32/64-bit hardware is
full of problems if you want to do it without padding (in order to get
the savings). Or even with padding (to get the desired overflow
semantics).

    Such as working out how pointers to them will work.


In BGBCC, any size <= 256 bits is padded to the next power-of-two size.
If the size is NPOT (not a power of two), some extra handling exists to
mask/extend it to the requested size.

Sizes larger than 256 bits are padded to the next multiple of 128 bits.

IIRC, GCC and Clang use smaller padding, but I haven't looked into it.



    Also
    being able to use bit-fields wider than int.

For me the main gain is reasonably standard syntax for integers bigger
than 64 bits.

    Standard syntax I guess would be something like int128_t and int256_t.
    Such wider integers tend to be powers of two.

    But there are two problems with _BitInt:

    * Any odd sizes are allowed, such as _BitInt(391)

    * There appears to be no upper limit on size, so _BitInt(2997901) is a
    valid type


    In BGBCC, there is a hard limit of IIRC 16384 bits.


As an extension, it also allows very large literals, though currently
literals larger than 128 bits can only be written in hexadecimal or similar.

This is encoded via suffixes, e.g.:
I, L, LL, U, UI, UL, ULL: normal 32/64-bit;
I128, UI128: 128-bit;
I256, UI256: 256-bit;
other odd sizes map to _BitInt or _UBitInt (unsigned _BitInt).


    Larger decimal numbers could be supported, but for now I don't have a
    strong need for decimal literals beyond 128 bits.

    Implicitly, there is a limit of around 1K bits for literals mostly due
    to normal tokens having a limit of 255 characters. Compound string
    literals have a higher limit of 4096 (standard) or 65536 (implementation).



    Possibly, as a little bit of wonk, internally large literals are
    implemented in the compiler on top of Base85 strings.

Where, say, for integer literals:
48 bits or less: stored directly in compiler-specific tagrefs;
49-64 bits: encoded via an index into a lookup table;
65-128 bits: split into a pair of 64-bit chunks as indices into a lookup table;
129+: a string cosplaying as an integer literal.


    So what is the result type of multiplying values of those two types?


    Typically the max of either input type...


    Integer sizes greater than 1K or 2K bits should use an arbitrary
    precision type (which is how large _BitInts will likely be implemented anyway), where the precision is a runtime attribute.



    Disagree, this would open up a whole big mess.

    Can't have this for similar reasons to why one doesn't have
    variable-sized structs.


    Decided to leave out the whole VLA mess.
    Better to just pretend VLAs don't exist.

    ...



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Mon Nov 24 10:29:30 2025
    On 23/11/2025 16:06, Michael S wrote:
    On Sun, 23 Nov 2025 13:59:59 +0000
    bart <bc@freeuk.com> wrote:

    On 23/11/2025 13:32, Waldek Hebisch wrote:
    Philipp Klaus Krause <pkk@spth.de> wrote:
On 22.10.25 at 14:45, Thiago Adams wrote:


    Is anyone using or planning to use this new C23 feature?
    What could be the motivation?



    Saving memory by using the smallest multiple-of-8 N that will do.

IIUC nothing in the standard says that it is the smallest multiple-of-8.
Using gcc-15.1 on AMD-64 I get 'sizeof(_BitInt(22))' equal to 4,
while the number could fit in 3 bytes.

The rationale mentions a use-case where there is a custom processor
that might actually have a 22-bit hardware type.

Implementing such odd-size types on regular 8/16/32/64-bit hardware
is full of problems if you want to do it without padding (in order to
get the savings). Or even with padding (to get the desired overflow
semantics).

    Such as working out how pointers to them will work.


    Also
    being able to use bit-fields wider than int.

For me the main gain is reasonably standard syntax for integers bigger
than 64 bits.

    Standard syntax I guess would be something like int128_t and
    int256_t. Such wider integers tend to be powers of two.

    But there are two problems with _BitInt:

    * Any odd sizes are allowed, such as _BitInt(391)

    * There appears to be no upper limit on size, so _BitInt(2997901) is
    a valid type


The upper limit is implementation-defined.
On both existing implementations the limit (on 64-bit targets) appears
to be 2**16 or 2**16-1; I don't remember which.


    So what is the result type of multiplying values of those two types?


I think the traditional C rules for integer types apply here as well: the
type of the result is the same as the type of the wider operand. It is
arithmetically unsatisfactory, but consistent with the rest of the language.

    There is one key difference between the _BitInt() types and other
    integer types - with _BitInt(), there are no automatic promotions to
    other integer types. Thus if you are using _BitInt() operands in an arithmetic expression, these are not promoted to "int" or "unsigned int"
    even if they are smaller (lower rank). If you mix _BitInt()'s of
    different sizes, then the smaller one is first converted to the larger
    type. And if _BitInt(N) is mixed with unsigned _BitInt(N), that will
    mean the signed operand is converted to an unsigned _BitInt(N) -
    something that I think is "arithmetically unsatisfactory", as you put it.

And practically sufficient, because C programmers are already accustomed
to writing statements like:
uint64_t foo(uint32_t x, uint16_t y) { return (uint64_t)x*y; }

So it would be natural for them to write:
_BitInt(1536) foo(_BitInt(1024) x, _BitInt(512) y) {
    return (_BitInt(1536))x * y;
}

Since the pattern is so common already, an optimizing compiler is likely
to understand the meaning and generate only the necessary calculations.
Or, at least, not to generate too many unnecessary calculations.

    Integer sizes greater than 1K or 2K bits should use an arbitrary
    precision type (which is how large _BitInts will likely be
    implemented anyway), where the precision is a runtime attribute.


I think the Standard is written in such a way that implementing _BitInt
as arbitrary-precision numbers, i.e. with the number of bits held as part
of the data, is not allowed.

    Correct. _BitInt(N) is a signed integer type with precisely N value
    bits. It can have padding bits if necessary (according to the target
    ABI), but it can't have any other information.

Of course, the Language Support Library can be (and hopefully is, at
least for gcc; clang is messy a.t.m.) based on arbitrary-precision core
routines, but the API used by the compiler should be similar to GMP's
mpn_xxx family of functions rather than GMP's mpz_xxx family, i.e. the
number of bits passed as parameters separate from the data arrays
rather than combined.


Yes, exactly. At the call site, the size of the _BitInt type is always
a known compile-time constant, so it can easily be passed on. Thus:

    _BitInt(N) x;
    _BitInt(M) y;
    _BitInt(NM) z = x * y;

can be implemented as something like:

    __bit_int_signed_mult(NM, (unsigned char *) &z,
    N, (const unsigned char *) &x,
    M, (const unsigned char *) &y);




    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Mon Nov 24 11:17:05 2025
    On 24/11/2025 09:29, David Brown wrote:
    On 23/11/2025 16:06, Michael S wrote:
    On Sun, 23 Nov 2025 13:59:59 +0000
    bart <bc@freeuk.com> wrote:

    So what is the result type of multiplying values of those two types?


    I think, traditional C rules for integer types apply here as well: type
    of result is the same as type of wider operand. It is arithmetically
    unsatisfactory, but consistent with the rest of language.

There is one key difference between the _BitInt() types and other
integer types - with _BitInt(), there are no automatic promotions to
other integer types. Thus if you are using _BitInt() operands in an
arithmetic expression, these are not promoted to "int" or "unsigned
int" even if they are smaller (lower rank). If you mix _BitInt()'s of
different sizes, then the smaller one is first converted to the larger
type.

I think the Standard is written in such a way that implementing _BitInt
as arbitrary-precision numbers, i.e. with the number of bits held as part
of the data, is not allowed.

Correct. _BitInt(N) is a signed integer type with precisely N value
bits. It can have padding bits if necessary (according to the target
ABI), but it can't have any other information.

Of course, the Language Support Library can be (and hopefully is, at
least for gcc; clang is messy a.t.m.) based on arbitrary-precision core
routines, but the API used by the compiler should be similar to GMP's
mpn_xxx family of functions rather than GMP's mpz_xxx family, i.e. the
number of bits passed as parameters separate from the data arrays
rather than combined.


Yes, exactly. At the call site, the size of the _BitInt type is always
a known compile-time constant, so it can easily be passed on. Thus:

    _BitInt(N) x;
    _BitInt(M) y;
    _BitInt(NM) z = x * y;

So what is NM here; is it N+M (the potential maximum size of the
result), or max(N, M)?

    It sounds like the max precision you get will be the latter.


    can be implemented as something like :

    __bit_int_signed_mult(NM, (unsigned char *) &z,
                          N, (const unsigned char *) &x,
                          M, (const unsigned char *) &y);




    How would you write a generic user function that operates on any size
    BitInt? For example:

    _BitInt(?) bi_square(_BitInt(?));

    Even if you passed the size as a parameter, there would be a problem
    with the BitInt type.

    This assumes BitInts are passed and returned by value, but even using
    BitInt* wouldn't help.

This sets it apart from arrays, where you can also define very large,
fixed-size arrays but use a T(*)[] type to write generic functions that
take an additional length parameter.

    This will be for a particular T, but for BitInt, T is also fixed; it
    happens to be an implicit bit type.




    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Mon Nov 24 12:17:58 2025
    On 24/11/2025 01:30, bart wrote:
    On 23/11/2025 22:38, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    On 23/11/2025 13:32, Waldek Hebisch wrote:
    Philipp Klaus Krause <pkk@spth.de> wrote:
On 22.10.25 at 14:45, Thiago Adams wrote:
    Is anyone using or planning to use this new C23 feature?
    What could be the motivation?

    Saving memory by using the smallest multiple-of-8 N that will do.
IIUC nothing in the standard says that it is the smallest multiple-of-8.
Using gcc-15.1 on AMD-64 I get 'sizeof(_BitInt(22))' equal to 4,
while the number could fit in 3 bytes.

The rationale mentions a use-case where there is a custom processor
that might actually have a 22-bit hardware type.

What rationale are you referring to? There hasn't been an official ISO
C Rationale document since C99.

    See Introduction and Rationale here:

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2709.pdf


That's a proposal document, rather than the actual C standard. But it
is useful and relevant here, and explains some of the potential uses of
_BitInt.




Implementing such odd-size types on regular 8/16/32/64-bit hardware is
full of problems if you want to do it without padding (in order to get
the savings). Or even with padding (to get the desired overflow
semantics).

    Such as working out how pointers to them will work.

Why would pointers to _BitInt types be a problem? A _BitInt object is
a fixed-size chunk of memory, similar to a struct object.

Saving memory was mentioned. Achieving that means having bit-fields that
may not start at bit 0 of a byte, and may cross byte or word boundaries.


    No, that is incorrect.

The proposal mentions saving /space/ as relevant in FPGAs - not saving
/memory/. The authors' use-case here is writing code that can be
compiled with a "normal" C compiler on a "normal" target, and also
    compiled to FPGA /hardware/, with the same semantics. In hardware, a
    5-bit by 5-bit single-cycle multiplier is very much smaller than an
    8-bit by 8-bit multiplier, and orders of magnitude smaller than if the
    5-bit integers are promoted to 32-bit before multiplying.

The proposal is not about saving /memory/. It specifically says that a
_BitInt(N) has the same size and alignment as the smallest basic type
that can contain it, until N is greater than 64 bits, at which point
they are contained in an array of int64_t. (The reality is a little
more formal, to handle targets that have other sizes of their basic
types.)

So on a "normal" target, a _BitInt(3) is the same size and alignment as
a uint8_t, a _BitInt(35) is effectively contained in a uint64_t, and an
array of 4 _BitInt(17) on a 32-bit system will take 16 bytes or 128
bits, not 68 bits.

    As far as I can see, the C23 standard does not specify these details,
    and leaves them up to the target ABI. But at the very least, they will
    always take an integer number of bytes - unsigned char. There can never
    be any crossing of byte boundaries.

I expect most "big" implementations to follow the proposal's
recommendation with containers of 8, 16, 32 and 64 bits, then arrays of
    64-bit chunks after that. I expect some smaller targets to be a bit
    more flexible - 8-bit embedded targets are likely to use 8-bit chunks
    for everything, and 16-bit and 32-bit devices will use 16-bit and 32-bit chunks. I have not yet looked for implementations in order to check this.

    Compilers targeting FPGA hardware generation are, by their nature, weird
    in many ways. They will generate N-bit wide logic and registers for
    local data and expressions. How they implement things like arrays in
    memory will probably be very specialised - these are not tools you use
    with arbitrary C code, and almost everything is specially written.


    For example, an array of 1M 5-bit values would occupy 1M 8-bit bytes,
    but storing packed values means it would use only 625K bytes.

    Anyway, pointers to individual values, or to some arbitrary element or
    slice of such an array, would need some extra info.


    Also being able to use bit-fields wider than int.
For me the main gain is reasonably standard syntax for integers bigger
than 64 bits.

    Standard syntax I guess would be something like int128_t and
    int256_t. Such wider integers tend to be powers of two.

    But there are two problems with _BitInt:

    * Any odd sizes are allowed, such as _BitInt(391)

Why is that a problem? If you don't want odd-sized types, don't use
them.

It is an unnecessary complication. There will be a lot of extra rules
that may be partly 'implementation defined', so behaviour may vary. And
people WILL use those types because they are there, and likely they
will be inefficient.

    Why? And why do you talk specifically about odd numbers? I can
    understand your concern about packing arrays of _BitInts that are not multiples of 8, though I hope you now understand that it is not the
    problem you thought it was. However, I see no reason to suppose that _BitInt(5) is any more or less "complicated" than _BitInt(6) just
    because 5 is an odd number!

    A major point of the _BitInt concept is to be able to specify and use
    integers of specific explicit sizes in a way that is as implementation independent as possible. Some aspects of the implementation cannot be
    avoided - such as the size of unsigned char and alignment and padding
    for storage. But the behaviour of the types is entirely independent of
    the implementation. There are no "extra rules" - neither for specific implementations, nor for specific sizes of _BitInt's.

    Efficiency of implementation is, of course, up to the implementation.
    But there is absolutely no reason to suppose that working with a _BitInt
    of size up to the implementation's maximum integer type is going to be
    less efficient than using other types and masking. For larger
    _BitInt's, there are different possible implementation strategies with different pros and cons in regard to efficiency.


    What happens when a 391-bit type, even unsigned, overflows? These larger types are likely to use a multiple of 64-bits, and for 391 bits will
    need 7 x 64 bits, of which the last word will have 57 bits of padding.
    It's very messy.


    It is not messy at all. Signed integer overflow is UB, unsigned integer overflow is wrapping. It's the same as always, and could not be
    simpler, clearer or neater.

    Specifying a multiple of 64 bits is better; a power of two even better.


    You can pick _BitInt sizes as you want - if you want a power of two or multiple of 64, use that. You get exactly the same overflow behaviour.


* There appears to be no upper limit on size, so _BitInt(2997901) is a
  valid type

    The upper limit is specified by the implementation as BITINT_MAXWIDTH, a
    macro defined in <limits.h>.

    For gcc 15.2.0 on x86_64, BITINT_MAXWIDTH is 65535 (2**16-1).
    For clang 21.1.5 it's 8388608 (2**23 bits, 1048576 bytes).

clang seems to have some problems with _BitInt(8388608).  For example,
this program:

#include <limits.h>

_BitInt(BITINT_MAXWIDTH) n = 42;

int main(void) {
    n *= n;
}

takes a *long* time to compile with clang.  I believe it's generating
inline code to do the 8388608 by 8388608 bit multiplication.

    Now try it with two disparate sizes.

I think compiler implementations would do well to pick a max width that
is more realistic for real-world use-cases. (And having 2**16 - 1
instead of 2**16 seems very strange to me.) Even more important for
efficiency is to make a distinction between what sizes work well for
inline code, and what should use more generic library code.


    So what is the result type of multiplying values of those two types?

    _BitInt types are exempt from the integer promotion rules (so _BitInt(3)
    doesn't promote to int), but the usual arithmetic conversions apply.
    If you multiply values of two _BitInt types, the result is the wider of
    the two types.

    So multiplying even two one-million-bit types could overflow!

    Such limits for /fixed-width/ integers are ridiculous.

    Um, I think you might want to re-read and re-phrase that. When you have fixed-width integers, you have a finite range. Try to go beyond that,
    and you have arithmetic overflow. There is no alternative for
    fixed-width integers. It doesn't matter if your integers are 8-bit or a million bits. Integer systems that don't have overflow need arbitrary precision - dynamic allocation for different sizes.


    You might say this is no different from defining an array of exactly
    123,456 elements. But the use-cases are very different.

I started going into details but I guess you don't care about such
matters or whether the feature makes much sense.



    I am not sure what you mean by that.


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Mon Nov 24 13:44:41 2025
    On Mon, 24 Nov 2025 12:17:58 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    The proposal is not about saving /memory/. It specifically says that
    a _BitInt(N) has the same size and alignment as the smallest basic
    type that can contain it, until you get to N greater than 64-bit, in
    which they are contained in an array of int64_t. (The reality is a
    little more formal, to handle targets that have other sizes of their
    basic types.)


    That is a bit unfortunate.
    Compiler support for arrays of 17 to 24bit numbers packed as 3 octet
    per item would have been handy. And not hard at all for compiler to
    implement, at least on architectures that has proper support for
    unaligned access, like x86, POWER, Arm and RISC-V.

    I certainly have real-world applications that use packed arrays like
    that. They could have been written in cleaner and less error-prone
    way if such feature was available.

    I suppose, packed numeric arrays with 5, 6 or 7 octets per item are also
    used by some people, although they are probably less common than my
    case.



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Mon Nov 24 11:45:18 2025
    On 24/11/2025 03:39, BGB wrote:
    On 11/23/2025 7:59 AM, bart wrote:
    On 23/11/2025 13:32, Waldek Hebisch wrote:
    Philipp Klaus Krause <pkk@spth.de> wrote:
    Am 22.10.25 um 14:45 schrieb Thiago Adams:


    Is anyone using or planning to use this new C23 feature?
    What could be the motivation?



    Saving memory by using the smallest multiple-of-8 N that will do.

    IIUC nothing in the standard says that it is smallest multiple-of-8.
Using gcc-15.1 on AMD-64 I get 'sizeof(_BitInt(22))' equal to 4,
while the number could fit in 3 bytes.

    The rationale mentions a use-case where there is a custom processor
    that might actually have a 22-bit hardware types.

    Implementing such odd-size types on regular 8/16/32/64-bit hardware is
    full of problems if you want to do it without padding (in order to get
the savings). Or even with padding (to get the desired overflow
semantics).

    Such as working out how pointers to them will work.


In BGBCC, for any size <= 256 bits, it is padded to the next power-of-2 size. Although if the size is NPOT, some extra handling exists to mask/extend it to the requested size.

    There are two kinds of BitInts: those smaller than 64 bits; and those
    larger than 64 bits, sometimes /much/ larger.

    I had been responding to the claim that those smaller types save memory, compared to using sizes 8/16/32 bits which are commonly available and
    have better hardware support.

    But if a _BitInt(17) is rounded up to 32 bits, there's not going to be
    any saving!

    Here, I wouldn't use the type system at all to define odd-sized fields.
    C already has bitfields within structs, that can be used to efficiently
    pack odd-sized data. But they have lots of restrictions, and are not an independent type.

    (In my stuff, I do the same, but with more control. I also have bitfield-operators that work within ordinary integers. And, in one
    language, arrays of 1/2/4 bits. But again none of these bitfields of
    sub-byte elements are proper types, although those u1/u2/u4 elements
    come close.)


    In BGBCC, there is a hard limit of IIRC 16384 bits.


    As an extension, it also allows for very large literals, though
    currently literals larger than 128 bits can only use hexadecimal or
    similar.

    This is encoded via suffixes, eg:
  I, L, LL, U, UI, UL, ULL: Normal 32/64 bit.
  I128, UI128: 128-bit
  I256, UI256: 256-bit
    other odd sizes map to _BitInt or _UBitInt (unsigned _BitInt).


    Larger decimal numbers could be supported, but for now I don't have a
    strong need for decimal literals beyond 128 bits.

    I did once have a very nice 128-bit type in my systems language, but it
    didn't get enough use to be worth supporting. It was awkward to
    implement too, since each value type took up two registers, or two stack
    slots (in some cases, one of each!)

    But my scripting language has an arbitrary-precision /decimal/ floating
    point type, which can also be used for pure integer calculations.

    I think the maximum range is 10**19000000000 (and a matching minimum). Precision is limited only by memory and runtime, but there are usually
    caps in place otherwise evaluating 1/3 would go on forever.

    This is one is actually more useful and a lot of fun.



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Mon Nov 24 13:57:31 2025
    On Mon, 24 Nov 2025 11:45:18 +0000
    bart <bc@freeuk.com> wrote:

    But my scripting language has an arbitrary-precision /decimal/
    floating point type, which can also be used for pure integer
    calculations.


    Arbitrary-precision floating point? That sounds problematic, regardless
    of base. Unless you don't use the word 'arbitrary' in the same sense
    that it is used, for example, in GMP.
    Gnu MPFR is very careful to never call itself "arbitrary-precision" in
    official docs.


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Mon Nov 24 14:10:33 2025
    On Mon, 24 Nov 2025 00:30:46 +0000
    bart <bc@freeuk.com> wrote:

It is an unnecessary complication. There will be a lot of extra rules
that may be partly 'implementation defined', so behaviour may vary.
And people WILL use those types because they are there, and likely
they will be inefficient.

    What happens when a 391-bit type, even unsigned, overflows? These
    larger types are likely to use a multiple of 64-bits, and for 391
    bits will need 7 x 64 bits, of which the last word will have 57 bits
    of padding. It's very messy.

To me, it does not sound like a problem at all, at least for unsigned
types. Masking out unnecessary MS bits in the MS word is easy.
Even for signed, sign extension of the MS word is not as easy as
masking out, but hardly rocket science. The problem with signed is that
signed overflow is a sacred cow of the temple of the worshipers of
nasal demons. So, the authors of the proposal were afraid of touching it.


    Specifying a multiple of 64 bits is better; a power of two even
    better.


    I strongly disagree. Being able to specify, say, 192-bit integers is
    a useful thing. Esp. in context of multiplication and division.




    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Keith Thompson@3:633/10 to All on Mon Nov 24 04:29:19 2025
    bart <bc@freeuk.com> writes:
    On 23/11/2025 22:38, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    On 23/11/2025 13:32, Waldek Hebisch wrote:
    Philipp Klaus Krause <pkk@spth.de> wrote:
    Am 22.10.25 um 14:45 schrieb Thiago Adams:
    Is anyone using or planning to use this new C23 feature?
    What could be the motivation?

    Saving memory by using the smallest multiple-of-8 N that will do.
    IIUC nothing in the standard says that it is smallest multiple-of-8.
Using gcc-15.1 on AMD-64 I get 'sizeof(_BitInt(22))' equal to 4,
while the number could fit in 3 bytes.

    The rationale mentions a use-case where there is a custom processor
    that might actually have a 22-bit hardware types.
    What rationale are you referring to? There hasn't been an official
    ISO C Rationale document since C99.

    See Introduction and Rationale here:

    https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2709.pdf

    Thanks.

    Implementing such odd-size types on regular 8/16/32/64-bit hardware is
    full of problems if you want to do it without padding (in order to get
the savings). Or even with padding (to get the desired overflow
semantics).

    Such as working out how pointers to them will work.
    Why would pointers to _BitInt types be a problem? A _BitInt object
    is a fixed-size chunk of memory, similar to a struct object.

    Saving memory was mentioned. To achieve that means having bitfields
    that may not start at bit 0 of a byte, and may cross byte- or word-boundaries.

    That part of the rationale appears to be specific to FPGA hardware, not something I know much about.

    For example, an array of 1M 5-bit values would occupy 1M 8-bit bytes,
    but storing packed values means it would use only 625K bytes.

    Anyway, pointers to individual values, or to some arbitrary element or
    slice of such an array, would need some extra info.

On the implementations I have access to (gcc and clang), a _BitInt
object is an ordinary object, with a size that's a whole number of bytes. `unsigned _BitInt(4)`, for example, has 4 value bits and 4 padding bits. (Unless it's a bit-field, but that doesn't give you packed arrays.)

    I can see the benefit of tightly packing multiple bit-precise
    integers into an array, but I don't see a way to do that, either
    with the current gcc and llvm/clang implementations or with the C
    memory model.

    Also being able to use bit-fields wider than int.
    For me main gain is reasonably standard syntax for integers bigger
    that 64 bits.

    Standard syntax I guess would be something like int128_t and
    int256_t. Such wider integers tend to be powers of two.

    But there are two problems with _BitInt:

    * Any odd sizes are allowed, such as _BitInt(391)
    Why is that a problem? If you don't want odd-sized types, don't use
    them.

It is an unnecessary complication. There will be a lot of extra rules
that may be partly 'implementation defined', so behaviour may vary. And
people WILL use those types because they are there, and likely they
will be inefficient.

    Imposing arbitrary restrictions would introduce more unnecessary
    complication. As far as I've been able to tell, odd-sized _BitInt types
    are already implemented (though I've done very little testing).

    What happens when a 391-bit type, even unsigned, overflows? These
    larger types are likely to use a multiple of 64-bits, and for 391 bits
    will need 7 x 64 bits, of which the last word will have 57 bits of
    padding. It's very messy.

    An unsigned _BitInt(391) value wraps around modulo 2**391. In the
    current gcc and clang implementations, it has a size of 56 bytes, with
391 value bits and 57 padding bits. It doesn't seem to be a problem in
    practice.

    Specifying a multiple of 64 bits is better; a power of two even better.

    Then by all means do so. Operations on _BitInt(448) or _BitInt(512)
    might even be more efficient than operations on _BitInt(391).

    If you want the language to restrict allowed widths, how exactly
    would you specify it? Would you allow 32 but not 33? 64 but
    not 65? 72? 80?

    You can impose whatever restrictions you like in your own code.

    * There appears to be no upper limit on size, so _BitInt(2997901) is a
    valid type
    The upper limit is specified by the implementation as
    BITINT_MAXWIDTH, a macro defined in <limits.h>. For gcc 15.2.0 on
    x86_64, BITINT_MAXWIDTH is 65535 (2**16-1). For clang 21.1.5 it's
    8388608 (2**23 bits, 1048576 bytes). clang seems to have some
    problems with _BitInt(8388608). For example, this program: #include
    <limits.h> _BitInt(BITINT_MAXWIDTH) n = 42;
    int main(void) {
    n *= n;
    }
    takes a *long* time to compile with clang. I believe it's generating
    inline code to do the 8388608 by 8388608 bit multiplication.

    Now try it with two disparate sizes.

    Why? llvm/clang currently has a known problem with multiplication
    and division on very large _BitInt types. It shouldn't be too
    difficult for them to correct it. Operations on disparate sizes
    don't add much complexity (the narrower operand is promoted to the
    wider type).

    If you're curious, here's the bug report (I've commented on it),
    but it's an implementation issue, not a language issue.

    https://github.com/llvm/llvm-project/issues/126384

    So what is the result type of multiplying values of those two types?
    _BitInt types are exempt from the integer promotion rules (so
    _BitInt(3) doesn't promote to int), but the usual arithmetic
    conversions apply. If you multiply values of two _BitInt types, the
    result is the wider of the two types.

    So multiplying even two one-million-bit types could overflow!

    Of course. These are fixed-width types, not arbitrary precision types.

    If you want to multiply two _BitInt(1'000'000) values without overflow,
    you can convert to _BitInt(2'000'000) -- if the compiler supports it.
    (Don't expect the code to be efficient, at least for now.)

    Such limits for /fixed-width/ integers are ridiculous.

    I acknowledge that you think so.

    I honestly don't know why the gcc maintainers felt it was worthwhile
    to support _BitInt types up to 65535 bits, or why the llvm/clang
    maintainers decided to support up to 8388608 bits. But that's
    what they've done, and again, you don't have to use it if you don't
    want to. There could easily be a perfectly valid reason that you
    and I are not aware of.

    It's likely that implementing million-bit integers isn't
    significantly harder than implementing thousand-bit integers.

    Bit-precise integers up to, say, 128 or 256 bits seem to be
    implemented reasonably efficiently. How exactly does the fact that
    compilers support much wider types inconvenience you?

    You might say this is no different from defining an array of exactly
    123,456 elements. But the use-cases are very different.

I started going into details but I guess you don't care about such
matters or whether the feature makes much sense.

    On the contrary, I'm curious about it. But if two different compiler
    teams have already done the work of implementing bit-precise
    integers with extremely large and/or odd widths, I can think of
    no reason to complain about it. Even if it doesn't make sense,
    I didn't have to do the work of implementing it.

    Incidentally, something odd happens to quoted text in your followups.
    Blank lines are lost, and paragraphs are reformatted oddly, often
    with alternating long and short lines. Is your newsreader doing
    that, or is it something else? Can you do something about it?

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Mon Nov 24 12:31:44 2025
    On 24/11/2025 11:17, David Brown wrote:
    On 24/11/2025 01:30, bart wrote:

    Saving memory was mentioned. To achieve that means having bitfields
    that may not start at bit 0 of a byte, and may cross byte- or word-
    boundaries.


    No, that is incorrect.

The proposal mentions saving /space/ as relevant in FPGAs - not saving /memory/.

    But I was responding to a suggestion here that one use of _BitInts - presumably for ordinary hardware - was to save memory.

    That's not going to happen if they are simply rounded up to the next power-of-two type.

    If the purpose is, say, a 17-bit type that wraps past values of 131071,
    then that sounds like a lot of extra code needed, for something that
    does not sound that useful. Why modulo 2**17; why not 100,000? Or any
    value more relevant to the task.


  The author's use-case here is in writing code that can be
compiled with a "normal" C compiler on a "normal" target, and also
compiled to FPGA /hardware/, with the same semantics.  In hardware, a
5-bit by 5-bit single-cycle multiplier is very much smaller than an
8-bit by 8-bit multiplier, and orders of magnitude smaller than if the
5-bit integers are promoted to 32-bit before multiplying.

The proposal is not about saving /memory/.  It specifically says that a
_BitInt(N) has the same size and alignment as the smallest basic type
that can contain it, until you get to N greater than 64-bit, in which
case they are contained in an array of int64_t.  (The reality is a
little more formal, to handle targets that have other sizes of their
basic types.)

So on a "normal" target, a _BitInt(3) is the same size and alignment as
a uint8_t, a _BitInt(35) is effectively contained in a uint64_t, and an
array of 4 _BitInt(17) on a 32-bit system will take 16 bytes or 128
bits, not 68 bits.

As far as I can see, the C23 standard does not specify these details,
and leaves them up to the target ABI.  But at the very least, they will
always take an integer number of bytes - unsigned char.  There can never
be any crossing of byte boundaries.

    What about arrays of _BitInt(1), _BitInt(2) and _BitInt(4)? These could actually be practically implemented, with a few restrictions, and could
    save a lot of memory.

Why?  And why do you talk specifically about odd numbers?  I can
understand your concern about packing arrays of _BitInts that are not
multiples of 8, though I hope you now understand that it is not the
problem you thought it was.  However, I see no reason to suppose that
_BitInt(5) is any more or less "complicated" than _BitInt(6) just
because 5 is an odd number!

    I mean odd compared with powers-of-two, or multiples of 8.


A major point of the _BitInt concept is to be able to specify and use
integers of specific explicit sizes in a way that is as implementation
independent as possible.  Some aspects of the implementation cannot be
avoided - such as the size of unsigned char and alignment and padding
for storage.  But the behaviour of the types is entirely independent of
the implementation.  There are no "extra rules" - neither for specific
implementations, nor for specific sizes of _BitInt's.

Efficiency of implementation is, of course, up to the implementation.
But there is absolutely no reason to suppose that working with a _BitInt
of size up to the implementation's maximum integer type is going to be
less efficient than using other types and masking.  For larger
_BitInt's, there are different possible implementation strategies with
different pros and cons in regard to efficiency.


    What happens when a 391-bit type, even unsigned, overflows? These
    larger types are likely to use a multiple of 64-bits, and for 391 bits
    will need 7 x 64 bits, of which the last word will have 57 bits of
    padding. It's very messy.


It is not messy at all.  Signed integer overflow is UB, unsigned
integer overflow is wrapping.  It's the same as always, and could not be
simpler, clearer or neater.

    In my 391-bit example, the top 7 bits will be within a 64-bit word. What values will those extra 57 bits be?

Taking just those 7 bits by themselves, if the value is 1111111, that is:

00000000'00000000'00000000'00000000'00000000'00000000'00000000'01111111

and you do an arithmetic right shift, then you will get 0111111 not
1111111, since the hardware sign bit is bit 63, not bit 6. It needs more
work.


    Such limits for /fixed-width/ integers are ridiculous.

Um, I think you might want to re-read and re-phrase that.  When you have
fixed-width integers, you have a finite range.

    No, I stand by it. There are even different levels of ridiculousness: expecting a language to support a huge fixed integer type like
    int1000000_t (when C only acquired 8/16/32/64-bit types in C99, and
    those still aren't built-in).

    And allowing random sizes such as int817838_t. (See, it seems much
    sillier using this syntax!)

    For such sizes it makes much more sense to acknowledge the existence of arbitrary-precision support, so that the equivalents of int1000000_t and int817838_t would be compatible types. Or you can forget specific widths
    and just have the one bigint type.

(I use such types, but within a library, and there are ways to cap
the precision.)




    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Keith Thompson@3:633/10 to All on Mon Nov 24 04:37:33 2025
    BGB <cr88192@gmail.com> writes:
    [...]
    In BGBCC, there is a hard limit of IIRC 16384 bits.

    As an extension, it also allows for very large literals, though
    currently literals larger than 128 bits can only use hexadecimal or
    similar.

    This is encoded via suffixes, eg:
    I, L, LL, U, UI, UL, ULL: Normal 32/64 bit.
    I128, UI128: 128-bit
    I256, UI256: 256-bit
    other odd sizes map to _BitInt or _UBitInt (unsigned _BitInt).

    In C23, an integer constant with a "wb" or "WB" suffix is of type
    _BitInt(n). One with a "wbu" suffix is of type unsigned _BitInt(n).
The value of n is the smallest that can accommodate the value of the
    constant.

    [...]

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Mon Nov 24 12:56:58 2025
    On 24/11/2025 11:57, Michael S wrote:
    On Mon, 24 Nov 2025 11:45:18 +0000
    bart <bc@freeuk.com> wrote:

    But my scripting language has an arbitrary-precision /decimal/
    floating point type, which can also be used for pure integer
    calculations.


    Arbitrary-precision floating point? That sounds problematic, regardless
    of base. Unless you don't use the word 'arbitrary' in the same sense
    that it is used, for example, in GMP.
    Gnu MPFR is very careful to never call itself "arbitrary-precision" in official docs.


    If you mean problems like repeated multiplies giving ever larger
    numbers, then that will happen also with integers (or rationals).

    If you mean the problems with a divide operation potentially carrying on indefinitely, then a cap needs to be set on that.

I haven't attempted libraries for working out transcendental functions;
    the problems there are in getting a particular precision even if you
    know that in advance.

    But for basic arithmetic, it works extremely well.

    (While it is built-in to my scripting language, it was originally a
    standalone library and has been ported to C. See the bignum.c and
    bignum.h files here:

    https://github.com/sal55/langs/tree/master/bignum

    You can try out division like this:

    #include <stdio.h>
    #include "bignum.h"

    int main() {
    Bignum a, b, c;

    a = bn_makeint(1);
    b = bn_makeint(7);
    c = bn_init();

    bn_div(c, a, b, 1000);
    bn_println(c);
    }

    (Build as 'gcc prog.c bignum.c' etc.)

    You can see that 'bn_div' needs a precision argument: this is the number
    of significant decimal digits. Using 100M here produced 100 million
    digits and took about 6 seconds.)




    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Keith Thompson@3:633/10 to All on Mon Nov 24 05:06:33 2025
    David Brown <david.brown@hesbynett.no> writes:
    [...]
    Yes, exactly. At the call site, the size of the _BitInt type is
    always a known compile-time constant, so it can easily be passed on.
    Thus :

    _BitInt(N) x;
    _BitInt(M) y;
    _BitInt(NM) z = x * y;

    can be implemented as something like :

    __bit_int_signed_mult(NM, (unsigned char *) &z,
    N, (const unsigned char *) &x,
    M, (const unsigned char *) &y);

    That looks like it's supposed to avoid overflow (I'm assuming NM is N + M), but it wouldn't work. The type of a C expression is almost always determined
    by the expression itself, regardless of the context in which it appears.
    The type of x * y is _BitInt(max(N, M)), not _BitInt(N+M), so it can
    overflow even if the full result would fit into z.

    You can do this instead (not tested):

    _BitInt(N) x;
    _BitInt(M) y;
_BitInt(N+M) z = (_BitInt(N+M))x * y;

    (I'm assuming N+M is sufficient, but I might have missed an off-by-one
    error somewhere.)

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Keith Thompson@3:633/10 to All on Mon Nov 24 05:12:12 2025
    bart <bc@freeuk.com> writes:
    On 24/11/2025 09:29, David Brown wrote:
    On 23/11/2025 16:06, Michael S wrote:
    On Sun, 23 Nov 2025 13:59:59 +0000
    bart <bc@freeuk.com> wrote:

    So what is the result type of multiplying values of those two types?


    I think, traditional C rules for integer types apply here as well: type
    of result is the same as type of wider operand. It is arithmetically
    unsatisfactory, but consistent with the rest of language.
    There is one key difference between the _BitInt() types and other
    integer types - with _BitInt(), there are no automatic promotions to
other integer types.  Thus if you are using _BitInt() operands in an
arithmetic expression, these are not promoted to "int" or "unsigned
int" even if they are smaller (lower rank).  If you mix _BitInt()'s
    of different sizes, then the smaller one is first converted to the
    larger type.

I think, the Standard is written in such a way that implementing _BitInt
as arbitrary precision numbers, i.e. with the number of bits held as
part of the data, is not allowed.

Correct.  _BitInt(N) is a signed integer type with precisely N value
bits.  It can have padding bits if necessary (according to the
target ABI), but it can't have any other information.

    Of course, Language Support Library can be
    (and hopefully is, at least for gcc; clang is messy a.t.m.) based on
    arbitrary precision core routines, but the API used by compiler should
    be similar to GMP's mpn_xxx family of functions rather than GMP's
    mpz_xxx family, i.e. # of bits as separate parameters from data arrays
    rather than combined.

Yes, exactly.  At the call site, the size of the _BitInt type is
always a known compile-time constant, so it can easily be passed
on.  Thus :
    _BitInt(N) x;
    _BitInt(M) y;
    _BitInt(NM) z = x * y;

    So what is NM here; is it N*M (the potential maximum size of the
    result), or max(N, M)?

    I made the same mistake in my previous post, but corrected it before
posting it. The required size for the product is N+M bits, not N*M.
    For example, N=32, M=64 -> NM=96.

    [...]

    How would you write a generic user function that operates on any size
    BitInt? For example:

    _BitInt(?) bi_square(_BitInt(?));

    I don't think you can. Each _BitInt(N) type is distinct.

    You could have a function that operates on arguments of type
    [unsigned] _BitInt(BITINT_MAXWIDTH) and depend on implicit
    conversions, but that's likely to be horribly inefficient.

    Or you can replace BITINT_MAXWIDTH by the maximum width you happen to
    need in your application.

    [...]

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Mon Nov 24 15:17:49 2025
    On Mon, 24 Nov 2025 12:56:58 +0000
    bart <bc@freeuk.com> wrote:

    On 24/11/2025 11:57, Michael S wrote:
    On Mon, 24 Nov 2025 11:45:18 +0000
    bart <bc@freeuk.com> wrote:

    But my scripting language has an arbitrary-precision /decimal/
    floating point type, which can also be used for pure integer
    calculations.


    Arbitrary-precision floating point? That sounds problematic,
    regardless of base. Unless you don't use the word 'arbitrary' in
    the same sense that it is used, for example, in GMP.
    Gnu MPFR is very careful to never call itself "arbitrary-precision"
    in official docs.


    If you mean problems like repeated multiplies giving ever larger
    numbers, then that will happen also with integers (or rationals).

    If you mean the problems with a divide operation potentially carrying
    on indefinitely, then a cap needs to be set on that.


Yes, that's what I meant.

I haven't attempted libraries for working out transcendental
    functions; the problems there are in getting a particular precision
    even if you know that in advance.

    But for basic arithmetic, it works extremely well.

    (While it is built-in to my scripting language, it was originally a standalone library and has been ported to C. See the bignum.c and
    bignum.h files here:

    https://github.com/sal55/langs/tree/master/bignum

    You can try out division like this:

    #include <stdio.h>
    #include "bignum.h"

    int main() {
    Bignum a, b, c;

    a = bn_makeint(1);
    b = bn_makeint(7);
    c = bn_init();

    bn_div(c, a, b, 1000);
    bn_println(c);
    }

    (Build as 'gcc prog.c bignum.c' etc.)

    You can see that 'bn_div' needs a precision argument: this is the
    number of significant decimal digits. Using 100M here produced 100
    million digits and took about 6 seconds.)






    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Mon Nov 24 15:27:36 2025
    On Mon, 24 Nov 2025 05:06:33 -0800
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

    David Brown <david.brown@hesbynett.no> writes:
    [...]
    Yes, exactly. At the call site, the size of the _BitInt type is
    always a known compile-time constant, so it can easily be passed on.
    Thus :

    _BitInt(N) x;
    _BitInt(M) y;
    _BitInt(NM) z = x * y;

    can be implemented as something like :

    __bit_int_signed_mult(NM, (unsigned char *) &z,
    N, (const unsigned char *) &x,
    M, (const unsigned char *) &y);

    That looks like it's supposed to avoid overflow (I'm assuming NM is N
    + M), but it wouldn't work. The type of a C expression is almost
    always determined by the expression itself, regardless of the context
    in which it appears. The type of x * y is _BitInt(max(N, M)), not _BitInt(N+M), so it can overflow even if the full result would fit
    into z.

    You can do this instead (not tested):

    _BitInt(N) x;
    _BitInt(M) y;
    _BitInt(N+M) z = (_BitInt(N+M))x * y;

    (I'm assuming N+M is sufficient, but I might have missed an off-by-one
    error somewhere.)


    You missed nothing. N+M is both sufficient and necessary. The latter
    because of -(2**(N-1)) * -(2**(M-1)).


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Keith Thompson@3:633/10 to All on Mon Nov 24 05:33:03 2025
    bart <bc@freeuk.com> writes:
    On 24/11/2025 11:17, David Brown wrote:
    On 24/11/2025 01:30, bart wrote:
    Saving memory was mentioned. To achieve that means having bitfields
    that may not start at bit 0 of a byte, and may cross byte- or word-
    boundaries.

    No, that is incorrect.
    The proposal mentions saving /space/ as relevant in FPGAs - not
    saving /memory/.

    But I was responding to a suggestion here that one use of _BitInts - presumably for ordinary hardware - was to save memory.

    That's *your* presumption.

    The rationale section of N2709 mentions performance/space concerns only
    in the context of FPGAs.

    Packing arrays on ordinary hardware is impractical given C's memory
    model.

    [...]

    What about arrays of _BitInt(1), _BitInt(2) and _BitInt(4)? These
    could actually be practically implemented, with a few restrictions,
    and could save a lot of memory.

    No, they couldn't. Array indexing is defined in terms of pointer
    arithmetic, and you can't have a pointer to something smaller than one
    byte.

    <OT>I can see something like this being done in C++ with operator
    overloading. See, for example, the std::vector<bool> partial specialization.</OT>

    [...]

    In my 391-bit example, the top 7 bits will be within a 64-bit
    word. What values will those extra 57 bits be?

    Probably 0.

    Taking just those 7 bits by themselves, if the value is 1111111, that is:
    00000000'00000000'00000000'00000000'00000000'00000000'00000000'01111111

    and you do an arithmetic right shift, then you will get 0111111 not
    1111111, since the hardware sign bit is bit 63 not bit 6. It needs
    more work.

    Yes, the compiler has to do some extra work for types with padding bits,
    to ensure that those bits are either set to 0 or properly ignored.

    [...]

    No, I stand by it. There are even different levels of ridiculousness: expecting a language to support a huge fixed integer type like
    int1000000_t (when C only acquired 8/16/32/64-bit types in C99, and
    those still aren't built-in).

    And allowing random sizes such as int817838_t. (See, it seems much
    sillier using this syntax!)

    Your complaint seems to be that the feature is too flexible.

    For such sizes it makes much more sense to acknowledge the existence
    of arbitrary-precision support, so that the equivalents of
    int1000000_t and int817838_t would be compatible types. Or you can
    forget specific widths and just have the one bigint type.

    Yes, there are a lot of things that C23 *could* have done, but didn't.

    [...]

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Keith Thompson@3:633/10 to All on Mon Nov 24 05:35:30 2025
    bart <bc@freeuk.com> writes:
    [...]
    There are two kinds of BitInts: those smaller than 64 bits; and those
    larger than 64 bits, sometimes /much/ larger.

    As far as I know, the standard makes no such distinction.

    I had been responding to the claim that those smaller types save
    memory, compared to using sizes 8/16/32 bits which are commonly
    available and have better hardware support.

    I don't recall any such claim. Do you have a citation (other than
    the FPGA-specific wording in N2709)?

    But if a _BitInt(17) is rounded up to 32 bits, there's not going to be
    any saving!

    Correct.

    [...]

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Mon Nov 24 14:49:08 2025
    On 24/11/2025 12:17, bart wrote:
    On 24/11/2025 09:29, David Brown wrote:
    On 23/11/2025 16:06, Michael S wrote:
    On Sun, 23 Nov 2025 13:59:59 +0000
    bart <bc@freeuk.com> wrote:

    So what is the result type of multiplying values of those two types?


    I think, traditional C rules for integer types apply here as well: type
    of result is the same as type of wider operand. It is arithmetically
    unsatisfactory, but consistent with the rest of language.

    There is one key difference between the _BitInt() types and other
    integer types - with _BitInt(), there are no automatic promotions to
    other integer types. Thus if you are using _BitInt() operands in an
    arithmetic expression, these are not promoted to "int" or "unsigned
    int" even if they are smaller (lower rank). If you mix _BitInt()'s of
    different sizes, then the smaller one is first converted to the larger
    type.

    I think, the Standard is written in such a way that implementing _BitInt
    as arbitrary precision numbers, i.e. with the number of bits held as part
    of the data, is not allowed.

    Correct. _BitInt(N) is a signed integer type with precisely N value
    bits. It can have padding bits if necessary (according to the target
    ABI), but it can't have any other information.

    Of course, Language Support Library can be
    (and hopefully is, at least for gcc; clang is messy a.t.m.) based on
    arbitrary precision core routines, but the API used by compiler should
    be similar to GMP's mpn_xxx family of functions rather than GMP's
    mpz_xxx family, i.e. # of bits as separate parameters from data arrays
    rather than combined.


    Yes, exactly. At the call site, the size of the _BitInt type is
    always a known compile-time constant, so it can easily be passed on.
    Thus :

        _BitInt(N) x;
        _BitInt(M) y;
        _BitInt(NM) z = x * y;

    So what is NM here; is it N*M (the potential maximum size of the
    result), or max(N, M)?

    No, it is whatever you want it to be. I didn't want to use the next
    letter after N because _BitInt(O) could easily be misunderstood. But of course NM could be misunderstood too. Perhaps N1, N2 and N3 would have
    been better choices than N, M and NM.

    You pick the size of "z" here according to your needs for your code.
    The multiplication will be done, logically, at max(N, M) bits. The
    result will then be converted to NM bits. Like always in C, the
    semantics of the calculation is entirely independent of the type of the variable you assign the results to. And like always in C, the compiler
    may take advantage of knowledge of the assigned type in order to give
    more efficient code, as long as it does not stray from giving the same
    value as if it took the code literally.

    So if you want the full range of values of x and y to be usable here,
    then NM would have to be N * M. But you would also need a cast, such as "_BitInt(NM) z = (_BitInt(NM)) x * y;", just as you do if you want to
    multiply two 32-bit ints as a 64-bit operation.

    Alternatively, you might know more about the values that might be in x
    and y, and have a smaller NM (though you still need a cast if it is
    greater than both N and M). Or you might be using unsigned types and
    want the wrapping / masking behaviour.

    The point was not what size NM is, but that it is known to the compiler
    at the time of writing the expression.


    It sounds like the max precision you get will be the latter.


    can be implemented as something like :

        __bit_int_signed_mult(NM, (unsigned char *) &z,
                              N, (const unsigned char *) &x,
                              M, (const unsigned char *) &y);




    How would you write a generic user function that operates on any size BitInt? For example:

    ˙˙ _BitInt(?) bi_square(_BitInt(?));


    You can't. _BitInt(N) and _BitInt(M) are distinct types, for differing
    N and M. You can't write a generic user function in C that implements
    "T foo(T)" where T can be "int", "short", "long int", or other types. C simply does not have type-generic functions.

    You /can/ write generic macros that handle different _BitInt types, but
    that would quickly get painful given that you'd need a case for each
    size of _BitInt you wanted for the _Generic macro.

    If you want generics, you are better off with a language that supports generics, such as C++.

    Even if you passed the size as a parameter, there would be a problem
    with the BitInt type.

    Yes. But you could use a void* pointer for more generic parameters.

    However, _BitInt types are for "bit-precise integer types". They are
    for specific fixed sizes, not for arbitrary precision integers. They
    are not ideally suited for tasks for which they were not designed -
    that's hardly surprising.


    This assumes BitInts are passed and returned by value, but even using BitInt* wouldn't help.

    Yes, they are passed around as values - they are integer types and are
    passed around like other integer types. (Implementations may use stack
    blocks and pointers for passing the values around if they are too big
    for registers, just as implementations can do with any value type.
    That's an implementation detail - logically, they are passed and
    returned as values.)


    This sets it apart from arrays, where you also define very large, fixed
    size arrays, but can use a T(*)[] type to write generic functions, that
    take an additional length parameter.

    _BitInt's are fixed-size integer types, not arrays. Again, it is not
    then surprising that they are different from arrays.


    This will be for a particular T, but for BitInt, T is also fixed; it
    happens to be an implicit bit type.


    _BitInt's are not arrays, they are scalars - they are integer types.
    There is no concept of a type "_BitInt" - they always have compile-time
    fixed sizes, such as "_BitInt(12)". So the idea of passing around
    generic _BitInt's makes no more sense than passing around any other kind
    of generic integer types. (Of course you can have an array of _BitInt's
    of any given size.)


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Mon Nov 24 14:51:09 2025
    On 24/11/2025 14:06, Keith Thompson wrote:
    David Brown <david.brown@hesbynett.no> writes:
    [...]
    Yes, exactly. At the call site, the size of the _BitInt type is
    always a known compile-time constant, so it can easily be passed on.
    Thus :

    _BitInt(N) x;
    _BitInt(M) y;
    _BitInt(NM) z = x * y;

    can be implemented as something like :

    __bit_int_signed_mult(NM, (unsigned char *) &z,
    N, (const unsigned char *) &x,
    M, (const unsigned char *) &y);

    That looks like it's supposed to avoid overflow (I'm assuming NM is N + M), but
    it wouldn't work. The type of a C expression is almost always determined
    by the expression itself, regardless of the context in which it appears.
    The type of x * y is _BitInt(max(N, M)), not _BitInt(N+M), so it can
    overflow even if the full result would fit into z.

    You can do this instead (not tested):

    _BitInt(N) x;
    _BitInt(M) y;
    _BitInt(N+M) z = (_BitInt(N+M))x * y;

    (I'm assuming N+M is sufficient, but I might have missed an off-by-one
    error somewhere.)


    It /looks/ like NM means "N + M" (or N * M, as both Bart and I wrote
    without thinking), but that was not my intention. I simply meant a
    constant that may be chosen differently from N and M, and did not want
    to go on to the letter O. In hindsight, NM was a poor choice.


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Mon Nov 24 15:02:59 2025
    On 24/11/2025 12:44, Michael S wrote:
    On Mon, 24 Nov 2025 12:17:58 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    The proposal is not about saving /memory/. It specifically says that
    a _BitInt(N) has the same size and alignment as the smallest basic
    type that can contain it, until you get to N greater than 64-bit, in
    which they are contained in an array of int64_t. (The reality is a
    little more formal, to handle targets that have other sizes of their
    basic types.)


    That is a bit unfortunate.
    Compiler support for arrays of 17 to 24bit numbers packed as 3 octet
    per item would have been handy. And not hard at all for compiler to implement, at least on architectures that has proper support for
    unaligned access, like x86, POWER, Arm and RISC-V.

    I certainly have real-world applications that use packed arrays like
    that. They could have been written in cleaner and less error-prone
    way if such feature was available.

    I suppose, packed numeric arrays with 5, 6 or 7 octets per item are also
    used by some people, although they are probably less common than my
    case.


    There may certainly be use-cases for such "packed arrays", but I think
    that would just add complications to the definitions of _BitInt and
    require more implementation-specific behaviour. And then someone would
    insist that they be packed by bit, rather than by byte, and cause all
    the problems that Bart feared.

    I think this kind of thing is probably best left to
    implementation-specific features - just like "packed" attributes and
    pragmas today.

    Alternatively, a standardised syntax for detailed control of packing and ordering in structs, arrays, and especially bit-fields, could be
    developed and added to the standards. I don't see a good reason to
    handle _BitInt's differently.


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Mon Nov 24 14:21:23 2025
    On 24/11/2025 13:35, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    [...]
    There are two kinds of BitInts: those smaller than 64 bits; and those
    larger than 64 bits, sometimes /much/ larger.

    As far as I know, the standard makes no such distinction.

    *I* am making the distinction. From an implementation point of view (and assuming 64-bit hardware), they are quite different.

    And that leads to different kinds of language features.

    If the possibilities above 64 bits were less ambitious (say i128 and
    i256), then the concept might be stretched to cover both. But not when
    you can also have i1234567.

    It would be like having a GETBITS macro which is not limited to a 1- to
    63-bit bitfield of a u64 value, but could return a slice of an
    arbitrarily large array.


    I had been responding to the claim that those smaller types save
    memory, compared to using sizes 8/16/32 bits which are commonly
    available and have better hardware support.

    I don't recall any such claim. Do you have a citation (other than
    the FPGA-specific wording in N2709)?

    This is where it came up in this thread:

    On 23/11/2025 11:46, Philipp Klaus Krause wrote:
    Am 22.10.25 um 14:45 schrieb Thiago Adams:


    Is anyone using or planning to use this new C23 feature?
    What could be the motivation?



    Saving memory by using the smallest multiple-of-8 N that will do. Also
    being able to use bit-fields wider than int.

    Saving memory for two reasons:

    * On small embedded systems where there is very little memory
    * For code that needs to be very fast on big systems to make data
    structures fit into cache


    Although this doesn't go as far as using odd bit-sizes: it would mean
    using sizes like 24, 40, 48, and 56 bits instead of 32 or 64 bits.

    The savings would be sparse.



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Mon Nov 24 14:41:03 2025
    On 24/11/2025 13:33, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:

    What about arrays of _BitInt(1), _BitInt(2) and _BitInt(4)? These
    could actually be practically implemented, with a few restrictions,
    and could save a lot of memory.

    No, they couldn't. Array indexing is defined in terms of pointer
    arithmetic, and you can't have a pointer to something smaller than one
    byte.


    The restrictions I mentioned were to do with pointers to individual bits.

    It is possible that operations such as:

    x = A[i]
    A[i] = x

    can be well defined when A is an array of 1/2/4-bit values, even if
    expressed like this:

    *(A + i)

    But this would have to be indivisible when A is such an array: only the
    whole thing is valid, not (A + i) by itself, or A by itself; you'd need &A.

    This would need a small tweak to the language, but that is nothing
    compared to supporting (i3783467 * i999 / i3) >> i17.

    But I write a script in my dynamic language, which does support arrays
    of 'u1 u2 u4', and it gives these results:

    Array of u1 uses 12,500,000 bytes
    Array of u2 uses 25,000,000 bytes
    Array of u4 uses 50,000,000 bytes
    Array of u8 uses 100,000,000 bytes
    Array of u16 uses 200,000,000 bytes
    Array of u32 uses 400,000,000 bytes
    Array of u64 uses 800,000,000 bytes

    C can only get down to that u8 figure (100MB) using its 'char' type.
    Even 'bool' doesn't make it smaller (presumably for the reasons you mentioned).

    You are forced to emulate such arrays in user-code using shifts and masks.


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Mon Nov 24 15:41:13 2025
    On 24/11/2025 13:31, bart wrote:
    On 24/11/2025 11:17, David Brown wrote:
    On 24/11/2025 01:30, bart wrote:

    Saving memory was mentioned. To achieve that means having bitfields
    that may not start at bit 0 of a byte, and may cross byte- or word-
    boundaries.


    No, that is incorrect.

    The proposal mentions saving /space/ as relevant in FPGAs - not saving
    /memory/.

    But I was responding to a suggestion here that one use of _BitInts - presumably for ordinary hardware - was to save memory.


    OK. However, that is not what is in the proposal, nor in the C23 standard.

    That's not going to happen if they are simply rounded up to the next power-of-two type.

    Correct (with the proviso that after 64 bits, rounding is to whatever
    type can contain an int64_t).

    As I mentioned, I don't think the C standards require that rounding-up
    size, even though it was in the proposal. It may be worth punting that question over to the "comp.std.c" newsgroup to see if someone has a
    definite answer.

    For the kind of small systems that had been mentioned in the context of
    saving memory, compilers often have extensions or
    implementation-specific features (attributes, pragmas, etc.) to go
    beyond standard C in order to get greater efficiency on tiny systems.
    These may support smaller containers or tighter array packing.


    If the purpose is, say, a 17-bit type that wraps past values of 131071,
    then that sounds like a lot of extra code needed, for something that
    does not sound that useful. Why modulo 2**17; why not 100,000? Or any
    value more relevant to the task.


    Signed _BitInt's don't wrap - arithmetic overflow is UB. Unsigned
    _BitInt's wrap, just like with all other unsigned integer types in C.
    And wrapping is /not/ a lot of extra code - wrapping an N-bit type is
    just an AND instruction with the constant (1 << N) - 1. This can be done
    once at the end of complex arithmetic expressions, in most cases
    (shift-right and division can mean extra masking is needed).

    Why not provide wrapping types with arbitrary wrapping values? Why not
    indeed - some languages do (Ada springs to mind). They are not actually
    that often needed, so it's easier just to put a "% X" operation in the
    user code.

    The rationale in the proposal that you linked says why these _BitInt
    types can be useful. They are for expressing the intent of the
    programmer more clearly, making it more convenient to work with
    somewhat bigger integer sizes (such as for cryptography), and improving
    FPGA development. Wrapping is not a big point (and it does not apply
    at all to signed _BitInt).


    The author's use-case here is in writing code that can be compiled
    with a "normal" C compiler on a "normal" target, and also compiled to
    FPGA /hardware/, with the same semantics. In hardware, a 5-bit by
    5-bit single-cycle multiplier is very much smaller than an 8-bit by
    8-bit multiplier, and orders of magnitude smaller than if the 5-bit
    integers are promoted to 32-bit before multiplying.

    The proposal is not about saving /memory/. It specifically says that
    a _BitInt(N) has the same size and alignment as the smallest basic
    type that can contain it, until you get to N greater than 64-bit, in
    which they are contained in an array of int64_t. (The reality is a
    little more formal, to handle targets that have other sizes of their
    basic types.)

    So on a "normal" target, a _BitInt(3) is the same size and alignment
    as a uint8_t, a _BitInt(35) is effectively contained in an uint64_t,
    and an array of 4 _BitInt(17) on a 32-bit system will take 16 bytes or
    128 bits, not 68 bits.

    As far as I can see, the C23 standard does not specify these details,
    and leaves them up to the target ABI. But at the very least, they
    will always take an integer number of bytes - unsigned char. There
    can never be any crossing of byte boundaries.

    What about arrays of _BitInt(1), _BitInt(2) and _BitInt(4)? These could actually be practically implemented, with a few restrictions, and could
    save a lot of memory.


    They could - but that would add a lot of complications (the ones you
    worried about). I would assume that this was considered both by the
    authors of the proposal, and by the C committee, and rejected as not
    being worth the cost.

    Why? And why do you talk specifically about odd numbers? I can
    understand your concern about packing arrays of _BitInts that are not
    multiples of 8, though I hope you now understand that it is not the
    problem you thought it was. However, I see no reason to suppose that
    _BitInt(5) is any more or less "complicated" than _BitInt(6) just
    because 5 is an odd number!

    I mean odd compared with powers-of-two, or multiples of 8.

    Okay. "Unusual" might have been a better choice of term, or you could
    have explained what you meant. But that makes more sense.



    A major point of the _BitInt concept is to be able to specify and use
    integers of specific explicit sizes in a way that is as implementation
    independent as possible. Some aspects of the implementation cannot be
    avoided - such as the size of unsigned char and alignment and padding
    for storage. But the behaviour of the types is entirely independent
    of the implementation. There are no "extra rules" - neither for
    specific implementations, nor for specific sizes of _BitInt's.

    Efficiency of implementation is, of course, up to the implementation.
    But there is absolutely no reason to suppose that working with a
    _BitInt of size up to the implementation's maximum integer type is
    going to be less efficient than using other types and masking. For
    larger _BitInt's, there are different possible implementation
    strategies with different pros and cons in regard to efficiency.


    What happens when a 391-bit type, even unsigned, overflows? These
    larger types are likely to use a multiple of 64-bits, and for 391
    bits will need 7 x 64 bits, of which the last word will have 57 bits
    of padding. It's very messy.


    It is not messy at all. Signed integer overflow is UB, unsigned
    integer overflow is wrapping. It's the same as always, and could not
    be simpler, clearer or neater.

    In my 391-bit example, the top 7 bits will be within a 64-bit word. What values will those extra 57 bits be?


    They are padding bits. They don't contribute to the value of the object.

    An implementation, or rather an ABI, can decide that they should always
    be zero, or always zero for unsigned _BitInt and always a sign extension
    for signed _BitInt, or it can decide that they are always ignored.
    Giving a specific value means masking may be needed before storing a
    value in memory or passing it on to an external function, while making
    it ignored can mean masking might be needed when reading from memory or
    using a returned value.

    It is not really any different from other padding bits or bytes, such as
    all but the LSB in a _Bool, or padding in structs.

    Taking just those 7 bits by themselves, if the value is 1111111, that is:
    00000000'00000000'00000000'00000000'00000000'00000000'00000000'01111111

    and you do an arithmetic right shift, then you will get 0111111 not

    C does not have an "arithmetic right shift" operation - that's an assembly-level operation. Signed right-shift of negative values is implementation-defined in C.


    1111111, since the hardware sign bit is bit 63 not bit 6. It needs more work.


    If the value of those 7 bits is 111'1111, you have a negative value and right-shifting that is implementation-defined. The compiler
    implementation can pick whatever it feels is efficient and a good choice
    for its users. Maybe that means it defines the right-shift to work as
    though the type was unsigned - you get 011'1111. Maybe it means it
    defines the padding bits for signed _BitInt to use sign extension, and
    signed right-shift instructions. Maybe it means that the compiler will
    mask the value, then sign-extend it, then do a signed right-shift
    instruction, then mask it again. That's all up to the implementation.

    You are worrying about completely negligible things here. (If you are considering adding support for _BitInt to your own C tools, then I
    understand wanting to get all the details right.)


    Such limits for /fixed-width/ integers are ridiculous.

    Um, I think you might want to re-read and re-phrase that. When you
    have fixed-width integers, you have a finite range.

    No, I stand by it. There are even different levels of ridiculousness: expecting a language to support a huge fixed integer type like
    int1000000_t (when C only acquired 8/16/32/64-bit types in C99, and
    those still aren't built-in).

    And allowing random sizes such as int817838_t. (See, it seems much
    sillier using this syntax!)

    I had taken your "ridiculous" comment to be part of your complaint that "multiplying even two one-million-bit types could overflow". But those statements are independent, then only the first is silly - of course arithmetic on any finite sized type can overflow unless specifically
    limited (such as by wrapping behaviour for unsigned types). I agree
    that huge fixed-size integer types are not useful, though I am not sure
    where the ideal limit lies. The biggest use-case for very large
    integers is cryptography. I find it hard to imagine sizes greater than
    16 kbit being directly useful, and thus 32 kbit sizes for intermediary results. Fixed sizes can be more efficient than arbitrary precision
    types when the same sized objects are used repeatedly.


    For such sizes it makes much more sense to acknowledge the existence of arbitrary-precision support, so that the equivalents of int1000000_t and int817838_t would be compatible types. Or you can forget specific widths
    and just have the one bigint type.

    (I use such types, but within a library, and there are ways to cap
    the precision.)





    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Mon Nov 24 15:59:41 2025
    On 24/11/2025 14:17, Michael S wrote:
    On Mon, 24 Nov 2025 12:56:58 +0000
    bart <bc@freeuk.com> wrote:

    On 24/11/2025 11:57, Michael S wrote:
    On Mon, 24 Nov 2025 11:45:18 +0000
    bart <bc@freeuk.com> wrote:

    But my scripting language has an arbitrary-precision /decimal/
    floating point type, which can also be used for pure integer
    calculations.


    Arbitrary-precision floating point? That sounds problematic,
    regardless of base. Unless you don't use the word 'arbitrary' in
    the same sense that it is used, for example, in GMP.
    Gnu MPFR is very careful to never call itself "arbitrary-precision"
    in official docs.


    If you mean problems like repeated multiplies giving ever larger
    numbers, then that will happen also with integers (or rationals).

    If you mean the problems with a divide operation potentially carrying
    on indefinitely, then a cap needs to be set on that.


    Yes, that's what I meant.


    I remember a fun programming task at university in a language similar to Haskell, which involved writing an arbitrary precision fixed-point
    decimal arithmetic package. It included support for an infinite
    polynomial expansion for arctan, and then used a Machin-like formula to
    get a "value" for pi. It all worked well, as long as you remembered to
    limit how many digits you printed out...



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From BGB@3:633/10 to All on Mon Nov 24 11:52:33 2025
    On 11/24/2025 6:37 AM, Keith Thompson wrote:
    BGB <cr88192@gmail.com> writes:
    [...]
    In BGBCC, there is a hard limit of IIRC 16384 bits.

    As an extension, it also allows for very large literals, though
    currently literals larger than 128 bits can only use hexadecimal or
    similar.

    This is encoded via suffixes, eg:
    I, L, LL, U, UI, UL, ULL: Normal 32/64 bit.
    I128, UI128: 128-bit
    I256, UI256: 256-bit
    other odd sizes map to _BitInt or _UBitInt (unsigned _BitInt).

    In C23, an integer constant with a "wb" or "WB" suffix is of type
    _BitInt(n). One with a "wbu" suffix is of type unsigned _BitInt(n).
    The value of n is the smallest that can accommodate the value of the
    constant.


    OK, I missed that part.

    I had a need though in this case to specify an exact width for the
    constant in some use cases, rather than merely just specify its largeness.

    But, yeah, I<nn> and U<nn> / UI<nn> are non-standard, but alas...

    Follows a similar pattern as for printf modifiers, say:
    printf("%I64u\n", longValue); //MSVC specific
    Vs, say:
    printf("%llu\n", longValue); //Most everything else

    In this case, the I<nn> notation being extended to also cover __int128
    and _BitInt.

    ...



    [...]



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Mon Nov 24 18:35:01 2025
    On 24/11/2025 14:41, David Brown wrote:
    On 24/11/2025 13:31, bart wrote:

    That's all up to the implementation.

    You are worrying about completely negligible things here.

    Is it that negligible? That's easy to say when you're not doing the implementing! However it may impact on the size and performance of code.


    And allowing random sizes such as int817838_t. (See, it seems much
    sillier using this syntax!)

    I had taken your "ridiculous" comment to be part of your complaint that "multiplying even two one-million-bit types could overflow". But if those statements are independent, then only the first is silly - of course arithmetic on any finite sized type can overflow unless specifically
    limited (such as by wrapping behaviour for unsigned types). I agree
    that huge fixed-size integer types are not useful, though I am not sure where the ideal limit lies.

    You don't think it strange that C doesn't even have a 128-bit type yet
    (it only barely has width-specific 64-bit ones).

    There is just the poor gnu extension where 128-bit integers didn't have
    a literal form, and there was no way to print such values.

    But now there is this huge leap, not only to 128/256/512/1024 bits, but
    to conceivably millions, plus the ability to specify any weird type you
    like, like 182 bits (eg. somebody makes a typo for _BitInt(128), but
    they silently get a viable type that happens to be a little less
    efficient!).

    So, 20 years of having 64-bit processors with little or no support for
    even double-word types, and now there is this explosion in capabilities.

    Or, are literals and print facilities for these new types still missing?

    Personally I think they should have got the basics right first, like a
    decent 128-bit type, proper literals, and ways to print.

    This looks like VLAs all over again (eg. is '_BitInt(1000000) A'
    allocated on the stack?). A poorly suited, hard-to-implement feature.



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From BGB@3:633/10 to All on Mon Nov 24 13:12:54 2025
    On 11/24/2025 8:21 AM, bart wrote:
    On 24/11/2025 13:35, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    [...]
    There are two kinds of BitInts: those smaller than 64 bits; and those
    larger than 64 bits, sometimes /much/ larger.

    As far as I know, the standard makes no such distinction.

    *I* am making the distinction. From an implementation point of view (and assuming 64-bit hardware), they are quite different.

    And that leads to different kinds of language features.


    As noted, as I understand it there is no reason for the storage to be
    smaller than the next power-of-2 size.

    Supporting odd-sized values in memory would have added a lot more of a
    pain in terms of making things efficient (it is a lot more of an issue
    to store a 24-bit or 40-bit item to memory than 32 or 64).

    Though, one possibility could be "__packed _BitInt(n)" where in this
    case it would handle them as the nearest multiple of 8 bits rather than
    as the nearest power-of-2.
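    The cost of such a hypothetical "__packed _BitInt(24)" can be sketched in portable C: the 24-bit value lives in 3 bytes and every access becomes byte-at-a-time work plus sign extension (`load_s24`/`store_s24` are illustrative names):

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    /* A signed 24-bit value kept in 3 bytes (little-endian), loaded and
       stored a byte at a time with explicit sign extension -- roughly
       what packed NPOT storage costs on byte-addressed hardware. */
    static int32_t load_s24(const uint8_t *p)
    {
        uint32_t u = (uint32_t)p[0] | ((uint32_t)p[1] << 8)
                   | ((uint32_t)p[2] << 16);
        /* sign-extend bit 23; the xor/subtract trick avoids relying on
           implementation-defined right shifts of negative values */
        return (int32_t)((u ^ 0x800000u) - 0x800000u);
    }

    static void store_s24(uint8_t *p, int32_t v)
    {
        uint32_t u = (uint32_t)v;     /* two's-complement bit pattern */
        p[0] = (uint8_t)u;
        p[1] = (uint8_t)(u >> 8);
        p[2] = (uint8_t)(u >> 16);
    }

    int main(void)
    {
        uint8_t buf[3];
        store_s24(buf, -1234567);
        assert(load_s24(buf) == -1234567);
        store_s24(buf, 0x7FFFFF);            /* max 24-bit signed value */
        assert(load_s24(buf) == 0x7FFFFF);
        printf("ok\n");
        return 0;
    }
    ```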


    As least on my ISA design, Load/Store ops are mostly only available in power-of-2 sizes, and the direct displacement case is limited to natural alignment (though using RISC-V encodings can sidestep this limitation in
    the case of the XG3 variant, or if targeting RISC-V, *).


    *: In my case, the ISA has split into multiple variants:
    XG1: Its original form.
    16/32/64/96 bit instructions.
    Mostly 5-bit register fields.
    XG2: Modified.
    Loses 16-bit encodings;
    Gains slightly larger immediate values;
    All register fields expand to 6 bits;
    Encoding scheme is slightly dog-chewed.
    XG3:
    Instructions were repacked to be compatible with RISC-V;
    Register numbering was made compatible with RISC-V;
    Un-dog-chewed the encoding scheme some vs its predecessors;
    Instruction stream can be mixed/matched with RV64G.
    However, while both RV64G and XG3 ops support superscalar execution,
    for reasons, my CPU core can't co-issue RV64 and XG3 instructions.
    So, it is more like the ISA can flip/flop every clock-cycle.

    However, can note that RISC-V also still lacks NPOT memory operations.

    And, if your memory store looks like:
    SRLI X6, X10, 16
    SW X10, 13(X12)
    SB X6, 15(X12)

    This isn't great; you don't want to pay these sorts of penalties without reason.

    For odd-sized _BitInt, one pays the cost mostly by using sign/zero
    extension on certain operations.

    In basic forms of both ISAs, this can be done via a pair of shift instructions, say, zero-extending 24 bits:
    SLLI X10, X10, 40
    SRLI X10, X10, 40
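    The shift-pair idiom above translates directly into C; a sketch (assuming arithmetic right shift of negative values, as gcc and clang provide; `zext`/`sext` are illustrative names):

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Narrowing a 64-bit register to an n-bit value by shifting left
       then right: logical right shift zero-extends, arithmetic right
       shift sign-extends -- the SLLI+SRLI / SLLI+SRAI pairs above. */
    static uint64_t zext(uint64_t x, int n)
    {
        return (x << (64 - n)) >> (64 - n);          /* SLLI + SRLI */
    }

    static int64_t sext(uint64_t x, int n)
    {
        /* relies on arithmetic right shift of signed values */
        return (int64_t)(x << (64 - n)) >> (64 - n); /* SLLI + SRAI */
    }

    int main(void)
    {
        assert(zext(0xFFFFFFFFFFABCDEFu, 24) == 0xABCDEF);
        assert(sext(0x00ABCDEFu, 24) == -5517841);   /* bit 23 set   */
        assert(sext(0x12345u, 24) == 0x12345);       /* bit 23 clear */
        printf("ok\n");
        return 0;
    }
    ```

    A compiler for small `_BitInt` widths can emit exactly this pair, or a single AND for the unsigned case.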

    In my case, there is an optional feature that can allow this to be
    encoded as a single instruction. Although the instruction in question
    uses a 64-bit encoding; so doesn't save any code-size over the pair of
    shifts, but is faster; partly also because in my CPU core most
    instructions have a minimum latency of 2 clock cycles; which isn't ideal
    for a lot of RISC-V's patterns.

    Though, on the CPU in question, the ideal scheduling isn't so much to
    try to reuse a register immediately, but if possible to put around 5 instructions between modifying a register and trying to access its value
    again (but, this case really sucks for some constructs in RV).

    Like, one can't optimally schedule an array index load (needs 3
    instructions in RV64G) when such scheduling will most likely exceed the
    total length of the loop body (and trying to modulo-schedule array-loads
    is just kinda absurd).

    Well, technically, CPU isn't VLIW (at least for RV64 and XG3, XG1 and
    XG2 were "LIW"), but being 3-wide in-order, optimal case for performance
    is still to try to schedule things as-if they were (V)LIW.

    Though, the spacing drops to 3 intermediate instructions if scheduling
    for 2-wide; which may make sense either if there isn't sufficient ILP to optimize for 3-wide scheduling (most of the time) or the code is doing
    things that hinder 3-wide operation (minority case; but can happen as
    the 3rd lane in this case only does basic ALU instructions and is
    "eaten" by certain instructions, such as indexed-store, etc).

    ...


    My compiler still doesn't deal with all of this well (and sorta blows it
    off in the case of targeting RV64G or RV64GC), but this sort of thing
    seems to be sort of a pain case in general (and it sorta helps if the programmer also writes their code in a way that helps the compiler along
    here; but helps some if ISA design limitations don't actively hinder the ability to generate efficient code in this area).

    ...


    Though, had noted that (curiously) writing code as-if one were targeting
    a modulo-scheduled VLIW seems to help with x86-64 as well, even if
    x86-64 has nowhere near enough registers to benefit here (it is almost
    as-if x86-64 has a mechanism in place to cheapen the cost of stack
    spills and reloads).

    In my case, I had instead used 64 GPRs (from the RV64G POV, it is just
    the X and F register spaces glued together). Where 64 is mostly enough
    to competently modulo-schedule things and not run out of registers.

    Though, it is only some kinds of code that can benefit from the power of
    64 GPRs.


    But, yeah, in any case, I guess the main issue is that NPOT loads/stores
    would suck here in the absence of dedicated CPU instructions (in a
    similar way to how much it hurts by RV64G lacking indexed-load/store;
    where array operations are often very common in the types of code one
    might want to optimize via modulo scheduling the loop).

    But, you don't really want to add NPOT Load/Store instructions either,
    because this more just offloads the pain onto the CPU.

    ...



    If the possibilities above 64 bits were less ambitious (say i128 and
    i256), then the concept might be stretched to cover both. But not when
    you can also have i1234567.

    It would be like having a GETBITS macro, which is not limited to a 1- to 63-
    bit bitfield of a u64 value, but could return a slice of an arbitrarily large array.
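    That hypothetical GETBITS can be sketched in a few lines of C (`getbits` is an illustrative name, not an existing API):

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Pull an n-bit field (1 <= n <= 64) starting at bit 'pos' out of a
       little-endian array of 64-bit words -- the slice operation a wide
       _BitInt representation needs internally. */
    static uint64_t getbits(const uint64_t *words, size_t pos, int n)
    {
        size_t w = pos / 64;
        int off = (int)(pos % 64);
        uint64_t lo = words[w] >> off;
        if (off + n > 64)                    /* field straddles two words */
            lo |= words[w + 1] << (64 - off);
        return n == 64 ? lo : lo & ((UINT64_C(1) << n) - 1);
    }

    int main(void)
    {
        uint64_t a[2] = { 0x1122334455667788u, 0x99AABBCCDDEEFF00u };
        assert(getbits(a, 0, 8)  == 0x88);
        assert(getbits(a, 8, 16) == 0x6677);
        assert(getbits(a, 60, 8) == 0x01);   /* straddles the boundary */
        printf("ok\n");
        return 0;
    }
    ```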


    I added some Verilog style notation, which can in premise be used for
    large _BitInts. However this case is untested and very likely runs into
    an "implementation hole" for types larger than 128 bits.



    I had been responding to the claim that those smaller types save
    memory, compared to using sizes 8/16/32 bits which are commonly
    available and have better hardware support.

    I don't recall any such claim.˙ Do you have a citation (other than
    the FPGA-specific wording in N2709)?

    This is where it came up in this thread:

    On 23/11/2025 11:46, Philipp Klaus Krause wrote:
    Am 22.10.25 um 14:45 schrieb Thiago Adams:


    Is anyone using or planning to use this new C23 feature?
    What could be the motivation?



    Saving memory by using the smallest multiple-of-8 N that will do. Also being able to use bit-fields wider than int.

    Saving memory for two reasons:

    * On small embedded systems where there is very little memory
    * For code that needs to be very fast on big systems to make data structures fit into cache


    Although this doesn't go as far as using odd bit-sizes: it would mean
    using sizes like 24, 40, 48, and 56 bits instead of 32 or 64 bits.

    The savings would be sparse.




    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Mon Nov 24 21:26:53 2025
    On 24/11/2025 19:35, bart wrote:
    On 24/11/2025 14:41, David Brown wrote:
    On 24/11/2025 13:31, bart wrote:

    That's all up to the implementation.

    You are worrying about completely negligible things here.

    Is it that negligible? That's easy to say when you're not doing the implementing!

    Of course I am not implementing it. As always with features in C, no
    one is particularly bothered about how much effort is needed by the implementers. The prime concern is always the compiler users, not the compiler writers.

    However it may impact on the size and performance of code.

    The impact of an extra mask operation when you are handling 6 (IIRC)
    chunks of 64-bit data is not going to give a very significant effect on
    the size or performance of the code.



    And allowing random sizes such as int817838_t. (See, it seems much
    sillier using this syntax!)

    I had taken your "ridiculous" comment to be part of your complaint
    that "multiplying even two one-million-bit types could overflow". But
    if those statements are independent, then only the first is silly - of
    course arithmetic on any finite sized type can overflow unless
    specifically limited (such as by wrapping behaviour for unsigned
    types). I agree that huge fixed-size integer types are not useful,
    though I am not sure where the ideal limit lies.

    You don't think it strange that C doesn't even have a 128-bit type yet
    (it only barely has width-specific 64-bit ones).

    How do you know that I think that, from what I wrote? You are just making
    stuff up again.

    I think a 128-bit type can be useful. Many C compilers support one, and
    now the standard supports one too. It's called "_BitInt(128)", and you
    can expect it to perform exactly like __int128 or whatever
    compiler-specific 128-bit types you might have in a given tool.


    There is just the poor gnu extension where 128-bit integers didn't have
    a literal form, and there was no way to print such values.


    How many times have you felt the need to write a 128-bit literal? And
    how many times has that literal been in decimal (it's not difficult to
    put together a 128-bit value from two 64-bit values)? You really are
    making a mountain out of a molehill here.
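    The "two 64-bit values" construction is a few lines of portable C, without even needing gcc's `__int128` (the `u128` struct and helpers are illustrative, not an existing API):

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    /* A 128-bit value as a hi/lo pair of 64-bit words, with
       construction and carry-propagating addition. */
    typedef struct { uint64_t hi, lo; } u128;

    static u128 u128_make(uint64_t hi, uint64_t lo)
    {
        return (u128){ hi, lo };
    }

    static u128 u128_add(u128 a, u128 b)
    {
        u128 r = { a.hi + b.hi, a.lo + b.lo };
        if (r.lo < a.lo)        /* carry out of the low word */
            r.hi++;
        return r;
    }

    int main(void)
    {
        /* a "literal" 2^64, built from its two halves */
        u128 x = u128_make(1, 0);
        u128 y = u128_add(u128_make(0, UINT64_MAX), u128_make(0, 1));
        assert(y.hi == 1 && y.lo == 0);      /* carry propagated */
        assert(u128_add(x, y).hi == 2);
        printf("ok\n");
        return 0;
    }
    ```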

    But now there is this huge leap, not only to 128/256/512/1024 bits, but
    to conceivably millions, plus the ability to specify any weird type you like, like 182 bits (eg. somebody makes a typo for _BitInt(128), but
    they silently get a viable type that happens to be a little less efficient!).


    And this huge leap also lets you have 128-bit, 256-bit, 512-bit, etc.,
    types with no more than a simple typedef if you don't like the names. I
    can't see your problem here.

    So, 20 years of having 64-bit processors with little or no support for
    even double-word types, and now there is this explosion in capabilities.

    Or, are literals and print facilities for these new types still missing?

    Personally I think they should have got the basics right first, like a decent 128-bit type, proper literals, and ways to print.

    This looks like VLAs all over again (eg. is '_BitInt(1000000) A'
    allocated on the stack?). A poorly suited, hard-to-implement feature.


    You are joking, right? How is dealing with a _BitInt(1000000) any more difficult than dealing with a "struct { uint64_t chunks[15625]; }" ?
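    That comparison can be made concrete: a wide unsigned integer as an array of 64-bit chunks, with addition as a simple carry loop, is roughly what a compiler's runtime support for a huge unsigned `_BitInt` amounts to (a sketch; `wide_add` is an illustrative name):

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    #define CHUNKS 4   /* 256 bits here; 15625 chunks would be a million */

    /* r = a + b over little-endian arrays of 64-bit chunks,
       propagating the carry from chunk to chunk. */
    static void wide_add(uint64_t r[CHUNKS],
                         const uint64_t a[CHUNKS], const uint64_t b[CHUNKS])
    {
        unsigned carry = 0;
        for (int i = 0; i < CHUNKS; i++) {
            uint64_t s = a[i] + b[i];
            unsigned c1 = s < a[i];          /* carry from a[i] + b[i]  */
            r[i] = s + carry;
            carry = c1 | (r[i] < s);         /* or from adding carry-in */
        }
    }

    int main(void)
    {
        uint64_t a[CHUNKS] = { UINT64_MAX, UINT64_MAX, 0, 0 };
        uint64_t b[CHUNKS] = { 1, 0, 0, 0 };
        uint64_t r[CHUNKS];
        wide_add(r, a, b);                   /* carry ripples two chunks */
        assert(r[0] == 0 && r[1] == 0 && r[2] == 1 && r[3] == 0);
        printf("ok\n");
        return 0;
    }
    ```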


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Mon Nov 24 22:27:10 2025
    On 24/11/2025 20:26, David Brown wrote:
    On 24/11/2025 19:35, bart wrote:

    There is just the poor gnu extension where 128-bit integers didn't
    have a literal form, and there was no way to print such values.


    How many times have you felt the need to write a 128-bit literal? And
    how many times has that literal been in decimal

    I don't think there were hex literals either.


    (it's not difficult to
    put together a 128-bit value from two 64-bit values)? You really are
    making a mountain out of a molehill here.

    Well, it seems that such literals now exist (with 'wb' suffix). So I
    guess somebody other than you decided that feature WAS worth adding!

    But you can't as yet print out such values; I guess you can't 'scanf'
    them either. These are necessary to perform I/O on such data from/to
    text files.
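    One of those "hoops" can be sketched: printing a 128-bit value, held as a hi/lo pair of `uint64_t`, in decimal via restoring long division by 10 (slow but portable; `divmod10` and `u128_to_dec` are illustrative names):

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Divide the 128-bit value v = {hi, lo} by 10 in place via binary
       restoring division; return the remainder (one decimal digit). */
    static unsigned divmod10(uint64_t v[2])
    {
        uint64_t qhi = 0, qlo = 0;
        unsigned rem = 0;
        for (int i = 127; i >= 0; i--) {
            unsigned bit =
                (unsigned)((i >= 64 ? v[0] >> (i - 64) : v[1] >> i) & 1);
            rem = rem * 2 + bit;
            qhi = (qhi << 1) | (qlo >> 63);
            qlo <<= 1;
            if (rem >= 10) { rem -= 10; qlo |= 1; }
        }
        v[0] = qhi;
        v[1] = qlo;
        return rem;
    }

    static void u128_to_dec(uint64_t hi, uint64_t lo, char *out) /* out[40] */
    {
        char tmp[40];
        int n = 0;
        uint64_t v[2] = { hi, lo };
        do {
            tmp[n++] = (char)('0' + divmod10(v));
        } while (v[0] | v[1]);
        for (int i = 0; i < n; i++)       /* digits come out least first */
            out[i] = tmp[n - 1 - i];
        out[n] = '\0';
    }

    int main(void)
    {
        char buf[40];
        u128_to_dec(1, 0, buf);           /* 2^64 */
        printf("2^64 = %s\n", buf);
        assert(strcmp(buf, "18446744073709551616") == 0);
        u128_to_dec(0, 12345, buf);
        assert(strcmp(buf, "12345") == 0);
        return 0;
    }
    ```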

    I must say you have a very laid-back attitude to language design:

    "Let's add this 128-bit type, but let's not bother providing a way to
    enter such values, or add any facilities to print them out. How often
    would somebody need to do that anyway? But if they really /have/ to,
    then there are plenty of hoops they can jump through to achieve it!"

    (In my implementation of 128-bit types, from 2021, I allowed full
    128-bit decimal, hex and binary literals, and they could be printed in
    any base.

    But they weren't used enough and were dropped, in favour of an unlimited precision type in my other language.

    One interesting use-case for literals was short strings; 128 bits allowed character literals up to 16 characters: 'ABCDEFGHIJKLMNOP'. I think C is
    still stuck at one, or 4 if you're lucky.)


    But now there is this huge leap, not only to 128/256/512/1024 bits,
    but to conceivably millions, plus the ability to specify any weird
    type you like, like 182 bits (eg. somebody makes a typo for
    _BitInt(128), but they silently get a viable type that happens to be a
    little less efficient!).


    And this huge leap also lets you have 128-bit, 256-bit, 512-bit, etc.,

    And 821 bits. This is what I don't get. Why is THAT so important?

    Why couldn't 128/256/etc have been added first, and then those funny
    ones if the demand was still there?

    If the proposal had instead been simply to extend the 'u8 u16 u32 u64'
    set of types by a few more entries on the right, say 'u128 u256 u512',
    would anyone have been clamouring for types like 'u1187'? I doubt it.

    For sub-64-bit types on conventional hardware, I simply can't see the
    point, not if they are rounded up anyway. Either have full range-based
    types like Ada, or not at all.


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Keith Thompson@3:633/10 to All on Mon Nov 24 16:46:32 2025
    bart <bc@freeuk.com> writes:
    On 24/11/2025 13:33, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:

    What about arrays of _BitInt(1), _BitInt(2) and _BitInt(4)? These
    could actually be practically implemented, with a few restrictions,
    and could save a lot of memory.
    No, they couldn't. Array indexing is defined in terms of pointer
    arithmetic, and you can't have a pointer to something smaller than one
    byte.

    The restrictions I mentioned were to do with pointers to individual bits.

    Right. C doesn't have pointers to individual bits.

    It is possible that operations such as:

    x = A[i]
    A[i] = x

    can be well defined when A is an array of 1/2/4-bit values, even if
    expressed like this:

    *(A + i)

    Not in C as it's currently defined.

    But this would have to be indivisible when A is such an array: only
    the whole thing is valid, not (A + i) by itself, or A by itself; you'd
    need &A.

    This would need a small tweak to the language, but that is nothing
    compared to supporting (i3783467 * i999 / i3) >> i17.

    It would hardly be a "small tweak".

    I can imagine some future version of C adding support for indexing
    packed arrays, but I don't think it would have been worthwhile
    just so that large arrays of small _BitInts can be stored more
    efficiently. Doing that on ordinary hardware was not part of the
    rationale for C23's bit-precise integer types, and I haven't seen
    any such proposals for C2y.

    And assuming that "(i3783467 * i999 / i3) >> i17" means what I think
    it means, huge bit-precise integers are already standard (they're
    part of C23), and the work of implementing them is largely done in
    gcc and llvm/clang.

    But I write a script in my dynamic language,
    [...]

    C can only get down to that u8 figure (100MB) using its 'char'
    type. Even 'bool' doesn't make it smaller (presumably for the reasons
    you mentioned).

    You are forced to emulate such arrays in user-code using shifts and masks.

    Yes. C doesn't support packed arrays, and is unlikely to do so
    any time in the near future. C23 added a feature that doesn't do
    everything you want it to do. You can of course implement such
    things in a library, but the syntax for using it would probably be
    a bit ugly.

    And in fact at least one person has done so. (I've known about
    this for about a minute, so I have no comment other than that
    it exists.)

    https://github.com/gpakosz/PackedArray/
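    The shifts-and-masks emulation under discussion fits in a few lines; a sketch of a packed array of 2-bit unsigned values, four per byte (`get2`/`set2` are illustrative names, not PackedArray's API):

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Packed array of 2-bit unsigned values: four elements per byte,
       a quarter of the memory of an unsigned char array. */
    static unsigned get2(const uint8_t *a, size_t i)
    {
        return (a[i / 4] >> (2 * (i % 4))) & 3u;
    }

    static void set2(uint8_t *a, size_t i, unsigned v)
    {
        unsigned shift = 2u * (unsigned)(i % 4);
        a[i / 4] = (uint8_t)((a[i / 4] & ~(3u << shift))
                             | ((v & 3u) << shift));
    }

    int main(void)
    {
        uint8_t a[4] = { 0 };            /* room for 16 two-bit elements */
        set2(a, 0, 3);
        set2(a, 5, 2);
        set2(a, 15, 1);
        assert(get2(a, 0) == 3);
        assert(get2(a, 5) == 2);
        assert(get2(a, 15) == 1);
        assert(get2(a, 1) == 0);
        printf("ok\n");
        return 0;
    }
    ```

    As Keith notes, an array of `_BitInt(2)` would not pack this way: each element still occupies at least one byte, because pointer arithmetic requires it.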

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Keith Thompson@3:633/10 to All on Mon Nov 24 17:00:17 2025
    BGB <cr88192@gmail.com> writes:
    On 11/24/2025 8:21 AM, bart wrote:
    On 24/11/2025 13:35, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    [...]
    There are two kinds of BitInts: those smaller than 64 bits; and those
    larger than 64 bits, sometimes /much/ larger.

    As far as I know, the standard makes no such distinction.

    *I* am making the distinction. From an implementation point of view
    (and assuming 64-bit hardware), they are quite different.
    And that leads to different kinds of language features.

    As noted, as I understand it there is no reason for the storage to be
    smaller than the next power-of-2 size.

    Really?

    Rounding up to 8, 16, 32, or the next multiple of 64 bits seems
    reasonable. Rounding 1025 bits up to 2048 does not (and is not what
    the current gcc and llvm/clang implementations do).

    What advantage does rounding 1025 up to 2048 give you over rounding
    it up to 1088 (17*64)? It seems to me that the only real difference
    is in how many times a loop has to iterate.

    My understanding is that power-of-two sizes lose their advantages
    beyond about 64 or 128 bits. Am I mistaken?

    [...]

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Keith Thompson@3:633/10 to All on Mon Nov 24 17:23:47 2025
    David Brown <david.brown@hesbynett.no> writes:
    On 24/11/2025 12:17, bart wrote:
    On 24/11/2025 09:29, David Brown wrote:
    [...]
    So if you want the full range of values of x and y to be usable here,
    then NM would have to be N * M. But you would also need a cast, such
    as "_BitInt(NM) z = (_BitInt(NM)) x * y;", just as you do if you want
    to multiply two 32-bit ints as a 64-bit operation.

    N + M, not N * M.

    Alternatively, you might know more about the values that might be in x
    and y, and have a smaller NM (though you still need a cast if it is
    greater than both N and M). Or you might be using unsigned types and
    want the wrapping / masking behaviour.

    The point was not what size NM is, but that it is known to the
    compiler at the time of writing the expression.

    It sounds like the max precision you get will be the latter.

    can be implemented as something like :

        __bit_int_signed_mult(NM, (unsigned char *) &z,
                              N, (const unsigned char *) &x,
                              M, (const unsigned char *) &y);


    How would you write a generic user function that operates on any
    size BitInt? For example:
        _BitInt(?) bi_square(_BitInt(?));


    You can't. _BitInt(N) and _BitInt(M) are distinct types, for
    differing N and M. You can't write a generic user function in C that implements "T foo(T)" where T can be "int", "short", "long int", or
    other types. C simply does not have type-generic functions.

    Sort of. C23 defines the term "generic function" (N3220 7.26.5.1,
    string search functions). For example, strchr() can take a const void* argument and return a const void* result, or it can take a void*
    argument and return a void* result. (C++ does this by having two
    overloaded strchr() functions.)

    These "generic functions" are (almost certainly) implemented as macros
    that use _Generic. If you bypass the macro definition, you get the
    function that can take a const char* and return a char*.

    So C doesn't have type-generic functions, but it does have features that
    let you implement things that act like type-generic functions.
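    The pattern Keith describes, shown with ordinary integer types (a sketch; a `_BitInt` version would need one case per width used, which is where it gets painful):

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* A macro using _Generic to dispatch to one function per type,
       acting like a type-generic "T square(T)". */
    static int       square_i(int x)        { return x * x; }
    static long      square_l(long x)       { return x * x; }
    static long long square_ll(long long x) { return x * x; }

    #define square(x) _Generic((x), \
        int:       square_i,        \
        long:      square_l,        \
        long long: square_ll)(x)

    int main(void)
    {
        assert(square(7) == 49);                  /* int case  */
        assert(square(40000L) == 1600000000L);    /* long case */
        assert(square(3000000000LL) == 9000000000000000000LL);
        printf("ok\n");
        return 0;
    }
    ```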

    You /can/ write generic macros that handle different _BitInt types,
    but that would quickly get painful given that you'd need a case for
    each size of _BitInt you wanted for the _Generic macro.

    Indeed. A _Generic selection that handles all the ordinary non-extended integer types needs to handle 12 cases if I'm counting correctly, which
    is feasible. But the addition of bit-precise types adds
    BITINT_MAXWIDTH*2-1 new distinct predefined types, and a generic
    selection would need one case for each.

    However, you could have a function that takes a void*, a size, and a
    width as arguments and operates on a _BitInt(?) or unsigned _BitInt(?)
    type. In fact, gcc has internal functions like that for multiplication
    and division. (You mentioned something like that in text that I've
    snipped.)

    [...]

    This assumes BitInts are passed and returned by value, but even
    using BitInt* wouldn't help.

    Yes, they are passed around as values - they are integer types and are
    passed around like other integer types. (Implementations may use
    stack blocks and pointers for passing the values around if they are
    too big for registers, just as implementations can do with any value
    type. That's an implementation detail - logically, they are passed and returned as values.)

    Yes, and in general a _BitInt argument has to be copied to the
    corresponding parameter, since a change to the parameter can't affect
    the value of the argument.

    But passing huge _BitInts by value is no more problematic than passing
    huge structs by value.

    [...]

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Keith Thompson@3:633/10 to All on Mon Nov 24 18:03:00 2025
    bart <bc@freeuk.com> writes:
    On 24/11/2025 14:41, David Brown wrote:
    On 24/11/2025 13:31, bart wrote:
    That's all up to the implementation.
    You are worrying about completely negligible things here.

    Is it that negligible? That's easy to say when you're not doing the implementing! However it may impact on the size and performance of
    code.

    You're right, it's easy to say when I'm not doing the implementing.
    Which I'm not.

    The maintainers of gcc and llvm/clang have done that for me, so I don't
    have to worry about it.

    Are you planning to implement bit-precise integer types yourself? I
    don't think you've said so in this thread. If you are, you have at
    least two existing implementations you can look at for ideas.

    [...]

    You don't think it strange that C doesn't even have a 128-bit type yet
    (it only barely has width-specific 64-bit ones).

    C doesn't *require* 128-bit types. It certainly allows them. A C90 implementation could in principle have had 128-bit long, and a C99 or
    later implementation can have 128-bit long and/or an extended 128-bit
    type.

    As of C99 or C11, *requiring* support for 128-bit integers probably
    wouldn't have been reasonable.

    Please distinguish between the language and implementations.

    There is just the poor gnu extension where 128-bit integers didn't
    have a literal form, and there was no way to print such values.

    But now there is this huge leap, not only to 128/256/512/1024 bits,
    but to conceivably millions, plus the ability to specify any weird
    type you like, like 182 bits (eg. somebody makes a typo for
    _BitInt(128), but they silently get a viable type that happens to be a
    little less efficient!).

    Yes. With the addition of bit-precise types, gcc's __int128 might be
    obsolete (though there's bound to be existing code that depends on it).
    I can imagine that gcc might make __int128 an alias for _BitInt(128).

    So, 20 years of having 64-bit processors with little or no support for
    even double-word types, and now there is this explosion in
    capabilities.

    Those 20 years are in the past. Not much we can do about that now.

    Seriously, is your problem with _BitInt types that they're too flexible?
    What advantage do you expect from imposing additional restrictions on
    a feature that has already been defined and implemented?

    Or, are literals and print facilities for these new types still missing?

    C23 has literals for bit-precise integer types, using a "wb" or "WB"
    suffix. That's something you could have found out by reading the N3220
    C23 draft, or by reading one of my posts earlier in this thread. But I
    don't mind answering questions.

    There doesn't seem to be printf/scanf support for bit-precise integer
    types, which is a little disappointing. But since they're all distinct
    types, it could be difficult to define.

    Personally I think they should have got the basics right first, like a
    decent 128-bit type, proper literals, and ways to print.

    No language changes would be necessary to support 128-bit integer types. Implementations are free to support [u]int128_t and/or to make long long
    128 bits.

    It would have been nice if gcc's __int128 had been developed further,
    but for whatever reason that didn't happen. (Maybe there wasn't enough demand.)

    This looks like VLAs all over again (eg. is '_BitInt(1000000) A'
    allocated on the stack?). A poorly suited, hard-to-implement feature.

    It doesn't look particularly like VLAs to me. The width is a
    compile-time constant. Allocating large _BitInt objects is no
    harder or easier than allocating large struct objects.

    Here's an idea. Rather than asserting that _BitInt(1'000'000)
    is silly and obviously useless, try *asking* how it's useful.
    I personally don't know what I'd do with a million-bit integer,
    but maybe somebody out there has a valid use for it. Meanwhile,
    its existence doesn't bother me.

    My guess is that once you've implemented integers wider than 128
    or 256 bits, million-bit integers aren't much extra effort.

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Keith Thompson@3:633/10 to All on Mon Nov 24 18:10:13 2025
    bart <bc@freeuk.com> writes:
    On 24/11/2025 20:26, David Brown wrote:
    [...]
    And this huge leap also lets you have 128-bit, 256-bit, 512-bit,
    etc.,

    And 821 bits. This is what I don't get. Why is THAT so important?

    Why couldn't 128/256/etc have been added first, and then those funny
    ones if the demand was still there?

    Because a more general definition, allowing all widths up to some
    maximum, is *simpler* than a definition with arbitrary restrictions.
    And since it's already been implemented, what the heck are you
    complaining about?

    If the proposal had instead been simply to extend the 'u8 u16 u32 u64'
    set of types by a few more entries on the right, say 'u128 u256 u512',
    would anyone have been clamouring for types like 'u1187'? I doubt it.

    You do know that u8, u16, et al are not C types, right? (Yes, I know
    what you mean by those names.)

    For sub-64-bit types on conventional hardware, I simply can't see the
    point, not if they are rounded up anyway. Either have full
    range-based types like Ada, or not at all.

    Great, so don't use them.

    If the ISO C committee withdrew the current official 2023 standard
    document and replaced it with one that imposes restrictions on _BitInt
    types, and gcc and clang withdrew their implementations, would that
    satisfy you?

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From BGB@3:633/10 to All on Mon Nov 24 20:10:54 2025
    On 11/24/2025 7:00 PM, Keith Thompson wrote:
    BGB <cr88192@gmail.com> writes:
    On 11/24/2025 8:21 AM, bart wrote:
    On 24/11/2025 13:35, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    [...]
    There are two kinds of BitInts: those smaller than 64 bits; and those larger than 64 bits, sometimes /much/ larger.

    As far as I know, the standard makes no such distinction.

    *I* am making the distinction. From an implementation point of view
    (and assuming 64-bit hardware), they are quite different.
    And that leads to different kinds of language features.

    As noted, as I understand it there is no reason for the storage to be
    smaller than the next power-of-2 size.

    Really?

    Rounding up to 8, 16, 32, or the next multiple of 64 bits seems
    reasonable. Rounding 1025 bits up to 2048 does not (and is not what
    the current gcc and llvm/clang implementations do).


    Granted, I meant for smaller sizes (below 128 bits).

    BGBCC rounds larger sizes up to the next multiple of 128 bits.

    However, 384 bits is the first size where rounding up to a multiple of
    128 bits differs from the next power of 2.


    What advantage does rounding 1025 up to 2048 give you over rounding
    it up to 1088 (17*64)? It seems to me that the only real difference
    is in how many times a loop has to iterate.

    My understanding is that power-of-two sizes lose their advantages
    beyond about 64 or 128 bits. Am I mistaken?

    [...]


    I mentioned a few messages up that this was not the scheme I am using.

    So:
    1.. 8 => 8
    9.. 16 => 16
    17.. 32 => 32
    33.. 64 => 64
    65..128 => 128
    129..256 => 256
257..384 => 384 (first point of divergence)
    385..512 => 512
    513..640 => 640 (second point of divergence)
    641..768 => 768 (third point of divergence)
    ...
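
That mapping can be sketched directly as a small helper (storage_bits is a hypothetical name for illustration; this is not BGBCC's actual code):

```c
/* Sketch of the storage rounding listed above: power-of-two storage
   up to 128 bits, then the next multiple of 128 bits. */
static unsigned storage_bits(unsigned n)
{
    if (n <= 8)   return 8;
    if (n <= 16)  return 16;
    if (n <= 32)  return 32;
    if (n <= 64)  return 64;
    if (n <= 128) return 128;
    return (n + 127u) / 128u * 128u;   /* round up to a multiple of 128 */
}
```

The first divergence from pure powers of two is then storage_bits(300) == 384 rather than 512.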

But, alas, the reason for keeping small sizes power-of-2 is to optimize
for memory loads/stores.

The reason for using multiples of 128 bits for larger sizes was that this
was the most efficient option for the target ISA (and also less
complicated for the support code).

Though, if optimizing for RISC-V, a case could be made for using the
next multiple of 64 bits instead.

    ...


    While theoretically possible, multiples of a smaller size would end up
    being a worse option in terms of performance than just "wasting" a few
    extra bytes.



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Tue Nov 25 07:56:30 2025
    On 25/11/2025 02:23, Keith Thompson wrote:
    David Brown <david.brown@hesbynett.no> writes:
    On 24/11/2025 12:17, bart wrote:
    On 24/11/2025 09:29, David Brown wrote:
    [...]
    So if you want the full range of values of x and y to be usable here,
    then NM would have to be N * M. But you would also need a cast, such
    as "_BitInt(NM) z = (_BitInt(NM)) x * y;", just as you do if you want
    to multiply two 32-bit ints as a 64-bit operation.

    N + M, not N * M.

    Of course. (I /really/ should have picked a different third identifier...)


    Alternatively, you might know more about the values that might be in x
    and y, and have a smaller NM (though you still need a cast if it is
    greater than both N and M). Or you might be using unsigned types and
    want the wrapping / masking behaviour.

    The point was not what size NM is, but that it is known to the
    compiler at the time of writing the expression.

    It sounds like the max precision you get will be the latter.

    can be implemented as something like :

    __bit_int_signed_mult(NM, (unsigned char *) &z,
                          N, (const unsigned char *) &x,
                          M, (const unsigned char *) &y);


    How would you write a generic user function that operates on any
    size BitInt? For example:
    _BitInt(?) bi_square(_BitInt(?));


    You can't. _BitInt(N) and _BitInt(M) are distinct types, for
    differing N and M. You can't write a generic user function in C that
    implements "T foo(T)" where T can be "int", "short", "long int", or
    other types. C simply does not have type-generic functions.

Sort of. C23 defines the term "generic function" (N3220 7.26.5.1,
string search functions). For example, strchr() can take a const char*
argument and return a const char* result, or it can take a char*
argument and return a char* result. (C++ does this by having two
overloaded strchr() functions.)

    These "generic functions" are (almost certainly) implemented as macros
    that use _Generic. If you bypass the macro definition, you get the
    function that can take a const char* and return a char*.

So C doesn't have type-generic functions, but it does have features that
let you implement things that act like type-generic functions.


    Yes. It has also had type-generic maths functions for a good while.
    But it doesn't have a general generic function mechanism other than
    _Generic macros.

    You /can/ write generic macros that handle different _BitInt types,
    but that would quickly get painful given that you'd need a case for
    each size of _BitInt you wanted for the _Generic macro.

    Indeed. A _Generic selection that handles all the ordinary non-extended integer types needs to handle 12 cases if I'm counting correctly, which
    is feasible. But the addition of bit-precise types adds
    BITINT_MAXWIDTH*2-1 new distinct predefined types, and a generic
    selection would need one case for each.
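
For just the twelve ordinary cases, such a selection might be sketched like this (WIDTH_OF is a hypothetical macro for illustration; it reports storage widths via sizeof, ignoring any padding bits, and every _BitInt(N) in use would need one more case):

```c
#include <limits.h>

/* Map each standard integer type to its width - one case per type. */
#define WIDTH_OF(x) _Generic((x),                       \
    _Bool:              1,                              \
    char:               CHAR_BIT,                       \
    signed char:        CHAR_BIT,                       \
    unsigned char:      CHAR_BIT,                       \
    short:              sizeof(short) * CHAR_BIT,       \
    unsigned short:     sizeof(short) * CHAR_BIT,       \
    int:                sizeof(int) * CHAR_BIT,         \
    unsigned int:       sizeof(int) * CHAR_BIT,         \
    long:               sizeof(long) * CHAR_BIT,        \
    unsigned long:      sizeof(long) * CHAR_BIT,        \
    long long:          sizeof(long long) * CHAR_BIT,   \
    unsigned long long: sizeof(long long) * CHAR_BIT)
```

Extending this to bit-precise types would mean one extra line per distinct _BitInt(N), which is exactly the enumeration burden described above.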

    However, you could have a function that takes a void*, a size, and a
    width as arguments and operates on a _BitInt(?) or unsigned _BitInt(?)
    type. In fact, gcc has internal functions like that for multiplication
    and division. (You mentioned something like that in text that I've
    snipped.)


You could, yes. I started thinking about how you might make one that
didn't require the user to manually include the bit count of the _BitInt
to use it, but I couldn't figure out a good way. You can get a start
from using sizeof on the _BitInt parameter, but I can't think of a way
to get the bit count exactly (even using _Generic).

    [...]

    This assumes BitInts are passed and returned by value, but even
    using BitInt* wouldn't help.

    Yes, they are passed around as values - they are integer types and are
    passed around like other integer types. (Implementations may use
    stack blocks and pointers for passing the values around if they are
    too big for registers, just as implementations can do with any value
    type. That's an implementation detail - logically, they are passed and
    returned as values.)

    Yes, and in general a _BitInt argument has to be copied to the
    corresponding parameter, since a change to the parameter can't affect
    the value of the argument.

    The workings of C parameter passing were unfortunately cut in stone
    before anyone thought of passing large types as parameters. In
    hindsight it's easy to see it could have been better to say that
    function parameters are implicitly "const" and attempting to modify them
    is UB - just make a local copy if you want to make a change. But it's
    too late now!


    But passing huge _BitInts by value is no more problematic than passing
    huge structs by value.


    Exactly, yes.


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Tue Nov 25 11:38:32 2025
    On 25/11/2025 02:03, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    On 24/11/2025 14:41, David Brown wrote:
    On 24/11/2025 13:31, bart wrote:
    That's all up to the implementation.
    You are worrying about completely negligible things here.

    Is it that negligible? That's easy to say when you're not doing the
    implementing! However it may impact on the size and performance of
    code.

    You're right, it's easy to say when I'm not doing the implementing.
    Which I'm not.

    The maintainers of gcc and llvm/clang have done that for me, so I don't
    have to worry about it.

    Are you planning to implement bit-precise integer types yourself? I
    don't think you've said so in this thread. If you are, you have at
    least two existing implementations you can look at for ideas.

    No, apart from the usual set of 8/16/32/64 bits. I've done 128 bits, and played with 1/2/4 bits, but my view is that above this range, using
    exact bit-sizes is the wrong way to go.

    While for odd sizes up to 64 bits, bitfields are more apt than employing
    the type system.

    Here's an idea. Rather than asserting that _BitInt(1'000'000)
    is silly and obviously useless, try *asking* how it's useful.
    I personally don't know what I'd do with a million-bit integer,
    but maybe somebody out there has a valid use for it. Meanwhile,
    its existence doesn't bother me.

Again, my view is that types like _BitInt(123456) (could they have made
it any more fiddly to type?!) are the same mistake that early Pascal made
with arrays.

    It is common that an N-array of T and an M-array of T are not
    compatible, but usually there are ways to deal generically with both.


    My guess is that once you've implemented integers wider than 128
    or 256 bits, million-bit integers aren't much extra effort.

    I've implemented 128-bit arithmetic, and have seen some scary-looking C
    code that implemented 256-bit arithmetic. Neither of those would scale
    to N-bits where N can be arbitrary large /and/ might not be a multiple
    of either 64 or 8.

You would need pretty much the same algorithms as used for arbitrary precision. Those usually require N to be some multiple of the 'limb' size.
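
A minimal sketch of the limb approach, assuming 64-bit limbs stored least-significant first (limb_add is a hypothetical name, not taken from any particular library):

```c
#include <stdint.h>
#include <stddef.h>

/* Add two equal-length multi-limb numbers: r = a + b.
   Each operand is an array of 64-bit limbs, least significant first.
   Returns the carry out of the top limb. */
static uint64_t limb_add(uint64_t *r, const uint64_t *a,
                         const uint64_t *b, size_t nlimbs)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < nlimbs; i++) {
        uint64_t s = a[i] + carry;
        carry = (s < carry);        /* overflow from adding the carry */
        r[i] = s + b[i];
        carry += (r[i] < s);        /* overflow from adding b[i] */
    }
    return carry;
}
```

A _BitInt(N) whose width is not a multiple of 64 would additionally need the top limb masked (and sign-extended, for signed types) after each operation.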


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Tue Nov 25 14:12:07 2025
    On Tue, 25 Nov 2025 11:38:32 +0000
    bart <bc@freeuk.com> wrote:


    No, apart from the usual set of 8/16/32/64 bits. I've done 128 bits,
    and played with 1/2/4 bits, but my view is that above this range,
    using exact bit-sizes is the wrong way to go.


    Either that or manifestation of your NIH syndrome.
    Which explanation do you consider more likely?

    While for odd sizes up to 64 bits, bitfields are more apt than
    employing the type system.


    int sign_extend12(unsigned x)
    {
    return (_BitInt(12))x;
    }

Nice, isn't it?
Doing the same with bit fields is possible, but less obvious and less convenient. Also it can potentially play havoc with a compiler that takes
strict aliasing rules more seriously than they deserve.

    int sign_extend12(unsigned x)
    {
    struct bar {
    signed a: 12;
    };
return ((struct bar*)&x)->a;
    }

Doing the same with shifts is almost as convenient as with _BitInt, and
it works great on all popular compilers, but according to the wording of
the C Standard it is Undefined Behavior.

    int sign_extend12(unsigned x)
    {
    return (int32_t)((uint32_t)x << 20) >> 20;
    }
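
For completeness, a fully portable variant is possible too, avoiding both the implementation-defined conversion and the shifting tricks (sign_extend12_portable is a hypothetical name for illustration):

```c
/* Portable 12-bit sign extension: mask to 12 bits, then use the
   XOR-and-subtract trick.  The subtraction happens in int, so no
   implementation-defined conversions or shifts of negative values
   are involved. */
static int sign_extend12_portable(unsigned x)
{
    unsigned m = x & 0xFFFu;           /* keep the low 12 bits */
    return (int)(m ^ 0x800u) - 0x800;  /* bit 11 becomes the sign */
}
```

On typical two's complement targets this tends to optimise down to the same couple of instructions as the shift version.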


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Tue Nov 25 14:57:17 2025
    On 25/11/2025 12:12, Michael S wrote:
    On Tue, 25 Nov 2025 11:38:32 +0000
    bart <bc@freeuk.com> wrote:


    No, apart from the usual set of 8/16/32/64 bits. I've done 128 bits,
    and played with 1/2/4 bits, but my view is that above this range,
    using exact bit-sizes is the wrong way to go.


    Either that or manifestation of your NIH syndrome.
    Which explanation do you consider more likely?

    I can invent anything I like. I've looked at such things many times, and
    came to the conclusion that using types is the wrong approach, certainly
    for this level of language.

    (Yes, long ago I allowed type denotations such as:

    int*N a a has N bytes or N*8 bits (from Fortran)
    int:N b b has N bits

    Then I realised I was never going to use anything other than some
    power-of-two size of 8 bits or more, for discrete variables.)



    While for odd sizes up to 64 bits, bitfields are more apt than
    employing the type system.


    int sign_extend12(unsigned x)
    {
    return (_BitInt(12))x;
    }

Nice, isn't it?

By 'bitfields' I mean bitfields within structs, but also bitfield
operators which work on any integer values.

    Bitfields are nearly always unsigned in my projects, so I don't have an
    exact equivalent to this example.

    But a solution not using types would look like this:

    y := x.[0..11] # get first 12 bits
    y := x.[12..23] # next 12 bits

    x.[24..35] := y # set next 12 bits (x, y are 64 bits!)

    y := x.[0..i] # get first i+1 bits

    To optionally interpret a bitfield extraction as signed, I'd need to
    think up some way of denoting that. For bitfield insertion it doesn't
    matter.

    Your example is interesting but rather limited; while it does deal with
    a signed field:

    * That field can only start at bit zero, without extra manipulations

    * The size is fixed at 12 (if you decide to change the field size, or
    you want it as a constant parameter somewhere, it starts getting
    awkward)

    * If you are dealing with a range of bitfield sizes, you will need a
    dedicated function, or somehow enumerate all possibilities using
    _Generic.

    * It's not clear how bitfield insertion would work, whether you'd still
    employ a _BitInt type, and/or just revert to those shifts and masks.
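
In C, the extraction and insertion operations sketched above come down to the usual shifts and masks, e.g. (bits_get/bits_set are hypothetical helpers; bit positions are inclusive, fields up to 63 bits wide):

```c
#include <stdint.h>

/* Extract bits lo..hi (inclusive) of x, unsigned. */
static uint64_t bits_get(uint64_t x, unsigned lo, unsigned hi)
{
    uint64_t mask = (UINT64_C(1) << (hi - lo + 1)) - 1;
    return (x >> lo) & mask;
}

/* Return x with bits lo..hi (inclusive) replaced by the low bits of v. */
static uint64_t bits_set(uint64_t x, unsigned lo, unsigned hi, uint64_t v)
{
    uint64_t mask = (UINT64_C(1) << (hi - lo + 1)) - 1;
    return (x & ~(mask << lo)) | ((v & mask) << lo);
}
```

Unlike dedicated syntax, the positions here can be runtime values, which covers the "first i+1 bits" case too.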




    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Tue Nov 25 18:29:43 2025
    On Tue, 25 Nov 2025 14:57:17 +0000
    bart <bc@freeuk.com> wrote:

    On 25/11/2025 12:12, Michael S wrote:
    On Tue, 25 Nov 2025 11:38:32 +0000
    bart <bc@freeuk.com> wrote:


    No, apart from the usual set of 8/16/32/64 bits. I've done 128
    bits, and played with 1/2/4 bits, but my view is that above this
    range, using exact bit-sizes is the wrong way to go.


    Either that or manifestation of your NIH syndrome.
    Which explanation do you consider more likely?

    I can invent anything I like. I've looked at such things many times,
    and came to the conclusion that using types is the wrong approach,
    certainly for this level of language.

    (Yes, long ago I allowed type denotations such as:

    int*N a a has N bytes or N*8 bits (from Fortran)
    int:N b b has N bits

    Then I realised I was never going to use anything other than some power-of-two size of 8 bits or more, for discrete variables.)



    While for odd sizes up to 64 bits, bitfields are more apt than
    employing the type system.


    int sign_extend12(unsigned x)
    {
    return (_BitInt(12))x;
    }

Nice, isn't it?

By 'bitfields' I mean bitfields within structs, but also bitfield
operators which work on any integer values.

    Bitfields are nearly always unsigned in my projects, so I don't have
    an exact equivalent to this example.

    But a solution not using types would look like this:

    y := x.[0..11] # get first 12 bits
    y := x.[12..23] # next 12 bits

    x.[24..35] := y # set next 12 bits (x, y are 64 bits!)

    y := x.[0..i] # get first i+1 bits

    To optionally interpret a bitfield extraction as signed, I'd need to
    think up some way of denoting that. For bitfield insertion it doesn't matter.

    Your example is interesting but rather limited; while it does deal
    with a signed field:

    * That field can only start at bit zero, without extra manipulations

    * The size is fixed at 12 (if you decide to change the field size, or
    you want it as a constant parameter somewhere, it starts getting
    awkward)

    * If you are dealing with a range of bitfield sizes, you will need a
    dedicated function, or somehow enumerate all possibilities using
    _Generic.

    * It's not clear how bitfield insertion would work, whether you'd
    still employ a _BitInt type, and/or just revert to those shifts and
    masks.




My example is from the real world: dealing with A-to-D converters. I need
sign extension of that sort quite often.
* I don't recollect needing to sign-extend a field that does not start at
offset zero, but if it happens then a logical left shift [before the cast]
is an obvious and natural solution.
* My ADCs have a fixed number of bits. It does not change in the middle of
a project. And even if it does, the new value is also fixed, so a
constant (enum or define) works fine.
Same for your other points - I don't recollect that I needed something
like that sufficiently often to ... well... recollect.



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Tue Nov 25 18:33:30 2025
    On 25/11/2025 16:29, Michael S wrote:
    On Tue, 25 Nov 2025 14:57:17 +0000
    bart <bc@freeuk.com> wrote:

    On 25/11/2025 12:12, Michael S wrote:
    On Tue, 25 Nov 2025 11:38:32 +0000
    bart <bc@freeuk.com> wrote:


    No, apart from the usual set of 8/16/32/64 bits. I've done 128
    bits, and played with 1/2/4 bits, but my view is that above this
    range, using exact bit-sizes is the wrong way to go.


    Either that or manifestation of your NIH syndrome.
    Which explanation do you consider more likely?

    I can invent anything I like. I've looked at such things many times,
    and came to the conclusion that using types is the wrong approach,
    certainly for this level of language.

    (Yes, long ago I allowed type denotations such as:

    int*N a a has N bytes or N*8 bits (from Fortran)
    int:N b b has N bits

    Then I realised I was never going to use anything other than some
    power-of-two size of 8 bits or more, for discrete variables.)



    While for odd sizes up to 64 bits, bitfields are more apt than
    employing the type system.


    int sign_extend12(unsigned x)
    {
    return (_BitInt(12))x;
    }

Nice, isn't it?

    By 'bitfields' I mean bitfields within structs, but also bitfield
    operators whch work on any integer values.

    Bitfields are nearly always unsigned in my projects, so I don't have
    an exact equivalent to this example.

    But a solution not using types would look like this:

    y := x.[0..11] # get first 12 bits
    y := x.[12..23] # next 12 bits

    x.[24..35] := y # set next 12 bits (x, y are 64 bits!)

    y := x.[0..i] # get first i+1 bits

    To optionally interpret a bitfield extraction as signed, I'd need to
    think up some way of denoting that. For bitfield insertion it doesn't
    matter.

    Your example is interesting but rather limited; while it does deal
    with a signed field:

    * That field can only start at bit zero, without extra manipulations

    * The size is fixed at 12 (if you decide to change the field size, or
    you want it as a constant parameter somewhere, it starts getting
    awkward)

    * If you are dealing with a range of bitfield sizes, you will need a
    dedicated function, or somehow enumerate all possibilities using
    _Generic.

    * It's not clear how bitfield insertion would work, whether you'd
    still employ a _BitInt type, and/or just revert to those shifts and
    masks.




My example is from the real world: dealing with A-to-D converters. I need
sign extension of that sort quite often.

OK, I've looked at datasheets for two 12-bit ADCs. Both had a choice of
analog inputs, and in both the digital value was clocked out serially
(one with the input channel number as 4 extra bits).

The first apparently had a pin-selectable signed/unsigned mode; the
second didn't mention that, but did mention 000h and FFFh limits which
suggest unsigned.

But in any case, some extra circuitry would be needed to get the 12
parallel bits before they can be input via a 16-bit read. Here, you
might just tie D11-D15 together, so that a two's complement 12-bit value becomes a 16-bit one.

    Or maybe the CPU has its own serial input pin. The point is, the whole
    thing is a rather trivial matter, and it can be taken care of in several places.

    I don't know the details in your case, but if BitInt helps you save a
    couple of lines of code, then fine. Although I don't think this feature
    would be worth adding just for that purpose.

    (The only ADCs I've used were 4-bit (homemade) and 8-bit, both giving
    unsigned data in parallel, used for frame-grabbing video circuits so
    read directly into memory rather than via an explicit memory- or
    port-read instruction.)






* I don't recollect needing to sign-extend a field that does not start at
offset zero,

    So what's in the rest of the 32-bit field, garbage?


Same for your other points - I don't recollect that I needed something
like that sufficiently often to ... well... recollect.

    Yours is one of a thousand possible applications. Everyone will have
    different needs. Maybe someone else will have a 16 or 32-bit value with assorted bitfields of different widths.

    Then maybe C bitfields could be used, but a bigger problem with those is
    poor control over layout, which is anyway implementation-defined. (Mine
    of course don't have that problem!)

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Tue Nov 25 21:25:01 2025
    On 24/11/2025 23:27, bart wrote:
    On 24/11/2025 20:26, David Brown wrote:
    On 24/11/2025 19:35, bart wrote:

    There is just the poor gnu extension where 128-bit integers didn't
    have a literal form, and there was no way to print such values.


How many times have you felt the need to write a 128-bit literal? And
how many times has that literal been in decimal

    I don't think there were hex literals either.


(it's not difficult to put together a 128-bit value from two 64-bit
values)? You really are making a mountain out of a molehill here.

    Well, it seems that such literals now exist (with 'wb' suffix). So I
    guess somebody other than you decided that feature WAS worth adding!

    But you can't as yet print out such values; I guess you can't 'scanf'
    them either. These are necessary to perform I/O on such data from/to
    text files.
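
For what it's worth, the hoops for output are short, though still entirely manual. A sketch using the GCC/Clang unsigned __int128 extension (a compiler extension, not standard C; u128_to_hex is a hypothetical helper name), which formats the value by splitting it into two 64-bit halves:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Format an unsigned __int128 in hex via two 64-bit halves,
   since printf has no conversion specifier for it. */
static void u128_to_hex(unsigned __int128 v, char *buf)
{
    uint64_t hi = (uint64_t)(v >> 64);
    uint64_t lo = (uint64_t)v;
    if (hi)
        sprintf(buf, "%" PRIx64 "%016" PRIx64, hi, lo); /* pad low half */
    else
        sprintf(buf, "%" PRIx64, lo);
}
```

Decimal output is considerably more work, since it needs repeated division of the full 128-bit value by a power of ten.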

I must say you have a very laid-back attitude to language design:

    "Let's add this 128-bit type, but let's not bother providing a way to
    enter such values, or add any facilities to print them out. How often
    would somebody need to do that anyway? But if they really /have/ to,
    then there are plenty of hoops they can jump through to achieve it!"

(In my implementation of 128-bit types, from 2021, I allowed full
128-bit decimal, hex and binary literals, and they could be printed in
any base.

But they weren't used enough and were dropped, in favour of an unlimited-precision type in my other language.

One interesting use-case for literals was short strings; 128 bits allowed character literals up to 16 characters: 'ABCDEFGHIJKLMNOP'. I think C is still stuck at one, or 4 if you're lucky.)


    I have no idea or opinion on why /you/ might want 128-bit or larger
    integer types. I believe there is very little use for "normal" numbers
    - things you might want to write as literals, calculate with, and read
    or write - that won't fit perfectly well within 64 bit types, and would
    not be better served by arbitrary sized integers. Arbitrary sized
    integers are a very different kettle of fish from large fixed-size
    integers, and are not something that would fit in the C language - they
    need a library.

    I can tell you why /I/ might find larger integer types useful. They
    include :

    * 128-bit for IPv6 address. These use a variety of styles for input and display, and thus would use specialised routines, not simple literals or printf-style IO.

    * Big units for passing data around with larger memory transfers, using
    SIMD registers. IO is irrelevant here.

    * Cryptography. IO is irrelevant here. But a variety of sizes are
    useful including 56, 80, 112, 128, 168, 192, 384, 512, 521, 2048, 3072,
    4096, 7680, 8096 bits. There may be more common sizes - I'm just
    thinking of DES, 3DES, AES, SHA, ECC and RSA.



    Smaller sizes can be useful for holding RGB pixel values, audio data, etc.


    In none of these cases are bit-precise integer types essential. People
    have been doing cryptography for a long time without them. But they can
    be convenient, and help people write code that is simpler, clearer, or
    more directly expresses their intent. The only specific additional
    power you get from these is that you can do arithmetic on bigger types
    without having to write the code manually. I don't know if compilers currently do a good enough job for that to be suitable for
    multiplication and modulo of larger integers (addition is easy, but for
    big sizes, smarter multiplication techniques can be a significant
    performance gain).


    But those are just the uses /I/ see for them, in things /I/ work with.
    (I might also use them for FPGA programming in the future, but I'm not
    doing that at the moment.) However, unlike some people, I don't think
    the C language should pick features based purely on what I personally
    want to use, or what would be even sillier, what I personally think is
    easy to implement in a compiler. Other people will have other uses for different sizes.



    But now there is this huge leap, not only to 128/256/512/1024 bits,
    but to conceivably millions, plus the ability to specify any weird
    type you like, like 182 bits (eg. somebody makes a typo for
    _BitInt(128), but they silently get a viable type that happens to be
    a little less efficient!).


    And this huge leap also lets you have 128-bit, 256-bit, 512-bit, etc.,

    And 821 bits. This is what I don't get. Why is THAT so important?

    Why couldn't 128/256/etc have been added first, and then those funny
    ones if the demand was still there?

    The folks behind the proposal provided both. The fact that you can
    write _BitInt(821) does not in any way hinder use of _BitInt(256). I
    really don't get your problem here.


    If the proposal had instead been simply to extend the 'u8 u16 u32 u64'
    set of types by a few more entries on the right, say 'u128 u256 u512',
    would anyone have been clamouring for types like 'u1187'? I doubt it.

    /You/ might not have wanted them, but other people would.


For sub-64-bit types on conventional hardware, I simply can't see the
point, not if they are rounded up anyway. Either have full range-based types like Ada, or none at all.


    Fortunately for the C world, you are not on the C committee - it doesn't matter if you can't see beyond the end of your nose.



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Tue Nov 25 21:54:20 2025
    On 25/11/2025 13:12, Michael S wrote:
    On Tue, 25 Nov 2025 11:38:32 +0000
    bart <bc@freeuk.com> wrote:


    No, apart from the usual set of 8/16/32/64 bits. I've done 128 bits,
    and played with 1/2/4 bits, but my view is that above this range,
    using exact bit-sizes is the wrong way to go.


    Either that or manifestation of your NIH syndrome.
    Which explanation do you consider more likely?

    While for odd sizes up to 64 bits, bitfields are more apt than
    employing the type system.


    int sign_extend12(unsigned x)
    {
    return (_BitInt(12))x;
    }

Nice, isn't it?
Doing the same with bit fields is possible, but less obvious and less convenient. Also it can potentially play havoc with a compiler that takes
strict aliasing rules more seriously than they deserve.

    int sign_extend12(unsigned x)
    {
    struct bar {
    signed a: 12;
    };
return ((struct bar*)&x)->a;
    }


    int sign_extend12(unsigned x)
    {
    union {
    struct { unsigned u : 12; };
    struct { signed s : 12; };
    } u = {{ x }};
    return u.s;
    }


    No need for messing about with aliases - type-punning unions are safe
    and efficient (on good compilers).

    But the _BitInt version is definitely neater. I can see myself using _BitInt(12) and similar sizes for things like values read from hardware sensors of different resolutions.

    (The code for all three is the same with gcc on x86 or arm64 -
    unfortunately, gcc does not yet support _BitInt on many targets.)


Doing the same with shifts is almost as convenient as with _BitInt, and
it works great on all popular compilers, but according to the wording of the C Standard it is Undefined Behavior.

    int sign_extend12(unsigned x)
    {
    return (int32_t)((uint32_t)x << 20) >> 20;
    }



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Tue Nov 25 23:11:58 2025
    On Tue, 25 Nov 2025 14:12:07 +0200
    Michael S <already5chosen@yahoo.com> wrote:

Doing the same with shifts is almost as convenient as with _BitInt, and
it works great on all popular compilers, but according to the wording of the C Standard it is Undefined Behavior.

    int sign_extend12(unsigned x)
    {
    return (int32_t)((uint32_t)x << 20) >> 20;
    }


    Before someone corrects me, I'd correct myself: the code above does not
    contain Undefined Behavior. It's merely Implementation Defined Behavior.


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Keith Thompson@3:633/10 to All on Tue Nov 25 13:42:37 2025
    David Brown <david.brown@hesbynett.no> writes:
    [...]
    But the _BitInt version is definitely neater. I can see myself using _BitInt(12) and similar sizes for things like values read from
    hardware sensors of different resolutions.

    (The code for all three is the same with gcc on x86 or arm64 -
    unfortunately, gcc does not yet support _BitInt on many targets.)
    [...]

    Is support for _BitInt limited by target or by version?

    It looks like _BitInt support was introduced in gcc 14.1.0. You might
    have older versions of gcc on other platforms.

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Tue Nov 25 21:58:02 2025
    On 25/11/2025 20:25, David Brown wrote:
    On 24/11/2025 23:27, bart wrote:

One interesting use-case for literals was short strings; 128 bits
allowed character literals up to 16 characters: 'ABCDEFGHIJKLMNOP'. I
think C is still stuck at one, or 4 if you're lucky.)


    I have no idea or opinion on why /you/ might want 128-bit or larger
integer types. I believe there is very little use for "normal" numbers
    - things you might want to write as literals, calculate with, and read
    or write - that won't fit perfectly well within 64 bit types, and would
    not be better served by arbitrary sized integers.


Arbitrary sized
integers are a very different kettle of fish from large fixed-size
integers, and are not something that would fit in the C language - they
need a library.

Really? I wouldn't have thought there was any appreciable difference
between the code for multiplying two 100,000-bit BitInts, and that for multiplying two arbitrary-precision ints that happen to be 100,000 bits.

    Maybe the latter is autoranging, and might give a 200,000-bit result.

    Presumably the former doesn't use inline code, so it would be surprising
    if each distinct size of BitInt had dedicated sets of routines for this.
    So it sounds like they have to use a generic library anyway.

    And sure enough, gcc-generated code contains stuff like this:

    mov r8, rcx
mov edx, 50000 # _BitInt(50000)
    mov rcx, rax
    call __mulbitint3

    So, BitInts are different in that they /don't/ need a library?


    I can tell you why /I/ might find larger integer types useful.˙ They
    include :

* 128-bit for IPv6 address. These use a variety of styles for input and display, and thus would use specialised routines, not simple literals or printf-style IO.

    So, a better fit for a struct then? Here I'm curious as to what
    BitInt(128) brings to the table.


    * Big units for passing data around with larger memory transfers, using
SIMD registers. IO is irrelevant here.

    Structs and arrays again spring to mind if you just want an anonymous
    data block. (I wonder why it has to be bit-precise for byte-addressed
    memory?)


    * Cryptography. IO is irrelevant here. But a variety of sizes are
    useful including 56, 80, 112, 128, 168, 192, 384, 512, 521, 2048, 3072, 4096, 7680, 8096 bits. There may be more common sizes - I'm just
    thinking of DES, 3DES, AES, SHA, ECC and RSA.

    And I'm again curious as to what /non-numeric/ use a 200,000-bit BitInt
    might be put to, that is not better served by an array or struct.

    Maybe bit-sets? But there are no special features for accessing
    individual bits.

    That _BitInt() defaults to a signed integer (twos complement?), even for
    very large sizes suggests that /numeric/ applications are a primary use.



    Smaller sizes can be useful for holding RGB pixel values, audio data, etc.

    Except that these are probably rounded up to the next multiple of two.
    So the benefit is minimal, unless the implementation can do something with those padding bits.

    And 821 bits. This is what I don't get. Why is THAT so important?

    Why couldn't 128/256/etc have been added first, and then those funny
    ones if the demand was still there?

    The folks behind the proposal provided both. The fact that you can
    write _BitInt(821) does not in any way hinder use of _BitInt(256). I
    really don't get your problem here.

    You've heard of 'code smell'? Well, this is the same, but for features.

    I've been doing this stuff long enough to recognise when a feature is over-elaborate, over-specified and over-flexible. You need to know the
    minimum you can get away with, not the maximum!

    Let me guess, some committee members have been looking too long at how
    C++ does things? That language is utterly incapable of creating anything
    small and simple.



    If the proposal had instead been simply to extend the 'u8 u16 u32 u64'
    set of types by a few more entries on the right, say 'u128 u256 u512',
    would anyone have been clamouring for types like 'u1187'? I doubt it.

    /You/ might not have wanted them, but other people would.



    OK, so why are you not allowed to have _BitInt(1)? That is, a 1-bit
    signed integer. It might only have two values of 0 and -1; doesn't
    nobody want that particular combination?





    For sub-64-bit types on conventional hardware, I simply can't see the
    point, not if they are rounded up anyway. Either have a full range-
    based types like Ada, or not at all.


    Fortunately for the C world, you are not on the C committee - it doesn't matter if you can't see beyond the end of your nose.

    Maybe unfortunately. C used to be a fairly simple language with a lot of baggage; now it's a much heftier one with a lot of baggage!

    At least, I've been able to add to my collection of C types that
    represent an 8-bit byte:

    signed char
    unsigned char
    int8_t
    uint8_t
    _BitInt(8)
    unsigned _BitInt(8)

    The last two are apparently incompatible with the char versions.


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Keith Thompson@3:633/10 to All on Tue Nov 25 15:20:43 2025
    bart <bc@freeuk.com> writes:
    On 25/11/2025 20:25, David Brown wrote:
    [...]
    Arbitrary sized integers are a very different kettle of fish from
    large fixed-size integers, and are not something that would fit in
    the C language - they need a library.

    Really? I wouldn't have thought there was any appreciable difference
    between the code for multiplying two 100,000-bit BitInts, and that for multiplying two arbitrary-precision ints that happen to be 100,000
    bits.

    It's not about the code that implements multiplication. In gcc, that's
    done by calling a built-in function that can operate on arbitrary data
    widths.

    Think about memory management.

    A _BitInt(128) object has a fixed size, like a struct. It can be
    allocated locally ("on the stack"), passed to a function, returned
    as a function result, used in expressions, etc. Likewise for
    _BitInt(2048).
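
    That point can be sketched concretely. A minimal example, assuming a
    compiler with C23 _BitInt support (gcc 14 or recent clang on x86-64);
    the u128 typedef and the add_u128 helper are just illustrative names:

    ```c
    #include <stdio.h>

    typedef unsigned _BitInt(128) u128;

    /* A _BitInt(128) is a fixed-size value type: it can live on the
       stack and be passed and returned by value, like a small struct. */
    static u128 add_u128(u128 a, u128 b) {
        return a + b;
    }

    int main(void) {
        u128 x = (u128)1 << 100;       /* a value well beyond 64 bits */
        u128 y = add_u128(x, 5);
        /* printf has no _BitInt conversion; print the low 64 bits. */
        printf("low 64 bits: %llu\n", (unsigned long long)y);
        printf("sizeof(u128): %zu\n", sizeof(u128));
        return 0;
    }
    ```

    No malloc, no free, no destructors - the storage requirements are
    fully known at compile time, which is exactly what a _BitInt(*) type
    could not offer.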

    A hypothetical _BitInt(*) object would require an amount of storage
    that varies with its current value. That storage would have to be
    allocated using malloc() or equivalent, and deallocated using free()
    or equivalent. C++ template classes with automatically invoked
    constructors and destructors are great for that kind of thing.
    C has no such mechanisms, and there's little support for adding
    it just for this feature. (There are C container libraries.
    I haven't used them, but they tend to require construction and
    destruction to be explicit.)

    Perhaps a future standard will provide a more flexible flavor of
    _BitInt. It might allow the n in _BitInt(n) to be non-constant, or
    empty, or "*", to denote an arbitrary-precision integer. But it's
    hard to see how that could be done without adding other fundamental
    features to the language. And a lot of people's response would be
    that if you want C++, you know where to find it.

    Similarly, C99 added complex types as a built-in language feature.
    C++ added complex types as a template class, because C++ has language
    features that support that kind of thing, including user-defined
    literals.

    If you can think of a way to add arbitrary-precision integers to C
    without other radical changes to the language, let us know.

    It could also be nice to be able to write code that deals with
    multiple widths of _BitInt types, as we can do for arrays even
    without VLAs. But C's treatment of arrays is messy, and I'm not
    sure duplicating that mess for _BitInt types would be a great idea.
    And I wouldn't want to lose the ability to pass _BitInt values
    to functions.

    [...]

    So, a better fit for a struct then? Here I'm curious as to what
    BitInt(128) brings to the table.

    It brings a 128-bit integer type with constants and straightforward
    assignment, comparison, and arithmetic operators.
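
    For instance, a sketch assuming C23's wb/uwb bit-precise constant
    suffixes:

    ```c
    #include <stdio.h>

    int main(void) {
        /* uwb gives a bit-precise constant wider than 64 bits: 2^64. */
        unsigned _BitInt(128) a = 0x10000000000000000uwb;
        unsigned _BitInt(128) b = a + 1;
        /* Comparison and arithmetic are ordinary operators. */
        printf("%d\n", b > a);                         /* 1 */
        printf("%llu\n", (unsigned long long)(b - a)); /* 1 */
        return 0;
    }
    ```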

    [...]

    That _BitInt() defaults to a signed integer (twos complement?), even
    for very large sizes suggests that /numeric/ applications are a
    primary use.

    Yes, C23 requires two's-complement for signed integers. (It mandates two's-complement representation, not wraparound behavior; signed
    overflow is still UB).

    [...]

    OK, so why are you not allowed to have _BitInt(1)? That is, a 1-bit
    signed integer. It might only have two values of 0 and -1; doesn't
    nobody want that particular combination?

    I don't know. The language allows 1-bit signed bit-fields, so
    _BitInt(1) would make some sense, but the language requires N to
    be at least 1 for unsigned _BitInt and 2 for signed _BitInt.

    It doesn't bother me too much, since I'm unlikely to have a
    use for signed _BitInt(1). But it's an arbitrary restriction.
    (And I thought you liked arbitrary restrictions.)

    [...]

    At least, I've been able to add to my collection of C types that
    represent an 8-bit byte:

    signed char
    unsigned char
    int8_t
    uint8_t
    _BitInt(8)
    unsigned _BitInt(8)

    The last two are apparently incompatible with the char versions.

    You forgot plain char, int_least8_t, and uint_least8_t. And of
    course the char types are CHAR_BIT bits, not necessarily 8 bits.

    It's mildly interesting that unsigned _BitInt(8) gives you a way to
    define an octet even on systems with CHAR_BIT > 8. But of course an
    unsigned _BitInt(8) object will still have a size of CHAR_BIT bits.
    (Again, saving space on ordinary hardware isn't part of the rationale
    for _BitInt types.)
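
    A small illustration of both points, assuming CHAR_BIT == 8 on the
    host:

    ```c
    #include <stdio.h>

    int main(void) {
        unsigned _BitInt(8) octet = 250;
        /* unsigned _BitInt(8) values are reduced modulo 2^8 on
           conversion, so this wraps: 260 mod 256 == 4. */
        octet += 10;
        printf("value: %u\n", (unsigned)octet);
        printf("size: %zu\n", sizeof octet);  /* 1 when CHAR_BIT == 8 */
        return 0;
    }
    ```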

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Wed Nov 26 02:08:05 2025
    On 25/11/2025 23:20, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    On 25/11/2025 20:25, David Brown wrote:
    [...]
    Arbitrary sized integers are a very different kettle of fish from
    large fixed-size integers, and are not something that would fit in
    the C language - they need a library.

    Really? I wouldn't have thought there was any appreciable difference
    between the code for multiplying two 100,000-bit BitInts, and that for
    multiplying two arbitrary-precision ints that happen to be 100,000
    bits.

    It's not about the code that implements multiplication. In gcc, that's
    done by calling a built-in function that can operate on arbitrary data widths.

    Think about memory management.

    Well, I was responding to a suggestion that BitInt support didn't need a library.

    But memory management is a good point. Actual, variable-sized bigints
    would be awkward in C if you want to use them in ordinary expressions.

    Although managing large fixed-sized types, which may also involve intermediate, transient values, can have their own problems.



    Perhaps a future standard will provide a more flexible flavor of
    _BitInt. It might allow the n in _BitInt(n) to be non-constant, or
    empty, or "*", to denote an arbitrary-precision integer. But it's
    hard to see how that could be done without adding other fundamental
    features to the language. And a lot of people's response would be
    that if you want C++, you know where to find it.

    I think I would have responded better to BitInt if presented as a
    'bit-set', effectively a fixed-size bit-array, but passed by value.
    This is something that I'd considered myself at one time.

    Those would have logical operators, access to individual bits, but neither arithmetic nor shifts, and no notion of twos complement. (In my implementation, they could also have been initialised like Pascal bitsets.)

    More significantly, an unbounded version could be passed by reference,
    with an accompanying length (I could also use slices that have the
    length) as happens with arrays in C.

    Similarly, C99 added complex types as a built-in language feature.
    C++ added complex types as a template class, because C++ has language features that support that kind of thing, including user-defined
    literals.

    If you can think of a way to add arbitrary-precision integers to C
    without other radical changes to the language, let us know.

    I have considered adding my actual arbitrary precision library to my
    systems language. It would have been superficial (such types would not be nestable within other data structures), but would have been simpler to
    use than function calls.

    Some degree of automatic memory management would have been needed
    (initialise locals on function entry, free on exit, deal with
    intermediates), but not on the C++ scale due to the restrictions.

    But I rejected that as being too high-level a feature, and my use-cases
    more suitable for a scripting language.


    It could also be nice to be able to write code that deals with
    multiple widths of _BitInt types, as we can do for arrays even
    without VLAs. But C's treatment of arrays is messy, and I'm not
    sure duplicating that mess for _BitInt types would be a great idea.
    And I wouldn't want to lose the ability to pass _BitInt values
    to functions.

    [...]

    So, a better fit for a struct then? Here I'm curious as to what
    BitInt(128) brings to the table.

    It brings a 128-bit integer type with constants and straightforward assignment, comparison, and arithmetic operators.

    I was commenting on the ipv6 example, where structs give you that
    already, except arithmetic which makes little sense.


    [...]

    That _BitInt() defaults to a signed integer (twos complement?), even
    for very large sizes suggests that /numeric/ applications are a
    primary use.

    Yes, C23 requires two's-complement for signed integers. (It mandates two's-complement representation, not wraparound behavior; signed
    overflow is still UB).

    Even though it will now likely be under software control? OK.

    At least, I've been able to add to my collection of C types that
    represent an 8-bit byte:

    signed char
    unsigned char
    int8_t
    uint8_t
    _BitInt(8)
    unsigned _BitInt(8)

    The last two are apparently incompatible with the char versions.

    You forgot plain char,

    I had char but took it out, as it's an outlier.

    int_least8_t, and uint_least8_t.

    And 'fast' versions? I still don't know what any of these mean! No other languages seem to have bothered.



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Keith Thompson@3:633/10 to All on Tue Nov 25 19:06:31 2025
    bart <bc@freeuk.com> writes:
    On 25/11/2025 23:20, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    On 25/11/2025 20:25, David Brown wrote:
    [...]
    Arbitrary sized integers are a very different kettle of fish from
    large fixed-size integers, and are not something that would fit in
    the C language - they need a library.

    Really? I wouldn't have thought there was any appreciable difference
    between the code for multiplying two 100,000-bit BitInts, and that for
    multiplying two arbitrary-precision ints that happen to be 100,000
    bits.
    It's not about the code that implements multiplication. In gcc,
    that's done by calling a built-in function that can operate on
    arbitrary data widths. Think about memory management.

    Well, I was responding to a suggestion that BitInt support didn't need
    a library.

    David didn't actually suggest that. He said that arbitrary-sized
    integers would need a library (and such libraries exist), not that
    fixed-size integers don't.

    The point, I think, is that arbitrary-sized integers, without radical
    changes to the language, would require a *visible* library while the
    _BitInt types are built into the language. Yes, some operations are implemented as function calls in some implementations. The same
    could be true for just about any operation. Some implementations
    have software floating-point. gcc implements a large struct
    assignment by generating a call to memcpy. And so on.

    But memory management is a good point. Actual, variable-sized bigints
    would be awkward in C if you want to use them in ordinary expressions.

    Although managing large fixed-sized types, which may also involve intermediate, transient values, can have their own problems.

    Again, any such problems have already been solved by the gcc and
    llvm/clang implementations (aside from a clang problem with large multiplication and division). "This feature would be too difficult to implement" is a weak argument when implentations already exist.

    BTW, clang has had this feature (originally called _ExtInt rather than
    _BitInt) since 2019. Here's the git log entry. The committer is one of
    the authors of the N2021 paper, so the similarities are unsurprising.

    ```
    commit 61ba1481e200b5b35baa81ffcff81acb678e8508
    Author: Erich Keane <erich.keane@intel.com>
    Date: 2019-12-24 07:28:40 -0800

    Implement _ExtInt as an extended int type specifier.

    Introduction/Motivation:
    LLVM-IR supports integers of non-power-of-2 bitwidth, in the iN syntax.
    Integers of non-power-of-two aren't particularly interesting or useful
    on most hardware, so much so that no language in Clang has been
    motivated to expose it before.

    However, in the case of FPGA hardware normal integer types where the
    full bitwidth isn't used, is extremely wasteful and has severe
    performance/space concerns. Because of this, Intel has introduced this
    functionality in the High Level Synthesis compiler[0]
    under the name "Arbitrary Precision Integer" (ap_int for short). This
    has been extremely useful and effective for our users, permitting them
    to optimize their storage and operation space on an architecture where
    both can be extremely expensive.

    We are proposing upstreaming a more palatable version of this to the
    community, in the form of this proposal and accompanying patch. We are
    proposing the syntax _ExtInt(N). We intend to propose this to the WG14
    committee[1], and the underscore-capital seems like the active direction
    for a WG14 paper's acceptance. An alternative that Richard Smith
    suggested on the initial review was __int(N), however we believe that
    is much less acceptable by WG14. We considered _Int, however _Int is
    used as an identifier in libstdc++ and there is no good way to fall
    back to an identifier (since _Int(5) is indistinguishable from an
    unnamed initializer of a template type named _Int).

    [0]https://www.intel.com/content/www/us/en/software/programmable/quartus-prime/hls-compiler.html)
    [1]http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2472.pdf

    Differential Revision: https://reviews.llvm.org/D73967
    ```

    [...]

    I think I would have responded better to BitInt if presented as a
    'bit-set', effectively a fixed-size bit-array, but passed by
    value. This is something that I'd considered myself at one time.

    Those would have logical operators, access to individual bits, but neither arithmetic nor shifts, and no notion of twos complement. (In my implementation, they could also have been initialised like Pascal
    bitsets.)

    So rather than a new feature for wide integer types, you would
    have preferred something that DOESN'T SUPPORT ARITHMETIC?? How is
    that relevant to _BitInt? Bit vectors are great, but they aren't
    integers.

    This might interest you :
    https://github.com/michaeldipperstein/bitarray

    More significantly, an unbounded version could be passed by reference,
    with an accompanying length (I could also use slices that have the
    length) as happens with arrays in C.

    Right, like arrays of unsigned char.

    [...]

    It could also be nice to be able to write code that deals with
    multiple widths of _BitInt types, as we can do for arrays even
    without VLAs. But C's treatment of arrays is messy, and I'm not
    sure duplicating that mess for _BitInt types would be a great idea.
    And I wouldn't want to lose the ability to pass _BitInt values
    to functions.
    [...]

    So, a better fit for a struct then? Here I'm curious as to what
    BitInt(128) brings to the table.
    It brings a 128-bit integer type with constants and straightforward
    assignment, comparison, and arithmetic operators.

    I was commenting on the ipv6 example, where structs give you that
    already, except arithmetic which makes little sense.

    OK, I probably snipped too much context here. unsigned _BitInt(128)
    could be a reasonable way to represent an ipv6 address. So could
    unsigned char[16], or a struct containing an unsigned char[16].

    [...]

    At least, I've been able to add to my collection of C types that
    represent an 8-bit byte:

    signed char
    unsigned char
    int8_t
    uint8_t
    _BitInt(8)
    unsigned _BitInt(8)

    The last two are apparently incompatible with the char versions.
    You forgot plain char,

    I had char but took it out, as it's an outlier.

    OK, whatever works for you.

    int_least8_t, and uint_least8_t.

    And 'fast' versions? I still don't know what any of these mean! No
    other languages seem to have bothered.

    The "fast" versions could be larger than 8 bits (though I'm mildly
    surprised to see that [u]int_fast8_t types *are* 8 bits on several
    compilers I just tried).

    Of course C++ and Objective-C incorporate C's standard library.

    You say you don't know what they mean. Do you *want* to know?
    You can always read the standard's description if you're curious.
    I never assume that saying you don't know something means that you
    want to know about it.

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Keith Thompson@3:633/10 to All on Tue Nov 25 19:21:03 2025
    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
    bart <bc@freeuk.com> writes:
    [...]
    OK, so why are you not allowed to have _BitInt(1)? That is, a 1-bit
    signed integer. It might only have two values of 0 and -1; doesn't
    nobody want that particular combination?

    I don't know. The language allows 1-bit signed bit-fields, so
    _BitInt(1) would make some sense, but the language requires N to
    be at least 1 for unsigned _BitInt and 2 for signed _BitInt.

    It doesn't bother me too much, since I'm unlikely to have a
    use for signed _BitInt(1). But it's an arbitrary restriction.
    [...]

    I just learned that there's a proposal to allow _BitInt(1) in C2y.

    https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3699.pdf

    The current restriction apparently was for historical reasons.
    Prior to C23, C didn't require two's complement for signed types,
    and signed _BitInt(1) doesn't make much sense for one's complement
    or sign-and-magnitude (it could only hold +0 and -0).

    Yes, C23 added both _BitInt and the requirement for two's complement,
    but preliminary implementations of _BitInt go back several years,
    and the requirements didn't catch up. Stuff happens.

    Incidentally, C23 requires BITINT_MAXWIDTH to be at least
    ULLONG_WIDTH, which is at least 64. clang/llvm sets it to 128 for
    some target systems.
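
    A quick way to see an implementation's limit, assuming a C23
    <limits.h>:

    ```c
    #include <limits.h>
    #include <stdio.h>

    int main(void) {
        /* C23: BITINT_MAXWIDTH >= ULLONG_WIDTH >= 64. Actual values vary:
           gcc 14 on x86-64 reports 65535, clang 128 on some targets. */
        printf("BITINT_MAXWIDTH = %llu\n",
               (unsigned long long)BITINT_MAXWIDTH);
        return 0;
    }
    ```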


    https://eisenwave.github.io/cpp-proposals/bitint.html
    is a proposal to add C23-style bit-precise integers to C++.
    </OT>

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Wed Nov 26 08:55:37 2025
    On 25/11/2025 22:58, bart wrote:
    On 25/11/2025 20:25, David Brown wrote:
    On 24/11/2025 23:27, bart wrote:

    (One interesting use-case for literals was short strings; 128 bits
    allowed character literals up to 16 characters: 'ABCDEFGHIJKLMNOP'. I
    think C is still stuck at one, or 4 if you're lucky.)


    I have no idea or opinion on why /you/ might want 128-bit or larger
    integer types. I believe there is very little use for "normal"
    numbers - things you might want to write as literals, calculate with,
    and read or write - that won't fit perfectly well within 64 bit types,
    and would not be better served by arbitrary sized integers.


    Arbitrary sized integers are a very different kettle of fish from
    large fixed-size integers, and are not something that would fit in the
    C language - they need a library.

    Really? I wouldn't have thought there was any appreciable difference
    between the code for multiplying two 100,000-bit BitInts, and that for multiplying two arbitrary-precision ints that happen to be 100,000 bits.


    You are looking at things in completely the wrong way.

    Long before you start thinking of how to implement operations, think
    about what the types are at a fundamental level.

    A fixed-size integer is a value type of fixed, compile-time size. It is passed around as a value. Local instances can be put on a stack with compile-time fixed offsets (and thus using [sp + N] access modes in an implementation). The type has a single simple and obvious (albeit
    slightly implementation-dependent) bit representation. A _BitInt(32)
    will be identical at the low level to an int32_t. Bigger _BitInt types
    are just the same, only bigger. There is no difference in concept, or representation, whether the type is 32-bit or 32 million bits.
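
    That low-level equivalence can at least be checked for storage size,
    assuming a C23 compiler; note the two remain distinct, incompatible
    types as far as the C type system is concerned:

    ```c
    #include <stdint.h>
    #include <stdio.h>

    /* Same storage on typical implementations, though _BitInt(32) and
       int32_t are not compatible types. */
    _Static_assert(sizeof(_BitInt(32)) == sizeof(int32_t),
                   "expected matching storage");

    int main(void) {
        _BitInt(32) a = -123;
        int32_t b = a;           /* value-preserving conversion */
        printf("%d\n", (int)b);
        return 0;
    }
    ```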

    An arbitrary sized integer is a dynamic type with variable size. The
    base object will hold information about pointers to data, sizes for that stored data - including both how much is in use, and how much is
    available. There are endless ways to make such types - you can support multiple allocation parts, or use a single contiguous allocation. You
    can store the data in binary, or some kind of packed decimal, or other formats. Passing them around might mean just passing around the base
    object, but sometimes you need to make deep copies. Operations might
    lead to heap memory allocations or deallocations.

    They are so /totally/ different that any similarities in the way you do
    a particular arithmetic operation are completely incidental.


    Maybe the latter is autoranging, and might give a 200,000-bit result.

    Presumably the former doesn't use inline code, so it would be surprising
    if each distinct size of BitInt had dedicated sets of routines for this.
    So it sounds like they have to use a generic library anyway.

    And sure enough, gcc-generated code contains stuff like this:

    ˙˙˙˙mov    r8, rcx
    ˙˙˙˙mov    edx, 50000       # _BitInt(50000)
    ˙˙˙˙mov    rcx, rax
    ˙˙˙˙call   __mulbitint3

    So, BitInts are different in that they /don't/ need a library?


    I can tell you why /I/ might find larger integer types useful. They
    include :

    * 128-bit for IPv6 address. These use a variety of styles for input
    and display, and thus would use specialised routines, not simple
    literals or printf-style IO.

    So, a better fit for a struct then? Here I'm curious as to what
    BitInt(128) brings to the table.


    A struct is certainly what I use today. But there may be times when it
    is convenient to hold the data in a single scalar object. Depending on
    the target device, registers, and operations, there might be registers
    that can hold a 128-bit scalar for passing it around, or for atomically accessing them.


    * Big units for passing data around with larger memory transfers,
    using SIMD registers. IO is irrelevant here.

    Structs and arrays again spring to mind if you just want an anonymous
    data block. (I wonder why it has to be bit-precise for byte-addressed memory?)


    If I have a processor that has 256-bit vector registers, then moving
    data by loading and storing 256-bit blocks is going to be more efficient
    than doing a loop of 16 byte moves. Today, I would use uint64_t for the
    task, as the biggest type available. Why does it have to be
    bit-precise? It must be bit-precise because I would want to move 256
    bits - not 255 bits or 257 bits.
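
    A sketch of that idea, assuming a C23 compiler; copy_blocks and
    block256 are illustrative names, and whether each chunk assignment
    becomes a single vector load/store is entirely up to the compiler
    and target:

    ```c
    #include <stdio.h>
    #include <string.h>

    typedef unsigned _BitInt(256) block256;   /* 32-byte transfer unit */

    /* Copy nblocks 256-bit chunks; a compiler may lower each assignment
       to one vector load/store pair on suitable targets. */
    static void copy_blocks(void *dst, const void *src, size_t nblocks) {
        block256 *d = dst;
        const block256 *s = src;
        for (size_t i = 0; i < nblocks; i++)
            d[i] = s[i];
    }

    int main(void) {
        _Alignas(block256) unsigned char src[64], dst[64] = {0};
        for (int i = 0; i < 64; i++)
            src[i] = (unsigned char)i;
        copy_blocks(dst, src, sizeof src / sizeof(block256));
        printf("%s\n", memcmp(src, dst, sizeof src) == 0 ? "equal" : "differ");
        return 0;
    }
    ```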


    * Cryptography. IO is irrelevant here. But a variety of sizes are
    useful including 56, 80, 112, 128, 168, 192, 384, 512, 521, 2048,
    3072, 4096, 7680, 8096 bits. There may be more common sizes - I'm
    just thinking of DES, 3DES, AES, SHA, ECC and RSA.

    And I'm again curious as to what /non-numeric/ use a 200,000-bit BitInt might be put to, that is not better served by an array or struct.


    I don't have a use for a 200,000 bit integer type at the moment. But I
    cannot imagine any reason why the language specifications should have arbitrary limits. Are you suggesting that the C standards should say "You
    can have _BitInt's up to 8096 because someone found a use for them, but
    you can't have size 8097 and above - and 200,000 is right out - because someone else can't imagine they are useful" ?

    An implementation can - indeed, must - set a limit to the sizes it
    supports. Implementations can have many reasons to do so. Some implementations might have quite low limits (the size of "long long int"
    is the minimum allowed for conformance), but then that implementation
    might not be so useful to some people.

    Maybe bit-sets? But there are no special features for accessing
    individual bits.

    That _BitInt() defaults to a signed integer (twos complement?), even for
    very large sizes suggests that /numeric/ applications are a primary use.


    Obviously the C standards should have made "_BitInt" signed up to size
    73 bits, and unsigned from then on. That would have been /so/ much
    clearer and simpler for everyone.



    Smaller sizes can be useful for holding RGB pixel values, audio data,
    etc.

    Except that these are probably rounded up to the next multiple of two.
    So the benefit is minimal, unless the implementation can do something with those padding bits.


    I write C code. I want my C code to be clear and represent what I am handling, and then let the compiler do its job of generating efficient results. So if I am dealing with data that is 24-bit signed integer
    data, then _BitInt(24) (especially with a typedef name) is more accurate source code than "int" or "int32_t".
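
    For example (a sketch; sample24 is an illustrative typedef, and the
    size shown assumes the x86-64 psABI layout):

    ```c
    #include <stdio.h>

    typedef _BitInt(24) sample24;    /* signed 24-bit audio sample */

    int main(void) {
        sample24 s = 0x7FFFFF;       /* maximum positive value, 2^23 - 1 */
        /* Signed _BitInt overflow is still UB, so widen before
           arithmetic that might exceed 24 bits. */
        long sum = (long)s + 1;
        printf("%ld\n", sum);
        printf("%zu\n", sizeof s);   /* padded to 4 bytes on x86-64 */
        return 0;
    }
    ```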

    And 821 bits. This is what I don't get. Why is THAT so important?

    Why couldn't 128/256/etc have been added first, and then those funny
    ones if the demand was still there?

    The folks behind the proposal provided both. The fact that you can
    write _BitInt(821) does not in any way hinder use of _BitInt(256). I
    really don't get your problem here.

    You've heard of 'code smell'? Well, this is the same, but for features.


    Your nose is blocked. Or to be more accurate, you are so obsessed with
    the idea that your own language is "perfect" that you simply cannot
    accept that other languages might have good features that your language
    does not, or that other programmers might want features that your
    language does not have.

    I've been doing this stuff long enough to recognise when a feature is over-elaborate, over-specified and over-flexible. You need to know the minimum you can get away with, not the maximum!

    NIH syndrome combined with megalomania. Other people do this stuff
    better than you.


    Let me guess, some committee members have been looking too long at how
    C++ does things? That language is utterly incapable of creating anything small and simple.


    And yet C and C++ programmers outnumber programmers of Bart's own
    language by millions. No language - except for yours, of course - is
    perfect. But it seems C and C++ are both pretty good for getting the
    job done.



    If the proposal had instead been simply to extend the 'u8 u16 u32
    u64' set of types by a few more entries on the right, say 'u128 u256
    u512', would anyone have been clamouring for types like 'u1187'? I
    doubt it.

    /You/ might not have wanted them, but other people would.



    OK, so why are you not allowed to have _BitInt(1)? That is, a 1-bit
    signed integer. It might only have two values of 0 and -1; doesn't
    nobody want that particular combination?

    Apparently that one was ruled out. (I believe the C++ plans for _BitInt
    will allow it there - not because it is a useful type in itself, but
    because allowing it slightly simplifies generic programming with _BitInt types.)






    For sub-64-bit types on conventional hardware, I simply can't see the
    point, not if they are rounded up anyway. Either have a full range-
    based types like Ada, or not at all.


    Fortunately for the C world, you are not on the C committee - it
    doesn't matter if you can't see beyond the end of your nose.

    Maybe unfortunately. C used to be a fairly simple language with a lot of baggage; now it's a much heftier one with a lot of baggage!

    At least, I've been able to add to my collection of C types that
    represent an 8-bit byte:

      signed char
      unsigned char
      int8_t
      uint8_t
      _BitInt(8)
      unsigned _BitInt(8)

    The last two are apparently incompatible with the char versions.





    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Wed Nov 26 11:12:47 2025
    On Tue, 25 Nov 2025 18:33:30 +0000
    bart <bc@freeuk.com> wrote:


    (The only ADCs I've used were 4-bit (homemade)

    Why am I not surprised? ;-)

    and 8-bit, both giving
    unsigned data in parallel, used for frame-grabbing video circuits so
    read directly into memory rather than via an explicit memory- or
    port-read instruction.)


    ADC technology is improving at a decent rate.
    Recently we used a converter with a successive-approximation
    architecture that delivers better SNR than most delta-sigma
    converters of just a few years ago, without suffering from all
    the disadvantages of delta-sigma. Almost 18 true bits at 2 MSPS.

    https://www.analog.com/en/products/ad4030-24.html


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Wed Nov 26 11:29:16 2025
    On Tue, 25 Nov 2025 18:33:30 +0000
    bart <bc@freeuk.com> wrote:

    * I don't recollect needing to sign-extend a field that does not
    start at offset zero,

    So what's in the rest of the 32-bit field, garbage?


    Either garbage or zero or, rarely there could be meaningful flags.
    I don't see how the question is relevant.


    Same for your other points - I don't recollect that I needed
    something like that sufficiently often to ... well... recollect.

    Yours is one of a thousand possible applications. Everyone will have different needs. Maybe someone else will have a 16 or 32-bit value
    with assorted bitfields of different widths.

    Then maybe C bitfields could be used, but a bigger problem with those
    is poor control over layout, which is anyway implementation-defined.
    (Mine of course don't have that problem!)

    According to the language of The Standard, it's not 'poor control'.
    As far as standard requirements go, there is *no* control over the
    layout of bit-fields.
    Of course, the implementer is encouraged to specify exact rules in his
    documents. In many (not all) cases bitfield layout is part of the ABI,
    so it is shared by all compilers on a given platform. But that does not
    exactly help people who don't like reading ABI docs or compiler
    manuals. It also does not help those poor souls who try to write
    portable code.

    Shifts and masks provide much more solid ground.
    And combination of shifts with _BitInt() appears equally solid, but
    more convenient and more self-documenting.


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Wed Nov 26 11:52:07 2025
    On Tue, 25 Nov 2025 19:06:31 -0800
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:


    BTW, clang has had this feature (originally called _ExtInt rather than _BitInt) since 2019. Here's the git log entry. The committer is one
    of the authors of the N2021 paper, so the similarities are
    unsurprising.

    ```
    commit 61ba1481e200b5b35baa81ffcff81acb678e8508
    Author: Erich Keane <erich.keane@intel.com>
    Date: 2019-12-24 07:28:40 -0800

    Implement _ExtInt as an extended int type specifier.

    Introduction/Motivation:
    LLVM-IR supports integers of non-power-of-2 bitwidth, in the iN
    syntax. Integers of non-power-of-two aren't particularly interesting
    or useful on most hardware, so much so that no language in Clang has
    been motivated to expose it before.

    However, in the case of FPGA hardware normal integer types where
    the full bitwidth isn't used, is extremely wasteful and has severe
    performance/space concerns. Because of this, Intel has
    introduced this functionality in the High Level Synthesis compiler[0]
    under the name "Arbitrary Precision Integer" (ap_int for short).
    This has been extremely useful and effective for our users,
    permitting them to optimize their storage and operation space on an architecture where both can be extremely expensive.

    We are proposing upstreaming a more palatable version of this to
    the community, in the form of this proposal and accompanying patch.
    We are proposing the syntax _ExtInt(N). We intend to propose this to
    the WG14 committee[1], and the underscore-capital seems like the
    active direction for a WG14 paper's acceptance. An alternative that
    Richard Smith suggested on the initial review was __int(N), however
    we believe that is much less acceptable by WG14. We considered _Int,
    however _Int is used as an identifier in libstdc++ and there is no
    good way to fall back to an identifier (since _Int(5) is
    indistinguishable from an unnamed initializer of a template type
    named _Int).
    [0]https://www.intel.com/content/www/us/en/software/programmable/quartus-prime/hls-compiler.html)
    [1]http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2472.pdf

    Differential Revision: https://reviews.llvm.org/D73967
    ```

    [...]


    I like the feature in the form that it ended up, but I certainly
    dislike their motivation.
    [O.T. rant]
    High Level Synthesis, both by Altera (part of Intel in 2016-2024) and
    by Xilinx (part of AMD since 2022), is an archetypal snake oil.
    Bullshit doctors lure people into the notion that they can save time
    by not learning proper HDLs. But the naive users that believed their
    crap end up spending more time rather than less.
    As far as Altera/Xilinx is concerned, the short-term gain is that users
    make less efficient designs, which means that they have to buy bigger,
    more expensive FPGA devices. But in the long term it is a loss for the
    FPGA ecosystem, because more people believe that FPGAs are shite when in
    fact it's not the FPGAs that are bad, but improper tools (HLS).
    [/O.T. rant]


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Wed Nov 26 12:01:30 2025
    On Tue, 25 Nov 2025 13:42:37 -0800
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

    David Brown <david.brown@hesbynett.no> writes:
    [...]
    But the _BitInt version is definitely neater. I can see myself
    using _BitInt(12) and similar sizes for things like values read from hardware sensors of different resolutions.

    (The code for all three is the same with gcc on x86 or arm64 - unfortunately, gcc does not yet support _BitInt on many targets.)
    [...]

    Is support for _BitInt limited by target or by version?

    It looks like _BitInt support was introduced in gcc 14.1.0. You might
    have older versions of gcc on other platforms.


    The most recent version of arm-none-eabi-gcc in my distribution of
    choice (msys2) is 13.3.0.
    I am too lazy to compile arm-none-eabi-gcc from source. Would rather
    wait.
    I suppose, David is like me in that regard, except that he probably
    uses even more conservative distribution.



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Wed Nov 26 12:05:42 2025
    On 26/11/2025 07:55, David Brown wrote:
    On 25/11/2025 22:58, bart wrote:
    On 25/11/2025 20:25, David Brown wrote:
    On 24/11/2025 23:27, bart wrote:

    One interesting use-case for literals was short strings; 128 bits
    allowed character literals up to 16 characters: 'ABCDEFGHIJKLMNOP'.
    I think C is still stuck at one, or 4 if you're lucky.)


    I have no idea or opinion on why /you/ might want 128-bit or larger
    integer types. I believe there is very little use for "normal"
    numbers - things you might want to write as literals, calculate with,
    and read or write - that won't fit perfectly well within 64 bit
    types, and would not be better served by arbitrary sized integers.


    Arbitrary sized integers are a very different kettle of fish from
    large fixed-size integers, and are not something that would fit in
    the C language - they need a library.

    Really? I wouldn't have thought there was any appreciable difference
    between the code for multiplying two 100,000-bit BitInts, and that for
    multiplying two arbitrary-precision ints that happen to be 100,000 bits.


    You are looking at things in completely the wrong way.

    Long before you start thinking of how to implement operations, think
    about what the types are at a fundamental level.

    A fixed-size integer is a value type of fixed, compile-time size. It is passed around as a value. Local instances can be put on a stack with compile-time fixed offsets (and thus using [sp + N] access modes in an implementation). The type has a single simple and obvious (albeit
    slightly implementation-dependent) bit representation. A _BitInt(32)
    will be identical at the low level to an int32_t. Bigger _BitInt types
    are just the same, only bigger. There is no difference in concept, or representation, whether the type is 32-bit or 32 million bits.

    An arbitrary sized integer is a dynamic type with variable size. The
    base object will hold information about pointers to data, sizes for that stored data - including both how much is in use, and how much is
    available. There are endless ways to make such types - you can support multiple allocation parts, or use a single contiguous allocation. You
    can store the data in binary, or some kind of packed decimal, or other formats. Passing them around might mean just passing around the base object, but sometimes you need to make deep copies. Operations might
    lead to heap memory allocations or deallocations.

    They are so /totally/ different that any similarities in the way you do
    a particular arithmetic operation are completely incidental.

    But BitInts /will/ need runtime library support?

    I've acknowledged in my last post that arbitrary precision would have
    memory management issues, /if/ you wanted to add them to the language in
    such a way that, if variables 'a b c d' had such a type, you can write:

    a = b + c * d;

    This is not what I had in mind; such arithmetic would use explicit
    function calls with explicit management of intermediates (like GMP).

    So from this point of view, fixed-size BitInts are better, but also a
    higher level ability than I would have considered added to the language.

    Even if BitInts were restricted to saner and smaller sizes, I'd consider actual arithmetic on anything from 128 bits up to a few K bits and beyond a specialist,
    niche application.

    But logic operations (== & | ^) on unsigned BitInts are more reasonable (because they implement some features of bit-sets).

    For arithmetic on considerably larger numbers, I still think arbitrary precision is the best bet.


    Structs and arrays again spring to mind if you just want an anonymous
    data block. (I wonder why it has to be bit-precise for byte-addressed
    memory?)


    If I have a processor that has 256-bit vector registers, then moving
    data by loading and storing 256-bit blocks is going to be more efficient than doing a loop of 16 byte moves. Today, I would use uint64_t for the task, as the biggest type available. Why does it have to be bit-
    precise? It must be bit-precise because I would want to move 256 bits -
    not 255 bits or 257 bits.

    By bit-precise I mean being able to specify 255 and 257 bits! Memory is usually expressed in bytes or words, not bits.



    * Cryptography. IO is irrelevant here. But a variety of sizes are
    useful including 56, 80, 112, 128, 168, 192, 384, 512, 521, 2048,
    3072, 4096, 7680, 8096 bits. There may be more common sizes - I'm
    just thinking of DES, 3DES, AES, SHA, ECC and RSA.

    And I'm again curious as to what /non-numeric/ use a 200,000-bit
    BitInt might be put to, that is not better served by an array or struct.


    I don't have a use for a 200,000 bit integer type at the moment. But I cannot imagine any reason why the language specifications should have arbitrary limits. Are you suggesting that the C standards should say "You
    can have _BitInt's up to 8096 because someone found a use for them, but
    you can't have size 8097 and above - and 200,000 is right out - because someone else can't imagine they are useful" ?

    And yet, integer widths have been roughly capped at double a machine
    word size for decades - until 64 bits came along and then few even
    bothered with double-width.

    Nobody thought how easy it would be to just have an integer of whatever
    size you like - you just generate whatever code is necessary to make it happen. We could have had BitInts on 32- and even 16-bit machines if
    only somebody had thought of it!



    An implementation can - indeed, must - set a limit to the sizes it
    supports. Implementations can have many reasons to do so. Some implementations might have quite low limits (the size of "long long int"
    is the minimum allowed for conformance), but then that implementation
    might not be so useful to some people.

    Maybe bit-sets? But there are no special features for accessing
    individual bits.

    That _BitInt() defaults to a signed integer (twos complement?), even
    for very large sizes suggests that /numeric/ applications are a
    primary use.


    Obviously the C standards should have made "_BitInt" signed up to size
    73 bits, and unsigned from then on. That would have been /so/ much
    clearer and simpler for everyone.

    Or unsigned could have been the default.




    Smaller sizes can be useful for holding RGB pixel values, audio data,
    etc.

    Except that these are probably rounded up to the next power of
    two. So the benefit is minimal, unless you can do something with those padding bits.


    I write C code. I want my C code to be clear and represent what I am handling, and then let the compiler do its job of generating efficient results. So if I am dealing with data that is 24-bit signed integer
    data, then _BitInt(24) (especially with a typedef name) is more accurate source code than "int" or "int32_t".

    Suddenly everybody is dealing with signed values of 12 and 24 bits!

    I actually had exactly that feature:

    int*3 a # from 1980s; a 3-byte or 24-bit signed type
    int:24 b # from 1990s; a 24-bit signed type

    Or at least, I had the syntax. Those odd values would have been
    rejected, as I didn't have support for them, or a way to emulate them
    (which is what BitInt(24) appears to do).

    So I got rid of the feature and ended up with int32 and then i32. (I
    think Zig allows types like i24 and i123456, presumably built upon
    LLVM's integer types which go up to 2**23 or 2**24 bits.)


    You've heard of 'code smell'? Well, this is the same, but for features.


    Your nose is blocked. Or to be more accurate, you are so obsessed with
    the idea that your own language is "perfect" that you simply cannot
    accept that other languages might have good features that your language
    does not, or that other programmers might want features that your
    language does not have.

    I've been doing this stuff long enough to recognise when a feature is
    over-elaborate, over-specified and over-flexible. You need to know the
    minimum you can get away with, not the maximum!

    NIH syndrome combined with megalomania. Other people do this stuff
    better than you.

    I've noticed that other languages tend to go overboard with things, and
    now it's happening to C.

    I made a decision to keep my systems language at a certain level
    regarding such things as the type system, while having lots of
    convenient micro-features:

    print int@(x+y).[52..62]

    This type-puns a float64 r-value expression into an int, and extracts
    that bitfield (which is the unsigned exponent field when float64 uses
    IEEE 754).

    I'd be interested to see how you can do this better, using general
    language features (adding a dedicated .exponent property to floats would
    be cheating!).



    Let me guess, some committee members have been looking too long at how
    C++ does things? That language is utterly incapable of creating
    anything small and simple.


    And yet C and C++ programmers outnumber programmers of Bart's own
    language by millions. No language - except for yours, of course - is perfect. But it seems C and C++ are both pretty good for getting the
    job done.

    My systems language DOES have lots of very nice micro-features compared
    to C. And usually they are presented in a tidy fashion. I don't think
    there's any argument about that. (Look at C's ugly X-macros for example.)

    My language is not perfect; a big thing it's missing is Pascal-style enumeration types that are type-safe, that would detect a lot of errors.

    But as a systems language, it is much more enticing than C.

    (Today I need to start porting a 20Kloc application in my language, to
    C; proper C not machine transpiling. I'm not looking forward to all that typing!)

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Wed Nov 26 13:15:21 2025
    On 26/11/2025 03:08, bart wrote:
    On 25/11/2025 23:20, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    On 25/11/2025 20:25, David Brown wrote:
    [...]
    Arbitrary sized integers are a very different kettle of fish from
    large fixed-size integers, and are not something that would fit in
    the C language - they need a library.

    Really? I wouldn't have thought there was any appreciable difference
    between the code for multiplying two 100,000-bit BitInts, and that for
    multiplying two arbitrary-precision ints that happen to be 100,000
    bits.

    It's not about the code that implements multiplication. In gcc, that's
    done by calling a built-in function that can operate on arbitrary data
    widths.

    Think about memory management.

    Well, I was responding to a suggestion that BitInt support didn't need a library.

    I did not say that. (You really need to get a better understanding of
    basic logic.) I said that arbitrary sized integers need a library - I
    did not say that fixed-sized integers do not need a library.

    Perhaps more clearly, arbitrary sized integers need a user-visible
    library in C. They need functions to allocate, deallocate, and copy the integers, as well as converting to and from normal integers, at a bare minimum.

    It is normal in C implementations that some operations are done with
    "hidden" library calls - functions in a "language support library" that
    you do not call directly. On an x86 machine, "x / y" might generate a
    divide instruction, while on a microcontroller it might generate a call
    to a "__divide_int" function in an internal compiler-specific library
    (with internal compiler-specific calling conventions). _BitInt support
    can certainly make use of such libraries, just like anything else in C.

    And it looks like the gcc implementation of _BitInt /does/ use such
    libraries for big enough _BitInt types, while using inline code for
    sizes that can be done reasonably efficiently. Clang, on the other
    hand, apparently generates inline code no matter what size of _BitInt
    you have. Those are implementation choices, and it's all hidden from
    the user.


    But memory management is a good point. Actual, variable-sized bigints
    would be awkward in C if you want to use them in ordinary expressions.


    Indeed.

    Although managing large fixed-sized types, which may also involve intermediate, transient values, can have problems of its own.

    You already support such types in C. If it is a problem, it is a
    problem that every vaguely compliant C compiler has already solved.

    struct Big { uint64_t xs[250000]; };

    That type is passed around, copied and assigned by value, even though it
    is 2 MB in size. _BitInt's don't add any new issues here.




    Perhaps a future standard will provide a more flexible flavor of
    _BitInt. It might allow the n in _BitInt(n) to be non-constant, or
    empty, or "*", to denote an arbitrary-precision integer. But it's
    hard to see how that could be done without adding other fundamental
    features to the language. And a lot of people's response would be
    that if you want C++, you know where to find it.

    I think I would have responded better to BitInt if presented as a
    'bit-set', effectively a fixed-size bit-array, but passed by value.
    This is something that I'd considered myself at one time.

    Certainly _BitInt's can be used as bitsets.


    Those would have logical operators, access to individual bits, but not arithmetic nor shifts, and no notion of twos complement. (In my implementation, they could also have been initialised like Pascal bitsets.)


    _BitInt's have logical operators. You can get access to individual bits
    from shifts and masks, just like for any other integer types.

    How efficiently a given compiler handles these is another matter -
    expect that early implementations will be correct but relatively
    inefficient and gradually improve as _BitInt's get more popular.

    More significantly, an unbounded version could be passed by reference,
    with an accompanying length (I could also use slices that have the
    length) as happens with arrays in C.

    _BitInt's have fixed sizes - if you want a variable size, use an array.
    No one is claiming that _BitInt types are somehow the perfect tool for
    any use-case.


    Similarly, C99 added complex types as a built-in language feature.
    C++ added complex types as a template class, because C++ has language
    features that support that kind of thing, including user-defined
    literals.

    If you can think of a way to add arbitrary-precision integers to C
    without other radical changes to the language, let us know.

    I have considered adding my actual arbitrary precision library to my
    systems language. It would have been superfical (such types would not be nestable within other data structures), but would have been simpler to
    use than function calls.

    Some degree of automatic memory management would have been needed (initialise locals on function entry, free on exit, deal with intermediates), but not on the C++ scale due to the restrictions.

    But I rejected that as being too high-level a feature, and my use-cases
    more suitable for a scripting language.

    Different languages can support different features in different ways. C cannot support types that involve memory management in a
    user-transparent manner - memory management is manual in C. In C++, it
    would be entirely possible to make arbitrary precision integers with
    automatic memory management. It would not even be particularly
    difficult (except for efficient implementation of arithmetic operations
    on large sizes), and not need any language changes. But that would not
    negate the uses of _BitInt, which is (AFAIUI) on its way into C++.



    It could also be nice to be able to write code that deals with
    multiple widths of _BitInt types, as we can do for arrays even
    without VLAs.˙ But C's treatment of arrays is messy, and I'm not
    sure duplicating that mess for _BitInt types would be a great idea.
    And I wouldn't want to lose the ability to pass _BitInt values
    to functions.

    [...]

    So, a better fit for a struct then? Here I'm curious as to what
    BitInt(128) brings to the table.

    It brings a 128-bit integer type with constants and straightforward
    assignment, comparison, and arithmetic operators.

    I was commenting on the ipv6 example, where structs give you that
    already, except arithmetic which makes little sense.


    Shifting and masking would definitely be useful operations. I can't see
    a point in adding or multiplying IPv6 addresses, but logical operations
    would definitely be useful. Things like netmasks are not always on neat
    octet boundaries.


    [...]

    That _BitInt() defaults to a signed integer (twos complement?), even
    for very large sizes suggests that /numeric/ applications are a
    primary use.

    Yes, C23 requires two's-complement for signed integers. (It mandates
    two's-complement representation, not wraparound behavior; signed
    overflow is still UB).

    Even though it will now likely be under software control? OK.


    They play by the same rules as all other integer types in C.

    At least, I've been able to add to my collection of C types that
    represent an 8-bit byte:

    signed char
    unsigned char
    int8_t
    uint8_t
    _BitInt(8)
    unsigned _BitInt(8)

    The last two are apparently incompatible with the char versions.

    You forgot plain char,

    I had char but took it out, as it's an outlier.

    int_least8_t, and uint_least8_t.

    And 'fast' versions? I still don't know what any of these mean! No other languages seem to have bothered.



    The "fast" versions are types that have a minimum given size, but might
    be faster than the exact or least versions for typical operations.

    So "int_fast32_t" is guaranteed to have at least 32 bits of precision,
    but is allowed to be bigger if that is faster. On x86, it is 64 bits
    because 64-bit arithmetic and register moves can often involve fewer
    masking or sign-extension operations than 32-bit operations. (Because
    of the oddities of the x86 world, it seems "int_fast8_t" is 8 bits
    rather than 64 bits.)


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Wed Nov 26 13:24:26 2025
    On 25/11/2025 22:42, Keith Thompson wrote:
    David Brown <david.brown@hesbynett.no> writes:
    [...]
    But the _BitInt version is definitely neater. I can see myself using
    _BitInt(12) and similar sizes for things like values read from
    hardware sensors of different resolutions.

    (The code for all three is the same with gcc on x86 or arm64 -
    unfortunately, gcc does not yet support _BitInt on many targets.)
    [...]

    Is support for _BitInt limited by target or by version?


    Both - I expect it to be implemented for more targets in later versions :-)

    It looks like _BitInt support was introduced in gcc 14.1.0. You might
    have older versions of gcc on other platforms.


    It was added to x86 and AArch64 targets in gcc 14. It is not supported
    on any other targets as yet, as far as I know. Presumably it will come
    when someone has done the work for the backends. (Some of these
    implementations are target-independent, but some are backend-specific.)
    Generally, x86-64 and AArch64 are the targets that get the most focus
    and support from the big companies, while 32-bit ARM, MIPS, PowerPC,
    etc., can often be a little slower due to fewer resources.



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Wed Nov 26 12:45:04 2025
    On 26/11/2025 09:12, Michael S wrote:
    On Tue, 25 Nov 2025 18:33:30 +0000
    bart <bc@freeuk.com> wrote:


    (The only ADCs I've used were 4-bit (homemade)

    Why am I not surprised? ;-)

    and 8-bit, both giving
    unsigned data in parallel, used for frame-grabbing video circuits so
    read directly into memory rather than via an explicit memory- or
    port-read instruction.)


    ADC technology is improving at decent rate.
    Recently we used converter with successive-approximation
    architecture that delivers better SNR than most delta-sigma
    converters of just few years ago. Without suffering from all
    dis-advantages of delta-sigma. Almost 18 true bits at 2 MSPS.

    https://www.analog.com/en/products/ad4030-24.html


    That's interesting; my 4-bit circuit also worked at 2M samples per
    second (128 samples every 52us), and probably would have worked much
    higher if I'd had the memory to store the results.

    This was in 1981.

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Wed Nov 26 15:08:54 2025
    On Wed, 26 Nov 2025 13:15:21 +0100
    David Brown <david.brown@hesbynett.no> wrote:


    I did not say that. (You really need to get a better understanding
    of basic logic.) I said that arbitrary sized integers need a library
    - I did not say that fixed-sized integers do not need a library.

    Perhaps more clearly, arbitrary sized integers need a user-visible
    library in C. They need functions to allocate, deallocate, and copy
    the integers, as well as converting to and from normal integers, at a
    bare minimum.


    Perhaps things will become even more clear if we make a distinction
    between the run-time library and the compiler support library.

    In specific case of gcc, the latter is called libgcc. It is (almost)
    per architecture. (Almost) the same libgcc works on x86-64 Windows,
    Linux, BSD or Solaris. The same for other popular architectures.

    The former, on the other hand, is certainly different on different
    platforms with the same architecture, but sometimes can be different on
    the same platform/architecture. For example, newlib is nowadays used
    almost exclusively on embedded platforms without a real OS, but
    historically it was invented for Linux, by people (not totally unlike
    Bart in their attitude) who hated the code bloat of glibc.



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Wed Nov 26 15:31:06 2025
    On Wed, 26 Nov 2025 12:45:04 +0000
    bart <bc@freeuk.com> wrote:

    On 26/11/2025 09:12, Michael S wrote:
    On Tue, 25 Nov 2025 18:33:30 +0000
    bart <bc@freeuk.com> wrote:


    (The only ADCs I've used were 4-bit (homemade)

    Why am I not surprised? ;-)

    and 8-bit, both giving
    unsigned data in parallel, used for frame-grabbing video circuits
    so read directly into memory rather than via an explicit memory- or
    port-read instruction.)


    ADC technology is improving at decent rate.
    Recently we used converter with successive-approximation
    architecture that delivers better SNR than most delta-sigma
    converters of just few years ago. Without suffering from all
    dis-advantages of delta-sigma. Almost 18 true bits at 2 MSPS.

    https://www.analog.com/en/products/ad4030-24.html


    That's interesting; my 4-bit circuit also worked at 2M samples per
    second (128 samples every 52us), and probably would have worked much
    higher if I'd had the memory to store the results.

    This was in 1981.

    I would guess that your circuit used the Flash ADC architecture:
    https://en.wikipedia.org/wiki/Flash_ADC
    This architecture is great for low resolution and high sample rates, but
    can't be improved beyond 10-11 "true" bits of resolution. Or maybe
    it can, but it's so hard that nobody bothers. Instead, high-res/high-
    rate converters use a pipelined architecture - a sort of cross between
    Flash and SAR. The cost of it is typically high power consumption.
    Also, resolution/SNR is still not as good as a really good SAR.

    Example of pipelined ADC:
    https://www.analog.com/en/products/ad9652.html




    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Wed Nov 26 15:08:27 2025
    On 26/11/2025 11:01, Michael S wrote:
    On Tue, 25 Nov 2025 13:42:37 -0800
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

    David Brown <david.brown@hesbynett.no> writes:
    [...]
    But the _BitInt version is definitely neater. I can see myself
    using _BitInt(12) and similar sizes for things like values read from
    hardware sensors of different resolutions.

    (The code for all three is the same with gcc on x86 or arm64 -
    unfortunately, gcc does not yet support _BitInt on many targets.)
    [...]

    Is support for _BitInt limited by target or by version?

    It looks like _BitInt support was introduced in gcc 14.1.0. You might
    have older versions of gcc on other platforms.


    The most recent version of arm-none-eabi-gcc in my distribution of
    choice (msys2) is 13.3.0.
    I am too lazy to compile arm-none-eabi-gcc from source. Would rather
    wait.
    I suppose David is like me in that regard, except that he probably
    uses even more conservative distribution.



    I have release 13.3 installed, but I haven't used it on any projects
    yet. I tend to use new releases on new projects, but I am very
    conservative about changing toolchains in existing projects.

    But for things like this, I use godbolt.org - it is /so/ much easier
    than testing individually. Just pick the compiler target and version
    from the list, and see if you get an error message when using _BitInt.

  • From David Brown@3:633/10 to All on Wed Nov 26 15:49:58 2025
    On 26/11/2025 13:05, bart wrote:
    On 26/11/2025 07:55, David Brown wrote:
    On 25/11/2025 22:58, bart wrote:
    On 25/11/2025 20:25, David Brown wrote:
    On 24/11/2025 23:27, bart wrote:

    One interesting use-case for literals was short strings; 128 bits
    allowed character literals up to 16 characters: 'ABCDEFGHIJKLMNOP'.
    I think C is still stuck at one, or 4 if you're lucky.)


    I have no idea or opinion on why /you/ might want 128-bit or larger
    integer types. I believe there is very little use for "normal"
    numbers - things you might want to write as literals, calculate
    with, and read or write - that won't fit perfectly well within 64
    bit types, and would not be better served by arbitrary sized integers.


    Arbitrary sized integers are a very different kettle of fish from
    large fixed-size integers, and are not something that would fit in
    the C language - they need a library.

    Really? I wouldn't have thought there was any appreciable difference
    between the code for multiplying two 100,000-bit BitInts, and that
    for multiplying two arbitrary-precision ints that happen to be 100,000
    bits.


    You are looking at things in completely the wrong way.

    Long before you start thinking of how to implement operations, think
    about what the types are at a fundamental level.

    A fixed-size integer is a value type of fixed, compile-time size. It
    is passed around as a value. Local instances can be put on a stack
    with compile-time fixed offsets (and thus using [sp + N] access modes
    in an implementation). The type has a single simple and obvious
    (albeit slightly implementation-dependent) bit representation. A
    _BitInt(32) will be identical at the low level to an int32_t. Bigger
    _BitInt types are just the same, only bigger. There is no difference
    in concept, or representation, whether the type is 32-bit or 32
    million bits.

    An arbitrary sized integer is a dynamic type with variable size. The
    base object will hold information about pointers to data, sizes for
    that stored data - including both how much is in use, and how much is
    available. There are endless ways to make such types - you can
    support multiple allocation parts, or use a single contiguous
    allocation. You can store the data in binary, or some kind of packed
    decimal, or other formats. Passing them around might mean just
    passing around the base object, but sometimes you need to make deep
    copies. Operations might lead to heap memory allocations or
    deallocations.

    They are so /totally/ different that any similarities in the way you
    do a particular arithmetic operation are completely incidental.

    But BitInts /will/ need runtime library support?

    No, not if an implementation generates the code inline (as clang appears
    to do). An implementation /may/ use helper functions from a language
    support library - gcc does that, depending on the sizes of the _BitInt
    and the operations you are doing. That is no different from all sorts
    of other things in the language, and is not some external runtime
    library. Your code will not be calling "bigint.dll" or anything like that.


    I've acknowledged in my last post that arbitrary precision would have
    memory management issues, /if/ you wanted to add them to the language in such a way that, if variables 'a b c d' had such a type, you can write:

       a = b + c * d;


    Arbitrary precision integers have memory management issues no matter how
    you want to use them. They need dynamic memory. Either the language
    has some kind of automatic memory management (reference counting, RAII, garbage collection, etc.), or it must be done manually. It does not
    matter if you use operator notation or function-call notation - except
    that you cannot use operator notation with manual memory management.

    This is not what I had in mind; such arithmetic would use explicit
    function calls with explicit management of intermediates (like GMP).

    So from this point of view, fixed-size BitInts are better, but also a
    higher level ability than I would have considered added to the language.

    _BitInt's are certainly better in that they are scalar types with value semantics and no need for any dynamic memory. Of course arbitrary
    precision integers have other advantages. Although for some use-cases
    either would work, each can be significantly more appropriate for
    different situations.

    To my mind, the need for dynamic memory would mean arbitrary precision integers are not appropriate for C - either at the core language level,
    or as part of the standard library. I think it is reasonable to have different opinions as to how well fixed-size _BitInts are appropriate to
    have in the C core language, though as they are now in C23, the point is
    now moot.


    Even if BitInts were restricted to saner and smaller sizes, I'd consider actual arithmetic on 128 bits up to a few K bits and above a specialist, niche application.


    Fair enough.

    But logic operations (== & | ^) on unsigned BitInts are more reasonable (because they implement some features of bit-sets).

    For arithmetic on considerably larger numbers, I still think arbitrary precision is the best bet.



    Also fair enough.

    I don't think anyone is likely to be multiplying million-bit _BitInts in
    real code. But I don't think it is appropriate for the language
    standard to pick some arbitrary size and say "below that is fine, above
    that is too big and programmers should use something else". I don't
    think it is appropriate for compiler implementers either. (They may
    pick limits based on how they implement things internally - that's not
    an arbitrary limit.) Different people have different needs, and no
    particular limit fits all use-cases.

    Structs and arrays again spring to mind if you just want an anonymous
    data block. (I wonder why it has to be bit-precise for byte-addressed
    memory?)


    If I have a processor that has 256-bit vector registers, then moving
    data by loading and storing 256-bit blocks is going to be more
    efficient than doing a loop of 16 byte moves. Today, I would use
    uint64_t for the task, as the biggest type available. Why does it
    have to be bit-precise? It must be bit-precise because I would want
    to move 256 bits - not 255 bits or 257 bits.

    By bit-precise I mean being able to specify 255 and 257 bits! Memory is usually expressed in bytes or words; not bits.


    "Bit-precise" means "exactly the bit count I specify". I agree that for moving memory around, I would pick a bit size that is a multiple of 8,
    and very likely a power of 2.



    * Cryptography. IO is irrelevant here. But a variety of sizes are
    useful including 56, 80, 112, 128, 168, 192, 384, 512, 521, 2048,
    3072, 4096, 7680, 8096 bits. There may be more common sizes - I'm
    just thinking of DES, 3DES, AES, SHA, ECC and RSA.

    And I'm again curious as to what /non-numeric/ use a 200,000-bit
    BitInt might be put to, that is not better served by an array or struct.

    I don't have a use for a 200,000 bit integer type at the moment. But
    I cannot imagine any reason why the language specifications should
    have arbitrary limits. Are you suggesting that the C standards should
    say "You can have _BitInt's up to 8096 because someone found a use for
    them, but you can't have size 8097 and above - and 200,000 is right
    out - because someone else can't imagine they are useful" ?

    And yet, integer widths have been roughly capped at double a machine
    word size for decades - until 64 bits came along and then few even
    bothered with double-width.

    Nobody thought how easy it would be to just have an integer of whatever
    size you like - you just generate whatever code is necessary to make it happen. We could have had BitInts on 32- and even 16-bit machines if
    only somebody had thought of it!


    We certainly could have had these. And people /have/ thought about it.
    There are endless examples of libraries and "home-made" big integer
    types. The reason we have them /now/ is that some people have felt they
    were useful enough for their purposes that they bothered doing the work
    to implement them in clang and then write proposals to add them to the C standards. Getting something like this into standard C takes time,
    expertise, effort and money - commodities that are usually far less
    easily available than ideas and imagination.



    An implementation can - indeed, must - set a limit to the sizes it
    supports. Implementations can have many reasons to do so. Some
    implementations might have quite low limits (the size of "long long
    int" is the minimum allowed for conformance), but then that
    implementation might not be so useful to some people.

    Maybe bit-sets? But there are no special features for accessing
    individual bits.

    That _BitInt(N) defaults to a signed integer (twos complement?), even
    for very large sizes suggests that /numeric/ applications are a
    primary use.


    Obviously the C standards should have made "_BitInt" signed up to size
    73 bits, and unsigned from then on. That would have been /so/ much
    clearer and simpler for everyone.

    Or unsigned could have been the default.


    That would have been possible, but pointlessly out of step with all
    other integer types in C.




    Smaller sizes can be useful for holding RGB pixel values, audio
    data, etc.

    Except that these are probably rounded up, to the next multiple of
    two. So the benefit is minimal, unless it can do something with those padding bits.

    I write C code. I want my C code to be clear and represent what I am
    handling, and then let the compiler do its job of generating efficient
    results. So if I am dealing with data that is 24-bit signed integer
    data, then _BitInt(24) (especially with a typedef name) is more
    accurate source code than "int" or "int32_t".

    Suddenly everybody is dealing with signed values of 12 and 24 bits!


    I don't think I would count Michael and me as "everybody".

    But it is certainly true that data from hardware sensors is often of a resolution that does not fit exactly in a standard integer type size,
    and _BitInt - signed or unsigned - can be a clear way to work with these values.

    I actually had exactly that feature:

       int*3  a             # from 1980s; a 3-byte or 24-bit signed type
       int:24 b             # from 1990s; a 24-bit signed type

    Or at least, I had the syntax. Those odd values would have been
    rejected, as I didn't have support for them, or a way to emulate them
    (which is what BitInt(24) appears to do).

    So I got rid of the feature and ended up with int32 and then i32. (I
    think Zig allows types like i24 and i123456, presumably built upon
    LLVM's integer types which go up to 2**23 or 2**24 bits.)


    You've heard of 'code smell'? Well, this is the same, but for features.


    Your nose is blocked. Or to be more accurate, you are so obsessed
    with the idea that your own language is "perfect" that you simply
    cannot accept that other languages might have good features that your
    language does not, or that other programmers might want features that
    your language does not have.

    I've been doing this stuff long enough to recognise when a feature is
    over-elaborate, over-specified and over-flexible. You need to know
    the minimum you can get away with, not the maximum!

    NIH syndrome combined with megalomania. Other people do this stuff
    better than you.

    I've noticed that other languages tend to go overboard with things, and
    now it's happening to C.


    What seems to happen is that you read a little bit about a feature, then
    go bananas about how terrible it is because it is different from your
    own language - without learning about the feature, its use-cases, or why
    it was added to the language. When you are called out, and when -
    usually through many, many tea-spoon explanations - you understand the feature, you stick to your guns and continue to complain about it no
    matter how silly you sound.

    It's okay to think that simpler is sometimes better, or that you
    disagree with some of the newer features in C. People have different
    opinions on the direction newer C standards have taken. But if you want
    to critique a feature, do so on the basis of an understanding of the
    feature, an understanding of why it was added and what people might use
    it for, and an understanding of what pros and cons it has compared to alternatives. "I'm a genius language designer and I don't have this
    feature and I don't want to use it" is not a rational argument.


    I made a decision to keep my systems language at a certain level
    regarding such things as the type system, while having lots of
    convenient micro-features:

        print int@(x+y).[52..62]

    This type-puns a float64 r-value expression into an int, and extracts
    that bitfield (which is the unsigned exponent field when float64 uses IEEE 754).

    I'd be interested to see how you can do this better, using general
    language features (adding a dedicated .exponent property to floats would
    be cheating!).


    What an absurd thing to ask for. You have a special feature in your
    language for writing obscure things that are rarely if ever useful in
    normal coding. Of course you can write the same effect in C, in a
    simple function a few lines long. And that's the way it should be -
    obscure things should not take up cognitive space that makes common
    things harder.



    Let me guess, some committee members have been looking too long at
    how C++ does things? That language is utterly incapable of creating
    anything small and simple.


    And yet C and C++ programmers outnumber programmers of Bart's own
    language by millions. No language - except for yours, of course - is
    perfect. But it seems C and C++ are both pretty good for getting the
    job done.

    My systems language DOES have lots of very nice micro-features compared
    to C. And usually they are presented in a tidy fashion. I don't think there's any argument about that. (Look at C's ugly X-macros for example.)

    My language is not perfect; a big thing it's missing is Pascal-style enumeration types that are type-safe, that would detect a lot of errors.

    But as a systems language, it is much more enticing than C.

    And that is presumably why it is so much more popular than C.


    (Today I need to start porting a 20Kloc application in my language, to
    C; proper C not machine transpiling. I'm not looking forward to all that typing!)


  • From bart@3:633/10 to All on Wed Nov 26 15:44:54 2025
    On 26/11/2025 14:49, David Brown wrote:
    On 26/11/2025 13:05, bart wrote:
    On 26/11/2025 07:55, David Brown wrote:

    NIH syndrome combined with megalomania.˙ Other people do this stuff
    better than you.

    I made a decision to keep my systems language at a certain level
    regarding such things as the type system, while having lots of
    convenient micro-features:

         print int@(x+y).[52..62]

    This type-puns a float64 r-value expression into an int, and extracts
    that bitfield (which is the unsigned exponent field when float64 uses
    IEEE 754).

    I'd be interested to see how you can do this better, using general
    language features (adding a dedicated .exponent property to floats
    would be cheating!).


    What an absurd thing to ask for.

    You said, "Other people do this stuff better than you". Presumably,
    devising language features. So I gave an example of a small task, and
    asked which features those people would devise, or what solution they
    would use.

    You have a special feature in your
    language for writing obscure things that are rarely if ever useful in
    normal coding.

    Yes, I call them 'micro-features'.

    The examples showed rvalue type-punning and bitfield extraction, which
    were recent examples in this thread.

    In C, the solution for my example might look like this:

    double temp = x+y;
    printf("%llu", ((*(uint64_t*)&temp)>>52) & 2047);

    Rather more fiddly and error prone, and it needs an auxiliary statement
    that makes it awkward to embed into an expression. (I also had to think
    twice about that format code.)

    BTW here is how my C transpiler translated it, so it /can/ be done
    without explicit temporaries:

    mminc$m_print_u64(msysc$m_getdotslice((i64)msysc$m_tp_r64toi64((x + y)),(i64)52,(i64)62),NULL);


    Of course you can write the same effect in C, in a
    simple function a few lines long.

    Yes, everyone can invent their own solutions. (I've just taken that a
    few steps further with an entire language.)

    And that's the way it should be -
    obscure things should not take up cognitive space that makes common
    things harder.

    But _BitInt(12) was also used as an example of saving a few lines of
    code or having to write a function or macro (there, to sign-extend the
    low-N bits of an integer value, when N is known at compile-time).


    But as a systems language, it is much more enticing than C.

    And that is presumably why it is so much more popular than C.

    If it was generally available then I think quite a few would prefer it.
    As it is I enjoy the benefits myself.



  • From David Brown@3:633/10 to All on Wed Nov 26 17:37:38 2025
    On 26/11/2025 16:44, bart wrote:
    On 26/11/2025 14:49, David Brown wrote:
    On 26/11/2025 13:05, bart wrote:
    On 26/11/2025 07:55, David Brown wrote:

    NIH syndrome combined with megalomania.˙ Other people do this stuff
    better than you.

    I made a decision to keep my systems language at a certain level
    regarding such things as the type system, while having lots of
    convenient micro-features:

         print int@(x+y).[52..62]

    This type-puns a float64 r-value expression into an int, and extracts
    that bitfield (which is the unsigned exponent field when float64 uses
    IEEE 754).

    I'd be interested to see how you can do this better, using general
    language features (adding a dedicated .exponent property to floats
    would be cheating!).


    What an absurd thing to ask for.

    You said, "Other people do this stuff better than you". Presumably,
    devising language features. So I gave an example of a small task, and
    asked which features those people would devise, or what solution they
    would use.


    The "other people" I referred to are the folks behind the C language,
    not me.

    You have a special feature in your language for writing obscure
    things that are rarely if ever useful in normal coding.

    Yes, I call them 'micro-features'.

    The examples showed rvalue type-punning and bitfield extraction, which
    were recent examples in this thread.

    In C, the solution for my example might look like this:

        double temp = x+y;
        printf("%llu", ((*(uint64_t*)&temp)>>52) & 2047);


    No, that's not how a C solution would work. People who know C would
    know that. As a challenge for you, see if you can spot your mistake.

    (And of course if anyone wanted to do this stuff in real code, they'd
    wrap things in a static inline "bit_range_extract" function.)

    Rather more fiddly and error prone, and it needs an auxiliary statement
    that makes it awkward to embed into an expression. (I also had to think twice about that format code.)

    BTW here is how my C transpiler translated it, so it /can/ be done
    without explicit temporaries:

        mminc$m_print_u64(msysc$m_getdotslice((i64)msysc$m_tp_r64toi64((x + y)),(i64)52,(i64)62),NULL);


    Avoiding explicit temporaries is not a goal to aspire to - unless you
    are trying to squeeze performance from a poorly optimising compiler.


    Of course you can write the same effect in C, in a simple function a
    few lines long.

    Yes, everyone can invent their own solutions. (I've just taken that a
    few steps further with an entire language.)

    And that's the way it should be - obscure things should not take up
    cognitive space that makes common things harder.

    But _BitInt(12) was also used as an example of saving a few lines of
    code or having to write a function or macro (there, to sign-extend the
    low-N bits of an integer value, when N is known at compile-time).


    No, what was shown was how _BitInt(12) could let people write clearer C
    code than C without _BitInt. There was no comparison to other languages
    or other features.


    But as a systems language, it is much more enticing than C.

    And that is presumably why it is so much more popular than C.

    If it was generally available then I think quite a few would prefer it.

    Sure. Keep telling yourself that.

    As it is I enjoy the benefits myself.


    That I /do/ believe - and I genuinely think it is great that you enjoy it.


  • From bart@3:633/10 to All on Wed Nov 26 18:42:11 2025
    On 26/11/2025 16:37, David Brown wrote:
    On 26/11/2025 16:44, bart wrote:

    The "other people" I referred to are the folks behind the C language,
    not me.

    OK. The people who chose to make 'break' do two jobs, unfortunately in
    parts of the language that can overlap in use; those people! (I guess
    you mean the more recent lot.)

    In C, the solution for my example might look like this:

         double temp = x+y;
         printf("%llu", ((*(uint64_t*)&temp)>>52) & 2047);


    No, that's not how a C solution would work. People who know C would
    know that. As a challenge for you, see if you can spot your mistake.

    This was my point. (Although I can't see the problem, making it even
    more pertinent.)


    (And of course if anyone wanted to do this stuff in real code, they'd
    wrap things in a static inline "bit_range_extract" function.)

    Also my point: everyone will invent their own incompatible solutions for
    this fundamental stuff.

    You forgot about the type-punning part, which I guess needs yet another inlined function.

    Rather more fiddly and error prone, and it needs an auxiliary
    statement that makes it awkward to embed into an expression. (I also
    had to think twice about that format code.)

    BTW here is how my C transpiler translated it, so it /can/ be done
    without explicit temporaries:

         mminc$m_print_u64(msysc$m_getdotslice((i64)msysc$m_tp_r64toi64((x
    + y)),(i64)52,(i64)62),NULL);


    Avoiding explicit temporaries is not a goal to aspire to - unless you
    are trying to squeeze performance from a poorly optimising compiler.

    The memory temp involved a declaration which needs to exist outside of
    the expression in standard C. While type-punning in C either means
    writing to a union, or using & and applying a cast.

    (My type-punning works on rvalues and will work on values in registers.)

    No, what was shown was how _BitInt(12) could let people write clearer C
    code than C without _BitInt. There was no comparison to other languages
    or other features.

    But when it came my example, it could trivially be done with inline
    functions, just like this could.




    But as a systems language, it is much more enticing than C.

    And that is presumably why it is so much more popular than C.

    If it was generally available then I think quite a few would prefer it.

    Sure. Keep telling yourself that.

    Well, it would be a minority. Grown-up languages with decent syntax
    exist such as Ada and Fortran; those are not that popular. People prefer brace-based languages such as C, Java, Go, Zig, Rust.

    Anything without braces isn't taken as seriously, eg. scripting languages.


    As it is I enjoy the benefits myself.


    That I /do/ believe - and I genuinely think it is great that you enjoy it.


    I've had several opportunities to retire my language and switch to C.
    Each time, I rejected that and chose to persevere with mine, despite
    the extra problems of working with a language used by only one person on
    the planet.

    Then, because I genuinely considered it better, and now because I enjoy working at it and with it. Using C feels like driving a model T.


  • From David Brown@3:633/10 to All on Wed Nov 26 21:43:59 2025
    On 26/11/2025 19:42, bart wrote:
    On 26/11/2025 16:37, David Brown wrote:
    On 26/11/2025 16:44, bart wrote:

    The "other people" I referred to are the folks behind the C language,
    not me.

    OK. The people who chose to make 'break' do two jobs, unfortunately in
    parts of the language that can overlap in use; those people! (I guess
    you mean the more recent lot.)

    In C, the solution for my example might look like this:

         double temp = x+y;
         printf("%llu", ((*(uint64_t*)&temp)>>52) & 2047);


    No, that's not how a C solution would work. People who know C would
    know that. As a challenge for you, see if you can spot your mistake.

    This was my point. (Although I can't see the problem, making it even
    more pertinent.)


    So you can claim to have a "better" solution than C, without knowing how
    to write it correctly in C?


    (And of course if anyone wanted to do this stuff in real code, they'd
    wrap things in a static inline "bit_range_extract" function.)

    Also my point: everyone will invent their own incompatible solutions for this fundamental stuff.

    It is not remotely fundamental. Extracting groups of bits from the representation of a type, especially a floating point type, is a niche operation. (It can be an important operation - such as for software
    floating point routines. But the people who write those are few, and
    they know what they are doing.)


    You forgot about the type-punning part, which I guess needs yet another inlined function,

    I didn't forget about anything. I didn't write the incorrect C code.


    Rather more fiddly and error prone, and it needs an auxiliary
    statement that makes it awkward to embed into an expression. (I also
    had to think twice about that format code.)

    BTW here is how my C transpiler translated it, so it /can/ be done
    without explicit temporaries:


    mminc$m_print_u64(msysc$m_getdotslice((i64)msysc$m_tp_r64toi64((x +
    y)),(i64)52,(i64)62),NULL);


    Avoiding explicit temporaries is not a goal to aspire to - unless you
    are trying to squeeze performance from a poorly optimising compiler.

    The memory temp involved a declaration which needs to exist outside of
    the expression in standard C. While type-punning in C either means
    writing to a union, or using & and applying a cast.


    "Type punning" refers to using a union to access or reinterpret the
    underlying bit representation. Using references and a cast to do so is
    UB, except when using pointers to character types. Neither involves
    actually putting data into memory or the stack unless you are using a
    compiler that can't optimise well - and then it is just a matter of less efficient generated code.

    (My type-punning works on rvalues and will work on values in registers.)

    No, what was shown was how _BitInt(12) could let people write clearer
    C code than C without _BitInt. There was no comparison to other
    languages or other features.

    But when it came my example, it could trivially be done with inline functions, just like this could.


    Sure.




    But as a systems language, it is much more enticing than C.

    And that is presumably why it is so much more popular than C.

    If it was generally available then I think quite a few would prefer it.

    Sure. Keep telling yourself that.

    Well, it would be a minority. Grown-up languages with decent syntax
    exist such as Ada and Fortran; those are not that popular. People prefer brace-based languages such as C, Java, Go, Zig, Rust.

    Anything without braces isn't taken as seriously, eg. scripting languages.


    What a /very/ strange way to distinguish or classify languages. And
    what a bizarre way to generalise what people think, as though all
    programmers share the same opinions.


    As it is I enjoy the benefits myself.


    That I /do/ believe - and I genuinely think it is great that you enjoy
    it.


    I've had several opportunities to retire my language and switch to C.
    Each time, I rejected that and chose to persevere with mine, despite
    the extra problems of working with a language used by only one person on
    the planet.

    Then, because I genuinely considered it better, and now because I enjoy working at it and with it. Using C feels like driving a model T.



  • From bart@3:633/10 to All on Wed Nov 26 22:19:47 2025
    On 26/11/2025 20:43, David Brown wrote:
    On 26/11/2025 19:42, bart wrote:
    On 26/11/2025 16:37, David Brown wrote:
    On 26/11/2025 16:44, bart wrote:

    The "other people" I referred to are the folks behind the C language,
    not me.

    OK. The people who chose to make 'break' do two jobs, unfortunately in
    parts of the language that can overlap in use; those people! (I guess
    you mean the more recent lot.)

    In C, the solution for my example might look like this:

         double temp = x+y;
         printf("%llu", ((*(uint64_t*)&temp)>>52) & 2047);


    No, that's not how a C solution would work. People who know C would
    know that. As a challenge for you, see if you can spot your mistake.

    This was my point. (Although I can't see the problem, making it even
    more pertinent.)


    So you can claim to have a "better" solution than C, without knowing how
    to write it correctly in C?



    (And of course if anyone wanted to do this stuff in real code, they'd
    wrap things in a static inline "bit_range_extract" function.)

    Also my point: everyone will invent their own incompatible solutions
    for this fundamental stuff.

    It is not remotely fundamental. Extracting groups of bits from the representation of a type, especially a floating point type, is a niche operation.

    A bit like that BitInt(12) example then?

    This is about a lower-level systems language working with primitive
    machine types, and having access to the underlying bits of those types.

    How much more fundamental can you get?

    C provides only basic bitwise operators, and you have to do some
    bit-fiddling, while trying to avoid UB, in order to extract or inject individual bits or bitfields.

    I provide direct indexing ops to get or set any bit or bitfield, which
    is actually a great core feature to have, but for some reason you want
    to downplay it.

    You might just admit for once that it is quite neat.


    (It can be an important operation - such as for software
    floating point routines.

    That particular task can be important for lots of reasons.

    But the people who write those are few, and
    they know what they are doing.)

    And I don't? I used to write FP emulation routines...


    "Type punning" refers to using a union to access or reinterpret the underlying bit representation. Using references and a cast to do so is
    UB,

    In C maybe, using your favoured compilers. In my implementations of C,
    and in my languages, it is well defined, especially as it is
    type-punning a 64-bit quantity to another 64-bit quantity.

    (This is a great thing about creating your own implementations: you get
    to say what is UB, which will be for genuine, not artificial ones
    maintained so that C compilers can be one-up on each other.

    As it is, somebody using C as an intermediate language can have a
    situation where something is well-defined in their source language,
    known to be well-defined on their platforms of interest, but in between,
    C says otherwise.)

    Note that in the original example in my language, no references are used
    (the code just copies a FP register to a GPR without conversion).

    except when using pointers to character types. Neither involves
    actually putting data into memory or the stack unless you are using a compiler that can't optimise well - and then it is just a matter of less efficient generated code.

    OK, so how would you do a 'reinterpret' cast in C, of a value like 'x+y'?

    Anything without braces isn't taken as seriously, eg. scripting
    languages.


    What a /very/ strange way to distinguish or classify languages.

    It's an observation. Which languages that call themselves 'systems
    languages' these days don't use braces?

    And
    what a bizarre way to generalise what people think, as though all programmers share the same opinions.

    You're welcome to do your own survey.



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From BGB@3:633/10 to All on Wed Nov 26 17:04:06 2025
    On 11/25/2025 5:38 AM, bart wrote:
    On 25/11/2025 02:03, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    On 24/11/2025 14:41, David Brown wrote:
    On 24/11/2025 13:31, bart wrote:
    That's all up to the implementation.
    You are worrying about completely negligible things here.

    Is it that negligible? That's easy to say when you're not doing the
    implementing! However it may impact on the size and performance of
    code.

    You're right, it's easy to say when I'm not doing the implementing.
    Which I'm not.

    The maintainers of gcc and llvm/clang have done that for me, so I don't
    have to worry about it.

    Are you planning to implement bit-precise integer types yourself? I
    don't think you've said so in this thread. If you are, you have at
    least two existing implementations you can look at for ideas.

    No, apart from the usual set of 8/16/32/64 bits. I've done 128 bits, and played with 1/2/4 bits, but my view is that above this range, using
    exact bit-sizes is the wrong way to go.


    On normal PC's, it is meh.

    On FPGA's, more so the whole HLS (High Level Synthesis) thing, it is
    much more significant.


    Also it is a bridge that allows sensibly mapping some Verilog semantics
    onto C, which can in turn be made more efficient than "ye olde shifts
    and masking". This is partly because the compiler has more
    freedom to either use specific CPU features, or to implement the
    constructs in ways that are more efficient but would impose too much
    mental computational burden on normal programmers (such as shifts being relative to other shifts, and/or where the most efficient masking
    strategy depends on the width of the type being masked, etc).

    Though, granted, bolting a bunch of Verilog stuff onto C is also
    nonstandard (and goes well beyond the scope of _BitInt). But, a lot of
    it is stuff that wouldn't really make sense at all in C in the absence
    of exact-width integers.



    Though, the other parts of Verilog don't map over quite so easily...
    always @(posedge clock)
    ...
    ... yeah ...

    Ironically, had started looking into adding Verilog support to my
    compiler (at the time hoping maybe to be able to implement something
    that was less of a pain to debug on than Verilator), most I got here was
    the idea that modules would be mapped onto classes and so each module
    could be implemented as a class instance, with an internal run/step
    method which would check variables and fire off any "always" blocks when appropriate.

    The effort kinda stalled out at this stage though (and motivation
    lessened when I actually found some of the bugs I had been looking for).


    Some other functionality had ended up mapped onto C, some features (ironically) being useful in this C land, and others not so much.

    Well, maybe some people could cheer for things like "casez()" or "__switchz()":
    __switchz(val[15:0])
    {
    case 0bZZZZ_ZZZZ_ZZZZ_ZZZ0u16: ... matches everything with LSB clear
    case 0bZZZZ_ZZZZ_ZZZZ_ZZ01u16: ... matches with LSB's as 01
    case 0bZZZZ_ZZZZ_ZZZZ_Z011u16: ...
    case 0b1111_ZZZZ_ZZZZ_0111u16: ... matches 0111 and MSBs set to 1s.
    }

    Where, 0bZZZZ_ZZZZ_ZZZZ_Z011u16 is a C syntax analog of 16'bZZZZ_ZZZZ_ZZZZ_Z011 (and in this case my compiler allows for either
    _ or single quotes).



    Though, implementing this in a way that is efficient is a harder problem
    (much more complicated than a normal "switch()").

    Though, had I gotten this part implemented, would still have also needed:
    A high performance emulator (now partly written, but, would likely need
    a full JIT compiler rather than a call-threading interpreter);
    A better/more usable debugger (*).

    *: My existing "jx2vm" emulator mostly dumps stuff if the emulator
    exits, and has an integrated GDB-style debugger, but this still leaves
    something to be desired.

    So, more likely the desired debugger would likely be built on "x3vm",
    but have not yet done so.


    Also compiler needs to produce more complete debuginfo. As-is, it is outputting symbol maps (in nm notation, similar to that typically used
    by the Linux kernel), with line-numbers in a slightly nonstandard way,
    and some small amount of STABS. Maybe weak, but currently the most
    reachable strategy (contrast, GCC would typically put the debug info
    inside the binary, either as STABS or DWARF depending on target, ...).
    The debuginfo is still very incomplete, and I am also lacking a good
    debugger here.

    I had considered the possibility of going to a binary format for the map
    files to save space, but for now they are still ASCII based (well, or
    the possible lazier option of internally generating the map in ASCII
    format, but then dumping it in gzip format or similar ".map.gz"; would
    need to decompress them when loaded, but would leave an easy option for
    a user to get back to an ASCII map file as needed). Including STABS
    would add considerable bulk even vs just a normal symbol listing.

    I have my own reasons for not wanting to put debuginfo inside the
    binaries themselves. MSVC is kinda similar, just uses ".PDB" files instead.


    While for odd sizes up to 64 bits, bitfields are more apt than employing
    the type system.


    This is missing the point of the purpose of _BitInt...


    Here's an idea. Rather than asserting that _BitInt(1'000'000)
    is silly and obviously useless, try *asking* how it's useful.
    I personally don't know what I'd do with a million-bit integer,
    but maybe somebody out there has a valid use for it. Meanwhile,
    its existence doesn't bother me.

    Again, my view is that types like _BitInt(123456) (could they have made
    it any more fiddly to type?!) are the same mistake that early Pascal made with arrays.

    It is common that an N-array of T and an M-array of T are not
    compatible, but usually there are ways to deal generically with both.


    For using them in a way that is useful for their intended purpose, there
    need to be some constraints here.


    But, alas, debating 1M bit values is a little moot in my case as the
    compiler doesn't go quite that big.

    Most cases where a giant _BitInt could make sense are better served by
    not using _BitInt.

    In this case, the limit is 16383 bits, but this is still bigger than
    anything it really makes sense to do with _BitInt.

    Also doesn't make sense for Verilog either; about as soon as you start
    trying to use values this big, it is "gonna eat the FPGA".





    Well, and for things like bignums, could instead make a case for a
    dynamic typesystem and the ability for user code to plug new types into
    said dynamic typesystem (and register operators for said types).

    But, this is probably a feature that is unlikely to get added to mainline C.

    Well, and probably about as soon as someone adds dynamic types, people
    might start pushing for also adding a garbage collector, even if (thus
    far) still no one has succeeded in making GC "not suck" (some may point
    to JVM and .NET, but they mostly just made it "less obvious").

    Contrast to my recent annoyance that Firefox is regularly stalling for extended periods of time for presumably GC related reasons (and has
    seemingly failed to implement one that runs particularly fast).



    My guess is that once you've implemented integers wider than 128
    or 256 bits, million-bit integers aren't much extra effort.

    I've implemented 128-bit arithmetic, and have seen some scary-looking C
    code that implemented 256-bit arithmetic. Neither of those would scale
    to N-bits where N can be arbitrary large /and/ might not be a multiple
    of either 64 or 8.

    You would need pretty much the same algorithms as used for arbitrary precision. Those usually require N to be some multiple of 'limb' size.


    Note for example, say:
    How do you think I would have implemented large _BitInt?...
    Why is storage a multiple of 128-bits / 16 bytes?...
    ...


    Well, internally in this case the compiler effectively just sorta
    generates internal runtime calls.

    In this case:
    1-64 bits: Mostly native;
    65-128: Semi-native mixed with runtime calls.
    129-256: Runtime calls to fixed-width 256-bit handlers.
    257-16383: Generic runtime calls that pass the width as an argument.

    ...

    If the size isn't an exact multiple of the target size, one can pad it
    up and then sign- or zero-extend the high-order element.



    Ironically, it is sorta like a partial inverse of "memcpy()" and
    "memset()" with a fixed size:
    Size is small, better to handle it inline;
    Medium, use size-specialized handling;
    Large, use a generic call (actually call "memcpy()").

    Say:
    <= 64 bytes: Inline
    65-512 bytes: Call into fixed-size copies
    Generally, copy any trailing bytes and then fall into a copy-slide.
    > 512: Actually call "memcpy()".


    Even for smaller types, there is no guarantee that it is not a runtime call.

    Say, for example, on some random (non x86-64 or similar) target:
    long x, y, z;
    ...
    z=x/y;

    How confident can you be that it is *not* just secretly calling a
    runtime function?...

    Even if the ISA has an instruction, there isn't much guarantee that it
    is going to be faster than using a runtime call.

    Or, say, the only time one can be semi-confident it is not a runtime
    call is if it is "int/const" or similar (because the compiler can turn
    this into multiply-by-reciprocal).



    Say, a CPU being left with choices for how to implement divide:
    Faster, but very expensive;
    Say, for example, a radix divider or similar;
    Medium cost, kinda slow: Hardware shift-and-subtract or similar;
    Can give ~ 36 cycle 32-bit IDIV, and ~ 68 cycle 64-bit IDIV.
    Cheap but slow: Trap and emulate, OS then uses shift-and-subtract.

    In some cases it is faster for the compiler to use runtime calls for divide and similar, since this can sidestep the performance cost of the emulation trap.


    Except, ironically, I have shifted towards the counter stance for Binary128,
    as Binary128 operations can be sufficiently slow that the code-density
    savings can offset the (comparatively modest in this case) performance cost
    of the emulation traps (otherwise, code using "long double" or similar
    has a whole lot of runtime calls, which cost around 8x the code
    footprint of just pretending one has the corresponding instructions).

    Also semi-relevant for RV64, which also lacks particularly good options
    for implementing fast "__int128" support, which is (ironically) somewhat relevant for making the Binary128 FPU emulation not slow. Though, I also
    took the non-standard stance of defining these operations as working on register pairs (in this case, defining FADD.Q and FMUL.Q as using
    register pairs being the less-bad option than also pretending it has
    128-bit FPR's; more so as in this case SIMD operations already use
    register pairs for 128-bit SIMD, ...).

    So, a "Pseudo-Q" defined in terms of:
    CPU only actually implements F/D;
    Define .Q operations as being allowed, but operating via traps.
    Arguably poor, but cheapest option in this case.
    Actually supporting Q extension would be too expensive.
    I don't exactly expect Binary128 to suddenly become cheap either.


    Though, for integer divide and similar, the relative cost is low enough
    (and divide is common enough) that trap-and-emulate is rather painful.

    So, in the absence of a HW divider, better to use a runtime call
    (probably via a shift-and-subtract loop or similar).


    But, divide is still rare enough to make it hard-pressed to justify the
    cost of a more expensive HW divider (such as Radix-16 or Goldschmidt or similar). One might still be left with trap-and-emulate for FPU divide
    and SQRT though.

    Or, say, other wonk (for an ISA like RV64):
    FDIV, FSQRT: Trap
    FMADD/etc (when RM=DYN): Trap
    Assuming here that DYN has the option of IEEE correct semantics.
    Which means trapping on HW which doesn't have single-rounded FMA.

    Mildly annoying then if building with GCC, one has to use special
    options to disable FMA and tell it to use a runtime call for FDIV, etc,
    in an attempt to avoid it stepping on cases that need emulation traps.

    Well, and for sake of IEEE semantics:
    Trapping on subnormal/denormal inputs;
    Trapping with LOBs of inputs are non-zero for FMUL;
    Partial width makes FMUL a lot cheaper for hardware;
    But, then one needs to trap in some cases.
    ...



    Well, and other fun corner cutting, like lacking hardware page walking
    and reliance on a trap handler to deal with TLB misses; etc.

    Well, for an ISA like RV, one can also corner-cut things like only supporting
    X0 and X1 as link registers for JAL and JALR (if Rd is not X0 or X1, trap).


    Well, and if the CPU is being "extra budget", they might use trap and
    emulate for misaligned loads/stores. Though, IMO this is getting a
    little too budget (and there are a lot of things that can be implemented
    more efficiently if one has at least semi-fast unaligned load/store).

    ...



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Thu Nov 27 01:05:03 2025
    On 26/11/2025 23:04, BGB wrote:
    [...]
    While for odd sizes up to 64 bits, bitfields are more apt than
    employing the type system.


    This is missing the point of the purpose of _BitInt...

    Which is ... ?

    From what I can gather, on ordinary computers, _BitInt(N), for N of 1/2
    to 63, is just rounded up to the next size of 8/16/32/64 bits, if N is
    not already at that size.

    That's if storage is involved.

    The other aspect appears to be two-fold:

    * _BitInt(N) used as a cast on an ordinary value will zero- or
    sign-extend the low N bits

    * When reading from storage allocated with _BitInt(N), it ensures only N
    bits of info are retrieved, extended as necessary, even if more than N
    bits were stored. This applies even if the storage was rounded up.

    So it seems to be mainly about masking.



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Thu Nov 27 02:18:03 2025
    On 27/11/2025 01:30, Waldek Hebisch wrote:
    bart <bc@freeuk.com> wrote:
    And yet, integer widths have been roughly capped at double a machine
    word size for decades - until 64 bits came along and then few even
    bothered with double-width.

    Nobody thought how easy it would be to just have an integer of whatever
    size you like - you just generate whatever code is necessary to make it
    happen. We could have had BitInts on 32- and even 16-bit machines if
    only somebody had thought of it!

    PL/I had things like 'fixed binary(23)' (that is ability to
    specify bit size) around 1965, but that stopped at machine
    word length. Pascal had range types, but similarly stopped
    at at integer size.

    What were the reasons for PL/I to use odd sizes not related to word size
    or memory width?


    GNU Pascal allowed specifying size in
    bits and going to twice machine word (that was limitation
    imposed by gcc backend).

    Before 64-bits, we needed double the word size in order to represent
    ordinary quantities. With 64 bits, there is much less need (hence few
    128-bit types).


    And yes, such types could be added much earlier and it
    is a shame that they are added only now.

    So what is the pressing need now?

    It is a mild convenience for those applications which really need
    numbers of 100s of bits, but not what I would have thought were worth
    making special provision for in a language.

    While they would be unwieldy for very much larger numbers, and in any
    case there are caps in place.

    I can see some use when you want a block datatype of so many bytes
    (sorry, bits, since it needs to be bit-precise even at the large scale), especially if some bitwise ops are available.

    Eg. do some of the things that Pascal bit-sets were used for, but
    there still seems to be a lot of support lacking.

    So it still appears to me a rather heavyweight feature, in a lightweight language, that is lacking in everyday use-cases.

    Part of the reason may be that in the nineties the usage of
    lower-level languages other than C went down. C was
    traditionally quite minimal and did not want to
    introduce new features.



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From James Kuyper@3:633/10 to All on Wed Nov 26 21:19:14 2025
    On 2025-11-26 04:29, Michael S wrote:
    On Tue, 25 Nov 2025 18:33:30 +0000
    bart <bc@freeuk.com> wrote:
    ...
    Then maybe C bitfields could be used, but a bigger problem with those
    is poor control over layout, which is anyway implementation-defined.
    (Mine of course don't have that problem!)

    According to the language of The Standard, it's not 'poor control'.
    As far as standard requirements goes, there is *no* control on layout of
    bit fields.

    C doesn't provide enough control over bit-field layouts to be useful,
    but it's an exaggeration to say it provides no control at all:

    "An implementation may allocate any addressable storage unit large
    enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed
    into adjacent bits of the same unit. If insufficient space remains,
    whether a bit-field that does not fit is put into the next unit or
    overlaps adjacent units is implementation-defined. The order of
    allocation of bit-fields within a unit (high-order to low-order or
    low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.
    A bit-field declaration with no declarator, but only a colon and a
    width, indicates an unnamed bit-field.148) As a special case, a
    bit-field structure member with a width of zero indicates that no
    further bit-field is to be packed into the unit in which the previous bit-field, if any, was placed."


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Ike Naar@3:633/10 to All on Thu Nov 27 08:10:10 2025
    On 2025-11-26, bart <bc@freeuk.com> wrote:
    In C, the solution for my example might look like this:

    double temp = x+y;
    printf("%llu", ((*(uint64_t*)&temp)>>52) & 2047);

    Rather more fiddly and error prone, and it needs an auxiliary statement
    that makes it awkward to embed into an expression. (I also had to think twice about that format code.)

    The ilogb() function from <math.h> extracts the exponent of a double.

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Thu Nov 27 11:43:39 2025
    On 26/11/2025 23:19, bart wrote:
    On 26/11/2025 20:43, David Brown wrote:
    On 26/11/2025 19:42, bart wrote:
    On 26/11/2025 16:37, David Brown wrote:
    On 26/11/2025 16:44, bart wrote:

    The "other people" I referred to are the folks behind the C
    language, not me.

    OK. The people who chose to make 'break' do two jobs, unfortunately
    in parts of the language that can overlap in use; those people! (I
    guess you mean the more recent lot.)

    In C, the solution for my example might look like this:

        double temp = x+y;
        printf("%llu", ((*(uint64_t*)&temp)>>52) & 2047);


    No, that's not how a C solution would work. People who know C would
    know that. As a challenge for you, see if you can spot your mistake.

    This was my point. (Although I can't see the problem, making it even
    more pertinent.)


    So you can claim to have a "better" solution than C, without knowing
    how to write it correctly in C?



    (And of course if anyone wanted to do this stuff in real code,
    they'd wrap things in a static inline "bit_range_extract" function.)

    Also my point: everyone will invent their own incompatible solutions
    for this fundamental stuff.

    It is not remotely fundamental. Extracting groups of bits from the
    representation of a type, especially a floating point type, is a niche
    operation.

    A bit like that BitInt(12) example then?

    Yes, using BitInt(12) is quite niche.

    But when the people behind _BitInt() started thinking about what sizes
    people might want, it quickly became clear that it would be vastly more
    effort to try to define which sizes were useful. It was much simpler
    and clearer just to support any size (up to an implementation-defined
    limit). Most particular sizes, other than 8, 16, 32, 64, and perhaps
    128, are going to be niche. But if one clear feature enables a large
    number of niche uses, that's a good thing.

    No one has suggested that _BitInt(12) is in any way a /necessary/
    feature. And I certainly don't think a stand-alone proposal to add
    12-bit types to C would have been accepted. But since it exists, it
    will let me write slightly neater code in some cases - thus I will use
    it when appropriate.

    What I don't like about your bit extraction operations is that you have
    an operator syntax for a fairly obscure and rarely used operation. A "bit_range_extract" standard library function would make more sense to
    me, though I think shifting and masking works well enough for the few situations where you need it. A syntax that looks very much like array
    access is not going to be helpful to people looking at the code - for general-purpose languages, most programmers will never see or use bit
    ranges.


    This is about a lower-level systems language working with primitive
    machine types, and having access to the underlying bits of those types.

    How much more fundamental can you get?

    It is not fundamental for a low-level systems language. And this is a C
    group - C is a language covering general application programming as well
    as systems programming. I can agree that it can be a /useful/ operation
    at times - useful enough to make it worth having a standard library
    function (or macro - or, in your case, a keyword or built-in function).
    But not "fundamental" or useful enough to make it operator based like that.


    C provides only basic bitwise operators, and you have to do some bit-fiddling, while trying to avoid UB, in order to extract or inject individual bits or bitfields.

    You make it sound difficult. It's not.


    I provide direct indexing ops to get or set any bit or bitfield, which
    is actually a great core feature to have, but for some reason you want
    to downplay it.

    You might just admit for once that it is quite neat.


    I am sure it is very nice on the few occasions when it is useful.


    (It can be an important operation - such as for software floating
    point routines.

    That particular task can be important for lots of reasons.

    But the people who write those are few, and they know what they are
    doing.)

    And I don't? I used to write FP emulation routines...


    The thing you always seem to forget, is that your languages are written
    for /you/ - no one else. It doesn't make a difference whether something
    is added /to/ the language or written in code /for/ the language. You
    think other languages are missing critical features simply because there
    is a thing that /you/ want to do that you added to your own language.
    And you think other languages are overly complex or bloated because they
    have features that you don't want to use.

    That attitude is fine for your own personal specific language. If
    that's how you like to do things, that's fine. But that's your own
    little isolated world that does not compare to the wider world of other people, other programmers, other languages, other tools.

    Imagine asking the regulars in this group what features or changes they
    would like C to have in order to make C "perfect" for their uses,
    regardless of everyone else, all existing code, all existing tools. We
    could all fill pages with ideas. And if those were all added to C, the
    result would be a language that made C++ look as easy as Logo, while
    being riddled with inconsistencies and contradictions.


    "Type punning" refers to using a union to access or reinterpret the
underlying bit representation. Using references and a cast to do so
    is UB,

    In C maybe, using your favoured compilers.

    In C, yes. This is comp.lang.c.

    In my implementations of C,
    and in my languages, it is well defined, especially as it is
    type-punning a 64-bit quantity to another 64-bit quantity.


    OK. But not in C.

    (This is a great thing about creating your own implementations: you get
    to say what is UB, which will be for genuine, not artificial ones
    maintained so that C compilers can be one-up on each other.

    Ah, so the many C compilers I have used over the decades were not
    "genuine", and the many different processors I have used were all "artificial". Okay, that clears things up.


    As it is, somebody using C as an intermediate language can have a
    situation where something is well-defined in their source language,
known to be well-defined on their platforms of interest, but in between,
    C says otherwise.)

    You've never really understood how languages are defined, have you?
    With your own languages and tools, you don't have to - there is no need
    for standards, specifications, or anything like that. You can just make
    up what suits you at the time. The language is "defined" by what the implementation does. That's been very convenient for you, but it has
    left you with serious misconceptions about how non-personal languages work.


    Note that in the original example in my language, no references are used (the code just copies a FP register to a GPR without conversion).

except when using pointers to character types. Neither involves
    actually putting data into memory or the stack unless you are using a
    compiler that can't optimise well - and then it is just a matter of
    less efficient generated code.

    OK, so how would you do a 'reinterpret' cast in C, of a value like 'x+y'?

    As you know, you use a union. So just to please you, here is your bit extraction - written as a one-line function (split over two lines for
    Usenet) because you seem to think that kind of thing is important :

    uint64_t get_exponent(double x) {
    return ((union { double d; uint64_t u;}) { x }.u >> 52)
    & ((1ull << (62 - 52 + 1)) - 1);
    }

    That compiles (with gcc on x86-64) to :

    movq rax, xmm0
    shr rax, 52
    and eax, 2047
    ret

    There's nothing in C that suggests this must be put in memory or do
    anything more than this.


    Anything without braces isn't taken as seriously, eg. scripting
    languages.


    What a /very/ strange way to distinguish or classify languages.

    It's an observation. Which languages that call themselves 'systems languages' these days don't use braces?


    Ada? Forth?

    It is certainly common for languages to use braces, simply because they
    are a simple and unambiguous way to delimit blocks. They are widely
    used in languages that might be called "systems languages", and widely
    used in languages that might /not/ be called "systems languages" -
    though I don't think there is any remotely clear definition or
    distinction between "systems languages" and other languages.

And what a bizarre way to generalise what people think, as though
    all programmers share the same opinions.

    You're welcome to do your own survey.


    In what way are languages like Ada, Fortran, Python, Haskell, Erlang,
    etc., "not taken seriously" ? /Who/ does not take them seriously? Who
    takes "B" seriously but not Ruby, just because "B" uses braces and Ruby
    does not?

    <https://en.wikipedia.org/wiki/Comparison_of_programming_languages_(syntax)#Block_delimitation>


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Thu Nov 27 12:20:32 2025
    On 27/11/2025 10:43, David Brown wrote:
    On 26/11/2025 23:19, bart wrote:


    What I don't like about your bit extraction operations is that you have
    an operator syntax for a fairly obscure and rarely used operation.

    So shift and masking operations in C are obscure?!

A "bit_range_extract" standard library function would make more sense to
me, though I think shifting and masking works well enough for the few
situations where you need it. A syntax that looks very much like array
access is not going to be helpful to people looking at the code - for
general-purpose languages, most programmers will never see or use bit
ranges.

    The syntax actually comes from DEC Algol60 IIRC. It was used to access individual characters of a string, normally an indivisible type in that language, and I applied the same concept to bits of an integer.

    How much more fundamental can you get?

    It is not fundamental for a low-level systems language.

    So bits are not fundamental either! But then, it has taken until C23 to standardise binary literals, and there is still no format code for
    binary output.

But the people who write those are few, and they know what they are
    doing.)

    And I don't? I used to write FP emulation routines...


The thing you always seem to forget, is that your languages are written
for /you/ - no one else. It doesn't make a difference whether something
is added /to/ the language or written in code /for/ the language. You
think other languages are missing critical features simply because there
is a thing that /you/ want to do that you added to your own language.
And you think other languages are overly complex or bloated because they
have features that you don't want to use.

    They frequently have advanced features while ignoring the basics.

Imagine asking the regulars in this group what features or changes they
would like C to have in order to make C "perfect" for their uses,
regardless of everyone else, all existing code, all existing tools. We
could all fill pages with ideas. And if those were all added to C, the
result would be a language that made C++ look as easy as Logo, while
being riddled with inconsistencies and contradictions.

    Yes, that's the trick. That's why a lot of features I've played with
    have disappeared, while some have proved indispensable.

    As it is, somebody using C as an intermediate language can have a
    situation where something is well-defined in their source language,
    known to be well-defined on their platforms of interest, but
in between, C says otherwise.)

You've never really understood how languages are defined, have you?
With your own languages and tools, you don't have to - there is no need
for standards, specifications, or anything like that. You can just make
up what suits you at the time. The language is "defined" by what the
implementation does. That's been very convenient for you, but it has
left you with serious misconceptions about how non-personal languages
work.

    Here's a program in a very simple language, where all variables have
    i64 type:

    c = a + b

    Here, the author has decreed that any overflow in this addition will
    wrap (any overflow bits above 64 are lost). If directly compiled to x64
    code it might use this (here 'a b c' are aliases for the registers where
    they reside):

    mov c, a
    add c, b

    Or on ARM64:

    add c, a, b

    Now, the author decides to use intermediate C (for portability, for optimisations etc), and will generate perhaps:

    int64_t a, b, c;
    ...
    c = a + b;

    But here, if a + b happens to overflow, it is UB, and for no good
    reason. You have to fix it. This is where it can be harder to generate
    HLL code than assembly!

    *Now* do you understand? This is nothing to do with me or my personal languages, it is a problem for every language that transpiles to C,
    where there is a mismatch between the sets of behaviour considered UB in
    each.

    OK, so how would you do a 'reinterpret' cast in C, of a value like 'x+y'?

As you know, you use a union. So just to please you, here is your bit
extraction - written as a one-line function (split over two lines for
Usenet) because you seem to think that kind of thing is important :

uint64_t get_exponent(double x) {
    return ((union { double d; uint64_t u;}) { x }.u >> 52)
           & ((1ull << (62 - 52 + 1)) - 1);
}

    That compiles (with gcc on x86-64) to :

    movq rax, xmm0
    shr rax, 52
    and eax, 2047
    ret

    There's nothing in C that suggests this must be put in memory or do
    anything more than this.

    (This only seems to work with gcc. Clang and MSVS don't like it.)



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Thu Nov 27 12:46:01 2025
    On 27/11/2025 02:32, Waldek Hebisch wrote:
    bart <bc@freeuk.com> wrote:

    This is about a lower-level systems language working with primitive
    machine types, and having access to the underlying bits of those types.

    How much more fundamental can you get?

    C provides only basic bitwise operators, and you have to do some
    bit-fiddling, while trying to avoid UB, in order to extract or inject
    individual bits or bitfields.

    I provide direct indexing ops to get or set any bit or bitfield, which
    is actually a great core feature to have, but for some reason you want
    to downplay it.

    You might just admit for once that it is quite neat.

    Yes, it is neat.

    Hmm, perhaps you're being sincere, perhaps not ...

    OK, so how would you do a 'reinterpret' cast in C, of a value like 'x+y'?

    #include <stdint.h>
    #include <string.h>

    uint64_t
    d_to_u(double d) {
    uint64_t tmp;
    memcpy(&tmp, &d, sizeof(tmp));
    return tmp;
    }

    int
    f_exp(double d) {
    return (d_to_u(d)>>52)&2047;
    }

    Using 'gcc -O' I get the following assembly (only code, without
    unimportant directives/labels):

    d_to_u:
    movq %xmm0, %rax
    ret

    f_exp:
    movq %xmm0, %rax
    shrq $52, %rax
    andl $2047, %eax
    ret

As you can see 'd_to_u' is a single computational instruction;
you cannot do better given that floating point registers
are distinct from integer registers. And 'f_exp' looks
optimal assuming lack of "bit extract" or "extract exponent"
instructions.

Note that you can put both functions above in a header file,
so once you have written the few lines above you can use them
in all your C code. Of course, efficiency depends on
compiler optimization.

    Yes (that's something I can't rely on).

    These examples are interesting: with a HLL you normally express yourself
    in a clear manner, and it is the compiler's job to generate the
    complicated code required to implement what you mean.

    Here it seems to be other way around: it is the programmer who writes
    the convoluted code, and the compiler turns that into short, clear instructions! Which unfortunately no one will see.

    If I use your functions like this:

    a = f_exp(x + y);

    then once the x+y result is in a register, gcc-O2 generates this inline
    code for the extraction:

    movq rax, xmm0
    shr rax, 52
    and eax, 2047


    If I express it in my language:

    a := int@(x + y).[52..62]

    then my non-optimising compiler generates this (D0 is rax):

    movq D0, XMM4
    shr D0, 52
    and D0, 2047

    So such features have definite advantages, in being able to express
    intent directly, and to make it easier for a simple compiler to know
    that intent and help it generate reasonable code without lots of
    analysis or needing function inlining.

    BTW, your example explicitly writes to memory; David Brown posted a
    version that didn't do so that I could see. Unless a compound literal is designed to be built in memory? However that version only seemed to work
    with one compiler.

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Thu Nov 27 14:02:38 2025
    On 27/11/2025 13:20, bart wrote:
    On 27/11/2025 10:43, David Brown wrote:
    On 26/11/2025 23:19, bart wrote:


    What I don't like about your bit extraction operations is that you
    have an operator syntax for a fairly obscure and rarely used operation.

    So shift and masking operations in C are obscure?!

    Both shift operators and bitwise operators have lots of other uses.

    When you are designing a programming language, you first provide general features that can be used for multiple purposes. You only implement specialised features if the need arises - it is too cumbersome, or error-prone, or inefficient, or laborious to use the general features.

    In some areas of C usage, shifts and masks - and bitfield extraction -
    turn up quite a bit. But it seems the C operators work fine for the
    task. It would not exactly be difficult to add a standard
    "bit_range_extract" function to the C standard library, yet no one has
    felt it to be worth the effort over the last 50 years. Perhaps it is
    not as essential or fundamental as you think? Or perhaps C's current
    features do the job well enough that there's no need for anything else?


A "bit_range_extract" standard library function would make more sense to
me, though I think shifting and masking works well enough for the few
situations where you need it. A syntax that looks very much like
array access is not going to be helpful to people looking at the code
- for general-purpose languages, most programmers will never see or
use bit ranges.

    The syntax actually comes from DEC Algol60 IIRC. It was used to access individual characters of a string, normally an indivisible type in that language, and I applied the same concept to bits of an integer.

    I don't care if you found the syntax on the back of a cornflakes packet.
    The origin is not relevant.


    How much more fundamental can you get?

    It is not fundamental for a low-level systems language.

    So bits are not fundamental either! But then, it has taken until C23 to standardise binary literals, and there is still no format code for
    binary output.


    Very few programmers are at all interested in bits. A "double" holds a floating point value, not a pattern of bits. You are thinking on a
    level of abstraction that is not realistic for most programming tasks.

But the people who write those are few, and they know what they
    are doing.)

    And I don't? I used to write FP emulation routines...


The thing you always seem to forget, is that your languages are
written for /you/ - no one else. It doesn't make a difference whether
something is added /to/ the language or written in code /for/ the
language. You think other languages are missing critical features
simply because there is a thing that /you/ want to do that you added
to your own language. And you think other languages are overly complex
or bloated because they have features that you don't want to use.

    They frequently have advanced features while ignoring the basics.

    No - they frequently have features that /you/ call "advanced" because
    you don't need or want them, and they ignore things that /you/ call
    "basics" because you /do/ need or want them. It's all about /you/.


Imagine asking the regulars in this group what features or changes
they would like C to have in order to make C "perfect" for their uses,
regardless of everyone else, all existing code, all existing tools.
We could all fill pages with ideas. And if those were all added to C,
the result would be a language that made C++ look as easy as Logo,
while being riddled with inconsistencies and contradictions.

    Yes, that's the trick. That's why a lot of features I've played with
    have disappeared, while some have proved indispensable.

    As it is, somebody using C as an intermediate language can have a
    situation where something is well-defined in their source language,
    known to be well-defined on their platforms of interest, but
in between, C says otherwise.)

You've never really understood how languages are defined, have you?
With your own languages and tools, you don't have to - there is no
need for standards, specifications, or anything like that. You can
just make up what suits you at the time. The language is "defined" by
what the implementation does. That's been very convenient for you,
but it has left you with serious misconceptions about how non-personal
languages work.

Here's a program in a very simple language, where all variables have
    i64 type:

    c = a + b

    Here, the author has decreed that any overflow in this addition will
    wrap (any overflow bits above 64 are lost). If directly compiled to x64
    code it might use this (here 'a b c' are aliases for the registers where they reside):

    mov c, a
    add c, b

    Or on ARM64:

    add c, a, b

    Now, the author decides to use intermediate C (for portability, for optimisations etc), and will generate perhaps:

    int64_t a, b, c;
    ...
    c = a + b;

    But here, if a + b happens to overflow, it is UB, and for no good
    reason. You have to fix it. This is where it can be harder to generate
    HLL code than assembly!


    You are talking nonsense.

    Either a + b results in the correct answer, or it does not. Any sane
    person reads that as "a plus b" - mathematically adding two integers to
    get their sum. That's what the programmer wants, and that's what they
    ask for. And any sane programmer expects the language to give the
correct result within its limitations, but does not expect it to do
    magic. Expecting to form a sum that is greater than 2 ^ 63 and somehow produce the "correct" result is a total misunderstanding of mathematics
    and programming - any primary school kid will tell you that using the
    fingers of one hand, you can't add 3 and 4. They will /not/ tell you
    that it's fine to add them on one hand because 3 + 4 is actually equal to 2.

    *Now* do you understand? This is nothing to do with me or my personal languages, it is a problem for every language that transpiles to C,
    where there is a mismatch between the sets of behaviour considered UB in each.

    I understand that simple maths and common sense is beyond you. I
    understand that you think mathematics should be defined in terms of
    accidental byproducts of the way hardware logic designs happen to be implemented.


    OK, so how would you do a 'reinterpret' cast in C, of a value like
    'x+y'?

As you know, you use a union. So just to please you, here is your bit
extraction - written as a one-line function (split over two lines for
Usenet) because you seem to think that kind of thing is important :

uint64_t get_exponent(double x) {
    return ((union { double d; uint64_t u;}) { x }.u >> 52)
           & ((1ull << (62 - 52 + 1)) - 1);
}

    That compiles (with gcc on x86-64) to :

    movq rax, xmm0
    shr rax, 52
    and eax, 2047
    ret

    There's nothing in C that suggests this must be put in memory or do
    anything more than this.

    (This only seems to work with gcc. Clang and MSVS don't like it.)


    I think you are mistaken. clang is fine with it. It is standard C99,
    so any decent C compiler from the last 25 years will handle it fine. MS
    gave up on bothering to make C compilers before the turn of the century
    (they make a reasonable enough C++ compiler). Even your hero tcc is
    fine with it (though on my attempts, it produces rubbish code - maybe it
    needs different flags for optimisation). The C code is not made invalid
    by the existence of C90-only compilers.





    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Thu Nov 27 14:39:51 2025
    On 27/11/2025 13:46, bart wrote:
    On 27/11/2025 02:32, Waldek Hebisch wrote:
    bart <bc@freeuk.com> wrote:

    This is about a lower-level systems language working with primitive
    machine types, and having access to the underlying bits of those types.

    How much more fundamental can you get?

    C provides only basic bitwise operators, and you have to do some
    bit-fiddling, while trying to avoid UB, in order to extract or inject
    individual bits or bitfields.

    I provide direct indexing ops to get or set any bit or bitfield, which
    is actually a great core feature to have, but for some reason you want
    to downplay it.

    You might just admit for once that it is quite neat.

    Yes, it is neat.

    Hmm, perhaps you're being sincere, perhaps not ...

    OK, so how would you do a 'reinterpret' cast in C, of a value like
    'x+y'?
    #include <stdint.h>
    #include <string.h>

    uint64_t
    d_to_u(double d) {
    uint64_t tmp;
    memcpy(&tmp, &d, sizeof(tmp));
    return tmp;
    }

    int
    f_exp(double d) {
    return (d_to_u(d)>>52)&2047;
    }

    Using 'gcc -O' I get the following assembly (only code, without
    unimportant directives/labels):

    d_to_u:
        movq    %xmm0, %rax
        ret

f_exp:
        movq    %xmm0, %rax
        shrq    $52, %rax
        andl    $2047, %eax
        ret

As you can see 'd_to_u' is a single computational instruction;
you cannot do better given that floating point registers
are distinct from integer registers. And 'f_exp' looks
optimal assuming lack of "bit extract" or "extract exponent"
instructions.

Note that you can put both functions above in a header file,
so once you have written the few lines above you can use them
in all your C code. Of course, efficiency depends on
compiler optimization.

    Yes (that's something I can't rely on).

    These examples are interesting: with a HLL you normally express yourself
    in a clear manner, and it is the compiler's job to generate the
    complicated code required to implement what you mean.

    Here it seems to be other way around: it is the programmer who writes
    the convoluted code, and the compiler turns that into short, clear instructions! Which unfortunately no one will see.


    I don't think Waldek's code (or mine) is particularly convoluted. But
    in either case, you put such things in static inline functions (or
    macros if you need to). Then you have clear intent when implementing
    those functions - you are clearly doing low-level shifts and masking.
    And you have clear intent when /using/ the functions - you are
    extracting some bits from the underlying representation of the value.
    You split things into identified functions with specific tasks - that's
    at the heart of programming.

    And then you let the automated computer system - the compiler - do what
    it does best, and generate efficient results.

    If I use your functions like this:

    a = f_exp(x + y);

    then once the x+y result is in a register, gcc-O2 generates this inline
    code for the extraction:

    movq rax, xmm0
    shr rax, 52
    and eax, 2047


    If I express it in my language:

    a := int@(x + y).[52..62]

    then my non-optimising compiler generates this (D0 is rax):

    movq    D0,  XMM4
    shr     D0,  52
    and     D0,  2047

    So such features have definite advantages, in being able to express
    intent directly, and to make it easier for a simple compiler to know
    that intent and help it generate reasonable code without lots of
    analysis or needing function inlining.


    You seem to be arguing that it is a good thing to write code that
    spoon-feeds the compiler so that the compiler doesn't have to do much
    work. You get this because you are writing the application code and
    also writing the compiler - so you pick the solution that gives you the
    best results for the least effort overall. But that is only appropriate
    for people with personal languages like yours.

    It should be the other way round - the compiler should be optimising so
    that the programmer can work at higher levels of abstraction or write
    code in the way that is most convenient to them, and the compiler will
    handle the boring low-level details. Programmers using serious
    languages /can/ rely on the compiler optimising well.


    BTW, your example explicitly writes to memory; David Brown posted a
    version that didn't do so that I could see. Unless a compound literal is designed to be built in memory? However that version only seemed to work with one compiler.

    The version I wrote is C99 and works fine with any C99 compiler. No,
    compound literals are not "designed to be built in memory", whatever
    that might mean. A compound literal is a value, and can be used like
    any other value.

    Waldek's version does use "memcpy" and pointers formed from the
    addresses of a parameter and a local variable. That means it must give results as if the parameter and local variable were in memory somewhere
    (the stack, in usual practice - though C does not actually require a
    stack) and a memory-to-memory copy was carried out, byte by byte.
    Critical here is the term "as if". If the compiler can give the same
    results without using memory, it is allowed to do so - thus optimising compilers will just do a register transfer from a floating point
    register to a general purpose register in this case.





    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Thu Nov 27 16:02:23 2025
    On Thu, 27 Nov 2025 14:02:38 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    On 27/11/2025 13:20, bart wrote:
    On 27/11/2025 10:43, David Brown wrote:
    On 26/11/2025 23:19, bart wrote:


    What I don't like about your bit extraction operations is that you
    have an operator syntax for a fairly obscure and rarely used
    operation.

    So shift and masking operations in C are obscure?!

    Both shift operators and bitwise operators have lots of other uses.

    When you are designing a programming language, you first provide
    general features that can be used for multiple purposes. You only
    implement specialised features if the need arises - it is too
    cumbersome, or error-prone, or inefficient, or laborious to use the
    general features.

    In some areas of C usage, shifts and masks - and bitfield extraction
    - turn up quite a bit. But it seems the C operators work fine for
    the task. It would not exactly be difficult to add a standard "bit_range_extract" function to the C standard library, yet no one
    has felt it to be worth the effort over the last 50 years. Perhaps
    it is not as essential or fundamental as you think? Or perhaps C's
    current features do the job well enough that there's no need for
    anything else?


A "bit_range_extract" standard library function would make more
sense to me, though I think shifting and masking works well enough
for the few situations where you need it. A syntax that looks
very much like array access is not going to be helpful to people
looking at the code - for general-purpose languages, most
programmers will never see or use bit ranges.

    The syntax actually comes from DEC Algol60 IIRC. It was used to
    access individual characters of a string, normally an indivisible
    type in that language, and I applied the same concept to bits of an integer.

    I don't care if you found the syntax on the back of a cornflakes
    packet. The origin is not relevant.


    How much more fundamental can you get?

    It is not fundamental for a low-level systems language.

    So bits are not fundamental either! But then, it has taken until
    C23 to standardise binary literals, and there is still no format
    code for binary output.


    Very few programmers are at all interested in bits. A "double" holds
    a floating point value, not a pattern of bits. You are thinking on a
    level of abstraction that is not realistic for most programming tasks.

But the people who write those are few, and they know what
    they are doing.)

    And I don't? I used to write FP emulation routines...


The thing you always seem to forget, is that your languages are
written for /you/ - no one else. It doesn't make a difference
whether something is added /to/ the language or written in code
/for/ the language. You think other languages are missing
critical features simply because there is a thing that /you/ want
to do that you added to your own language. And you think other
languages are overly complex or bloated because they have features
that you don't want to use.

    They frequently have advanced features while ignoring the basics.

    No - they frequently have features that /you/ call "advanced" because
    you don't need or want them, and they ignore things that /you/ call
    "basics" because you /do/ need or want them. It's all about /you/.


    Imagine asking the regulars in this group what features or changes
    they would like C to have in order to make C "perfect" for their
    uses, regardless of everyone else, all existing code, all existing
tools. We could all fill pages with ideas. And if those were all
    added to C, the result would be a language that made C++ look as
    easy as Logo, while being riddled with inconsistencies and
    contradictions.

    Yes, that's the trick. That's why a lot of features I've played
    with have disappeared, while some have proved indispensable.

    As it is, somebody using C as an intermediate language can have a
    situation where something is well-defined in their source
    language, known to be well-defined on their platforms of
    interest, but inbetween, C says otherwise.)

You've never really understood how languages are defined, have
you? With your own languages and tools, you don't have to - there
is no need for standards, specifications, or anything like that.
You can just make up what suits you at the time. The language is
"defined" by what the implementation does. That's been very
convenient for you, but it has left you with serious
misconceptions about how non-personal languages work.

Here's a program in a very simple language, where all variables
    have i64 type:

    c = a + b

    Here, the author has decreed that any overflow in this addition
    will wrap (any overflow bits above 64 are lost). If directly
    compiled to x64 code it might use this (here 'a b c' are aliases
    for the registers where they reside):

    mov c, a
    add c, b

    Or on ARM64:

    add c, a, b

    Now, the author decides to use intermediate C (for portability, for optimisations etc), and will generate perhaps:

        int64_t a, b, c;
        ...
        c = a + b;

    But here, if a + b happens to overflow, it is UB, and for no good
    reason. You have to fix it. This is where it can be harder to
    generate HLL code than assembly!


    You are talking nonsense.

    Either a + b results in the correct answer, or it does not. Any sane
    person reads that as "a plus b" - mathematically adding two integers
    to get their sum. That's what the programmer wants, and that's what
    they ask for. And any sane programmer expects the language to give
    the correct result within its limitations, but does not expect it to
    do magic. Expecting to form a sum that is greater than 2 ^ 63 and
    somehow produce the "correct" result is a total misunderstanding of mathematics and programming - any primary school kid will tell you
    that using the fingers of one hand, you can't add 3 and 4. They will
    /not/ tell you that it's fine to add them on one hand because 3 + 4
    is actually equal to 2.

    *Now* do you understand? This is nothing to do with me or my
    personal languages, it is a problem for every language that
    transpiles to C, where there is a mismatch between the sets of
    behaviour considered UB in each.

    I understand that simple maths and common sense is beyond you. I
    understand that you think mathematics should be defined in terms of accidental byproducts of the way hardware logic designs happen to be implemented.


    OK, so how would you do a 'reinterpret' cast in C, of a value
    like 'x+y'?

    As you know, you use a union. So just to please you, here is your
    bit extraction - written as a one-line function (split over two
    lines for Usenet) because you seem to think that kind of thing is
    important:

    uint64_t get_exponent(double x) {
        return ((union { double d; uint64_t u;}) { x }.u >> 52)
               & ((1ull << (62 - 52 + 1)) - 1);
    }

    That compiles (with gcc on x86-64) to :

        movq rax, xmm0
        shr rax, 52
        and eax, 2047
        ret

    There's nothing in C that suggests this must be put in memory or
    do anything more than this.

    (This only seems to work with gcc. Clang and MSVS don't like it.)


    I think you are mistaken. clang is fine with it. It is standard
    C99, so any decent C compiler from the last 25 years will handle it
    fine. MS gave up on bothering to make C compilers before the turn of
    the century (they make a reasonable enough C++ compiler). Even your
    hero tcc is fine with it (though on my attempts, it produces rubbish
    code - maybe it needs different flags for optimisation). The C code
    is not made invalid by the existence of C90-only compilers.


    MSVC compilers compile your code and produce correct result, but the
    code
    looks less nice:
    0000000000000000 <get_exponent>:
    0: f2 0f 11 44 24 08 movsd %xmm0,0x8(%rsp)
    6: 48 8b 44 24 08 mov 0x8(%rsp),%rax
    b: 48 c1 e8 34 shr $0x34,%rax
    f: 25 ff 07 00 00 and $0x7ff,%eax
    14: c3 ret

    Although on old AMD processors it is likely faster than nicer code
    generated by gcc and clang. On newer processors, gcc code is likely a bit
    better, but the difference is unlikely to be detected by simple
    measurements.

    Also, the MSVC compiler does not like your style and produces the
    following warning:
    dave_b.c(5): warning C4116: unnamed type definition in parentheses

    BTW, I don't like your style either. My preferred code would look
    very similar to the code of Waldek Hebisch, except that I'd declare
    d_to_u() static.
    I don't like the union trick. Not just in this particular context, but
    generally. memcpy() is much cleaner in expressing the programmer's intentions.








    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Thu Nov 27 17:13:50 2025
    On 27/11/2025 13:02, David Brown wrote:
    On 27/11/2025 13:20, bart wrote:

    In some areas of C usage, shifts and masks - and bitfield extraction -
    turn up quite a bit. But it seems the C operators work fine for the
    task. It would not exactly be difficult to add a standard "bit_range_extract" function to the C standard library, yet no one has
    felt it to be worth the effort over the last 50 years.

    That doesn't say much. Binary literals only became official in C23.

    Width-specific integers only became standard in C99 (what did people use
    in the preceding quarter-century?) and are not yet in the core language.

    Such things as bit-extraction don't get prioritised because it so easy
    for people to put together some crummy macro to do the job. But this
    means everybody will create their own incompatible solutions.

    Min/Max operators, or 'lengthof', will never get added for similar
    reasons. But there was a time when every other code-base I looked at
    defined its own MIN/MAX or Min/Max or min/max macros or functions.

    Examples:

    #define MZ_MAX(a, b) (((a) > (b)) ? (a) : (b)) (MZ lib)

    # define MAX(A,B) ((A)>(B)?(A):(B)) (SQLite)

    #define MAX(a,b) ((a) > (b) ? (a) : (b)) (LIBjpeg)

    While everywhere you see patterns like:

    sizeof(somearray)/sizeof(somearray[0])

    which is crying out for standardisation.

    Here, I guess 'no one has felt it to be worth the effort'. Except me.


    The syntax actually comes from DEC Algol60 IIRC. It was used to access
    individual characters of a string, normally an indivisible type in
    that language, and I applied the same concept to bits of an integer.

    I don't care if you found the syntax on the back of a cornflakes packet.
    The origin is not relevant.

    Oh, I thought it was an automatic negative reaction from you to anything
    I'd thought up. I guess you have it in for DEC too.



    How much more fundamental can you get?

    It is not fundamental for a low-level systems language.

    So bits are not fundamental either! But then, it has taken until C23
    to standardise binary literals, and there is still no format code for
    binary output.


    Very few programmers are at all interested in bits.

    Unless it is that extra bit in _BitInt(65)! Then it is apparently vital.

    A "double" holds a
    floating point value, not a pattern of bits. You are thinking on a
    level of abstraction that is not realistic for most programming tasks.

    This is systems programming.

    They frequently have advanced features while ignoring the basics.

    No - they frequently have features that /you/ call "advanced" because
    you don't need or want them, and they ignore things that /you/ call
    "basics" because you /do/ need or want them. It's all about /you/.

    Well, let's stick with C. Here are some features I use, and the C
    equivalents (A has whatever type is needed):

    M                      C
    -------------------------------------------------------------
    A.len                  sizeof(A)/sizeof(A[0])

    * max(a, b)            (a > b ? a : b)

    A.odd                  A & 1, or A % 1

    A.even                 - you can do this one

    A.msbit                (A>>31) & 1, or (A>>63) & 1

    2 ** n                 (1LL << n)
    a ** b                 (int) pow(a, b)  (ints cast to float, and float result)
    x ** y                 (float) pow(x, y)

    A.[i] = x              - you can do this too; assume x is 0 or 1

    A.[i..j] = x           - yikes!

    * if c in 'A'..'Z'     if (c >= 'A' && c <= 'Z')

    * if c in [cr, lf]     if (c == cr || c == lf)

    * if a = b = c         if (a == b && b == c)

    * swap(A[i+1], A[j])   {T temp=A[i+1]; A[i+1]=A[j]; A[j]=temp;}

    abs(x)                 abs(x), labs(x), llabs(x), fabs(x) ...

    println =a, =b         printf("A=%X B=%Y\n", a, b);  what are X, Y?

    readln a, b            - some scanf nonsense

    (a,b):=c divrem d      - involves div_t and div()

    print "-" * 50         "----------------------- ... "

    A[i, j]                A[i][j]

    byte                   unsigned char, uint8_t, _BitInt(8), char maybe


    (* marks examples that are problematic in C when operands with side effects are evaluated twice)

    There are /dozens/ of examples like this that make small tasks a
    pleasure to write, but also make them clearer, to the point, and less
    error prone.

    But let me guess, none of this cuts any ice at all. The C will always be superior.

    You are talking nonsense.

    End of discussion then. You either missed my point or chose to ignore it.

    I understand that simple maths and common sense is beyond you.

    More insults.

    uint64_t get_exponent(double x) {
        return ((union { double d; uint64_t u;}) { x }.u >> 52)
               & ((1ull << (62 - 52 + 1)) - 1);
    }

    That compiles (with gcc on x86-64) to :

        movq rax, xmm0
        shr rax, 52
        and eax, 2047
        ret

    There's nothing in C that suggests this must be put in memory or do
    anything more than this.

    (This only seems to work with gcc. Clang and MSVS don't like it.)


    I think you are mistaken. clang is fine with it. It is standard C99,
    so any decent C compiler from the last 25 years will handle it fine.
    MS gave up on bothering to make C compilers before the turn of the
    century (they make a reasonable enough C++ compiler). Even your hero
    tcc is fine with it (though on my attempts, it produces rubbish code -
    maybe it needs different flags for optimisation). The C code is not
    made invalid by the existence of C90-only compilers.

    I was mistaken. I used godbolt.org but it was set to C++. Presumably gcc
    has some C++ extensions that make it valid.

  • From Ike Naar@3:633/10 to All on Thu Nov 27 17:38:03 2025
    On 2025-11-27, bart <bc@freeuk.com> wrote:
    Well, let's stick with C. Here are some features I use, and the C equivalents (A has whatever type is needed):

    M C
    -------------------------------------------------------------
    [snip]
    A.odd A & 1, or A % 1

    "A % 1" ?

  • From bart@3:633/10 to All on Thu Nov 27 17:59:19 2025
    On 27/11/2025 17:38, Ike Naar wrote:
    On 2025-11-27, bart <bc@freeuk.com> wrote:
    Well, let's stick with C. Here are some features I use, and the C
    equivalents (A has whatever type is needed):

    M C
    -------------------------------------------------------------
    [snip]
    A.odd A & 1, or A % 1

    "A % 1" ?

    I guess A % 2 then.

    Note my remark about error proneness later on.

  • From David Brown@3:633/10 to All on Thu Nov 27 21:15:53 2025
    On 27/11/2025 15:02, Michael S wrote:
    On Thu, 27 Nov 2025 14:02:38 +0100
    David Brown <david.brown@hesbynett.no> wrote:



    MSVC compilers compile your code and produce correct result, but the
    code
    looks less nice:
    0000000000000000 <get_exponent>:
    0: f2 0f 11 44 24 08 movsd %xmm0,0x8(%rsp)
    6: 48 8b 44 24 08 mov 0x8(%rsp),%rax
    b: 48 c1 e8 34 shr $0x34,%rax
    f: 25 ff 07 00 00 and $0x7ff,%eax
    14: c3 ret

    Although on old AMD processors it is likely faster than nicer code
    generated by gcc and clang. On newer processor gcc code is likely a bit better, but the difference is unlikely to be detected by simple
    measurements.

    I think it is unlikely that this version - moving from xmm0 to rax via
    memory instead of directly - is faster on any processor. But I fully
    agree that it is unlikely to be a measurable difference in practice.


    Also MSVC compiler does not like your style and produces following
    warning:
    dave_b.c(5): warning C4116: unnamed type definition in parentheses

    Warnings are a matter of taste. There's nothing wrong with my code, but
    it may be against some code styles.


    BTW, I don't like your style either. My preferred code would look
    very similar to the code of Waldek Hebisch, except that I'd declare
    d_to_u() static.
    I don't like the union trick. Not just in this particular context, but
    generally. memcpy() is much cleaner in expressing the programmer's intentions.


    I particularly don't like using unions in compound literals like this
    either - it was just to make a compact demonstration. I'd write real
    code in more re-usable bits with static inline functions.

    I disagree, however, that memcpy() shows intent better. The intention
    is not to copy it to memory - the intention is to access the underlying
    bit representation as a different type. A type-punning union is at
    least, if not more, clear for that purpose (IMHO - and judgements of
    style and clarity are very much a matter of opinion).


  • From Michael S@3:633/10 to All on Fri Nov 28 00:15:07 2025
    On Thu, 27 Nov 2025 21:15:53 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    On 27/11/2025 15:02, Michael S wrote:
    On Thu, 27 Nov 2025 14:02:38 +0100
    David Brown <david.brown@hesbynett.no> wrote:



    MSVC compilers compile your code and produce correct result, but the
    code
    looks less nice:
    0000000000000000 <get_exponent>:
    0: f2 0f 11 44 24 08 movsd %xmm0,0x8(%rsp)
    6: 48 8b 44 24 08 mov 0x8(%rsp),%rax
    b: 48 c1 e8 34 shr $0x34,%rax
    f: 25 ff 07 00 00 and $0x7ff,%eax
    14: c3 ret

    Although on old AMD processors it is likely faster than nicer code generated by gcc and clang. On newer processor gcc code is likely a
    bit better, but the difference is unlikely to be detected by simple measurements.

    I think it is unlikely that this version - moving from xmm0 to rax
    via memory instead of directly - is faster on any processor. But I
    fully agree that it is unlikely to be a measurable difference in
    practice.

    I wonder, how do you have the nerve "to think" about things that you have absolutely no idea about?

    Instead of "thinking" you could just as well open Optimization
    Reference manuals of AMD Bulldozer family or of Bobcat. Or to read
    Agner Fog's instruction tables. Move from XMM to GPR on these
    processors is very slow: 8 clocks on BD, 7 on BbC.

    BTW, AMD K8 has the opposite problem. Move from XMM to GPR is reasonably
    fast, but move from GPR to XMM is painfully slow.

    On the other hand, moves "via memory" are reasonably fast on these
    CPUs (except, may be, Bobcat? I am not sure about it), because data
    does not really travels through memory or through cache. Load-store
    forwarding picks the data directly from the store queue.




  • From Keith Thompson@3:633/10 to All on Thu Nov 27 15:59:13 2025
    bart <bc@freeuk.com> writes:
    On 27/11/2025 10:43, David Brown wrote:
    [...]
    uint64_t get_exponent(double x) {
        return ((union { double d; uint64_t u;}) { x }.u >> 52)
               & ((1ull << (62 - 52 + 1)) - 1);
    }
    That compiles (with gcc on x86-64) to :
        movq rax, xmm0
        shr rax, 52
        and eax, 2047
        ret
    There's nothing in C that suggests this must be put in memory or do
    anything more than this.

    (This only seems to work with gcc. Clang and MSVS don't like it.)

    How exactly did clang and msvs express their dislike? What versions are
    you using?

    On my systems, it works correctly with gcc 13.3.0, clang 18.1.3,
    tcc 0.9.27, Microsoft Visual Studio 2022 17.14.20.

    If your problem is that you're using older compilers that don't support compound literals, it would have saved some time if you had said so.

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

  • From bart@3:633/10 to All on Fri Nov 28 00:11:53 2025
    On 27/11/2025 23:59, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    On 27/11/2025 10:43, David Brown wrote:
    [...]
    uint64_t get_exponent(double x) {
        return ((union { double d; uint64_t u;}) { x }.u >> 52)
               & ((1ull << (62 - 52 + 1)) - 1);
    }
    That compiles (with gcc on x86-64) to :
        movq rax, xmm0
        shr rax, 52
        and eax, 2047
        ret
    There's nothing in C that suggests this must be put in memory or do
    anything more than this.

    (This only seems to work with gcc. Clang and MSVS don't like it.)

    How exactly did clang and msvs express their dislike? What versions are
    you using?

    On my systems, it works correctly with gcc 13.3.0, clang 18.1.3,
    tcc 0.9.27, Microsoft Visual Studio 2022 17.14.20.

    If your problem is that you're using older compilers that don't support compound literals, it would have saved some time if you had said so.


    I said in a followup that I'd been using a C++ compiler by mistake (this
    was on Godbolt).

    That gcc's C++ compiler accepted the code wasn't helpful.

  • From Keith Thompson@3:633/10 to All on Thu Nov 27 16:39:51 2025
    bart <bc@freeuk.com> writes:
    On 27/11/2025 23:59, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    On 27/11/2025 10:43, David Brown wrote:
    [...]
    uint64_t get_exponent(double x) {
        return ((union { double d; uint64_t u;}) { x }.u >> 52)
               & ((1ull << (62 - 52 + 1)) - 1);
    }
    [...]
    How exactly did clang and msvs express their dislike? What versions
    are
    you using?
    On my systems, it works correctly with gcc 13.3.0, clang 18.1.3,
    tcc 0.9.27, Microsoft Visual Studio 2022 17.14.20.
    If your problem is that you're using older compilers that don't
    support
    compound literals, it would have saved some time if you had said so.

    Can you *please* do something about the way your newsreader
    (apparently Mozilla Thunderbird) mangles quoted text? That first
    quoted line, starting with "> How exactly", would have been just
    74 columns, but your newsreader folded it, making it more difficult
    to read. It also deletes blank lines between paragraphs.

    I don't recall similar problems from other Thunderbird users.

    I said in a followup that I'd been using a C++ compiler by mistake
    (this was on Godbolt).

    That gcc's C++ compiler accepted the code wasn't helpful.

    But not surprising, since as you know gcc (and likewise g++) is
    not fully conforming by default. If you're compiling code with the
    purpose of making a point about the language, invoke the compiler
    in standard-conforming mode. And if a compiler "doesn't like"
    the code you feed it, at least show us the diagnostic messages.

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

  • From bart@3:633/10 to All on Fri Nov 28 01:49:44 2025
    On 28/11/2025 00:39, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    On 27/11/2025 23:59, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    On 27/11/2025 10:43, David Brown wrote:
    [...]
    uint64_t get_exponent(double x) {
        return ((union { double d; uint64_t u;}) { x }.u >> 52)
               & ((1ull << (62 - 52 + 1)) - 1);
    }
    [...]
    How exactly did clang and msvs express their dislike? What versions
    are
    you using?
    On my systems, it works correctly with gcc 13.3.0, clang 18.1.3,
    tcc 0.9.27, Microsoft Visual Studio 2022 17.14.20.
    If your problem is that you're using older compilers that don't
    support
    compound literals, it would have saved some time if you had said so.

    Can you *please* do something about the way your newsreader
    (apparently Mozilla Thunderbird) mangles quoted text? That first
    quoted line, starting with "> How exactly", would have been just
    74 columns, but your newsreader folded it, making it more difficult
    to read. It also deletes blank lines between paragraphs.

    I don't recall similar problems from other Thunderbird users.

    I don't see anything amiss with quoted content in my own posts. My last
    post looks like this to me:

    https://github.com/sal55/langs/blob/master/tbird.png

    In any case, I've no idea how to fix the problem, assuming it is at my end.


  • From Janis Papanagnou@3:633/10 to All on Fri Nov 28 03:33:49 2025
    On 11/27/25 18:59, bart wrote:
    On 27/11/2025 17:38, Ike Naar wrote:
    On 2025-11-27, bart <bc@freeuk.com> wrote:
    Well, let's stick with C. Here are some features I use, and the C
    equivalents (A has whatever type is needed):

    M                      C
    -------------------------------------------------------------
    [snip]
    A.odd                  A & 1, or A % 1

    "A % 1" ?

    I guess A % 2 then.

    You guess? - LOL - okay. :-)

    Note my remark about error proneness later on.

    Higher level abstractions (usually found in higher level languages)
    are always less error prone than low-level (or composed) constructs.

    "C" is inherently and by design a comparably low-level language, so
    I wonder what you complain here about. (You won't change that.)

    'even' and 'odd' are higher level abstractions than bit-operations,
    and they are also _special cases_ (nonetheless useful; I like them,
    and I appreciate if they are present in any language). The general
    case of the terms like "odd" and "even" is defined mathematically,
    though; so the natural way of describing them would (IMO) rather be
    based on 'x mod 2 = 1' and 'x mod 2 = 0' respectively. (So the "C"
    syntax with '%' is probably more "appropriate". Mileages may vary.)

    You can of course add as many commodity features to "your language"
    as you like. I seem to recall that one of the design principles of
    "C" was to not add too many keywords. (Not sure whether 'A.odd' is
    a function or keyword above [in "your language"].) Omitting to add
    special case operators or functions for things that can simply be
    expressed by the respective arithmetic or boolean counterparts is
    not an unreasonable language-detail design decision.[*]

    You made a mistake above (or just a typo), never mind. I suppose it
    stems from your primary "thinking in bits". - This is not meant to
    be offensive. - Back in university days (I still remember!) I made
    a similar typo but vice versa; I wanted to express "div 2" in some
    assembler language and accidentally wrote "shift-right 2", the same
    type of typo but the other way round. I *knew*, and didn't "guess",
    though, that "shift-right 1" would have been correct. ;-)

    Janis

    [*] Compare to Algol 68 that introduced everything ("including the
    kitchen sink"), and even in multiple variants! - A design decision
    that is also not appreciated by everyone.

    PS: BTW, I was always wondering why Pascal and Algol 68 supported
    'odd' but not 'even'! - In the documents of the Genie compiler we
    can read: "This is a relic of times long past.", but beyond that
    it doesn't explain why it's a "relic". I can only guess that it's,
    as a special case, considered just unnecessary in the presence of
    the modulus operator.

  • From Keith Thompson@3:633/10 to All on Thu Nov 27 19:36:22 2025
    bart <bc@freeuk.com> writes:
    On 28/11/2025 00:39, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    On 27/11/2025 23:59, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    On 27/11/2025 10:43, David Brown wrote:
    [...]
    uint64_t get_exponent(double x) {
        return ((union { double d; uint64_t u;}) { x }.u >> 52)
               & ((1ull << (62 - 52 + 1)) - 1);
    }
    [...]
    How exactly did clang and msvs express their dislike? What versions
    are
    you using?
    On my systems, it works correctly with gcc 13.3.0, clang 18.1.3,
    tcc 0.9.27, Microsoft Visual Studio 2022 17.14.20.
    If your problem is that you're using older compilers that don't
    support
    compound literals, it would have saved some time if you had said so.
    Can you *please* do something about the way your newsreader
    (apparently Mozilla Thunderbird) mangles quoted text? That first
    quoted line, starting with "> How exactly", would have been just
    74 columns, but your newsreader folded it, making it more difficult
    to read. It also deletes blank lines between paragraphs.
    I don't recall similar problems from other Thunderbird users.

    I don't see anything amiss with quoted content in my own posts. My
    last post looks like this to me:

    https://github.com/sal55/langs/blob/master/tbird.png

    In any case, I've no idea how to fix the problem, assuming it is at my end.

    My apologies, the problem doesn't appear to be on your end.

    I saved your post from my newsreader (Gnus), and the quoted text
    was correctly formatted in the saved copy. The lines were not
    unevenly wrapped, and blank lines between paragraphs were preserved.
    The formatting is messed up when I view the article in Gnus, but ok
    when I view it in Thunderbird.

    Relevant headers in your article are:

    Content-Type: text/plain; charset=UTF-8; format=flowed
    Content-Transfer-Encoding: 8bit
    ...
    User-Agent: Mozilla Thunderbird
    ...
    Content-Language: en-GB

    I think the "format=flowed" might be an issue (I suggest it's
    not ideal for Usenet posts), but yours aren't the only posts that
    use that.

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

  • From David Brown@3:633/10 to All on Fri Nov 28 09:46:56 2025
    On 27/11/2025 23:15, Michael S wrote:
    On Thu, 27 Nov 2025 21:15:53 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    On 27/11/2025 15:02, Michael S wrote:
    On Thu, 27 Nov 2025 14:02:38 +0100
    David Brown <david.brown@hesbynett.no> wrote:



    MSVC compilers compile your code and produce correct result, but the
    code
    looks less nice:
    0000000000000000 <get_exponent>:
    0: f2 0f 11 44 24 08 movsd %xmm0,0x8(%rsp)
    6: 48 8b 44 24 08 mov 0x8(%rsp),%rax
    b: 48 c1 e8 34 shr $0x34,%rax
    f: 25 ff 07 00 00 and $0x7ff,%eax
    14: c3 ret

    Although on old AMD processors it is likely faster than nicer code
    generated by gcc and clang. On newer processor gcc code is likely a
    bit better, but the difference is unlikely to be detected by simple
    measurements.

    I think it is unlikely that this version - moving from xmm0 to rax
    via memory instead of directly - is faster on any processor. But I
    fully agree that it is unlikely to be a measurable difference in
    practice.

    I wonder, how do you have the nerve "to think" about things that you have absolutely no idea about?

    I think about many things - and these are things I /do/ know about. But
    I don't know all the details, and am happy to be corrected and learn more.


    Instead of "thinking" you could just as well open Optimization
    Reference manuals of AMD Bulldozer family or of Bobcat. Or to read
    Agner Fog's instruction tables. Move from XMM to GPR on these
    processors is very slow: 8 clocks on BD, 7 on BbC.


    Okay. But storing data to memory from xmm0 is also going to be slow,
    and loading it to rax from memory is going to be slow. I am not an
    expert at the x86 world or reading Fog's tables, but it looks to me that
    on a Bulldozer, storing from xmm0 to memory has a latency of 6 cycles
    and reading the memory into rax has a latency of 4 cycles. That adds up
    to more than the 8 cycles for the direct register transfer, and I expect
    (but do not claim to know for sure!) that the dependency limits the
    scope for pipeline overlap - decode and address calculations can be
    done, but the data can't be fetched until the previous store is complete.

    So all in all, my estimate was, I think, quite reasonable. There may be unusual circumstances on particular cores if the instruction scheduling
    and pipelining, combined with the stack engine, make that sequence
    faster than the single register move.

    I've now had a short look at the relevant table from Fog's site. My conclusion from that is that the register move - though surprisingly
    slow - is probably marginally faster than passing it through memory.
    Perhaps if I spend enough time studying the details, I might find out
    more and discover that I was wrong. But that would be an extraordinary
    effort to learn about a meaningless little detail of a long-gone processor.

    I am also fairly confident that the function as a whole will be faster
    with the register move since you will get better overlap and
    superscaling with the call and return sequence when the instructions in
    the middle don't access the stack.

    Of curiosity, I compiled the code with gcc and "-march=bdver1", which I believe is the correct flag for that processor. It generated the
    register move version, but with a "vmovq" instruction instead of "movq".
    I don't know if there is any difference there - x86 instruction naming
    seems to have a certain degree of variance. (gcc's models of
    scheduling, pipelining and timing for processors is far from perfect,
    but the gcc folks do study Agner Fog's publications as well as having contributors from AMD and Intel.)

    More interesting, however, was that with "-march=bdver2" (up to bdver4)
    gcc changed the "shr / and" sequence to a single "bextr" instruction. I didn't see that on other -march choices. It seems the two-instruction shift-and-mask is faster than a single bit-extract instruction on most
    x86 processors.

    All in all, it is a lesson on how small details of architectures can
    make a difference.

    BTW, AMD K8 has the opposite problem. Move from XMM to GPR is reasonably fast, but move from GPR to XMM is painfully slow.

    On the other hand, moves "via memory" are reasonably fast on these
    CPUs (except, may be, Bobcat? I am not sure about it), because data
    does not really travels through memory or through cache. Load-store forwarding picks the data directly from the store queue.


    Yes, and there can be even more specialised short-cuts for stack data.




  • From David Brown@3:633/10 to All on Fri Nov 28 11:41:21 2025
    On 27/11/2025 18:13, bart wrote:
    On 27/11/2025 13:02, David Brown wrote:
    On 27/11/2025 13:20, bart wrote:


    I'm snipping most of this, because I don't think we are getting anywhere except down angry rabbit holes. Most of what we both have written has
    been said many times before, and I don't want to re-hash old fights.
    They bring out the worst in both of us, and we both get frustrated and annoyed. I'd rather reset the conversation before it gets out of hand,
    and go back to exchanging opinions and ideas, and helping out.


    (This only seems to work with gcc. Clang and MSVS don't like it.)


    I think you are mistaken. clang is fine with it. It is standard C99,
    so any decent C compiler from the last 25 years will handle it fine.
    MS gave up on bothering to make C compilers before the turn of the
    century (they make a reasonable enough C++ compiler). Even your hero
    tcc is fine with it (though on my attempts, it produces rubbish code -
    maybe it needs different flags for optimisation). The C code is not
    made invalid by the existence of C90-only compilers.
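    The code being discussed is not quoted in this excerpt. As a rough
    reconstruction only (the function name get_exponent appears in a later
    disassembly, and MSVC's C4116 warning about an unnamed union type is
    mentioned downthread), the C99 compound-literal union idiom might look
    something like this:

    ```c
    #include <stdint.h>

    /* Hypothetical reconstruction: extract the 11-bit biased exponent of
       an IEEE-754 binary64 double by type-punning through a union inside
       a compound literal. Valid C99, but the unnamed union type defined
       in parentheses is what triggers MSVC's warning C4116. */
    static unsigned get_exponent(double d)
    {
        uint64_t bits = (union { double d; uint64_t u; }){ .d = d }.u;
        return (unsigned)((bits >> 52) & 0x7FF);
    }
    ```

    For example, get_exponent(1.0) gives the bias value 1023, since 1.0 is
    stored with an unbiased exponent of zero.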

    I was mistaken. I used godbolt.org but it was set to C++. Presumably gcc
    has some C++ extensions that make it valid.

    You are not the first person to mix that up on godbolt.org, with a
    different language and/or compiler from what you thought you had.

    I usually make a point of explicitly specifying the standard in the
    command line arguments - that means there is no doubt about what I am
    asking for. And if you specify a C standard with g++, you will get an
    error message (unless you also use "-x c" to tell g++ that you have C code).

    My standard options are :

    -std=c17 -Wall -Wextra -Wpedantic -O2


    Of course I will vary the standard according to need - so for looking at _BitInt, I have -std=c23. I sometimes use -std=gnu17 or similar when I specifically want to use gcc extensions - in which case "-Wpedantic" is basically pointless. And for C++, I use appropriate C++ standards.
    Note that without an explicit "-std=" option, gcc will use a "gnuXX"
    version that depends on the compiler version. Thus gcc extensions are accepted by default.

    "-Wall -Wextra" enable lots of warnings. For real work, I have
    fine-tuned warning sets - I don't want all of these sets, and I want
    some warnings that are not in these sets, but they give a good starting
    point for code snippets on godbolt.

    "-Wpedantic" gives warnings on deviations from the standard. It will
    give you warnings if you accidentally use a gcc extension (such as using compound literals in C++). I don't think gcc is perfectly conforming
    with "-std=c?? -Wpedantic", but it is as close as any other compiler I
    have seen, and is IMHO the best starting point for checks.

    And I almost always have -O2. -O3 can sometimes lead to overwhelming
    amounts of extra inline code that make the assembly hard to follow. -O0 generates unreadably bad assembly. -O1 is easier to follow. But for
    me, -O2 is generally the sweet spot. I have no real interest in using a compiler that doesn't do decent optimisation - if I am happy with slow
    code, I'll use Python.


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Fri Nov 28 13:12:17 2025
    On Fri, 28 Nov 2025 09:46:56 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    On 27/11/2025 23:15, Michael S wrote:
    On Thu, 27 Nov 2025 21:15:53 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    On 27/11/2025 15:02, Michael S wrote:
    On Thu, 27 Nov 2025 14:02:38 +0100
    David Brown <david.brown@hesbynett.no> wrote:



    MSVC compilers compile your code and produce correct result, but
    the code
    looks less nice:
    0000000000000000 <get_exponent>:
    0: f2 0f 11 44 24 08 movsd %xmm0,0x8(%rsp)
    6: 48 8b 44 24 08 mov 0x8(%rsp),%rax
    b: 48 c1 e8 34 shr $0x34,%rax
    f: 25 ff 07 00 00 and $0x7ff,%eax
    14: c3 ret

    Although on old AMD processors it is likely faster than nicer code
    generated by gcc and clang. On newer processor gcc code is likely
    a bit better, but the difference is unlikely to be detected by
    simple measurements.

    I think it is unlikely that this version - moving from xmm0 to rax
    via memory instead of directly - is faster on any processor. But I
    fully agree that it is unlikely to be a measurable difference in
    practice.

    I wonder, how do you have a nerve "to think" about things that you
    have absolutely no idea about?

    I think about many things - and these are things I /do/ know about.
    But I don't know all the details, and am happy to be corrected and
    learn more.


    Instead of "thinking" you could just as well open Optimization
    Reference manuals of AMD Bulldozer family or of Bobcat. Or to read
    Agner Fog's instruction tables. Move from XMM to GPR on these
    processors is very slow: 8 clocks on BD, 7 on BbC.


    Okay. But storing data to memory from xmm0 is also going to be slow,
    and loading it to rax from memory is going to be slow. I am not an
    expert at the x86 world or reading Fog's tables, but it looks to me
    that on a Bulldozer, storing from xmm0 to memory has a latency of 6
    cycles and reading the memory into rax has a latency of 4 cycles.
    That adds up to more than the 8 cycles for the direct register
    transfer, and I expect (but do not claim to know for sure!) that the dependency limits the scope for pipeline overlap - decode and address calculations can be done, but the data can't be fetched until the
    previous store is complete.

    So all in all, my estimate was, I think, quite reasonable. There may
    be unusual circumstances on particular cores if the instruction
    scheduling and pipelining, combined with the stack engine, make that
    sequence faster than the single register move.


    It seems, you are correct in this particular case.
    Latency tables, esp. those that are measured by software rather
    than supplied by designer, are problematic in case of moves between
    registers of different types, memory stores of all types and even
    memory loads, with the exception of memory loads into a GPR. Agner explains why
    they are problematic in the preface to his tables. In short, there is no
    direct way to measure these things in isolation, so one has to measure
    the latency of a sequence of instructions and then apply either
    guesswork or the manufacturer's docs to somehow divide the combined
    latency into its individual parts.

    So, the best way is to go by recommendations of the vendor in Opt.
    Reference Manual.
    There are no relevant recommendations for K8, unfortunately. I suspect
    that all methods are slow here.
    For Bobcat, there should be recommendations, but I don't have them and
    too lazy to look for.

    For Family 10h (Barcelona and derivatives):
    "When moving data from a GPR to an MMX or XMM register, use separate
    store and load instructions to move the data first from the source
    register to a temporary location in memory and then from memory into
    the destination register, taking the memory latency into account when scheduling both stages of the load-store sequence.

    When moving data from an MMX or XMM register to a general-purpose
    register, use the MOVD instruction.

    Whenever possible, use loads and stores of the same data length. (See
    5.3, "Store-to-Load Forwarding Restrictions" on page 74 for more
    information.)"

    For Family 15h (Bulldozer and derivatives):
    "When moving data from a GPR to an XMM register, use separate store and
    load instructions to move the data first from the source register to a temporary location in memory and then from memory into the destination register, taking the memory latency into account when scheduling both
    stages of the load-store sequence.

    When moving data from an XMM register to a general-purpose register,
    use the VMOVD instruction.

    Whenever possible, use loads and stores of the same data length. (See
    6.3, "Store-to-Load Forwarding Restrictions" on page 98 for more
    information.)"

    So, for both families, vendor recommends register move in direction from
    SIMD to GPR and Store/Load sequence in direction from GPR to SIMD.
    The suspect point here is the specific mention of the VEX-encoded form
    (VMOVD) in the case of BD. It can mean that the "legacy" (SSE-encoded) form is
    slower, or it can mean nothing. I suspect the latter.
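    The two strategies the manuals describe can be sketched at the C level
    with SSE2 intrinsics (x86-64 only; function names are illustrative, and
    which instruction sequence the compiler actually emits still depends on
    -march and optimisation settings):

    ```c
    #include <stdint.h>
    #include <string.h>
    #include <emmintrin.h>  /* SSE2 intrinsics */

    /* XMM -> GPR as a direct register move (movd/movq class). */
    static uint64_t xmm_to_gpr_direct(__m128d v)
    {
        return (uint64_t)_mm_cvtsi128_si64(_mm_castpd_si128(v));
    }

    /* XMM -> GPR through memory: the store/load sequence, analogous to
       what the manuals recommend for the GPR -> XMM direction. */
    static uint64_t xmm_to_gpr_via_memory(__m128d v)
    {
        double tmp;
        uint64_t u;
        _mm_store_sd(&tmp, v);       /* movsd to a stack slot */
        memcpy(&u, &tmp, sizeof u);  /* plain 64-bit GPR load */
        return u;
    }

    /* GPR -> XMM as a direct register move. */
    static __m128i gpr_to_xmm_direct(uint64_t u)
    {
        return _mm_cvtsi64_si128((long long)u);
    }
    ```

    Both XMM-to-GPR variants produce the same value; only the instruction
    sequence (and hence the per-family latency) differs.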

    I've now had a short look at the relevant table from Fog's site. My conclusion from that is that the register move - though surprisingly
    slow - is probably marginally faster than passing it through memory.
    Perhaps if I spend enough time studying the details, I might find out
    more and discover that I was wrong. But that would be an
    extraordinary effort to learn about a meaningless little detail of a long-gone processor.

    I am also fairly confident that the function as a whole will be
    faster with the register move since you will get better overlap and superscaling with the call and return sequence when the instructions
    in the middle don't access the stack.

    Out of curiosity, I compiled the code with gcc and "-march=bdver1", which
    I believe is the correct flag for that processor. It generated the
    register move version, but with a "vmovq" instruction instead of
    "movq". I don't know if there is any difference there - x86
    instruction naming seems to have a certain degree of variance.
    (gcc's models of scheduling, pipelining and timing for processors are
    far from perfect, but the gcc folks do study Agner Fog's publications
    as well as having contributors from AMD and Intel.)

    More interesting, however, was that with "-march=bdver2" (up to
    bdver4) gcc changed the "shr / and" sequence to a single "bextr"
    instruction. I didn't see that on other -march choices. It seems
    the two-instruction shift-and-mask is faster than a single bit
    extract instruction on most x86 processors.

    All in all, it is a lesson on how small details of architectures can
    make a difference.


    Zen3 has its own can of worms in the area of moving data between
    GPR and SIMD. The issues here are more subtle than those mentioned
    above. And unfortunately they are almost completely undocumented in the
    manuals. And although the issues are subtle, the performance impact can be
    very significant.

    I encountered these things when implementing alternative
    (to those currently in use by gcc) IEEE binary128 arithmetic routines.
    My conclusion was that designers of binary128 ABI in general and of ABI
    of support routines in particular made a serious mistake by treating
    binary128 (a.k.a. __float128, a.k.a _Float128, a.k.a. 'long double' on
    ARM64) as "floating-point" type that is passed around in XMM registers
    (or Neon registers on ARM64). Both passing it in pair of GPRs and via
    memory would be significantly faster on AMD processors and detectably
    faster on Intel processors.


    BTW, AMD K8 has the opposite problem. Move from XMM to GPR is
    reasonably fast, but move from GPR to XMM is painfully slow.

    On the other hand, moves "via memory" are reasonably fast on these
    CPUs (except, maybe, Bobcat? I am not sure about it), because data
    does not really travel through memory or through the cache. Load-store forwarding picks the data directly from the store queue.


    Yes, and there can be even more specialised short-cuts for stack data.






    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Fri Nov 28 12:45:58 2025
    On 28/11/2025 12:12, Michael S wrote:
    On Fri, 28 Nov 2025 09:46:56 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    On 27/11/2025 23:15, Michael S wrote:
    On Thu, 27 Nov 2025 21:15:53 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    On 27/11/2025 15:02, Michael S wrote:
    On Thu, 27 Nov 2025 14:02:38 +0100
    David Brown <david.brown@hesbynett.no> wrote:



    MSVC compilers compile your code and produce correct result, but
    the code
    looks less nice:
    0000000000000000 <get_exponent>:
    0: f2 0f 11 44 24 08 movsd %xmm0,0x8(%rsp)
    6: 48 8b 44 24 08 mov 0x8(%rsp),%rax
    b: 48 c1 e8 34 shr $0x34,%rax
    f: 25 ff 07 00 00 and $0x7ff,%eax
    14: c3 ret

    Although on old AMD processors it is likely faster than nicer code
    generated by gcc and clang. On newer processor gcc code is likely
    a bit better, but the difference is unlikely to be detected by
    simple measurements.

    I think it is unlikely that this version - moving from xmm0 to rax
    via memory instead of directly - is faster on any processor. But I
    fully agree that it is unlikely to be a measurable difference in
    practice.

    I wonder, how do you have a nerve "to think" about things that you
    have absolutely no idea about?

    I think about many things - and these are things I /do/ know about.
    But I don't know all the details, and am happy to be corrected and
    learn more.


    Instead of "thinking" you could just as well open Optimization
    Reference manuals of AMD Bulldozer family or of Bobcat. Or to read
    Agner Fog's instruction tables. Move from XMM to GPR on these
    processors is very slow: 8 clocks on BD, 7 on BbC.


    Okay. But storing data to memory from xmm0 is also going to be slow,
    and loading it to rax from memory is going to be slow. I am not an
    expert at the x86 world or reading Fog's tables, but it looks to me
    that on a Bulldozer, storing from xmm0 to memory has a latency of 6
    cycles and reading the memory into rax has a latency of 4 cycles.
    That adds up to more than the 8 cycles for the direct register
    transfer, and I expect (but do not claim to know for sure!) that the
    dependency limits the scope for pipeline overlap - decode and address
    calculations can be done, but the data can't be fetched until the
    previous store is complete.

    So all in all, my estimate was, I think, quite reasonable. There may
    be unusual circumstances on particular cores if the instruction
    scheduling and pipelining, combined with the stack engine, make that
    sequence faster than the single register move.


    It seems, you are correct in this particular case.
    Latency tables, esp. those that are measured by software rather
    than supplied by designer, are problematic in case of moves between
    registers of different types, memory stores of all types and even
    memory loads, with the exception of memory loads into a GPR. Agner explains why
    they are problematic in the preface to his tables. In short, there is no direct way to measure these things in isolation, so one has to measure
    the latency of a sequence of instructions and then apply either
    guesswork or the manufacturer's docs to somehow divide the combined
    latency into its individual parts.


    Well, if even Agner thinks it is difficult, then I don't feel bad for
    having trouble!

    So, the best way is to go by recommendations of the vendor in Opt.
    Reference Manual.
    There are no relevant recommendations for K8, unfortunately. I suspect
    that all methods are slow here.
    For Bobcat, there should be recommendations, but I don't have them and
    too lazy to look for.


    Fair enough. It is not information that is likely to be useful to
    anyone here, so it's all for fun and interest. I certainly wouldn't
    want you to spend effort finding out the details just for me.

    For Family 10h (Barcelona and derivatives):
    "When moving data from a GPR to an MMX or XMM register, use separate
    store and load instructions to move the data first from the source
    register to a temporary location in memory and then from memory into
    the destination register, taking the memory latency into account when scheduling both stages of the load-store sequence.

    When moving data from an MMX or XMM register to a general-purpose
    register, use the MOVD instruction.

    Whenever possible, use loads and stores of the same data length. (See
    5.3, "Store-to-Load Forwarding Restrictions" on page 74 for more information.)"

    How much does advice like this take into account surrounding code?
    That's what makes generating optimal code /really/ hard. And it means micro-optimising a short instruction sequence can be ineffective for real-world code. After all, no one is actually interested in minimising
    the number of nanoseconds it takes to extract the exponent of a floating
    point number - the speed only matters if you are doing lots of these,
    probably in a big loop with data moving into and out of memory all the time.

    This stuff was all /so/ much easier when we used PIC's and AVR's...


    For Family 15h (Bulldozer and derivatives):
    "When moving data from a GPR to an XMM register, use separate store and
    load instructions to move the data first from the source register to a temporary location in memory and then from memory into the destination register, taking the memory latency into account when scheduling both
    stages of the load-store sequence.

    When moving data from an XMM register to a general-purpose register,
    use the VMOVD instruction.

    Whenever possible, use loads and stores of the same data length. (See
    6.3, "Store-to-Load Forwarding Restrictions" on page 98 for more information.)"

    So, for both families, vendor recommends register move in direction from
    SIMD to GPR and Store/Load sequence in direction from GPR to SIMD.
    The suspect point here is the specific mention of the VEX-encoded form
    (VMOVD) in the case of BD. It can mean that the "legacy" (SSE-encoded) form is
    slower, or it can mean nothing. I suspect the latter.

    I've now had a short look at the relevant table from Fog's site. My
    conclusion from that is that the register move - though surprisingly
    slow - is probably marginally faster than passing it through memory.
    Perhaps if I spend enough time studying the details, I might find out
    more and discover that I was wrong. But that would be an
    extraordinary effort to learn about a meaningless little detail of a
    long-gone processor.

    I am also fairly confident that the function as a whole will be
    faster with the register move since you will get better overlap and
    superscaling with the call and return sequence when the instructions
    in the middle don't access the stack.

    Out of curiosity, I compiled the code with gcc and "-march=bdver1", which
    I believe is the correct flag for that processor. It generated the
    register move version, but with a "vmovq" instruction instead of
    "movq". I don't know if there is any difference there - x86
    instruction naming seems to have a certain degree of variance.
    (gcc's models of scheduling, pipelining and timing for processors are
    far from perfect, but the gcc folks do study Agner Fog's publications
    as well as having contributors from AMD and Intel.)

    More interesting, however, was that with "-march=bdver2" (up to
    bdver4) gcc changed the "shr / and" sequence to a single "bextr"
    instruction. I didn't see that on other -march choices. It seems
    the two-instruction shift-and-mask is faster than a single bit
    extract instruction on most x86 processors.

    All in all, it is a lesson on how small details of architectures can
    make a difference.


    Zen3 has its own can of worms in the area of moving data between
    GPR and SIMD. The issues here are more subtle than those mentioned
    above. And unfortunately they are almost completely undocumented in the
    manuals. And although the issues are subtle, the performance impact can be
    very significant.

    I encountered these things when implementing alternative
    (to those currently in use by gcc) IEEE binary128 arithmetic routines.
    My conclusion was that designers of binary128 ABI in general and of ABI
    of support routines in particular made a serious mistake by treating binary128 (a.k.a. __float128, a.k.a _Float128, a.k.a. 'long double' on
    ARM64) as "floating-point" type that is passed around in XMM registers
    (or Neon registers on ARM64). Both passing it in pair of GPRs and via
    memory would be significantly faster on AMD processors and detectably
    faster on Intel processors.

    I can believe that. If you have to implement floating point routines in general integer hardware (and I expect that is the case for most of your implementation here) then I would think it is better to start and end
    with the data in GPR's. On some targets, moving data into and out of
    floating point or vector registers is efficient enough that those
    registers can effectively be used as caches, but it sounds like that is
    not the case here.



    BTW, AMD K8 has the opposite problem. Move from XMM to GPR is
    reasonably fast, but move from GPR to XMM is painfully slow.

    On the other hand, moves "via memory" are reasonably fast on these
    CPUs (except, may be, Bobcat? I am not sure about it), because data
    does not really travel through memory or through the cache. Load-store
    forwarding picks the data directly from the store queue.


    Yes, and there can be even more specialised short-cuts for stack data.







    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Fri Nov 28 11:49:40 2025
    On 28/11/2025 02:33, Janis Papanagnou wrote:
    On 11/27/25 18:59, bart wrote:
    On 27/11/2025 17:38, Ike Naar wrote:
    On 2025-11-27, bart <bc@freeuk.com> wrote:
    Well, let's stick with C. Here are some features I use, and the C
    equivalents (A has whatever type is needed):

         M                  C
         -------------------------------------------------------------
    [snip]
         A.odd              A & 1, or A % 1

    "A % 1" ?

    I guess A % 2 then.

    You guess? - LOL - okay. :-)

    Note my remark about error proneness later on.

    Higher level abstractions (usually found in higher level languages)
    are always less error prone than low-level (or composed) constructs.

    "C" is inherently and by design a comparably low-level language, so
    I wonder what you complain here about. (You won't change that.)

    So is mine. But it has many more 'commodity' features that make life
    simpler. Plus a generally cleaner syntax to make it clearer.


    'even' and 'odd' are higher level abstractions than bit-operations,
    and they are also _special cases_ (nonetheless useful; I like them,
    and I appreciate if they are present in any language). The general
    case of the terms like "odd" and "even" is defined mathematically,
    though;

    The advantage of using '.odd' is that the language doesn't specify how
    it works, just the behaviour.

    (But internally, 'A.odd' is an alias for 'A.[0]', and 'A.even' is one
    for 'not A.[0]', but with the extra proviso that these are read-only:
    while `A.[0] := x' is possible, you can't do 'A.odd := x'.)


    so the natural way of describing them would (IMO) rather be
    based on 'x mod 2 = 1' and 'x mod 2 = 0' respectively. (So the "C"
    syntax with '%' is probably more "appropriate". Mileages may vary.)

    I've made the mistake with % 1 more than once.


    You can of course add as many commodity features to "your language"
    as you like. I seem to recall that one of the design principles of
    "C" was to not add too many keywords. (Not sure whether 'A.odd' is
    a function or keyword above [in "your language"].)

    It is a reserved word, which means it can't be used as either a
    top-level user identifier, or a member name. With extra effort, it could
    be used for both, but that needs some special syntax, such as Ada-style "A'odd"; I've never got around to it.

    In Pascal (where I copied it from) it is a reserved word.

    You made a mistake above (or just a typo), never mind. I suppose it
    stems from your primary "thinking in bits". - This is not meant to
    be offensive. - Back in university days (I still remember!) I made
    a similar typo but vice versa; I wanted to express "div 2" in some
    assembler language and accidentally wrote "shift-right 2", the same
    type of typo but the other way round. I *knew*, and didn't "guess",
    though, that "shift-right 1" would have been correct. ;-)

    I use a decimal type in another language. There bitwise operations don't
    work. I would have to define what they might do. For example, the possibilities for `123 << 2` might be:

    - Not valid (how it works now)
    - 12300 (shift decimal digits)
    - 492 (shift 'binary' digits)

    That last simply defines 'A << n' as meaning 'A * 2**n'.
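    For ordinary binary integers that last definition coincides with the
    built-in shift operator, which is easy to check (a minimal sketch;
    shl_as_multiply is an illustrative name):

    ```c
    #include <stdint.h>

    /* Defines A << n as A * 2**n; for binary integers this matches the
       built-in shift, so 123 << 2 comes out as 123 * 4 = 492. */
    static int64_t shl_as_multiply(int64_t a, unsigned n)
    {
        int64_t p = 1;
        while (n--)
            p *= 2;          /* 2**n without pow() */
        return a * p;
    }
    ```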

    PS: BTW, I was always wondering why Pascal and Algol 68 supported
    'odd' but not 'even'! - In the documents of the Genie compiler we
    can read: "This is a relic of times long past.", but beyond that
    it doesn't explain why it's a "relic". I can only guess that it's,
    as a special case, considered just unnecessary in the presence of
    the modulus operator.

    Maybe because you can trivially define 'even' as 'not odd'.

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Fri Nov 28 15:33:55 2025
    On Fri, 28 Nov 2025 12:45:58 +0100
    David Brown <david.brown@hesbynett.no> wrote:


    I can believe that. If you have to implement floating point routines
    in general integer hardware (and I expect that is the case for most
    of your implementation here) then I would think it is better to start
    and end with the data in GPR's. On some targets, moving data into
    and out of floating point or vector registers is efficient enough
    that those registers can effectively be used as caches, but it sounds
    like that is not the case here.


    On Windows the only problem is moving data between different types of registers.
    On SysV things are worse: there is also a problem of absence of
    caller-saved FP/SIMD registers. In theory, the problem could have been
    solved by defining specialized ABI for support routines (__addtf3,
    __subtf3, __multf3, etc...), but that was not done either.

    I think that it all comes from the old mental model of soft floating
    point routines being very slow; so slow that ABI impedance mismatches are
    lost in the noise. But in the specific case of binary128 on modern CPUs, it's
    simply not true - the arithmetic itself is quite fast, so ABI mismatches are significant.


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Fri Nov 28 14:46:08 2025
    On 28/11/2025 11:49, bart wrote:
    On 28/11/2025 02:33, Janis Papanagnou wrote:
    On 11/27/25 18:59, bart wrote:
    On 27/11/2025 17:38, Ike Naar wrote:
    On 2025-11-27, bart <bc@freeuk.com> wrote:
    Well, let's stick with C. Here are some features I use, and the C
    equivalents (A has whatever type is needed):

         M                  C
         -------------------------------------------------------------
    [snip]
         A.odd              A & 1, or A % 1

    "A % 1" ?

    I guess A % 2 then.

    You guess? - LOL - okay. :-)

    Note my remark about error proneness later on.

    Higher level abstractions (usually found in higher level languages)
    are always less error prone than low-level (or composed) constructs.

    "C" is inherently and by design a comparably low-level language, so
    I wonder what you complain here about. (You won't change that.)

    So is mine. But it has many more 'commodity' features that make life simpler. Plus a generally cleaner syntax to make it clearer.

    I didn't answer your (JP's) question.

    When I mention such micro-features of mine, the response is always overwhelmingly negative (even if I subsequently reveal they are inspired
    by other languages).

    In this thread, in response to a use-case of small BitInt types, I
    suggested a more general set of bit-operations that didn't involve
    employing the type system.

    But apparently, even in the world's most famous and truly 'bare-metal'
    systems language, accessing the underlying bits of machine types is a
    rarely used, niche feature.




    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Fri Nov 28 15:47:41 2025
    On 28/11/2025 14:33, Michael S wrote:
    On Fri, 28 Nov 2025 12:45:58 +0100
    David Brown <david.brown@hesbynett.no> wrote:


    I can believe that. If you have to implement floating point routines
    in general integer hardware (and I expect that is the case for most
    of your implementation here) then I would think it is better to start
    and end with the data in GPR's. On some targets, moving data into
    and out of floating point or vector registers is efficient enough
    that those registers can effectively be used as caches, but it sounds
    like that is not the case here.


    On Windows the only problem is moving data between different types of registers.
    On SysV things are worse: there is also a problem of absence of
    caller-saved FP/SIMD registers. In theory, the problem could have been
    solved by defining specialized ABI for support routines (__addtf3,
    __subtf3, __multf3, etc...), but that was not done either.

    I think that it all comes from the old mental model of soft floating
    point routines being very slow; so slow that ABI impedance mismatches are
    lost in the noise. But in the specific case of binary128 on modern CPUs, it's
    simply not true - the arithmetic itself is quite fast, so ABI mismatches are significant.


    My only real experience with software floating point (using it, not
    writing it) is on systems where they are either slow (like 32-bit
    Cortex-M ARMs), or /very/ slow (like an 8-bit AVR). A little
    inefficiency in the main ABI's is, as you say, just noise in these cases.

    But in those systems, the floating point arithmetic routines were part
    of the compiler support library. Functions there don't have to abide by
    the platform ABI - they can use different registers according to what
    suits best. Were you working on a library that integrates into the
    compiler, or was it more "user level" (like a C++ "binary128" class with operator overrides) ?

    ABI's are obviously useful for standardisation and intermixing of code
    from different tools. But they can also be a pain, especially when they
    are old and outdated or designed to be efficient on different processors
    or with different kinds of code. I am finding the EABI for 32-bit ARM
    to be a serious performance drain for some kinds of work. It doesn't
    support passing anything bigger than 32-bit in registers, except for
    "long long int" and "unsigned long long int". It has the same
    restriction on return values. That means if you have something like a
    C++ optional<uint32_t> type, or equivalent struct in C, it's all passed
    back and forth on the stack. And unlike the AMD processors you mention,
    on a Cortex-M core that is a lot slower!
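    A minimal C sketch of the kind of type being described (names are
    illustrative; under the base 32-bit ARM AAPCS, a composite result
    larger than 4 bytes like this one is returned via a memory pointer
    rather than in r0/r1, unlike a plain uint64_t):

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    /* A C equivalent of C++ optional<uint32_t>: 8 bytes with padding,
       so the 32-bit ARM EABI returns it through memory. */
    typedef struct {
        uint32_t value;
        bool     has_value;
    } opt_u32;

    /* Addition that reports overflow instead of silently wrapping. */
    static opt_u32 checked_add(uint32_t a, uint32_t b)
    {
        uint32_t sum = a + b;   /* unsigned wrap is well-defined */
        return (opt_u32){ .value = sum, .has_value = sum >= a };
    }
    ```

    On a Cortex-M, every call to a function like checked_add pays for a
    hidden result pointer and a stack round-trip, which is the performance
    drain described above.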



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From James Kuyper@3:633/10 to All on Fri Nov 28 09:48:06 2025
    On 2025-11-27 12:38, Ike Naar wrote:
    On 2025-11-27, bart <bc@freeuk.com> wrote:
    Well, let's stick with C. Here are some features I use, and the C
    equivalents (A has whatever type is needed):

    M C
    -------------------------------------------------------------
    [snip]
    A.odd A & 1, or A % 1

    "A % 1" ?

    Probably a typo for A % 2.

    Note to bart: A%2 has a value of -1 for odd negative numbers. In many
    contexts (#if, !, &&, ||, ?:, if(), for(), while(), do while(),
    assert(), or static_assert()), all that matters is that it's not equal
    to 0. However, any time you're looking at the actual value, A&1 and A%2
    are not equivalent.
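    A small illustration of the difference (is_odd_mod and is_odd_and are
    illustrative names; the A & 1 form assumes two's complement, which C23
    finally guarantees):

    ```c
    #include <stdbool.h>

    /* A % 2 truncates toward zero, so for negative odd A it yields -1;
       A & 1 yields 1. Both are nonzero, so the two agree when used as
       conditions, but not when the value itself is inspected. */
    static bool is_odd_mod(int a) { return a % 2 != 0; }
    static bool is_odd_and(int a) { return (a & 1) != 0; }
    ```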


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From BGB@3:633/10 to All on Fri Nov 28 13:09:15 2025
    On 11/27/2025 2:15 PM, David Brown wrote:
    On 27/11/2025 15:02, Michael S wrote:
    On Thu, 27 Nov 2025 14:02:38 +0100
    David Brown <david.brown@hesbynett.no> wrote:



    MSVC compilers compile your code and produce correct result, but the
    code
    looks less nice:
    0000000000000000 <get_exponent>:
        0:   f2 0f 11 44 24 08       movsd  %xmm0,0x8(%rsp)
        6:   48 8b 44 24 08          mov    0x8(%rsp),%rax
        b:   48 c1 e8 34             shr    $0x34,%rax
        f:   25 ff 07 00 00          and    $0x7ff,%eax
       14:   c3                      ret

    Although on old AMD processors it is likely faster than the nicer code
    generated by gcc and clang. On newer processors the gcc code is likely
    a bit better, but the difference is unlikely to be detected by simple
    measurements.

    I think it is unlikely that this version - moving from xmm0 to rax via
    memory instead of directly - is faster on any processor. But I fully
    agree that it is unlikely to be a measurable difference in practice.


    Also MSVC compiler does not like your style and produces following
    warning:
    dave_b.c(5): warning C4116: unnamed type definition in parentheses

    Warnings are a matter of taste. There's nothing wrong with my code, but
    it may be against some code styles.


    BTW, I don't like your style either. My preferred code will look
    very similar to the code of Waldek Hebisch except that I'd declare
    d_to_u() static.
    I don't like union trick. Not just in this particular context, but
    generally. memcpy() much cleaner in expressing programmer's intentions.


    I particularly don't like using unions in compound literals like this
    either - it was just to make a compact demonstration. I'd write real
    code in more re-usable bits with static inline functions.

    I disagree, however, that memcpy() shows intent better. The intention
    is not to copy it to memory - the intention is to access the underlying
    bit representation as a different type. A type-punning union is at
    least, if not more, clear for that purpose (IMHO - and judgements of
    style and clarity are very much a matter of opinion).


    FWIW, BGBCC allows:
    double f;
    u64 uli;
    uli=(u64)((__m64)f);
    And, you can extract an exponent as:
    uli[62:52]
    But, this is pretty nonstandard...


    Here, "val[hi:lo]" works on pretty much any integer type, with the behavior:
    If it is a normal integer type, will return a zero-extended value of the
    same type as the input (so, similar to what shift and mask would do).

    If the input type is a _BitInt or _UBitInt, the result will also be
    _BitInt or _UBitInt with the same width as the bitfield selector.

    It is possible to select a single bit:
    uli[63]
    But, this is only valid for _BitInt and _UBitInt; if a normal
    integer type is used here, it makes more sense to assume the user had
    mistyped something, so "uli[63:63]" would be needed for a single-bit
    extract in this case.

    It is also possible to compose values of _UBitInt and similar, say:
    _UBitInt(24) rgb24;
    _UBitInt(16) rgb5;
    rgb5=(_UBitInt(16)) { 0b0u1, rgb24[23:19], rgb24[15:11], rgb24[7:3] };

    Etc...

    Some of this was partly inspired by Verilog in my case.

    Partial merit in this case is that, beyond just being slightly more
    concise and readable than a more traditional shifts-and-masks approach,
    also makes it easier for the compiler to generate more efficient code...

    Though, for this particular scenario, my ISA has a specialized CPU
    instruction for RGB24 to RGB555 that is faster than manually repacking
    the bits, so alas...


    But, also there are special optional instructions for bitfield moves
    that would allow the above to be expressed in 3 CPU instructions (vs the
    11 or so instructions that would be needed with a more traditional
    approach, or 8 if one gets clever with how they use shifts).

    However, with the syntax, without the special CPU instruction, it can
    infer the 8-instruction construct, and other similar constructs, in
    cases where it might have otherwise been too much mental effort for a
    human programmer (and where inferring this from shifts and masks is
    also asking too much from the compiler...).

    ...



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Fri Nov 28 19:46:43 2025
    On 28/11/2025 10:41, David Brown wrote:


    But for me, -O2 is generally the sweet spot. I have no real interest
    in using a compiler that doesn't do decent optimisation - if I am
    happy with slow code, I'll use Python.


    That's like saying that if you can't go at 100mph, you're happy to walk!

    There's no compromise at all?

    I've taken a task (decode JPEG) which uses the same algorithm across
    three languages, and applied it to the same input. These are the
    runtimes, expressed in relative MPH:

                                            Drive 1 mile:
      gcc -O3  C       108   mph                     33s
      gcc -O2  C       100   mph                     36s
      mm       M        77   mph (my lang)           47s
      bcc      C        55   mph (my product)     1m 05s
      tcc      C        25   mph                  2m 24s
      CPython  Python    0.8 mph             1h 15m 00s

    Actually, forget walking: you'd rather crawl on your hands and knees!

    (The figure for PyPy for this task, which has lots of long loops to get
    stuck into, is 19 mph, but the speedup is generally unpredictable.)



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Fri Nov 28 21:58:04 2025
    On 28/11/2025 20:46, bart wrote:
    On 28/11/2025 10:41, David Brown wrote:


    But for me, -O2 is generally the sweet spot. I have no real
    interest in using a compiler that doesn't do decent optimisation - if
    I am happy with slow code, I'll use Python.


    That's like saying that if you can't go at 100mph, you're happy to walk!

    There's no compromise at all?

    My work is mainly on microcontrollers, where efficient code is critical
    (x86 processors are good at running unoptimised code quickly,
    microcontrollers are not). And some of my work is programming on PC's,
    where it is rarely important - it makes more sense to use a language
    targeting faster development time than faster runtime. (The bulk of the
    time spent when running Python code is in libraries, OS calls, waiting
    for disks, IO, networks, etc.)

    I'm sure plenty of people have use for "medium speed" languages, but I
    don't see it for what I do.

    Actually, the same goes for travelling. I'm happy to go out for a walk,
    but if I am trying to get somewhere at a distance, I'll drive. I've
    never thought "what I really want here to go to the shops is a car with
    a max speed of 30 mph".


    I've taken a task (decode JPEG) which uses the same algorithm across
    three languages, and applied it to the same input. These are the
    runtimes, expressed in relative MPH:

                                            Drive 1 mile:
      gcc -O3  C       108   mph                     33s
      gcc -O2  C       100   mph                     36s
      mm       M        77   mph (my lang)           47s
      bcc      C        55   mph (my product)     1m 05s
      tcc      C        25   mph                  2m 24s
      CPython  Python    0.8 mph             1h 15m 00s

    Actually, forget walking: you'd rather crawl on your hands and knees!

    (The figure for PyPy for this task, which has lots of long loops to get stuck into, is 19 mph, but the speedup is generally unpredictable.)



    I don't write jpeg decoders on PC's. I very rarely write code that has
    to be fast on a PC. (It has happened occasionally - but usually then I
    use existing fast code like numpy to do the heavy lifting.) On the few occasions that I write C or C++ code on PC's, I use optimisation. For
    one thing, it gives better static error checking. And while I probably
    am not too bothered about the speed differences, it's just hard for me
    to purposefully and pointlessly pessimise code.



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Fri Nov 28 22:43:36 2025
    On 28/11/2025 19:09, BGB wrote:

    It is also possible to compose values of _UBitInt and similar, say:
      _UBitInt(24) rgb24;
      _UBitInt(16) rgb5;
      rgb5=(_UBitInt(16)) { 0b0u1, rgb24[23:19], rgb24[15:11], rgb24[7:3] };

    This has given me an idea for an extended feature. Here, I would use
    rgb.[x] syntax, where x is maybe 23, or 23..19, and rgb.[x, y] just
    means rgb.[x].[y].

    The latter is not that useful however; suppose that rgb.[x, y] actually combines rgb.[x] and rgb.[y]. That could then be used to express your
    example like this:

    rgb24.[23..19, 15..11, 7..3]

    So the 3 distinct 5-bit bitfields are concatenated into one 15-bit field.

    However the exact meaning and ordering would still need pinning down,
    and there are various questions to be answered. I also think that such
    extraction can be a separate feature from packing multiple sub-word
    values into one result.

    I think this might be worth looking at. But I'm still not keen on
    relying on the type system to give you the lengths of those fields. (In
    my language, rgb.[23..19] is extracted into an i64 value so its bitfield
    info is lost.)



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Keith Thompson@3:633/10 to All on Fri Nov 28 15:23:23 2025
    bart <bc@freeuk.com> writes:
    On 28/11/2025 02:33, Janis Papanagnou wrote:
    [...]
    You can of course add as many commodity features to "your language"
    as you like. I seem to recall that one of the design principles of
    "C" was to not add too many keywords. (Not sure whether 'A.odd' is
    a function or keyword above [in "your language"].)

    It is a reserved word, which means it can't be used as either a
    top-level user identifier, or a member name. With extra effort, it
    could be used for both, but that needs some special syntax, such as
    Ada-style "A'odd"; I've never got around to it.

    In Pascal (where I copied it from) it is a reserved word.

    <OT>In Pascal, "odd" is not a reserved word. It's the name of a
    predefined function.</OT>

    [...]

    PS: BTW, I was always wondering why Pascal and Algol 68 supported
    'odd' but not 'even'! - In the documents of the Genie compiler we
    can read: "This is a relic of times long past.", but beyond that
    it doesn't explain why it's a "relic". I can only guess that it's,
    as a special case, considered just unnecessary in the presence of
    the modulus operator.

    Maybe because you can trivially define 'even' as 'not odd'.

    Or maybe because odd(n) can be implemented as "treat the low-order bit
    of the argument as a Boolean".

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Sat Nov 29 00:08:46 2025
    On 28/11/2025 23:23, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    On 28/11/2025 02:33, Janis Papanagnou wrote:
    [...]
    You can of course add as many commodity features to "your language"
    as you like. I seem to recall that one of the design principles of
    "C" was to not add too many keywords. (Not sure whether 'A.odd' is
    a function or keyword above [in "your language"].)

    It is a reserved word, which means it can't be used as either a
    top-level user identifier, or a member name. With extra effort, it
    could be used for both, but that needs some special syntax, such as
    Ada-style "A'odd"; I've never got around to it.

    In Pascal (where I copied it from) it is a reserved word.

    <OT>In Pascal, "odd" is not a reserved word. It's the name of a
    predefined function.</OT>

    So what's a 'reserved word' then? To me it is something not available as
    a user-identifier because it has a special meaning in the language,
    which may be that of a predefined function among other things.




    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Janis Papanagnou@3:633/10 to All on Sat Nov 29 03:26:49 2025
    On 28/11/2025 12.49, bart wrote:
    On 28/11/2025 02:33, Janis Papanagnou wrote:

    so the natural way of describing them would (IMO) rather be
    based on 'x mod 2 = 1' and 'x mod 2 = 0' respectively. (So the "C"
    syntax with '%' is probably more "appropriate". Mileages may vary.)

    I've made the mistake with % 1 more than once.

    (If you know in what areas you commonly make mistakes you can
    work on that! - Just a suggestion to think about.)


    You can of course add as many commodity features to "your language"
    as you like. I seem to recall that one of the design principles of
    "C" was to not add too many keywords. (Not sure whether 'A.odd' is
    a function or keyword above [in "your language"].)

    It is a reserved word, which means it can't be used as either a top-
    level user identifier, or a member name. With extra effort, it could be
    used for both, but that needs some special syntax, such as Ada-style "A'odd"; I've never got around to it.

    In Pascal (where I copied it from) it is a reserved word.

    As far as I recall, in Pascal it's a predefined function! - The
    difference is that you cannot use reserved words as identifiers.
    (It's similar, but not necessarily, with keywords; depending on
    the language.)

    That was basically also the background of my explanation; to my
    knowledge "C" didn't want to introduce too many reserved words
    that as a consequence then cannot be used as "language entity"
    names (like variables, function names, etc.) any more. - That's
    why introducing simple high-level functions unnecessarily may be
    deprecated.


    PS: BTW, I was always wondering why Pascal and Algol 68 supported
    'odd' but not 'even'! - In the documents of the Genie compiler we
    can read: "This is a relic of times long past.", but beyond that
    it doesn't explain why it's a "relic". I can only guess that it's,
    as a special case, considered just unnecessary in the presence of
    the modulus operator.

    Maybe because you can trivially define 'even' as 'not odd'.

    But it's the same with 'odd'; you can trivially write it as a
    boolean or as an arithmetic expression, whatever one prefers.

    And that also doesn't explain why 'odd' is considered a "relic"
    by Marcel. (I can only explain that opinion as I've done above.)
    The point in Algol 68 is, though, even more relaxed; since you
    have stropping there the conflicts of keywords with identifiers
    aren't what they are in other languages.

    Janis


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Janis Papanagnou@3:633/10 to All on Sat Nov 29 03:32:36 2025
    On 29/11/2025 03.26, Janis Papanagnou wrote:
    ...

    That was basically also the background of my explanation; to my
    knowledge "C" didn't want to introduce too many reserved words
    that as a consequence then cannot be used as "language entity"
    names (like variables, function names, etc.) any more. - That's
    why introducing simple high-level functions unnecessarily may be
    deprecated.

    Please ignore the last sentence. - I was speaking about reserved
    words or keywords and not about function names in the context of
    the paragraph. - So it depends in what way you introduce elements
    like 'odd'. As a "C" function it wouldn't matter much. In case of
    "your language" - where you say it's a keyword! - it would matter,
    though!

    Janis


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Keith Thompson@3:633/10 to All on Fri Nov 28 19:38:06 2025
    bart <bc@freeuk.com> writes:
    On 28/11/2025 23:23, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    On 28/11/2025 02:33, Janis Papanagnou wrote:
    [...]
    You can of course add as many commodity features to "your language"
    as you like. I seem to recall that one of the design principles of
    "C" was to not add too many keywords. (Not sure whether 'A.odd' is
    a function or keyword above [in "your language"].)

    It is a reserved word, which means it can't be used as either a
    top-level user identifier, or a member name. With extra effort, it
    could be used for both, but that needs some special syntax, such as
    Ada-style "A'odd"; I've never got around to it.

    In Pascal (where I copied it from) it is a reserved word.
    <OT>In Pascal, "odd" is not a reserved word. It's the name of a
    predefined function.</OT>

    So what's a 'reserved word' then? To me it is something not available
    as a user-identifier because it has a special meaning in the language,
    which may be that of a predefined function among other things.

    Right. The name "odd" is available as a user-defined identifier.
    If you define something named "odd" in Pascal, it hides the
    predefined function of that name.

    You can think of Pascal's predefined functions as being declared
    in an outer scope, surrounding the main program. Pascal's rules
    for declarations in inner scopes hiding identifiers in outer scopes
    are similar to C's.

    (C has no predefined functions.)

    If there's more to say about this, I suggest comp.lang.misc or comp.lang.pascal.misc.

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Sat Nov 29 11:24:19 2025
    On 29/11/2025 03:38, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    On 28/11/2025 23:23, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    On 28/11/2025 02:33, Janis Papanagnou wrote:
    [...]
    You can of course add as many commodity features to "your language"
    as you like. I seem to recall that one of the design principles of
    "C" was to not add too many keywords. (Not sure whether 'A.odd' is
    a function or keyword above [in "your language"].)

    It is a reserved word, which means it can't be used as either a
    top-level user identifier, or a member name. With extra effort, it
    could be used for both, but that needs some special syntax, such as
    Ada-style "A'odd"; I've never got around to it.

    In Pascal (where I copied it from) it is a reserved word.
    <OT>In Pascal, "odd" is not a reserved word. It's the name of a
    predefined function.</OT>

    So what's a 'reserved word' then? To me it is something not available
    as a user-identifier because it has a special meaning in the language,
    which may be that of a predefined function among other things.

    Right. The name "odd" is available as a user-defined identifier.
    If you define something named "odd" in Pascal, it hides the
    predefined function of that name.

    I did test it with a toy Pascal compiler I have. Defining 'odd' as a
    variable didn't work, but that was for other reasons.


    You can think of Pascal's predefined functions as being declared
    in an outer scope, surrounding the main program.

    I took 'predefined functions' to mean 'built-in functions' (effectively, operators with function-like syntax) that cannot be overridden.

    So 'odd' is not a reserved word in Pascal; I was mistaken.

    (My opinion is that being able to shadow fundamental language features
    is undesirable. Being able to reuse them as user identifiers is another matter, but that would involve tricks with syntax or context to avoid ambiguity.)



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Sat Nov 29 12:24:12 2025
    On 29/11/2025 02:32, Janis Papanagnou wrote:
    On 29/11/2025 03.26, Janis Papanagnou wrote:
    ...

    That was basically also the background of my explanation; to my
    knowledge "C" didn't want to introduce too many reserved words
    that as a consequence then cannot be used as "language entity"
    names (like variables, function names, etc.) any more. - That's
    why introducing simple high-level functions unnecessarily may be
    deprecated.

    Please ignore the last sentence. - I was speaking about reserved
    words or keywords and not about function names in the context of
    the paragraph. - So it depends in what way you introduce elements
    like 'odd'. As a "C" function it wouldn't matter much. In case of
    "your language" - where you say it's a keyword! - it would matter,
    though!


    My syntax actually has a stropping mechanism, but it is applied to user-identifiers.

    That's for when you really want to use a reserved word as an identifier, for example if porting code from another language, or machine translation.
    So this is possible:

    int `odd := 3

    if `odd.odd then

    It is also case-preserving (syntax is usually case-insensitive):

    int `int, `INT, `Int # three different variables

    And (I've just discovered this), it can be used when identifiers either
    start with a digit, or are numbers:

    int `1234 := 1235

    But this is generally ugly and undesirable; you only do this as a last
    resort. (The feature is most heavily used in machine-generated assembly.)


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Sat Nov 29 14:45:30 2025
    On 29/11/2025 12:24, bart wrote:
    On 29/11/2025 03:38, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    On 28/11/2025 23:23, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    On 28/11/2025 02:33, Janis Papanagnou wrote:
    [...]
    You can of course add as many commodity features to "your language"
    as you like. I seem to recall that one of the design principles of
    "C" was to not add too many keywords. (Not sure whether 'A.odd' is
    a function or keyword above [in "your language"].)

    It is a reserved word, which means it can't be used as either a
    top-level user identifier, or a member name. With extra effort, it
    could be used for both, but that needs some special syntax, such as
    Ada-style "A'odd"; I've never got around to it.

    In Pascal (where I copied it from) it is a reserved word.
    <OT>In Pascal, "odd" is not a reserved word. It's the name of a
    predefined function.</OT>

    So what's a 'reserved word' then? To me it is something not available
    as a user-identifier because it has a special meaning in the language,
    which may be that of a predefined function among other things.

    Right. The name "odd" is available as a user-defined identifier.
    If you define something named "odd" in Pascal, it hides the
    predefined function of that name.

    I did test it with a toy Pascal compiler I have. Defining 'odd' as a variable didn't work, but that was for other reasons.


    You can think of Pascal's predefined functions as being declared
    in an outer scope, surrounding the main program.

    I took 'predefined functions' to mean 'built-in functions' (effectively, operators with function-like syntax) that cannot be overridden.

    So 'odd' is not a reserved word in Pascal; I was mistaken.

    (My opinion is that being able to shadow fundamental language features
    is undesirable. Being able to reuse them as user identifiers is another matter, but that would involve tricks with syntax or context to avoid ambiguity.)



    The issue is where you draw the line of what is a "fundamental language feature", and what is not. For Pascal, "begin" is a fundamental
    language feature, part of the syntax. "odd" is not fundamental - it's
    just a function in Pascal's equivalent of the C standard library.
    So no tricks or special syntax (like "stropping") are needed to re-use
    the identifier for other purposes.

    I agree that using words that are "fundamental" is not good. But if a language provides built-in functions in a global namespace, then it is a serious limitation if these cannot be shadowed or overridden.
    Basically, it means that you are always at risk of conflicts with
    existing code if later language versions add new functions. So if
    someone wrote Pascal code with a local variable called "even", and a
    later version introduced a built-in function "even", then it is critical
    that this is an overrideable or shadowable (if that is a real word!) identifier.

    That's why C is very conservative about adding new keywords, and uses
    reserved namespaces for the purpose - thus C99 added "_Bool", not
    "bool", to avoid conflict with existing code. Only now, over two
    decades later, did the committee feel that uses of the identifier "bool"
    other than as a typedef for _Bool (usually via <stdbool.h>) are so rare
    that C23 could finally have "bool" as a keyword for the type. And they
    still have challenges with good names for standard library functions -
    now in C23, many new ones have names with a "stdc_" prefix.



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)