• Re: _BitInt(N)

    From Philipp Klaus Krause@3:633/10 to All on Sun Nov 23 12:46:10 2025
On 22.10.25 at 14:45, Thiago Adams wrote:


    Is anyone using or planning to use this new C23 feature?
    What could be the motivation?



    Saving memory by using the smallest multiple-of-8 N that will do. Also
    being able to use bit-fields wider than int.

    Saving memory for two reasons:

    * On small embedded systems where there is very little memory
    * For code that needs to be very fast on big systems to make data
    structures fit into cache

    Philipp


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Sun Nov 23 13:59:59 2025
    On 23/11/2025 13:32, Waldek Hebisch wrote:
    Philipp Klaus Krause <pkk@spth.de> wrote:
On 22.10.25 at 14:45, Thiago Adams wrote:


    Is anyone using or planning to use this new C23 feature?
    What could be the motivation?



    Saving memory by using the smallest multiple-of-8 N that will do.

IIUC nothing in the standard says that it is the smallest multiple-of-8.
Using gcc-15.1 on AMD-64 I get 'sizeof(_BitInt(22))' equal to 4,
while the number could fit in 3 bytes.

The rationale mentions a use-case where there is a custom processor that
might actually have a 22-bit hardware type.

Implementing such odd-size types on regular 8/16/32/64-bit hardware is
full of problems if you want to do it without padding (in order to get
the savings). Or even with padding (to get the desired overflow
semantics).

    Such as working out how pointers to them will work.


    Also
    being able to use bit-fields wider than int.

For me the main gain is reasonably standard syntax for integers bigger
than 64 bits.

    Standard syntax I guess would be something like int128_t and int256_t.
    Such wider integers tend to be powers of two.

    But there are two problems with _BitInt:

    * Any odd sizes are allowed, such as _BitInt(391)

    * There appears to be no upper limit on size, so _BitInt(2997901) is a
    valid type

    So what is the result type of multiplying values of those two types?

    Integer sizes greater than 1K or 2K bits should use an arbitrary
    precision type (which is how large _BitInts will likely be implemented anyway), where the precision is a runtime attribute.




    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Sun Nov 23 17:06:54 2025
    On Sun, 23 Nov 2025 13:59:59 +0000
    bart <bc@freeuk.com> wrote:

    On 23/11/2025 13:32, Waldek Hebisch wrote:
    Philipp Klaus Krause <pkk@spth.de> wrote:
On 22.10.25 at 14:45, Thiago Adams wrote:


    Is anyone using or planning to use this new C23 feature?
    What could be the motivation?



    Saving memory by using the smallest multiple-of-8 N that will do.

IIUC nothing in the standard says that it is the smallest multiple-of-8.
Using gcc-15.1 on AMD-64 I get 'sizeof(_BitInt(22))' equal to 4,
while the number could fit in 3 bytes.

The rationale mentions a use-case where there is a custom processor
that might actually have a 22-bit hardware type.

Implementing such odd-size types on regular 8/16/32/64-bit hardware
is full of problems if you want to do it without padding (in order to
get the savings). Or even with padding (to get the desired overflow
semantics).

    Such as working out how pointers to them will work.


    Also
    being able to use bit-fields wider than int.

For me the main gain is reasonably standard syntax for integers bigger
than 64 bits.

    Standard syntax I guess would be something like int128_t and
    int256_t. Such wider integers tend to be powers of two.

    But there are two problems with _BitInt:

    * Any odd sizes are allowed, such as _BitInt(391)

    * There appears to be no upper limit on size, so _BitInt(2997901) is
    a valid type


The upper limit is implementation-defined.
On both existing implementations the limit (on 64-bit targets) appears
to be 2**16 or 2**16-1; I don't remember which.


    So what is the result type of multiplying values of those two types?


I think the traditional C rules for integer types apply here as well: the
type of the result is the same as the type of the wider operand. It is
arithmetically unsatisfactory, but consistent with the rest of the language.
And practically sufficient, because C programmers are already accustomed
to writing statements like:
uint64_t foo(uint32_t x, uint16_t y) { return (uint64_t)x*y; }

So it would be natural for them to write:
_BitInt(1536) foo(_BitInt(1024) x, _BitInt(512) y) {
    return (_BitInt(1536))x * y;
}

Since the pattern is so common already, an optimizing compiler is likely
to understand the meaning and generate only the necessary calculations.
Or, at least, not to generate too many unnecessary calculations.

    Integer sizes greater than 1K or 2K bits should use an arbitrary
    precision type (which is how large _BitInts will likely be
    implemented anyway), where the precision is a runtime attribute.


I think the Standard is written in such a way that implementing _BitInt
as arbitrary-precision numbers, i.e. with the number of bits held as part
of the data, is not allowed. Of course, the Language Support Library can
be (and hopefully is, at least for gcc; clang is messy a.t.m.) based on
arbitrary-precision core routines, but the API used by the compiler should
be similar to GMP's mpn_xxx family of functions rather than GMP's
mpz_xxx family, i.e. the number of bits passed as parameters separate
from the data arrays rather than combined.



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Keith Thompson@3:633/10 to All on Sun Nov 23 14:38:17 2025
    bart <bc@freeuk.com> writes:
    On 23/11/2025 13:32, Waldek Hebisch wrote:
    Philipp Klaus Krause <pkk@spth.de> wrote:
On 22.10.25 at 14:45, Thiago Adams wrote:
    Is anyone using or planning to use this new C23 feature?
    What could be the motivation?

    Saving memory by using the smallest multiple-of-8 N that will do.
IIUC nothing in the standard says that it is the smallest multiple-of-8.
Using gcc-15.1 on AMD-64 I get 'sizeof(_BitInt(22))' equal to 4,
while the number could fit in 3 bytes.

The rationale mentions a use-case where there is a custom processor
that might actually have a 22-bit hardware type.

    What rationale are you referring to? There hasn't been an official ISO
    C Rationale document since C99.

Implementing such odd-size types on regular 8/16/32/64-bit hardware is
full of problems if you want to do it without padding (in order to get
the savings). Or even with padding (to get the desired overflow
semantics).

    Such as working out how pointers to them will work.

    Why would pointers to _BitInt types be a problem? A _BitInt object is
    a fixed-size chunk of memory, similar to a struct object.

    Also being able to use bit-fields wider than int.
For me the main gain is reasonably standard syntax for integers bigger
than 64 bits.

    Standard syntax I guess would be something like int128_t and
    int256_t. Such wider integers tend to be powers of two.

    But there are two problems with _BitInt:

    * Any odd sizes are allowed, such as _BitInt(391)

    Why is that a problem? If you don't want odd-sized types, don't use them.

    * There appears to be no upper limit on size, so _BitInt(2997901) is a
    valid type

    The upper limit is specified by the implementation as BITINT_MAXWIDTH, a
    macro defined in <limits.h>.

    For gcc 15.2.0 on x86_64, BITINT_MAXWIDTH is 65535 (2**16-1).
    For clang 21.1.5 it's 8388608 (2**23 bits, 1048576 bytes).

    clang seems to have some problems with _BitInt(8388608). For example,
    this program:

    #include <limits.h>

    _BitInt(BITINT_MAXWIDTH) n = 42;

int main(void) {
    n *= n;
}

    takes a *long* time to compile with clang. I believe it's generating
    inline code to do the 8388608 by 8388608 bit multiplication.

    So what is the result type of multiplying values of those two types?

    _BitInt types are exempt from the integer promotion rules (so _BitInt(3) doesn't promote to int), but the usual arithmetic conversions apply.
    If you multiply values of two _BitInt types, the result is the wider of
    the two types.

    N3220 is a draft of C23.

    https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf

    Integer sizes greater than 1K or 2K bits should use an arbitrary
    precision type (which is how large _BitInts will likely be implemented anyway), where the precision is a runtime attribute.

    _BitInt(n) objects are fixed-size. Addition and subtraction should be
    fairly straightforward. For multiplication and division, gcc generates
    calls to __mulbitint3 and __divmodbitint4, and clang generates huge
    amounts of inline code. My guess is that future llvm/clang releases
    will handle _BitInt types more efficiently.

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Mon Nov 24 00:30:46 2025
    On 23/11/2025 22:38, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    On 23/11/2025 13:32, Waldek Hebisch wrote:
    Philipp Klaus Krause <pkk@spth.de> wrote:
On 22.10.25 at 14:45, Thiago Adams wrote:
    Is anyone using or planning to use this new C23 feature?
    What could be the motivation?

    Saving memory by using the smallest multiple-of-8 N that will do.
IIUC nothing in the standard says that it is the smallest multiple-of-8.
Using gcc-15.1 on AMD-64 I get 'sizeof(_BitInt(22))' equal to 4,
while the number could fit in 3 bytes.

The rationale mentions a use-case where there is a custom processor
that might actually have a 22-bit hardware type.

    What rationale are you referring to? There hasn't been an official ISO
    C Rationale document since C99.

    See Introduction and Rationale here:

    https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2709.pdf




Implementing such odd-size types on regular 8/16/32/64-bit hardware is
full of problems if you want to do it without padding (in order to get
the savings). Or even with padding (to get the desired overflow
semantics).

    Such as working out how pointers to them will work.

    Why would pointers to _BitInt types be a problem? A _BitInt object is
    a fixed-size chunk of memory, similar to a struct object.

Saving memory was mentioned. Achieving that means having bit-fields that
may not start at bit 0 of a byte, and may cross byte or word boundaries.

    For example, an array of 1M 5-bit values would occupy 1M 8-bit bytes,
    but storing packed values means it would use only 625K bytes.

    Anyway, pointers to individual values, or to some arbitrary element or
    slice of such an array, would need some extra info.


    Also being able to use bit-fields wider than int.
For me the main gain is reasonably standard syntax for integers bigger
than 64 bits.

    Standard syntax I guess would be something like int128_t and
    int256_t. Such wider integers tend to be powers of two.

    But there are two problems with _BitInt:

    * Any odd sizes are allowed, such as _BitInt(391)

    Why is that a problem? If you don't want odd-sized types, don't use them.

It is an unnecessary complication. There will be a lot of extra rules
that may be partly 'implementation defined', so behaviour may vary. And
people WILL use those types because they are there, and likely they
will be inefficient.

    What happens when a 391-bit type, even unsigned, overflows? These larger
    types are likely to use a multiple of 64-bits, and for 391 bits will
    need 7 x 64 bits, of which the last word will have 57 bits of padding.
    It's very messy.

    Specifying a multiple of 64 bits is better; a power of two even better.


    * There appears to be no upper limit on size, so _BitInt(2997901) is a
    valid type

    The upper limit is specified by the implementation as BITINT_MAXWIDTH, a macro defined in <limits.h>.

    For gcc 15.2.0 on x86_64, BITINT_MAXWIDTH is 65535 (2**16-1).
    For clang 21.1.5 it's 8388608 (2**23 bits, 1048576 bytes).

    clang seems to have some problems with _BitInt(8388608). For example,
    this program:

    #include <limits.h>

    _BitInt(BITINT_MAXWIDTH) n = 42;

int main(void) {
    n *= n;
}

    takes a *long* time to compile with clang. I believe it's generating
    inline code to do the 8388608 by 8388608 bit multiplication.

    Now try it with two disparate sizes.

    So what is the result type of multiplying values of those two types?

    _BitInt types are exempt from the integer promotion rules (so _BitInt(3) doesn't promote to int), but the usual arithmetic conversions apply.
    If you multiply values of two _BitInt types, the result is the wider of
    the two types.

    So multiplying even two one-million-bit types could overflow!

    Such limits for /fixed-width/ integers are ridiculous.

    You might say this is no different from defining an array of exactly
    123,456 elements. But the use-cases are very different.

I started going into detail, but I guess you don't care about such
matters or whether the feature makes much sense.



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From BGB@3:633/10 to All on Sun Nov 23 21:39:59 2025
    On 11/23/2025 7:59 AM, bart wrote:
    On 23/11/2025 13:32, Waldek Hebisch wrote:
    Philipp Klaus Krause <pkk@spth.de> wrote:
On 22.10.25 at 14:45, Thiago Adams wrote:


    Is anyone using or planning to use this new C23 feature?
    What could be the motivation?



    Saving memory by using the smallest multiple-of-8 N that will do.

IIUC nothing in the standard says that it is the smallest multiple-of-8.
Using gcc-15.1 on AMD-64 I get 'sizeof(_BitInt(22))' equal to 4,
while the number could fit in 3 bytes.

The rationale mentions a use-case where there is a custom processor that might actually have a 22-bit hardware type.

Implementing such odd-size types on regular 8/16/32/64-bit hardware is
full of problems if you want to do it without padding (in order to get
the savings). Or even with padding (to get the desired overflow
semantics).

    Such as working out how pointers to them will work.


In BGBCC, any size <= 256 bits is padded to the next power-of-two size.
If the size is NPOT (not a power of two), some extra handling exists to
mask/extend it to the requested size.

Sizes larger than 256 bits are padded to the next multiple of 128 bits.

IIRC, GCC and Clang use smaller padding, but I haven't looked into it.



    Also
    being able to use bit-fields wider than int.

For me the main gain is reasonably standard syntax for integers bigger
than 64 bits.

    Standard syntax I guess would be something like int128_t and int256_t.
    Such wider integers tend to be powers of two.

    But there are two problems with _BitInt:

    * Any odd sizes are allowed, such as _BitInt(391)

    * There appears to be no upper limit on size, so _BitInt(2997901) is a
    valid type


    In BGBCC, there is a hard limit of IIRC 16384 bits.


As an extension, it also allows very large literals, though currently
literals larger than 128 bits can only be written in hexadecimal or similar.

This is encoded via suffixes, e.g.:
I, L, LL, U, UI, UL, ULL: normal 32/64-bit;
I128, UI128: 128-bit;
I256, UI256: 256-bit;
other odd sizes map to _BitInt or _UBitInt (unsigned _BitInt).


    Larger decimal numbers could be supported, but for now I don't have a
    strong need for decimal literals beyond 128 bits.

    Implicitly, there is a limit of around 1K bits for literals mostly due
    to normal tokens having a limit of 255 characters. Compound string
    literals have a higher limit of 4096 (standard) or 65536 (implementation).



    Possibly, as a little bit of wonk, internally large literals are
    implemented in the compiler on top of Base85 strings.

Where, say, for integer literals:
48 bits or less: stored directly in compiler-specific tagrefs;
49-64 bits: encoded via an index into a lookup table;
65-128 bits: split into a pair of 64-bit chunks as indices into a lookup table;
129+: a string cosplaying as an integer literal.


    So what is the result type of multiplying values of those two types?


    Typically the max of either input type...


    Integer sizes greater than 1K or 2K bits should use an arbitrary
    precision type (which is how large _BitInts will likely be implemented anyway), where the precision is a runtime attribute.



    Disagree, this would open up a whole big mess.

    Can't have this for similar reasons to why one doesn't have
    variable-sized structs.


    Decided to leave out the whole VLA mess.
    Better to just pretend VLAs don't exist.

    ...



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Mon Nov 24 10:29:30 2025
    On 23/11/2025 16:06, Michael S wrote:
    On Sun, 23 Nov 2025 13:59:59 +0000
    bart <bc@freeuk.com> wrote:

    On 23/11/2025 13:32, Waldek Hebisch wrote:
    Philipp Klaus Krause <pkk@spth.de> wrote:
On 22.10.25 at 14:45, Thiago Adams wrote:


    Is anyone using or planning to use this new C23 feature?
    What could be the motivation?



    Saving memory by using the smallest multiple-of-8 N that will do.

IIUC nothing in the standard says that it is the smallest multiple-of-8.
Using gcc-15.1 on AMD-64 I get 'sizeof(_BitInt(22))' equal to 4,
while the number could fit in 3 bytes.

The rationale mentions a use-case where there is a custom processor
that might actually have a 22-bit hardware type.

Implementing such odd-size types on regular 8/16/32/64-bit hardware
is full of problems if you want to do it without padding (in order to
get the savings). Or even with padding (to get the desired overflow
semantics).

    Such as working out how pointers to them will work.


    Also
    being able to use bit-fields wider than int.

For me the main gain is reasonably standard syntax for integers bigger
than 64 bits.

    Standard syntax I guess would be something like int128_t and
    int256_t. Such wider integers tend to be powers of two.

    But there are two problems with _BitInt:

    * Any odd sizes are allowed, such as _BitInt(391)

    * There appears to be no upper limit on size, so _BitInt(2997901) is
    a valid type


The upper limit is implementation-defined.
On both existing implementations the limit (on 64-bit targets) appears
to be 2**16 or 2**16-1; I don't remember which.


    So what is the result type of multiplying values of those two types?


I think the traditional C rules for integer types apply here as well: the
type of the result is the same as the type of the wider operand. It is
arithmetically unsatisfactory, but consistent with the rest of the language.

    There is one key difference between the _BitInt() types and other
    integer types - with _BitInt(), there are no automatic promotions to
    other integer types. Thus if you are using _BitInt() operands in an arithmetic expression, these are not promoted to "int" or "unsigned int"
    even if they are smaller (lower rank). If you mix _BitInt()'s of
    different sizes, then the smaller one is first converted to the larger
    type. And if _BitInt(N) is mixed with unsigned _BitInt(N), that will
    mean the signed operand is converted to an unsigned _BitInt(N) -
    something that I think is "arithmetically unsatisfactory", as you put it.

And practically sufficient, because C programmers are already accustomed
to writing statements like:
uint64_t foo(uint32_t x, uint16_t y) { return (uint64_t)x*y; }

So it would be natural for them to write:
_BitInt(1536) foo(_BitInt(1024) x, _BitInt(512) y) {
    return (_BitInt(1536))x * y;
}

Since the pattern is so common already, an optimizing compiler is likely
to understand the meaning and generate only the necessary calculations.
Or, at least, not to generate too many unnecessary calculations.

    Integer sizes greater than 1K or 2K bits should use an arbitrary
    precision type (which is how large _BitInts will likely be
    implemented anyway), where the precision is a runtime attribute.


I think the Standard is written in such a way that implementing _BitInt
as arbitrary-precision numbers, i.e. with the number of bits held as part
of the data, is not allowed.

    Correct. _BitInt(N) is a signed integer type with precisely N value
    bits. It can have padding bits if necessary (according to the target
    ABI), but it can't have any other information.

Of course, the Language Support Library can be (and hopefully is, at
least for gcc; clang is messy a.t.m.) based on arbitrary-precision core
routines, but the API used by the compiler should be similar to GMP's
mpn_xxx family of functions rather than GMP's mpz_xxx family, i.e. the
number of bits passed as parameters separate from the data arrays
rather than combined.


Yes, exactly. At the call site, the size of the _BitInt type is always
a known compile-time constant, so it can easily be passed on. Thus:

    _BitInt(N) x;
    _BitInt(M) y;
    _BitInt(NM) z = x * y;

can be implemented as something like:

    __bit_int_signed_mult(NM, (unsigned char *) &z,
    N, (const unsigned char *) &x,
    M, (const unsigned char *) &y);




    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Mon Nov 24 11:17:05 2025
    On 24/11/2025 09:29, David Brown wrote:
    On 23/11/2025 16:06, Michael S wrote:
    On Sun, 23 Nov 2025 13:59:59 +0000
    bart <bc@freeuk.com> wrote:

    So what is the result type of multiplying values of those two types?


    I think, traditional C rules for integer types apply here as well: type
    of result is the same as type of wider operand. It is arithmetically
    unsatisfactory, but consistent with the rest of language.

There is one key difference between the _BitInt() types and other
integer types - with _BitInt(), there are no automatic promotions to
other integer types. Thus if you are using _BitInt() operands in an
arithmetic expression, these are not promoted to "int" or "unsigned
int" even if they are smaller (lower rank). If you mix _BitInt()'s of
different sizes, then the smaller one is first converted to the larger
type.

I think the Standard is written in such a way that implementing _BitInt
as arbitrary-precision numbers, i.e. with the number of bits held as part
of the data, is not allowed.

Correct. _BitInt(N) is a signed integer type with precisely N value
bits. It can have padding bits if necessary (according to the target
ABI), but it can't have any other information.

Of course, the Language Support Library can be (and hopefully is, at
least for gcc; clang is messy a.t.m.) based on arbitrary-precision core
routines, but the API used by the compiler should be similar to GMP's
mpn_xxx family of functions rather than GMP's mpz_xxx family, i.e. the
number of bits passed as parameters separate from the data arrays
rather than combined.


Yes, exactly. At the call site, the size of the _BitInt type is always
a known compile-time constant, so it can easily be passed on. Thus:

    _BitInt(N) x;
    _BitInt(M) y;
    _BitInt(NM) z = x * y;

So what is NM here; is it N+M (the potential maximum size of the
result), or max(N, M)?

    It sounds like the max precision you get will be the latter.


    can be implemented as something like :

    __bit_int_signed_mult(NM, (unsigned char *) &z,
                          N, (const unsigned char *) &x,
                          M, (const unsigned char *) &y);




    How would you write a generic user function that operates on any size
    BitInt? For example:

    _BitInt(?) bi_square(_BitInt(?));

    Even if you passed the size as a parameter, there would be a problem
    with the BitInt type.

    This assumes BitInts are passed and returned by value, but even using
    BitInt* wouldn't help.

This sets it apart from arrays, where you can also define very large,
fixed-size arrays but use a T(*)[] type to write generic functions that
take an additional length parameter.

    This will be for a particular T, but for BitInt, T is also fixed; it
    happens to be an implicit bit type.




    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Mon Nov 24 12:17:58 2025
    On 24/11/2025 01:30, bart wrote:
    On 23/11/2025 22:38, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    On 23/11/2025 13:32, Waldek Hebisch wrote:
    Philipp Klaus Krause <pkk@spth.de> wrote:
On 22.10.25 at 14:45, Thiago Adams wrote:
    Is anyone using or planning to use this new C23 feature?
    What could be the motivation?

    Saving memory by using the smallest multiple-of-8 N that will do.
IIUC nothing in the standard says that it is the smallest multiple-of-8.
Using gcc-15.1 on AMD-64 I get 'sizeof(_BitInt(22))' equal to 4,
while the number could fit in 3 bytes.

The rationale mentions a use-case where there is a custom processor
that might actually have a 22-bit hardware type.

What rationale are you referring to? There hasn't been an official ISO
C Rationale document since C99.

    See Introduction and Rationale here:

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2709.pdf


That's a proposal document, rather than the actual C standard. But it
is useful and relevant here, and explains some of the potential uses of
_BitInt.




Implementing such odd-size types on regular 8/16/32/64-bit hardware is
full of problems if you want to do it without padding (in order to get
the savings). Or even with padding (to get the desired overflow
semantics).

    Such as working out how pointers to them will work.

Why would pointers to _BitInt types be a problem? A _BitInt object is
a fixed-size chunk of memory, similar to a struct object.

Saving memory was mentioned. Achieving that means having bit-fields that
may not start at bit 0 of a byte, and may cross byte or word boundaries.


    No, that is incorrect.

The proposal mentions saving /space/ as relevant in FPGAs - not saving
/memory/. The authors' use-case here is writing code that can be
compiled with a "normal" C compiler on a "normal" target, and also
    compiled to FPGA /hardware/, with the same semantics. In hardware, a
    5-bit by 5-bit single-cycle multiplier is very much smaller than an
    8-bit by 8-bit multiplier, and orders of magnitude smaller than if the
    5-bit integers are promoted to 32-bit before multiplying.

The proposal is not about saving /memory/. It specifically says that a
_BitInt(N) has the same size and alignment as the smallest basic type
that can contain it, until N is greater than 64 bits, at which point
they are contained in an array of int64_t. (The reality is a little
more formal, to handle targets that have other sizes of their basic
types.)

So on a "normal" target, a _BitInt(3) is the same size and alignment as
a uint8_t, a _BitInt(35) is effectively contained in a uint64_t, and an
array of 4 _BitInt(17) on a 32-bit system will take 16 bytes or 128
bits, not 68 bits.

    As far as I can see, the C23 standard does not specify these details,
    and leaves them up to the target ABI. But at the very least, they will
    always take an integer number of bytes - unsigned char. There can never
    be any crossing of byte boundaries.

I expect most "big" implementations to follow the proposal's
recommendation with containers of 8, 16, 32 and 64 bits, then arrays of
    64-bit chunks after that. I expect some smaller targets to be a bit
    more flexible - 8-bit embedded targets are likely to use 8-bit chunks
    for everything, and 16-bit and 32-bit devices will use 16-bit and 32-bit chunks. I have not yet looked for implementations in order to check this.

    Compilers targeting FPGA hardware generation are, by their nature, weird
    in many ways. They will generate N-bit wide logic and registers for
    local data and expressions. How they implement things like arrays in
    memory will probably be very specialised - these are not tools you use
    with arbitrary C code, and almost everything is specially written.


    For example, an array of 1M 5-bit values would occupy 1M 8-bit bytes,
    but storing packed values means it would use only 625K bytes.

    Anyway, pointers to individual values, or to some arbitrary element or
    slice of such an array, would need some extra info.


    Also being able to use bit-fields wider than int.
For me the main gain is reasonably standard syntax for integers bigger
than 64 bits.

    Standard syntax I guess would be something like int128_t and
    int256_t. Such wider integers tend to be powers of two.

    But there are two problems with _BitInt:

    * Any odd sizes are allowed, such as _BitInt(391)

Why is that a problem? If you don't want odd-sized types, don't use
them.

It is an unnecessary complication. There will be a lot of extra rules
that may be partly 'implementation defined', so behaviour may vary. And
people WILL use those types because they are there, and likely they
will be inefficient.

    Why? And why do you talk specifically about odd numbers? I can
    understand your concern about packing arrays of _BitInts that are not multiples of 8, though I hope you now understand that it is not the
    problem you thought it was. However, I see no reason to suppose that _BitInt(5) is any more or less "complicated" than _BitInt(6) just
    because 5 is an odd number!

    A major point of the _BitInt concept is to be able to specify and use
    integers of specific explicit sizes in a way that is as implementation independent as possible. Some aspects of the implementation cannot be
    avoided - such as the size of unsigned char and alignment and padding
    for storage. But the behaviour of the types is entirely independent of
    the implementation. There are no "extra rules" - neither for specific implementations, nor for specific sizes of _BitInt's.

    Efficiency of implementation is, of course, up to the implementation.
    But there is absolutely no reason to suppose that working with a _BitInt
    of size up to the implementation's maximum integer type is going to be
    less efficient than using other types and masking. For larger
    _BitInt's, there are different possible implementation strategies with different pros and cons in regard to efficiency.


    What happens when a 391-bit type, even unsigned, overflows? These larger types are likely to use a multiple of 64-bits, and for 391 bits will
    need 7 x 64 bits, of which the last word will have 57 bits of padding.
    It's very messy.


    It is not messy at all. Signed integer overflow is UB, unsigned integer overflow is wrapping. It's the same as always, and could not be
    simpler, clearer or neater.

    Specifying a multiple of 64 bits is better; a power of two even better.


    You can pick _BitInt sizes as you want - if you want a power of two or multiple of 64, use that. You get exactly the same overflow behaviour.


* There appears to be no upper limit on size, so _BitInt(2997901) is a
  valid type

    The upper limit is specified by the implementation as BITINT_MAXWIDTH, a
    macro defined in <limits.h>.

    For gcc 15.2.0 on x86_64, BITINT_MAXWIDTH is 65535 (2**16-1).
    For clang 21.1.5 it's 8388608 (2**23 bits, 1048576 bytes).

clang seems to have some problems with _BitInt(8388608).  For example,
this program:

#include <limits.h>

_BitInt(BITINT_MAXWIDTH) n = 42;

int main(void) {
    n *= n;
}

takes a *long* time to compile with clang.  I believe it's generating
inline code to do the 8388608 by 8388608 bit multiplication.

    Now try it with two disparate sizes.

I think compiler implementations would do well to pick a max width that
is more realistic for real-world use-cases. (And having 2**16 - 1
instead of 2**16 seems very strange to me.) Even more important for
efficiency is to make a distinction between what sizes work well for
inline code, and what should use more generic library code.


    So what is the result type of multiplying values of those two types?

    _BitInt types are exempt from the integer promotion rules (so _BitInt(3)
    doesn't promote to int), but the usual arithmetic conversions apply.
    If you multiply values of two _BitInt types, the result is the wider of
    the two types.

    So multiplying even two one-million-bit types could overflow!

    Such limits for /fixed-width/ integers are ridiculous.

    Um, I think you might want to re-read and re-phrase that. When you have fixed-width integers, you have a finite range. Try to go beyond that,
    and you have arithmetic overflow. There is no alternative for
    fixed-width integers. It doesn't matter if your integers are 8-bit or a million bits. Integer systems that don't have overflow need arbitrary precision - dynamic allocation for different sizes.


    You might say this is no different from defining an array of exactly
    123,456 elements. But the use-cases are very different.

I started going into details but I guess you don't care about such
matters or whether the feature makes much sense.



    I am not sure what you mean by that.


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Mon Nov 24 13:44:41 2025
    On Mon, 24 Nov 2025 12:17:58 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    The proposal is not about saving /memory/. It specifically says that
    a _BitInt(N) has the same size and alignment as the smallest basic
    type that can contain it, until you get to N greater than 64-bit, in
    which they are contained in an array of int64_t. (The reality is a
    little more formal, to handle targets that have other sizes of their
    basic types.)


    That is a bit unfortunate.
    Compiler support for arrays of 17 to 24bit numbers packed as 3 octet
    per item would have been handy. And not hard at all for compiler to
    implement, at least on architectures that has proper support for
    unaligned access, like x86, POWER, Arm and RISC-V.

    I certainly have real-world applications that use packed arrays like
    that. They could have been written in cleaner and less error-prone
    way if such feature was available.

    I suppose, packed numeric arrays with 5, 6 or 7 octets per item are also
    used by some people, although they are probably less common than my
    case.



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Mon Nov 24 11:45:18 2025
    On 24/11/2025 03:39, BGB wrote:
    On 11/23/2025 7:59 AM, bart wrote:
    On 23/11/2025 13:32, Waldek Hebisch wrote:
    Philipp Klaus Krause <pkk@spth.de> wrote:
    Am 22.10.25 um 14:45 schrieb Thiago Adams:


    Is anyone using or planning to use this new C23 feature?
    What could be the motivation?



    Saving memory by using the smallest multiple-of-8 N that will do.

    IIUC nothing in the standard says that it is smallest multiple-of-8.
Using gcc-15.1 on AMD-64 I get 'sizeof(_BitInt(22))' equal to 4,
while the number could fit in 3 bytes.

    The rationale mentions a use-case where there is a custom processor
    that might actually have a 22-bit hardware types.

    Implementing such odd-size types on regular 8/16/32/64-bit hardware is
    full of problems if you want to do it without padding (in order to get
the savings). Or even with padding (to get the desired overflow
semantics).

    Such as working out how pointers to them will work.


In BGBCC, for any size <= 256 bits, it is padded to the next power-of-2 size. Although if the size is NPOT, some extra handling exists to mask/extend it to the requested size.

    There are two kinds of BitInts: those smaller than 64 bits; and those
    larger than 64 bits, sometimes /much/ larger.

    I had been responding to the claim that those smaller types save memory, compared to using sizes 8/16/32 bits which are commonly available and
    have better hardware support.

    But if a _BitInt(17) is rounded up to 32 bits, there's not going to be
    any saving!

    Here, I wouldn't use the type system at all to define odd-sized fields.
    C already has bitfields within structs, that can be used to efficiently
    pack odd-sized data. But they have lots of restrictions, and are not an independent type.

    (In my stuff, I do the same, but with more control. I also have bitfield-operators that work within ordinary integers. And, in one
    language, arrays of 1/2/4 bits. But again none of these bitfields of
    sub-byte elements are proper types, although those u1/u2/u4 elements
    come close.)


    In BGBCC, there is a hard limit of IIRC 16384 bits.


    As an extension, it also allows for very large literals, though
    currently literals larger than 128 bits can only use hexadecimal or
    similar.

    This is encoded via suffixes, eg:
  I, L, LL, U, UI, UL, ULL: Normal 32/64 bit.
  I128, UI128: 128-bit
  I256, UI256: 256-bit
    other odd sizes map to _BitInt or _UBitInt (unsigned _BitInt).


    Larger decimal numbers could be supported, but for now I don't have a
    strong need for decimal literals beyond 128 bits.

    I did once have a very nice 128-bit type in my systems language, but it
    didn't get enough use to be worth supporting. It was awkward to
    implement too, since each value type took up two registers, or two stack
    slots (in some cases, one of each!)

    But my scripting language has an arbitrary-precision /decimal/ floating
    point type, which can also be used for pure integer calculations.

    I think the maximum range is 10**19000000000 (and a matching minimum). Precision is limited only by memory and runtime, but there are usually
    caps in place otherwise evaluating 1/3 would go on forever.

    This is one is actually more useful and a lot of fun.



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Mon Nov 24 13:57:31 2025
    On Mon, 24 Nov 2025 11:45:18 +0000
    bart <bc@freeuk.com> wrote:

    But my scripting language has an arbitrary-precision /decimal/
    floating point type, which can also be used for pure integer
    calculations.


    Arbitrary-precision floating point? That sounds problematic, regardless
    of base. Unless you don't use the word 'arbitrary' in the same sense
    that it is used, for example, in GMP.
    Gnu MPFR is very careful to never call itself "arbitrary-precision" in
    official docs.


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Mon Nov 24 14:10:33 2025
    On Mon, 24 Nov 2025 00:30:46 +0000
    bart <bc@freeuk.com> wrote:

It is an unnecessary complication. There will be a lot of extra rules
that may be partly 'implementation defined', so behaviour may vary.
And people WILL use those types because they are there, and likely
they will be inefficient.

    What happens when a 391-bit type, even unsigned, overflows? These
    larger types are likely to use a multiple of 64-bits, and for 391
    bits will need 7 x 64 bits, of which the last word will have 57 bits
    of padding. It's very messy.

To me, it does not sound like a problem at all, at least for unsigned
types. Masking out unnecessary MS bits in the MS word is easy.
Even for signed, sign extension of the MS word is not as easy as
masking out, but hardly rocket science. The problem with signed is that
signed overflow is a sacred cow of the temple of the worshipers of
nasal demons. So, the authors of the proposal were afraid of touching it.


    Specifying a multiple of 64 bits is better; a power of two even
    better.


    I strongly disagree. Being able to specify, say, 192-bit integers is
    a useful thing. Esp. in context of multiplication and division.




    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Keith Thompson@3:633/10 to All on Mon Nov 24 04:29:19 2025
    bart <bc@freeuk.com> writes:
    On 23/11/2025 22:38, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    On 23/11/2025 13:32, Waldek Hebisch wrote:
    Philipp Klaus Krause <pkk@spth.de> wrote:
    Am 22.10.25 um 14:45 schrieb Thiago Adams:
    Is anyone using or planning to use this new C23 feature?
    What could be the motivation?

    Saving memory by using the smallest multiple-of-8 N that will do.
    IIUC nothing in the standard says that it is smallest multiple-of-8.
Using gcc-15.1 on AMD-64 I get 'sizeof(_BitInt(22))' equal to 4,
while the number could fit in 3 bytes.

    The rationale mentions a use-case where there is a custom processor
    that might actually have a 22-bit hardware types.
    What rationale are you referring to? There hasn't been an official
    ISO C Rationale document since C99.

    See Introduction and Rationale here:

    https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2709.pdf

    Thanks.

    Implementing such odd-size types on regular 8/16/32/64-bit hardware is
    full of problems if you want to do it without padding (in order to get
the savings). Or even with padding (to get the desired overflow
semantics).

    Such as working out how pointers to them will work.
    Why would pointers to _BitInt types be a problem? A _BitInt object
    is a fixed-size chunk of memory, similar to a struct object.

    Saving memory was mentioned. To achieve that means having bitfields
    that may not start at bit 0 of a byte, and may cross byte- or word-boundaries.

    That part of the rationale appears to be specific to FPGA hardware, not something I know much about.

    For example, an array of 1M 5-bit values would occupy 1M 8-bit bytes,
    but storing packed values means it would use only 625K bytes.

    Anyway, pointers to individual values, or to some arbitrary element or
    slice of such an array, would need some extra info.

On the implementations I have access to (gcc and clang), a _BitInt
object is an ordinary object, with a size that's a whole number of bytes. `unsigned _BitInt(4)`, for example, has 4 value bits and 4 padding bits. (Unless it's a bit-field, but that doesn't give you packed arrays.)

    I can see the benefit of tightly packing multiple bit-precise
    integers into an array, but I don't see a way to do that, either
    with the current gcc and llvm/clang implementations or with the C
    memory model.

    Also being able to use bit-fields wider than int.
    For me main gain is reasonably standard syntax for integers bigger
    that 64 bits.

    Standard syntax I guess would be something like int128_t and
    int256_t. Such wider integers tend to be powers of two.

    But there are two problems with _BitInt:

    * Any odd sizes are allowed, such as _BitInt(391)
    Why is that a problem? If you don't want odd-sized types, don't use
    them.

It is an unnecessary complication. There will be a lot of extra rules
that may be partly 'implementation defined', so behaviour may vary. And
people WILL use those types because they are there, and likely they
will be inefficient.

    Imposing arbitrary restrictions would introduce more unnecessary
    complication. As far as I've been able to tell, odd-sized _BitInt types
    are already implemented (though I've done very little testing).

    What happens when a 391-bit type, even unsigned, overflows? These
    larger types are likely to use a multiple of 64-bits, and for 391 bits
    will need 7 x 64 bits, of which the last word will have 57 bits of
    padding. It's very messy.

    An unsigned _BitInt(391) value wraps around modulo 2**391. In the
    current gcc and clang implementations, it has a size of 56 bytes, with
391 value bits and 57 padding bits. It doesn't seem to be a problem in
    practice.

    Specifying a multiple of 64 bits is better; a power of two even better.

    Then by all means do so. Operations on _BitInt(448) or _BitInt(512)
    might even be more efficient than operations on _BitInt(391).

    If you want the language to restrict allowed widths, how exactly
    would you specify it? Would you allow 32 but not 33? 64 but
    not 65? 72? 80?

    You can impose whatever restrictions you like in your own code.

    * There appears to be no upper limit on size, so _BitInt(2997901) is a
    valid type
    The upper limit is specified by the implementation as
    BITINT_MAXWIDTH, a macro defined in <limits.h>. For gcc 15.2.0 on
    x86_64, BITINT_MAXWIDTH is 65535 (2**16-1). For clang 21.1.5 it's
    8388608 (2**23 bits, 1048576 bytes). clang seems to have some
    problems with _BitInt(8388608). For example, this program: #include
    <limits.h> _BitInt(BITINT_MAXWIDTH) n = 42;
    int main(void) {
    n *= n;
    }
    takes a *long* time to compile with clang. I believe it's generating
    inline code to do the 8388608 by 8388608 bit multiplication.

    Now try it with two disparate sizes.

    Why? llvm/clang currently has a known problem with multiplication
    and division on very large _BitInt types. It shouldn't be too
    difficult for them to correct it. Operations on disparate sizes
    don't add much complexity (the narrower operand is promoted to the
    wider type).

    If you're curious, here's the bug report (I've commented on it),
    but it's an implementation issue, not a language issue.

    https://github.com/llvm/llvm-project/issues/126384

    So what is the result type of multiplying values of those two types?
    _BitInt types are exempt from the integer promotion rules (so
    _BitInt(3) doesn't promote to int), but the usual arithmetic
    conversions apply. If you multiply values of two _BitInt types, the
    result is the wider of the two types.

    So multiplying even two one-million-bit types could overflow!

    Of course. These are fixed-width types, not arbitrary precision types.

    If you want to multiply two _BitInt(1'000'000) values without overflow,
    you can convert to _BitInt(2'000'000) -- if the compiler supports it.
    (Don't expect the code to be efficient, at least for now.)

    Such limits for /fixed-width/ integers are ridiculous.

    I acknowledge that you think so.

    I honestly don't know why the gcc maintainers felt it was worthwhile
    to support _BitInt types up to 65535 bits, or why the llvm/clang
    maintainers decided to support up to 8388608 bits. But that's
    what they've done, and again, you don't have to use it if you don't
    want to. There could easily be a perfectly valid reason that you
    and I are not aware of.

    It's likely that implementing million-bit integers isn't
    significantly harder than implementing thousand-bit integers.

    Bit-precise integers up to, say, 128 or 256 bits seem to be
    implemented reasonably efficiently. How exactly does the fact that
    compilers support much wider types inconvenience you?

    You might say this is no different from defining an array of exactly
    123,456 elements. But the use-cases are very different.

I started going into details but I guess you don't care about such
matters or whether the feature makes much sense.

    On the contrary, I'm curious about it. But if two different compiler
    teams have already done the work of implementing bit-precise
    integers with extremely large and/or odd widths, I can think of
    no reason to complain about it. Even if it doesn't make sense,
    I didn't have to do the work of implementing it.

    Incidentally, something odd happens to quoted text in your followups.
    Blank lines are lost, and paragraphs are reformatted oddly, often
    with alternating long and short lines. Is your newsreader doing
    that, or is it something else? Can you do something about it?

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Mon Nov 24 12:31:44 2025
    On 24/11/2025 11:17, David Brown wrote:
    On 24/11/2025 01:30, bart wrote:

    Saving memory was mentioned. To achieve that means having bitfields
    that may not start at bit 0 of a byte, and may cross byte- or word-
    boundaries.


    No, that is incorrect.

The proposal mentions saving /space/ as relevant in FPGAs - not saving /memory/.

    But I was responding to a suggestion here that one use of _BitInts - presumably for ordinary hardware - was to save memory.

    That's not going to happen if they are simply rounded up to the next power-of-two type.

    If the purpose is, say, a 17-bit type that wraps past values of 131071,
    then that sounds like a lot of extra code needed, for something that
    does not sound that useful. Why modulo 2**17; why not 100,000? Or any
    value more relevant to the task.


  The author's use-case here is in writing code that can be
compiled with a "normal" C compiler on a "normal" target, and also
compiled to FPGA /hardware/, with the same semantics.  In hardware, a
5-bit by 5-bit single-cycle multiplier is very much smaller than an
8-bit by 8-bit multiplier, and orders of magnitude smaller than if the
5-bit integers are promoted to 32-bit before multiplying.

The proposal is not about saving /memory/.  It specifically says that a
_BitInt(N) has the same size and alignment as the smallest basic type
that can contain it, until you get to N greater than 64-bit, in which
case they are contained in an array of int64_t.  (The reality is a
little more formal, to handle targets that have other sizes of their
basic types.)

So on a "normal" target, a _BitInt(3) is the same size and alignment as
a uint8_t, a _BitInt(35) is effectively contained in a uint64_t, and an
array of 4 _BitInt(17) on a 32-bit system will take 16 bytes or 128
bits, not 68 bits.

As far as I can see, the C23 standard does not specify these details,
and leaves them up to the target ABI.  But at the very least, they will
always take an integer number of bytes - unsigned char.  There can never
be any crossing of byte boundaries.

    What about arrays of _BitInt(1), _BitInt(2) and _BitInt(4)? These could actually be practically implemented, with a few restrictions, and could
    save a lot of memory.

Why?  And why do you talk specifically about odd numbers?  I can
understand your concern about packing arrays of _BitInts that are not
multiples of 8, though I hope you now understand that it is not the
problem you thought it was.  However, I see no reason to suppose that
_BitInt(5) is any more or less "complicated" than _BitInt(6) just
because 5 is an odd number!

    I mean odd compared with powers-of-two, or multiples of 8.


A major point of the _BitInt concept is to be able to specify and use
integers of specific explicit sizes in a way that is as implementation
independent as possible.  Some aspects of the implementation cannot be
avoided - such as the size of unsigned char and alignment and padding
for storage.  But the behaviour of the types is entirely independent of
the implementation.  There are no "extra rules" - neither for specific
implementations, nor for specific sizes of _BitInt's.

Efficiency of implementation is, of course, up to the implementation.
But there is absolutely no reason to suppose that working with a _BitInt
of size up to the implementation's maximum integer type is going to be
less efficient than using other types and masking.  For larger
_BitInt's, there are different possible implementation strategies with
different pros and cons in regard to efficiency.


    What happens when a 391-bit type, even unsigned, overflows? These
    larger types are likely to use a multiple of 64-bits, and for 391 bits
    will need 7 x 64 bits, of which the last word will have 57 bits of
    padding. It's very messy.


It is not messy at all.  Signed integer overflow is UB, unsigned
integer overflow is wrapping.  It's the same as always, and could not be
simpler, clearer or neater.

    In my 391-bit example, the top 7 bits will be within a 64-bit word. What values will those extra 57 bits be?

Taking just those 7 bits by themselves, if the value is 1111111, that is:

00000000'00000000'00000000'00000000'00000000'00000000'00000000'01111111

and you do an arithmetic right shift, then you will get 0111111 not
1111111, since the hardware sign bit is bit 63, not bit 6. It needs more
work.


    Such limits for /fixed-width/ integers are ridiculous.

Um, I think you might want to re-read and re-phrase that.  When you have
fixed-width integers, you have a finite range.

    No, I stand by it. There are even different levels of ridiculousness: expecting a language to support a huge fixed integer type like
    int1000000_t (when C only acquired 8/16/32/64-bit types in C99, and
    those still aren't built-in).

    And allowing random sizes such as int817838_t. (See, it seems much
    sillier using this syntax!)

    For such sizes it makes much more sense to acknowledge the existence of arbitrary-precision support, so that the equivalents of int1000000_t and int817838_t would be compatible types. Or you can forget specific widths
    and just have the one bigint type.

(I use such types, but within a library, and there are ways to cap
the precision.)




    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Keith Thompson@3:633/10 to All on Mon Nov 24 04:37:33 2025
    BGB <cr88192@gmail.com> writes:
    [...]
    In BGBCC, there is a hard limit of IIRC 16384 bits.

    As an extension, it also allows for very large literals, though
    currently literals larger than 128 bits can only use hexadecimal or
    similar.

    This is encoded via suffixes, eg:
    I, L, LL, U, UI, UL, ULL: Normal 32/64 bit.
    I128, UI128: 128-bit
    I256, UI256: 256-bit
    other odd sizes map to _BitInt or _UBitInt (unsigned _BitInt).

    In C23, an integer constant with a "wb" or "WB" suffix is of type
    _BitInt(n). One with a "wbu" suffix is of type unsigned _BitInt(n).
The value of n is the smallest that can accommodate the value of the
    constant.

    [...]

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Mon Nov 24 12:56:58 2025
    On 24/11/2025 11:57, Michael S wrote:
    On Mon, 24 Nov 2025 11:45:18 +0000
    bart <bc@freeuk.com> wrote:

    But my scripting language has an arbitrary-precision /decimal/
    floating point type, which can also be used for pure integer
    calculations.


    Arbitrary-precision floating point? That sounds problematic, regardless
    of base. Unless you don't use the word 'arbitrary' in the same sense
    that it is used, for example, in GMP.
    Gnu MPFR is very careful to never call itself "arbitrary-precision" in official docs.


    If you mean problems like repeated multiplies giving ever larger
    numbers, then that will happen also with integers (or rationals).

    If you mean the problems with a divide operation potentially carrying on indefinitely, then a cap needs to be set on that.

I haven't attempted libraries for working out transcendental functions;
    the problems there are in getting a particular precision even if you
    know that in advance.

    But for basic arithmetic, it works extremely well.

    (While it is built-in to my scripting language, it was originally a
    standalone library and has been ported to C. See the bignum.c and
    bignum.h files here:

    https://github.com/sal55/langs/tree/master/bignum

    You can try out division like this:

    #include <stdio.h>
    #include "bignum.h"

    int main() {
    Bignum a, b, c;

    a = bn_makeint(1);
    b = bn_makeint(7);
    c = bn_init();

    bn_div(c, a, b, 1000);
    bn_println(c);
    }

    (Build as 'gcc prog.c bignum.c' etc.)

    You can see that 'bn_div' needs a precision argument: this is the number
    of significant decimal digits. Using 100M here produced 100 million
    digits and took about 6 seconds.)




    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Keith Thompson@3:633/10 to All on Mon Nov 24 05:06:33 2025
    David Brown <david.brown@hesbynett.no> writes:
    [...]
    Yes, exactly. At the call site, the size of the _BitInt type is
    always a known compile-time constant, so it can easily be passed on.
    Thus :

    _BitInt(N) x;
    _BitInt(M) y;
    _BitInt(NM) z = x * y;

    can be implemented as something like :

    __bit_int_signed_mult(NM, (unsigned char *) &z,
    N, (const unsigned char *) &x,
    M, (const unsigned char *) &y);

    That looks like it's supposed to avoid overflow (I'm assuming NM is N + M), but it wouldn't work. The type of a C expression is almost always determined
    by the expression itself, regardless of the context in which it appears.
    The type of x * y is _BitInt(max(N, M)), not _BitInt(N+M), so it can
    overflow even if the full result would fit into z.

    You can do this instead (not tested):

    _BitInt(N) x;
    _BitInt(M) y;
_BitInt(N+M) z = (_BitInt(N+M))x * y;

    (I'm assuming N+M is sufficient, but I might have missed an off-by-one
    error somewhere.)

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Keith Thompson@3:633/10 to All on Mon Nov 24 05:12:12 2025
    bart <bc@freeuk.com> writes:
    On 24/11/2025 09:29, David Brown wrote:
    On 23/11/2025 16:06, Michael S wrote:
    On Sun, 23 Nov 2025 13:59:59 +0000
    bart <bc@freeuk.com> wrote:

    So what is the result type of multiplying values of those two types?


    I think, traditional C rules for integer types apply here as well: type
    of result is the same as type of wider operand. It is arithmetically
    unsatisfactory, but consistent with the rest of language.
    There is one key difference between the _BitInt() types and other
    integer types - with _BitInt(), there are no automatic promotions to
other integer types.  Thus if you are using _BitInt() operands in an
arithmetic expression, these are not promoted to "int" or "unsigned
int" even if they are smaller (lower rank).  If you mix _BitInt()'s
    of different sizes, then the smaller one is first converted to the
    larger type.

I think, the Standard is written in such a way that implementing _BitInt
as arbitrary precision numbers, i.e. with the number of bits held as
part of the data, is not allowed.

Correct.  _BitInt(N) is a signed integer type with precisely N value
bits.  It can have padding bits if necessary (according to the
target ABI), but it can't have any other information.

    Of course, Language Support Library can be
    (and hopefully is, at least for gcc; clang is messy a.t.m.) based on
    arbitrary precision core routines, but the API used by compiler should
    be similar to GMP's mpn_xxx family of functions rather than GMP's
    mpz_xxx family, i.e. # of bits as separate parameters from data arrays
    rather than combined.

Yes, exactly.  At the call site, the size of the _BitInt type is
always a known compile-time constant, so it can easily be passed
on.  Thus :
    _BitInt(N) x;
    _BitInt(M) y;
    _BitInt(NM) z = x * y;

    So what is NM here; is it N*M (the potential maximum size of the
    result), or max(N, M)?

    I made the same mistake in my previous post, but corrected it before
posting it. The required size for the product is N+M bits, not N*M.
    For example, N=32, M=64 -> NM=96.

    [...]

    How would you write a generic user function that operates on any size
    BitInt? For example:

    _BitInt(?) bi_square(_BitInt(?));

    I don't think you can. Each _BitInt(N) type is distinct.

    You could have a function that operates on arguments of type
    [unsigned] _BitInt(BITINT_MAXWIDTH) and depend on implicit
    conversions, but that's likely to be horribly inefficient.

    Or you can replace BITINT_MAXWIDTH by the maximum width you happen to
    need in your application.

    [...]

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Mon Nov 24 15:17:49 2025
    On Mon, 24 Nov 2025 12:56:58 +0000
    bart <bc@freeuk.com> wrote:

    On 24/11/2025 11:57, Michael S wrote:
    On Mon, 24 Nov 2025 11:45:18 +0000
    bart <bc@freeuk.com> wrote:

    But my scripting language has an arbitrary-precision /decimal/
    floating point type, which can also be used for pure integer
    calculations.


    Arbitrary-precision floating point? That sounds problematic,
    regardless of base. Unless you don't use the word 'arbitrary' in
    the same sense that it is used, for example, in GMP.
    Gnu MPFR is very careful to never call itself "arbitrary-precision"
    in official docs.


    If you mean problems like repeated multiplies giving ever larger
    numbers, then that will happen also with integers (or rationals).

    If you mean the problems with a divide operation potentially carrying
    on indefinitely, then a cap needs to be set on that.


Yes, that's what I meant.

I haven't attempted libraries for working out transcendental
    functions; the problems there are in getting a particular precision
    even if you know that in advance.

    But for basic arithmetic, it works extremely well.

    (While it is built-in to my scripting language, it was originally a standalone library and has been ported to C. See the bignum.c and
    bignum.h files here:

    https://github.com/sal55/langs/tree/master/bignum

    You can try out division like this:

    #include <stdio.h>
    #include "bignum.h"

    int main() {
    Bignum a, b, c;

    a = bn_makeint(1);
    b = bn_makeint(7);
    c = bn_init();

    bn_div(c, a, b, 1000);
    bn_println(c);
    }

    (Build as 'gcc prog.c bignum.c' etc.)

    You can see that 'bn_div' needs a precision argument: this is the
    number of significant decimal digits. Using 100M here produced 100
    million digits and took about 6 seconds.)






    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Mon Nov 24 15:27:36 2025
    On Mon, 24 Nov 2025 05:06:33 -0800
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

    David Brown <david.brown@hesbynett.no> writes:
    [...]
    Yes, exactly. At the call site, the size of the _BitInt type is
    always a known compile-time constant, so it can easily be passed on.
    Thus :

    _BitInt(N) x;
    _BitInt(M) y;
    _BitInt(NM) z = x * y;

    can be implemented as something like :

    __bit_int_signed_mult(NM, (unsigned char *) &z,
    N, (const unsigned char *) &x,
    M, (const unsigned char *) &y);

    That looks like it's supposed to avoid overflow (I'm assuming NM is N
    + M), but it wouldn't work. The type of a C expression is almost
    always determined by the expression itself, regardless of the context
    in which it appears. The type of x * y is _BitInt(max(N, M)), not _BitInt(N+M), so it can overflow even if the full result would fit
    into z.

    You can do this instead (not tested):

    _BitInt(N) x;
    _BitInt(M) y;
    _BitInt(N+M) z = (_BitInt(N+M))x * y;

    (I'm assuming N+M is sufficient, but I might have missed an off-by-one
    error somewhere.)


    You missed nothing. N+M is both sufficient and necessary. The latter
    because of -(2**(N-1)) * -(2**(M-1)).


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Keith Thompson@3:633/10 to All on Mon Nov 24 05:33:03 2025
    bart <bc@freeuk.com> writes:
    On 24/11/2025 11:17, David Brown wrote:
    On 24/11/2025 01:30, bart wrote:
    Saving memory was mentioned. To achieve that means having bitfields
    that may not start at bit 0 of a byte, and may cross byte- or word-
    boundaries.

    No, that is incorrect.
    The proposal mentions saving /space/ as relevant in FPGAs - not
    saving /memory/.

    But I was responding to a suggestion here that one use of _BitInts - presumably for ordinary hardware - was to save memory.

    That's *your* presumption.

    The rationale section of N2709 mentions performance/space concerns only
    in the context of FPGAs.

    Packing arrays on ordinary hardware is impractical given C's memory
    model.

    [...]

    What about arrays of _BitInt(1), _BitInt(2) and _BitInt(4)? These
    could actually be practically implemented, with a few restrictions,
    and could save a lot of memory.

    No, they couldn't. Array indexing is defined in terms of pointer
    arithmetic, and you can't have a pointer to something smaller than one
    byte.

    <OT>I can see something like this being done in C++ with operator
    overloading. See, for example, the std::vector<bool> partial specialization.</OT>

    [...]

    In my 391-bit example, the top 7 bits will be within a 64-bit
    word. What values will those extra 57 bits be?

    Probably 0.

    Taking just those 7 bits by themselves, if the value is 1111111, that is:
    00000000'00000000'00000000'00000000'00000000'00000000'00000000'01111111

    and you do an arithmetic right shift, then you will get 0111111 not
    1111111, since the hardware sign bit is bit 63 not bit 6. It needs
    more work.

    Yes, the compiler has to do some extra work for types with padding bits,
    to ensure that those bits are either set to 0 or properly ignored.

    [...]

    No, I stand by it. There are even different levels of ridiculousness: expecting a language to support a huge fixed integer type like
    int1000000_t (when C only acquired 8/16/32/64-bit types in C99, and
    those still aren't built-in).

    And allowing random sizes such as int817838_t. (See, it seems much
    sillier using this syntax!)

    Your complaint seems to be that the feature is too flexible.

    For such sizes it makes much more sense to acknowledge the existence
    of arbitrary-precision support, so that the equivalents of
    int1000000_t and int817838_t would be compatible types. Or you can
    forget specific widths and just have the one bigint type.

    Yes, there are a lot of things that C23 *could* have done, but didn't.

    [...]

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Keith Thompson@3:633/10 to All on Mon Nov 24 05:35:30 2025
    bart <bc@freeuk.com> writes:
    [...]
    There are two kinds of BitInts: those smaller than 64 bits; and those
    larger than 64 bits, sometimes /much/ larger.

    As far as I know, the standard makes no such distinction.

    I had been responding to the claim that those smaller types save
    memory, compared to using sizes 8/16/32 bits which are commonly
    available and have better hardware support.

    I don't recall any such claim. Do you have a citation (other than
    the FPGA-specific wording in N2709)?

    But if a _BitInt(17) is rounded up to 32 bits, there's not going to be
    any saving!

    Correct.

    [...]

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Mon Nov 24 14:49:08 2025
    On 24/11/2025 12:17, bart wrote:
    On 24/11/2025 09:29, David Brown wrote:
    On 23/11/2025 16:06, Michael S wrote:
    On Sun, 23 Nov 2025 13:59:59 +0000
    bart <bc@freeuk.com> wrote:

    So what is the result type of multiplying values of those two types?


    I think, traditional C rules for integer types apply here as well: type
    of result is the same as type of wider operand. It is arithmetically
    unsatisfactory, but consistent with the rest of language.

    There is one key difference between the _BitInt() types and other
    integer types - with _BitInt(), there are no automatic promotions to
    other integer types. Thus if you are using _BitInt() operands in an
    arithmetic expression, these are not promoted to "int" or "unsigned
    int" even if they are smaller (lower rank). If you mix _BitInt()'s of
    different sizes, then the smaller one is first converted to the larger
    type.

    I think, the Standard is written in such a way that implementing _BitInt
    as arbitrary precision numbers, i.e. with the number of bits held as part
    of the data, is not allowed.

    Correct. _BitInt(N) is a signed integer type with precisely N value
    bits. It can have padding bits if necessary (according to the target
    ABI), but it can't have any other information.

    Of course, Language Support Library can be
    (and hopefully is, at least for gcc; clang is messy a.t.m.) based on
    arbitrary precision core routines, but the API used by compiler should
    be similar to GMP's mpn_xxx family of functions rather than GMP's
    mpz_xxx family, i.e. # of bits as separate parameters from data arrays
    rather than combined.


    Yes, exactly. At the call site, the size of the _BitInt type is
    always a known compile-time constant, so it can easily be passed on.
    Thus :

        _BitInt(N) x;
        _BitInt(M) y;
        _BitInt(NM) z = x * y;

    So what is NM here; is it N*M (the potential maximum size of the
    result), or max(N, M)?

    No, it is whatever you want it to be. I didn't want to use the next
    letter after N because _BitInt(O) could easily be misunderstood. But of course NM could be misunderstood too. Perhaps N1, N2 and N3 would have
    been better choices than N, M and NM.

    You pick the size of "z" here according to your needs for your code.
    The multiplication will be done, logically, at max(N, M) bits. The
    result will then be converted to NM bits. Like always in C, the
    semantics of the calculation is entirely independent of the type of the variable you assign the results to. And like always in C, the compiler
    may take advantage of knowledge of the assigned type in order to give
    more efficient code, as long as it does not stray from giving the same
    value as if it took the code literally.

    So if you want the full range of values of x and y to be usable here,
    then NM would have to be N * M. But you would also need a cast, such as "_BitInt(NM) z = (_BitInt(NM)) x * y;", just as you do if you want to
    multiply two 32-bit ints as a 64-bit operation.

    Alternatively, you might know more about the values that might be in x
    and y, and have a smaller NM (though you still need a cast if it is
    greater than both N and M). Or you might be using unsigned types and
    want the wrapping / masking behaviour.

    The point was not what size NM is, but that it is known to the compiler
    at the time of writing the expression.


    It sounds like the max precision you get will be the latter.


    can be implemented as something like :

        __bit_int_signed_mult(NM, (unsigned char *) &z,
                              N, (const unsigned char *) &x,
                              M, (const unsigned char *) &y);




    How would you write a generic user function that operates on any size BitInt? For example:

    ˙˙ _BitInt(?) bi_square(_BitInt(?));


    You can't. _BitInt(N) and _BitInt(M) are distinct types, for differing
    N and M. You can't write a generic user function in C that implements
    "T foo(T)" where T can be "int", "short", "long int", or other types. C simply does not have type-generic functions.

    You /can/ write generic macros that handle different _BitInt types, but
    that would quickly get painful given that you'd need a case for each
    size of _BitInt you wanted for the _Generic macro.

    If you want generics, you are better off with a language that supports generics, such as C++.

    Even if you passed the size as a parameter, there would be a problem
    with the BitInt type.

    Yes. But you could use a void* pointer for more generic parameters.

    However, _BitInt types are for "bit-precise integer types". They are
    for specific fixed sizes, not for arbitrary precision integers. They
    are not ideally suited for tasks for which they were not designed -
    that's hardly surprising.


    This assumes BitInts are passed and returned by value, but even using BitInt* wouldn't help.

    Yes, they are passed around as values - they are integer types and are
    passed around like other integer types. (Implementations may use stack
    blocks and pointers for passing the values around if they are too big
    for registers, just as implementations can do with any value type.
    That's an implementation detail - logically, they are passed and
    returned as values.)


    This sets it apart from arrays, where you also define very large, fixed
    size arrays, but can use a T(*)[] type to write generic functions, that
    take an additional length parameter.

    _BitInt's are fixed-size integer types, not arrays. Again, it is not
    then surprising that they are different from arrays.


    This will be for a particular T, but for BitInt, T is also fixed; it
    happens to be an implicit bit type.


    _BitInt's are not arrays, they are scalars - they are integer types.
    There is no concept of a type "_BitInt" - they always have compile-time
    fixed sizes, such as "_BitInt(12)". So the idea of passing around
    generic _BitInt's makes no more sense than passing around any other kind
    of generic integer types. (Of course you can have an array of _BitInt's
    of any given size.)


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Mon Nov 24 14:51:09 2025
    On 24/11/2025 14:06, Keith Thompson wrote:
    David Brown <david.brown@hesbynett.no> writes:
    [...]
    Yes, exactly. At the call site, the size of the _BitInt type is
    always a known compile-time constant, so it can easily be passed on.
    Thus :

    _BitInt(N) x;
    _BitInt(M) y;
    _BitInt(NM) z = x * y;

    can be implemented as something like :

    __bit_int_signed_mult(NM, (unsigned char *) &z,
    N, (const unsigned char *) &x,
    M, (const unsigned char *) &y);

    That looks like it's supposed to avoid overflow (I'm assuming NM is N + M), but
    it wouldn't work. The type of a C expression is almost always determined
    by the expression itself, regardless of the context in which it appears.
    The type of x * y is _BitInt(max(N, M)), not _BitInt(N+M), so it can
    overflow even if the full result would fit into z.

    You can do this instead (not tested):

    _BitInt(N) x;
    _BitInt(M) y;
    _BitInt(N+M) z = (_BitInt(N+M))x * y;

    (I'm assuming N+M is sufficient, but I might have missed an off-by-one
    error somewhere.)


    It /looks/ like NM means "N + M" (or N * M, as both Bart and I wrote
    without thinking), but that was not my intention. I simply meant a
    constant that may be chosen differently from N and M, and did not want
    to go on to the letter O. In hindsight, NM was a poor choice.


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Mon Nov 24 15:02:59 2025
    On 24/11/2025 12:44, Michael S wrote:
    On Mon, 24 Nov 2025 12:17:58 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    The proposal is not about saving /memory/. It specifically says that
    a _BitInt(N) has the same size and alignment as the smallest basic
    type that can contain it, until you get to N greater than 64-bit, in
    which they are contained in an array of int64_t. (The reality is a
    little more formal, to handle targets that have other sizes of their
    basic types.)


    That is a bit unfortunate.
    Compiler support for arrays of 17 to 24bit numbers packed as 3 octet
    per item would have been handy. And not hard at all for compiler to implement, at least on architectures that has proper support for
    unaligned access, like x86, POWER, Arm and RISC-V.

    I certainly have real-world applications that use packed arrays like
    that. They could have been written in cleaner and less error-prone
    way if such feature was available.

    I suppose, packed numeric arrays with 5, 6 or 7 octets per item are also
    used by some people, although they are probably less common than my
    case.


    There may certainly be use-cases for such "packed arrays", but I think
    that would just add complications to the definitions of _BitInt and
    require more implementation-specific behaviour. And then someone would
    insist that they be packed by bit, rather than by byte, and cause all
    the problems that Bart feared.

    I think this kind of thing is probably best left to
    implementation-specific features - just like "packed" attributes and
    pragmas today.

    Alternatively, a standardised syntax for detailed control of packing and ordering in structs, arrays, and especially bit-fields, could be
    developed and added to the standards. I don't see a good reason to
    handle _BitInt's differently.


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Mon Nov 24 14:21:23 2025
    On 24/11/2025 13:35, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    [...]
    There are two kinds of BitInts: those smaller than 64 bits; and those
    larger than 64 bits, sometimes /much/ larger.

    As far as I know, the standard makes no such distinction.

    *I* am making the distinction. From an implementation point of view (and assuming 64-bit hardware), they are quite different.

    And that leads to different kinds of language features.

    If the possibilities above 64 bits were less ambitious (say i128 and
    i256), then the concept might be stretched to cover both. But not when
    you can also have i1234567.

    It would be like having a GETBITS macro which is not limited to a 1- to
    63-bit bitfield of a u64 value, but could return a slice of an
    arbitrarily large array.


    I had been responding to the claim that those smaller types save
    memory, compared to using sizes 8/16/32 bits which are commonly
    available and have better hardware support.

    I don't recall any such claim. Do you have a citation (other than
    the FPGA-specific wording in N2709)?

    This is where it came up in this thread:

    On 23/11/2025 11:46, Philipp Klaus Krause wrote:
    Am 22.10.25 um 14:45 schrieb Thiago Adams:


    Is anyone using or planning to use this new C23 feature?
    What could be the motivation?



    Saving memory by using the smallest multiple-of-8 N that will do. Also
    being able to use bit-fields wider than int.

    Saving memory for two reasons:

    * On small embedded systems where there is very little memory
    * For code that needs to be very fast on big systems to make data
    structures fit into cache


    Although this doesn't go as far as using odd bit-sizes: it would mean
    using sizes like 24, 40, 48, and 56 bits instead of 32 or 64 bits.

    The savings would be sparse.



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Mon Nov 24 14:41:03 2025
    On 24/11/2025 13:33, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:

    What about arrays of _BitInt(1), _BitInt(2) and _BitInt(4)? These
    could actually be practically implemented, with a few restrictions,
    and could save a lot of memory.

    No, they couldn't. Array indexing is defined in terms of pointer
    arithmetic, and you can't have a pointer to something smaller than one
    byte.


    The restrictions I mentioned were to do with pointers to individual bits.

    It is possible that operations such as:

    x = A[i]
    A[i] = x

    can be well defined when A is an array of 1/2/4-bit values, even if
    expressed like this:

    *(A + i)

    But this would have to be indivisible when A is such an array: only the
    whole thing is valid, not (A + i) by itself, or A by itself; you'd need &A.

    This would need a small tweak to the language, but that is nothing
    compared to supporting (i3783467 * i999 / i3) >> i17.

    But I write a script in my dynamic language, which does support arrays
    of 'u1 u2 u4', and it gives these results:

    Array of u1 uses 12,500,000 bytes
    Array of u2 uses 25,000,000 bytes
    Array of u4 uses 50,000,000 bytes
    Array of u8 uses 100,000,000 bytes
    Array of u16 uses 200,000,000 bytes
    Array of u32 uses 400,000,000 bytes
    Array of u64 uses 800,000,000 bytes

    C can only get down to that u8 figure (100MB) using its 'char' type.
    Even 'bool' doesn't make it smaller (presumably for the reasons you mentioned).

    You are forced to emulate such arrays in user-code using shifts and masks.


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Mon Nov 24 15:41:13 2025
    On 24/11/2025 13:31, bart wrote:
    On 24/11/2025 11:17, David Brown wrote:
    On 24/11/2025 01:30, bart wrote:

    Saving memory was mentioned. To achieve that means having bitfields
    that may not start at bit 0 of a byte, and may cross byte- or word-
    boundaries.


    No, that is incorrect.

    The proposal mentions saving /space/ as relevant in FPGAs - not saving
    /memory/.

    But I was responding to a suggestion here that one use of _BitInts - presumably for ordinary hardware - was to save memory.


    OK. However, that is not what is in the proposal, nor in the C23 standard.

    That's not going to happen if they are simply rounded up to the next power-of-two type.

    Correct (with the proviso that after 64 bits, rounding is to whatever
    type can contain an int64_t).

    As I mentioned, I don't think the C standards require that rounding-up
    size, even though it was in the proposal. It may be worth punting that question over to the "comp.std.c" newsgroup to see if someone has a
    definite answer.

    For the kind of small systems that had been mentioned in the context of
    saving memory, compilers often have extensions or
    implementation-specific features (attributes, pragmas, etc.) to go
    beyond standard C in order to get greater efficiency on tiny systems.
    These may support smaller containers or tighter array packing.


    If the purpose is, say, a 17-bit type that wraps past values of 131071,
    then that sounds like a lot of extra code needed, for something that
    does not sound that useful. Why modulo 2**17; why not 100,000? Or any
    value more relevant to the task.


    Signed _BitInt's don't wrap - arithmetic overflow is UB. Unsigned
    _BitInt's wrap, just like with all other unsigned integer types in C.
    And wrapping is /not/ a lot of extra code - wrapping an N-bit type is
    just an AND instruction with the constant (1 << N) - 1. This can be done
    once at the end of complex arithmetic expressions, in most cases
    (shift-right and division can mean extra masking is needed).

    Why not provide wrapping types with arbitrary wrapping values? Why not
    indeed - some languages do (Ada springs to mind). They are not actually
    that often needed, so it's easier just to put a "% X" operation in the
    user code.

    The rationale in the proposal that you linked says why these _BitInt
    types can be useful. They are for expressing the intent of the
    programmer more clearly, making it more convenient to work with
    somewhat bigger integer sizes (such as for cryptography), and improving
    FPGA development. Wrapping is not a big point (and it does not apply
    at all to signed _BitInt).


    The author's use-case here is in writing code that can be compiled
    with a "normal" C compiler on a "normal" target, and also compiled to
    FPGA /hardware/, with the same semantics. In hardware, a 5-bit by
    5-bit single-cycle multiplier is very much smaller than an 8-bit by
    8-bit multiplier, and orders of magnitude smaller than if the 5-bit
    integers are promoted to 32-bit before multiplying.

    The proposal is not about saving /memory/. It specifically says that
    a _BitInt(N) has the same size and alignment as the smallest basic
    type that can contain it, until you get to N greater than 64-bit, in
    which they are contained in an array of int64_t. (The reality is a
    little more formal, to handle targets that have other sizes of their
    basic types.)

    So on a "normal" target, a _BitInt(3) is the same size and alignment
    as a uint8_t, a _BitInt(35) is effectively contained in an uint64_t,
    and an array of 4 _BitInt(17) on a 32-bit system will take 16 bytes or
    128 bits, not 68 bits.

    As far as I can see, the C23 standard does not specify these details,
    and leaves them up to the target ABI. But at the very least, they
    will always take an integer number of bytes - unsigned char. There
    can never be any crossing of byte boundaries.

    What about arrays of _BitInt(1), _BitInt(2) and _BitInt(4)? These could actually be practically implemented, with a few restrictions, and could
    save a lot of memory.


    They could - but that would add a lot of complications (the ones you
    worried about). I would assume that this was considered both by the
    authors of the proposal, and by the C committee, and rejected as not
    being worth the cost.

    Why? And why do you talk specifically about odd numbers? I can
    understand your concern about packing arrays of _BitInts that are not
    multiples of 8, though I hope you now understand that it is not the
    problem you thought it was. However, I see no reason to suppose that
    _BitInt(5) is any more or less "complicated" than _BitInt(6) just
    because 5 is an odd number!

    I mean odd compared with powers-of-two, or multiples of 8.

    Okay. "Unusual" might have been a better choice of term, or you could
    have explained what you meant. But that makes more sense.



    A major point of the _BitInt concept is to be able to specify and use
    integers of specific explicit sizes in a way that is as implementation
    independent as possible. Some aspects of the implementation cannot be
    avoided - such as the size of unsigned char and alignment and padding
    for storage. But the behaviour of the types is entirely independent
    of the implementation. There are no "extra rules" - neither for
    specific implementations, nor for specific sizes of _BitInt's.

    Efficiency of implementation is, of course, up to the implementation.
    But there is absolutely no reason to suppose that working with a
    _BitInt of size up to the implementation's maximum integer type is
    going to be less efficient than using other types and masking. For
    larger _BitInt's, there are different possible implementation
    strategies with different pros and cons in regard to efficiency.


    What happens when a 391-bit type, even unsigned, overflows? These
    larger types are likely to use a multiple of 64-bits, and for 391
    bits will need 7 x 64 bits, of which the last word will have 57 bits
    of padding. It's very messy.


    It is not messy at all. Signed integer overflow is UB, unsigned
    integer overflow is wrapping. It's the same as always, and could not
    be simpler, clearer or neater.

    In my 391-bit example, the top 7 bits will be within a 64-bit word. What values will those extra 57 bits be?


    They are padding bits. They don't contribute to the value of the object.

    An implementation, or rather an ABI, can decide that they should always
    be zero, or always zero for unsigned _BitInt and always a sign extension
    for signed _BitInt, or it can decide that they are always ignored.
    Giving a specific value means masking may be needed before storing a
    value in memory or passing it on to an external function, while making
    it ignored can mean masking might be needed when reading from memory or
    using a returned value.

    It is not really any different from other padding bits or bytes, such as
    all but the LSB in a _Bool, or padding in structs.

    Taking just those 7 bits by themselves, if the value is 1111111, that is:
    00000000'00000000'00000000'00000000'00000000'00000000'00000000'01111111

    and you do an arithmetic right shift, then you will get 0111111 not

    C does not have an "arithmetic right shift" operation - that's an assembly-level operation. Signed right-shift of negative values is implementation-defined in C.


    1111111, since the hardware sign bit is bit 63 not bit 6. It needs more work.


    If the value of those 7 bits is 111'1111, you have a negative value and right-shifting that is implementation-defined. The compiler
    implementation can pick whatever it feels is efficient and a good choice
    for its users. Maybe that means it defines the right-shift to work as
    though the type was unsigned - you get 011'1111. Maybe it means it
    defines the padding bits for signed _BitInt to use sign extension, and
    signed right-shift instructions. Maybe it means that the compiler will
    mask the value, then sign-extend it, then do a signed right-shift
    instruction, then mask it again. That's all up to the implementation.

    You are worrying about completely negligible things here. (If you are considering adding support for _BitInt to your own C tools, then I
    understand wanting to get all the details right.)


    Such limits for /fixed-width/ integers are ridiculous.

    Um, I think you might want to re-read and re-phrase that. When you
    have fixed-width integers, you have a finite range.

    No, I stand by it. There are even different levels of ridiculousness: expecting a language to support a huge fixed integer type like
    int1000000_t (when C only acquired 8/16/32/64-bit types in C99, and
    those still aren't built-in).

    And allowing random sizes such as int817838_t. (See, it seems much
    sillier using this syntax!)

    I had taken your "ridiculous" comment to be part of your complaint that "multiplying even two one-million-bit types could overflow". But those statements are independent, then only the first is silly - of course arithmetic on any finite sized type can overflow unless specifically
    limited (such as by wrapping behaviour for unsigned types). I agree
    that huge fixed-size integer types are not useful, though I am not sure
    where the ideal limit lies. The biggest use-case for very large
    integers is cryptography. I find it hard to imagine sizes greater than
    16 kbit being directly useful, and thus 32 kbit sizes for intermediary results. Fixed sizes can be more efficient than arbitrary precision
    types when the same sized objects are used repeatedly.


    For such sizes it makes much more sense to acknowledge the existence of arbitrary-precision support, so that the equivalents of int1000000_t and int817838_t would be compatible types. Or you can forget specific widths
    and just have the one bigint type.

    (I use such types, but within a library, and there are ways to cap
    the precision.)





    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Mon Nov 24 15:59:41 2025
    On 24/11/2025 14:17, Michael S wrote:
    On Mon, 24 Nov 2025 12:56:58 +0000
    bart <bc@freeuk.com> wrote:

    On 24/11/2025 11:57, Michael S wrote:
    On Mon, 24 Nov 2025 11:45:18 +0000
    bart <bc@freeuk.com> wrote:

    But my scripting language has an arbitrary-precision /decimal/
    floating point type, which can also be used for pure integer
    calculations.


    Arbitrary-precision floating point? That sounds problematic,
    regardless of base. Unless you don't use the word 'arbitrary' in
    the same sense that it is used, for example, in GMP.
    Gnu MPFR is very careful to never call itself "arbitrary-precision"
    in official docs.


    If you mean problems like repeated multiplies giving ever larger
    numbers, then that will happen also with integers (or rationals).

    If you mean the problems with a divide operation potentially carrying
    on indefinitely, then a cap needs to be set on that.


    Yes, that's what I meant.


    I remember a fun programming task at university in a language similar to Haskell, which involved writing an arbitrary precision fixed-point
    decimal arithmetic package. It included support for an infinite
    polynomial expansion for arctan, and then used a Machin-like formula to
    get a "value" for pi. It all worked well, as long as you remembered to
    limit how many digits you printed out...



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From BGB@3:633/10 to All on Mon Nov 24 11:52:33 2025
    On 11/24/2025 6:37 AM, Keith Thompson wrote:
    BGB <cr88192@gmail.com> writes:
    [...]
    In BGBCC, there is a hard limit of IIRC 16384 bits.

    As an extension, it also allows for very large literals, though
    currently literals larger than 128 bits can only use hexadecimal or
    similar.

    This is encoded via suffixes, eg:
    I, L, LL, U, UI, UL, ULL: Normal 32/64 bit.
    I128, UI128: 128-bit
    I256, UI256: 256-bit
    other odd sizes map to _BitInt or _UBitInt (unsigned _BitInt).

    In C23, an integer constant with a "wb" or "WB" suffix is of type
    _BitInt(n). One with a "wbu" suffix is of type unsigned _BitInt(n).
    The value of n is the smallest that can accommodate the value of the
    constant.


    OK, I missed that part.

    I had a need though in this case to specify an exact width for the
    constant in some use cases, rather than merely just specify its largeness.

    But, yeah, I<nn> and U<nn> / UI<nn> are non-standard, but alas...

    Follows a similar pattern as for printf modifiers, say:
    printf("%I64u\n", longValue); //MSVC specific
    Vs, say:
    printf("%llu\n", longValue); //Most everything else

    In this case, the I<nn> notation being extended to also cover __int128
    and _BitInt.

    ...



    [...]



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Mon Nov 24 18:35:01 2025
    On 24/11/2025 14:41, David Brown wrote:
    On 24/11/2025 13:31, bart wrote:

    That's all up to the implementation.

    You are worrying about completely negligible things here.

    Is it that negligible? That's easy to say when you're not doing the implementing! However it may impact on the size and performance of code.


    And allowing random sizes such as int817838_t. (See, it seems much
    sillier using this syntax!)

    I had taken your "ridiculous" comment to be part of your complaint that "multiplying even two one-million-bit types could overflow". But if those statements are independent, then only the first is silly - of course arithmetic on any finite sized type can overflow unless specifically
    limited (such as by wrapping behaviour for unsigned types). I agree
    that huge fixed-size integer types are not useful, though I am not sure where the ideal limit lies.

    You don't think it strange that C doesn't even have a 128-bit type yet
    (it only barely has width-specific 64-bit ones).

    There is just the poor gnu extension where 128-bit integers didn't have
    a literal form, and there was no way to print such values.

    But now there is this huge leap, not only to 128/256/512/1024 bits, but
    to conceivably millions, plus the ability to specify any weird type you
    like, like 182 bits (eg. somebody makes a typo for _BitInt(128), but
    they silently get a viable type that happens to be a little less
    efficient!).

    So, 20 years of having 64-bit processors with little or no support for
    even double-word types, and now there is this explosion in capabilities.

    Or, are literals and print facilities for these new types still missing?

    Personally I think they should have got the basics right first, like a
    decent 128-bit type, proper literals, and ways to print.

    This looks like VLAs all over again (eg. is '_BitInt(1000000) A'
    allocated on the stack?). A poorly suited, hard-to-implement feature.



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From BGB@3:633/10 to All on Mon Nov 24 13:12:54 2025
    On 11/24/2025 8:21 AM, bart wrote:
    On 24/11/2025 13:35, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    [...]
    There are two kinds of BitInts: those smaller than 64 bits; and those
    larger than 64 bits, sometimes /much/ larger.

    As far as I know, the standard makes no such distinction.

    *I* am making the distinction. From an implementation point of view (and assuming 64-bit hardware), they are quite different.

    And that leads to different kinds of language features.


    As noted, as I understand it there is no reason for the storage to be
    smaller than the next power-of-2 size.

    Supporting odd-sized values in memory would have added a lot more of a
    pain in terms of making things efficient (it is a lot more of an issue
    to store a 24-bit or 40-bit item to memory than 32 or 64).

    Though, one possibility could be "__packed _BitInt(n)" where in this
    case it would handle them as the nearest multiple of 8 bits rather than
    as the nearest power-of-2.
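    The cost of such a hypothetical "__packed _BitInt(24)" can be sketched in portable C: the 24-bit value lives in 3 bytes and every access becomes byte-at-a-time work plus sign extension (`load_s24`/`store_s24` are illustrative names):

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    /* A signed 24-bit value kept in 3 bytes (little-endian), loaded and
       stored a byte at a time with explicit sign extension -- roughly
       what packed NPOT storage costs on byte-addressed hardware. */
    static int32_t load_s24(const uint8_t *p)
    {
        uint32_t u = (uint32_t)p[0] | ((uint32_t)p[1] << 8)
                   | ((uint32_t)p[2] << 16);
        /* sign-extend bit 23; the xor/subtract trick avoids relying on
           implementation-defined right shifts of negative values */
        return (int32_t)((u ^ 0x800000u) - 0x800000u);
    }

    static void store_s24(uint8_t *p, int32_t v)
    {
        uint32_t u = (uint32_t)v;     /* two's-complement bit pattern */
        p[0] = (uint8_t)u;
        p[1] = (uint8_t)(u >> 8);
        p[2] = (uint8_t)(u >> 16);
    }

    int main(void)
    {
        uint8_t buf[3];
        store_s24(buf, -1234567);
        assert(load_s24(buf) == -1234567);
        store_s24(buf, 0x7FFFFF);            /* max 24-bit signed value */
        assert(load_s24(buf) == 0x7FFFFF);
        printf("ok\n");
        return 0;
    }
    ```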


    As least on my ISA design, Load/Store ops are mostly only available in power-of-2 sizes, and the direct displacement case is limited to natural alignment (though using RISC-V encodings can sidestep this limitation in
    the case of the XG3 variant, or if targeting RISC-V, *).


    *: In my case, the ISA has split into multiple variants:
    XG1: Its original form.
    16/32/64/96 bit instructions.
    Mostly 5-bit register fields.
    XG2: Modified.
    Loses 16-bit encodings;
    Gains slightly larger immediate values;
    All register fields expand to 6 bits;
    Encoding scheme is slightly dog-chewed.
    XG3:
    Instructions were repacked to be compatible with RISC-V;
    Register numbering was made compatible with RISC-V;
    Un-dog-chewed the encoding scheme some vs its predecessors;
    Instruction stream can be mixed/matched with RV64G.
    However, while both RV64G and XG3 ops support superscalar execution,
    for reasons, my CPU core can't co-issue RV64 and XG3 instructions.
    So, it is more like the ISA can flip/flop every clock-cycle.

    However, can note that RISC-V also still lacks NPOT memory operations.

    And, if your memory store looks like:
    SRLI X6, X10, 16
    SW X10, 13(X12)
    SB X6, 15(X12)

    This isn't great; you don't want to pay these sorts of penalties without reason.

    For odd-sized _BitInt, one pays the cost mostly by using sign/zero
    extension on certain operations.

    In basic forms of both ISAs, this can be done via a pair of shift instructions, say, zero-extending 24 bits:
    SLLI X10, X10, 40
    SRLI X10, X10, 40
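    The shift-pair idiom above translates directly into C; a sketch (assuming arithmetic right shift of negative values, as gcc and clang provide; `zext`/`sext` are illustrative names):

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Narrowing a 64-bit register to an n-bit value by shifting left
       then right: logical right shift zero-extends, arithmetic right
       shift sign-extends -- the SLLI+SRLI / SLLI+SRAI pairs above. */
    static uint64_t zext(uint64_t x, int n)
    {
        return (x << (64 - n)) >> (64 - n);          /* SLLI + SRLI */
    }

    static int64_t sext(uint64_t x, int n)
    {
        /* relies on arithmetic right shift of signed values */
        return (int64_t)(x << (64 - n)) >> (64 - n); /* SLLI + SRAI */
    }

    int main(void)
    {
        assert(zext(0xFFFFFFFFFFABCDEFu, 24) == 0xABCDEF);
        assert(sext(0x00ABCDEFu, 24) == -5517841);   /* bit 23 set   */
        assert(sext(0x12345u, 24) == 0x12345);       /* bit 23 clear */
        printf("ok\n");
        return 0;
    }
    ```

    A compiler for small `_BitInt` widths can emit exactly this pair, or a single AND for the unsigned case.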

    In my case, there is an optional feature that can allow this to be
    encoded as a single instruction. Although the instruction in question
    uses a 64-bit encoding; so doesn't save any code-size over the pair of
    shifts, but is faster; partly also because in my CPU core most
    instructions have a minimum latency of 2 clock cycles; which isn't ideal
    for a lot of RISC-V's patterns.

    Though, on the CPU in question, the ideal scheduling isn't so much to
    try to reuse a register immediately, but if possible to put around 5 instructions between modifying a register and trying to access its value
    again (but, this case really sucks for some constructs in RV).

    Like, one can't optimally schedule an array index load (needs 3
    instructions in RV64G) when such scheduling will most likely exceed the
    total length of the loop body (and trying to modulo-schedule array-loads
    is just kinda absurd).

    Well, technically, CPU isn't VLIW (at least for RV64 and XG3, XG1 and
    XG2 were "LIW"), but being 3-wide in-order, optimal case for performance
    is still to try to schedule things as-if they were (V)LIW.

    Though, the spacing drops to 3 intermediate instructions if scheduling
    for 2-wide; which may make sense either if there isn't sufficient ILP to optimize for 3-wide scheduling (most of the time) or the code is doing
    things that hinder 3-wide operation (minority case; but can happen as
    the 3rd lane in this case only does basic ALU instructions and is
    "eaten" by certain instructions, such as indexed-store, etc).

    ...


    My compiler still doesn't deal with all of this well (and sorta blows it
    off in the case of targeting RV64G or RV64GC), but this sort of thing
    seems to be sort of a pain case in general (and it sorta helps if the programmer also writes their code in a way that helps the compiler along
    here; but helps some if ISA design limitations don't actively hinder the ability to generate efficient code in this area).

    ...


    Though, had noted that (curiously) writing code as-if one were targeting
    a modulo-scheduled VLIW seems to help with x86-64 as well, even if
    x86-64 has nowhere near enough registers to benefit here (it is almost
    as-if x86-64 has a mechanism in place to cheapen the cost of stack
    spills and reloads).

    In my case, I had instead used 64 GPRs (from the RV64G POV, it is just
    the X and F register spaces glued together). Where 64 is mostly enough
    to competently modulo-schedule things and not run out of registers.

    Though, it is only some kinds of code that can benefit from the power of
    64 GPRs.


    But, yeah, in any case, I guess the main issue is that NPOT loads/stores
    would suck here in the absence of dedicated CPU instructions (in a
    similar way to how much it hurts by RV64G lacking indexed-load/store;
    where array operations are often very common in the types of code one
    might want to optimize via modulo scheduling the loop).

    But, you don't really want to add NPOT Load/Store instructions either,
    because this more just offloads the pain onto the CPU.

    ...



    If the possibilities above 64 bits were less ambitious (say i128 and
    i256), then the concept might be stretched to cover both. But not when
    you can also have i1234567.

    It would be like having a GETBITS macro, which is not limited to a 1- to 63-
    bit bitfield of a u64 value, but could return a slice of an arbitrarily large array.
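    That hypothetical GETBITS can be sketched in a few lines of C (`getbits` is an illustrative name, not an existing API):

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Pull an n-bit field (1 <= n <= 64) starting at bit 'pos' out of a
       little-endian array of 64-bit words -- the slice operation a wide
       _BitInt representation needs internally. */
    static uint64_t getbits(const uint64_t *words, size_t pos, int n)
    {
        size_t w = pos / 64;
        int off = (int)(pos % 64);
        uint64_t lo = words[w] >> off;
        if (off + n > 64)                    /* field straddles two words */
            lo |= words[w + 1] << (64 - off);
        return n == 64 ? lo : lo & ((UINT64_C(1) << n) - 1);
    }

    int main(void)
    {
        uint64_t a[2] = { 0x1122334455667788u, 0x99AABBCCDDEEFF00u };
        assert(getbits(a, 0, 8)  == 0x88);
        assert(getbits(a, 8, 16) == 0x6677);
        assert(getbits(a, 60, 8) == 0x01);   /* straddles the boundary */
        printf("ok\n");
        return 0;
    }
    ```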


    I added some Verilog style notation, which can in premise be used for
    large _BitInts. However this case is untested and very likely runs into
    an "implementation hole" for types larger than 128 bits.



    I had been responding to the claim that those smaller types save
    memory, compared to using sizes 8/16/32 bits which are commonly
    available and have better hardware support.

    I don't recall any such claim.˙ Do you have a citation (other than
    the FPGA-specific wording in N2709)?

    This is where it came up in this thread:

    On 23/11/2025 11:46, Philipp Klaus Krause wrote:
    Am 22.10.25 um 14:45 schrieb Thiago Adams:


    Is anyone using or planning to use this new C23 feature?
    What could be the motivation?



    Saving memory by using the smallest multiple-of-8 N that will do. Also being able to use bit-fields wider than int.

    Saving memory for two reasons:

    * On small embedded systems where there is very little memory
    * For code that needs to be very fast on big systems to make data structures fit into cache


    Although this doesn't go as far as using odd bit-sizes: it would mean
    using sizes like 24, 40, 48, and 56 bits instead of 32 or 64 bits.

    The savings would be sparse.




    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Mon Nov 24 21:26:53 2025
    On 24/11/2025 19:35, bart wrote:
    On 24/11/2025 14:41, David Brown wrote:
    On 24/11/2025 13:31, bart wrote:

    That's all up to the implementation.

    You are worrying about completely negligible things here.

    Is it that negligible? That's easy to say when you're not doing the implementing!

    Of course I am not implementing it. As always with features in C, no
    one is particularly bothered about how much effort is needed by the implementers. The prime concern is always the compiler users, not the compiler writers.

    However it may impact on the size and performance of code.

    The impact of an extra mask operation when you are handling 6 (IIRC)
    chunks of 64-bit data is not going to give a very significant effect on
    the size or performance of the code.



    And allowing random sizes such as int817838_t. (See, it seems much
    sillier using this syntax!)

    I had taken your "ridiculous" comment to be part of your complaint
    that "multiplying even two one-million-bit types could overflow". But
    if those statements are independent, then only the first is silly - of
    course arithmetic on any finite sized type can overflow unless
    specifically limited (such as by wrapping behaviour for unsigned
    types). I agree that huge fixed-size integer types are not useful,
    though I am not sure where the ideal limit lies.

    You don't think it strange that C doesn't even have a 128-bit type yet
    (it only barely has width-specific 64-bit ones).

    How do you know that I think that, from what I wrote? You are just making
    stuff up again.

    I think a 128-bit type can be useful. Many C compilers support one, and
    now the standard supports one too. It's called "_BitInt(128)", and you
    can expect it to perform exactly like __int128 or whatever
    compiler-specific 128-bit types you might have in a given tool.


    There is just the poor gnu extension where 128-bit integers didn't have
    a literal form, and there was no way to print such values.


    How many times have you felt the need to write a 128-bit literal? And
    how many times has that literal been in decimal (it's not difficult to
    put together a 128-bit value from two 64-bit values)? You really are
    making a mountain out of a molehill here.
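    The "two 64-bit values" construction is a few lines of portable C, without even needing gcc's `__int128` (the `u128` struct and helpers are illustrative, not an existing API):

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    /* A 128-bit value as a hi/lo pair of 64-bit words, with
       construction and carry-propagating addition. */
    typedef struct { uint64_t hi, lo; } u128;

    static u128 u128_make(uint64_t hi, uint64_t lo)
    {
        return (u128){ hi, lo };
    }

    static u128 u128_add(u128 a, u128 b)
    {
        u128 r = { a.hi + b.hi, a.lo + b.lo };
        if (r.lo < a.lo)        /* carry out of the low word */
            r.hi++;
        return r;
    }

    int main(void)
    {
        /* a "literal" 2^64, built from its two halves */
        u128 x = u128_make(1, 0);
        u128 y = u128_add(u128_make(0, UINT64_MAX), u128_make(0, 1));
        assert(y.hi == 1 && y.lo == 0);      /* carry propagated */
        assert(u128_add(x, y).hi == 2);
        printf("ok\n");
        return 0;
    }
    ```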

    But now there is this huge leap, not only to 128/256/512/1024 bits, but
    to conceivably millions, plus the ability to specify any weird type you like, like 182 bits (eg. somebody makes a typo for _BitInt(128), but
    they silently get a viable type that happens to be a little less efficient!).


    And this huge leap also lets you have 128-bit, 256-bit, 512-bit, etc.,
    types with no more than a simple typedef if you don't like the names. I
    can't see your problem here.

    So, 20 years of having 64-bit processors with little or no support for
    even double-word types, and now there is this explosion in capabilities.

    Or, are literals and print facilities for these new types still missing?

    Personally I think they should have got the basics right first, like a decent 128-bit type, proper literals, and ways to print.

    This looks like VLAs all over again (eg. is '_BitInt(1000000) A'
    allocated on the stack?). A poorly suited, hard-to-implement feature.


    You are joking, right? How is dealing with a _BitInt(1000000) any more difficult than dealing with a "struct { uint64_t chunks[15625]; }" ?
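    That comparison can be made concrete: a wide unsigned integer as an array of 64-bit chunks, with addition as a simple carry loop, is roughly what a compiler's runtime support for a huge unsigned `_BitInt` amounts to (a sketch; `wide_add` is an illustrative name):

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    #define CHUNKS 4   /* 256 bits here; 15625 chunks would be a million */

    /* r = a + b over little-endian arrays of 64-bit chunks,
       propagating the carry from chunk to chunk. */
    static void wide_add(uint64_t r[CHUNKS],
                         const uint64_t a[CHUNKS], const uint64_t b[CHUNKS])
    {
        unsigned carry = 0;
        for (int i = 0; i < CHUNKS; i++) {
            uint64_t s = a[i] + b[i];
            unsigned c1 = s < a[i];          /* carry from a[i] + b[i]  */
            r[i] = s + carry;
            carry = c1 | (r[i] < s);         /* or from adding carry-in */
        }
    }

    int main(void)
    {
        uint64_t a[CHUNKS] = { UINT64_MAX, UINT64_MAX, 0, 0 };
        uint64_t b[CHUNKS] = { 1, 0, 0, 0 };
        uint64_t r[CHUNKS];
        wide_add(r, a, b);                   /* carry ripples two chunks */
        assert(r[0] == 0 && r[1] == 0 && r[2] == 1 && r[3] == 0);
        printf("ok\n");
        return 0;
    }
    ```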


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Mon Nov 24 22:27:10 2025
    On 24/11/2025 20:26, David Brown wrote:
    On 24/11/2025 19:35, bart wrote:

    There is just the poor gnu extension where 128-bit integers didn't
    have a literal form, and there was no way to print such values.


    How many times have you felt the need to write a 128-bit literal? And
    how many times has that literal been in decimal

    I don't think there were hex literals either.


    (it's not difficult to
    put together a 128-bit value from two 64-bit values)? You really are
    making a mountain out of a molehill here.

    Well, it seems that such literals now exist (with 'wb' suffix). So I
    guess somebody other than you decided that feature WAS worth adding!

    But you can't as yet print out such values; I guess you can't 'scanf'
    them either. These are necessary to perform I/O on such data from/to
    text files.
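    One of those "hoops" can be sketched: printing a 128-bit value, held as a hi/lo pair of `uint64_t`, in decimal via restoring long division by 10 (slow but portable; `divmod10` and `u128_to_dec` are illustrative names):

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Divide the 128-bit value v = {hi, lo} by 10 in place via binary
       restoring division; return the remainder (one decimal digit). */
    static unsigned divmod10(uint64_t v[2])
    {
        uint64_t qhi = 0, qlo = 0;
        unsigned rem = 0;
        for (int i = 127; i >= 0; i--) {
            unsigned bit =
                (unsigned)((i >= 64 ? v[0] >> (i - 64) : v[1] >> i) & 1);
            rem = rem * 2 + bit;
            qhi = (qhi << 1) | (qlo >> 63);
            qlo <<= 1;
            if (rem >= 10) { rem -= 10; qlo |= 1; }
        }
        v[0] = qhi;
        v[1] = qlo;
        return rem;
    }

    static void u128_to_dec(uint64_t hi, uint64_t lo, char *out) /* out[40] */
    {
        char tmp[40];
        int n = 0;
        uint64_t v[2] = { hi, lo };
        do {
            tmp[n++] = (char)('0' + divmod10(v));
        } while (v[0] | v[1]);
        for (int i = 0; i < n; i++)       /* digits come out least first */
            out[i] = tmp[n - 1 - i];
        out[n] = '\0';
    }

    int main(void)
    {
        char buf[40];
        u128_to_dec(1, 0, buf);           /* 2^64 */
        printf("2^64 = %s\n", buf);
        assert(strcmp(buf, "18446744073709551616") == 0);
        u128_to_dec(0, 12345, buf);
        assert(strcmp(buf, "12345") == 0);
        return 0;
    }
    ```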

    I must say you have a very laid-back attitude to language design:

    "Let's add this 128-bit type, but let's not bother providing a way to
    enter such values, or add any facilities to print them out. How often
    would somebody need to do that anyway? But if they really /have/ to,
    then there are plenty of hoops they can jump through to achieve it!"

    (In my implementation of 128-bit types, from 2021, I allowed full
    128-bit decimal, hex and binary literals, and they could be printed in
    any base.

    But they weren't used enough and were dropped, in favour of an unlimited precision type in my other language.

    One interesting use-case for literals was short strings; 128 bits allowed character literals up to 16 characters: 'ABCDEFGHIJKLMNOP'. I think C is
    still stuck at one, or 4 if you're lucky.)


    But now there is this huge leap, not only to 128/256/512/1024 bits,
    but to conceivably millions, plus the ability to specify any weird
    type you like, like 182 bits (eg. somebody makes a typo for
    _BitInt(128), but they silently get a viable type that happens to be a
    little less efficient!).


    And this huge leap also lets you have 128-bit, 256-bit, 512-bit, etc.,

    And 821 bits. This is what I don't get. Why is THAT so important?

    Why couldn't 128/256/etc have been added first, and then those funny
    ones if the demand was still there?

    If the proposal had instead been simply to extend the 'u8 u16 u32 u64'
    set of types by a few more entries on the right, say 'u128 u256 u512',
    would anyone have been clamouring for types like 'u1187'? I doubt it.

    For sub-64-bit types on conventional hardware, I simply can't see the
    point, not if they are rounded up anyway. Either have full range-based
    types like Ada, or not at all.


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Keith Thompson@3:633/10 to All on Mon Nov 24 16:46:32 2025
    bart <bc@freeuk.com> writes:
    On 24/11/2025 13:33, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:

    What about arrays of _BitInt(1), _BitInt(2) and _BitInt(4)? These
    could actually be practically implemented, with a few restrictions,
    and could save a lot of memory.
    No, they couldn't. Array indexing is defined in terms of pointer
    arithmetic, and you can't have a pointer to something smaller than one
    byte.

    The restrictions I mentioned were to do with pointers to individual bits.

    Right. C doesn't have pointers to individual bits.

    It is possible that operations such as:

    x = A[i]
    A[i] = x

    can be well defined when A is an array of 1/2/4-bit values, even if
    expressed like this:

    *(A + i)

    Not in C as it's currently defined.

    But this would have to be indivisible when A is such an array: only
    the whole thing is valid, not (A + i) by itself, or A by itself; you'd
    need &A.

    This would need a small tweak to the language, but that is nothing
    compared to supporting (i3783467 * i999 / i3) >> i17.

    It would hardly be a "small tweak".

    I can imagine some future version of C adding support for indexing
    packed arrays, but I don't think it would have been worthwhile
    just so that large arrays of small _BitInts can be stored more
    efficiently. Doing that on ordinary hardware was not part of the
    rationale for C23's bit-precise integer types, and I haven't seen
    any such proposals for C2y.

    And assuming that "(i3783467 * i999 / i3) >> i17" means what I think
    it means, huge bit-precise integers are already standard (they're
    part of C23), and the work of implementing them is largely done in
    gcc and llvm/clang.

    But I write a script in my dynamic language,
    [...]

    C can only get down to that u8 figure (100MB) using its 'char'
    type. Even 'bool' doesn't make it smaller (presumably for the reasons
    you mentioned).

    You are forced to emulate such arrays in user-code using shifts and masks.

    Yes. C doesn't support packed arrays, and is unlikely to do so
    any time in the near future. C23 added a feature that doesn't do
    everything you want it to do. You can of course implement such
    things in a library, but the syntax for using it would probably be
    a bit ugly.

    And in fact at least one person has done so. (I've known about
    this for about a minute, so I have no comment other than that
    it exists.)

    https://github.com/gpakosz/PackedArray/
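    The shifts-and-masks emulation under discussion fits in a few lines; a sketch of a packed array of 2-bit unsigned values, four per byte (`get2`/`set2` are illustrative names, not PackedArray's API):

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Packed array of 2-bit unsigned values: four elements per byte,
       a quarter of the memory of an unsigned char array. */
    static unsigned get2(const uint8_t *a, size_t i)
    {
        return (a[i / 4] >> (2 * (i % 4))) & 3u;
    }

    static void set2(uint8_t *a, size_t i, unsigned v)
    {
        unsigned shift = 2u * (unsigned)(i % 4);
        a[i / 4] = (uint8_t)((a[i / 4] & ~(3u << shift))
                             | ((v & 3u) << shift));
    }

    int main(void)
    {
        uint8_t a[4] = { 0 };            /* room for 16 two-bit elements */
        set2(a, 0, 3);
        set2(a, 5, 2);
        set2(a, 15, 1);
        assert(get2(a, 0) == 3);
        assert(get2(a, 5) == 2);
        assert(get2(a, 15) == 1);
        assert(get2(a, 1) == 0);
        printf("ok\n");
        return 0;
    }
    ```

    As Keith notes, an array of `_BitInt(2)` would not pack this way: each element still occupies at least one byte, because pointer arithmetic requires it.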

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Keith Thompson@3:633/10 to All on Mon Nov 24 17:00:17 2025
    BGB <cr88192@gmail.com> writes:
    On 11/24/2025 8:21 AM, bart wrote:
    On 24/11/2025 13:35, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    [...]
    There are two kinds of BitInts: those smaller than 64 bits; and those
    larger than 64 bits, sometimes /much/ larger.

    As far as I know, the standard makes no such distinction.

    *I* am making the distinction. From an implementation point of view
    (and assuming 64-bit hardware), they are quite different.
    And that leads to different kinds of language features.

    As noted, as I understand it there is no reason for the storage to be
    smaller than the next power-of-2 size.

    Really?

    Rounding up to 8, 16, 32, or the next multiple of 64 bits seems
    reasonable. Rounding 1025 bits up to 2048 does not (and is not what
    the current gcc and llvm/clang implementations do).

    What advantage does rounding 1025 up to 2048 give you over rounding
    it up to 1088 (17*64)? It seems to me that the only real difference
    is in how many times a loop has to iterate.

    My understanding is that power-of-two sizes lose their advantages
    beyond about 64 or 128 bits. Am I mistaken?

    [...]

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Keith Thompson@3:633/10 to All on Mon Nov 24 17:23:47 2025
    David Brown <david.brown@hesbynett.no> writes:
    On 24/11/2025 12:17, bart wrote:
    On 24/11/2025 09:29, David Brown wrote:
    [...]
    So if you want the full range of values of x and y to be usable here,
    then NM would have to be N * M. But you would also need a cast, such
    as "_BitInt(NM) z = (_BitInt(NM)) x * y;", just as you do if you want
    to multiply two 32-bit ints as a 64-bit operation.

    N + M, not N * M.

    Alternatively, you might know more about the values that might be in x
    and y, and have a smaller NM (though you still need a cast if it is
    greater than both N and M). Or you might be using unsigned types and
    want the wrapping / masking behaviour.

    The point was not what size NM is, but that it is known to the
    compiler at the time of writing the expression.

    It sounds like the max precision you get will be the latter.

    can be implemented as something like :

        __bit_int_signed_mult(NM, (unsigned char *) &z,
                              N, (const unsigned char *) &x,
                              M, (const unsigned char *) &y);


    How would you write a generic user function that operates on any
    size BitInt? For example:
        _BitInt(?) bi_square(_BitInt(?));


    You can't. _BitInt(N) and _BitInt(M) are distinct types, for
    differing N and M. You can't write a generic user function in C that implements "T foo(T)" where T can be "int", "short", "long int", or
    other types. C simply does not have type-generic functions.

    Sort of. C23 defines the term "generic function" (N3220 7.26.5.1,
    string search functions). For example, strchr() can take a const void* argument and return a const void* result, or it can take a void*
    argument and return a void* result. (C++ does this by having two
    overloaded strchr() functions.)

    These "generic functions" are (almost certainly) implemented as macros
    that use _Generic. If you bypass the macro definition, you get the
    function that can take a const char* and return a char*.

    So C doesn't have type-generic functions, but it does have features that
    let you implement things that act like type-generic functions.
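    The pattern Keith describes, shown with ordinary integer types (a sketch; a `_BitInt` version would need one case per width used, which is where it gets painful):

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* A macro using _Generic to dispatch to one function per type,
       acting like a type-generic "T square(T)". */
    static int       square_i(int x)        { return x * x; }
    static long      square_l(long x)       { return x * x; }
    static long long square_ll(long long x) { return x * x; }

    #define square(x) _Generic((x), \
        int:       square_i,        \
        long:      square_l,        \
        long long: square_ll)(x)

    int main(void)
    {
        assert(square(7) == 49);                  /* int case  */
        assert(square(40000L) == 1600000000L);    /* long case */
        assert(square(3000000000LL) == 9000000000000000000LL);
        printf("ok\n");
        return 0;
    }
    ```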

    You /can/ write generic macros that handle different _BitInt types,
    but that would quickly get painful given that you'd need a case for
    each size of _BitInt you wanted for the _Generic macro.

    Indeed. A _Generic selection that handles all the ordinary non-extended integer types needs to handle 12 cases if I'm counting correctly, which
    is feasible. But the addition of bit-precise types adds
    BITINT_MAXWIDTH*2-1 new distinct predefined types, and a generic
    selection would need one case for each.

    However, you could have a function that takes a void*, a size, and a
    width as arguments and operates on a _BitInt(?) or unsigned _BitInt(?)
    type. In fact, gcc has internal functions like that for multiplication
    and division. (You mentioned something like that in text that I've
    snipped.)

    [...]

    This assumes BitInts are passed and returned by value, but even
    using BitInt* wouldn't help.

    Yes, they are passed around as values - they are integer types and are
    passed around like other integer types. (Implementations may use
    stack blocks and pointers for passing the values around if they are
    too big for registers, just as implementations can do with any value
    type. That's an implementation detail - logically, they are passed and returned as values.)

    Yes, and in general a _BitInt argument has to be copied to the
    corresponding parameter, since a change to the parameter can't affect
    the value of the argument.

    But passing huge _BitInts by value is no more problematic than passing
    huge structs by value.

    [...]

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Keith Thompson@3:633/10 to All on Mon Nov 24 18:03:00 2025
    bart <bc@freeuk.com> writes:
    On 24/11/2025 14:41, David Brown wrote:
    On 24/11/2025 13:31, bart wrote:
    That's all up to the implementation.
    You are worrying about completely negligible things here.

    Is it that negligible? That's easy to say when you're not doing the implementing! However it may impact on the size and performance of
    code.

    You're right, it's easy to say when I'm not doing the implementing.
    Which I'm not.

    The maintainers of gcc and llvm/clang have done that for me, so I don't
    have to worry about it.

    Are you planning to implement bit-precise integer types yourself? I
    don't think you've said so in this thread. If you are, you have at
    least two existing implementations you can look at for ideas.

    [...]

    You don't think it strange that C doesn't even have a 128-bit type yet
    (it only barely has width-specific 64-bit ones).

    C doesn't *require* 128-bit types. It certainly allows them. A C90 implementation could in principle have had 128-bit long, and a C99 or
    later implementation can have 128-bit long and/or an extended 128-bit
    type.

    As of C99 or C11, *requiring* support for 128-bit integers probably
    wouldn't have been reasonable.

    Please distinguish between the language and implementations.

    There is just the poor gnu extension where 128-bit integers didn't
    have a literal form, and there was no way to print such values.

    But now there is this huge leap, not only to 128/256/512/1024 bits,
    but to conceivably millions, plus the ability to specify any weird
    type you like, like 182 bits (eg. somebody makes a typo for
    _BitInt(128), but they silently get a viable type that happens to be a
    little less efficient!).

    Yes. With the addition of bit-precise types, gcc's __int128 might be
    obsolete (though there's bound to be existing code that depends on it).
    I can imagine that gcc might make __int128 an alias for _BitInt(128).

    So, 20 years of having 64-bit processors with little or no support for
    even double-word types, and now there is this explosion in
    capabilities.

    Those 20 years are in the past. Not much we can do about that now.

    Seriously, is your problem with _BitInt types that they're too flexible?
    What advantage do you expect from imposing additional restrictions on
    a feature that has already been defined and implemented?

    Or, are literals and print facilities for these new types still missing?

    C23 has literals for bit-precise integer types, using a "wb" or "WB"
    suffix. That's something you could have found out by reading the N3220
    C23 draft, or by reading one of my posts earlier in this thread. But I
    don't mind answering questions.

    There doesn't seem to be printf/scanf support for bit-precise integer
    types, which is a little disappointing. But since they're all distinct
    types, it could be difficult to define.

    Personally I think they should have got the basics right first, like a
    decent 128-bit type, proper literals, and ways to print.

    No language changes would be necessary to support 128-bit integer types. Implementations are free to support [u]int128_t and/or to make long long
    128 bits.

    It would have been nice if gcc's __int128 had been developed further,
    but for whatever reason that didn't happen. (Maybe there wasn't enough demand.)

    This looks like VLAs all over again (eg. is '_BitInt(1000000) A'
    allocated on the stack?). A poorly suited, hard-to-implement feature.

    It doesn't look particularly like VLAs to me. The width is a
    compile-time constant. Allocating large _BitInt objects is no
    harder or easier than allocating large struct objects.

    Here's an idea. Rather than asserting that _BitInt(1'000'000)
    is silly and obviously useless, try *asking* how it's useful.
    I personally don't know what I'd do with a million-bit integer,
    but maybe somebody out there has a valid use for it. Meanwhile,
    its existence doesn't bother me.

    My guess is that once you've implemented integers wider than 128
    or 256 bits, million-bit integers aren't much extra effort.

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Keith Thompson@3:633/10 to All on Mon Nov 24 18:10:13 2025
    bart <bc@freeuk.com> writes:
    On 24/11/2025 20:26, David Brown wrote:
    [...]
    And this huge leap also lets you have 128-bit, 256-bit, 512-bit,
    etc.,

    And 821 bits. This is what I don't get. Why is THAT so important?

    Why couldn't 128/256/etc have been added first, and then those funny
    ones if the demand was still there?

    Because a more general definition, allowing all widths up to some
    maximum, is *simpler* than a definition with arbitrary restrictions.
    And since it's already been implemented, what the heck are you
    complaining about?

    If the proposal had instead been simply to extend the 'u8 u16 u32 u64'
    set of types by a few more entries on the right, say 'u128 u256 u512',
    would anyone have been clamouring for types like 'u1187'? I doubt it.

    You do know that u8, u16, et al are not C types, right? (Yes, I know
    what you mean by those names.)

    For sub-64-bit types on conventional hardware, I simply can't see the
    point, not if they are rounded up anyway. Either have full
    range-based types like Ada, or not at all.

    Great, so don't use them.

    If the ISO C committee withdrew the current official 2023 standard
    document and replaced it with one that imposes restrictions on _BitInt
    types, and gcc and clang withdrew their implementations, would that
    satisfy you?

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From BGB@3:633/10 to All on Mon Nov 24 20:10:54 2025
    On 11/24/2025 7:00 PM, Keith Thompson wrote:
    BGB <cr88192@gmail.com> writes:
    On 11/24/2025 8:21 AM, bart wrote:
    On 24/11/2025 13:35, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    [...]
    There are two kinds of BitInts: those smaller than 64 bits; and those larger than 64 bits, sometimes /much/ larger.

    As far as I know, the standard makes no such distinction.

    *I* am making the distinction. From an implementation point of view
    (and assuming 64-bit hardware), they are quite different.
    And that leads to different kinds of language features.

    As noted, as I understand it there is no reason for the storage to be
    smaller than the next power-of-2 size.

    Really?

    Rounding up to 8, 16, 32, or the next multiple of 64 bits seems
    reasonable. Rounding 1025 bits up to 2048 does not (and is not what
    the current gcc and llvm/clang implementations do).


    Granted, I meant for smaller sizes (below 128 bits).

    BGBCC rounds larger sizes up to the next multiple of 128 bits.

    However, 384 bits is the first size where rounding up to a multiple of
    128 bits differs from the next power of 2.


    What advantage does rounding 1025 up to 2048 give you over rounding
    it up to 1088 (17*64)? It seems to me that the only real difference
    is in how many times a loop has to iterate.

    My understanding is that power-of-two sizes lose their advantages
    beyond about 64 or 128 bits. Am I mistaken?

    [...]


    I mentioned a few messages up that this was not the scheme I am using.

    So:
    1.. 8 => 8
    9.. 16 => 16
    17.. 32 => 32
    33.. 64 => 64
    65..128 => 128
    129..256 => 256
257..384 => 384 (first point of divergence)
    385..512 => 512
    513..640 => 640 (second point of divergence)
    641..768 => 768 (third point of divergence)
    ...
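
That mapping can be sketched directly as a small helper (storage_bits is a hypothetical name for illustration; this is not BGBCC's actual code):

```c
/* Sketch of the storage rounding listed above: power-of-two storage
   up to 128 bits, then the next multiple of 128 bits. */
static unsigned storage_bits(unsigned n)
{
    if (n <= 8)   return 8;
    if (n <= 16)  return 16;
    if (n <= 32)  return 32;
    if (n <= 64)  return 64;
    if (n <= 128) return 128;
    return (n + 127u) / 128u * 128u;   /* round up to a multiple of 128 */
}
```

The first divergence from pure powers of two is then storage_bits(300) == 384 rather than 512.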

But, alas, the reason for keeping small sizes power-of-2 is to optimize
for memory loads/stores.

The reason for using multiples of 128 bits for larger sizes was that this
was the most efficient option for the target ISA (and also less
complicated for the support code).

Though, if optimizing for RISC-V, a case could be made for using the
next multiple of 64 bits instead.

    ...


    While theoretically possible, multiples of a smaller size would end up
    being a worse option in terms of performance than just "wasting" a few
    extra bytes.



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Tue Nov 25 07:56:30 2025
    On 25/11/2025 02:23, Keith Thompson wrote:
    David Brown <david.brown@hesbynett.no> writes:
    On 24/11/2025 12:17, bart wrote:
    On 24/11/2025 09:29, David Brown wrote:
    [...]
    So if you want the full range of values of x and y to be usable here,
    then NM would have to be N * M. But you would also need a cast, such
    as "_BitInt(NM) z = (_BitInt(NM)) x * y;", just as you do if you want
    to multiply two 32-bit ints as a 64-bit operation.

    N + M, not N * M.

    Of course. (I /really/ should have picked a different third identifier...)


    Alternatively, you might know more about the values that might be in x
    and y, and have a smaller NM (though you still need a cast if it is
    greater than both N and M). Or you might be using unsigned types and
    want the wrapping / masking behaviour.

    The point was not what size NM is, but that it is known to the
    compiler at the time of writing the expression.

    It sounds like the max precision you get will be the latter.

    can be implemented as something like :

    __bit_int_signed_mult(NM, (unsigned char *) &z,
                          N, (const unsigned char *) &x,
                          M, (const unsigned char *) &y);


    How would you write a generic user function that operates on any
    size BitInt? For example:
    _BitInt(?) bi_square(_BitInt(?));


    You can't. _BitInt(N) and _BitInt(M) are distinct types, for
    differing N and M. You can't write a generic user function in C that
    implements "T foo(T)" where T can be "int", "short", "long int", or
    other types. C simply does not have type-generic functions.

Sort of. C23 defines the term "generic function" (N3220 7.26.5.1,
string search functions). For example, strchr() can take a const char*
argument and return a const char* result, or it can take a char*
argument and return a char* result. (C++ does this by having two
overloaded strchr() functions.)

    These "generic functions" are (almost certainly) implemented as macros
    that use _Generic. If you bypass the macro definition, you get the
    function that can take a const char* and return a char*.

So C doesn't have type-generic functions, but it does have features that
let you implement things that act like type-generic functions.


    Yes. It has also had type-generic maths functions for a good while.
    But it doesn't have a general generic function mechanism other than
    _Generic macros.

    You /can/ write generic macros that handle different _BitInt types,
    but that would quickly get painful given that you'd need a case for
    each size of _BitInt you wanted for the _Generic macro.

    Indeed. A _Generic selection that handles all the ordinary non-extended integer types needs to handle 12 cases if I'm counting correctly, which
    is feasible. But the addition of bit-precise types adds
    BITINT_MAXWIDTH*2-1 new distinct predefined types, and a generic
    selection would need one case for each.
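
For just the twelve ordinary cases, such a selection might be sketched like this (WIDTH_OF is a hypothetical macro for illustration; it reports storage widths via sizeof, ignoring any padding bits, and every _BitInt(N) in use would need one more case):

```c
#include <limits.h>

/* Map each standard integer type to its width - one case per type. */
#define WIDTH_OF(x) _Generic((x),                       \
    _Bool:              1,                              \
    char:               CHAR_BIT,                       \
    signed char:        CHAR_BIT,                       \
    unsigned char:      CHAR_BIT,                       \
    short:              sizeof(short) * CHAR_BIT,       \
    unsigned short:     sizeof(short) * CHAR_BIT,       \
    int:                sizeof(int) * CHAR_BIT,         \
    unsigned int:       sizeof(int) * CHAR_BIT,         \
    long:               sizeof(long) * CHAR_BIT,        \
    unsigned long:      sizeof(long) * CHAR_BIT,        \
    long long:          sizeof(long long) * CHAR_BIT,   \
    unsigned long long: sizeof(long long) * CHAR_BIT)
```

Extending this to bit-precise types would mean one extra line per distinct _BitInt(N), which is exactly the enumeration burden described above.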

    However, you could have a function that takes a void*, a size, and a
    width as arguments and operates on a _BitInt(?) or unsigned _BitInt(?)
    type. In fact, gcc has internal functions like that for multiplication
    and division. (You mentioned something like that in text that I've
    snipped.)


You could, yes. I started thinking about how you might make one that
didn't require the user to manually include the bit count of the _BitInt
to use it, but I couldn't figure out a good way. You can get a start
from using sizeof on the _BitInt parameter, but I can't think of a way
to get the bit count exactly (even using _Generic).

    [...]

    This assumes BitInts are passed and returned by value, but even
    using BitInt* wouldn't help.

    Yes, they are passed around as values - they are integer types and are
    passed around like other integer types. (Implementations may use
    stack blocks and pointers for passing the values around if they are
    too big for registers, just as implementations can do with any value
    type. That's an implementation detail - logically, they are passed and
    returned as values.)

    Yes, and in general a _BitInt argument has to be copied to the
    corresponding parameter, since a change to the parameter can't affect
    the value of the argument.

    The workings of C parameter passing were unfortunately cut in stone
    before anyone thought of passing large types as parameters. In
    hindsight it's easy to see it could have been better to say that
    function parameters are implicitly "const" and attempting to modify them
    is UB - just make a local copy if you want to make a change. But it's
    too late now!


    But passing huge _BitInts by value is no more problematic than passing
    huge structs by value.


    Exactly, yes.


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Tue Nov 25 11:38:32 2025
    On 25/11/2025 02:03, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    On 24/11/2025 14:41, David Brown wrote:
    On 24/11/2025 13:31, bart wrote:
    That's all up to the implementation.
    You are worrying about completely negligible things here.

    Is it that negligible? That's easy to say when you're not doing the
    implementing! However it may impact on the size and performance of
    code.

    You're right, it's easy to say when I'm not doing the implementing.
    Which I'm not.

    The maintainers of gcc and llvm/clang have done that for me, so I don't
    have to worry about it.

    Are you planning to implement bit-precise integer types yourself? I
    don't think you've said so in this thread. If you are, you have at
    least two existing implementations you can look at for ideas.

    No, apart from the usual set of 8/16/32/64 bits. I've done 128 bits, and played with 1/2/4 bits, but my view is that above this range, using
    exact bit-sizes is the wrong way to go.

    While for odd sizes up to 64 bits, bitfields are more apt than employing
    the type system.

    Here's an idea. Rather than asserting that _BitInt(1'000'000)
    is silly and obviously useless, try *asking* how it's useful.
    I personally don't know what I'd do with a million-bit integer,
    but maybe somebody out there has a valid use for it. Meanwhile,
    its existence doesn't bother me.

Again, my view is that types like _BitInt(123456) (could they have made
it any more fiddly to type?!) are the same mistake that early Pascal made
with arrays.

    It is common that an N-array of T and an M-array of T are not
    compatible, but usually there are ways to deal generically with both.


    My guess is that once you've implemented integers wider than 128
    or 256 bits, million-bit integers aren't much extra effort.

    I've implemented 128-bit arithmetic, and have seen some scary-looking C
    code that implemented 256-bit arithmetic. Neither of those would scale
    to N-bits where N can be arbitrary large /and/ might not be a multiple
    of either 64 or 8.

You would need pretty much the same algorithms as used for arbitrary precision. Those usually require N to be some multiple of the 'limb' size.
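
A minimal sketch of the limb approach, assuming 64-bit limbs stored least-significant first (limb_add is a hypothetical name, not taken from any particular library):

```c
#include <stdint.h>
#include <stddef.h>

/* Add two equal-length multi-limb numbers: r = a + b.
   Each operand is an array of 64-bit limbs, least significant first.
   Returns the carry out of the top limb. */
static uint64_t limb_add(uint64_t *r, const uint64_t *a,
                         const uint64_t *b, size_t nlimbs)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < nlimbs; i++) {
        uint64_t s = a[i] + carry;
        carry = (s < carry);        /* overflow from adding the carry */
        r[i] = s + b[i];
        carry += (r[i] < s);        /* overflow from adding b[i] */
    }
    return carry;
}
```

A _BitInt(N) whose width is not a multiple of 64 would additionally need the top limb masked (and sign-extended, for signed types) after each operation.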


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Tue Nov 25 14:12:07 2025
    On Tue, 25 Nov 2025 11:38:32 +0000
    bart <bc@freeuk.com> wrote:


    No, apart from the usual set of 8/16/32/64 bits. I've done 128 bits,
    and played with 1/2/4 bits, but my view is that above this range,
    using exact bit-sizes is the wrong way to go.


    Either that or manifestation of your NIH syndrome.
    Which explanation do you consider more likely?

    While for odd sizes up to 64 bits, bitfields are more apt than
    employing the type system.


    int sign_extend12(unsigned x)
    {
    return (_BitInt(12))x;
    }

Nice, isn't it?
Doing the same with bit fields is possible, but less obvious and less convenient. Also it can potentially play havoc with a compiler that takes
strict aliasing rules more seriously than they deserve.

    int sign_extend12(unsigned x)
    {
    struct bar {
    signed a: 12;
    };
return ((struct bar*)&x)->a;
    }

Doing the same with shifts is almost as convenient as with _BitInt, and
it works great on all popular compilers, but according to the wording of
the C Standard it is Undefined Behavior.

    int sign_extend12(unsigned x)
    {
    return (int32_t)((uint32_t)x << 20) >> 20;
    }
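
For completeness, a fully portable variant is possible too, avoiding both the implementation-defined conversion and the shifting tricks (sign_extend12_portable is a hypothetical name for illustration):

```c
/* Portable 12-bit sign extension: mask to 12 bits, then use the
   XOR-and-subtract trick.  The subtraction happens in int, so no
   implementation-defined conversions or shifts of negative values
   are involved. */
static int sign_extend12_portable(unsigned x)
{
    unsigned m = x & 0xFFFu;           /* keep the low 12 bits */
    return (int)(m ^ 0x800u) - 0x800;  /* bit 11 becomes the sign */
}
```

On typical two's complement targets this tends to optimise down to the same couple of instructions as the shift version.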


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Tue Nov 25 14:57:17 2025
    On 25/11/2025 12:12, Michael S wrote:
    On Tue, 25 Nov 2025 11:38:32 +0000
    bart <bc@freeuk.com> wrote:


    No, apart from the usual set of 8/16/32/64 bits. I've done 128 bits,
    and played with 1/2/4 bits, but my view is that above this range,
    using exact bit-sizes is the wrong way to go.


    Either that or manifestation of your NIH syndrome.
    Which explanation do you consider more likely?

    I can invent anything I like. I've looked at such things many times, and
    came to the conclusion that using types is the wrong approach, certainly
    for this level of language.

    (Yes, long ago I allowed type denotations such as:

    int*N a a has N bytes or N*8 bits (from Fortran)
    int:N b b has N bits

    Then I realised I was never going to use anything other than some
    power-of-two size of 8 bits or more, for discrete variables.)



    While for odd sizes up to 64 bits, bitfields are more apt than
    employing the type system.


    int sign_extend12(unsigned x)
    {
    return (_BitInt(12))x;
    }

Nice, isn't it?

By 'bitfields' I mean bitfields within structs, but also bitfield
operators which work on any integer values.

    Bitfields are nearly always unsigned in my projects, so I don't have an
    exact equivalent to this example.

    But a solution not using types would look like this:

    y := x.[0..11] # get first 12 bits
    y := x.[12..23] # next 12 bits

    x.[24..35] := y # set next 12 bits (x, y are 64 bits!)

    y := x.[0..i] # get first i+1 bits

    To optionally interpret a bitfield extraction as signed, I'd need to
    think up some way of denoting that. For bitfield insertion it doesn't
    matter.

    Your example is interesting but rather limited; while it does deal with
    a signed field:

    * That field can only start at bit zero, without extra manipulations

    * The size is fixed at 12 (if you decide to change the field size, or
    you want it as a constant parameter somewhere, it starts getting
    awkward)

    * If you are dealing with a range of bitfield sizes, you will need a
    dedicated function, or somehow enumerate all possibilities using
    _Generic.

    * It's not clear how bitfield insertion would work, whether you'd still
    employ a _BitInt type, and/or just revert to those shifts and masks.
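
In C, the extraction and insertion operations sketched above come down to the usual shifts and masks, e.g. (bits_get/bits_set are hypothetical helpers; bit positions are inclusive, fields up to 63 bits wide):

```c
#include <stdint.h>

/* Extract bits lo..hi (inclusive) of x, unsigned. */
static uint64_t bits_get(uint64_t x, unsigned lo, unsigned hi)
{
    uint64_t mask = (UINT64_C(1) << (hi - lo + 1)) - 1;
    return (x >> lo) & mask;
}

/* Return x with bits lo..hi (inclusive) replaced by the low bits of v. */
static uint64_t bits_set(uint64_t x, unsigned lo, unsigned hi, uint64_t v)
{
    uint64_t mask = (UINT64_C(1) << (hi - lo + 1)) - 1;
    return (x & ~(mask << lo)) | ((v & mask) << lo);
}
```

Unlike dedicated syntax, the positions here can be runtime values, which covers the "first i+1 bits" case too.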




    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Tue Nov 25 18:29:43 2025
    On Tue, 25 Nov 2025 14:57:17 +0000
    bart <bc@freeuk.com> wrote:

    On 25/11/2025 12:12, Michael S wrote:
    On Tue, 25 Nov 2025 11:38:32 +0000
    bart <bc@freeuk.com> wrote:


    No, apart from the usual set of 8/16/32/64 bits. I've done 128
    bits, and played with 1/2/4 bits, but my view is that above this
    range, using exact bit-sizes is the wrong way to go.


    Either that or manifestation of your NIH syndrome.
    Which explanation do you consider more likely?

    I can invent anything I like. I've looked at such things many times,
    and came to the conclusion that using types is the wrong approach,
    certainly for this level of language.

    (Yes, long ago I allowed type denotations such as:

    int*N a a has N bytes or N*8 bits (from Fortran)
    int:N b b has N bits

    Then I realised I was never going to use anything other than some power-of-two size of 8 bits or more, for discrete variables.)



    While for odd sizes up to 64 bits, bitfields are more apt than
    employing the type system.


    int sign_extend12(unsigned x)
    {
    return (_BitInt(12))x;
    }

Nice, isn't it?

By 'bitfields' I mean bitfields within structs, but also bitfield
operators which work on any integer values.

    Bitfields are nearly always unsigned in my projects, so I don't have
    an exact equivalent to this example.

    But a solution not using types would look like this:

    y := x.[0..11] # get first 12 bits
    y := x.[12..23] # next 12 bits

    x.[24..35] := y # set next 12 bits (x, y are 64 bits!)

    y := x.[0..i] # get first i+1 bits

    To optionally interpret a bitfield extraction as signed, I'd need to
    think up some way of denoting that. For bitfield insertion it doesn't matter.

    Your example is interesting but rather limited; while it does deal
    with a signed field:

    * That field can only start at bit zero, without extra manipulations

    * The size is fixed at 12 (if you decide to change the field size, or
    you want it as a constant parameter somewhere, it starts getting
    awkward)

    * If you are dealing with a range of bitfield sizes, you will need a
    dedicated function, or somehow enumerate all possibilities using
    _Generic.

    * It's not clear how bitfield insertion would work, whether you'd
    still employ a _BitInt type, and/or just revert to those shifts and
    masks.




My example is from the real world: dealing with A-to-D converters. I need
sign extension of that sort quite often.
* I don't recollect needing to sign-extend a field that does not start at
offset zero, but if it happens then a logical left shift [before the cast]
is an obvious and natural solution.
* My ADCs have a fixed number of bits. It does not change in the middle of
a project. And even if it does, the new value is also fixed, so a
constant (enum or define) works fine.
Same for your other points - I don't recollect that I needed something
like that sufficiently often to ... well... recollect.



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Tue Nov 25 18:33:30 2025
    On 25/11/2025 16:29, Michael S wrote:
    On Tue, 25 Nov 2025 14:57:17 +0000
    bart <bc@freeuk.com> wrote:

    On 25/11/2025 12:12, Michael S wrote:
    On Tue, 25 Nov 2025 11:38:32 +0000
    bart <bc@freeuk.com> wrote:


    No, apart from the usual set of 8/16/32/64 bits. I've done 128
    bits, and played with 1/2/4 bits, but my view is that above this
    range, using exact bit-sizes is the wrong way to go.


    Either that or manifestation of your NIH syndrome.
    Which explanation do you consider more likely?

    I can invent anything I like. I've looked at such things many times,
    and came to the conclusion that using types is the wrong approach,
    certainly for this level of language.

    (Yes, long ago I allowed type denotations such as:

    int*N a a has N bytes or N*8 bits (from Fortran)
    int:N b b has N bits

    Then I realised I was never going to use anything other than some
    power-of-two size of 8 bits or more, for discrete variables.)



    While for odd sizes up to 64 bits, bitfields are more apt than
    employing the type system.


    int sign_extend12(unsigned x)
    {
    return (_BitInt(12))x;
    }

Nice, isn't it?

    By 'bitfields' I mean bitfields within structs, but also bitfield
    operators whch work on any integer values.

    Bitfields are nearly always unsigned in my projects, so I don't have
    an exact equivalent to this example.

    But a solution not using types would look like this:

    y := x.[0..11] # get first 12 bits
    y := x.[12..23] # next 12 bits

    x.[24..35] := y # set next 12 bits (x, y are 64 bits!)

    y := x.[0..i] # get first i+1 bits

    To optionally interpret a bitfield extraction as signed, I'd need to
    think up some way of denoting that. For bitfield insertion it doesn't
    matter.

    Your example is interesting but rather limited; while it does deal
    with a signed field:

    * That field can only start at bit zero, without extra manipulations

    * The size is fixed at 12 (if you decide to change the field size, or
    you want it as a constant parameter somewhere, it starts getting
    awkward)

    * If you are dealing with a range of bitfield sizes, you will need a
    dedicated function, or somehow enumerate all possibilities using
    _Generic.

    * It's not clear how bitfield insertion would work, whether you'd
    still employ a _BitInt type, and/or just revert to those shifts and
    masks.




My example is from the real world: dealing with A-to-D converters. I need
sign extension of that sort quite often.

OK, I've looked at datasheets for two 12-bit ADCs. Both had a choice of
analog inputs, and in both the digital value was clocked out serially
(one with the input channel number as 4 extra bits).

The first apparently had a pin-selectable signed/unsigned mode; the
second didn't mention that, but did mention 000h and FFFh limits which
suggest unsigned.

But in any case, some extra circuitry would be needed to get the 12
parallel bits before they can be input via a 16-bit read. Here, you
might just tie D11-D15 together, so that a two's complement 12-bit value becomes a 16-bit one.

    Or maybe the CPU has its own serial input pin. The point is, the whole
    thing is a rather trivial matter, and it can be taken care of in several places.

    I don't know the details in your case, but if BitInt helps you save a
    couple of lines of code, then fine. Although I don't think this feature
    would be worth adding just for that purpose.

    (The only ADCs I've used were 4-bit (homemade) and 8-bit, both giving
    unsigned data in parallel, used for frame-grabbing video circuits so
    read directly into memory rather than via an explicit memory- or
    port-read instruction.)






* I don't recollect needing to sign-extend a field that does not start at
offset zero,

    So what's in the rest of the 32-bit field, garbage?


Same for your other points - I don't recollect that I needed something
like that sufficiently often to ... well... recollect.

    Yours is one of a thousand possible applications. Everyone will have
    different needs. Maybe someone else will have a 16 or 32-bit value with assorted bitfields of different widths.

    Then maybe C bitfields could be used, but a bigger problem with those is
    poor control over layout, which is anyway implementation-defined. (Mine
    of course don't have that problem!)

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Tue Nov 25 21:25:01 2025
    On 24/11/2025 23:27, bart wrote:
    On 24/11/2025 20:26, David Brown wrote:
    On 24/11/2025 19:35, bart wrote:

    There is just the poor gnu extension where 128-bit integers didn't
    have a literal form, and there was no way to print such values.


How many times have you felt the need to write a 128-bit literal? And
how many times has that literal been in decimal

    I don't think there were hex literals either.


(it's not difficult to put together a 128-bit value from two 64-bit
values)? You really are making a mountain out of a molehill here.

    Well, it seems that such literals now exist (with 'wb' suffix). So I
    guess somebody other than you decided that feature WAS worth adding!

    But you can't as yet print out such values; I guess you can't 'scanf'
    them either. These are necessary to perform I/O on such data from/to
    text files.
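
For what it's worth, the hoops for output are short, though still entirely manual. A sketch using the GCC/Clang unsigned __int128 extension (a compiler extension, not standard C; u128_to_hex is a hypothetical helper name), which formats the value by splitting it into two 64-bit halves:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Format an unsigned __int128 in hex via two 64-bit halves,
   since printf has no conversion specifier for it. */
static void u128_to_hex(unsigned __int128 v, char *buf)
{
    uint64_t hi = (uint64_t)(v >> 64);
    uint64_t lo = (uint64_t)v;
    if (hi)
        sprintf(buf, "%" PRIx64 "%016" PRIx64, hi, lo); /* pad low half */
    else
        sprintf(buf, "%" PRIx64, lo);
}
```

Decimal output is considerably more work, since it needs repeated division of the full 128-bit value by a power of ten.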

I must say you have a very laid-back attitude to language design:

    "Let's add this 128-bit type, but let's not bother providing a way to
    enter such values, or add any facilities to print them out. How often
    would somebody need to do that anyway? But if they really /have/ to,
    then there are plenty of hoops they can jump through to achieve it!"

(In my implementation of 128-bit types, from 2021, I allowed full
128-bit decimal, hex and binary literals, and they could be printed in
any base.

But they weren't used enough and were dropped, in favour of an unlimited-precision type in my other language.

One interesting use-case for literals was short strings; 128 bits allowed character literals up to 16 characters: 'ABCDEFGHIJKLMNOP'. I think C is still stuck at one, or 4 if you're lucky.)


    I have no idea or opinion on why /you/ might want 128-bit or larger
    integer types. I believe there is very little use for "normal" numbers
    - things you might want to write as literals, calculate with, and read
    or write - that won't fit perfectly well within 64 bit types, and would
    not be better served by arbitrary sized integers. Arbitrary sized
    integers are a very different kettle of fish from large fixed-size
    integers, and are not something that would fit in the C language - they
    need a library.

    I can tell you why /I/ might find larger integer types useful. They
    include :

    * 128-bit for IPv6 address. These use a variety of styles for input and display, and thus would use specialised routines, not simple literals or printf-style IO.

    * Big units for passing data around with larger memory transfers, using
    SIMD registers. IO is irrelevant here.

    * Cryptography. IO is irrelevant here. But a variety of sizes are
    useful including 56, 80, 112, 128, 168, 192, 384, 512, 521, 2048, 3072,
    4096, 7680, 8096 bits. There may be more common sizes - I'm just
    thinking of DES, 3DES, AES, SHA, ECC and RSA.



    Smaller sizes can be useful for holding RGB pixel values, audio data, etc.


    In none of these cases are bit-precise integer types essential. People
    have been doing cryptography for a long time without them. But they can
    be convenient, and help people write code that is simpler, clearer, or
    more directly expresses their intent. The only specific additional
    power you get from these is that you can do arithmetic on bigger types
    without having to write the code manually. I don't know if compilers currently do a good enough job for that to be suitable for
    multiplication and modulo of larger integers (addition is easy, but for
    big sizes, smarter multiplication techniques can be a significant
    performance gain).


    But those are just the uses /I/ see for them, in things /I/ work with.
    (I might also use them for FPGA programming in the future, but I'm not
    doing that at the moment.) However, unlike some people, I don't think
    the C language should pick features based purely on what I personally
    want to use, or what would be even sillier, what I personally think is
    easy to implement in a compiler. Other people will have other uses for different sizes.



    But now there is this huge leap, not only to 128/256/512/1024 bits,
    but to conceivably millions, plus the ability to specify any weird
    type you like, like 182 bits (eg. somebody makes a typo for
    _BitInt(128), but they silently get a viable type that happens to be
    a little less efficient!).


    And this huge leap also lets you have 128-bit, 256-bit, 512-bit, etc.,

    And 821 bits. This is what I don't get. Why is THAT so important?

    Why couldn't 128/256/etc have been added first, and then those funny
    ones if the demand was still there?

    The folks behind the proposal provided both. The fact that you can
    write _BitInt(821) does not in any way hinder use of _BitInt(256). I
    really don't get your problem here.


    If the proposal had instead been simply to extend the 'u8 u16 u32 u64'
    set of types by a few more entries on the right, say 'u128 u256 u512',
    would anyone have been clamouring for types like 'u1187'? I doubt it.

    /You/ might not have wanted them, but other people would.


For sub-64-bit types on conventional hardware, I simply can't see the
point, not if they are rounded up anyway. Either have full range-based types like Ada, or none at all.


    Fortunately for the C world, you are not on the C committee - it doesn't matter if you can't see beyond the end of your nose.



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Tue Nov 25 21:54:20 2025
    On 25/11/2025 13:12, Michael S wrote:
    On Tue, 25 Nov 2025 11:38:32 +0000
    bart <bc@freeuk.com> wrote:


    No, apart from the usual set of 8/16/32/64 bits. I've done 128 bits,
    and played with 1/2/4 bits, but my view is that above this range,
    using exact bit-sizes is the wrong way to go.


    Either that or manifestation of your NIH syndrome.
    Which explanation do you consider more likely?

    While for odd sizes up to 64 bits, bitfields are more apt than
    employing the type system.


    int sign_extend12(unsigned x)
    {
    return (_BitInt(12))x;
    }

Nice, isn't it?
Doing the same with bit fields is possible, but less obvious and less convenient. Also it can potentially play havoc with a compiler that takes
strict aliasing rules more seriously than they deserve.

    int sign_extend12(unsigned x)
    {
    struct bar {
    signed a: 12;
    };
return ((struct bar*)&x)->a;
    }


    int sign_extend12(unsigned x)
    {
    union {
    struct { unsigned u : 12; };
    struct { signed s : 12; };
    } u = {{ x }};
    return u.s;
    }


    No need for messing about with aliases - type-punning unions are safe
    and efficient (on good compilers).

    But the _BitInt version is definitely neater. I can see myself using _BitInt(12) and similar sizes for things like values read from hardware sensors of different resolutions.

    (The code for all three is the same with gcc on x86 or arm64 -
    unfortunately, gcc does not yet support _BitInt on many targets.)


Doing the same with shifts is almost as convenient as with _BitInt, and
it works great on all popular compilers, but according to the wording of the C Standard it is Undefined Behavior.

    int sign_extend12(unsigned x)
    {
    return (int32_t)((uint32_t)x << 20) >> 20;
    }



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Tue Nov 25 23:11:58 2025
    On Tue, 25 Nov 2025 14:12:07 +0200
    Michael S <already5chosen@yahoo.com> wrote:

Doing the same with shifts is almost as convenient as with _BitInt, and
it works great on all popular compilers, but according to the wording of the C Standard it is Undefined Behavior.

    int sign_extend12(unsigned x)
    {
    return (int32_t)((uint32_t)x << 20) >> 20;
    }


    Before someone corrects me, I'd correct myself: the code above does not
    contain Undefined Behavior. It's merely Implementation Defined Behavior.


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Keith Thompson@3:633/10 to All on Tue Nov 25 13:42:37 2025
    David Brown <david.brown@hesbynett.no> writes:
    [...]
    But the _BitInt version is definitely neater. I can see myself using _BitInt(12) and similar sizes for things like values read from
    hardware sensors of different resolutions.

    (The code for all three is the same with gcc on x86 or arm64 -
    unfortunately, gcc does not yet support _BitInt on many targets.)
    [...]

    Is support for _BitInt limited by target or by version?

    It looks like _BitInt support was introduced in gcc 14.1.0. You might
    have older versions of gcc on other platforms.

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Tue Nov 25 21:58:02 2025
    On 25/11/2025 20:25, David Brown wrote:
    On 24/11/2025 23:27, bart wrote:

One interesting use-case for literals was short strings; 128 bits
allowed character literals up to 16 characters: 'ABCDEFGHIJKLMNOP'. I
think C is still stuck at one, or 4 if you're lucky.)


    I have no idea or opinion on why /you/ might want 128-bit or larger
integer types. I believe there is very little use for "normal" numbers
    - things you might want to write as literals, calculate with, and read
    or write - that won't fit perfectly well within 64 bit types, and would
    not be better served by arbitrary sized integers.


Arbitrary sized
integers are a very different kettle of fish from large fixed-size
integers, and are not something that would fit in the C language - they
need a library.

Really? I wouldn't have thought there was any appreciable difference
between the code for multiplying two 100,000-bit BitInts, and that for multiplying two arbitrary-precision ints that happen to be 100,000 bits.

    Maybe the latter is autoranging, and might give a 200,000-bit result.

    Presumably the former doesn't use inline code, so it would be surprising
    if each distinct size of BitInt had dedicated sets of routines for this.
    So it sounds like they have to use a generic library anyway.

    And sure enough, gcc-generated code contains stuff like this:

    mov r8, rcx
mov edx, 50000 # _BitInt(50000)
    mov rcx, rax
    call __mulbitint3

    So, BitInts are different in that they /don't/ need a library?


    I can tell you why /I/ might find larger integer types useful.˙ They
    include :

* 128-bit for IPv6 address. These use a variety of styles for input and display, and thus would use specialised routines, not simple literals or printf-style IO.

    So, a better fit for a struct then? Here I'm curious as to what
    BitInt(128) brings to the table.


    * Big units for passing data around with larger memory transfers, using
SIMD registers. IO is irrelevant here.

    Structs and arrays again spring to mind if you just want an anonymous
    data block. (I wonder why it has to be bit-precise for byte-addressed
    memory?)


    * Cryptography. IO is irrelevant here. But a variety of sizes are
    useful including 56, 80, 112, 128, 168, 192, 384, 512, 521, 2048, 3072, 4096, 7680, 8096 bits. There may be more common sizes - I'm just
    thinking of DES, 3DES, AES, SHA, ECC and RSA.

    And I'm again curious as to what /non-numeric/ use a 200,000-bit BitInt
    might be put to, that is not better served by an array or struct.

    Maybe bit-sets? But there are no special features for accessing
    individual bits.

    That _BitInt() defaults to a signed integer (twos complement?), even for
    very large sizes suggests that /numeric/ applications are a primary use.



    Smaller sizes can be useful for holding RGB pixel values, audio data, etc.

    Except that these are probably rounded up to the next multiple of two.
    So the benefit is minimal, unless the implementation can do something with those padding bits.

    And 821 bits. This is what I don't get. Why is THAT so important?

    Why couldn't 128/256/etc have been added first, and then those funny
    ones if the demand was still there?

    The folks behind the proposal provided both. The fact that you can
    write _BitInt(821) does not in any way hinder use of _BitInt(256). I
    really don't get your problem here.

    You've heard of 'code smell'? Well, this is the same, but for features.

    I've been doing this stuff long enough to recognise when a feature is over-elaborate, over-specified and over-flexible. You need to know the
    minimum you can get away with, not the maximum!

    Let me guess, some committee members have been looking too long at how
    C++ does things? That language is utterly incapable of creating anything
    small and simple.



    If the proposal had instead been simply to extend the 'u8 u16 u32 u64'
    set of types by a few more entries on the right, say 'u128 u256 u512',
    would anyone have been clamouring for types like 'u1187'? I doubt it.

    /You/ might not have wanted them, but other people would.



    OK, so why are you not allowed to have _BitInt(1)? That is, a 1-bit
    signed integer. It might only have two values of 0 and -1; doesn't
    nobody want that particular combination?





    For sub-64-bit types on conventional hardware, I simply can't see the
    point, not if they are rounded up anyway. Either have a full range-
    based types like Ada, or not at all.


    Fortunately for the C world, you are not on the C committee - it doesn't matter if you can't see beyond the end of your nose.

    Maybe unfortunately. C used to be a fairly simple language with a lot of baggage; now it's a much heftier one with a lot of baggage!

    At least, I've been able to add to my collection of C types that
    represent an 8-bit byte:

    signed char
    unsigned char
    int8_t
    uint8_t
    _BitInt(8)
    unsigned _BitInt(8)

    The last two are apparently incompatible with the char versions.


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Keith Thompson@3:633/10 to All on Tue Nov 25 15:20:43 2025
    bart <bc@freeuk.com> writes:
    On 25/11/2025 20:25, David Brown wrote:
    [...]
    Arbitrary sized integers are a very different kettle of fish from
    large fixed-size integers, and are not something that would fit in
    the C language - they need a library.

    Really? I wouldn't have thought there was any appreciable difference
    between the code for multiplying two 100,000-bit BitInts, and that for multiplying two arbitrary-precision ints that happen to be 100,000
    bits.

    It's not about the code that implements multiplication. In gcc, that's
    done by calling a built-in function that can operate on arbitrary data
    widths.

    Think about memory management.

    A _BitInt(128) object has a fixed size, like a struct. It can be
    allocated locally ("on the stack"), passed to a function, returned
    as a function result, used in expressions, etc. Likewise for
    _BitInt(2048).
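
    That point can be sketched concretely. A minimal example, assuming a
    compiler with C23 _BitInt support (gcc 14 or recent clang on x86-64);
    the u128 typedef and the add_u128 helper are just illustrative names:

    ```c
    #include <stdio.h>

    typedef unsigned _BitInt(128) u128;

    /* A _BitInt(128) is a fixed-size value type: it can live on the
       stack and be passed and returned by value, like a small struct. */
    static u128 add_u128(u128 a, u128 b) {
        return a + b;
    }

    int main(void) {
        u128 x = (u128)1 << 100;       /* a value well beyond 64 bits */
        u128 y = add_u128(x, 5);
        /* printf has no _BitInt conversion; print the low 64 bits. */
        printf("low 64 bits: %llu\n", (unsigned long long)y);
        printf("sizeof(u128): %zu\n", sizeof(u128));
        return 0;
    }
    ```

    No malloc, no free, no destructors - the storage requirements are
    fully known at compile time, which is exactly what a _BitInt(*) type
    could not offer.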

    A hypothetical _BitInt(*) object would require an amount of storage
    that varies with its current value. That storage would have to be
    allocated using malloc() or equivalent, and deallocated using free()
    or equivalent. C++ template classes with automatically invoked
    constructors and destructors are great for that kind of thing.
    C has no such mechanisms, and there's little support for adding
    it just for this feature. (There are C container libraries.
    I haven't used them, but they tend to require construction and
    destruction to be explicit.)

    Perhaps a future standard will provide a more flexible flavor of
    _BitInt. It might allow the n in _BitInt(n) to be non-constant, or
    empty, or "*", to denote an arbitrary-precision integer. But it's
    hard to see how that could be done without adding other fundamental
    features to the language. And a lot of people's response would be
    that if you want C++, you know where to find it.

    Similarly, C99 added complex types as a built-in language feature.
    C++ added complex types as a template class, because C++ has language
    features that support that kind of thing, including user-defined
    literals.

    If you can think of a way to add arbitrary-precision integers to C
    without other radical changes to the language, let us know.

    It could also be nice to be able to write code that deals with
    multiple widths of _BitInt types, as we can do for arrays even
    without VLAs. But C's treatment of arrays is messy, and I'm not
    sure duplicating that mess for _BitInt types would be a great idea.
    And I wouldn't want to lose the ability to pass _BitInt values
    to functions.

    [...]

    So, a better fit for a struct then? Here I'm curious as to what
    BitInt(128) brings to the table.

    It brings a 128-bit integer type with constants and straightforward
    assignment, comparison, and arithmetic operators.
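
    For instance, a sketch assuming C23's wb/uwb bit-precise constant
    suffixes:

    ```c
    #include <stdio.h>

    int main(void) {
        /* uwb gives a bit-precise constant wider than 64 bits: 2^64. */
        unsigned _BitInt(128) a = 0x10000000000000000uwb;
        unsigned _BitInt(128) b = a + 1;
        /* Comparison and arithmetic are ordinary operators. */
        printf("%d\n", b > a);                         /* 1 */
        printf("%llu\n", (unsigned long long)(b - a)); /* 1 */
        return 0;
    }
    ```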

    [...]

    That _BitInt() defaults to a signed integer (twos complement?), even
    for very large sizes suggests that /numeric/ applications are a
    primary use.

    Yes, C23 requires two's-complement for signed integers. (It mandates two's-complement representation, not wraparound behavior; signed
    overflow is still UB).

    [...]

    OK, so why are you not allowed to have _BitInt(1)? That is, a 1-bit
    signed integer. It might only have two values of 0 and -1; doesn't
    nobody want that particular combination?

    I don't know. The language allows 1-bit signed bit-fields, so
    _BitInt(1) would make some sense, but the language requires N to
    be at least 1 for unsigned _BitInt and 2 for signed _BitInt.

    It doesn't bother me too much, since I'm unlikely to have a
    use for signed _BitInt(1). But it's an arbitrary restriction.
    (And I thought you liked arbitrary restrictions.)

    [...]

    At least, I've been able to add to my collection of C types that
    represent an 8-bit byte:

    signed char
    unsigned char
    int8_t
    uint8_t
    _BitInt(8)
    unsigned _BitInt(8)

    The last two are apparently incompatible with the char versions.

    You forgot plain char, int_least8_t, and uint_least8_t. And of
    course the char types are CHAR_BIT bits, not necessarily 8 bits.

    It's mildly interesting that unsigned _BitInt(8) gives you a way to
    define an octet even on systems with CHAR_BIT > 8. But of course an
    unsigned _BitInt(8) object will still have a size of CHAR_BIT bits.
    (Again, saving space on ordinary hardware isn't part of the rationale
    for _BitInt types.)
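
    A small illustration of both points, assuming CHAR_BIT == 8 on the
    host:

    ```c
    #include <stdio.h>

    int main(void) {
        unsigned _BitInt(8) octet = 250;
        /* unsigned _BitInt(8) values are reduced modulo 2^8 on
           conversion, so this wraps: 260 mod 256 == 4. */
        octet += 10;
        printf("value: %u\n", (unsigned)octet);
        printf("size: %zu\n", sizeof octet);  /* 1 when CHAR_BIT == 8 */
        return 0;
    }
    ```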

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Wed Nov 26 02:08:05 2025
    On 25/11/2025 23:20, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    On 25/11/2025 20:25, David Brown wrote:
    [...]
    Arbitrary sized integers are a very different kettle of fish from
    large fixed-size integers, and are not something that would fit in
    the C language - they need a library.

    Really? I wouldn't have thought there was any appreciable difference
    between the code for multiplying two 100,000-bit BitInts, and that for
    multiplying two arbitrary-precision ints that happen to be 100,000
    bits.

    It's not about the code that implements multiplication. In gcc, that's
    done by calling a built-in function that can operate on arbitrary data widths.

    Think about memory management.

    Well, I was responding to a suggestion that BitInt support didn't need a library.

    But memory management is a good point. Actual, variable-sized bigints
    would be awkward in C if you want to use them in ordinary expressions.

    Although managing large fixed-sized types, which may also involve intermediate, transient values, can have their own problems.



    Perhaps a future standard will provide a more flexible flavor of
    _BitInt. It might allow the n in _BitInt(n) to be non-constant, or
    empty, or "*", to denote an arbitrary-precision integer. But it's
    hard to see how that could be done without adding other fundamental
    features to the language. And a lot of people's response would be
    that if you want C++, you know where to find it.

    I think I would have responded better to BitInt if presented as a
    'bit-set', effectively a fixed-size bit-array, but passed by value.
    This is something that I'd considered myself at one time.

    Those would have logical operators, access to individual bits, but neither arithmetic nor shifts, and no notion of twos complement. (In my implementation, they could also have been initialised like Pascal bitsets.)

    More significantly, an unbounded version could be passed by reference,
    with an accompanying length (I could also use slices that have the
    length) as happens with arrays in C.

    Similarly, C99 added complex types as a built-in language feature.
    C++ added complex types as a template class, because C++ has language features that support that kind of thing, including user-defined
    literals.

    If you can think of a way to add arbitrary-precision integers to C
    without other radical changes to the language, let us know.

    I have considered adding my actual arbitrary precision library to my
    systems language. It would have been superficial (such types would not be nestable within other data structures), but would have been simpler to
    use than function calls.

    Some degree of automatic memory management would have been needed
    (initialise locals on function entry, free on exit, deal with
    intermediates), but not on the C++ scale due to the restrictions.

    But I rejected that as being too high-level a feature, and my use-cases
    more suitable for a scripting language.


    It could also be nice to be able to write code that deals with
    multiple widths of _BitInt types, as we can do for arrays even
    without VLAs. But C's treatment of arrays is messy, and I'm not
    sure duplicating that mess for _BitInt types would be a great idea.
    And I wouldn't want to lose the ability to pass _BitInt values
    to functions.

    [...]

    So, a better fit for a struct then? Here I'm curious as to what
    BitInt(128) brings to the table.

    It brings a 128-bit integer type with constants and straightforward assignment, comparison, and arithmetic operators.

    I was commenting on the ipv6 example, where structs give you that
    already, except arithmetic which makes little sense.


    [...]

    That _BitInt() defaults to a signed integer (twos complement?), even
    for very large sizes suggests that /numeric/ applications are a
    primary use.

    Yes, C23 requires two's-complement for signed integers. (It mandates two's-complement representation, not wraparound behavior; signed
    overflow is still UB).

    Even though it will now likely be under software control? OK.

    At least, I've been able to add to my collection of C types that
    represent an 8-bit byte:

    signed char
    unsigned char
    int8_t
    uint8_t
    _BitInt(8)
    unsigned _BitInt(8)

    The last two are apparently incompatible with the char versions.

    You forgot plain char,

    I had char but took it out, as it's an outlier.

    int_least8_t, and uint_least8_t.

    And 'fast' versions? I still don't know what any of these mean! No other languages seem to have bothered.



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Keith Thompson@3:633/10 to All on Tue Nov 25 19:06:31 2025
    bart <bc@freeuk.com> writes:
    On 25/11/2025 23:20, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    On 25/11/2025 20:25, David Brown wrote:
    [...]
    Arbitrary sized integers are a very different kettle of fish from
    large fixed-size integers, and are not something that would fit in
    the C language - they need a library.

    Really? I wouldn't have thought there was any appreciable difference
    between the code for multiplying two 100,000-bit BitInts, and that for
    multiplying two arbitrary-precision ints that happen to be 100,000
    bits.
    It's not about the code that implements multiplication. In gcc,
    that's done by calling a built-in function that can operate on
    arbitrary data widths. Think about memory management.

    Well, I was responding to a suggestion that BitInt support didn't need
    a library.

    David didn't actually suggest that. He said that arbitrary-sized
    integers would need a library (and such libraries exist), not that
    fixed-size integers don't.

    The point, I think, is that arbitrary-sized integers, without radical
    changes to the language, would require a *visible* library while the
    _BitInt types are built into the language. Yes, some operations are implemented as function calls in some implementations. The same
    could be true for just about any operation. Some implementations
    have software floating-point. gcc implements a large struct
    assignment by generating a call to memcpy. And so on.

    But memory management is a good point. Actual, variable-sized bigints
    would be awkward in C if you want to use them in ordinary expressions.

    Although managing large fixed-sized types, which may also involve intermediate, transient values, can have their own problems.

    Again, any such problems have already been solved by the gcc and
    llvm/clang implementations (aside from a clang problem with large multiplication and division). "This feature would be too difficult to implement" is a weak argument when implentations already exist.

    BTW, clang has had this feature (originally called _ExtInt rather than
    _BitInt) since 2019. Here's the git log entry. The committer is one of
    the authors of the N2021 paper, so the similarities are unsurprising.

    ```
    commit 61ba1481e200b5b35baa81ffcff81acb678e8508
    Author: Erich Keane <erich.keane@intel.com>
    Date: 2019-12-24 07:28:40 -0800

    Implement _ExtInt as an extended int type specifier.

    Introduction/Motivation:
    LLVM-IR supports integers of non-power-of-2 bitwidth, in the iN syntax.
    Integers of non-power-of-two aren't particularly interesting or useful
    on most hardware, so much so that no language in Clang has been
    motivated to expose it before.

    However, in the case of FPGA hardware normal integer types where the
    full bitwidth isn't used, is extremely wasteful and has severe
    performance/space concerns. Because of this, Intel has introduced this
    functionality in the High Level Synthesis compiler[0]
    under the name "Arbitrary Precision Integer" (ap_int for short). This
    has been extremely useful and effective for our users, permitting them
    to optimize their storage and operation space on an architecture where
    both can be extremely expensive.

    We are proposing upstreaming a more palatable version of this to the
    community, in the form of this proposal and accompanying patch. We are
    proposing the syntax _ExtInt(N). We intend to propose this to the WG14
    committee[1], and the underscore-capital seems like the active direction
    for a WG14 paper's acceptance. An alternative that Richard Smith
    suggested on the initial review was __int(N), however we believe that
    is much less acceptable by WG14. We considered _Int, however _Int is
    used as an identifier in libstdc++ and there is no good way to fall
    back to an identifier (since _Int(5) is indistinguishable from an
    unnamed initializer of a template type named _Int).

    [0]https://www.intel.com/content/www/us/en/software/programmable/quartus-prime/hls-compiler.html)
    [1]http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2472.pdf

    Differential Revision: https://reviews.llvm.org/D73967
    ```

    [...]

    I think I would have responded better to BitInt if presented as a
    'bit-set', effectively a fixed-size bit-array, but passed by
    value. This is something that I'd considered myself at one time.

    Those would have logical operators, access to individual bits, but neither arithmetic nor shifts, and no notion of twos complement. (In my implementation, they could also have been initialised like Pascal
    bitsets.)

    So rather than a new feature for wide integer types, you would
    have preferred something that DOESN'T SUPPORT ARITHMETIC?? How is
    that relevant to _BitInt? Bit vectors are great, but they aren't
    integers.

    This might interest you :
    https://github.com/michaeldipperstein/bitarray

    More significantly, an unbounded version could be passed by reference,
    with an accompanying length (I could also use slices that have the
    length) as happens with arrays in C.

    Right, like arrays of unsigned char.

    [...]

    It could also be nice to be able to write code that deals with
    multiple widths of _BitInt types, as we can do for arrays even
    without VLAs. But C's treatment of arrays is messy, and I'm not
    sure duplicating that mess for _BitInt types would be a great idea.
    And I wouldn't want to lose the ability to pass _BitInt values
    to functions.
    [...]

    So, a better fit for a struct then? Here I'm curious as to what
    BitInt(128) brings to the table.
    It brings a 128-bit integer type with constants and straightforward
    assignment, comparison, and arithmetic operators.

    I was commenting on the ipv6 example, where structs give you that
    already, except arithmetic which makes little sense.

    OK, I probably snipped too much context here. unsigned _BitInt(128)
    could be a reasonable way to represent an ipv6 address. So could
    unsigned char[16], or a struct containing an unsigned char[16].

    [...]

    At least, I've been able to add to my collection of C types that
    represent an 8-bit byte:

    signed char
    unsigned char
    int8_t
    uint8_t
    _BitInt(8)
    unsigned _BitInt(8)

    The last two are apparently incompatible with the char versions.
    You forgot plain char,

    I had char but took it out, as it's an outlier.

    OK, whatever works for you.

    int_least8_t, and uint_least8_t.

    And 'fast' versions? I still don't know what any of these mean! No
    other languages seem to have bothered.

    The "fast" versions could be larger than 8 bits (though I'm mildly
    surprised to see that [u]int_fast8_t types *are* 8 bits on several
    compilers I just tried).

    Of course C++ and Objective-C incorporate C's standard library.

    You say you don't know what they mean. Do you *want* to know?
    You can always read the standard's description if you're curious.
    I never assume that saying you don't know something means that you
    want to know about it.

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Keith Thompson@3:633/10 to All on Tue Nov 25 19:21:03 2025
    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
    bart <bc@freeuk.com> writes:
    [...]
    OK, so why are you not allowed to have _BitInt(1)? That is, a 1-bit
    signed integer. It might only have two values of 0 and -1; doesn't
    nobody want that particular combination?

    I don't know. The language allows 1-bit signed bit-fields, so
    _BitInt(1) would make some sense, but the language requires N to
    be at least 1 for unsigned _BitInt and 2 for signed _BitInt.

    It doesn't bother me too much, since I'm unlikely to have a
    use for signed _BitInt(1). But it's an arbitrary restriction.
    [...]

    I just learned that there's a proposal to allow _BitInt(1) in C2y.

    https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3699.pdf

    The current restriction apparently was for historical reasons.
    Prior to C23, C didn't require two's complement for signed types,
    and signed _BitInt(1) doesn't make much sense for one's complement
    or sign-and-magnitude (it could only hold +0 and -0).

    Yes, C23 added both _BitInt and the requirement for two's complement,
    but preliminary implementations of _BitInt go back several years,
    and the requirements didn't catch up. Stuff happens.

    Incidentally, C23 requires BITINT_MAXWIDTH to be at least
    ULLONG_WIDTH, which is at least 64. clang/llvm sets it to 128 for
    some target systems.
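
    A quick way to see an implementation's limit, assuming a C23
    <limits.h>:

    ```c
    #include <limits.h>
    #include <stdio.h>

    int main(void) {
        /* C23: BITINT_MAXWIDTH >= ULLONG_WIDTH >= 64. Actual values vary:
           gcc 14 on x86-64 reports 65535, clang 128 on some targets. */
        printf("BITINT_MAXWIDTH = %llu\n",
               (unsigned long long)BITINT_MAXWIDTH);
        return 0;
    }
    ```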


    https://eisenwave.github.io/cpp-proposals/bitint.html
    is a proposal to add C23-style bit-precise integers to C++.
    </OT>

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Wed Nov 26 08:55:37 2025
    On 25/11/2025 22:58, bart wrote:
    On 25/11/2025 20:25, David Brown wrote:
    On 24/11/2025 23:27, bart wrote:

    (One interesting use-case for literals was short strings; 128 bits
    allowed character literals up to 16 characters: 'ABCDEFGHIJKLMNOP'. I
    think C is still stuck at one, or 4 if you're lucky.)


    I have no idea or opinion on why /you/ might want 128-bit or larger
    integer types. I believe there is very little use for "normal"
    numbers - things you might want to write as literals, calculate with,
    and read or write - that won't fit perfectly well within 64 bit types,
    and would not be better served by arbitrary sized integers.


    Arbitrary sized integers are a very different kettle of fish from
    large fixed-size integers, and are not something that would fit in the
    C language - they need a library.

    Really? I wouldn't have thought there was any appreciable difference
    between the code for multiplying two 100,000-bit BitInts, and that for multiplying two arbitrary-precision ints that happen to be 100,000 bits.


    You are looking at things in completely the wrong way.

    Long before you start thinking of how to implement operations, think
    about what the types are at a fundamental level.

    A fixed-size integer is a value type of fixed, compile-time size. It is passed around as a value. Local instances can be put on a stack with compile-time fixed offsets (and thus using [sp + N] access modes in an implementation). The type has a single simple and obvious (albeit
    slightly implementation-dependent) bit representation. A _BitInt(32)
    will be identical at the low level to an int32_t. Bigger _BitInt types
    are just the same, only bigger. There is no difference in concept, or representation, whether the type is 32-bit or 32 million bits.
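
    That low-level equivalence can at least be checked for storage size,
    assuming a C23 compiler; note the two remain distinct, incompatible
    types as far as the C type system is concerned:

    ```c
    #include <stdint.h>
    #include <stdio.h>

    /* Same storage on typical implementations, though _BitInt(32) and
       int32_t are not compatible types. */
    _Static_assert(sizeof(_BitInt(32)) == sizeof(int32_t),
                   "expected matching storage");

    int main(void) {
        _BitInt(32) a = -123;
        int32_t b = a;           /* value-preserving conversion */
        printf("%d\n", (int)b);
        return 0;
    }
    ```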

    An arbitrary sized integer is a dynamic type with variable size. The
    base object will hold information about pointers to data, sizes for that stored data - including both how much is in use, and how much is
    available. There are endless ways to make such types - you can support multiple allocation parts, or use a single contiguous allocation. You
    can store the data in binary, or some kind of packed decimal, or other formats. Passing them around might mean just passing around the base
    object, but sometimes you need to make deep copies. Operations might
    lead to heap memory allocations or deallocations.

    They are so /totally/ different that any similarities in the way you do
    a particular arithmetic operation are completely incidental.


    Maybe the latter is autoranging, and might give a 200,000-bit result.

    Presumably the former doesn't use inline code, so it would be surprising
    if each distinct size of BitInt had dedicated sets of routines for this.
    So it sounds like they have to use a generic library anyway.

    And sure enough, gcc-generated code contains stuff like this:

    ˙˙˙˙mov    r8, rcx
    ˙˙˙˙mov    edx, 50000       # _BitInt(50000)
    ˙˙˙˙mov    rcx, rax
    ˙˙˙˙call   __mulbitint3

    So, BitInts are different in that they /don't/ need a library?


    I can tell you why /I/ might find larger integer types useful. They
    include :

    * 128-bit for IPv6 address. These use a variety of styles for input
    and display, and thus would use specialised routines, not simple
    literals or printf-style IO.

    So, a better fit for a struct then? Here I'm curious as to what
    BitInt(128) brings to the table.


    A struct is certainly what I use today. But there may be times when it
    is convenient to hold the data in a single scalar object. Depending on
    the target device, registers, and operations, there might be registers
    that can hold a 128-bit scalar for passing it around, or for atomically accessing them.


    * Big units for passing data around with larger memory transfers,
    using SIMD registers. IO is irrelevant here.

    Structs and arrays again spring to mind if you just want an anonymous
    data block. (I wonder why it has to be bit-precise for byte-addressed memory?)


    If I have a processor that has 256-bit vector registers, then moving
    data by loading and storing 256-bit blocks is going to be more efficient
    than doing a loop of 16 byte moves. Today, I would use uint64_t for the
    task, as the biggest type available. Why does it have to be
    bit-precise? It must be bit-precise because I would want to move 256
    bits - not 255 bits or 257 bits.
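
    A sketch of that idea, assuming a C23 compiler; copy_blocks and
    block256 are illustrative names, and whether each chunk assignment
    becomes a single vector load/store is entirely up to the compiler
    and target:

    ```c
    #include <stdio.h>
    #include <string.h>

    typedef unsigned _BitInt(256) block256;   /* 32-byte transfer unit */

    /* Copy nblocks 256-bit chunks; a compiler may lower each assignment
       to one vector load/store pair on suitable targets. */
    static void copy_blocks(void *dst, const void *src, size_t nblocks) {
        block256 *d = dst;
        const block256 *s = src;
        for (size_t i = 0; i < nblocks; i++)
            d[i] = s[i];
    }

    int main(void) {
        _Alignas(block256) unsigned char src[64], dst[64] = {0};
        for (int i = 0; i < 64; i++)
            src[i] = (unsigned char)i;
        copy_blocks(dst, src, sizeof src / sizeof(block256));
        printf("%s\n", memcmp(src, dst, sizeof src) == 0 ? "equal" : "differ");
        return 0;
    }
    ```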


    * Cryptography. IO is irrelevant here. But a variety of sizes are
    useful including 56, 80, 112, 128, 168, 192, 384, 512, 521, 2048,
    3072, 4096, 7680, 8096 bits. There may be more common sizes - I'm
    just thinking of DES, 3DES, AES, SHA, ECC and RSA.

    And I'm again curious as to what /non-numeric/ use a 200,000-bit BitInt might be put to, that is not better served by an array or struct.


    I don't have a use for a 200,000 bit integer type at the moment. But I
    cannot imagine any reason why the language specifications should have arbitrary limits. Are you suggesting that the C standards should say "You
    can have _BitInt's up to 8096 because someone found a use for them, but
    you can't have size 8097 and above - and 200,000 is right out - because someone else can't imagine they are useful" ?

    An implementation can - indeed, must - set a limit to the sizes it
    supports. Implementations can have many reasons to do so. Some implementations might have quite low limits (the size of "long long int"
    is the minimum allowed for conformance), but then that implementation
    might not be so useful to some people.

    Maybe bit-sets? But there are no special features for accessing
    individual bits.

    That _BitInt() defaults to a signed integer (twos complement?), even for
    very large sizes suggests that /numeric/ applications are a primary use.


    Obviously the C standards should have made "_BitInt" signed up to size
    73 bits, and unsigned from then on. That would have been /so/ much
    clearer and simpler for everyone.



    Smaller sizes can be useful for holding RGB pixel values, audio data,
    etc.

    Except that these are probably rounded up to the next multiple of two.
    So the benefit is minimal, unless the implementation can do something with those padding bits.


    I write C code. I want my C code to be clear and represent what I am handling, and then let the compiler do its job of generating efficient results. So if I am dealing with data that is 24-bit signed integer
    data, then _BitInt(24) (especially with a typedef name) is more accurate source code than "int" or "int32_t".
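
    For example (a sketch; sample24 is an illustrative typedef, and the
    size shown assumes the x86-64 psABI layout):

    ```c
    #include <stdio.h>

    typedef _BitInt(24) sample24;    /* signed 24-bit audio sample */

    int main(void) {
        sample24 s = 0x7FFFFF;       /* maximum positive value, 2^23 - 1 */
        /* Signed _BitInt overflow is still UB, so widen before
           arithmetic that might exceed 24 bits. */
        long sum = (long)s + 1;
        printf("%ld\n", sum);
        printf("%zu\n", sizeof s);   /* padded to 4 bytes on x86-64 */
        return 0;
    }
    ```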

    And 821 bits. This is what I don't get. Why is THAT so important?

    Why couldn't 128/256/etc have been added first, and then those funny
    ones if the demand was still there?

    The folks behind the proposal provided both. The fact that you can
    write _BitInt(821) does not in any way hinder use of _BitInt(256). I
    really don't get your problem here.

    You've heard of 'code smell'? Well, this is the same, but for features.


    Your nose is blocked. Or to be more accurate, you are so obsessed with
    the idea that your own language is "perfect" that you simply cannot
    accept that other languages might have good features that your language
    does not, or that other programmers might want features that your
    language does not have.

    I've been doing this stuff long enough to recognise when a feature is over-elaborate, over-specified and over-flexible. You need to know the minimum you can get away with, not the maximum!

    NIH syndrome combined with megalomania. Other people do this stuff
    better than you.


    Let me guess, some committee members have been looking too long at how
    C++ does things? That language is utterly incapable of creating anything small and simple.


    And yet C and C++ programmers outnumber programmers of Bart's own
    language by millions. No language - except for yours, of course - is
    perfect. But it seems C and C++ are both pretty good for getting the
    job done.



    If the proposal had instead been simply to extend the 'u8 u16 u32
    u64' set of types by a few more entries on the right, say 'u128 u256
    u512', would anyone have been clamouring for types like 'u1187'? I
    doubt it.

    /You/ might not have wanted them, but other people would.



    OK, so why are you not allowed to have _BitInt(1)? That is, a 1-bit
    signed integer. It might only have two values of 0 and -1; doesn't
    nobody want that particular combination?

    Apparently that one was ruled out. (I believe the C++ plans for _BitInt
    will allow it there - not because it is a useful type in itself, but
    because allowing it slightly simplifies generic programming with _BitInt types.)






    For sub-64-bit types on conventional hardware, I simply can't see the
    point, not if they are rounded up anyway. Either have a full range-
    based types like Ada, or not at all.


    Fortunately for the C world, you are not on the C committee - it
    doesn't matter if you can't see beyond the end of your nose.

    Maybe unfortunately. C used to be a fairly simple language with a lot of baggage; now it's a much heftier one with a lot of baggage!

    At least, I've been able to add to my collection of C types that
    represent an 8-bit byte:

      signed char
      unsigned char
      int8_t
      uint8_t
      _BitInt(8)
      unsigned _BitInt(8)

    The last two are apparently incompatible with the char versions.





    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Wed Nov 26 11:12:47 2025
    On Tue, 25 Nov 2025 18:33:30 +0000
    bart <bc@freeuk.com> wrote:


    (The only ADCs I've used were 4-bit (homemade)

    Why am I not surprised? ;-)

    and 8-bit, both giving
    unsigned data in parallel, used for frame-grabbing video circuits so
    read directly into memory rather than via an explicit memory- or
    port-read instruction.)


    ADC technology is improving at a decent rate.
    Recently we used a converter with a successive-approximation
    architecture that delivers better SNR than most delta-sigma
    converters of just a few years ago, without suffering from all
    the disadvantages of delta-sigma. Almost 18 true bits at 2 MSPS.

    https://www.analog.com/en/products/ad4030-24.html


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Wed Nov 26 11:29:16 2025
    On Tue, 25 Nov 2025 18:33:30 +0000
    bart <bc@freeuk.com> wrote:

    * I don't recollect needing to sign-extend a field that does not
    start at offset zero,

    So what's in the rest of the 32-bit field, garbage?


    Either garbage or zero or, rarely there could be meaningful flags.
    I don't see how the question is relevant.


    Same for your other points - I don't recollect that I needed
    something like that sufficiently often to ... well... recollect.

    Yours is one of a thousand possible applications. Everyone will have different needs. Maybe someone else will have a 16 or 32-bit value
    with assorted bitfields of different widths.

    Then maybe C bitfields could be used, but a bigger problem with those
    is poor control over layout, which is anyway implementation-defined.
    (Mine of course don't have that problem!)

    According to the language of The Standard, it's not 'poor control'.
    As far as standard requirements go, there is *no* control over the
    layout of bit-fields.
    Of course, the implementer is encouraged to specify exact rules in his
    documents. In many (not all) cases bitfield layout is part of the ABI,
    so it is shared by all compilers on a given platform. But that does not
    exactly help people who don't like reading ABI docs or compiler
    manuals. It also does not help those poor souls who try to write
    portable code.

    Shifts and masks provide much more solid ground.
    And combination of shifts with _BitInt() appears equally solid, but
    more convenient and more self-documenting.


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Wed Nov 26 11:52:07 2025
    On Tue, 25 Nov 2025 19:06:31 -0800
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:


    BTW, clang has had this feature (originally called _ExtInt rather than _BitInt) since 2019. Here's the git log entry. The committer is one
    of the authors of the N2021 paper, so the similarities are
    unsurprising.

    ```
    commit 61ba1481e200b5b35baa81ffcff81acb678e8508
    Author: Erich Keane <erich.keane@intel.com>
    Date: 2019-12-24 07:28:40 -0800

    Implement _ExtInt as an extended int type specifier.

    Introduction/Motivation:
    LLVM-IR supports integers of non-power-of-2 bitwidth, in the iN
    syntax. Integers of non-power-of-two aren't particularly interesting
    or useful on most hardware, so much so that no language in Clang has
    been motivated to expose it before.

    However, in the case of FPGA hardware normal integer types where
    the full bitwidth isn't used, is extremely wasteful and has severe
    performance/space concerns. Because of this, Intel has
    introduced this functionality in the High Level Synthesis compiler[0]
    under the name "Arbitrary Precision Integer" (ap_int for short).
    This has been extremely useful and effective for our users,
    permitting them to optimize their storage and operation space on an architecture where both can be extremely expensive.

    We are proposing upstreaming a more palatable version of this to
    the community, in the form of this proposal and accompanying patch.
    We are proposing the syntax _ExtInt(N). We intend to propose this to
    the WG14 committee[1], and the underscore-capital seems like the
    active direction for a WG14 paper's acceptance. An alternative that
    Richard Smith suggested on the initial review was __int(N), however
    we believe that is much less acceptable by WG14. We considered _Int,
    however _Int is used as an identifier in libstdc++ and there is no
    good way to fall back to an identifier (since _Int(5) is
    indistinguishable from an unnamed initializer of a template type
    named _Int).
    [0]https://www.intel.com/content/www/us/en/software/programmable/quartus-prime/hls-compiler.html)
    [1]http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2472.pdf

    Differential Revision: https://reviews.llvm.org/D73967
    ```

    [...]


    I like the feature in the form that it ended up, but I certainly
    dislike their motivation.
    [O.T. rant]
    High Level Synthesis, both by Altera (part of Intel in 2016-2024) and
    by Xilinx (part of AMD since 2022), is an archetypal snake oil.
    Bullshit doctors lure people into the notion that they can save time
    by not learning proper HDLs. But the naive users that believed their
    crap end up spending more time rather than less.
    As far as Altera/Xilinx is concerned, the short-term gain is that users
    make less efficient designs, which means that they have to buy bigger,
    more expensive FPGA devices. But in the long term it is a loss for the
    FPGA ecosystem, because more people believe that FPGAs are shite when in
    fact it's not the FPGAs that are bad, but improper tools (HLS).
    [/O.T. rant]


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Wed Nov 26 12:01:30 2025
    On Tue, 25 Nov 2025 13:42:37 -0800
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

    David Brown <david.brown@hesbynett.no> writes:
    [...]
    But the _BitInt version is definitely neater. I can see myself
    using _BitInt(12) and similar sizes for things like values read from hardware sensors of different resolutions.

    (The code for all three is the same with gcc on x86 or arm64 - unfortunately, gcc does not yet support _BitInt on many targets.)
    [...]

    Is support for _BitInt limited by target or by version?

    It looks like _BitInt support was introduced in gcc 14.1.0. You might
    have older versions of gcc on other platforms.


    The most recent version of arm-none-eabi-gcc in my distribution of
    choice (msys2) is 13.3.0.
    I am too lazy to compile arm-none-eabi-gcc from source. Would rather
    wait.
    I suppose, David is like me in that regard, except that he probably
    uses even more conservative distribution.



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Wed Nov 26 12:05:42 2025
    On 26/11/2025 07:55, David Brown wrote:
    On 25/11/2025 22:58, bart wrote:
    On 25/11/2025 20:25, David Brown wrote:
    On 24/11/2025 23:27, bart wrote:

    One interesting use-case for literals was short strings; 128 bits
    allowed character literals up to 16 characters: 'ABCDEFGHIJKLMNOP'.
    I think C is still stuck at one, or 4 if you're lucky.)


    I have no idea or opinion on why /you/ might want 128-bit or larger
    integer types. I believe there is very little use for "normal"
    numbers - things you might want to write as literals, calculate with,
    and read or write - that won't fit perfectly well within 64 bit
    types, and would not be better served by arbitrary sized integers.


    Arbitrary sized integers are a very different kettle of fish from
    large fixed-size integers, and are not something that would fit in
    the C language - they need a library.

    Really? I wouldn't have thought there was any appreciable difference
    between the code for multiplying two 100,000-bit BitInts, and that for
    multiplying two arbitrary-precision ints that happen to be 100,000 bits.


    You are looking at things in completely the wrong way.

    Long before you start thinking of how to implement operations, think
    about what the types are at a fundamental level.

    A fixed-size integer is a value type of fixed, compile-time size. It is passed around as a value. Local instances can be put on a stack with compile-time fixed offsets (and thus using [sp + N] access modes in an implementation). The type has a single simple and obvious (albeit
    slightly implementation-dependent) bit representation. A _BitInt(32)
    will be identical at the low level to an int32_t. Bigger _BitInt types
    are just the same, only bigger. There is no difference in concept, or representation, whether the type is 32-bit or 32 million bits.

    An arbitrary sized integer is a dynamic type with variable size. The
    base object will hold information about pointers to data, sizes for that stored data - including both how much is in use, and how much is
    available. There are endless ways to make such types - you can support multiple allocation parts, or use a single contiguous allocation. You
    can store the data in binary, or some kind of packed decimal, or other formats. Passing them around might mean just passing around the base object, but sometimes you need to make deep copies. Operations might
    lead to heap memory allocations or deallocations.

    They are so /totally/ different that any similarities in the way you do
    a particular arithmetic operation are completely incidental.

    But BitInts /will/ need runtime library support?

    I've acknowledged in my last post that arbitrary precision would have
    memory management issues, /if/ you wanted to add them to the language in
    such a way that, if variables 'a b c d' had such a type, you can write:

    a = b + c * d;

    This is not what I had in mind; such arithmetic would use explicit
    function calls with explicit management of intermediates (like GMP).

    So from this point of view, fixed-size BitInts are better, but also a
    higher level ability than I would have considered added to the language.

    Even if BitInts were restricted to saner and smaller sizes, I'd consider actual arithmetic on anything from 128 bits up to a few K bits and beyond a specialist,
    niche application.

    But logic operations (== & | ^) on unsigned BitInts are more reasonable (because they implement some features of bit-sets).

    For arithmetic on considerably larger numbers, I still think arbitrary precision is the best bet.


    Structs and arrays again spring to mind if you just want an anonymous
    data block. (I wonder why it has to be bit-precise for byte-addressed
    memory?)


    If I have a processor that has 256-bit vector registers, then moving
    data by loading and storing 256-bit blocks is going to be more efficient than doing a loop of 16 byte moves. Today, I would use uint64_t for the task, as the biggest type available. Why does it have to be bit-
    precise? It must be bit-precise because I would want to move 256 bits -
    not 255 bits or 257 bits.

    By bit-precise I mean being able to specify 255 and 257 bits! Memory is usually expressed in bytes or words, not bits.



    * Cryptography. IO is irrelevant here. But a variety of sizes are
    useful including 56, 80, 112, 128, 168, 192, 384, 512, 521, 2048,
    3072, 4096, 7680, 8096 bits. There may be more common sizes - I'm
    just thinking of DES, 3DES, AES, SHA, ECC and RSA.

    And I'm again curious as to what /non-numeric/ use a 200,000-bit
    BitInt might be put to, that is not better served by an array or struct.


    I don't have a use for a 200,000 bit integer type at the moment. But I cannot imagine any reason why the language specifications should have arbitrary limits. Are you suggesting that the C standards should say "You
    can have _BitInt's up to 8096 because someone found a use for them, but
    you can't have size 8097 and above - and 200,000 is right out - because someone else can't imagine they are useful" ?

    And yet, integer widths have been roughly capped at double a machine
    word size for decades - until 64 bits came along and then few even
    bothered with double-width.

    Nobody thought how easy it would be to just have an integer of whatever
    size you like - you just generate whatever code is necessary to make it happen. We could have had BitInts on 32- and even 16-bit machines if
    only somebody had thought of it!



    An implementation can - indeed, must - set a limit to the sizes it
    supports. Implementations can have many reasons to do so. Some implementations might have quite low limits (the size of "long long int"
    is the minimum allowed for conformance), but then that implementation
    might not be so useful to some people.

    Maybe bit-sets? But there are no special features for accessing
    individual bits.

    That _BitInt() defaults to a signed integer (twos complement?), even
    for very large sizes suggests that /numeric/ applications are a
    primary use.


    Obviously the C standards should have made "_BitInt" signed up to size
    73 bits, and unsigned from then on. That would have been /so/ much
    clearer and simpler for everyone.

    Or unsigned could have been the default.




    Smaller sizes can be useful for holding RGB pixel values, audio data,
    etc.

    Except that these are probably rounded up to the next power of
    two. So the benefit is minimal, unless you can do something with those padding bits.


    I write C code. I want my C code to be clear and represent what I am handling, and then let the compiler do its job of generating efficient results. So if I am dealing with data that is 24-bit signed integer
    data, then _BitInt(24) (especially with a typedef name) is more accurate source code than "int" or "int32_t".

    Suddenly everybody is dealing with signed values of 12 and 24 bits!

    I actually had exactly that feature:

    int*3 a # from 1980s; a 3-byte or 24-bit signed type
    int:24 b # from 1990s; a 24-bit signed type

    Or at least, I had the syntax. Those odd values would have been
    rejected, as I didn't have support for them, or a way to emulate them
    (which is what BitInt(24) appears to do).

    So I got rid of the feature and ended up with int32 and then i32. (I
    think Zig allows types like i24 and i123456, presumably built upon
    LLVM's integer types which go up to 2**23 or 2**24 bits.)


    You've heard of 'code smell'? Well, this is the same, but for features.


    Your nose is blocked. Or to be more accurate, you are so obsessed with
    the idea that your own language is "perfect" that you simply cannot
    accept that other languages might have good features that your language
    does not, or that other programmers might want features that your
    language does not have.

    I've been doing this stuff long enough to recognise when a feature is
    over-elaborate, over-specified and over-flexible. You need to know the
    minimum you can get away with, not the maximum!

    NIH syndrome combined with megalomania. Other people do this stuff
    better than you.

    I've noticed that other languages tend to go overboard with things, and
    now it's happening to C.

    I made a decision to keep my systems language at a certain level
    regarding such things as the type system, while having lots of
    convenient micro-features:

    print int@(x+y).[52..62]

    This type-puns a float64 r-value expression into an int, and extracts
    that bitfield (which is the unsigned exponent field when float64 uses
    IEEE 754).

    I'd be interested to see how you can do this better, using general
    language features (adding a dedicated .exponent property to floats would
    be cheating!).



    Let me guess, some committee members have been looking too long at how
    C++ does things? That language is utterly incapable of creating
    anything small and simple.


    And yet C and C++ programmers outnumber programmers of Bart's own
    language by millions. No language - except for yours, of course - is perfect. But it seems C and C++ are both pretty good for getting the
    job done.

    My systems language DOES have lots of very nice micro-features compared
    to C. And usually they are presented in a tidy fashion. I don't think
    there's any argument about that. (Look at C's ugly X-macros for example.)

    My language is not perfect; a big thing it's missing is Pascal-style enumeration types that are type-safe, that would detect a lot of errors.

    But as a systems language, it is much more enticing than C.

    (Today I need to start porting a 20Kloc application in my language, to
    C; proper C not machine transpiling. I'm not looking forward to all that typing!)

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Wed Nov 26 13:15:21 2025
    On 26/11/2025 03:08, bart wrote:
    On 25/11/2025 23:20, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    On 25/11/2025 20:25, David Brown wrote:
    [...]
    Arbitrary sized integers are a very different kettle of fish from
    large fixed-size integers, and are not something that would fit in
    the C language - they need a library.

    Really? I wouldn't have thought there was any appreciable difference
    between the code for multiplying two 100,000-bit BitInts, and that for
    multiplying two arbitrary-precision ints that happen to be 100,000
    bits.

    It's not about the code that implements multiplication. In gcc, that's
    done by calling a built-in function that can operate on arbitrary data
    widths.

    Think about memory management.

    Well, I was responding to a suggestion that BitInt support didn't need a library.

    I did not say that. (You really need to get a better understanding of
    basic logic.) I said that arbitrary sized integers need a library - I
    did not say that fixed-sized integers do not need a library.

    Perhaps more clearly, arbitrary sized integers need a user-visible
    library in C. They need functions to allocate, deallocate, and copy the integers, as well as converting to and from normal integers, at a bare minimum.

    It is normal in C implementations that some operations are done with
    "hidden" library calls - functions in a "language support library" that
    you do not call directly. On an x86 machine, "x / y" might generate a
    divide instruction, while on a microcontroller it might generate a call
    to a "__divide_int" function in an internal compiler-specific library
    (with internal compiler-specific calling conventions). _BitInt support
    can certainly make use of such libraries, just like anything else in C.

    And it looks like the gcc implementation of _BitInt /does/ use such
    libraries for big enough _BitInt types, while using inline code for
    sizes that can be done reasonably efficiently. Clang, on the other
    hand, apparently generates inline code no matter what size of _BitInt
    you have. Those are implementation choices, and it's all hidden from
    the user.


    But memory management is a good point. Actual, variable-sized bigints
    would be awkward in C if you want to use them in ordinary expressions.


    Indeed.

    Although managing large fixed-sized types, which may also involve intermediate, transient values, can have problems of its own.

    You already support such types in C. If it is a problem, it is a
    problem that every vaguely compliant C compiler has already solved.

    struct Big { uint64_t xs[250000]; };

    That type is passed around, copied and assigned by value, even though it
    is 2 MB in size. _BitInt's don't add any new issues here.




    Perhaps a future standard will provide a more flexible flavor of
    _BitInt. It might allow the n in _BitInt(n) to be non-constant, or
    empty, or "*", to denote an arbitrary-precision integer. But it's
    hard to see how that could be done without adding other fundamental
    features to the language. And a lot of people's response would be
    that if you want C++, you know where to find it.

    I think I would have responded better to BitInt if presented as a
    'bit-set', effectively a fixed-size bit-array, but passed by value.
    This is something that I'd considered myself at one time.

    Certainly _BitInt's can be used as bitsets.


    Those would have logical operators, access to individual bits, but not arithmetic nor shifts, and no notion of twos complement. (In my implementation, they could also have been initialised like Pascal bitsets.)


    _BitInt's have logical operators. You can get access to individual bits
    from shifts and masks, just like for any other integer types.

    How efficiently a given compiler handles these is another matter -
    expect that early implementations will be correct but relatively
    inefficient and gradually improve as _BitInt's get more popular.

    More significantly, an unbounded version could be passed by reference,
    with an accompanying length (I could also use slices that have the
    length) as happens with arrays in C.

    _BitInt's have fixed sizes - if you want a variable size, use an array.
    No one is claiming that _BitInt types are somehow the perfect tool for
    any use-case.


    Similarly, C99 added complex types as a built-in language feature.
    C++ added complex types as a template class, because C++ has language
    features that support that kind of thing, including user-defined
    literals.

    If you can think of a way to add arbitrary-precision integers to C
    without other radical changes to the language, let us know.

    I have considered adding my actual arbitrary precision library to my
    systems language. It would have been superfical (such types would not be nestable within other data structures), but would have been simpler to
    use than function calls.

    Some degree of automatic memory management would have been needed (initialise locals on function entry, free on exit, deal with intermediates), but not on the C++ scale due to the restrictions.

    But I rejected that as being too high-level a feature, and my use-cases
    more suitable for a scripting language.

    Different languages can support different features in different ways. C cannot support types that involve memory management in a
    user-transparent manner - memory management is manual in C. In C++, it
    would be entirely possible to make arbitrary precision integers with
    automatic memory management. It would not even be particularly
    difficult (except for efficient implementation of arithmetic operations
    on large sizes), and not need any language changes. But that would not
    negate the uses of _BitInt, which is (AFAIUI) on its way into C++.



    It could also be nice to be able to write code that deals with
    multiple widths of _BitInt types, as we can do for arrays even
    without VLAs.˙ But C's treatment of arrays is messy, and I'm not
    sure duplicating that mess for _BitInt types would be a great idea.
    And I wouldn't want to lose the ability to pass _BitInt values
    to functions.

    [...]

    So, a better fit for a struct then? Here I'm curious as to what
    BitInt(128) brings to the table.

    It brings a 128-bit integer type with constants and straightforward
    assignment, comparison, and arithmetic operators.

    I was commenting on the ipv6 example, where structs give you that
    already, except arithmetic which makes little sense.


    Shifting and masking would definitely be useful operations. I can't see
    a point in adding or multiplying IPv6 addresses, but logical operations
    would definitely be useful. Things like netmasks are not always on neat
    octet boundaries.


    [...]

    That _BitInt() defaults to a signed integer (twos complement?), even
    for very large sizes suggests that /numeric/ applications are a
    primary use.

    Yes, C23 requires two's-complement for signed integers. (It mandates
    two's-complement representation, not wraparound behavior; signed
    overflow is still UB).

    Even though it will now likely be under software control? OK.


    They play by the same rules as all other integer types in C.

    At least, I've been able to add to my collection of C types that
    represent an 8-bit byte:

    signed char
    unsigned char
    int8_t
    uint8_t
    _BitInt(8)
    unsigned _BitInt(8)

    The last two are apparently incompatible with the char versions.

    You forgot plain char,

    I had char but took it out, as it's an outlier.

    int_least8_t, and uint_least8_t.

    And 'fast' versions? I still don't know what any of these mean! No other languages seem to have bothered.



    The "fast" versions are types that have a minimum given size, but might
    be faster than the exact or least versions for typical operations.

    So "int_fast32_t" is guaranteed to have at least 32 bits of precision,
    but is allowed to be bigger if that is faster. On x86, it is 64 bits
    because 64-bit arithmetic and register moves can often involve fewer
    masking or sign-extension operations than 32-bit operations. (Because
    of the oddities of the x86 world, it seems "int_fast8_t" is 8 bits
    rather than 64 bits.)


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Wed Nov 26 13:24:26 2025
    On 25/11/2025 22:42, Keith Thompson wrote:
    David Brown <david.brown@hesbynett.no> writes:
    [...]
    But the _BitInt version is definitely neater. I can see myself using
    _BitInt(12) and similar sizes for things like values read from
    hardware sensors of different resolutions.

    (The code for all three is the same with gcc on x86 or arm64 -
    unfortunately, gcc does not yet support _BitInt on many targets.)
    [...]

    Is support for _BitInt limited by target or by version?


    Both - I expect it to be implemented for more targets in later versions :-)

    It looks like _BitInt support was introduced in gcc 14.1.0. You might
    have older versions of gcc on other platforms.


    It was added to x86 and AArch64 targets in gcc 14. It is not supported
    on any other targets as yet, as far as I know. Presumably it will come
    when someone has done the work for the backends. (Some of these
    implementations are target-independent, but some are backend-specific.)
    Generally, x86-64 and AArch64 are the targets that get the most focus
    and support from the big companies, while 32-bit ARM, MIPS, PowerPC,
    etc., can often be a little slower due to fewer resources.



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Wed Nov 26 12:45:04 2025
    On 26/11/2025 09:12, Michael S wrote:
    On Tue, 25 Nov 2025 18:33:30 +0000
    bart <bc@freeuk.com> wrote:


    (The only ADCs I've used were 4-bit (homemade)

    Why am I not surprised? ;-)

    and 8-bit, both giving
    unsigned data in parallel, used for frame-grabbing video circuits so
    read directly into memory rather than via an explicit memory- or
    port-read instruction.)


    ADC technology is improving at decent rate.
    Recently we used converter with successive-approximation
    architecture that delivers better SNR than most delta-sigma
    converters of just few years ago. Without suffering from all
    dis-advantages of delta-sigma. Almost 18 true bits at 2 MSPS.

    https://www.analog.com/en/products/ad4030-24.html


    That's interesting; my 4-bit circuit also worked at 2M samples per
    second (128 samples every 52us), and probably would have worked much
    higher if I'd had the memory to store the results.

    This was in 1981.

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Wed Nov 26 15:08:54 2025
    On Wed, 26 Nov 2025 13:15:21 +0100
    David Brown <david.brown@hesbynett.no> wrote:


    I did not say that. (You really need to get a better understanding
    of basic logic.) I said that arbitrary sized integers need a library
    - I did not say that fixed-sized integers do not need a library.

    Perhaps more clearly, arbitrary sized integers need a user-visible
    library in C. They need functions to allocate, deallocate, and copy
    the integers, as well as converting to and from normal integers, at a
    bare minimum.


    Perhaps things will become even more clear if we make a distinction
    between the run-time library and the compiler support library.

    In specific case of gcc, the latter is called libgcc. It is (almost)
    per architecture. (Almost) the same libgcc works on x86-64 Windows,
    Linux, BSD or Solaris. The same for other popular architectures.

    The former, on the other hand, is certainly different on different
    platforms with the same architecture, but sometimes can be different on
    the same platform/architecture. For example, newlib is nowadays used
    almost exclusively on embedded platforms without a real OS, but
    historically it was invented for Linux, by people (not totally unlike
    Bart in their attitude) who hated the code bloat of glibc.



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Wed Nov 26 15:31:06 2025
    On Wed, 26 Nov 2025 12:45:04 +0000
    bart <bc@freeuk.com> wrote:

    On 26/11/2025 09:12, Michael S wrote:
    On Tue, 25 Nov 2025 18:33:30 +0000
    bart <bc@freeuk.com> wrote:


    (The only ADCs I've used were 4-bit (homemade)

    Why am I not surprised? ;-)

    and 8-bit, both giving
    unsigned data in parallel, used for frame-grabbing video circuits
    so read directly into memory rather than via an explicit memory- or
    port-read instruction.)


    ADC technology is improving at decent rate.
    Recently we used converter with successive-approximation
    architecture that delivers better SNR than most delta-sigma
    converters of just few years ago. Without suffering from all
    dis-advantages of delta-sigma. Almost 18 true bits at 2 MSPS.

    https://www.analog.com/en/products/ad4030-24.html


    That's interesting; my 4-bit circuit also worked at 2M samples per
    second (128 samples every 52us), and probably would have worked much
    higher if I'd had the memory to store the results.

    This was in 1981.

    I would guess that your circuit used the Flash ADC architecture:
    https://en.wikipedia.org/wiki/Flash_ADC
    This architecture is great for low resolution and high sample rates, but
    can't be improved beyond 10-11 "true" bits of resolution. Or maybe
    it can, but it's so hard that nobody bothers. Instead, high-res/high-
    rate converters use a pipelined architecture - a sort of cross between
    Flash and SAR. The cost of it is typically high power consumption.
    Also, resolution/SNR is still not as good as a really good SAR.

    Example of pipelined ADC:
    https://www.analog.com/en/products/ad9652.html




    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Wed Nov 26 15:08:27 2025
    On 26/11/2025 11:01, Michael S wrote:
    On Tue, 25 Nov 2025 13:42:37 -0800
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

    David Brown <david.brown@hesbynett.no> writes:
    [...]
    But the _BitInt version is definitely neater. I can see myself
    using _BitInt(12) and similar sizes for things like values read from
    hardware sensors of different resolutions.

    (The code for all three is the same with gcc on x86 or arm64 -
    unfortunately, gcc does not yet support _BitInt on many targets.)
    [...]

    Is support for _BitInt limited by target or by version?

    It looks like _BitInt support was introduced in gcc 14.1.0. You might
    have older versions of gcc on other platforms.


    The most recent version of arm-none-eabi-gcc in my distribution of
    choice (msys2) is 13.3.0.
    I am too lazy to compile arm-none-eabi-gcc from source. Would rather
    wait.
    I suppose David is like me in that regard, except that he probably
    uses even more conservative distribution.



    I have release 13.3 installed, but I haven't used it on any projects
    yet. I tend to use new releases on new projects, but I am very
    conservative about changing toolchains in existing projects.

    But for things like this, I use godbolt.org - it is /so/ much easier
    than testing individually. Just pick the compiler target and version
    from the list, and see if you get an error message when using _BitInt.

  • From David Brown@3:633/10 to All on Wed Nov 26 15:49:58 2025
    On 26/11/2025 13:05, bart wrote:
    On 26/11/2025 07:55, David Brown wrote:
    On 25/11/2025 22:58, bart wrote:
    On 25/11/2025 20:25, David Brown wrote:
    On 24/11/2025 23:27, bart wrote:

    One interesting use-case for literals was short strings; 128 bits
    allowed character literals up to 16 characters: 'ABCDEFGHIJKLMNOP'.
    I think C is still stuck at one, or 4 if you're lucky.)


    I have no idea or opinion on why /you/ might want 128-bit or larger
    integer types. I believe there is very little use for "normal"
    numbers - things you might want to write as literals, calculate
    with, and read or write - that won't fit perfectly well within 64
    bit types, and would not be better served by arbitrary sized integers.


    Arbitrary sized integers are a very different kettle of fish from
    large fixed-size integers, and are not something that would fit in
    the C language - they need a library.

    Really? I wouldn't have thought there was any appreciable difference
    between the code for multiplying two 100,000-bit BitInts, and that
    for multiplying two arbitrary-precision ints that happen to be 100,000
    bits.


    You are looking at things in completely the wrong way.

    Long before you start thinking of how to implement operations, think
    about what the types are at a fundamental level.

    A fixed-size integer is a value type of fixed, compile-time size. It
    is passed around as a value. Local instances can be put on a stack
    with compile-time fixed offsets (and thus using [sp + N] access modes
    in an implementation). The type has a single simple and obvious
    (albeit slightly implementation-dependent) bit representation. A
    _BitInt(32) will be identical at the low level to an int32_t. Bigger
    _BitInt types are just the same, only bigger. There is no difference
    in concept, or representation, whether the type is 32-bit or 32
    million bits.

    An arbitrary sized integer is a dynamic type with variable size. The
    base object will hold information about pointers to data, sizes for
    that stored data - including both how much is in use, and how much is
    available. There are endless ways to make such types - you can
    support multiple allocation parts, or use a single contiguous
    allocation. You can store the data in binary, or some kind of packed
    decimal, or other formats. Passing them around might mean just
    passing around the base object, but sometimes you need to make deep
    copies. Operations might lead to heap memory allocations or
    deallocations.

    They are so /totally/ different that any similarities in the way you
    do a particular arithmetic operation are completely incidental.

    But BitInts /will/ need runtime library support?

    No, not if an implementation generates the code inline (as clang appears
    to do). An implementation /may/ use helper functions from a language
    support library - gcc does that, depending on the sizes of the _BitInt
    and the operations you are doing. That is no different from all sorts
    of other things in the language, and is not some external runtime
    library. Your code will not be calling "bigint.dll" or anything like that.


    I've acknowledged in my last post that arbitrary precision would have
    memory management issues, /if/ you wanted to add them to the language in such a way that, if variables 'a b c d' had such a type, you can write:

       a = b + c * d;


    Arbitrary precision integers have memory management issues no matter how
    you want to use them. They need dynamic memory. Either the language
    has some kind of automatic memory management (reference counting, RAII, garbage collection, etc.), or it must be done manually. It does not
    matter if you use operator notation or function-call notation - except
    that you cannot use operator notation with manual memory management.

    This is not what I had in mind; such arithmetic would use explicit
    function calls with explicit management of intermediates (like GMP).

    So from this point of view, fixed-size BitInts are better, but also a
    higher level ability than I would have considered added to the language.

    _BitInt's are certainly better in that they are scalar types with value semantics and no need for any dynamic memory. Of course arbitrary
    precision integers have other advantages. Although for some use-cases
    either would work, each can be significantly more appropriate for
    different situations.

    To my mind, the need for dynamic memory would mean arbitrary precision integers are not appropriate for C - either at the core language level,
    or as part of the standard library. I think it is reasonable to have different opinions as to how well fixed-size _BitInts are appropriate to
    have in the C core language, though as they are now in C23, the point is
    now moot.


    Even if BitInts were restricted to saner and smaller sizes, I'd consider actual arithmetic on 128 bits up to a few K bits and above a specialist, niche application.


    Fair enough.

    But logic operations (== & | ^) on unsigned BitInts are more reasonable (because they implement some features of bit-sets).

    For arithmetic on considerably larger numbers, I still think arbitrary precision is the best bet.



    Also fair enough.

    I don't think anyone is likely to be multiplying million-bit _BitInts in
    real code. But I don't think it is appropriate for the language
    standard to pick some arbitrary size and say "below that is fine, above
    that is too big and programmers should use something else". I don't
    think it is appropriate for compiler implementers either. (They may
    pick limits based on how they implement things internally - that's not
    an arbitrary limit.) Different people have different needs, and no
    particular limit fits all use-cases.

    Structs and arrays again spring to mind if you just want an anonymous
    data block. (I wonder why it has to be bit-precise for byte-addressed
    memory?)


    If I have a processor that has 256-bit vector registers, then moving
    data by loading and storing 256-bit blocks is going to be more
    efficient than doing a loop of 16 byte moves. Today, I would use
    uint64_t for the task, as the biggest type available. Why does it
    have to be bit-precise? It must be bit-precise because I would want
    to move 256 bits - not 255 bits or 257 bits.

    By bit-precise I mean being able to specify 255 and 257 bits! Memory is usually expressed in bytes or words; not bits.


    "Bit-precise" means "exactly the bit count I specify". I agree that for moving memory around, I would pick a bit size that is a multiple of 8,
    and very likely a power of 2.



    * Cryptography. IO is irrelevant here. But a variety of sizes are
    useful including 56, 80, 112, 128, 168, 192, 384, 512, 521, 2048,
    3072, 4096, 7680, 8096 bits. There may be more common sizes - I'm
    just thinking of DES, 3DES, AES, SHA, ECC and RSA.

    And I'm again curious as to what /non-numeric/ use a 200,000-bit
    BitInt might be put to, that is not better served by an array or struct.

    I don't have a use for a 200,000 bit integer type at the moment. But
    I cannot imagine any reason why the language specifications should
    have arbitrary limits. Are you suggesting that the C standards should
    say "You can have _BitInt's up to 8096 because someone found a use for
    them, but you can't have size 8097 and above - and 200,000 is right
    out - because someone else can't imagine they are useful" ?

    And yet, integer widths have been roughly capped at double a machine
    word size for decades - until 64 bits came along and then few even
    bothered with double-width.

    Nobody thought how easy it would be to just have an integer of whatever
    size you like - you just generate whatever code is necessary to make it happen. We could have had BitInts on 32- and even 16-bit machines if
    only somebody had thought of it!


    We certainly could have had these. And people /have/ thought about it.
    There are endless examples of libraries and "home-made" big integer
    types. The reason we have them /now/ is that some people have felt they
    were useful enough for their purposes that they bothered doing the work
    to implement them in clang and then write proposals to add them to the C standards. Getting something like this into standard C takes time,
    expertise, effort and money - commodities that are usually far less
    easily available than ideas and imagination.



    An implementation can - indeed, must - set a limit to the sizes it
    supports. Implementations can have many reasons to do so. Some
    implementations might have quite low limits (the size of "long long
    int" is the minimum allowed for conformance), but then that
    implementation might not be so useful to some people.

    Maybe bit-sets? But there are no special features for accessing
    individual bits.

    That _BitInt(N) defaults to a signed integer (twos complement?), even
    for very large sizes suggests that /numeric/ applications are a
    primary use.


    Obviously the C standards should have made "_BitInt" signed up to size
    73 bits, and unsigned from then on. That would have been /so/ much
    clearer and simpler for everyone.

    Or unsigned could have been the default.


    That would have been possible, but pointlessly out of step with all
    other integer types in C.




    Smaller sizes can be useful for holding RGB pixel values, audio
    data, etc.

    Except that these are probably rounded up, to the next multiple of
    two. So the benefit is minimal, unless it can do something with those padding bits.

    I write C code. I want my C code to be clear and represent what I am
    handling, and then let the compiler do its job of generating efficient
    results. So if I am dealing with data that is 24-bit signed integer
    data, then _BitInt(24) (especially with a typedef name) is more
    accurate source code than "int" or "int32_t".

    Suddenly everybody is dealing with signed values of 12 and 24 bits!


    I don't think I would count Michael and me as "everybody".

    But it is certainly true that data from hardware sensors is often of a resolution that does not fit exactly in a standard integer type size,
    and _BitInt - signed or unsigned - can be a clear way to work with these values.

    I actually had exactly that feature:

       int*3  a             # from 1980s; a 3-byte or 24-bit signed type
       int:24 b             # from 1990s; a 24-bit signed type

    Or at least, I had the syntax. Those odd values would have been
    rejected, as I didn't have support for them, or a way to emulate them
    (which is what BitInt(24) appears to do).

    So I got rid of the feature and ended up with int32 and then i32. (I
    think Zig allows types like i24 and i123456, presumably built upon
    LLVM's integer types which go up to 2**23 or 2**24 bits.)


    You've heard of 'code smell'? Well, this is the same, but for features.


    Your nose is blocked. Or to be more accurate, you are so obsessed
    with the idea that your own language is "perfect" that you simply
    cannot accept that other languages might have good features that your
    language does not, or that other programmers might want features that
    your language does not have.

    I've been doing this stuff long enough to recognise when a feature is
    over-elaborate, over-specified and over-flexible. You need to know
    the minimum you can get away with, not the maximum!

    NIH syndrome combined with megalomania. Other people do this stuff
    better than you.

    I've noticed that other languages tend to go overboard with things, and
    now it's happening to C.


    What seems to happen is that you read a little bit about a feature, then
    go bananas about how terrible it is because it is different from your
    own language - without learning about the feature, its use-cases, or why
    it was added to the language. When you are called out, and when -
    usually through many, many tea-spoon explanations - you understand the feature, you stick to your guns and continue to complain about it no
    matter how silly you sound.

    It's okay to think that simpler is sometimes better, or that you
    disagree with some of the newer features in C. People have different
    opinions on the direction newer C standards have taken. But if you want
    to critique a feature, do so on the basis of an understanding of the
    feature, an understanding of why it was added and what people might use
    it for, and an understanding of what pros and cons it has compared to alternatives. "I'm a genius language designer and I don't have this
    feature and I don't want to use it" is not a rational argument.


    I made a decision to keep my systems language at a certain level
    regarding such things as the type system, while having lots of
    convenient micro-features:

        print int@(x+y).[52..62]

    This type-puns a float64 r-value expression into an int, and extracts
    that bitfield (which is the unsigned exponent field when float64 uses IEEE 754).

    I'd be interested to see how you can do this better, using general
    language features (adding a dedicated .exponent property to floats would
    be cheating!).


    What an absurd thing to ask for. You have a special feature in your
    language for writing obscure things that are rarely if ever useful in
    normal coding. Of course you can write the same effect in C, in a
    simple function a few lines long. And that's the way it should be -
    obscure things should not take up cognitive space that makes common
    things harder.



    Let me guess, some committee members have been looking too long at
    how C++ does things? That language is utterly incapable of creating
    anything small and simple.


    And yet C and C++ programmers outnumber programmers of Bart's own
    language by millions. No language - except for yours, of course - is
    perfect. But it seems C and C++ are both pretty good for getting the
    job done.

    My systems language DOES have lots of very nice micro-features compared
    to C. And usually they are presented in a tidy fashion. I don't think there's any argument about that. (Look at C's ugly X-macros for example.)

    My language is not perfect; a big thing it's missing is Pascal-style enumeration types that are type-safe, that would detect a lot of errors.

    But as a systems language, it is much more enticing than C.

    And that is presumably why it is so much more popular than C.


    (Today I need to start porting a 20Kloc application in my language, to
    C; proper C not machine transpiling. I'm not looking forward to all that typing!)


  • From bart@3:633/10 to All on Wed Nov 26 15:44:54 2025
    On 26/11/2025 14:49, David Brown wrote:
    On 26/11/2025 13:05, bart wrote:
    On 26/11/2025 07:55, David Brown wrote:

    NIH syndrome combined with megalomania.˙ Other people do this stuff
    better than you.

    I made a decision to keep my systems language at a certain level
    regarding such things as the type system, while having lots of
    convenient micro-features:

         print int@(x+y).[52..62]

    This type-puns a float64 r-value expression into an int, and extracts
    that bitfield (which is the unsigned exponent field when float64 uses
    IEEE 754).

    I'd be interested to see how you can do this better, using general
    language features (adding a dedicated .exponent property to floats
    would be cheating!).


    What an absurd thing to ask for.

    You said, "Other people do this stuff better than you". Presumably,
    devising language features. So I gave an example of a small task, and
    asked which features those people would devise, or what solution they
    would use.

    You have a special feature in your
    language for writing obscure things that are rarely if ever useful in
    normal coding.

    Yes, I call them 'micro-features'.

    The examples showed rvalue type-punning and bitfield extraction, which
    were recent examples in this thread.

    In C, the solution for my example might look like this:

    double temp = x+y;
    printf("%llu", ((*(uint64_t*)&temp)>>52) & 2047);

    Rather more fiddly and error prone, and it needs an auxiliary statement
    that makes it awkward to embed into an expression. (I also had to think
    twice about that format code.)

    BTW here is how my C transpiler translated it, so it /can/ be done
    without explicit temporaries:

    mminc$m_print_u64(msysc$m_getdotslice((i64)msysc$m_tp_r64toi64((x + y)),(i64)52,(i64)62),NULL);


    Of course you can write the same effect in C, in a
    simple function a few lines long.

    Yes, everyone can invent their own solutions. (I've just taken that a
    few steps further with an entire language.)

    And that's the way it should be -
    obscure things should not take up cognitive space that makes common
    things harder.

    But _BitInt(12) was also used as an example of saving a few lines of
    code or having to write a function or macro (there, to sign-extend the
    low-N bits of an integer value, when N is known at compile-time).


    But as a systems language, it is much more enticing than C.

    And that is presumably why it is so much more popular than C.

    If it was generally available then I think quite a few would prefer it.
    As it is I enjoy the benefits myself.



  • From David Brown@3:633/10 to All on Wed Nov 26 17:37:38 2025
    On 26/11/2025 16:44, bart wrote:
    On 26/11/2025 14:49, David Brown wrote:
    On 26/11/2025 13:05, bart wrote:
    On 26/11/2025 07:55, David Brown wrote:

    NIH syndrome combined with megalomania.˙ Other people do this stuff
    better than you.

    I made a decision to keep my systems language at a certain level
    regarding such things as the type system, while having lots of
    convenient micro-features:

         print int@(x+y).[52..62]

    This type-puns a float64 r-value expression into an int, and extracts
    that bitfield (which is the unsigned exponent field when float64 uses
    IEEE 754).

    I'd be interested to see how you can do this better, using general
    language features (adding a dedicated .exponent property to floats
    would be cheating!).


    What an absurd thing to ask for.

    You said, "Other people do this stuff better than you". Presumably,
    devising language features. So I gave an example of a small task, and
    asked which features those people would devise, or what solution they
    would use.


    The "other people" I referred to are the folks behind the C language,
    not me.

    You have a special feature in your language for writing obscure
    things that are rarely if ever useful in normal coding.

    Yes, I call them 'micro-features'.

    The examples showed rvalue type-punning and bitfield extraction, which
    were recent examples in this thread.

    In C, the solution for my example might look like this:

        double temp = x+y;
        printf("%llu", ((*(uint64_t*)&temp)>>52) & 2047);


    No, that's not how a C solution would work. People who know C would
    know that. As a challenge for you, see if you can spot your mistake.

    (And of course if anyone wanted to do this stuff in real code, they'd
    wrap things in a static inline "bit_range_extract" function.)

    Rather more fiddly and error prone, and it needs an auxiliary statement
    that makes it awkward to embed into an expression. (I also had to think twice about that format code.)

    BTW here is how my C transpiler translated it, so it /can/ be done
    without explicit temporaries:

        mminc$m_print_u64(msysc$m_getdotslice((i64)msysc$m_tp_r64toi64((x + y)),(i64)52,(i64)62),NULL);


    Avoiding explicit temporaries is not a goal to aspire to - unless you
    are trying to squeeze performance from a poorly optimising compiler.


    Of course you can write the same effect in C, in a simple function a
    few lines long.

    Yes, everyone can invent their own solutions. (I've just taken that a
    few steps further with an entire language.)

    And that's the way it should be - obscure things should not take up
    cognitive space that makes common things harder.

    But _BitInt(12) was also used as an example of saving a few lines of
    code or having to write a function or macro (there, to sign-extend the
    low-N bits of an integer value, when N is known at compile-time).


    No, what was shown was how _BitInt(12) could let people write clearer C
    code than C without _BitInt. There was no comparison to other languages
    or other features.


    But as a systems language, it is much more enticing than C.

    And that is presumably why it is so much more popular than C.

    If it was generally available then I think quite a few would prefer it.

    Sure. Keep telling yourself that.

    As it is I enjoy the benefits myself.


    That I /do/ believe - and I genuinely think it is great that you enjoy it.


  • From bart@3:633/10 to All on Wed Nov 26 18:42:11 2025
    On 26/11/2025 16:37, David Brown wrote:
    On 26/11/2025 16:44, bart wrote:

    The "other people" I referred to are the folks behind the C language,
    not me.

    OK. The people who chose to make 'break' do two jobs, unfortunately in
    parts of the language that can overlap in use; those people! (I guess
    you mean the more recent lot.)

    In C, the solution for my example might look like this:

         double temp = x+y;
         printf("%llu", ((*(uint64_t*)&temp)>>52) & 2047);


    No, that's not how a C solution would work. People who know C would
    know that. As a challenge for you, see if you can spot your mistake.

    This was my point. (Although I can't see the problem, making it even
    more pertinent.)


    (And of course if anyone wanted to do this stuff in real code, they'd
    wrap things in a static inline "bit_range_extract" function.)

    Also my point: everyone will invent their own incompatible solutions for
    this fundamental stuff.

    You forgot about the type-punning part, which I guess needs yet another inlined function.

    Rather more fiddly and error prone, and it needs an auxiliary
    statement that makes it awkward to embed into an expression. (I also
    had to think twice about that format code.)

    BTW here is how my C transpiler translated it, so it /can/ be done
    without explicit temporaries:

         mminc$m_print_u64(msysc$m_getdotslice((i64)msysc$m_tp_r64toi64((x
    + y)),(i64)52,(i64)62),NULL);


    Avoiding explicit temporaries is not a goal to aspire to - unless you
    are trying to squeeze performance from a poorly optimising compiler.

    The memory temp involved a declaration which needs to exist outside of
    the expression in standard C. While type-punning in C either means
    writing to a union, or using & and applying a cast.

    (My type-punning works on rvalues and will work on values in registers.)

    No, what was shown was how _BitInt(12) could let people write clearer C
    code than C without _BitInt. There was no comparison to other languages
    or other features.

    But when it came my example, it could trivially be done with inline
    functions, just like this could.




    But as a systems language, it is much more enticing than C.

    And that is presumably why it is so much more popular than C.

    If it was generally available then I think quite a few would prefer it.

    Sure. Keep telling yourself that.

    Well, it would be a minority. Grown-up languages with decent syntax
    exist such as Ada and Fortran; those are not that popular. People prefer brace-based languages such as C, Java, Go, Zig, Rust.

    Anything without braces isn't taken as seriously, eg. scripting languages.


    As it is I enjoy the benefits myself.


    That I /do/ believe - and I genuinely think it is great that you enjoy it.


    I've had several opportunities to retire my language and switch to C.
    Each time, I rejected that and chose to persevere with mine, despite
    the extra problems of working with a language used by only one person on
    the planet.

    Then, because I genuinely considered it better, and now because I enjoy working at it and with it. Using C feels like driving a model T.


  • From David Brown@3:633/10 to All on Wed Nov 26 21:43:59 2025
    On 26/11/2025 19:42, bart wrote:
    On 26/11/2025 16:37, David Brown wrote:
    On 26/11/2025 16:44, bart wrote:

    The "other people" I referred to are the folks behind the C language,
    not me.

    OK. The people who chose to make 'break' do two jobs, unfortunately in
    parts of the language that can overlap in use; those people! (I guess
    you mean the more recent lot.)

    In C, the solution for my example might look like this:

         double temp = x+y;
         printf("%llu", ((*(uint64_t*)&temp)>>52) & 2047);


    No, that's not how a C solution would work. People who know C would
    know that. As a challenge for you, see if you can spot your mistake.

    This was my point. (Although I can't see the problem, making it even
    more pertinent.)


    So you can claim to have a "better" solution than C, without knowing how
    to write it correctly in C?


    (And of course if anyone wanted to do this stuff in real code, they'd
    wrap things in a static inline "bit_range_extract" function.)

    Also my point: everyone will invent their own incompatible solutions for this fundamental stuff.

    It is not remotely fundamental. Extracting groups of bits from the representation of a type, especially a floating point type, is a niche operation. (It can be an important operation - such as for software
    floating point routines. But the people who write those are few, and
    they know what they are doing.)


    You forgot about the type-punning part, which I guess needs yet another inlined function,

    I didn't forget about anything. I didn't write the incorrect C code.


    Rather more fiddly and error prone, and it needs an auxiliary
    statement that makes it awkward to embed into an expression. (I also
    had to think twice about that format code.)

    BTW here is how my C transpiler translated it, so it /can/ be done
    without explicit temporaries:


    mminc$m_print_u64(msysc$m_getdotslice((i64)msysc$m_tp_r64toi64((x +
    y)),(i64)52,(i64)62),NULL);


    Avoiding explicit temporaries is not a goal to aspire to - unless you
    are trying to squeeze performance from a poorly optimising compiler.

    The memory temp involved a declaration which needs to exist outside of
    the expression in standard C. While type-punning in C either means
    writing to a union, or using & and applying a cast.


    "Type punning" refers to using a union to access or reinterpret the
    underlying bit representation. Using references and a cast to do so is
    UB, except when using pointers to character types. Neither involves
    actually putting data into memory or the stack unless you are using a
    compiler that can't optimise well - and then it is just a matter of less efficient generated code.

    (My type-punning works on rvalues and will work on values in registers.)

    No, what was shown was how _BitInt(12) could let people write clearer
    C code than C without _BitInt. There was no comparison to other
    languages or other features.

    But when it came my example, it could trivially be done with inline functions, just like this could.


    Sure.




    But as a systems language, it is much more enticing than C.

    And that is presumably why it is so much more popular than C.

    If it was generally available then I think quite a few would prefer it.

    Sure. Keep telling yourself that.

    Well, it would be a minority. Grown-up languages with decent syntax
    exist such as Ada and Fortran; those are not that popular. People prefer brace-based languages such as C, Java, Go, Zig, Rust.

    Anything without braces isn't taken as seriously, eg. scripting languages.


    What a /very/ strange way to distinguish or classify languages. And
    what a bizarre way to generalise what people think, as though all
    programmers share the same opinions.


    As it is I enjoy the benefits myself.


    That I /do/ believe - and I genuinely think it is great that you enjoy
    it.


    I've had several opportunities to retire my language and switch to C.
    Each time, I rejected that and chose to persevere with mine, despite
    the extra problems of working with a language used by only one person on
    the planet.

    Then, because I genuinely considered it better, and now because I enjoy working at it and with it. Using C feels like driving a model T.



  • From bart@3:633/10 to All on Wed Nov 26 22:19:47 2025
    On 26/11/2025 20:43, David Brown wrote:
    On 26/11/2025 19:42, bart wrote:
    On 26/11/2025 16:37, David Brown wrote:
    On 26/11/2025 16:44, bart wrote:

    The "other people" I referred to are the folks behind the C language,
    not me.

    OK. The people who chose to make 'break' do two jobs, unfortunately in
    parts of the language that can overlap in use; those people! (I guess
    you mean the more recent lot.)

    In C, the solution for my example might look like this:

         double temp = x+y;
         printf("%llu", ((*(uint64_t*)&temp)>>52) & 2047);


    No, that's not how a C solution would work. People who know C would
    know that. As a challenge for you, see if you can spot your mistake.

    This was my point. (Although I can't see the problem, making it even
    more pertinent.)


    So you can claim to have a "better" solution than C, without knowing how
    to write it correctly in C?



    (And of course if anyone wanted to do this stuff in real code, they'd
    wrap things in a static inline "bit_range_extract" function.)

    Also my point: everyone will invent their own incompatible solutions
    for this fundamental stuff.

    It is not remotely fundamental. Extracting groups of bits from the representation of a type, especially a floating point type, is a niche operation.

    A bit like that BitInt(12) example then?

    This is about a lower-level systems language working with primitive
    machine types, and having access to the underlying bits of those types.

    How much more fundamental can you get?

    C provides only basic bitwise operators, and you have to do some
    bit-fiddling, while trying to avoid UB, in order to extract or inject individual bits or bitfields.

    I provide direct indexing ops to get or set any bit or bitfield, which
    is actually a great core feature to have, but for some reason you want
    to downplay it.

    You might just admit for once that it is quite neat.


    (It can be an important operation - such as for software
    floating point routines.

    That particular task can be important for lots of reasons.

    But the people who write those are few, and
    they know what they are doing.)

    And I don't? I used to write FP emulation routines...


    "Type punning" refers to using a union to access or reinterpret the underlying bit representation. Using references and a cast to do so is
    UB,

    In C maybe, using your favoured compilers. In my implementations of C,
    and in my languages, it is well defined, especially as it is
    type-punning a 64-bit quantity to another 64-bit quantity.

    (This is a great thing about creating your own implementations: you get
    to say what is UB, which will be for genuine, not artificial ones
    maintained so that C compilers can be one-up on each other.

    As it is, somebody using C as an intermediate language can have a
    situation where something is well-defined in their source language,
    known to be well-defined on their platforms of interest, but in between,
    C says otherwise.)

    Note that in the original example in my language, no references are used
    (the code just copies a FP register to a GPR without conversion).

    except when using pointers to character types. Neither involves
    actually putting data into memory or the stack unless you are using a compiler that can't optimise well - and then it is just a matter of less efficient generated code.

    OK, so how would you do a 'reinterpret' cast in C, of a value like 'x+y'?

    Anything without braces isn't taken as seriously, eg. scripting
    languages.


    What a /very/ strange way to distinguish or classify languages.

    It's an observation. Which languages that call themselves 'systems
    languages' these days don't use braces?

    And
    what a bizarre way to generalise what people think, as though all programmers share the same opinions.

    You're welcome to do your own survey.



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From BGB@3:633/10 to All on Wed Nov 26 17:04:06 2025
    On 11/25/2025 5:38 AM, bart wrote:
    On 25/11/2025 02:03, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    On 24/11/2025 14:41, David Brown wrote:
    On 24/11/2025 13:31, bart wrote:
    That's all up to the implementation.
    You are worrying about completely negligible things here.

    Is it that negligible? That's easy to say when you're not doing the
    implementing! However it may impact on the size and performance of
    code.

    You're right, it's easy to say when I'm not doing the implementing.
    Which I'm not.

    The maintainers of gcc and llvm/clang have done that for me, so I don't
    have to worry about it.

    Are you planning to implement bit-precise integer types yourself? I
    don't think you've said so in this thread. If you are, you have at
    least two existing implementations you can look at for ideas.

    No, apart from the usual set of 8/16/32/64 bits. I've done 128 bits, and played with 1/2/4 bits, but my view is that above this range, using
    exact bit-sizes is the wrong way to go.


    On normal PC's, it is meh.

    On FPGA's, more so the whole HLS (High Level Synthesis) thing, it is
    much more significant.


    Also it is a bridge that allows sensibly mapping some Verilog semantics
    onto C, which can in turn be made more efficient than "ye olde shifts
    and masking". This is partly because the compiler has more
    freedom to either use specific CPU features, or to implement the
    constructs in ways that are more efficient but would impose too much
    mental computational burden on normal programmers (such as shifts being relative to other shifts, and/or where the most efficient masking
    strategy depends on the width of the type being masked, etc).

    Though, granted, bolting a bunch of Verilog stuff onto C is also
    nonstandard (and goes well beyond the scope of _BitInt). But, a lot of
    it is stuff that wouldn't really make sense at all in C in the absence
    of exact-width integers.



    Though, the other parts of Verilog don't map over quite so easily...
    always @(posedge clock)
    ...
    ... yeah ...

    Ironically, had started looking into adding Verilog support to my
    compiler (at the time hoping maybe to be able to implement something
    that was less of a pain to debug on than Verilator), most I got here was
    the idea that modules would be mapped onto classes and so each module
    could be implemented as a class instance, with an internal run/step
    method which would check variables and fire off any "always" blocks when appropriate.

    The effort kinda stalled out at this stage though (and motivation
    lessened when I actually found some of the bugs I had been looking for).


    Some other functionality had ended up mapped onto C, some features (ironically) being useful in this C land, and others not so much.

    Well, maybe some people could cheer for things like "casez()" or "__switchz()":
    __switchz(val[15:0])
    {
    case 0bZZZZ_ZZZZ_ZZZZ_ZZZ0u16: ... matches everything with LSB clear
    case 0bZZZZ_ZZZZ_ZZZZ_ZZ01u16: ... matches with LSB's as 01
    case 0bZZZZ_ZZZZ_ZZZZ_Z011u16: ...
    case 0b1111_ZZZZ_ZZZZ_0111u16: ... matches 0111 and MSBs set to 1s.
    }

    Where, 0bZZZZ_ZZZZ_ZZZZ_Z011u16 is a C syntax analog of 16'bZZZZ_ZZZZ_ZZZZ_Z011 (and in this case my compiler allows for either
    _ or single quotes).



    Though, implementing this in a way that is efficient is a harder problem
    (much more complicated than a normal "switch()").

    Though, had I gotten this part implemented, would still have also needed:
    A high performance emulator (now partly written, but, would likely need
    a full JIT compiler rather than a call-threading interpreter);
    A better/more usable debugger (*).

    *: My existing "jx2vm" emulator mostly dumps stuff if the emulator
    exits, and has an integrated GDB-style debugger, but this still leaves
    something to be desired.

    So, more likely the desired debugger would likely be built on "x3vm",
    but have not yet done so.


    Also compiler needs to produce more complete debuginfo. As-is, it is outputting symbol maps (in nm notation, similar to that typically used
    by the Linux kernel), with line-numbers in a slightly nonstandard way,
    and some small amount of STABS. Maybe weak, but currently the most
    reachable strategy (contrast, GCC would typically put the debug info
    inside the binary, either as STABS or DWARF depending on target, ...).
    The debuginfo is still very incomplete, and I am also lacking a good
    debugger here.

    I had considered the possibility of going to a binary format for the map
    files to save space, but for now they are still ASCII based (well, or
    the possible lazier option of internally generating the map in ASCII
    format, but then dumping it in gzip format or similar ".map.gz"; would
    need to decompress them when loaded, but would leave an easy option for
    a user to get back to an ASCII map file as needed). Including STABS
    would add considerable bulk even vs just a normal symbol listing.

    I have my own reasons for not wanting to put debuginfo inside the
    binaries themselves. MSVC is kinda similar, just uses ".PDB" files instead.


    While for odd sizes up to 64 bits, bitfields are more apt than employing
    the type system.


    This is missing the point of the purpose of _BitInt...


    Here's an idea. Rather than asserting that _BitInt(1'000'000)
    is silly and obviously useless, try *asking* how it's useful.
    I personally don't know what I'd do with a million-bit integer,
    but maybe somebody out there has a valid use for it. Meanwhile,
    its existence doesn't bother me.

    Again, my view is that types like _BitInt(123456) (could they have made
    it any more fiddly to type?!) are the same mistake that early Pascal made with arrays.

    It is common that an N-array of T and an M-array of T are not
    compatible, but usually there are ways to deal generically with both.


    For using them in a way that is useful for their intended purpose, there
    need to be some constraints here.


    But, alas, debating 1M bit values is a little moot in my case as the
    compiler doesn't go quite that big.

    Most cases where a giant _BitInt could make sense are better served by
    not using _BitInt.

    In this case, the limit is 16383 bits, but this is still bigger than
    anything it really makes sense to do with _BitInt.

    Also doesn't make sense for Verilog either; about as soon as you start
    trying to use values this big, it is "gonna eat the FPGA".





    Well, and for things like bignums, could instead make a case for a
    dynamic typesystem and the ability for user code to plug new types into
    said dynamic typesystem (and register operators for said types).

    But, this is probably a feature that is unlikely to get added to mainline C.

    Well, and probably about as soon as someone adds dynamic types, people
    might start pushing for also adding a garbage collector, even if (thus
    far) still no one has succeeded in making GC "not suck" (some may point
    to JVM and .NET, but they mostly just made it "less obvious").

    Contrast to my recent annoyance that Firefox is regularly stalling for extended periods of time for presumably GC related reasons (and has
    seemingly failed to implement one that runs particularly fast).



    My guess is that once you've implemented integers wider than 128
    or 256 bits, million-bit integers aren't much extra effort.

    I've implemented 128-bit arithmetic, and have seen some scary-looking C
    code that implemented 256-bit arithmetic. Neither of those would scale
    to N-bits where N can be arbitrary large /and/ might not be a multiple
    of either 64 or 8.

    You would need pretty much the same algorithms as used for arbitrary precision. Those usually require N to be some multiple of 'limb' size.


    Note for example, say:
    How do you think I would have implemented large _BitInt?...
    Why is storage a multiple of 128-bits / 16 bytes?...
    ...


    Well, internally in this case the compiler effectively just sorta
    generates internal runtime calls.

    In this case:
    1-64 bits: Mostly native;
    65-128: Semi-native mixed with runtime calls.
    129-256: Runtime calls to fixed-width 256-bit handlers.
    257-16383: Generic runtime calls that pass the width as an argument.

    ...

    If the size isn't an exact multiple of the target size, one can pad it
    up and then sign- or zero-extend the high-order element.



    Ironically, it is sorta like a partial inverse of "memcpy()" and
    "memset()" with a fixed size:
    Size is small, better to handle it inline;
    Medium, use size-specialized handling;
    Large, use a generic call (actually call "memcpy()").

    Say:
    <= 64 bytes: Inline
    65-512 bytes: Call into fixed-size copies
    Generally, copy any trailing bytes and then fall into a copy-slide.
    > 512: Actually call "memcpy()".


    Even for smaller types, there is no guarantee that it is not a runtime call.

    Say, for example, on some random (non x86-64 or similar) target:
    long x, y, z;
    ...
    z=x/y;

    How confident can you be that it is *not* just secretly calling a
    runtime function?...

    Even if the ISA has an instruction, there isn't much guarantee that it
    is going to be faster than using a runtime call.

    Or, say, the only time one can be semi-confident it is not a runtime
    call is if it is "int/const" or similar (because the compiler can turn
    this into multiply-by-reciprocal).



    Say, a CPU being left with choices for how to implement divide:
    Faster, but very expensive;
    Say, for example, a radix divider or similar;
    Medium cost, kinda slow: Hardware shift-and-subtract or similar;
    Can give ~ 36 cycle 32-bit IDIV, and ~ 68 cycle 64-bit IDIV.
    Cheap but slow: Trap and emulate, OS then uses shift-and-subtract.

    In some cases it is faster for the compiler to use runtime calls for divide and similar, since this can sidestep the performance cost of the emulation trap.


    Except, ironically, I have shifted towards the counter stance for Binary128,
    as Binary128 operations can be sufficiently slow that the code-density
    savings can offset the (comparatively modest in this case) performance cost
    of the emulation traps (otherwise, code using "long double" or similar
    has a whole lot of runtime calls, which cost around 8x the code
    footprint of just pretending one has the corresponding instructions).

    Also semi-relevant for RV64, which also lacks particularly good options
    for implementing fast "__int128" support, which is (ironically) somewhat relevant for making the Binary128 FPU emulation not slow. Though, I also
    took the non-standard stance of defining these operations as working on register pairs (in this case, defining FADD.Q and FMUL.Q as using
    register pairs being the less-bad option than also pretending it has
    128-bit FPR's; more so as in this case SIMD operations already use
    register pairs for 128-bit SIMD, ...).

    So, a "Pseudo-Q" defined in terms of:
    CPU only actually implements F/D;
    Define .Q operations as being allowed, but operating via traps.
    Arguably poor, but cheapest option in this case.
    Actually supporting Q extension would be too expensive.
    I don't exactly expect Binary128 to suddenly become cheap either.


    Though, for integer divide and similar, the relative cost is low enough
    (and divide is common enough) that trap-and-emulate is rather painful.

    So, in the absence of a HW divider, better to use a runtime call
    (probably via a shift-and-subtract loop or similar).


    But, divide is still rare enough to make it hard-pressed to justify the
    cost of a more expensive HW divider (such as Radix-16 or Goldschmidt or similar). One might still be left with trap-and-emulate for FPU divide
    and SQRT though.

    Or, say, other wonk (for an ISA like RV64):
    FDIV, FSQRT: Trap
    FMADD/etc (when RM=DYN): Trap
    Assuming here that DYN has the option of IEEE correct semantics.
    Which means trapping on HW which doesn't have single-rounded FMA.

    Mildly annoying then if building with GCC, one has to use special
    options to disable FMA and tell it to use a runtime call for FDIV, etc,
    in an attempt to avoid it stepping on cases that need emulation traps.

    Well, and for sake of IEEE semantics:
    Trapping on subnormal/denormal inputs;
    Trapping with LOBs of inputs are non-zero for FMUL;
    Partial width makes FMUL a lot cheaper for hardware;
    But, then one needs to trap in some cases.
    ...



    Well, and other fun corner cutting, like lacking hardware page walking
    and reliance on a trap handler to deal with TLB misses; etc.

    Well, for an ISA like RV, one can also corner-cut things like only supporting
    X0 and X1 as link registers for JAL and JALR (if Rd is not X0 or X1, trap).


    Well, and if the CPU is being "extra budget", they might use trap and
    emulate for misaligned loads/stores. Though, IMO this is getting a
    little too budget (and there are a lot of things that can be implemented
    more efficiently if one has at least semi-fast unaligned load/store).

    ...



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Thu Nov 27 01:05:03 2025
    On 26/11/2025 23:04, BGB wrote:
    [...]
    While for odd sizes up to 64 bits, bitfields are more apt than
    employing the type system.


    This is missing the point of the purpose of _BitInt...

    Which is ... ?

    From what I can gather, on ordinary computers, _BitInt(N), for N of 1/2
    to 63, is just rounded up to the next size of 8/16/32/64 bits, if N is
    not already at that size.

    That's if storage is involved.

    The other aspect appears to be two-fold:

    * _BitInt(N) used as a cast on an ordinary value will zero- or
    sign-extend the low N bits

    * When reading from storage allocated with _BitInt(N), it ensures only N
    bits of info are retrieved, extended as necessary, even if more than N
    bits were stored. This applies even if the storage was rounded up.

    So it seems to be mainly about masking.



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Thu Nov 27 02:18:03 2025
    On 27/11/2025 01:30, Waldek Hebisch wrote:
    bart <bc@freeuk.com> wrote:
    And yet, integer widths have been roughly capped at double a machine
    word size for decades - until 64 bits came along and then few even
    bothered with double-width.

    Nobody thought how easy it would be to just have an integer of whatever
    size you like - you just generate whatever code is necessary to make it
    happen. We could have had BitInts on 32- and even 16-bit machines if
    only somebody had thought of it!

    PL/I had things like 'fixed binary(23)' (that is ability to
    specify bit size) around 1965, but that stopped at machine
    word length. Pascal had range types, but similarly stopped
    at at integer size.

    What were the reasons for PL/I to use odd sizes not related to word size
    or memory width?


    GNU Pascal allowed specifying size in
    bits and going to twice machine word (that was limitation
    imposed by gcc backend).

    Before 64-bits, we needed double the word size in order to represent
    ordinary quantities. With 64 bits, there is much less need (hence few
    128-bit types).


    And yes, such types could be added much earlier and it
    is a shame that they are added only now.

    So what is the pressing need now?

    It is a mild convenience for those applications which really need
    numbers of 100s of bits, but not what I would have thought were worth
    making special provision for in a language.

    While they would be unwieldy for very much larger numbers, and in any
    case there are caps in place.

    I can see some use when you want a block datatype of so many bytes
    (sorry, bits, since it needs to be bit-precise even at the large scale), especially if some bitwise ops are available.

    Eg. do some of the things that Pascal bit-sets were used for, but
    there still seems to be a lot of support lacking.

    So it still appears to me a rather heavyweight feature, in a lightweight language, that is lacking in everyday use-cases.

    Part of the reason may be that in the nineties the usage of
    lower-level languages other than C went down. C was
    traditionally quite minimal and did not want to
    introduce new features.



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From James Kuyper@3:633/10 to All on Wed Nov 26 21:19:14 2025
    On 2025-11-26 04:29, Michael S wrote:
    On Tue, 25 Nov 2025 18:33:30 +0000
    bart <bc@freeuk.com> wrote:
    ...
    Then maybe C bitfields could be used, but a bigger problem with those
    is poor control over layout, which is anyway implementation-defined.
    (Mine of course don't have that problem!)

    According to the language of The Standard, it's not 'poor control'.
    As far as standard requirements goes, there is *no* control on layout of
    bit fields.

    C doesn't provide enough control over bit-field layouts to be useful,
    but it's an exaggeration to say it provides no control at all:

    "An implementation may allocate any addressable storage unit large
    enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed
    into adjacent bits of the same unit. If insufficient space remains,
    whether a bit-field that does not fit is put into the next unit or
    overlaps adjacent units is implementation-defined. The order of
    allocation of bit-fields within a unit (high-order to low-order or
    low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.
    A bit-field declaration with no declarator, but only a colon and a
    width, indicates an unnamed bit-field.148) As a special case, a
    bit-field structure member with a width of zero indicates that no
    further bit-field is to be packed into the unit in which the previous bit-field, if any, was placed."


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Ike Naar@3:633/10 to All on Thu Nov 27 08:10:10 2025
    On 2025-11-26, bart <bc@freeuk.com> wrote:
    In C, the solution for my example might look like this:

    double temp = x+y;
    printf("%llu", ((*(uint64_t*)&temp)>>52) & 2047);

    Rather more fiddly and error prone, and it needs an auxiliary statement
    that makes it awkward to embed into an expression. (I also had to think twice about that format code.)

    The ilogb() function from <math.h> extracts the exponent of a double.

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Thu Nov 27 11:43:39 2025
    On 26/11/2025 23:19, bart wrote:
    On 26/11/2025 20:43, David Brown wrote:
    On 26/11/2025 19:42, bart wrote:
    On 26/11/2025 16:37, David Brown wrote:
    On 26/11/2025 16:44, bart wrote:

    The "other people" I referred to are the folks behind the C
    language, not me.

    OK. The people who chose to make 'break' do two jobs, unfortunately
    in parts of the language that can overlap in use; those people! (I
    guess you mean the more recent lot.)

    In C, the solution for my example might look like this:

        double temp = x+y;
        printf("%llu", ((*(uint64_t*)&temp)>>52) & 2047);


    No, that's not how a C solution would work. People who know C would
    know that. As a challenge for you, see if you can spot your mistake.

    This was my point. (Although I can't see the problem, making it even
    more pertinent.)


    So you can claim to have a "better" solution than C, without knowing
    how to write it correctly in C?



    (And of course if anyone wanted to do this stuff in real code,
    they'd wrap things in a static inline "bit_range_extract" function.)

    Also my point: everyone will invent their own incompatible solutions
    for this fundamental stuff.

    It is not remotely fundamental. Extracting groups of bits from the
    representation of a type, especially a floating point type, is a niche
    operation.

    A bit like that BitInt(12) example then?

    Yes, using BitInt(12) is quite niche.

    But when the people behind _BitInt() started thinking about what sizes
    people might want, it quickly became clear that it would be vastly more
    effort to try to define which sizes were useful. It was much simpler
    and clearer just to support any size (up to an implementation-defined
    limit). Most particular sizes, other than 8, 16, 32, 64, and perhaps
    128, are going to be niche. But if one clear feature enables a large
    number of niche uses, that's a good thing.

    No one has suggested that _BitInt(12) is in any way a /necessary/
    feature. And I certainly don't think a stand-alone proposal to add
    12-bit types to C would have been accepted. But since it exists, it
    will let me write slightly neater code in some cases - thus I will use
    it when appropriate.

    What I don't like about your bit extraction operations is that you have
    an operator syntax for a fairly obscure and rarely used operation. A "bit_range_extract" standard library function would make more sense to
    me, though I think shifting and masking works well enough for the few situations where you need it. A syntax that looks very much like array
    access is not going to be helpful to people looking at the code - for general-purpose languages, most programmers will never see or use bit
    ranges.


    This is about a lower-level systems language working with primitive
    machine types, and having access to the underlying bits of those types.

    How much more fundamental can you get?

    It is not fundamental for a low-level systems language. And this is a C
    group - C is a language covering general application programming as well
    as systems programming. I can agree that it can be a /useful/ operation
    at times - useful enough to make it worth having a standard library
    function (or macro - or, in your case, a keyword or built-in function).
    But not "fundamental" or useful enough to make it operator based like that.


    C provides only basic bitwise operators, and you have to do some bit-fiddling, while trying to avoid UB, in order to extract or inject individual bits or bitfields.

    You make it sound difficult. It's not.


    I provide direct indexing ops to get or set any bit or bitfield, which
    is actually a great core feature to have, but for some reason you want
    to downplay it.

    You might just admit for once that it is quite neat.


    I am sure it is very nice on the few occasions when it is useful.


    (It can be an important operation - such as for software floating
    point routines.

    That particular task can be important for lots of reasons.

    But the people who write those are few, and they know what they are
    doing.)

    And I don't? I used to write FP emulation routines...


    The thing you always seem to forget, is that your languages are written
    for /you/ - no one else. It doesn't make a difference whether something
    is added /to/ the language or written in code /for/ the language. You
    think other languages are missing critical features simply because there
    is a thing that /you/ want to do that you added to your own language.
    And you think other languages are overly complex or bloated because they
    have features that you don't want to use.

    That attitude is fine for your own personal specific language. If
    that's how you like to do things, that's fine. But that's your own
    little isolated world that does not compare to the wider world of other people, other programmers, other languages, other tools.

    Imagine asking the regulars in this group what features or changes they
    would like C to have in order to make C "perfect" for their uses,
    regardless of everyone else, all existing code, all existing tools. We
    could all fill pages with ideas. And if those were all added to C, the
    result would be a language that made C++ look as easy as Logo, while
    being riddled with inconsistencies and contradictions.


    "Type punning" refers to using a union to access or reinterpret the
underlying bit representation. Using references and a cast to do so
    is UB,

    In C maybe, using your favoured compilers.

    In C, yes. This is comp.lang.c.

    In my implementations of C,
    and in my languages, it is well defined, especially as it is
    type-punning a 64-bit quantity to another 64-bit quantity.


    OK. But not in C.

    (This is a great thing about creating your own implementations: you get
    to say what is UB, which will be for genuine, not artificial ones
    maintained so that C compilers can be one-up on each other.

    Ah, so the many C compilers I have used over the decades were not
    "genuine", and the many different processors I have used were all "artificial". Okay, that clears things up.


    As it is, somebody using C as an intermediate language can have a
    situation where something is well-defined in their source language,
known to be well-defined on their platforms of interest, but in between,
    C says otherwise.)

    You've never really understood how languages are defined, have you?
    With your own languages and tools, you don't have to - there is no need
    for standards, specifications, or anything like that. You can just make
    up what suits you at the time. The language is "defined" by what the implementation does. That's been very convenient for you, but it has
    left you with serious misconceptions about how non-personal languages work.


    Note that in the original example in my language, no references are used (the code just copies a FP register to a GPR without conversion).

except when using pointers to character types. Neither involves
    actually putting data into memory or the stack unless you are using a
    compiler that can't optimise well - and then it is just a matter of
    less efficient generated code.

    OK, so how would you do a 'reinterpret' cast in C, of a value like 'x+y'?

    As you know, you use a union. So just to please you, here is your bit extraction - written as a one-line function (split over two lines for
    Usenet) because you seem to think that kind of thing is important :

    uint64_t get_exponent(double x) {
    return ((union { double d; uint64_t u;}) { x }.u >> 52)
    & ((1ull << (62 - 52 + 1)) - 1);
    }

    That compiles (with gcc on x86-64) to :

    movq rax, xmm0
    shr rax, 52
    and eax, 2047
    ret

    There's nothing in C that suggests this must be put in memory or do
    anything more than this.


    Anything without braces isn't taken as seriously, eg. scripting
    languages.


    What a /very/ strange way to distinguish or classify languages.

    It's an observation. Which languages that call themselves 'systems languages' these days don't use braces?


    Ada? Forth?

    It is certainly common for languages to use braces, simply because they
    are a simple and unambiguous way to delimit blocks. They are widely
    used in languages that might be called "systems languages", and widely
    used in languages that might /not/ be called "systems languages" -
    though I don't think there is any remotely clear definition or
    distinction between "systems languages" and other languages.

And what a bizarre way to generalise what people think, as though
    all programmers share the same opinions.

    You're welcome to do your own survey.


    In what way are languages like Ada, Fortran, Python, Haskell, Erlang,
    etc., "not taken seriously" ? /Who/ does not take them seriously? Who
    takes "B" seriously but not Ruby, just because "B" uses braces and Ruby
    does not?

    <https://en.wikipedia.org/wiki/Comparison_of_programming_languages_(syntax)#Block_delimitation>


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Thu Nov 27 12:20:32 2025
    On 27/11/2025 10:43, David Brown wrote:
    On 26/11/2025 23:19, bart wrote:


    What I don't like about your bit extraction operations is that you have
    an operator syntax for a fairly obscure and rarely used operation.

    So shift and masking operations in C are obscure?!

A "bit_range_extract" standard library function would make more sense to
me, though I think shifting and masking works well enough for the few
situations where you need it. A syntax that looks very much like array
access is not going to be helpful to people looking at the code - for
general-purpose languages, most programmers will never see or use bit
ranges.

    The syntax actually comes from DEC Algol60 IIRC. It was used to access individual characters of a string, normally an indivisible type in that language, and I applied the same concept to bits of an integer.

    How much more fundamental can you get?

    It is not fundamental for a low-level systems language.

    So bits are not fundamental either! But then, it has taken until C23 to standardise binary literals, and there is still no format code for
    binary output.

But the people who write those are few, and they know what they are
    doing.)

    And I don't? I used to write FP emulation routines...


The thing you always seem to forget, is that your languages are written
for /you/ - no one else. It doesn't make a difference whether something
is added /to/ the language or written in code /for/ the language. You
think other languages are missing critical features simply because there
is a thing that /you/ want to do that you added to your own language.
And you think other languages are overly complex or bloated because they
have features that you don't want to use.

    They frequently have advanced features while ignoring the basics.

Imagine asking the regulars in this group what features or changes they
would like C to have in order to make C "perfect" for their uses,
regardless of everyone else, all existing code, all existing tools. We
could all fill pages with ideas. And if those were all added to C, the
result would be a language that made C++ look as easy as Logo, while
being riddled with inconsistencies and contradictions.

    Yes, that's the trick. That's why a lot of features I've played with
    have disappeared, while some have proved indispensable.

    As it is, somebody using C as an intermediate language can have a
    situation where something is well-defined in their source language,
    known to be well-defined on their platforms of interest, but
in between, C says otherwise.)

You've never really understood how languages are defined, have you?
With your own languages and tools, you don't have to - there is no need
for standards, specifications, or anything like that. You can just make
up what suits you at the time. The language is "defined" by what the
implementation does. That's been very convenient for you, but it has
left you with serious misconceptions about how non-personal languages
work.

    Here's a program in a very simple language, where all variables have
    i64 type:

    c = a + b

    Here, the author has decreed that any overflow in this addition will
    wrap (any overflow bits above 64 are lost). If directly compiled to x64
    code it might use this (here 'a b c' are aliases for the registers where
    they reside):

    mov c, a
    add c, b

    Or on ARM64:

    add c, a, b

    Now, the author decides to use intermediate C (for portability, for optimisations etc), and will generate perhaps:

    int64_t a, b, c;
    ...
    c = a + b;

    But here, if a + b happens to overflow, it is UB, and for no good
    reason. You have to fix it. This is where it can be harder to generate
    HLL code than assembly!

    *Now* do you understand? This is nothing to do with me or my personal languages, it is a problem for every language that transpiles to C,
    where there is a mismatch between the sets of behaviour considered UB in
    each.

    OK, so how would you do a 'reinterpret' cast in C, of a value like 'x+y'?

As you know, you use a union. So just to please you, here is your bit
extraction - written as a one-line function (split over two lines for
Usenet) because you seem to think that kind of thing is important :

uint64_t get_exponent(double x) {
    return ((union { double d; uint64_t u;}) { x }.u >> 52)
           & ((1ull << (62 - 52 + 1)) - 1);
}

    That compiles (with gcc on x86-64) to :

    movq rax, xmm0
    shr rax, 52
    and eax, 2047
    ret

    There's nothing in C that suggests this must be put in memory or do
    anything more than this.

    (This only seems to work with gcc. Clang and MSVS don't like it.)



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Thu Nov 27 12:46:01 2025
    On 27/11/2025 02:32, Waldek Hebisch wrote:
    bart <bc@freeuk.com> wrote:

    This is about a lower-level systems language working with primitive
    machine types, and having access to the underlying bits of those types.

    How much more fundamental can you get?

    C provides only basic bitwise operators, and you have to do some
    bit-fiddling, while trying to avoid UB, in order to extract or inject
    individual bits or bitfields.

    I provide direct indexing ops to get or set any bit or bitfield, which
    is actually a great core feature to have, but for some reason you want
    to downplay it.

    You might just admit for once that it is quite neat.

    Yes, it is neat.

    Hmm, perhaps you're being sincere, perhaps not ...

    OK, so how would you do a 'reinterpret' cast in C, of a value like 'x+y'?

    #include <stdint.h>
    #include <string.h>

    uint64_t
    d_to_u(double d) {
    uint64_t tmp;
    memcpy(&tmp, &d, sizeof(tmp));
    return tmp;
    }

    int
    f_exp(double d) {
    return (d_to_u(d)>>52)&2047;
    }

    Using 'gcc -O' I get the following assembly (only code, without
    unimportant directives/labels):

    d_to_u:
    movq %xmm0, %rax
    ret

    f_exp:
    movq %xmm0, %rax
    shrq $52, %rax
    andl $2047, %eax
    ret

As you can see 'd_to_u' is a single computational instruction;
you cannot do better given that floating point registers
are distinct from integer registers. And 'f_exp' looks
optimal assuming lack of "bit extract" or "extract exponent"
instructions.

Note that you can put both functions above in a header file,
so once you have written the few lines above you can use them
in all your C code. Of course, efficiency depends on
compiler optimization.

    Yes (that's something I can't rely on).

    These examples are interesting: with a HLL you normally express yourself
    in a clear manner, and it is the compiler's job to generate the
    complicated code required to implement what you mean.

    Here it seems to be other way around: it is the programmer who writes
    the convoluted code, and the compiler turns that into short, clear instructions! Which unfortunately no one will see.

    If I use your functions like this:

    a = f_exp(x + y);

    then once the x+y result is in a register, gcc-O2 generates this inline
    code for the extraction:

    movq rax, xmm0
    shr rax, 52
    and eax, 2047


    If I express it in my language:

    a := int@(x + y).[52..62]

    then my non-optimising compiler generates this (D0 is rax):

    movq D0, XMM4
    shr D0, 52
    and D0, 2047

    So such features have definite advantages, in being able to express
    intent directly, and to make it easier for a simple compiler to know
    that intent and help it generate reasonable code without lots of
    analysis or needing function inlining.

    BTW, your example explicitly writes to memory; David Brown posted a
    version that didn't do so that I could see. Unless a compound literal is designed to be built in memory? However that version only seemed to work
    with one compiler.

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Thu Nov 27 14:02:38 2025
    On 27/11/2025 13:20, bart wrote:
    On 27/11/2025 10:43, David Brown wrote:
    On 26/11/2025 23:19, bart wrote:


    What I don't like about your bit extraction operations is that you
    have an operator syntax for a fairly obscure and rarely used operation.

    So shift and masking operations in C are obscure?!

    Both shift operators and bitwise operators have lots of other uses.

    When you are designing a programming language, you first provide general features that can be used for multiple purposes. You only implement specialised features if the need arises - it is too cumbersome, or error-prone, or inefficient, or laborious to use the general features.

    In some areas of C usage, shifts and masks - and bitfield extraction -
    turn up quite a bit. But it seems the C operators work fine for the
    task. It would not exactly be difficult to add a standard
    "bit_range_extract" function to the C standard library, yet no one has
    felt it to be worth the effort over the last 50 years. Perhaps it is
    not as essential or fundamental as you think? Or perhaps C's current
    features do the job well enough that there's no need for anything else?


A "bit_range_extract" standard library function would make more sense to
me, though I think shifting and masking works well enough for the few
situations where you need it. A syntax that looks very much like
array access is not going to be helpful to people looking at the code
- for general-purpose languages, most programmers will never see or
use bit ranges.

    The syntax actually comes from DEC Algol60 IIRC. It was used to access individual characters of a string, normally an indivisible type in that language, and I applied the same concept to bits of an integer.

    I don't care if you found the syntax on the back of a cornflakes packet.
    The origin is not relevant.


    How much more fundamental can you get?

    It is not fundamental for a low-level systems language.

    So bits are not fundamental either! But then, it has taken until C23 to standardise binary literals, and there is still no format code for
    binary output.


    Very few programmers are at all interested in bits. A "double" holds a floating point value, not a pattern of bits. You are thinking on a
    level of abstraction that is not realistic for most programming tasks.

But the people who write those are few, and they know what they
    are doing.)

    And I don't? I used to write FP emulation routines...


The thing you always seem to forget, is that your languages are
written for /you/ - no one else. It doesn't make a difference whether
something is added /to/ the language or written in code /for/ the
language. You think other languages are missing critical features
simply because there is a thing that /you/ want to do that you added
to your own language. And you think other languages are overly complex
or bloated because they have features that you don't want to use.

    They frequently have advanced features while ignoring the basics.

    No - they frequently have features that /you/ call "advanced" because
    you don't need or want them, and they ignore things that /you/ call
    "basics" because you /do/ need or want them. It's all about /you/.


Imagine asking the regulars in this group what features or changes
they would like C to have in order to make C "perfect" for their uses,
regardless of everyone else, all existing code, all existing tools.
We could all fill pages with ideas. And if those were all added to C,
the result would be a language that made C++ look as easy as Logo,
while being riddled with inconsistencies and contradictions.

    Yes, that's the trick. That's why a lot of features I've played with
    have disappeared, while some have proved indispensable.

    As it is, somebody using C as an intermediate language can have a
    situation where something is well-defined in their source language,
    known to be well-defined on their platforms of interest, but
in between, C says otherwise.)

You've never really understood how languages are defined, have you?
With your own languages and tools, you don't have to - there is no
need for standards, specifications, or anything like that. You can
just make up what suits you at the time. The language is "defined" by
what the implementation does. That's been very convenient for you,
but it has left you with serious misconceptions about how non-personal
languages work.

Here's a program in a very simple language, where all variables have
    i64 type:

    c = a + b

    Here, the author has decreed that any overflow in this addition will
    wrap (any overflow bits above 64 are lost). If directly compiled to x64
    code it might use this (here 'a b c' are aliases for the registers where they reside):

    mov c, a
    add c, b

    Or on ARM64:

    add c, a, b

    Now, the author decides to use intermediate C (for portability, for optimisations etc), and will generate perhaps:

    int64_t a, b, c;
    ...
    c = a + b;

    But here, if a + b happens to overflow, it is UB, and for no good
    reason. You have to fix it. This is where it can be harder to generate
    HLL code than assembly!


    You are talking nonsense.

    Either a + b results in the correct answer, or it does not. Any sane
    person reads that as "a plus b" - mathematically adding two integers to
    get their sum. That's what the programmer wants, and that's what they
    ask for. And any sane programmer expects the language to give the
correct result within its limitations, but does not expect it to do
    magic. Expecting to form a sum that is greater than 2 ^ 63 and somehow produce the "correct" result is a total misunderstanding of mathematics
    and programming - any primary school kid will tell you that using the
    fingers of one hand, you can't add 3 and 4. They will /not/ tell you
    that it's fine to add them on one hand because 3 + 4 is actually equal to 2.

    *Now* do you understand? This is nothing to do with me or my personal languages, it is a problem for every language that transpiles to C,
    where there is a mismatch between the sets of behaviour considered UB in each.

    I understand that simple maths and common sense is beyond you. I
    understand that you think mathematics should be defined in terms of
    accidental byproducts of the way hardware logic designs happen to be implemented.


    OK, so how would you do a 'reinterpret' cast in C, of a value like
    'x+y'?

As you know, you use a union. So just to please you, here is your bit
extraction - written as a one-line function (split over two lines for
Usenet) because you seem to think that kind of thing is important :

uint64_t get_exponent(double x) {
    return ((union { double d; uint64_t u;}) { x }.u >> 52)
           & ((1ull << (62 - 52 + 1)) - 1);
}

    That compiles (with gcc on x86-64) to :

    movq rax, xmm0
    shr rax, 52
    and eax, 2047
    ret

    There's nothing in C that suggests this must be put in memory or do
    anything more than this.

    (This only seems to work with gcc. Clang and MSVS don't like it.)


    I think you are mistaken. clang is fine with it. It is standard C99,
    so any decent C compiler from the last 25 years will handle it fine. MS
    gave up on bothering to make C compilers before the turn of the century
    (they make a reasonable enough C++ compiler). Even your hero tcc is
    fine with it (though on my attempts, it produces rubbish code - maybe it
    needs different flags for optimisation). The C code is not made invalid
    by the existence of C90-only compilers.





    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Thu Nov 27 14:39:51 2025
    On 27/11/2025 13:46, bart wrote:
    On 27/11/2025 02:32, Waldek Hebisch wrote:
    bart <bc@freeuk.com> wrote:

    This is about a lower-level systems language working with primitive
    machine types, and having access to the underlying bits of those types.

    How much more fundamental can you get?

    C provides only basic bitwise operators, and you have to do some
    bit-fiddling, while trying to avoid UB, in order to extract or inject
    individual bits or bitfields.

    I provide direct indexing ops to get or set any bit or bitfield, which
    is actually a great core feature to have, but for some reason you want
    to downplay it.

    You might just admit for once that it is quite neat.

    Yes, it is neat.

    Hmm, perhaps you're being sincere, perhaps not ...

    OK, so how would you do a 'reinterpret' cast in C, of a value like
    'x+y'?
    #include <stdint.h>
    #include <string.h>

    uint64_t
    d_to_u(double d) {
    uint64_t tmp;
    memcpy(&tmp, &d, sizeof(tmp));
    return tmp;
    }

    int
    f_exp(double d) {
    return (d_to_u(d)>>52)&2047;
    }

    Using 'gcc -O' I get the following assembly (only code, without
    unimportant directives/labels):

    d_to_u:
        movq    %xmm0, %rax
        ret

f_exp:
        movq    %xmm0, %rax
        shrq    $52, %rax
        andl    $2047, %eax
        ret

As you can see 'd_to_u' is a single computational instruction;
you cannot do better given that floating point registers
are distinct from integer registers. And 'f_exp' looks
optimal assuming lack of "bit extract" or "extract exponent"
instructions.

Note that you can put both functions above in a header file,
so once you have written the few lines above you can use them
in all your C code. Of course, efficiency depends on
compiler optimization.

    Yes (that's something I can't rely on).

    These examples are interesting: with a HLL you normally express yourself
    in a clear manner, and it is the compiler's job to generate the
    complicated code required to implement what you mean.

    Here it seems to be other way around: it is the programmer who writes
    the convoluted code, and the compiler turns that into short, clear instructions! Which unfortunately no one will see.


    I don't think Waldek's code (or mine) is particularly convoluted. But
    in either case, you put such things in static inline functions (or
    macros if you need to). Then you have clear intent when implementing
    those functions - you are clearly doing low-level shifts and masking.
    And you have clear intent when /using/ the functions - you are
    extracting some bits from the underlying representation of the value.
    You split things into identified functions with specific tasks - that's
    at the heart of programming.

    And then you let the automated computer system - the compiler - do what
    it does best, and generate efficient results.

    If I use your functions like this:

    a = f_exp(x + y);

    then once the x+y result is in a register, gcc-O2 generates this inline
    code for the extraction:

    movq rax, xmm0
    shr rax, 52
    and eax, 2047


    If I express it in my language:

    a := int@(x + y).[52..62]

    then my non-optimising compiler generates this (D0 is rax):

    movq    D0,  XMM4
    shr     D0,  52
    and     D0,  2047

    So such features have definite advantages, in being able to express
    intent directly, and to make it easier for a simple compiler to know
    that intent and help it generate reasonable code without lots of
    analysis or needing function inlining.


    You seem to be arguing that it is a good thing to write code that
    spoon-feeds the compiler so that the compiler doesn't have to do much
    work. You get this because you are writing the application code and
    also writing the compiler - so you pick the solution that gives you the
    best results for the least effort overall. But that is only appropriate
    for people with personal languages like yours.

    It should be the other way round - the compiler should be optimising so
    that the programmer can work at higher levels of abstraction or write
    code in the way that is most convenient to them, and the compiler will
    handle the boring low-level details. Programmers using serious
    languages /can/ rely on the compiler optimising well.


    BTW, your example explicitly writes to memory; David Brown posted a
    version that didn't do so that I could see. Unless a compound literal is designed to be built in memory? However that version only seemed to work with one compiler.

    The version I wrote is C99 and works fine with any C99 compiler. No,
    compound literals are not "designed to be built in memory", whatever
    that might mean. A compound literal is a value, and can be used like
    any other value.

    Waldek's version does use "memcpy" and pointers formed from the
    addresses of a parameter and a local variable. That means it must give results as if the parameter and local variable were in memory somewhere
    (the stack, in usual practice - though C does not actually require a
    stack) and a memory-to-memory copy was carried out, byte by byte.
    Critical here is the term "as if". If the compiler can give the same
    results without using memory, it is allowed to do so - thus optimising compilers will just do a register transfer from a floating point
    register to a general purpose register in this case.





    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Thu Nov 27 16:02:23 2025
    On Thu, 27 Nov 2025 14:02:38 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    On 27/11/2025 13:20, bart wrote:
    On 27/11/2025 10:43, David Brown wrote:
    On 26/11/2025 23:19, bart wrote:


    What I don't like about your bit extraction operations is that you
    have an operator syntax for a fairly obscure and rarely used
    operation.

    So shift and masking operations in C are obscure?!

    Both shift operators and bitwise operators have lots of other uses.

    When you are designing a programming language, you first provide
    general features that can be used for multiple purposes. You only
    implement specialised features if the need arises - it is too
    cumbersome, or error-prone, or inefficient, or laborious to use the
    general features.

    In some areas of C usage, shifts and masks - and bitfield extraction
    - turn up quite a bit. But it seems the C operators work fine for
    the task. It would not exactly be difficult to add a standard "bit_range_extract" function to the C standard library, yet no one
    has felt it to be worth the effort over the last 50 years. Perhaps
    it is not as essential or fundamental as you think? Or perhaps C's
    current features do the job well enough that there's no need for
    anything else?


A "bit_range_extract" standard library function would make more
sense to me, though I think shifting and masking works well enough
for the few situations where you need it. A syntax that looks
very much like array access is not going to be helpful to people
looking at the code - for general-purpose languages, most
programmers will never see or use bit ranges.

    The syntax actually comes from DEC Algol60 IIRC. It was used to
    access individual characters of a string, normally an indivisible
    type in that language, and I applied the same concept to bits of an integer.

    I don't care if you found the syntax on the back of a cornflakes
    packet. The origin is not relevant.


    How much more fundamental can you get?

    It is not fundamental for a low-level systems language.

    So bits are not fundamental either! But then, it has taken until
    C23 to standardise binary literals, and there is still no format
    code for binary output.


    Very few programmers are at all interested in bits. A "double" holds
    a floating point value, not a pattern of bits. You are thinking on a
    level of abstraction that is not realistic for most programming tasks.

But the people who write those are few, and they know what
    they are doing.)

    And I don't? I used to write FP emulation routines...


The thing you always seem to forget, is that your languages are
written for /you/ - no one else. It doesn't make a difference
whether something is added /to/ the language or written in code
/for/ the language. You think other languages are missing
critical features simply because there is a thing that /you/ want
to do that you added to your own language. And you think other
languages are overly complex or bloated because they have features
that you don't want to use.

    They frequently have advanced features while ignoring the basics.

    No - they frequently have features that /you/ call "advanced" because
    you don't need or want them, and they ignore things that /you/ call
    "basics" because you /do/ need or want them. It's all about /you/.


    Imagine asking the regulars in this group what features or changes
    they would like C to have in order to make C "perfect" for their
    uses, regardless of everyone else, all existing code, all existing
tools. We could all fill pages with ideas. And if those were all
    added to C, the result would be a language that made C++ look as
    easy as Logo, while being riddled with inconsistencies and
    contradictions.

    Yes, that's the trick. That's why a lot of features I've played
    with have disappeared, while some have proved indispensable.

    As it is, somebody using C as an intermediate language can have a
    situation where something is well-defined in their source
    language, known to be well-defined on their platforms of
    interest, but inbetween, C says otherwise.)

You've never really understood how languages are defined, have
you? With your own languages and tools, you don't have to - there
is no need for standards, specifications, or anything like that.
You can just make up what suits you at the time. The language is
"defined" by what the implementation does. That's been very
convenient for you, but it has left you with serious
misconceptions about how non-personal languages work.

Here's a program in a very simple language, where all variables
    have i64 type:

    c = a + b

    Here, the author has decreed that any overflow in this addition
    will wrap (any overflow bits above 64 are lost). If directly
    compiled to x64 code it might use this (here 'a b c' are aliases
    for the registers where they reside):

    mov c, a
    add c, b

    Or on ARM64:

    add c, a, b

    Now, the author decides to use intermediate C (for portability, for optimisations etc), and will generate perhaps:

        int64_t a, b, c;
        ...
        c = a + b;

    But here, if a + b happens to overflow, it is UB, and for no good
    reason. You have to fix it. This is where it can be harder to
    generate HLL code than assembly!


    You are talking nonsense.

    Either a + b results in the correct answer, or it does not. Any sane
    person reads that as "a plus b" - mathematically adding two integers
    to get their sum. That's what the programmer wants, and that's what
    they ask for. And any sane programmer expects the language to give
    the correct result within its limitations, but does not expect it to
    do magic. Expecting to form a sum that is greater than 2 ^ 63 and
    somehow produce the "correct" result is a total misunderstanding of mathematics and programming - any primary school kid will tell you
    that using the fingers of one hand, you can't add 3 and 4. They will
    /not/ tell you that it's fine to add them on one hand because 3 + 4
    is actually equal to 2.

    *Now* do you understand? This is nothing to do with me or my
    personal languages, it is a problem for every language that
    transpiles to C, where there is a mismatch between the sets of
    behaviour considered UB in each.

    I understand that simple maths and common sense is beyond you. I
    understand that you think mathematics should be defined in terms of accidental byproducts of the way hardware logic designs happen to be implemented.


    OK, so how would you do a 'reinterpret' cast in C, of a value
    like 'x+y'?

    As you know, you use a union. So just to please you, here is your
    bit extraction - written as a one-line function (split over two
    lines for Usenet) because you seem to think that kind of thing is
    important:

    uint64_t get_exponent(double x) {
        return ((union { double d; uint64_t u;}) { x }.u >> 52)
               & ((1ull << (62 - 52 + 1)) - 1);
    }

    That compiles (with gcc on x86-64) to :

        movq rax, xmm0
        shr rax, 52
        and eax, 2047
        ret

    There's nothing in C that suggests this must be put in memory or
    do anything more than this.

    (This only seems to work with gcc. Clang and MSVS don't like it.)


    I think you are mistaken. clang is fine with it. It is standard
    C99, so any decent C compiler from the last 25 years will handle it
    fine. MS gave up on bothering to make C compilers before the turn of
    the century (they make a reasonable enough C++ compiler). Even your
    hero tcc is fine with it (though on my attempts, it produces rubbish
    code - maybe it needs different flags for optimisation). The C code
    is not made invalid by the existence of C90-only compilers.


    MSVC compilers compile your code and produce correct result, but the
    code
    looks less nice:
    0000000000000000 <get_exponent>:
    0: f2 0f 11 44 24 08 movsd %xmm0,0x8(%rsp)
    6: 48 8b 44 24 08 mov 0x8(%rsp),%rax
    b: 48 c1 e8 34 shr $0x34,%rax
    f: 25 ff 07 00 00 and $0x7ff,%eax
    14: c3 ret

    Although on old AMD processors it is likely faster than nicer code
    generated by gcc and clang. On newer processors, gcc code is likely a bit
    better, but the difference is unlikely to be detected by simple
    measurements.

    Also, the MSVC compiler does not like your style and produces the
    following warning:
    dave_b.c(5): warning C4116: unnamed type definition in parentheses

    BTW, I don't like your style either. My preferred code would look
    very similar to the code of Waldek Hebisch, except that I'd declare
    d_to_u() static.
    I don't like the union trick. Not just in this particular context, but
    generally. memcpy() is much cleaner in expressing the programmer's intentions.








    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Thu Nov 27 17:13:50 2025
    On 27/11/2025 13:02, David Brown wrote:
    On 27/11/2025 13:20, bart wrote:

    In some areas of C usage, shifts and masks - and bitfield extraction -
    turn up quite a bit. But it seems the C operators work fine for the
    task. It would not exactly be difficult to add a standard "bit_range_extract" function to the C standard library, yet no one has
    felt it to be worth the effort over the last 50 years.

    That doesn't say much. Binary literals only became official in C23.

    Width-specific integers only became standard in C99 (what did people use
    in the preceding quarter-century?) and are not yet in the core language.

    Such things as bit-extraction don't get prioritised because it so easy
    for people to put together some crummy macro to do the job. But this
    means everybody will create their own incompatible solutions.

    Min/Max operators, or 'lengthof', will never get added for similar
    reasons. But there was a time when every other code-base I looked at
    defined its own MIN/MAX or Min/Max or min/max macros or functions.

    Examples:

    #define MZ_MAX(a, b) (((a) > (b)) ? (a) : (b)) (MZ lib)

    # define MAX(A,B) ((A)>(B)?(A):(B)) (SQLite)

    #define MAX(a,b) ((a) > (b) ? (a) : (b)) (LIBjpeg)

    While everywhere you see patterns like:

    sizeof(somearray)/sizeof(somearray[0])

    which is crying out for standardisation.

    Here, I guess 'no one has felt it to be worth the effort'. Except me.


    The syntax actually comes from DEC Algol60 IIRC. It was used to access
    individual characters of a string, normally an indivisible type in
    that language, and I applied the same concept to bits of an integer.

    I don't care if you found the syntax on the back of a cornflakes packet.
    The origin is not relevant.

    Oh, I thought it was an automatic negative reaction from you to anything
    I'd thought up. I guess you have it in for DEC too.



    How much more fundamental can you get?

    It is not fundamental for a low-level systems language.

    So bits are not fundamental either! But then, it has taken until C23
    to standardise binary literals, and there is still no format code for
    binary output.


    Very few programmers are at all interested in bits.

    Unless it is that extra bit in _BitInt(65)! Then it is apparently vital.

    A "double" holds a
    floating point value, not a pattern of bits. You are thinking on a
    level of abstraction that is not realistic for most programming tasks.

    This is systems programming.

    They frequently have advanced features while ignoring the basics.

    No - they frequently have features that /you/ call "advanced" because
    you don't need or want them, and they ignore things that /you/ call
    "basics" because you /do/ need or want them. It's all about /you/.

    Well, let's stick with C. Here are some features I use, and the C
    equivalents (A has whatever type is needed):

    M                      C
    -------------------------------------------------------------
    A.len                  sizeof(A)/sizeof(A[0])

    * max(a, b)            (a > b ? a : b)

    A.odd                  A & 1, or A % 1

    A.even                 - you can do this one

    A.msbit                (A>>31) & 1, or (A>>63) & 1

    2 ** n                 (1LL << n)
    a ** b                 (int) pow(a, b)  (ints cast to float, and float result)
    x ** y                 (float) pow(x, y)

    A.[i] = x              - you can do this too; assume x is 0 or 1

    A.[i..j] = x           - yikes!

    * if c in 'A'..'Z'     if (c >= 'A' && c <= 'Z')

    * if c in [cr, lf]     if (c == cr || c == lf)

    * if a = b = c         if (a == b && b == c)

    * swap(A[i+1], A[j])   {T temp=A[i+1]; A[i+1]=A[j]; A[j]=temp;}

    abs(x)                 abs(x), labs(x), llabs(x), fabs(x) ...

    println =a, =b         printf("A=%X B=%Y\n", a, b);  what are X, Y?

    readln a, b            - some scanf nonsense

    (a,b):=c divrem d      - involves div_t and div()

    print "-" * 50         "----------------------- ... "

    A[i, j]                A[i][j]

    byte                   unsigned char, uint8_t, _BitInt(8), char maybe


    (* marks examples that are problematic in C when operands with side effects are evaluated twice)

    There are /dozens/ of examples like this that make small tasks a
    pleasure to write, but also make them clearer, to the point, and less
    error prone.

    But let me guess, none of this cuts any ice at all. The C will always be superior.

    You are talking nonsense.

    End of discussion then. You either missed my point or chose to ignore it.

    I understand that simple maths and common sense is beyond you.

    More insults.

    uint64_t get_exponent(double x) {
        return ((union { double d; uint64_t u;}) { x }.u >> 52)
               & ((1ull << (62 - 52 + 1)) - 1);
    }

    That compiles (with gcc on x86-64) to :

        movq rax, xmm0
        shr rax, 52
        and eax, 2047
        ret

    There's nothing in C that suggests this must be put in memory or do
    anything more than this.

    (This only seems to work with gcc. Clang and MSVS don't like it.)


    I think you are mistaken. clang is fine with it. It is standard C99,
    so any decent C compiler from the last 25 years will handle it fine.
    MS gave up on bothering to make C compilers before the turn of the
    century (they make a reasonable enough C++ compiler). Even your hero
    tcc is fine with it (though on my attempts, it produces rubbish code -
    maybe it needs different flags for optimisation). The C code is not
    made invalid by the existence of C90-only compilers.

    I was mistaken. I used godbolt.org but it was set to C++. Presumably gcc
    has some C++ extensions that make it valid.

  • From Ike Naar@3:633/10 to All on Thu Nov 27 17:38:03 2025
    On 2025-11-27, bart <bc@freeuk.com> wrote:
    Well, let's stick with C. Here are some features I use, and the C equivalents (A has whatever type is needed):

    M C
    -------------------------------------------------------------
    [snip]
    A.odd A & 1, or A % 1

    "A % 1" ?

  • From bart@3:633/10 to All on Thu Nov 27 17:59:19 2025
    On 27/11/2025 17:38, Ike Naar wrote:
    On 2025-11-27, bart <bc@freeuk.com> wrote:
    Well, let's stick with C. Here are some features I use, and the C
    equivalents (A has whatever type is needed):

    M C
    -------------------------------------------------------------
    [snip]
    A.odd A & 1, or A % 1

    "A % 1" ?

    I guess A % 2 then.

    Note my remark about error proneness later on.

  • From David Brown@3:633/10 to All on Thu Nov 27 21:15:53 2025
    On 27/11/2025 15:02, Michael S wrote:
    On Thu, 27 Nov 2025 14:02:38 +0100
    David Brown <david.brown@hesbynett.no> wrote:



    MSVC compilers compile your code and produce correct result, but the
    code
    looks less nice:
    0000000000000000 <get_exponent>:
    0: f2 0f 11 44 24 08 movsd %xmm0,0x8(%rsp)
    6: 48 8b 44 24 08 mov 0x8(%rsp),%rax
    b: 48 c1 e8 34 shr $0x34,%rax
    f: 25 ff 07 00 00 and $0x7ff,%eax
    14: c3 ret

    Although on old AMD processors it is likely faster than nicer code
    generated by gcc and clang. On newer processor gcc code is likely a bit better, but the difference is unlikely to be detected by simple
    measurements.

    I think it is unlikely that this version - moving from xmm0 to rax via
    memory instead of directly - is faster on any processor. But I fully
    agree that it is unlikely to be a measurable difference in practice.


    Also MSVC compiler does not like your style and produces following
    warning:
    dave_b.c(5): warning C4116: unnamed type definition in parentheses

    Warnings are a matter of taste. There's nothing wrong with my code, but
    it may be against some code styles.


    BTW, I don't like your style either. My preferred code would look
    very similar to the code of Waldek Hebisch, except that I'd declare
    d_to_u() static.
    I don't like the union trick. Not just in this particular context, but
    generally. memcpy() is much cleaner in expressing the programmer's intentions.


    I particularly don't like using unions in compound literals like this
    either - it was just to make a compact demonstration. I'd write real
    code in more re-usable bits with static inline functions.

    I disagree, however, that memcpy() shows intent better. The intention
    is not to copy it to memory - the intention is to access the underlying
    bit representation as a different type. A type-punning union is at
    least, if not more, clear for that purpose (IMHO - and judgements of
    style and clarity are very much a matter of opinion).


  • From Michael S@3:633/10 to All on Fri Nov 28 00:15:07 2025
    On Thu, 27 Nov 2025 21:15:53 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    On 27/11/2025 15:02, Michael S wrote:
    On Thu, 27 Nov 2025 14:02:38 +0100
    David Brown <david.brown@hesbynett.no> wrote:



    MSVC compilers compile your code and produce correct result, but the
    code
    looks less nice:
    0000000000000000 <get_exponent>:
    0: f2 0f 11 44 24 08 movsd %xmm0,0x8(%rsp)
    6: 48 8b 44 24 08 mov 0x8(%rsp),%rax
    b: 48 c1 e8 34 shr $0x34,%rax
    f: 25 ff 07 00 00 and $0x7ff,%eax
    14: c3 ret

    Although on old AMD processors it is likely faster than nicer code generated by gcc and clang. On newer processor gcc code is likely a
    bit better, but the difference is unlikely to be detected by simple measurements.

    I think it is unlikely that this version - moving from xmm0 to rax
    via memory instead of directly - is faster on any processor. But I
    fully agree that it is unlikely to be a measurable difference in
    practice.

    I wonder, how do you have the nerve "to think" about things that you have absolutely no idea about?

    Instead of "thinking" you could just as well open Optimization
    Reference manuals of AMD Bulldozer family or of Bobcat. Or to read
    Agner Fog's instruction tables. Move from XMM to GPR on these
    processors is very slow: 8 clocks on BD, 7 on BbC.

    BTW, AMD K8 has the opposite problem. Move from XMM to GPR is reasonably
    fast, but move from GPR to XMM is painfully slow.

    On the other hand, moves "via memory" are reasonably fast on these
    CPUs (except, may be, Bobcat? I am not sure about it), because data
    does not really travels through memory or through cache. Load-store
    forwarding picks the data directly from the store queue.




  • From Keith Thompson@3:633/10 to All on Thu Nov 27 15:59:13 2025
    bart <bc@freeuk.com> writes:
    On 27/11/2025 10:43, David Brown wrote:
    [...]
    uint64_t get_exponent(double x) {
        return ((union { double d; uint64_t u;}) { x }.u >> 52)
               & ((1ull << (62 - 52 + 1)) - 1);
    }
    That compiles (with gcc on x86-64) to :
        movq rax, xmm0
        shr rax, 52
        and eax, 2047
        ret
    There's nothing in C that suggests this must be put in memory or do
    anything more than this.

    (This only seems to work with gcc. Clang and MSVS don't like it.)

    How exactly did clang and msvs express their dislike? What versions are
    you using?

    On my systems, it works correctly with gcc 13.3.0, clang 18.1.3,
    tcc 0.9.27, Microsoft Visual Studio 2022 17.14.20.

    If your problem is that you're using older compilers that don't support compound literals, it would have saved some time if you had said so.

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

  • From bart@3:633/10 to All on Fri Nov 28 00:11:53 2025
    On 27/11/2025 23:59, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    On 27/11/2025 10:43, David Brown wrote:
    [...]
    uint64_t get_exponent(double x) {
        return ((union { double d; uint64_t u;}) { x }.u >> 52)
               & ((1ull << (62 - 52 + 1)) - 1);
    }
    That compiles (with gcc on x86-64) to :
        movq rax, xmm0
        shr rax, 52
        and eax, 2047
        ret
    There's nothing in C that suggests this must be put in memory or do
    anything more than this.

    (This only seems to work with gcc. Clang and MSVS don't like it.)

    How exactly did clang and msvs express their dislike? What versions are
    you using?

    On my systems, it works correctly with gcc 13.3.0, clang 18.1.3,
    tcc 0.9.27, Microsoft Visual Studio 2022 17.14.20.

    If your problem is that you're using older compilers that don't support compound literals, it would have saved some time if you had said so.


    I said in a followup that I'd been using a C++ compiler by mistake (this
    was on Godbolt).

    That gcc's C++ compiler accepted the code wasn't helpful.

  • From Keith Thompson@3:633/10 to All on Thu Nov 27 16:39:51 2025
    bart <bc@freeuk.com> writes:
    On 27/11/2025 23:59, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    On 27/11/2025 10:43, David Brown wrote:
    [...]
    uint64_t get_exponent(double x) {
        return ((union { double d; uint64_t u;}) { x }.u >> 52)
               & ((1ull << (62 - 52 + 1)) - 1);
    }
    [...]
    How exactly did clang and msvs express their dislike? What versions
    are
    you using?
    On my systems, it works correctly with gcc 13.3.0, clang 18.1.3,
    tcc 0.9.27, Microsoft Visual Studio 2022 17.14.20.
    If your problem is that you're using older compilers that don't
    support
    compound literals, it would have saved some time if you had said so.

    Can you *please* do something about the way your newsreader
    (apparently Mozilla Thunderbird) mangles quoted text? That first
    quoted line, starting with "> How exactly", would have been just
    74 columns, but your newsreader folded it, making it more difficult
    to read. It also deletes blank lines between paragraphs.

    I don't recall similar problems from other Thunderbird users.

    I said in a followup that I'd been using a C++ compiler by mistake
    (this was on Godbolt).

    That gcc's C++ compiler accepted the code wasn't helpful.

    But not surprising, since as you know gcc (and likewise g++) is
    not fully conforming by default. If you're compiling code with the
    purpose of making a point about the language, invoke the compiler
    in standard-conforming mode. And if a compiler "doesn't like"
    the code you feed it, at least show us the diagnostic messages.

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

  • From bart@3:633/10 to All on Fri Nov 28 01:49:44 2025
    On 28/11/2025 00:39, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    On 27/11/2025 23:59, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    On 27/11/2025 10:43, David Brown wrote:
    [...]
    uint64_t get_exponent(double x) {
        return ((union { double d; uint64_t u;}) { x }.u >> 52)
               & ((1ull << (62 - 52 + 1)) - 1);
    }
    [...]
    How exactly did clang and msvs express their dislike? What versions
    are
    you using?
    On my systems, it works correctly with gcc 13.3.0, clang 18.1.3,
    tcc 0.9.27, Microsoft Visual Studio 2022 17.14.20.
    If your problem is that you're using older compilers that don't
    support
    compound literals, it would have saved some time if you had said so.

    Can you *please* do something about the way your newsreader
    (apparently Mozilla Thunderbird) mangles quoted text? That first
    quoted line, starting with "> How exactly", would have been just
    74 columns, but your newsreader folded it, making it more difficult
    to read. It also deletes blank lines between paragraphs.

    I don't recall similar problems from other Thunderbird users.

    I don't see anything amiss with quoted content in my own posts. My last
    post looks like this to me:

    https://github.com/sal55/langs/blob/master/tbird.png

    In any case, I've no idea how to fix the problem, assuming it is at my end.


  • From Janis Papanagnou@3:633/10 to All on Fri Nov 28 03:33:49 2025
    On 11/27/25 18:59, bart wrote:
    On 27/11/2025 17:38, Ike Naar wrote:
    On 2025-11-27, bart <bc@freeuk.com> wrote:
    Well, let's stick with C. Here are some features I use, and the C
    equivalents (A has whatever type is needed):

    M                      C
    -------------------------------------------------------------
    [snip]
    A.odd                  A & 1, or A % 1

    "A % 1" ?

    I guess A % 2 then.

    You guess? - LOL - okay. :-)

    Note my remark about error proneness later on.

    Higher level abstractions (usually found in higher level languages)
    are always less error prone than low-level (or composed) constructs.

    "C" is inherently and by design a comparably low-level language, so
    I wonder what you complain here about. (You won't change that.)

    'even' and 'odd' are higher level abstractions than bit-operations,
    and they are also _special cases_ (nonetheless useful; I like them,
    and I appreciate if they are present in any language). The general
    case of the terms like "odd" and "even" is defined mathematically,
    though; so the natural way of describing them would (IMO) rather be
    based on 'x mod 2 = 1' and 'x mod 2 = 0' respectively. (So the "C"
    syntax with '%' is probably more "appropriate". Mileages may vary.)

    You can of course add as many commodity features to "your language"
    as you like. I seem to recall that one of the design principles of
    "C" was to not add too many keywords. (Not sure whether 'A.odd' is
    a function or keyword above [in "your language"].) Omitting to add
    special case operators or functions for things that can simply be
    expressed by the respective arithmetic or boolean counterparts is
    not an unreasonable language-detail design decision.[*]

    You made a mistake above (or just a typo), never mind. I suppose it
    stems from your primary "thinking in bits". - This is not meant to
    be offensive. - Back in university days (I still remember!) I made
    a similar typo but vice versa; I wanted to express "div 2" in some
    assembler language and accidentally wrote "shift-right 2", the same
    type of typo but the other way round. I *knew*, and didn't "guess",
    though, that "shift-right 1" would have been correct. ;-)

    Janis

    [*] Compare to Algol 68 that introduced everything ("including the
    kitchen sink"), and even in multiple variants! - A design decision
    that is also not appreciated by everyone.

    PS: BTW, I was always wondering why Pascal and Algol 68 supported
    'odd' but not 'even'! - In the documents of the Genie compiler we
    can read: "This is a relic of times long past.", but beyond that
    it doesn't explain why it's a "relic". I can only guess that it's,
    as a special case, considered just unnecessary in the presence of
    the modulus operator.

  • From Keith Thompson@3:633/10 to All on Thu Nov 27 19:36:22 2025
    bart <bc@freeuk.com> writes:
    On 28/11/2025 00:39, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    On 27/11/2025 23:59, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    On 27/11/2025 10:43, David Brown wrote:
    [...]
    uint64_t get_exponent(double x) {
        return ((union { double d; uint64_t u;}) { x }.u >> 52)
               & ((1ull << (62 - 52 + 1)) - 1);
    }
    [...]
    How exactly did clang and msvs express their dislike? What versions
    are
    you using?
    On my systems, it works correctly with gcc 13.3.0, clang 18.1.3,
    tcc 0.9.27, Microsoft Visual Studio 2022 17.14.20.
    If your problem is that you're using older compilers that don't
    support
    compound literals, it would have saved some time if you had said so.
    Can you *please* do something about the way your newsreader
    (apparently Mozilla Thunderbird) mangles quoted text? That first
    quoted line, starting with "> How exactly", would have been just
    74 columns, but your newsreader folded it, making it more difficult
    to read. It also deletes blank lines between paragraphs.
    I don't recall similar problems from other Thunderbird users.

    I don't see anything amiss with quoted content in my own posts. My
    last post looks like this to me:

    https://github.com/sal55/langs/blob/master/tbird.png

    In any case, I've no idea how to fix the problem, assuming it is at my end.

    My apologies, the problem doesn't appear to be on your end.

    I saved your post from my newsreader (Gnus), and the quoted text
    was correctly formatted in the saved copy. The lines were not
    unevenly wrapped, and blank lines between paragraphs were preserved.
    The formatting is messed up when I view the article in Gnus, but ok
    when I view it in Thunderbird.

    Relevant headers in your article are:

    Content-Type: text/plain; charset=UTF-8; format=flowed
    Content-Transfer-Encoding: 8bit
    ...
    User-Agent: Mozilla Thunderbird
    ...
    Content-Language: en-GB

    I think the "format=flowed" might be an issue (I suggest it's
    not ideal for Usenet posts), but yours aren't the only posts that
    use that.

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

  • From David Brown@3:633/10 to All on Fri Nov 28 09:46:56 2025
    On 27/11/2025 23:15, Michael S wrote:
    On Thu, 27 Nov 2025 21:15:53 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    On 27/11/2025 15:02, Michael S wrote:
    On Thu, 27 Nov 2025 14:02:38 +0100
    David Brown <david.brown@hesbynett.no> wrote:



    MSVC compilers compile your code and produce correct result, but the
    code
    looks less nice:
    0000000000000000 <get_exponent>:
    0: f2 0f 11 44 24 08 movsd %xmm0,0x8(%rsp)
    6: 48 8b 44 24 08 mov 0x8(%rsp),%rax
    b: 48 c1 e8 34 shr $0x34,%rax
    f: 25 ff 07 00 00 and $0x7ff,%eax
    14: c3 ret

    Although on old AMD processors it is likely faster than nicer code
    generated by gcc and clang. On newer processor gcc code is likely a
    bit better, but the difference is unlikely to be detected by simple
    measurements.

    I think it is unlikely that this version - moving from xmm0 to rax
    via memory instead of directly - is faster on any processor. But I
    fully agree that it is unlikely to be a measurable difference in
    practice.

    I wonder, how do you have the nerve "to think" about things that you have absolutely no idea about?

    I think about many things - and these are things I /do/ know about. But
    I don't know all the details, and am happy to be corrected and learn more.


    Instead of "thinking" you could just as well open Optimization
    Reference manuals of AMD Bulldozer family or of Bobcat. Or to read
    Agner Fog's instruction tables. Move from XMM to GPR on these
    processors is very slow: 8 clocks on BD, 7 on BbC.


    Okay. But storing data to memory from xmm0 is also going to be slow,
    and loading it to rax from memory is going to be slow. I am not an
    expert at the x86 world or reading Fog's tables, but it looks to me that
    on a Bulldozer, storing from xmm0 to memory has a latency of 6 cycles
    and reading the memory into rax has a latency of 4 cycles. That adds up
    to more than the 8 cycles for the direct register transfer, and I expect
    (but do not claim to know for sure!) that the dependency limits the
    scope for pipeline overlap - decode and address calculations can be
    done, but the data can't be fetched until the previous store is complete.

    So all in all, my estimate was, I think, quite reasonable. There may be unusual circumstances on particular cores if the instruction scheduling
    and pipelining, combined with the stack engine, make that sequence
    faster than the single register move.

    I've now had a short look at the relevant table from Fog's site. My conclusion from that is that the register move - though surprisingly
    slow - is probably marginally faster than passing it through memory.
    Perhaps if I spend enough time studying the details, I might find out
    more and discover that I was wrong. But that would be an extraordinary
    effort to learn about a meaningless little detail of a long-gone processor.

    I am also fairly confident that the function as a whole will be faster
    with the register move since you will get better overlap and
    superscaling with the call and return sequence when the instructions in
    the middle don't access the stack.

    Of curiosity, I compiled the code with gcc and "-march=bdver1", which I believe is the correct flag for that processor. It generated the
    register move version, but with a "vmovq" instruction instead of "movq".
    I don't know if there is any difference there - x86 instruction naming
    seems to have a certain degree of variance. (gcc's models of
    scheduling, pipelining and timing for processors is far from perfect,
    but the gcc folks do study Agner Fog's publications as well as having contributors from AMD and Intel.)

    More interesting, however, was that with "-march=bdver2" (up to bdver4)
    gcc changed the "shr / and" sequence to a single "bextr" instruction. I didn't see that on other -march choices. It seems the two-instruction shift-and-mask is faster than a single bit-extract instruction on most
    x86 processors.

    All in all, it is a lesson on how small details of architectures can
    make a difference.

    BTW, AMD K8 has the opposite problem. Move from XMM to GPR is reasonably fast, but move from GPR to XMM is painfully slow.

    On the other hand, moves "via memory" are reasonably fast on these
    CPUs (except, may be, Bobcat? I am not sure about it), because data
    does not really travels through memory or through cache. Load-store forwarding picks the data directly from the store queue.


    Yes, and there can be even more specialised short-cuts for stack data.




  • From David Brown@3:633/10 to All on Fri Nov 28 11:41:21 2025
    On 27/11/2025 18:13, bart wrote:
    On 27/11/2025 13:02, David Brown wrote:
    On 27/11/2025 13:20, bart wrote:


    I'm snipping most of this, because I don't think we are getting anywhere except down angry rabbit holes. Most of what we both have written has
    been said many times before, and I don't want to re-hash old fights.
    They bring out the worst in both of us, and we both get frustrated and annoyed. I'd rather reset the conversation before it gets out of hand,
    and go back to exchanging opinions and ideas, and helping out.


    (This only seems to work with gcc. Clang and MSVS don't like it.)


    I think you are mistaken. clang is fine with it. It is standard C99,
    so any decent C compiler from the last 25 years will handle it fine.
    MS gave up on bothering to make C compilers before the turn of the
    century (they make a reasonable enough C++ compiler). Even your hero
    tcc is fine with it (though on my attempts, it produces rubbish code -
    maybe it needs different flags for optimisation). The C code is not
    made invalid by the existence of C90-only compilers.
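    The code being discussed is not quoted in this excerpt. As a rough
    reconstruction only (the function name get_exponent appears in a later
    disassembly, and MSVC's C4116 warning about an unnamed union type is
    mentioned downthread), the C99 compound-literal union idiom might look
    something like this:

    ```c
    #include <stdint.h>

    /* Hypothetical reconstruction: extract the 11-bit biased exponent of
       an IEEE-754 binary64 double by type-punning through a union inside
       a compound literal. Valid C99, but the unnamed union type defined
       in parentheses is what triggers MSVC's warning C4116. */
    static unsigned get_exponent(double d)
    {
        uint64_t bits = (union { double d; uint64_t u; }){ .d = d }.u;
        return (unsigned)((bits >> 52) & 0x7FF);
    }
    ```

    For example, get_exponent(1.0) gives the bias value 1023, since 1.0 is
    stored with an unbiased exponent of zero.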

    I was mistaken. I used godbolt.org but it was set to C++. Presumably gcc
    has some C++ extensions that make it valid.

    You are not the first person to mix that up on godbolt.org, with a
    different language and/or compiler from what you thought you had.

    I usually make a point of explicitly specifying the standard in the
    command line arguments - that means there is no doubt about what I am
    asking for. And if you specify a C standard with g++, you will get an
    error message (unless you also use "-x c" to tell g++ that you have C code).

    My standard options are :

    -std=c17 -Wall -Wextra -Wpedantic -O2


    Of course I will vary the standard according to need - so for looking at _BitInt, I have -std=c23. I sometimes use -std=gnu17 or similar when I specifically want to use gcc extensions - in which case "-Wpedantic" is basically pointless. And for C++, I use appropriate C++ standards.
    Note that without an explicit "-std=" option, gcc will use a "gnuXX"
    version that depends on the compiler version. Thus gcc extensions are accepted by default.

    "-Wall -Wextra" enable lots of warnings. For real work, I have
    fine-tuned warning sets - I don't want all of these sets, and I want
    some warnings that are not in these sets, but they give a good starting
    point for code snippets on godbolt.

    "-Wpedantic" gives warnings on deviations from the standard. It will
    give you warnings if you accidentally use a gcc extension (such as using compound literals in C++). I don't think gcc is perfectly conforming
    with "-std=c?? -Wpedantic", but it is as close as any other compiler I
    have seen, and is IMHO the best starting point for checks.

    And I almost always have -O2. -O3 can sometimes lead to overwhelming
    amounts of extra inline code that make the assembly hard to follow. -O0 generates unreadably bad assembly. -O1 is easier to follow. But for
    me, -O2 is generally the sweet spot. I have no real interest in using a compiler that doesn't do decent optimisation - if I am happy with slow
    code, I'll use Python.


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Fri Nov 28 13:12:17 2025
    On Fri, 28 Nov 2025 09:46:56 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    On 27/11/2025 23:15, Michael S wrote:
    On Thu, 27 Nov 2025 21:15:53 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    On 27/11/2025 15:02, Michael S wrote:
    On Thu, 27 Nov 2025 14:02:38 +0100
    David Brown <david.brown@hesbynett.no> wrote:



    MSVC compilers compile your code and produce correct result, but
    the code
    looks less nice:
    0000000000000000 <get_exponent>:
    0: f2 0f 11 44 24 08 movsd %xmm0,0x8(%rsp)
    6: 48 8b 44 24 08 mov 0x8(%rsp),%rax
    b: 48 c1 e8 34 shr $0x34,%rax
    f: 25 ff 07 00 00 and $0x7ff,%eax
    14: c3 ret

    Although on old AMD processors it is likely faster than nicer code
    generated by gcc and clang. On newer processor gcc code is likely
    a bit better, but the difference is unlikely to be detected by
    simple measurements.

    I think it is unlikely that this version - moving from xmm0 to rax
    via memory instead of directly - is faster on any processor. But I
    fully agree that it is unlikely to be a measurable difference in
    practice.

    I wonder, how do you have a nerve "to think" about things that you
    have absolutely no idea about?

    I think about many things - and these are things I /do/ know about.
    But I don't know all the details, and am happy to be corrected and
    learn more.


    Instead of "thinking" you could just as well open Optimization
    Reference manuals of AMD Bulldozer family or of Bobcat. Or to read
    Agner Fog's instruction tables. Move from XMM to GPR on these
    processors is very slow: 8 clocks on BD, 7 on BbC.


    Okay. But storing data to memory from xmm0 is also going to be slow,
    and loading it to rax from memory is going to be slow. I am not an
    expert at the x86 world or reading Fog's tables, but it looks to me
    that on a Bulldozer, storing from xmm0 to memory has a latency of 6
    cycles and reading the memory into rax has a latency of 4 cycles.
    That adds up to more than the 8 cycles for the direct register
    transfer, and I expect (but do not claim to know for sure!) that the dependency limits the scope for pipeline overlap - decode and address calculations can be done, but the data can't be fetched until the
    previous store is complete.

    So all in all, my estimate was, I think, quite reasonable. There may
    be unusual circumstances on particular cores if the instruction
    scheduling and pipelining, combined with the stack engine, make that
    sequence faster than the single register move.


    It seems, you are correct in this particular case.
    Latency tables, esp. those that are measured by software rather
    than supplied by designer, are problematic in case of moves between
    registers of different types, memory stores of all types and even
    memory loads, with the exception of memory loads into a GPR. Agner explains why
    they are problematic in the preface to his tables. In short, there is no
    direct way to measure these things in isolation, so one has to measure
    the latency of a sequence of instructions and then apply either
    guesswork or the manufacturer's docs to somehow divide the combined
    latency into its individual parts.

    So, the best way is to go by recommendations of the vendor in Opt.
    Reference Manual.
    There are no relevant recommendations for K8, unfortunately. I suspect
    that all methods are slow here.
    For Bobcat, there should be recommendations, but I don't have them and
    too lazy to look for.

    For Family 10h (Barcelona and derivatives):
    "When moving data from a GPR to an MMX or XMM register, use separate
    store and load instructions to move the data first from the source
    register to a temporary location in memory and then from memory into
    the destination register, taking the memory latency into account when scheduling both stages of the load-store sequence.

    When moving data from an MMX or XMM register to a general-purpose
    register, use the MOVD instruction.

    Whenever possible, use loads and stores of the same data length. (See
    5.3, "Store-to-Load Forwarding Restrictions" on page 74 for more
    information.)"

    For Family 15h (Bulldozer and derivatives):
    "When moving data from a GPR to an XMM register, use separate store and
    load instructions to move the data first from the source register to a temporary location in memory and then from memory into the destination register, taking the memory latency into account when scheduling both
    stages of the load-store sequence.

    When moving data from an XMM register to a general-purpose register,
    use the VMOVD instruction.

    Whenever possible, use loads and stores of the same data length. (See
    6.3, "Store-to-Load Forwarding Restrictions" on page 98 for more
    information.)"

    So, for both families, vendor recommends register move in direction from
    SIMD to GPR and Store/Load sequence in direction from GPR to SIMD.
    The suspect point here is the specific mention of the VEX-encoded form
    (VMOVD) in the case of BD. It can mean that the "legacy" (SSE-encoded) form is
    slower, or it can mean nothing. I suspect the latter.
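    The two strategies the manuals describe can be sketched at the C level
    with SSE2 intrinsics (x86-64 only; function names are illustrative, and
    which instruction sequence the compiler actually emits still depends on
    -march and optimisation settings):

    ```c
    #include <stdint.h>
    #include <string.h>
    #include <emmintrin.h>  /* SSE2 intrinsics */

    /* XMM -> GPR as a direct register move (movd/movq class). */
    static uint64_t xmm_to_gpr_direct(__m128d v)
    {
        return (uint64_t)_mm_cvtsi128_si64(_mm_castpd_si128(v));
    }

    /* XMM -> GPR through memory: the store/load sequence, analogous to
       what the manuals recommend for the GPR -> XMM direction. */
    static uint64_t xmm_to_gpr_via_memory(__m128d v)
    {
        double tmp;
        uint64_t u;
        _mm_store_sd(&tmp, v);       /* movsd to a stack slot */
        memcpy(&u, &tmp, sizeof u);  /* plain 64-bit GPR load */
        return u;
    }

    /* GPR -> XMM as a direct register move. */
    static __m128i gpr_to_xmm_direct(uint64_t u)
    {
        return _mm_cvtsi64_si128((long long)u);
    }
    ```

    Both XMM-to-GPR variants produce the same value; only the instruction
    sequence (and hence the per-family latency) differs.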

    I've now had a short look at the relevant table from Fog's site. My conclusion from that is that the register move - though surprisingly
    slow - is probably marginally faster than passing it through memory.
    Perhaps if I spend enough time studying the details, I might find out
    more and discover that I was wrong. But that would be an
    extraordinary effort to learn about a meaningless little detail of a long-gone processor.

    I am also fairly confident that the function as a whole will be
    faster with the register move since you will get better overlap and superscaling with the call and return sequence when the instructions
    in the middle don't access the stack.

    Out of curiosity, I compiled the code with gcc and "-march=bdver1", which
    I believe is the correct flag for that processor. It generated the
    register move version, but with a "vmovq" instruction instead of
    "movq". I don't know if there is any difference there - x86
    instruction naming seems to have a certain degree of variance.
    (gcc's models of scheduling, pipelining and timing for processors are
    far from perfect, but the gcc folks do study Agner Fog's publications
    as well as having contributors from AMD and Intel.)

    More interesting, however, was that with "-march=bdver2" (up to
    bdver4) gcc changed the "shr / and" sequence to a single "bextr"
    instruction. I didn't see that on other -march choices. It seems
    the two-instruction shift-and-mask is faster than a single bit
    extract instruction on most x86 processors.

    All in all, it is a lesson on how small details of architectures can
    make a difference.


    Zen3 has its own can of worms in the area of moving data between
    GPR and SIMD. The issues here are more subtle than those mentioned
    above. And unfortunately they are almost completely undocumented in the
    manuals. And although the issues are subtle, the performance impact can be
    very significant.

    I encountered these things when implementing alternative
    (to those currently in use by gcc) IEEE binary128 arithmetic routines.
    My conclusion was that designers of binary128 ABI in general and of ABI
    of support routines in particular made a serious mistake by treating
    binary128 (a.k.a. __float128, a.k.a _Float128, a.k.a. 'long double' on
    ARM64) as "floating-point" type that is passed around in XMM registers
    (or Neon registers on ARM64). Both passing it in pair of GPRs and via
    memory would be significantly faster on AMD processors and detectably
    faster on Intel processors.


    BTW, AMD K8 has the opposite problem. Move from XMM to GPR is
    reasonably fast, but move from GPR to XMM is painfully slow.

    On the other hand, moves "via memory" are reasonably fast on these
    CPUs (except, maybe, Bobcat? I am not sure about it), because data
    does not really travel through memory or through the cache. Load-store forwarding picks the data directly from the store queue.


    Yes, and there can be even more specialised short-cuts for stack data.






    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Fri Nov 28 12:45:58 2025
    On 28/11/2025 12:12, Michael S wrote:
    On Fri, 28 Nov 2025 09:46:56 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    On 27/11/2025 23:15, Michael S wrote:
    On Thu, 27 Nov 2025 21:15:53 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    On 27/11/2025 15:02, Michael S wrote:
    On Thu, 27 Nov 2025 14:02:38 +0100
    David Brown <david.brown@hesbynett.no> wrote:



    MSVC compilers compile your code and produce correct result, but
    the code
    looks less nice:
    0000000000000000 <get_exponent>:
    0: f2 0f 11 44 24 08 movsd %xmm0,0x8(%rsp)
    6: 48 8b 44 24 08 mov 0x8(%rsp),%rax
    b: 48 c1 e8 34 shr $0x34,%rax
    f: 25 ff 07 00 00 and $0x7ff,%eax
    14: c3 ret

    Although on old AMD processors it is likely faster than nicer code
    generated by gcc and clang. On newer processor gcc code is likely
    a bit better, but the difference is unlikely to be detected by
    simple measurements.

    I think it is unlikely that this version - moving from xmm0 to rax
    via memory instead of directly - is faster on any processor. But I
    fully agree that it is unlikely to be a measurable difference in
    practice.

    I wonder, how do you have a nerve "to think" about things that you
    have absolutely no idea about?

    I think about many things - and these are things I /do/ know about.
    But I don't know all the details, and am happy to be corrected and
    learn more.


    Instead of "thinking" you could just as well open Optimization
    Reference manuals of AMD Bulldozer family or of Bobcat. Or to read
    Agner Fog's instruction tables. Move from XMM to GPR on these
    processors is very slow: 8 clocks on BD, 7 on BbC.


    Okay. But storing data to memory from xmm0 is also going to be slow,
    and loading it to rax from memory is going to be slow. I am not an
    expert at the x86 world or reading Fog's tables, but it looks to me
    that on a Bulldozer, storing from xmm0 to memory has a latency of 6
    cycles and reading the memory into rax has a latency of 4 cycles.
    That adds up to more than the 8 cycles for the direct register
    transfer, and I expect (but do not claim to know for sure!) that the
    dependency limits the scope for pipeline overlap - decode and address
    calculations can be done, but the data can't be fetched until the
    previous store is complete.

    So all in all, my estimate was, I think, quite reasonable. There may
    be unusual circumstances on particular cores if the instruction
    scheduling and pipelining, combined with the stack engine, make that
    sequence faster than the single register move.


    It seems, you are correct in this particular case.
    Latency tables, esp. those that are measured by software rather
    than supplied by designer, are problematic in case of moves between
    registers of different types, memory stores of all types and even
    memory loads, with the exception of memory loads into a GPR. Agner explains why
    they are problematic in the preface to his tables. In short, there is no direct way to measure these things in isolation, so one has to measure
    the latency of a sequence of instructions and then apply either
    guesswork or the manufacturer's docs to somehow divide the combined
    latency into its individual parts.


    Well, if even Agner thinks it is difficult, then I don't feel bad for
    having trouble!

    So, the best way is to go by recommendations of the vendor in Opt.
    Reference Manual.
    There are no relevant recommendations for K8, unfortunately. I suspect
    that all methods are slow here.
    For Bobcat, there should be recommendations, but I don't have them and
    too lazy to look for.


    Fair enough. It is not information that is likely to be useful to
    anyone here, so it's all for fun and interest. I certainly wouldn't
    want you to spend effort finding out the details just for me.

    For Family 10h (Barcelona and derivatives):
    "When moving data from a GPR to an MMX or XMM register, use separate
    store and load instructions to move the data first from the source
    register to a temporary location in memory and then from memory into
    the destination register, taking the memory latency into account when scheduling both stages of the load-store sequence.

    When moving data from an MMX or XMM register to a general-purpose
    register, use the MOVD instruction.

    Whenever possible, use loads and stores of the same data length. (See
    5.3, "Store-to-Load Forwarding Restrictions" on page 74 for more information.)"

    How much does advice like this take into account surrounding code?
    That's what makes generating optimal code /really/ hard. And it means micro-optimising a short instruction sequence can be ineffective for real-world code. After all, no one is actually interested in minimising
    the number of nanoseconds it takes to extract the exponent of a floating
    point number - the speed only matters if you are doing lots of these,
    probably in a big loop with data moving into and out of memory all the time.

    This stuff was all /so/ much easier when we used PIC's and AVR's...


    For Family 15h (Bulldozer and derivatives):
    "When moving data from a GPR to an XMM register, use separate store and
    load instructions to move the data first from the source register to a temporary location in memory and then from memory into the destination register, taking the memory latency into account when scheduling both
    stages of the load-store sequence.

    When moving data from an XMM register to a general-purpose register,
    use the VMOVD instruction.

    Whenever possible, use loads and stores of the same data length. (See
    6.3, "Store-to-Load Forwarding Restrictions" on page 98 for more information.)"

    So, for both families, vendor recommends register move in direction from
    SIMD to GPR and Store/Load sequence in direction from GPR to SIMD.
    The suspect point here is the specific mention of the VEX-encoded form
    (VMOVD) in the case of BD. It can mean that the "legacy" (SSE-encoded) form is
    slower, or it can mean nothing. I suspect the latter.

    I've now had a short look at the relevant table from Fog's site. My
    conclusion from that is that the register move - though surprisingly
    slow - is probably marginally faster than passing it through memory.
    Perhaps if I spend enough time studying the details, I might find out
    more and discover that I was wrong. But that would be an
    extraordinary effort to learn about a meaningless little detail of a
    long-gone processor.

    I am also fairly confident that the function as a whole will be
    faster with the register move since you will get better overlap and
    superscaling with the call and return sequence when the instructions
    in the middle don't access the stack.

    Out of curiosity, I compiled the code with gcc and "-march=bdver1", which
    I believe is the correct flag for that processor. It generated the
    register move version, but with a "vmovq" instruction instead of
    "movq". I don't know if there is any difference there - x86
    instruction naming seems to have a certain degree of variance.
    (gcc's models of scheduling, pipelining and timing for processors are
    far from perfect, but the gcc folks do study Agner Fog's publications
    as well as having contributors from AMD and Intel.)

    More interesting, however, was that with "-march=bdver2" (up to
    bdver4) gcc changed the "shr / and" sequence to a single "bextr"
    instruction. I didn't see that on other -march choices. It seems
    the two-instruction shift-and-mask is faster than a single bit
    extract instruction on most x86 processors.

    All in all, it is a lesson on how small details of architectures can
    make a difference.


    Zen3 has its own can of worms in the area of moving data between
    GPR and SIMD. The issues here are more subtle than those mentioned
    above. And unfortunately they are almost completely undocumented in the
    manuals. And although the issues are subtle, the performance impact can be
    very significant.

    I encountered these things when implementing alternative
    (to those currently in use by gcc) IEEE binary128 arithmetic routines.
    My conclusion was that designers of binary128 ABI in general and of ABI
    of support routines in particular made a serious mistake by treating binary128 (a.k.a. __float128, a.k.a _Float128, a.k.a. 'long double' on
    ARM64) as "floating-point" type that is passed around in XMM registers
    (or Neon registers on ARM64). Both passing it in pair of GPRs and via
    memory would be significantly faster on AMD processors and detectably
    faster on Intel processors.

    I can believe that. If you have to implement floating point routines in general integer hardware (and I expect that is the case for most of your implementation here) then I would think it is better to start and end
    with the data in GPR's. On some targets, moving data into and out of
    floating point or vector registers is efficient enough that those
    registers can effectively be used as caches, but it sounds like that is
    not the case here.



    BTW, AMD K8 has the opposite problem. Move from XMM to GPR is
    reasonably fast, but move from GPR to XMM is painfully slow.

    On the other hand, moves "via memory" are reasonably fast on these
    CPUs (except, may be, Bobcat? I am not sure about it), because data
    does not really travel through memory or through the cache. Load-store
    forwarding picks the data directly from the store queue.


    Yes, and there can be even more specialised short-cuts for stack data.







    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Fri Nov 28 11:49:40 2025
    On 28/11/2025 02:33, Janis Papanagnou wrote:
    On 11/27/25 18:59, bart wrote:
    On 27/11/2025 17:38, Ike Naar wrote:
    On 2025-11-27, bart <bc@freeuk.com> wrote:
    Well, let's stick with C. Here are some features I use, and the C
    equivalents (A has whatever type is needed):

         M                  C
         -------------------------------------------------------------
    [snip]
         A.odd              A & 1, or A % 1

    "A % 1" ?

    I guess A % 2 then.

    You guess? - LOL - okay. :-)

    Note my remark about error proneness later on.

    Higher level abstractions (usually found in higher level languages)
    are always less error prone than low-level (or composed) constructs.

    "C" is inherently and by design a comparably low-level language, so
    I wonder what you complain here about. (You won't change that.)

    So is mine. But it has many more 'commodity' features that make life
    simpler. Plus a generally cleaner syntax to make it clearer.


    'even' and 'odd' are higher level abstractions than bit-operations,
    and they are also _special cases_ (nonetheless useful; I like them,
    and I appreciate if they are present in any language). The general
    case of the terms like "odd" and "even" is defined mathematically,
    though;

    The advantage of using '.odd' is that the language doesn't specify how
    it works, just the behaviour.

    (But internally, 'A.odd' is an alias for 'A.[0]', and 'A.even' is one
    for 'not A.[0]', but with the extra proviso that these are read-only:
    while `A.[0] := x' is possible, you can't do 'A.odd := x'.)


    so the natural way of describing them would (IMO) rather be
    based on 'x mod 2 = 1' and 'x mod 2 = 0' respectively. (So the "C"
    syntax with '%' is probably more "appropriate". Mileages may vary.)

    I've made the mistake with % 1 more than once.


    You can of course add as many commodity features to "your language"
    as you like. I seem to recall that one of the design principles of
    "C" was to not add too many keywords. (Not sure whether 'A.odd' is
    a function or keyword above [in "your language"].)

    It is a reserved word, which means it can't be used as either a
    top-level user identifier, or a member name. With extra effort, it could
    be used for both, but that needs some special syntax, such as Ada-style "A'odd"; I've never got around to it.

    In Pascal (where I copied it from) it is a reserved word.

    You made a mistake above (or just a typo), never mind. I suppose it
    stems from your primary "thinking in bits". - This is not meant to
    be offensive. - Back in university days (I still remember!) I made
    a similar typo but vice versa; I wanted to express "div 2" in some
    assembler language and accidentally wrote "shift-right 2", the same
    type of typo but the other way round. I *knew*, and didn't "guess",
    though, that "shift-right 1" would have been correct. ;-)

    I use a decimal type in another language. There bitwise operations don't
    work. I would have to define what they might do. For example, the possibilities for `123 << 2` might be:

    - Not valid (how it works now)
    - 12300 (shift decimal digits)
    - 492 (shift 'binary' digits)

    That last simply defines 'A << n' as meaning 'A * 2**n'.
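    For ordinary binary integers that last definition coincides with the
    built-in shift operator, which is easy to check (a minimal sketch;
    shl_as_multiply is an illustrative name):

    ```c
    #include <stdint.h>

    /* Defines A << n as A * 2**n; for binary integers this matches the
       built-in shift, so 123 << 2 comes out as 123 * 4 = 492. */
    static int64_t shl_as_multiply(int64_t a, unsigned n)
    {
        int64_t p = 1;
        while (n--)
            p *= 2;          /* 2**n without pow() */
        return a * p;
    }
    ```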

    PS: BTW, I was always wondering why Pascal and Algol 68 supported
    'odd' but not 'even'! - In the documents of the Genie compiler we
    can read: "This is a relic of times long past.", but beyond that
    it doesn't explain why it's a "relic". I can only guess that it's,
    as a special case, considered just unnecessary in the presence of
    the modulus operator.

    Maybe because you can trivially define 'even' as 'not odd'.

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Fri Nov 28 15:33:55 2025
    On Fri, 28 Nov 2025 12:45:58 +0100
    David Brown <david.brown@hesbynett.no> wrote:


    I can believe that. If you have to implement floating point routines
    in general integer hardware (and I expect that is the case for most
    of your implementation here) then I would think it is better to start
    and end with the data in GPR's. On some targets, moving data into
    and out of floating point or vector registers is efficient enough
    that those registers can effectively be used as caches, but it sounds
    like that is not the case here.


    On Windows the only problem is moving data between different types of registers.
    On SysV things are worse: there is also a problem of absence of
    caller-saved FP/SIMD registers. In theory, the problem could have been
    solved by defining specialized ABI for support routines (__addtf3,
    __subtf3, __multf3, etc...), but that was not done either.

    I think that it all comes from the old mental model of soft floating
    point routines being very slow; so slow that ABI impedance mismatches are
    lost in the noise. But in the specific case of binary128 on modern CPUs, it's
    simply not true - the arithmetic itself is quite fast, so ABI mismatches are significant.


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Fri Nov 28 14:46:08 2025
    On 28/11/2025 11:49, bart wrote:
    On 28/11/2025 02:33, Janis Papanagnou wrote:
    On 11/27/25 18:59, bart wrote:
    On 27/11/2025 17:38, Ike Naar wrote:
    On 2025-11-27, bart <bc@freeuk.com> wrote:
    Well, let's stick with C. Here are some features I use, and the C
    equivalents (A has whatever type is needed):

         M                  C
         -------------------------------------------------------------
    [snip]
         A.odd              A & 1, or A % 1

    "A % 1" ?

    I guess A % 2 then.

    You guess? - LOL - okay. :-)

    Note my remark about error proneness later on.

    Higher level abstractions (usually found in higher level languages)
    are always less error prone than low-level (or composed) constructs.

    "C" is inherently and by design a comparably low-level language, so
    I wonder what you complain here about. (You won't change that.)

    So is mine. But it has many more 'commodity' features that make life simpler. Plus a generally cleaner syntax to make it clearer.

    I didn't answer your (JP's) question.

    When I mention such micro-features of mine, the response is always overwhelmingly negative (even if I subsequently reveal they are inspired
    by other languages).

    In this thread, in response to a use-case of small BitInt types, I
    suggested a more general set of bit-operations that didn't involve
    employing the type system.

    But apparently, even in the world's most famous and truly 'bare-metal'
    systems language, accessing the underlying bits of machine types is a
    rarely used, niche feature.




    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Fri Nov 28 15:47:41 2025
    On 28/11/2025 14:33, Michael S wrote:
    On Fri, 28 Nov 2025 12:45:58 +0100
    David Brown <david.brown@hesbynett.no> wrote:


    I can believe that. If you have to implement floating point routines
    in general integer hardware (and I expect that is the case for most
    of your implementation here) then I would think it is better to start
    and end with the data in GPR's. On some targets, moving data into
    and out of floating point or vector registers is efficient enough
    that those registers can effectively be used as caches, but it sounds
    like that is not the case here.


    On Windows the only problem is moving data between different types of registers.
    On SysV things are worse: there is also a problem of absence of
    caller-saved FP/SIMD registers. In theory, the problem could have been
    solved by defining specialized ABI for support routines (__addtf3,
    __subtf3, __multf3, etc...), but that was not done either.

    I think that it all comes from the old mental model of soft floating
    point routines being very slow; so slow that ABI impedance mismatches are
    lost in the noise. But in the specific case of binary128 on modern CPUs, it's
    simply not true - the arithmetic itself is quite fast, so ABI mismatches are significant.


    My only real experience with software floating point (using it, not
    writing it) is on systems where they are either slow (like 32-bit
    Cortex-M ARMs), or /very/ slow (like an 8-bit AVR). A little
    inefficiency in the main ABI's is, as you say, just noise in these cases.

    But in those systems, the floating point arithmetic routines were part
    of the compiler support library. Functions there don't have to abide by
    the platform ABI - they can use different registers according to what
    suits best. Were you working on a library that integrates into the
    compiler, or was it more "user level" (like a C++ "binary128" class with operator overrides) ?

    ABI's are obviously useful for standardisation and intermixing of code
    from different tools. But they can also be a pain, especially when they
    are old and outdated or designed to be efficient on different processors
    or with different kinds of code. I am finding the EABI for 32-bit ARM
    to be a serious performance drain for some kinds of work. It doesn't
    support passing anything bigger than 32-bit in registers, except for
    "long long int" and "unsigned long long int". It has the same
    restriction on return values. That means if you have something like a
    C++ optional<uint32_t> type, or equivalent struct in C, it's all passed
    back and forth on the stack. And unlike the AMD processors you mention,
    on a Cortex-M core that is a lot slower!
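    A minimal C sketch of the kind of type being described (names are
    illustrative; under the base 32-bit ARM AAPCS, a composite result
    larger than 4 bytes like this one is returned via a memory pointer
    rather than in r0/r1, unlike a plain uint64_t):

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    /* A C equivalent of C++ optional<uint32_t>: 8 bytes with padding,
       so the 32-bit ARM EABI returns it through memory. */
    typedef struct {
        uint32_t value;
        bool     has_value;
    } opt_u32;

    /* Addition that reports overflow instead of silently wrapping. */
    static opt_u32 checked_add(uint32_t a, uint32_t b)
    {
        uint32_t sum = a + b;   /* unsigned wrap is well-defined */
        return (opt_u32){ .value = sum, .has_value = sum >= a };
    }
    ```

    On a Cortex-M, every call to a function like checked_add pays for a
    hidden result pointer and a stack round-trip, which is the performance
    drain described above.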



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From James Kuyper@3:633/10 to All on Fri Nov 28 09:48:06 2025
    On 2025-11-27 12:38, Ike Naar wrote:
    On 2025-11-27, bart <bc@freeuk.com> wrote:
    Well, let's stick with C. Here are some features I use, and the C
    equivalents (A has whatever type is needed):

    M C
    -------------------------------------------------------------
    [snip]
    A.odd A & 1, or A % 1

    "A % 1" ?

    Probably a typo for A % 2.

    Note to bart: A%2 has a value of -1 for odd negative numbers. In many
    contexts (#if, !, &&, ||, ?:, if(), for(), while(), do while(),
    assert(), or static_assert()), all that matters is that it's not equal
    to 0. However, any time you're looking at the actual value, A&1 and A%2
    are not equivalent.
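    A small illustration of the difference (is_odd_mod and is_odd_and are
    illustrative names; the A & 1 form assumes two's complement, which C23
    finally guarantees):

    ```c
    #include <stdbool.h>

    /* A % 2 truncates toward zero, so for negative odd A it yields -1;
       A & 1 yields 1. Both are nonzero, so the two agree when used as
       conditions, but not when the value itself is inspected. */
    static bool is_odd_mod(int a) { return a % 2 != 0; }
    static bool is_odd_and(int a) { return (a & 1) != 0; }
    ```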


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From BGB@3:633/10 to All on Fri Nov 28 13:09:15 2025
    On 11/27/2025 2:15 PM, David Brown wrote:
    On 27/11/2025 15:02, Michael S wrote:
    On Thu, 27 Nov 2025 14:02:38 +0100
    David Brown <david.brown@hesbynett.no> wrote:



    MSVC compilers compile your code and produce correct result, but the
    code
    looks less nice:
    0000000000000000 <get_exponent>:
        0:   f2 0f 11 44 24 08       movsd  %xmm0,0x8(%rsp)
        6:   48 8b 44 24 08          mov    0x8(%rsp),%rax
        b:   48 c1 e8 34             shr    $0x34,%rax
        f:   25 ff 07 00 00          and    $0x7ff,%eax
       14:   c3                      ret

    Although on old AMD processors it is likely faster than the nicer code
    generated by gcc and clang. On newer processors the gcc code is likely
    a bit better, but the difference is unlikely to be detected by simple
    measurements.

    I think it is unlikely that this version - moving from xmm0 to rax via
    memory instead of directly - is faster on any processor. But I fully
    agree that it is unlikely to be a measurable difference in practice.


    Also MSVC compiler does not like your style and produces following
    warning:
    dave_b.c(5): warning C4116: unnamed type definition in parentheses

    Warnings are a matter of taste. There's nothing wrong with my code, but
    it may be against some code styles.


    BTW, I don't like your style either. My preferred code will look
    very similar to the code of Waldek Hebisch except that I'd declare
    d_to_u() static.
    I don't like union trick. Not just in this particular context, but
    generally. memcpy() much cleaner in expressing programmer's intentions.


    I particularly don't like using unions in compound literals like this
    either - it was just to make a compact demonstration. I'd write real
    code in more re-usable bits with static inline functions.

    I disagree, however, that memcpy() shows intent better. The intention
    is not to copy it to memory - the intention is to access the underlying
    bit representation as a different type. A type-punning union is at
    least, if not more, clear for that purpose (IMHO - and judgements of
    style and clarity are very much a matter of opinion).


    FWIW, BGBCC allows:
    double f;
    u64 uli;
    uli=(u64)((__m64)f);
    And, you can extract an exponent as:
    uli[62:52]
    But, this is pretty nonstandard...


    Here, "val[hi:lo]" works on pretty much any integer type, with the behavior:
    If it is a normal integer type, will return a zero-extended value of the
    same type as the input (so, similar to what shift and mask would do).

    If the input type is a _BitInt or _UBitInt, the result will also be
    _BitInt or _UBitInt with the same width as the bitfield selector.

    It is possible to select a single bit:
    uli[63]
    But, this is only valid for _BitInt and _UBitInt; if a normal
    integer type is used here, it makes more sense to assume the user had
    mistyped something, so "uli[63:63]" would be needed for a single-bit
    extract in this case.

    It is also possible to compose values of _UBitInt and similar, say:
    _UBitInt(24) rgb24;
    _UBitInt(16) rgb5;
    rgb5=(_UBitInt(16)) { 0b0u1, rgb24[23:19], rgb24[15:11], rgb24[7:3] };

    Etc...

    Some of this was partly inspired by Verilog in my case.

    Partial merit in this case is that, beyond just being slightly more
    concise and readable than a more traditional shifts-and-masks approach,
    also makes it easier for the compiler to generate more efficient code...

    Though, for this particular scenario, my ISA has a specialized CPU
    instruction for RGB24 to RGB555 that is faster than manually repacking
    the bits, so alas...


    But, also there are special optional instructions for bitfield moves
    that would allow the above to be expressed in 3 CPU instructions (vs the
    11 or so instructions that would be needed with a more traditional
    approach, or 8 if one gets clever with how they use shifts).

    However, with the syntax, without the special CPU instruction, it can
    infer the 8-instruction construct, and other similar constructs, in
    cases where it might have otherwise been too much mental effort for a
    human programmer (and where inferring this from shifts and masks is
    also asking too much from the compiler...).

    ...



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Fri Nov 28 19:46:43 2025
    On 28/11/2025 10:41, David Brown wrote:


    But for me, -O2 is generally the sweet spot. I have no real interest
    in using a compiler that doesn't do decent optimisation - if I am
    happy with slow code, I'll use Python.


    That's like saying that if you can't go at 100mph, you're happy to walk!

    There's no compromise at all?

    I've taken a task (decode JPEG) which uses the same algorithm across
    three languages, and applied it to the same input. These are the
    runtimes, expressed in relative MPH:

                                            Drive 1 mile:
      gcc -O3  C       108   mph                     33s
      gcc -O2  C       100   mph                     36s
      mm       M        77   mph (my lang)           47s
      bcc      C        55   mph (my product)     1m 05s
      tcc      C        25   mph                  2m 24s
      CPython  Python    0.8 mph             1h 15m 00s

    Actually, forget walking: you'd rather crawl on your hands and knees!

    (The figure for PyPy for this task, which has lots of long loops to get
    stuck into, is 19 mph, but the speedup is generally unpredictable.)



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Fri Nov 28 21:58:04 2025
    On 28/11/2025 20:46, bart wrote:
    On 28/11/2025 10:41, David Brown wrote:


    But for me, -O2 is generally the sweet spot. I have no real
    interest in using a compiler that doesn't do decent optimisation - if
    I am happy with slow code, I'll use Python.


    That's like saying that if you can't go at 100mph, you're happy to walk!

    There's no compromise at all?

    My work is mainly on microcontrollers, where efficient code is critical
    (x86 processors are good at running unoptimised code quickly,
    microcontrollers are not). And some of my work is programming on PC's,
    where it is rarely important - it makes more sense to use a language
    targeting faster development time than faster runtime. (The bulk of the
    time spent when running Python code is in libraries, OS calls, waiting
    for disks, IO, networks, etc.)

    I'm sure plenty of people have use for "medium speed" languages, but I
    don't see it for what I do.

    Actually, the same goes for travelling. I'm happy to go out for a walk,
    but if I am trying to get somewhere at a distance, I'll drive. I've
    never thought "what I really want here to go to the shops is a car with
    a max speed of 30 mph".


    I've taken a task (decode JPEG) which uses the same algorithm across
    three languages, and applied it to the same input. These are the
    runtimes, expressed in relative MPH:

                                            Drive 1 mile:
      gcc -O3  C       108   mph                     33s
      gcc -O2  C       100   mph                     36s
      mm       M        77   mph (my lang)           47s
      bcc      C        55   mph (my product)     1m 05s
      tcc      C        25   mph                  2m 24s
      CPython  Python    0.8 mph             1h 15m 00s

    Actually, forget walking: you'd rather crawl on your hands and knees!

    (The figure for PyPy for this task, which has lots of long loops to get stuck into, is 19 mph, but the speedup is generally unpredictable.)



    I don't write jpeg decoders on PC's. I very rarely write code that has
    to be fast on a PC. (It has happened occasionally - but usually then I
    use existing fast code like numpy to do the heavy lifting.) On the few occasions that I write C or C++ code on PC's, I use optimisation. For
    one thing, it gives better static error checking. And while I probably
    am not too bothered about the speed differences, it's just hard for me
    to purposefully and pointlessly pessimise code.



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Fri Nov 28 22:43:36 2025
    On 28/11/2025 19:09, BGB wrote:

    It is also possible to compose values of _UBitInt and similar, say:
      _UBitInt(24) rgb24;
      _UBitInt(16) rgb5;
      rgb5=(_UBitInt(16)) { 0b0u1, rgb24[23:19], rgb24[15:11], rgb24[7:3] };

    This has given me an idea for an extended feature. Here, I would use
    rgb.[x] syntax, where x is maybe 23, or 23..19, and rgb.[x, y] just
    means rgb.[x].[y].

    The latter is not that useful however; suppose that rgb.[x, y] actually combines rgb.[x] and rgb.[y]. That could then be used to express your
    example like this:

    rgb24.[23..19, 15..11, 7..3]

    So the 3 distinct 5-bit bitfields are concatenated into one 15-bit field.

    However the exact meaning and ordering would still need pinning down,
    and there are various questions to be answered. I also think that such
    extraction can be a separate feature from packing multiple sub-word
    values into one result.

    I think this might be worth looking at. But I'm still not keen on
    relying on the type system to give you the lengths of those fields. (In
    my language, rgb.[23..19] is extracted into an i64 value so its bitfield
    info is lost.)



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Keith Thompson@3:633/10 to All on Fri Nov 28 15:23:23 2025
    bart <bc@freeuk.com> writes:
    On 28/11/2025 02:33, Janis Papanagnou wrote:
    [...]
    You can of course add as many commodity features to "your language"
    as you like. I seem to recall that one of the design principles of
    "C" was to not add too many keywords. (Not sure whether 'A.odd' is
    a function or keyword above [in "your language"].)

    It is a reserved word, which means it can't be used as either a
    top-level user identifier, or a member name. With extra effort, it
    could be used for both, but that needs some special syntax, such as
    Ada-style "A'odd"; I've never got around to it.

    In Pascal (where I copied it from) it is a reserved word.

    <OT>In Pascal, "odd" is not a reserved word. It's the name of a
    predefined function.</OT>

    [...]

    PS: BTW, I was always wondering why Pascal and Algol 68 supported
    'odd' but not 'even'! - In the documents of the Genie compiler we
    can read: "This is a relic of times long past.", but beyond that
    it doesn't explain why it's a "relic". I can only guess that it's,
    as a special case, considered just unnecessary in the presence of
    the modulus operator.

    Maybe because you can trivially define 'even' as 'not odd'.

    Or maybe because odd(n) can be implemented as "treat the low-order bit
    of the argument as a Boolean".

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Sat Nov 29 00:08:46 2025
    On 28/11/2025 23:23, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    On 28/11/2025 02:33, Janis Papanagnou wrote:
    [...]
    You can of course add as many commodity features to "your language"
    as you like. I seem to recall that one of the design principles of
    "C" was to not add too many keywords. (Not sure whether 'A.odd' is
    a function or keyword above [in "your language"].)

    It is a reserved word, which means it can't be used as either a
    top-level user identifier, or a member name. With extra effort, it
    could be used for both, but that needs some special syntax, such as
    Ada-style "A'odd"; I've never got around to it.

    In Pascal (where I copied it from) it is a reserved word.

    <OT>In Pascal, "odd" is not a reserved word. It's the name of a
    predefined function.</OT>

    So what's a 'reserved word' then? To me it is something not available as
    a user-identifier because it has a special meaning in the language,
    which may be that of a predefined function among other things.




    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Janis Papanagnou@3:633/10 to All on Sat Nov 29 03:26:49 2025
    On 28/11/2025 12.49, bart wrote:
    On 28/11/2025 02:33, Janis Papanagnou wrote:

    so the natural way of describing them would (IMO) rather be
    based on 'x mod 2 = 1' and 'x mod 2 = 0' respectively. (So the "C"
    syntax with '%' is probably more "appropriate". Mileages may vary.)

    I've made the mistake with % 1 more than once.

    (If you know in what areas you commonly make mistakes you can
    work on that! - Just a suggestion to think about.)


    You can of course add as many commodity features to "your language"
    as you like. I seem to recall that one of the design principles of
    "C" was to not add too many keywords. (Not sure whether 'A.odd' is
    a function or keyword above [in "your language"].)

    It is a reserved word, which means it can't be used as either a top-
    level user identifier, or a member name. With extra effort, it could be
    used for both, but that needs some special syntax, such as Ada-style "A'odd"; I've never got around to it.

    In Pascal (where I copied it from) it is a reserved word.

    As far as I recall, in Pascal it's a predefined function! - The
    difference is that you cannot use reserved words as identifiers.
    (It's similar, but not necessarily, with keywords; depending on
    the language.)

    That was basically also the background of my explanation; to my
    knowledge "C" didn't want to introduce too many reserved words
    that as a consequence then cannot be used as "language entity"
    names (like variables, function names, etc.) any more. - That's
    why introducing simple high-level functions unnecessarily may be
    deprecated.


    PS: BTW, I was always wondering why Pascal and Algol 68 supported
    'odd' but not 'even'! - In the documents of the Genie compiler we
    can read: "This is a relic of times long past.", but beyond that
    it doesn't explain why it's a "relic". I can only guess that it's,
    as a special case, considered just unnecessary in the presence of
    the modulus operator.

    Maybe because you can trivially define 'even' as 'not odd'.

    But it's the same with 'odd'; you can trivially write it as a
    boolean or as an arithmetic expression, whatever one prefers.

    And that also doesn't explain why 'odd' is considered a "relic"
    by Marcel. (I can only explain that opinion as I've done above.)
    The point in Algol 68 is, though, even more relaxed; since you
    have stropping there the conflicts of keywords with identifiers
    aren't what they are in other languages.

    Janis


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Janis Papanagnou@3:633/10 to All on Sat Nov 29 03:32:36 2025
    On 29/11/2025 03.26, Janis Papanagnou wrote:
    ...

    That was basically also the background of my explanation; to my
    knowledge "C" didn't want to introduce too many reserved words
    that as a consequence then cannot be used as "language entity"
    names (like variables, function names, etc.) any more. - That's
    why introducing simple high-level functions unnecessarily may be
    deprecated.

    Please ignore the last sentence. - I was speaking about reserved
    words or keywords and not about function names in the context of
    the paragraph. - So it depends in what way you introduce elements
    like 'odd'. As a "C" function it wouldn't matter much. In case of
    "your language" - where you say it's a keyword! - it would matter,
    though!

    Janis


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Keith Thompson@3:633/10 to All on Fri Nov 28 19:38:06 2025
    bart <bc@freeuk.com> writes:
    On 28/11/2025 23:23, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    On 28/11/2025 02:33, Janis Papanagnou wrote:
    [...]
    You can of course add as many commodity features to "your language"
    as you like. I seem to recall that one of the design principles of
    "C" was to not add too many keywords. (Not sure whether 'A.odd' is
    a function or keyword above [in "your language"].)

    It is a reserved word, which means it can't be used as either a
    top-level user identifier, or a member name. With extra effort, it
    could be used for both, but that needs some special syntax, such as
    Ada-style "A'odd"; I've never got around to it.

    In Pascal (where I copied it from) it is a reserved word.
    <OT>In Pascal, "odd" is not a reserved word. It's the name of a
    predefined function.</OT>

    So what's a 'reserved word' then? To me it is something not available
    as a user-identifier because it has a special meaning in the language,
    which may be that of a predefined function among other things.

    Right. The name "odd" is available as a user-defined identifier.
    If you define something named "odd" in Pascal, it hides the
    predefined function of that name.

    You can think of Pascal's predefined functions as being declared
    in an outer scope, surrounding the main program. Pascal's rules
    for declarations in inner scopes hiding identifiers in outer scopes
    are similar to C's.

    (C has no predefined functions.)

    If there's more to say about this, I suggest comp.lang.misc or comp.lang.pascal.misc.

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Sat Nov 29 11:24:19 2025
    On 29/11/2025 03:38, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    On 28/11/2025 23:23, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    On 28/11/2025 02:33, Janis Papanagnou wrote:
    [...]
    You can of course add as many commodity features to "your language"
    as you like. I seem to recall that one of the design principles of
    "C" was to not add too many keywords. (Not sure whether 'A.odd' is
    a function or keyword above [in "your language"].)

    It is a reserved word, which means it can't be used as either a
    top-level user identifier, or a member name. With extra effort, it
    could be used for both, but that needs some special syntax, such as
    Ada-style "A'odd"; I've never got around to it.

    In Pascal (where I copied it from) it is a reserved word.
    <OT>In Pascal, "odd" is not a reserved word. It's the name of a
    predefined function.</OT>

    So what's a 'reserved word' then? To me it is something not available
    as a user-identifier because it has a special meaning in the language,
    which may be that of a predefined function among other things.

    Right. The name "odd" is available as a user-defined identifier.
    If you define something named "odd" in Pascal, it hides the
    predefined function of that name.

    I did test it with a toy Pascal compiler I have. Defining 'odd' as a
    variable didn't work, but that was for other reasons.


    You can think of Pascal's predefined functions as being declared
    in an outer scope, surrounding the main program.

    I took 'predefined functions' to mean 'built-in functions' (effectively, operators with function-like syntax) that cannot be overridden.

    So 'odd' is not a reserved word in Pascal; I was mistaken.

    (My opinion is that being able to shadow fundamental language features
    is undesirable. Being able to reuse them as user identifiers is another matter, but that would involve tricks with syntax or context to avoid ambiguity.)



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Sat Nov 29 12:24:12 2025
    On 29/11/2025 02:32, Janis Papanagnou wrote:
    On 29/11/2025 03.26, Janis Papanagnou wrote:
    ...

    That was basically also the background of my explanation; to my
    knowledge "C" didn't want to introduce too many reserved words
    that as a consequence then cannot be used as "language entity"
    names (like variables, function names, etc.) any more. - That's
    why introducing simple high-level functions unnecessarily may be
    deprecated.

    Please ignore the last sentence. - I was speaking about reserved
    words or keywords and not about function names in the context of
    the paragraph. - So it depends in what way you introduce elements
    like 'odd'. As a "C" function it wouldn't matter much. In case of
    "your language" - where you say it's a keyword! - it would matter,
    though!


    My syntax actually has a stropping mechanism, but it is applied to user-identifiers.

    That's for when you really want to use a reserved word as an identifier, for example if porting code from another language, or machine translation.
    So this is possible:

    int `odd := 3

    if `odd.odd then

    It is also case-preserving (syntax is usually case-insensitive):

    int `int, `INT, `Int # three different variables

    And (I've just discovered this), it can be used when identifiers either
    start with a digit, or are numbers:

    int `1234 := 1235

    But this is generally ugly and undesirable; you only do this as a last
    resort. (The feature is most heavily used in machine-generated assembly.)


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Sat Nov 29 14:45:30 2025
    On 29/11/2025 12:24, bart wrote:
    On 29/11/2025 03:38, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    On 28/11/2025 23:23, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    On 28/11/2025 02:33, Janis Papanagnou wrote:
    [...]
    You can of course add as many commodity features to "your language"
    as you like. I seem to recall that one of the design principles of
    "C" was to not add too many keywords. (Not sure whether 'A.odd' is
    a function or keyword above [in "your language"].)

    It is a reserved word, which means it can't be used as either a
    top-level user identifier, or a member name. With extra effort, it
    could be used for both, but that needs some special syntax, such as
    Ada-style "A'odd"; I've never got around to it.

    In Pascal (where I copied it from) it is a reserved word.
    <OT>In Pascal, "odd" is not a reserved word. It's the name of a
    predefined function.</OT>

    So what's a 'reserved word' then? To me it is something not available
    as a user-identifier because it has a special meaning in the language,
    which may be that of a predefined function among other things.

    Right. The name "odd" is available as a user-defined identifier.
    If you define something named "odd" in Pascal, it hides the
    predefined function of that name.

    I did test it with a toy Pascal compiler I have. Defining 'odd' as a variable didn't work, but that was for other reasons.


    You can think of Pascal's predefined functions as being declared
    in an outer scope, surrounding the main program.

    I took 'predefined functions' to mean 'built-in functions' (effectively, operators with function-like syntax) that cannot be overridden.

    So 'odd' is not a reserved word in Pascal; I was mistaken.

    (My opinion is that being able to shadow fundamental language features
    is undesirable. Being able to reuse them as user identifiers is another matter, but that would involve tricks with syntax or context to avoid ambiguity.)



    The issue is where you draw the line of what is a "fundamental language feature", and what is not. For Pascal, "begin" is a fundamental
    language feature, part of the syntax. "odd" is not fundamental - it's
    just a function in Pascal's equivalent of the C standard library.
    So no tricks or special syntax (like "stropping") are needed to re-use
    the identifier for other purposes.

    I agree that using words that are "fundamental" is not good. But if a language provides built-in functions in a global namespace, then it is a serious limitation if these cannot be shadowed or overridden.
    Basically, it means that you are always at risk of conflicts with
    existing code if later language versions add new functions. So if
    someone wrote Pascal code with a local variable called "even", and a
    later version introduced a built-in function "even", then it is critical
    that this is an overrideable or shadowable (if that is a real word!) identifier.

    That's why C is very conservative about adding new keywords, and uses
    reserved namespaces for the purpose - thus C99 added "_Bool", not
    "bool", to avoid conflict with existing code. Only now, over two
    decades later, did the committee feel that uses of the identifier "bool"
    other than as a typedef for _Bool (usually via <stdbool.h>) are so rare
    that C23 could finally have "bool" as a keyword for the type. And they
    still have challenges with good names for standard library functions -
    now in C23, many new ones have names with a "stdc_" prefix.



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)