• Nice way of allocating flexible struct.

    From Kaz Kylheku@3:633/10 to All on Wed Oct 8 06:35:28 2025
    Jonas Lund of https://whizzter.woorlic.org/ mentioned this
    trick in a HackerNews comment:

    Given:

    struct S {
    // ...
    T A[];
    };

    Don't do this:

    malloc(offsetof(S, A) + n * sizeof (T));

    But rather this:

    malloc(offsetof(S, A[n]));

    It's easy to forget that the second argument of offsetof is a
    designator, not simply a member name.
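
    For illustration, a minimal sketch of the pattern (the struct and
    function names here are made up; note that, as discussed later in
    the thread, a non-constant index inside offsetof is accepted by
    common compilers but is not clearly blessed by the standard):

        #include <stddef.h>
        #include <stdlib.h>

        struct rec {
            size_t len;
            double data[];          /* flexible array member */
        };

        /* Allocate a rec with room for n doubles: the header plus exactly
           n elements, with no reliance on sizeof(struct rec) or its padding. */
        struct rec *rec_new(size_t n)
        {
            struct rec *r = malloc(offsetof(struct rec, data[n]));
            if (r)
                r->len = n;
            return r;
        }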

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca

    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From pozz@3:633/10 to All on Wed Oct 8 09:09:40 2025
On 08/10/2025 08:35, Kaz Kylheku wrote:
    Jonas Lund of https://whizzter.woorlic.org/ mentioned this
    trick in a HackerNews comment:

    Given:

    struct S {
    // ...
    T A[];
    };

    Don't do this:

    malloc(offsetof(S, A) + n * sizeof (T));

    But rather this:

    malloc(offsetof(S, A[n]));

    It's easy to forget that the second argument of offsetof is a
    designator, not simply a member name.


    struct S {
    unsigned int size;
    unsigned char mode;
    unsigned char array[];
};

On a machine with 32-bit ints, sizeof(struct S) is 8, because there are 3
bytes of padding after mode and array is considered empty.

    Now I want to store 9 bytes in array[]. I could use:

malloc(sizeof(struct S) + 9 * sizeof(unsigned char)), which is malloc(17)

    or I could use:

malloc(offsetof(struct S, array[9])), which is malloc(14)

Is the second better than the first (and still correct)?

And another question. Suppose I need an array of struct S. All elements
have a 7-byte array[] member. How do I allocate this array and access each element?

I think I can't use the first malloc (17), nor the second (14). Neither is a multiple of the alignment.


    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Bonita Montero@3:633/10 to All on Wed Oct 8 11:09:27 2025
On 08.10.2025 at 08:35, Kaz Kylheku wrote:
    Jonas Lund of https://whizzter.woorlic.org/ mentioned this
    trick in a HackerNews comment:

    Given:

    struct S {
    // ...
    T A[];
    };

    Don't do this:

    malloc(offsetof(S, A) + n * sizeof (T));

    But rather this:

    malloc(offsetof(S, A[n]));

    It's easy to forget that the second argument of offsetof is a
    designator, not simply a member name.


    In a real language:

    #include <iostream>
    #include <optional>
    #include <array>

    using namespace std;

    template<typename T, typename Derived>
    struct flex_base
    {
    T &operator []( size_t i )
    {
    return static_cast<Derived &>( *this ).m_arr[i];
    }
    virtual ~flex_base() {};
    };

    template<typename T, size_t N>
    struct flex_array : flex_base<T, flex_array<T, N>>
    {
    virtual ~flex_array() {};
    private:
    template<typename T, typename Derived>
    friend struct flex_base;
    std::array<T, N> m_arr;
    };


    int main()
    {
    auto &fb = *new flex_array<string, 100>();
    for( size_t i = 0; i != 100; ++i )
    fb[i] = "hello world";
    }

    Somewhat more complicated to declare, but much shorter and
    more readable usage.
    C really sucks.

    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Bonita Montero@3:633/10 to All on Wed Oct 8 11:23:13 2025
On 08.10.2025 at 11:09, Bonita Montero wrote:
On 08.10.2025 at 08:35, Kaz Kylheku wrote:
    Jonas Lund of https://whizzter.woorlic.org/ mentioned this
    trick in a HackerNews comment:

    Given:

  struct S {
    // ...
    T A[];
  };

    Don't do this:

  malloc(offsetof(S, A) + n * sizeof (T));

    But rather this:

  malloc(offsetof(S, A[n]));

    It's easy to forget that the second argument of offsetof is a
    designator, not simply a member name.


    In a real language:

    #include <iostream>
    #include <optional>
    #include <array>

    using namespace std;

    template<typename T, typename Derived>
    struct flex_base
    {
    T &operator []( size_t i )
    {
        return static_cast<Derived &>( *this ).m_arr[i];
    }
    virtual ~flex_base() {};
    };

    template<typename T, size_t N>
    struct flex_array : flex_base<T, flex_array<T, N>>
    {
    virtual ~flex_array() {};
private:
    template<typename T, typename Derived>
    friend struct flex_base;
    std::array<T, N> m_arr;
    };


    int main()
    {
    auto &fb = *new flex_array<string, 100>();
    for( size_t i = 0; i != 100; ++i )
        fb[i] = "hello world";
    }

    Somewhat more complicated to declare, but much shorter and
    more readable usage.
    C really sucks.

    OMG, I was blind:

    T * new T[N];

    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Wed Oct 8 12:53:01 2025
    On Wed, 8 Oct 2025 11:23:13 +0200
    Bonita Montero <Bonita.Montero@gmail.com> wrote:

On 08.10.2025 at 11:09, Bonita Montero wrote:
On 08.10.2025 at 08:35, Kaz Kylheku wrote:
    Jonas Lund of https://whizzter.woorlic.org/ mentioned this
    trick in a HackerNews comment:

    Given:

  struct S {
    // ...
    T A[];
  };

    Don't do this:

  malloc(offsetof(S, A) + n * sizeof (T));

    But rather this:

  malloc(offsetof(S, A[n]));

    It's easy to forget that the second argument of offsetof is a
    designator, not simply a member name.


    In a real language:

    #include <iostream>
    #include <optional>
    #include <array>

    using namespace std;

    template<typename T, typename Derived>
    struct flex_base
    {
    T &operator []( size_t i )
    {
        return static_cast<Derived &>( *this ).m_arr[i];
    }
    virtual ~flex_base() {};
    };

    template<typename T, size_t N>
    struct flex_array : flex_base<T, flex_array<T, N>>
    {
    virtual ~flex_array() {};
private:
    template<typename T, typename Derived>
    friend struct flex_base;
    std::array<T, N> m_arr;
    };


    int main()
    {
    auto &fb = *new flex_array<string, 100>();
    for( size_t i = 0; i != 100; ++i )
        fb[i] = "hello world";
    }

    Somewhat more complicated to declare, but much shorter and
    more readable usage.
    C really sucks.

    OMG, I was blind:

    T * new T[N];

    You don't understand the meaning of the word 'flexible'.
    The whole point of it is that N is unknown at compile time.

Formally speaking, flexible array members are not supported in the
inferior tongue that you call a "real language", although they can be
emulated and in practice will work with any production-quality
compiler.
    However, if I am not mistaken, it works just because implementors are
    sane people, rather than because the language itself provides sane
    guarantees.









    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Bonita Montero@3:633/10 to All on Wed Oct 8 12:09:13 2025
On 08.10.2025 at 11:53, Michael S wrote:

    You don't understand the meaning of the word 'flexible'.

I understand it; my first solution was O.K. in that sense.
The usage is much simpler than in C.

    The whole point of it is that N is unknown at compile time.

    Formally speaking, flexible array members are not supported in
    inferior tongue ...

As you can see from my first post, my solution is much more flexible.

    However, if I am not mistaken, it works just because implementors are
    sane people, rather than because the language itself provides sane guarantees.

C is really dangerous in that sense because you have to flip every bit
yourself. Better to use abstractions that you re-use many times. In C there
are almost no complex data structures at all, like a vector or an
unordered map in C++, because it would be a large effort to specialize
them yourself for every data type. Most C projects stick with simple data
structures, which are less efficient. The "generic" facilities in C that
work through callbacks, like qsort(), really suck; their performance is
better than nothing but still not optimal.
I think all developers who use C today are either forced to stick
with C through their job, or are persons who think mostly on the
detail level and can't think in abstractions.
This is programming like in the beginning of the 90s. But today's
machines are capable of handling more complex requirements, and these
requirements need a more flexible language, so that you can handle
them with fewer bugs than in a language where you have to do every
detail by yourself.

    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Richard Tobin@3:633/10 to All on Wed Oct 8 12:01:03 2025
    In article <10c52nj$pnn0$1@dont-email.me>, pozz <pozzugno@gmail.com> wrote:

    struct S {
    unsigned int size;
    unsigned char mode;
    unsigned char array[];
    }

And another question. Suppose I need an array of struct S. All elements
have a 7-byte array[] member. How to allocate this array and access each element?

    To get the size, round up offsetof(struct S, array[7]) to a multiple
    of _Alignof(struct S).

    To access an element, I think you will have to determine its offset
    and cast a char pointer.
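
    For illustration, a hedged sketch of that recipe using pozz's layout
    (the stride computation and helper names below are mine, not part of
    any standard idiom):

        #include <stddef.h>
        #include <stdlib.h>

        struct S {
            unsigned int size;
            unsigned char mode;
            unsigned char array[];
        };

        enum { PAYLOAD = 7 };

        /* Per-element stride: offsetof(struct S, array[7]) rounded up
           to a multiple of _Alignof(struct S). */
        #define STRIDE \
            ((offsetof(struct S, array[PAYLOAD]) + _Alignof(struct S) - 1) \
             / _Alignof(struct S) * _Alignof(struct S))

        /* Element access goes through a char pointer plus the offset. */
        static struct S *elem(void *base, size_t i)
        {
            return (struct S *)((char *)base + i * STRIDE);
        }

        int main(void)
        {
            void *arr = malloc(10 * STRIDE);   /* room for 10 elements */
            if (arr) {
                elem(arr, 3)->mode = 1;
                free(arr);
            }
            return 0;
        }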

    -- Richard

    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Janis Papanagnou@3:633/10 to All on Wed Oct 8 15:59:23 2025
    On 08.10.2025 12:09, Bonita Montero wrote:
    [...]

    C is really dangerous in that sense because you've to flip every bit yourself. Better use abstactions you re-use a lot of times. In C there
    almost no complex data strructures at all; like a vector in C++ or a unordered map because it would be a large effort to specialize your-
    self that for every data type. Most C projects stick with simple data structures which are less efficient. The "generic" types in C which
    work work callbacks like with qsort() really suck since their perfor-
    mance is better but still not optimal.
    I think all developers who use C today are either forced to stick
    with C though their job or are persons which think mostly on the
    detail level and can't think in abstractions.

    This is programming like in the beginning of the 90s.

I disagree with the historical assessment; abstractions were known and
used (and asked for) [long] before that. (Even your beloved C++
arrived a decade earlier, and its designer was influenced by even
older abstraction concepts from the 1960s [Simula].)

But there certainly have always been developers who stuck to older
languages with less expressiveness in abstraction; obviously still
today. We can speculate about the (strange, or also valid) reasons.
    I would also speculate that many/most developers can not only think
    in abstractions but know (and can program in) other languages that
    provide abstraction concepts. (Or so I hope.)

    Janis

    But today's
    machines are capable to handle more complex requirements and these requirements need a more flexible language so that you can handle
    that with less bugs than in a lanugage where you've to do every
    detail by yourself.


    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Kaz Kylheku@3:633/10 to All on Wed Oct 8 15:23:09 2025
    On 2025-10-08, pozz <pozzugno@gmail.com> wrote:
On 08/10/2025 08:35, Kaz Kylheku wrote:
    Jonas Lund of https://whizzter.woorlic.org/ mentioned this
    trick in a HackerNews comment:

    Given:

    struct S {
    // ...
    T A[];
    };

    Don't do this:

    malloc(offsetof(S, A) + n * sizeof (T));

    But rather this:

    malloc(offsetof(S, A[n]));

    It's easy to forget that the second argument of offsetof is a
    designator, not simply a member name.


    struct S {
    unsigned int size;
    unsigned char mode;
    unsigned char array[];
    }

    In a 32-bits integer machine, sizeof(struct S) is 8, because there are 3 bytes of padding after mode and array is considered empty.

    Now I want to store 9 bytes in array[]. I could use:

    malloc(sizeof(struct S) + 9 * sizeof(unsigned char))=malloc(17)

    Ah well, you can lead an ass to water ...

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca

    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Kaz Kylheku@3:633/10 to All on Wed Oct 8 15:29:16 2025
    On 2025-10-08, Bonita Montero <Bonita.Montero@gmail.com> wrote:
On 08.10.2025 at 08:35, Kaz Kylheku wrote:
    Jonas Lund of https://whizzter.woorlic.org/ mentioned this
    trick in a HackerNews comment:

    Given:

    struct S {
    // ...
    T A[];
    };

    Don't do this:

    malloc(offsetof(S, A) + n * sizeof (T));

    But rather this:

    malloc(offsetof(S, A[n]));

    It's easy to forget that the second argument of offsetof is a
    designator, not simply a member name.


    In a real language:

    That HackerNews comment I alluded to actually arose in the context
    of C++ code that was using struct hack.

The size is not known at compile time, so it cannot be a template
parameter.

    It is also important that the entire object, header plus data,
    is one memory allocation. It is less overhead. Also, given a
    pointer to the structure, the pointer to the flexible array
    data is just a displacement calculation: there is no additional
    pointer to load to get to the data.

    You should be glad you have the technique available in C++.

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca

    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Wed Oct 8 18:52:17 2025
    On Wed, 8 Oct 2025 09:09:40 +0200
    pozz <pozzugno@gmail.com> wrote:

    And another question. Suppose I need an array of struct S. All
    elements have 7-bytes array[] member. How to allocate this array and
    access each element?

    I think I can't use the first malloc (17), neither the second (14).
    Both aren't a multiple of alignment length.


That's a very good question.
    IMHO, the best practical answer is: don't!

    I mean, don't use structures with flexible array members except as
    individual objects or as *last* field of other structure.

If you feel that a data structure looking like an array of structures
with FAMs is the best fit for your requirements, or if it happens to be
the structure that matches an external layout that you have to deal
with, then handle the situation in an imperative rather than declarative
manner. Manually impose the headers and data arrays on an array of char.

If you feel that alignment can cause trouble, then don't hesitate to
memcpy into and out of a local variable. Don't fall into the trap of
premature optimization.
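
    A rough sketch of that imperative approach (the record layout, sizes,
    and helper names below are illustrative assumptions, not anything
    pozz specified):

        #include <stddef.h>
        #include <string.h>

        /* Packed external layout per record: 4-byte size, 1-byte mode,
           then 7 payload bytes, no padding, no per-record alignment. */
        enum { PAYLOAD = 7, REC_BYTES = 4 + 1 + PAYLOAD };

        /* Aligned working copy, used for convenient field access. */
        struct rec {
            unsigned int size;
            unsigned char mode;
            unsigned char array[PAYLOAD];
        };

        /* memcpy record i out of the packed buffer into an aligned local. */
        static void load_rec(const unsigned char *buf, size_t i, struct rec *out)
        {
            const unsigned char *p = buf + i * REC_BYTES;
            memcpy(&out->size, p, 4);
            out->mode = p[4];
            memcpy(out->array, p + 5, PAYLOAD);
        }

        /* ...and back again after modifying the local copy. */
        static void store_rec(unsigned char *buf, size_t i, const struct rec *in)
        {
            unsigned char *p = buf + i * REC_BYTES;
            memcpy(p, &in->size, 4);
            p[4] = in->mode;
            memcpy(p + 5, in->array, PAYLOAD);
        }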




    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Wed Oct 8 19:04:25 2025
    On Wed, 8 Oct 2025 15:23:09 -0000 (UTC)
    Kaz Kylheku <643-408-1753@kylheku.com> wrote:


    Ah well, you can lead an ass to water ...



IMHO, your sarcasm is unwarranted. Read the whole of pozz's post.
It seems to me that [in the first half of his post] pozz cares about
things that are not worth caring about (a few more or fewer bytes requested
from malloc, where in practice malloc rounds the requested size up at least
to a multiple of 8, but more likely of 16), but it is obvious that he
fully understood your earlier suggestion.



    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From BGB@3:633/10 to All on Wed Oct 8 11:33:45 2025
    On 10/8/2025 1:35 AM, Kaz Kylheku wrote:
    Jonas Lund of https://whizzter.woorlic.org/ mentioned this
    trick in a HackerNews comment:

    Given:

    struct S {
    // ...
    T A[];
    };

    Don't do this:

    malloc(offsetof(S, A) + n * sizeof (T));

    But rather this:

    malloc(offsetof(S, A[n]));

    It's easy to forget that the second argument of offsetof is a
    designator, not simply a member name.


This is assuming offsetof can deal with general expressions (vs. just
field names). IIRC, it is only required to work with field names (and
with plain structs).



    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From BGB@3:633/10 to All on Wed Oct 8 12:29:44 2025
    On 10/8/2025 8:59 AM, Janis Papanagnou wrote:
    On 08.10.2025 12:09, Bonita Montero wrote:
    [...]

    C is really dangerous in that sense because you've to flip every bit
    yourself. Better use abstactions you re-use a lot of times. In C there
    almost no complex data strructures at all; like a vector in C++ or a
    unordered map because it would be a large effort to specialize your-
    self that for every data type. Most C projects stick with simple data
    structures which are less efficient. The "generic" types in C which
    work work callbacks like with qsort() really suck since their perfor-
    mance is better but still not optimal.
    I think all developers who use C today are either forced to stick
    with C though their job or are persons which think mostly on the
    detail level and can't think in abstractions.

    This is programming like in the beginning of the 90s.

    I disagree in the historic valuation; abstractions were known and
    used (and asked for) already [long] before. (Even your beloved C++
    came already a decade earlier, and its designer was influenced by
    even older abstraction concepts from the 1960's [Simula].)

    But there certainly always have been developers who stuck to older
    languages with less expressiveness in abstraction; obviously still
    today. About the (strange or also valid) reasons we can speculate.
    I would also speculate that many/most developers can not only think
    in abstractions but know (and can program in) other languages that
    provide abstraction concepts. (Or so I hope.)


    While a higher level language might be nice sometimes...
    C++ is kind of a trash fire.

Personally, I would rather have something more like a C/C# hybrid.
Or, sort of like C# but without the need for a garbage collector.

Also a language where one can get a usable implementation with a "sane"
level of effort. For C++, for a compiler written by an individual, it is
only really feasible to get to a subset, like roughly early-90s C++
(with little/no chance of something like STL or Boost working).

The only real reason to deal with it is that some people around seem to
think C++ is a good idea, and have used it to write software.



    Granted, a few of my own language design attempts ended up with a
    different mess:
    T foo; //default / global lifespan for objects
    T! foo; //automatic
    T^ foo; //reference counted
    T(Z) foo; //zoned

    Where "T!" Local scope only, for certain patterns:
    Foo! foo(); //local constructor
    Foo foo = new! Foo(); //basically the same.
    Or, as a member, but may only be initialized in the constructor (else 'final').

    Unlike the others, "T^" would need to be preserved in all uses of the ref-counted object (converting to 'T' would lose the refcounting).

These could replace explicit new/delete, while still needing to be careful
about how objects are used.


    Full GC is undesirable because basically no one has managed to avoid the
    issue of timing and performance instabilities. Almost invariably,
    performance stutters at regular intervals, or the program periodically
    stalls (and at least over the past 30 years, it seems no one has
    entirely avoided this issue).

Refcounting is also bad for performance, but typically the timing issues
from refcounting are orders of magnitude smaller (and mostly cases where
    some large linked list or similar has its refcount drop to 0 and is
    freed, which in turn results in walking and freeing every member of the
    list).

    Zones can work OK, but depend some on being able to divide up the
    program into discrete sections, at which point all memory in a given
    category can be freed. Not all programs fit this pattern. Destroying a
    zone can also be expensive (as it goes and frees all heap objects
    associated with that zone).

    ...


    Still, not really anything that fully replaces C though.
One merit of C being that it is C, and so all existing C code works in it.

    Both C++ and other extended C dialects can have the advantage of being backwards compatible with existing C code (though for C++, "only
    sorta"); but the drawback of also being "crap piled onto C", inherently
    more messy and crufty than a "clean" language could have been.


    Where, the further one deviates from C, the more pain it is to move code
    back and forth between languages.


    But, a language resembling a C/C# hybrid could be close enough that the
    pain isn't too unreasonable; and could sorta still resemble C++ as well;
    but be a whole lot easier for writing the compiler.

    Like, one can throw out the whole mess that is dealing with Multiple-Inheritance and all of the tweaky and weird ways that class
    instances can be structured (direct inheritance vs virtual inheritance,
    friend classes, ... and all of the wacky effects these can have on
    object layout).


    Comparably, at least with Single-Inheritance and interfaces, class
    layouts tend to be append-only. Interfaces are still special, but easier
    to deal with (the interface can merely append its vtable pointer to the
    end of the existing object).

    so, say:
    class Foo:IBaz1 {
    public int x, y;
    }
    class Bar:Foo,IBaz2 {
    public int z;
    }

    Might look like, for Bar, say:
    Bar-VT
    x
    y
    IBaz1-VT
    z
    IBaz2-VT
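
    A hedged C sketch of how such an append-only layout might be lowered
    (the names and vtable contents are illustrative only, not any real ABI):

        /* Hypothetical lowering of:
             class Foo:IBaz1 { public int x, y; }
             class Bar:Foo,IBaz2 { public int z; }
           under single inheritance plus interfaces, where each interface
           simply appends its own vtable pointer to the existing object. */
        struct IBaz1_vt { void (*baz1)(void *self); };
        struct IBaz2_vt { void (*baz2)(void *self); };
        struct Bar_vt   { void (*dtor)(void *self); };

        struct Bar {
            const struct Bar_vt   *vt;        /* Bar-VT             */
            int x, y;                         /* inherited from Foo */
            const struct IBaz1_vt *ibaz1_vt;  /* IBaz1-VT           */
            int z;                            /* Bar's own field    */
            const struct IBaz2_vt *ibaz2_vt;  /* IBaz2-VT           */
        };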

    Also, it simplifies things if class instances are always by reference
    and never by value. So, structs retain the by value use-case, with
    structs being disallowed from having interfaces or virtual methods or supporting inheritance (which can be the exclusive domain of class objects).

    ...



    Janis

    But today's
    machines are capable to handle more complex requirements and these
    requirements need a more flexible language so that you can handle
    that with less bugs than in a lanugage where you've to do every
    detail by yourself.


    One can also use JavaScript or Python, but there are reasons why these
    are only used in some areas and not in others.


    My compiler can also (more or less) natively compile JavaScript (and I
    could maybe add a Python variant, if I really wanted).

I could in theory also revive a past dialect of my script language
    (which kinda resembled ActionScript 3.0) and compile this natively.

But, there are reasons one would not want to write, say, kernel-level or firmware-level code, in JavaScript or Python.


    Though, there are also cases where you would use a language like JS, and
    C would make little sense.

    Say, embedding blobs of C code into HTML would arguably have been a
    worse option.

    ...



    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Bonita Montero@3:633/10 to All on Wed Oct 8 19:36:54 2025
On 08.10.2025 at 15:59, Janis Papanagnou wrote:

    I disagree in the historic valuation; abstractions were known and
    used (and asked for) already [long] before. ...

Compared to C++ and other modern languages, they're almost not
possible in C.



    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Janis Papanagnou@3:633/10 to All on Wed Oct 8 19:51:03 2025
    On 08.10.2025 19:36, Bonita Montero wrote:
On 08.10.2025 at 15:59, Janis Papanagnou wrote:

    I disagree in the historic valuation; abstractions were known and
    used (and asked for) already [long] before. ...

    Compared to C++ they're almost not possible in C compared to mordern languages.

    No doubt. (My statement was just addressing the historic aspect of
    your post.)

    Janis


    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Janis Papanagnou@3:633/10 to All on Wed Oct 8 21:04:27 2025
    On 08.10.2025 19:29, BGB wrote:
    On 10/8/2025 8:59 AM, Janis Papanagnou wrote:

    While a higher level language might be nice sometimes...
    C++ is kind of a trash fire.

    I'm not familiar with the meaning of the term "trash fire".
    (If it's important to understand your post please explain.)

    I can say a lot concerning C++; both, pros and cons. (But
    not here and not now.)

    [...]

    Only real reason to deal with it is that some people around seem to
    think C++ is a good idea, and have used it to write software.

    Well, I certainly don't think it's a bad idea; far from bad.

    And back then, when I was seeking for a HLL with OO support,
    and C++ became available - and even widely accepted - I was
    quite grateful to be able to use it professionally.


    Granted, a few of my own language design attempts ended up with a
    different mess: [...]

A sensibly defined language isn't something easy to create
or obtain! - Personally I'd have appreciated it more if more
designers of "own languages" had oriented their designs around
    sensible existing and proven concepts. - There may be a
    "market" for all these "own languages", I don't know, but I
    also don't care much, given what I've seen or heard of yet.
    (This isn't meant to be offensive, just to be clear, only
    that I don't care much. As compiler writers don't care much
    what I think.)

    [ attempt for a discussion on features of "own language"
    snipped; not my business ]

    Full GC is undesirable because basically no one has managed to avoid the issue of timing and performance instabilities. Almost invariably,
    performance stutters at regular intervals, or the program periodically
    stalls (and at least over the past 30 years, it seems no one has
    entirely avoided this issue).

    Well, some languages have no GC at all. Others even support
    a couple of functions to control GC on various levels. It
    may be triggered manually (on items, classes, or ranges),
    or automatically (on demand, or depending on conditions; it
    may depend on memory, time, heuristics, statistical behavior).

    Pick your language depending on your projects demands.


    Refcounting is also bad for performance, but typically the timing issues
    from refcounting is orders of magnitude smaller (and mostly cases where
    some large linked list or similar has its refcount drop to 0 and is
    freed, which in turn results in walking and freeing every member of the list).

    Tailor your application and language choice on the projects'
    requirements.

    [...]

    Still, not really anything that fully replaces C though.
    One merit of C being, that it is C, and so all existing C code works in it.

    (On a larger time scale that seems not to match my observation.
    But okay, never mind.)


    Both C++ and other extended C dialects can have the advantage of being backwards compatible with existing C code (though for C++, "only
    sorta"); but the drawback of also being "crap piled onto C", inherently
    more messy and crufty than a "clean" language could have been.

    Are you talking here about the relation of "C" with C++?

    I certainly agree to what a "clean language" can be.

My opinion, though, is that the "C" base of C++ is part of
the problem. That doesn't make "C" appear "better" than C++ to me;
rather, the "C" base is part of C++'s problem. (Here
I'm not speaking about C++'s own problems, which probably arrived
around C++0x/C++11, IMO. - Mileages certainly vary.)

    [...]

    Like, one can throw out the whole mess that is dealing with Multiple-Inheritance

Well, when I started with C++ there wasn't multiple inheritance
available. Personally I think its omission would have been a mistake;
I missed it back in those days.

    I'm not sure what "mess" you have in mind. - Explicit qualification
    isn't a hindrance. Weakening the independence of classes in complex
    multi-level class-topologies is something under control of the
    program designer. - So it's fine to have it with all design options
    it opens.

    and all of the tweaky and weird ways that class
    instances can be structured (direct inheritance vs virtual inheritance,

    I'm not sure what you are thinking here. - It's a notation to avoid
    duplicate inclusions across "converging hierarchies".

    friend classes, ... and all of the wacky effects these can have on
    object layout).

    Well, back then I wasn't a "friend" of the 'friend' feature. But it
    also didn't stress me in any way. (The only aspect I was concerned
    about a bit here was the uncontrolled access to class details; yet
    it's under the programmer's control.)


    Comparably, at least with Single-Inheritance and interfaces, [...]

    This insight came later. (Was it Java that served as paragon? I only
    seem to recall that the GNU compiler suite supported C++ 'interfaces'
    at some time; was it the late 1990's ?)

    [...]

    Also, it simplifies things if class instances are always by reference
    and never by value. So, structs retain the by value use-case, with
    structs being disallowed from having interfaces or virtual methods or supporting inheritance (which can be the exclusive domain of class
    objects).

    Well, I can only say that it was nice to use objects ("instances")
    in an orthogonal way like other [primitive, built-in] object entities.

    (I knew the concept of "ref-only" [for class objects] from Simula.
    But this distinction was something I never considered a nice concept.)

    Janis

    [...]


    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Kaz Kylheku@3:633/10 to All on Wed Oct 8 20:05:23 2025
    On 2025-10-08, Michael S <already5chosen@yahoo.com> wrote:
    On Wed, 8 Oct 2025 15:23:09 -0000 (UTC)
    Kaz Kylheku <643-408-1753@kylheku.com> wrote:


    Ah well, you can lead an ass to water ...



    IMHO, your sarcasm is unwarranted. Read the whole post of pozz.
    I seems to me that [in the first half of his post] pozz cares about
    things that are not worth carrying (few more or few less byte requested
    from malloc, where in practice malloc rounds requested size up at least
    to a multiple of 8, but more likely of 16), but it is obvious that he
    fully understood your earlier suggestion.

    Fair enough.

My view is that applying sizeof to a structure with a flexible array
member is foolhardy, especially in a context where we want to allocate
space for that array and use it.

    That is to say the only situation in which it makes sense is if
    we need an instance of the structure which doesn't use the array.

    Even then, it won't hurt us to only allocate offsetof(S, A),
    since we don't touch A.

    Only if we are allocating an array of such structures (which won't be
    using the flexible arrays of any of them other than possibly the last
    one) does the size come into play. The padding is then necessary for the
    usual reason so that the members are correctly aligned in every element
    of the array of structs.

    The wording about the padding changed between the C99 draft, C99 and a
    later standard. IIRC, C99 appeared to give the requirement that the
    size of the struct had to be the offset of the array, and that was
    backpedaled out.

The important thing is that the padding bears no relation to the
flexible array. Under no circumstances do you want to pretend
that &S + 1 is a suitable pointer for the base of the array
rather than S.A; &S + 1, which includes the padding, might not be
suitably aligned for the element type of A.

It is not necessary that the offsetof(S, A[n]) calculation produce
a value that is at least as large as sizeof(S). If you need the array to
be [1] or [2] and that happens to fit into the padding, with padding
left over, that's okay.

    You do not want to use sizeof(S) as the base for calculating
    the necessary space.

    If you allocate sizeof(S) + N * sizeof(T) where T is the element type,
    but then correctly access P->A[0] through P->A[N-1], you have
over-allocated: your allocation includes unnecessary padding after
    A[N-1].

I would like to see a compiler option which diagnoses when sizeof is
applied to a structure which ends in a flexible array member, or to a
structure which has such a structure as its last member, directly or recursively.
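
    To make the numbers concrete, a small sketch using pozz's layout
    (the printed values assume a typical ABI with 4-byte int and
    sizeof(struct S) == 8):

        #include <stdio.h>
        #include <stddef.h>

        struct S {
            unsigned int size;
            unsigned char mode;
            unsigned char array[];
        };

        int main(void)
        {
            /* Typically: sizeof == 8, but offsetof(array[2]) == 7; it is
               fine for the offsetof-based size to be smaller than sizeof,
               because the trailing padding is unrelated to the array. */
            printf("sizeof(struct S)                = %zu\n", sizeof(struct S));
            printf("offsetof(struct S, array[2])    = %zu\n",
                   offsetof(struct S, array[2]));
            printf("sizeof-based size for 9 bytes   = %zu\n",
                   sizeof(struct S) + 9);
            printf("offsetof-based size for 9 bytes = %zu\n",
                   offsetof(struct S, array[9]));
            return 0;
        }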

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca

    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Chris M. Thomasson@3:633/10 to All on Wed Oct 8 13:35:05 2025
    On 10/7/2025 11:35 PM, Kaz Kylheku wrote:
    Jonas Lund of https://whizzter.woorlic.org/ mentioned this
    trick in a HackerNews comment:

    Given:

    struct S {
    // ...
    T A[];
    };

    Don't do this:

    malloc(offsetof(S, A) + n * sizeof (T));

    But rather this:

    malloc(offsetof(S, A[n]));

    It's easy to forget that the second argument of offsetof is a
    designator, not simply a member name.


For some goddamn reason it's raising memories of an older region
allocator I mocked up in C:

    Still on pastebin. funny:

    https://groups.google.com/g/comp.lang.c/c/H_p2Ki5JhYU/m/rlSzqJsxCQAJ

    https://pastebin.com/raw/f37a23918
    (no ads, raw text)

    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Chris M. Thomasson@3:633/10 to All on Wed Oct 8 13:36:13 2025
    On 10/8/2025 1:35 PM, Chris M. Thomasson wrote:
    On 10/7/2025 11:35 PM, Kaz Kylheku wrote:
    Jonas Lund of https://whizzter.woorlic.org/ mentioned this
    trick in a HackerNews comment:

    Given:

  struct S {
    // ...
    T A[];
  };

    Don't do this:

  malloc(offsetof(S, A) + n * sizeof (T));

    But rather this:

  malloc(offsetof(S, A[n]));

    It's easy to forget that the second argument of offsetof is a
    designator, not simply a member name.


    For some god damn reason its raising memories of an older region
    allocator I mocked up in C:

    Still on pastebin. funny:

    https://groups.google.com/g/comp.lang.c/c/H_p2Ki5JhYU/m/rlSzqJsxCQAJ

    https://pastebin.com/raw/f37a23918
    (no ads, raw text)

Strange. For mocking up alignment:

    #define RALLOC_ALIGN_OF(mp_type) \
    offsetof( \
    struct { \
    char pad_RALLOC_ALIGN_OF; \
    mp_type type_RALLOC_ALIGN_OF; \
    }, \
    type_RALLOC_ALIGN_OF \
    )

    ;^)
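
    For what it's worth, a small usage sketch comparing that pre-C11 trick
    with _Alignof (assuming a C11 compiler; note that, per the C23 wording
    quoted downthread, defining a new type inside offsetof is formally
    undefined, though common compilers accept it):

        #include <stdio.h>
        #include <stddef.h>

        #define RALLOC_ALIGN_OF(mp_type) \
            offsetof( \
                struct { \
                    char pad_RALLOC_ALIGN_OF; \
                    mp_type type_RALLOC_ALIGN_OF; \
                }, \
                type_RALLOC_ALIGN_OF \
            )

        int main(void)
        {
            /* Both lines should print the same value on typical targets. */
            printf("offsetof trick: %zu\n", (size_t)RALLOC_ALIGN_OF(double));
            printf("_Alignof:       %zu\n", (size_t)_Alignof(double));
            return 0;
        }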

    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Keith Thompson@3:633/10 to All on Wed Oct 8 14:57:46 2025
    BGB <cr88192@gmail.com> writes:
    On 10/8/2025 1:35 AM, Kaz Kylheku wrote:
    Jonas Lund of https://whizzter.woorlic.org/ mentioned this
    trick in a HackerNews comment:
    Given:
    struct S {
    // ...
    T A[];
    };
    Don't do this:
    malloc(offsetof(S, A) + n * sizeof (T));
    But rather this:
    malloc(offsetof(S, A[n]));
    It's easy to forget that the second argument of offsetof is a
    designator, not simply a member name.

    This is assuming offsetof and can deal with general expressions (vs
    just field names). IIRC, it is only required to work with field names
    (and with plain structs).

    I just read that part of the standard, and it's not clear whether
    the second argument to offsetof() has to be a member name or whether
    it can be something more elaborate.

    Quoting the N3096 draft of C23, 7.21:

    offsetof(type, member-designator)

    which expands to an integer constant expression that has type
    `size_t`, the value of which is the offset in bytes, to the
    subobject (designated by *member-designator*), from the beginning
    of any object of type *type*. The type and member designator
    shall be such that given

    static type t;

    then the expression &(t. *member-designator*) evaluates to
    an address constant. If the specified *type* defines a new
    type or if the specified member is a bit-field, the behavior
    is undefined.

    The requirements imply that the type can be a struct or a union.

    The term "member designator" is not used elsewhere in the standard.
If the term is to be taken literally, then it has to designate a
    *member*, not an element of a member. But the term "subobject",
    along with the address constant requirement, could imply that it
    could be an arbitrary sequence of members and array elements.

    But in addition to that, in Kaz's example, n is not a constant
    expression, so `&(t.member-designator)` is not an address constant
    and therefore `offsetof(S, A[n])` has undefined behavior.

    Every compiler I've tried handles this "correctly", and I tend to
    think that a compiler would have to go out of its way not to do so.
    I'd like to see a future standard make offsetof more flexible,
    with defined behavior for cases like this.

    The C99 Rationale shows these possible definitions:

    (size_t)&(((s_name*)0)->m_name)

    (size_t)(char*)&(((s_name*)0)->m_name)

    which, if they work, should handle Kaz's example correctly.
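
    As a concrete illustration, under the second Rationale-style definition
    the usage expands to an ordinary address computation from a null base,
    which is presumably why compilers fold it even with a run-time index
    (a sketch only; real offsetof implementations use a builtin, and a
    non-constant index remains formally undefined):

        #include <stddef.h>
        #include <stdlib.h>

        struct S { size_t len; double A[]; };

        /* C99-Rationale-style definition (not a conforming offsetof). */
        #define MY_OFFSETOF(s_name, m_name) \
            ((size_t)(char *)&(((s_name *)0)->m_name))

        struct S *make(size_t n)
        {
            /* Expands to (size_t)(char *)&(((struct S *)0)->A[n]), i.e.
               the member offset plus n * sizeof(double), from address 0. */
            return malloc(MY_OFFSETOF(struct S, A[n]));
        }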

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Kaz Kylheku@3:633/10 to All on Thu Oct 9 01:39:36 2025
    On 2025-10-08, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    But in addition to that, in Kaz's example, n is not a constant
    expression, so `&(t.member-designator)` is not an address constant
    and therefore `offsetof(S, A[n])` has undefined behavior.

Great; I'd like to hear reasons to avoid it so I don't look foolish
for having overlooked it for many years. :)

    Every compiler I've tried handles this "correctly", and I tend to

    I'm sure I've seen foo.bar expressions on the right of an offsetof,
    but those still yield constants.

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca

    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From BGB@3:633/10 to All on Wed Oct 8 22:25:06 2025
    On 10/8/2025 8:39 PM, Kaz Kylheku wrote:
    On 2025-10-08, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    But in addition to that, in Kaz's example, n is not a constant
    expression, so `&(t.member-designator)` is not an address constant
    and therefore `offsetof(S, A[n])` has undefined behavior.

    Great; I'd like to hear reasons to avoid it so I don't look foolish
    for having overlooked it for manytyears. :)

    Every compiler I've tried handles this "correctly", and I tend to

    I'm sure I've seen foo.bar expressions on the right of an offsetof,
    but those still yield constants.


    I think it is a case of, it is not required to work...

    But, if the typical implementation is something like, say:
    #define offsetof(T, M) ((long)(&(((T *)0)->M)))

    It is probably going to work without issue.


    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From BGB@3:633/10 to All on Wed Oct 8 22:49:37 2025
    On 10/8/2025 2:04 PM, Janis Papanagnou wrote:
    On 08.10.2025 19:29, BGB wrote:
    On 10/8/2025 8:59 AM, Janis Papanagnou wrote:

    While a higher level language might be nice sometimes...
    C++ is kind of a trash fire.

    I'm not familiar with the meaning of the term "trash fire".
    (If it's important to understand your post please explain.)

    I can say a lot concerning C++; both, pros and cons. (But
    not here and not now.)


    https://en.wiktionary.org/wiki/trash_fire


    [...]

    Only real reason to deal with it is that some people around seem to
    think C++ is a good idea, and have used it to write software.

    Well, I certainly don't think it's a bad idea; far from bad.

    And back then, when I was seeking for a HLL with OO support,
    and C++ became available - and even widely accepted - I was
    quite grateful to be able to use it professionally.


    Throughout much of my life, C++ has been around, but using it has often
    turned into a footgun. Early on the code had a bad habit of breaking
    from one compiler version to another, or the ability to compile C++ code
    in general would be broken (primarily with Cygwin and MinGW; where
    whether or not "g++" worked on a given install attempt, or with a given program, was very hit or miss).

    By the time I switched mostly to MSVC on Windows, I kept running into
    other issues that made C++ a less desirable option (such as being harder
    to process with automated tools, being harder to do any sort of FFI generation, etc).

    Or, later, that going much beyond a limited subset (like EC++ but with namespaces and similar re-added) is a problem (and that pretty much no
    real C++ code works with an EC++ style subset).



    In most cases, it left C as a more preferable option.
    C can be made to do the same stuff at similar performance, with often
    only minimal difference in expressive power.

    And, the main "powerful" tool of C++, templates, tending to do bad
    things to build times and result in excessive code bloat.


And, if one tries to avoid C++'s drawbacks, the result is mostly code
that still looks mostly like C.

    Though, similar was often a problem in my other language design
    attempts: The most efficient way to do things was often also the C way.



    The only real exception I have found to this rule basically being in
    relation to some features I have borrowed from languages like GLSL and Verilog. But, some of this stuff isn't so much making the language
    "higher level" as much as "being easier to map to ISA features and
    optimize".

    Say:
    vd[62:52]=vs[20:10];
    Being easier to optimize than, say:
    vd=(vd&(~(2047ULL<<52)))|(((vs>>10)&2047ULL)<<52);

    Though, Verilog itself, not so much... Works well in an ASIC or FPGA,
    not so much on a CPU.

    Though, as can be noted:
    Bit-ranges are required to be constant at compile time;
    When used with normal integer types, both bounds are required.


    OTOH, GLSL offers nice and efficient ways to deal with SIMD.
    Well, and also having some types for bit-preserving casts.
Or the ability to specify endianness and alignment for individual struct members. ...



    Granted, a few of my own language design attempts ended up with a
    different mess: [...]

    A sensibly defined language isn't something easily to create
    or obtain! - Personally I'd have appreciated it more if more
    designers of "own languages" have oriented their designs on
    sensible existing and proven concepts. - There may be a
    "market" for all these "own languages", I don't know, but I
    also don't care much, given what I've seen or heard of yet.
    (This isn't meant to be offensive, just to be clear, only
    that I don't care much. As compiler writers don't care much
    what I think.)


    Yeah.

    They have either tended to not amount to much, or converged towards more conventional languages.



    [ attempt for a discussion on features of "own language"
    snipped; not my business ]

    Full GC is undesirable because basically no one has managed to avoid the
    issue of timing and performance instabilities. Almost invariably,
    performance stutters at regular intervals, or the program periodically
    stalls (and at least over the past 30 years, it seems no one has
    entirely avoided this issue).

    Well, some languages have no GC at all. Others even support
    a couple of functions to control GC on various levels. It
    may be triggered manually (on items, classes, or ranges),
    or automatically (on demand, or depending on conditions; it
    may depend on memory, time, heuristics, statistical behavior).

    Pick your language depending on your projects demands.


    ...



    Refcounting is also bad for performance, but typically the timing issues
    from refcounting is orders of magnitude smaller (and mostly cases where
    some large linked list or similar has its refcount drop to 0 and is
    freed, which in turn results in walking and freeing every member of the
    list).

    Tailor your application and language choice on the projects'
    requirements.


    Some amount of my stuff recently has involved various niche stuff.
    Interfacing with hardware;
    Motor controls;
    Implementing things like an OpenGL back-end or similar;
    Being used for a Boot ROM and OS kernel;
    Sometimes neural nets.

    Few traditional languages other than C work well at a lot of this.


A commonly argued weakness of C is that it requires manual memory
    management. But, OTOH, you *really* don't want a GC in motor controls or
    an OS kernel or similar.

    Like, if the GC triggers, and an interrupt handler happens at a bad
    time, then you have a problem.

    Or, if you have a 1us timing tolerance for motor controls and this gets
    blown because the GC takes 75ms, etc...

    Though, in some contexts, ref-counting may still be too big of an ask.



    Typically, one may also need to deal with things like a hardware memory
    map or possibly dealing with manual translation between multiple address spaces via page-table walking, etc.

    So, language design should not preclude this stuff.


    Some features are useful in some contexts but not others:
    For example, "__int128" is very helpful when writing FPU-emulation code
    for Binary128 handling, but has a lot fewer use-cases much beyond this.

    Or, like:
    exp=vala[126:112]; //extract exponent
    fra=(_BitInt(128)) { 0x0001i16, vala[111:0]}; //extract fraction
    ...

    But, then it becomes a drawback if one needs #ifdef's to deal with more
    normal C compilers.
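
    For reference, a plain-C rendering of that extraction (a sketch
    assuming the GCC/Clang unsigned __int128 extension; the function
    name is made up):

        typedef unsigned __int128 u128;

        /* Split the raw Binary128 encoding into exponent and fraction,
           with the implicit leading 1 prepended to the 112-bit fraction. */
        static void unpack_binary128(u128 vala, unsigned *exp, u128 *fra)
        {
            *exp = (unsigned)((vala >> 112) & 0x7FFF);  /* vala[126:112]     */
            *fra = (vala & (((u128)1 << 112) - 1))      /* vala[111:0]       */
                 | ((u128)1 << 112);                    /* prepend 0x0001i16 */
        }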



    [...]

    Still, not really anything that fully replaces C though.
    One merit of C being, that it is C, and so all existing C code works in it.

    (On a larger time scale that seems not to match my observation.
    But okay, never mind.)


    Dunno about the far future, but technically both C and C++ have been
    around longer than I have.

    Like, basically a whole lifetime of mostly PC's running Windows, and
    software mostly being written in a mix of C and C++.

    Java and C# rose to try to dethrone them, but ultimately the status quo prevailed.

    Maybe C will be around indefinitely for all I know.


    Like, the passage of time still hasn't totally eliminated FORTRAN and
    COBOL. And, C is far more commonly used than either.

    Unless maybe something can come along that is a better C than C...




    Both C++ and other extended C dialects can have the advantage of being
    backwards compatible with existing C code (though for C++, "only
    sorta"); but the drawback of also being "crap piled onto C", inherently
    more messy and crufty than a "clean" language could have been.

    Are you talking here about the relation of "C" with C++?


    I was thinking some of languages that exist as C supersets:
    C++ (mostly, for C89);
    Objective-C;
    ...

    Vs languages that sorta resemble C but are different:
    C#
    Java
    ...

    Or, more distantly related:
    JavaScript, ActionScript, HaXE, ...



Or, my compiler has an extended C dialect which, depending on how it
is used, could be considered either C or a C superset.

It can be made to look a fair bit like one of my other languages if you use #define to remap a lot of the keywords into not having __ prefixes.

    There are hard limits on practicality:
    If normal C code doesn't compile, its usefulness would be greatly
    diminished.


    I certainly agree to what a "clean language" can be.

    My opinion on that is, though, that the "C" base of C++ is part of
    the problem. Which doesn't let it appear to me "C" to be "better"
    than C++, but that the "C" base is part of C++'s problem. (Here
    I'm not speaking about "C++"'s own problems that probably entered
    about with C++0x/C++11, IMO. - Mileages certainly vary.)


    Possibly.


    A new C-like language need not necessarily be strictly C based.

My thinking would likely be to keep a similar basic syntax,
though probably more syntactically similar to C#, while retaining more in
terms of implementation from C and C++.

    Would likely simplify or eliminate some infrequently used features in C.

    Possibly:
    Preprocessor, still exists, but its role is reduced.
    Its role can be partly replaced by compiler metadata.
    Trigraphs and digraphs: Gone;
    K&R style declarations, also gone;
    Parser should not depend on previous declarations;
    Non trivial types and declarator syntax: Eliminate;
    ...

    Possibly:
    Pointers and arrays can be specified on the type rather than declarator
    (so, more like C# here)
    ...

    But, as I see it, drastically changing the syntax (like in Go or Rust)
is undesirable. By contrast, say, C#-style syntax was more conservative.


Though, the harder problem here isn't necessarily that of designing or implementing it, but more in how to make its use preferable to just
staying with C.


    One merit is if code can be copy-pasted, but if one has to change all instances of:
    char *s0, *s1;
    To:
    char* s0, s1;

    Well, this is likely to get old, unless it still uses, or allows C style declaration syntax in this case.

Java and C# made 'char' 16-bit, but I now suspect this may have been
a mistake. It may be preferable instead to keep 'char' as 8 bits and make
UTF-8 the default string format. In the vast majority of cases, strings
hold primarily or entirely ASCII characters.

    Also, can probably have a string type:
    string str="Some String";
    But, then allow that string is freely cast to "char*", ...

    Well, and that the underlying representation of a string is still as a
    pointer into a string-table or similar.


    Also the design of the standard library should remain conservative and
    not add piles of needless wrappers or cruft.



    [...]

    Like, one can throw out the whole mess that is dealing with
    Multiple-Inheritance

    Well, when I started with C++ there wasn't multiple-inheritance
    available. Personally thinking its omission would be a mistake;
    I missed it back these day.

    I'm not sure what "mess" you have in mind. - Explicit qualification
    isn't a hindrance. Weakening the independence of classes in complex multi-level class-topologies is something under control of the
    program designer. - So it's fine to have it with all design options
    it opens.


    There is both implementation complexity of MI, and also some added
    complexity with using it. The complexity gets messy.


    The SI + Interfaces model can reduce both.
    Granted, these can grow their own warts (like default methods or
    similar), but arguably still not as bad as MI.



    and all of the tweaky and weird ways that class
    instances can be structured (direct inheritance vs virtual inheritance,

    I'm not sure what you are thinking here. - It's a notation to avoid
    duplicate inclusions across "converging hierarchies".


    I am more thinking from the perspective of implementing a compiler.

    The single inheritance model is far simpler to deal with:
    Object layout can be made append only.


    With MI, you have:
    Cases where append is used;
    Cases where concatenation is used between superclasses;
    Cases where a superclass instance is replaced with a pointer to
    elsewhere in the object if not the first instance within the object;
Cases where, due to the former, object layout differs between the parent
and child class, because the first virtually-inherited instance in the
parent class is no longer the first instance in the child class;
    ...

    Then, you also have to deal with all of this possibly being handled as a value-type object (with object copying needing to set up any internal
    pointers to the correct locations); ...

    Easier for the compiler to not need to deal with any of this.


    A compiler can sort of approximate it by treating MI cases as SI classes containing each base class as a hidden member (as an object reference), but: This doesn't match the existing ABIs;
    Trying to pass such an object by value will still require dealing with
    copying a tree of objects;
    Virtual inheritance still means one can't just call the copy logic for
    each parent class when copying a derived class;
    Code may exist which assumes the ability to use "sizeof()" and
    "memcpy()" on MI classes (without stuff exploding).
    ...




    friend classes, ... and all of the wacky effects these can have on
    object layout).

    Well, back then I wasn't a "friend" of the 'friend' feature. But it
    also didn't stress me in any way. (The only aspect I was concerned
    about a bit here was the uncontrolled access to class details; yet
    it's under the programmer's control.)


    Like MI, it isn't that bad for the programmer using the language.
    But, it is another thorn for anyone who wants to write their own compiler.



    Comparably, at least with Single-Inheritance and interfaces, [...]

    This insight came later. (Was it Java that served as paragon? I only
    seem to recall that the GNU compiler suite supported C++ 'interfaces'
    at some time; was it the late 1990's ?)


    I think Java popularized it, and C# followed along.
    Languages like ActionScript kept a Java-like model.


    [...]

    Also, it simplifies things if class instances are always by reference
    and never by value. So, structs retain the by value use-case, with
    structs being disallowed from having interfaces or virtual methods or
    supporting inheritance (which can be the exclusive domain of class
    objects).

    Well, I can only say that it was nice to use objects ("instances")
    in an orthogonal way like other [primitive, built-in] object entities.

    (I knew the concept of "ref-only" [for class objects] from Simula.
    But this distinction was something I never considered a nice concept.)


    It isn't as nice for the person using the language.
    But it is nice for the compiler.

    Eliminating both MI and by-value classes eliminates a whole lot of stuff
    that the compiler no longer needs to deal with.



    Janis

    [...]



    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Fri Oct 10 01:13:53 2025
    On 09/10/2025 04:49, BGB wrote:
    On 10/8/2025 2:04 PM, Janis Papanagnou wrote:
    On 08.10.2025 19:29, BGB wrote:

    Though, similar was often a problem in my other language design
    attempts: The most efficient way to do things was often also the C way.



    The only real exception I have found to this rule basically being in relation to some features I have borrowed from languages like GLSL and Verilog. But, some of this stuff isn't so much making the language
    "higher level" as much as "being easier to map to ISA features and optimize".

    Say:
  vd[62:52]=vs[20:10];
    Being easier to optimize than, say:
  vd=(vd&(~(2047ULL<<52)))|(((vs>>10)&2047ULL)<<52);

    Using special bit-features makes it easier to generate decent code for a simple compiler.

    But gcc for example has no trouble optimising that masking/shifting version.

    (It can do it in four x64 instructions, whereas I need nine working from vd.[62..52] := vs.[20..10]. It could be improved though; I don't need to extract the data to bits 10..0 first for example.)

    The main advantage is that it is a LOT easier to write, read and
    understand. The C would need macros to make it practical.


    Though, Verilog itself, not so much... Works well in an ASIC or FPGA,
    not so much on a CPU.

    Though, as can be noted:
  Bit-ranges are required to be constant at compile time;
  When used with normal integer types, both bounds are required.

    I can handle some variable elements, but it gets rapidly complicated. At
    some point it needs to use library functions to do the work.


    OTOH, GLSL offers nice and efficient ways to deal with SIMD.
    Well, and also having some types for bit-preserving casts.
    Or ability to specify endianess and alignment for individual struct
    members.
    ...



    Granted, a few of my own language design attempts ended up with a
different mess: [...]

A sensibly defined language isn't something easy to create
or obtain! - Personally I'd have appreciated it more if more
    designers of "own languages" have oriented their designs on
    sensible existing and proven concepts. - There may be a
    "market" for all these "own languages", I don't know, but I
    also don't care much, given what I've seen or heard of yet.
    (This isn't meant to be offensive, just to be clear, only
    that I don't care much. As compiler writers don't care much
    what I think.)


    Yeah.

    They have either tended to not amount to much, or converged towards more conventional languages.



    [ attempt for a discussion on features of "own language"
  snipped; not my business ]

    (There are those who can devise and use their own languages, and those
    who can't.)

    Some amount of my stuff recently has involved various niche stuff.
  Interfacing with hardware;
  Motor controls;
  Implementing things like an OpenGL back-end or similar;
  Being used for a Boot ROM and OS kernel;
  Sometimes neural nets.

    Some impressive stuff.


    Some features are useful in some contexts but not others:
    For example, "__int128" is very helpful when writing FPU-emulation code
    for Binary128 handling, but has a lot fewer use-cases much beyond this.

    Or, like:
  exp=vala[126:112];  //extract exponent
  fra=(_BitInt(128)) { 0x0001i16, vala[111:0]};  //extract fraction

    I had i128/u128 types at one point (quite a nice implementation too; it
    was only missing full 128-bit divide, I had only 128/64.)

    But the only place they got used was implementing 128-bit support in the self-hosted compiler and its library! So they were dropped.

    Unless maybe something can come along that is a better C than C...

    There are lots of new products, mostly too ambitious, too big and too
    complex. But C is already ensconced everywhere.

    Would likely simplify or eliminate some infrequently used features in C.

    Possibly:
  Preprocessor, still exists, but its role is reduced.
    Its role can be partly replaced by compiler metadata.
  Trigraphs and digraphs: Gone;
  K&R style declarations, also gone;
  Parser should not depend on previous declarations;
  Non trivial types and declarator syntax: Eliminate;
  ...

    Possibly:
    Pointers and arrays can be specified on the type rather than declarator
    (so, more like C# here)
    ...

    But, as I see it, drastically changing the syntax (like in Go or Rust)
    is undesirable. Contrast, say, C# style syntax was more conservative.

Nobody cares about C syntax. Learning all its ins and outs seems to be a
rite of passage.

The trouble is that C-style is so dominant, few people would know what a decent syntax looks like. Or, more likely, they associate a clean, well-designed syntax with toy or scripting languages, and can't take it seriously.

    But if it looks as hairy as C++ then it must be the business!

Though, the harder problem here isn't necessarily that of designing or implementing it, but more in how to make its use preferable to just
staying with C.


    One merit is if code can be copy-pasted, but if one has to change all instances of:
  char *s0, *s1;
To:
  char* s0, s1;

    Well, this is likely to get old, unless it still uses, or allows C style declaration syntax in this case.

    That one's been fixed (50 years late): you instead write:

    typeof(char*) s0, s1;

    But you will need an extension if it's not part of C23.



    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Kaz Kylheku@3:633/10 to All on Fri Oct 10 01:54:29 2025
    On 2025-10-10, bart <bc@freeuk.com> wrote:
    On 09/10/2025 04:49, BGB wrote:
    On 10/8/2025 2:04 PM, Janis Papanagnou wrote:
    On 08.10.2025 19:29, BGB wrote:

    Though, similar was often a problem in my other language design
    attempts: The most efficient way to do things was often also the C way.



    The only real exception I have found to this rule basically being in
    relation to some features I have borrowed from languages like GLSL and
    Verilog. But, some of this stuff isn't so much making the language
    "higher level" as much as "being easier to map to ISA features and
    optimize".

    Say:
  vd[62:52]=vs[20:10];
Being easier to optimize than, say:
  vd=(vd&(~(2047ULL<<52)))|(((vs>>10)&2047ULL)<<52);

    Using special bit-features makes it easier to generate decent code for a simple compiler.

    But gcc for example has no trouble optimising that masking/shifting version.

    (It can do it in four x64 instructions, whereas I need nine working from vd.[62..52] := vs.[20..10]. It could be improved though; I don't need to extract the data to bits 10..0 first for example.)

    The main advantage is that it is a LOT easier to write, read and
    understand. The C would need macros to make it practical.

    I'm skeptical that the C macro system is powerful enough to actually
    create an operand like

    bits(vd, 52, 62)

    such that this constitutes an lvalue that can be assigned,
    such that those range of bits will receive the value.

    The closest C mechanism to that is the bitfield, which has
    compartments decided at compile-time.

    What if 52 and 62 could be variables?

    Common Lisp has this feature, via the LDB macro (load byte).

    For instance:

    (ldb (byte 4 2) 100)
    9

    I.e. 100 is 1100100 binary, and we are taking 4 bits starting
    from bit 2 (zero based), taking the 1001.

    If we have a variable that holds 100 we can overwrite those 4
    bits with say 15:

    (let ((x 100))
    (setf (ldb (byte 4 2) x) 15)
    x)
    124

    I.e. 1111100 which is 31 x 4 = 124.

    What is (byte 4 2)? It is a function which constructs an implementation-defined object that represents a "byte specification".

    When ldb is behaving as an assignable place, it can access the syntax to get to the constants; the function isn't necessarily called at run-time to construct a value, though such code can be generated also, in a less optimized implementation. I.e. it can generate code which calls the function dpb (deposit byte) which takes a byte spec and value.

    Such code has to be used in the worst cases, like when instead of
    (byte ...) expressions you have something which evaluates to a byte
expression otherwise, like a variable:

(let ((x (byte-spec-out-of-thin-air))
      (y 73))
  (setf (ldb x y) 0))

    Let's see how CLISP deals with that one:

    [1]> (ext:expand-form '(setf (ldb x y) 0)) ;; output indented and annotated:
    (LET* ((#:BYTESPEC-3319 X) ;; evaluate x bytespec to temporary var
    (#:NEW-3320 0)) ;; evaluate 0 to temporary var
    ;; call DPB on these to put the bitfield into value taken from Y.
    (LET ((#:NEW-3318 (DPB #:NEW-3320 #:BYTESPEC-3319 Y)))
    ;; store edited value back into Y
    (SETQ Y #:NEW-3318)
    ;; return edited value
    #:NEW-3320)) ;

    If you didn't have byte and ldb in Common Lisp, you could write them
    yourself, and have them work with setf.

    You'd have to write a "setf expansion" for ldb, and you could have it
    spit out inline code to do shifting and masking, when the argument
    is a (byte ...) expression from which your expansion code can pull out the offset and width.
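
In C terms, the non-setf half of this is easy enough to sketch as plain
functions (illustrative only, not anyone's actual library code; size/pos
here play the role of (byte size position)):

  #include <stdint.h>

  /* ldb: extract `size` bits of x starting at bit `pos`. */
  static uint64_t ldb(unsigned size, unsigned pos, uint64_t x)
  {
      uint64_t mask = (size >= 64) ? ~0ULL : ((1ULL << size) - 1);
      return (x >> pos) & mask;
  }

  /* dpb: return x with that field replaced by the low bits of newval. */
  static uint64_t dpb(uint64_t newval, unsigned size, unsigned pos, uint64_t x)
  {
      uint64_t mask = (size >= 64) ? ~0ULL : ((1ULL << size) - 1);
      return (x & ~(mask << pos)) | ((newval & mask) << pos);
  }

  /* Matching the Lisp examples: ldb(4, 2, 100) == 9, dpb(15, 4, 2, 100) == 124. */

What C cannot express is the setf part: making ldb(...) itself an
assignable place. That is exactly the gap the Lisp setf expansion fills.
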


    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca

    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Chris M. Thomasson@3:633/10 to All on Thu Oct 9 19:43:51 2025
    On 10/9/2025 6:54 PM, Kaz Kylheku wrote:
    On 2025-10-10, bart <bc@freeuk.com> wrote:
    On 09/10/2025 04:49, BGB wrote:
    On 10/8/2025 2:04 PM, Janis Papanagnou wrote:
    On 08.10.2025 19:29, BGB wrote:

    [...]

    Fwiw:

    https://github.com/rofl0r/chaos-pp

    :^D

    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Keith Thompson@3:633/10 to All on Thu Oct 9 19:50:43 2025
    BGB <cr88192@gmail.com> writes:
    On 10/8/2025 8:39 PM, Kaz Kylheku wrote:
    On 2025-10-08, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    But in addition to that, in Kaz's example, n is not a constant
    expression, so `&(t.member-designator)` is not an address constant
    and therefore `offsetof(S, A[n])` has undefined behavior.
    Great; I'd like to hear reasons to avoid it so I don't look foolish
for having overlooked it for many years. :)

    Every compiler I've tried handles this "correctly", and I tend to
    I'm sure I've seen foo.bar expressions on the right of an offsetof,
    but those still yield constants.


    I think it is a case of, it is not required to work...

    But, if the typical implementation is something like, say:
    #define offsetof(T, M) ((long)(&(((T *)0)->M)))

    It is probably going to work without issue.

    The cast needs to be (size_t), not (long). With that change,
    the behavior is still undefined, but it's likely to work in most implementations, which is all that's required for code that's part
    of the implementation.

    Several implementations I've tried (gcc, clang, tcc) implement the
    offsetof macro via "__builtin_offsetof". Whatever compiler magic
    is used to implement "__builtin_offsetof" typically works correctly
    for Kaz's example (which is of course one of the possible results of
    undefined behavior). One other compiler I've tried has a #define
    similar to yours (and also gets the type wrong, but the author of
    that implementation is not interested in bug reports from me).

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From BGB@3:633/10 to All on Thu Oct 9 22:50:38 2025
    On 10/9/2025 7:13 PM, bart wrote:
    On 09/10/2025 04:49, BGB wrote:
    On 10/8/2025 2:04 PM, Janis Papanagnou wrote:
    On 08.10.2025 19:29, BGB wrote:

    Though, similar was often a problem in my other language design
    attempts: The most efficient way to do things was often also the C way.



    The only real exception I have found to this rule basically being in
    relation to some features I have borrowed from languages like GLSL and
    Verilog. But, some of this stuff isn't so much making the language
    "higher level" as much as "being easier to map to ISA features and
    optimize".

    Say:
  vd[62:52]=vs[20:10];
Being easier to optimize than, say:
  vd=(vd&(~(2047ULL<<52)))|(((vs>>10)&2047ULL)<<52);

    Using special bit-features makes it easier to generate decent code for a simple compiler.

    But gcc for example has no trouble optimising that masking/shifting
    version.


    BGBCC is not so clever...

    Granted, its code footprint is tiny vs GCC, and it can do a full rebuild
    in a few seconds (with effectively the entire "compiler toolchain" in a
    single binary).

    Like, GCC and LLVM are both very large (over 10M lines).

    Contrast, BGBCC is still in kLOC territory.


    Granted, still not that small, still pretty big if compared with
    something like Doom; but alas, I haven't really been able to fit a C
    compiler into a Doom-like code footprint (say, trying to keep a C
    compiler under 30k lines).

    I did start making an attempt at one point, but ended up dropping the
    effort after I have already exceeded a Doom-like code footprint, and it
    still wasn't very close to being done.

    Is a little easier with an interpreter, but if one wants sensible native
    code generation, doing it within a small footprint is difficult.


    So, as-is, I have a compiler that is roughly around the size of the
    Quake 2 engine...


    (It can do it in four x64 instructions, whereas I need nine working from vd.[62..52] := vs.[20..10]. It could be improved though; I don't need to extract the data to bits 10..0 first for example.)

    The main advantage is that it is a LOT easier to write, read and
    understand. The C would need macros to make it practical.


Shifts/Masks and macros are more traditional, but as noted, with my
    compiler explicit bit notation is easier to optimize, as well as read
    and write.


    Though, Verilog itself, not so much... Works well in an ASIC or FPGA,
    not so much on a CPU.

    Though, as can be noted:
  Bit-ranges are required to be constant at compile time;
  When used with normal integer types, both bounds are required.

    I can handle some variable elements, but it gets rapidly complicated. At some point it needs to use library functions to do the work.


    In my case, I only allowed constant ranges here.

    If runtime calls were used, they would eat any possible savings.
    But, the ability to generate efficient code here falls on its face if non-constant.



    OTOH, GLSL offers nice and efficient ways to deal with SIMD.
    Well, and also having some types for bit-preserving casts.
    Or ability to specify endianess and alignment for individual struct
    members.
    ...



    Granted, a few of my own language design attempts ended up with a
different mess: [...]

A sensibly defined language isn't something easy to create
or obtain! - Personally I'd have appreciated it more if more
    designers of "own languages" have oriented their designs on
    sensible existing and proven concepts. - There may be a
    "market" for all these "own languages", I don't know, but I
    also don't care much, given what I've seen or heard of yet.
    (This isn't meant to be offensive, just to be clear, only
    that I don't care much. As compiler writers don't care much
    what I think.)


    Yeah.

    They have either tended to not amount to much, or converged towards
    more conventional languages.



    [ attempt for a discussion on features of "own language"
  snipped; not my business ]

    (There are those who can devise and use their own languages, and those
    who can't.)

    Some amount of my stuff recently has involved various niche stuff.
  Interfacing with hardware;
  Motor controls;
  Implementing things like an OpenGL back-end or similar;
  Being used for a Boot ROM and OS kernel;
  Sometimes neural nets.

    Some impressive stuff.


    Yes, and mostly C domain.


    I have my own experimental ISA which was partly designed with the
    intention of using it for a mix of motor controls and computer vision.

Mostly I have just ended up using it for running ports of 90s-era games (and
otherwise mostly using ye olde ARM). Partly, one is still hard-pressed
    to get an FPGA to be performance competitive with something like a
    RasPi; and a RasPi is cheaper.


    But, for my own ISA, did end up writing the firmware and OS (including
    OpenGL) using my own compiler, although mostly in an extended C dialect.


    And, using my C compiler as:
    I could throw it together for my own ISA and design experiments;
    GCC or Clang would have been too much of an uphill battle;
    LCC would still have left me needing to do most of the relevant work myself; ...


    If comparing against GCC targeting RV64, my stuff gets better
    performance (though, BGBCC typically loses if both compilers are limited
    to plain RV64G; but my compiler with my ISA can beat GCC when GCC is
    limited to RV64G).


Though, with a few carefully selected extensions, RISC-V can be brought
    into a similar performance profile as my own ISAs:
    Indexed Load and Store;
    Load/Store Pair (load or store pairs of 2 registers at a time);
    Jumbo Prefixes (can expand immediate values from 12 to 33 bits).

    In some programs, this combination can get a 40-60% speedup over plain
    RV64G.

    A lot of other things are possible, but the gains are generally a lot
    smaller.


As I can note, the ISA list supported by my compiler looks kinda like:
  SuperH: SH4
  BJX1 (an extended variant of SH4).
    (Split into several variants)
    ( Not currently maintained )
  BJX2 (Current ISA family)
    XG1: Original form of the ISA
    XG2: Intermediate form
    XG3: Reworked to coexist better with RISC-V.
  RISC-V
    RV64G/RV64GC
    Various optional extensions.


While RISC-V exists and is popular, I have not fully jumped over to RV since,
in its basic form, its performance is a little weak (partly due to weak
areas and "foot guns"). Its performance can be improved, but there are
limits.

    The XG3 variant is promising, but is essentially XG2 and RV64G just sort
    of awkwardly hot-glued together. I don't expect it will see widespread adoption (even if it does get reasonably good performance). Like, an ISA design that is two unrelated ISAs glued together isn't necessarily the
    most elegant solution (even if XG3's encoding scheme was able to clean
    up some of the dog chew in XG2).

    Pretty much all of the normal RV64G (or RV64I) encodings are still
    usable in XG3, just with trade-offs (like, RV64 encodings have split X/F registers whereas XG3 encodings have a unified register space, ...).



    Some features are useful in some contexts but not others:
    For example, "__int128" is very helpful when writing FPU-emulation
    code for Binary128 handling, but has a lot fewer use-cases much beyond
    this.

    Or, like:
  exp=vala[126:112];  //extract exponent
  fra=(_BitInt(128)) { 0x0001i16, vala[111:0]};  //extract fraction

    I had i128/u128 types at one point (quite a nice implementation too; it
    was only missing full 128-bit divide, I had only 128/64.)

    But the only place they got used was implementing 128-bit support in the self-hosted compiler and its library! So they were dropped.


    I have them in my compiler, and to some extent in my ISA, but one of the
    main use cases I have for them is implementing Binary128 support code
    (or, "long double").

RISC-V doesn't have "__int128", and (when used) many of the operations end
up as runtime calls. It is more awkward in RISC-V as well, as there also
isn't really a good/efficient way to implement 128-bit math using the available 64-bit instructions.


    I ended up adding a modified form of the 'Q' instructions from RISC-V,
    where:
    "long double" is rarely used enough that the cost of handling it with emulation traps is acceptable;
    But, common enough that you don't want it to be too horribly slow;
    My ISA has access to some 128-bit integer operations;
    On the RISC-V side, the cost of doing everything with 64-bit integer
    math is slow enough to offset the cost of the emulation traps.

    Though not quite the same as the Q extension:
    Uses register pairs rather than 128 bit registers;
    But, my sentiment here is that for low-traffic uses (128-bit integer and floating-point stuff in general), then the use of pairs of 64-bit
    registers is preferable.


    In the underlying hardware, supporting 128-bit integer math was a
    cheaper option if compared with 128-bit FPU hardware; and int128 also
    sees a little more traffic.

As for ISA level support for int128:
  ADD/SUB: Native (ALU chaining)
    Some control bits are needed to merge Carry-Select across the units;
    Carry-select scales well, so not too much added latency cost.
  Shift: Native (shift units ganged)
    Can use two 64-bit funnel shifters in parallel with some trickery.
  AND/OR/XOR: 2x ALU in parallel; No special handling needed.
  ...

    This leaves MUL/DIV/etc, no viable way to handle them directly or
    efficiently in hardware. Currently, the fastest way to do 128-bit
    multiply in this case being to build it from 32-bit widening-multiply instructions.

    And, for Binary128 FPU multiply, one needs the high 128-bits of a
    128*128 -> 256 bit widening multiply.
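
For reference, the usual way to build the 64x64 -> 128 step out of 32-bit
widening multiplies looks roughly like this (a generic sketch, not BGBCC's
code; the same schoolbook scheme scales up to 128x128 -> 256):

  #include <stdint.h>

  typedef struct { uint64_t lo, hi; } u128;

  /* 64x64 -> 128 widening multiply built from four 32x32 -> 64 products. */
  static u128 umul_64x64_to_128(uint64_t a, uint64_t b)
  {
      uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
      uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

      uint64_t p0 = a_lo * b_lo;   /* contributes to bits   0..63  */
      uint64_t p1 = a_lo * b_hi;   /* contributes to bits  32..95  */
      uint64_t p2 = a_hi * b_lo;   /* contributes to bits  32..95  */
      uint64_t p3 = a_hi * b_hi;   /* contributes to bits  64..127 */

      /* accumulate the middle terms, tracking the carry out */
      uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;

      u128 r;
      r.lo = (mid << 32) | (uint32_t)p0;
      r.hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
      return r;
  }
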


    Unless maybe something can come along that is a better C than C...

    There are lots of new products, mostly too ambitious, too big and too complex. But C is already ensconced everywhere.


    Yeah.

    Note trying to beat C at its own game is not about having the biggest
    possible feature list.

    A feature which might help in one context might be actively detrimental
    in another.


    Maybe OOP is itself optional, so a C-like subset basically has close to
    a 1:1 feature-set with C (and remains well suited for procedural code).

    Likewise, shouldn't need to pay for things like RTTI or Exceptions when
    not used.


    Also, ideally "don't pay for what you don't use".
    Some compiler or language features exist, but if programmers don't use
    them, ideally they shouldn't need to pay for them.


    Like, say, a language could have optional dynamic types; but if writing
    a small Boot ROM, then this sort of thing is strictly off-limits.

    Like, say, if you have 32K of ROM space, can't justify wasting it on non-essential features.


    But, if a C alternative language is basically just C with a slightly
    different syntax, not necessarily all that compelling either.

    ...


    Would likely simplify or eliminate some infrequently used features in C.

    Possibly:
  Preprocessor, still exists, but its role is reduced.
    Its role can be partly replaced by compiler metadata.
  Trigraphs and digraphs: Gone;
  K&R style declarations, also gone;
  Parser should not depend on previous declarations;
  Non trivial types and declarator syntax: Eliminate;
  ...

    Possibly:
    Pointers and arrays can be specified on the type rather than
    declarator (so, more like C# here)
    ...

    But, as I see it, drastically changing the syntax (like in Go or Rust)
    is undesirable. Contrast, say, C# style syntax was more conservative.

Nobody cares about C syntax. Learning all its ins and outs seems to be a
rite of passage.

The trouble is that C-style is so dominant, few people would know what a decent syntax looks like. Or, more likely, they associate a clean, well-designed syntax with toy or scripting languages, and can't take it seriously.

    But if it looks as hairy as C++ then it must be the business!


    Comparably, C# style syntax is simplified if compared with C or C++, but retains many similar properties (and isn't quite as verbose or as
    awkward as Java).


    Though, the harder problem here isn't necessarily that of designing or
implementing it, but more in how to make its use preferable to just
    staying with C.


    One merit is if code can be copy-pasted, but if one has to change all
    instances of:
  char *s0, *s1;
To:
  char* s0, s1;

    Well, this is likely to get old, unless it still uses, or allows C
    style declaration syntax in this case.

    That one's been fixed (50 years late): you instead write:

  typeof(char*) s0, s1;

    But you will need an extension if it's not part of C23.


The thing is, if the language uses a C# style syntax, even if
"unsafe", directly copy-pasting from C would require some amount of editing.


    But, either way, to be useful as a C alternative:
    Would need to be able to do all of the same stuff as C in a roughly
    similar way;
    Should have similar or better performance;
    ...


    Though, one thing is:
    The language should not try to ram OOP down everyone's throat (a problem
    that existed in both C# and Java).

    Ideally, one could still do things using a C like procedural style.




    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Keith Thompson@3:633/10 to All on Thu Oct 9 20:59:26 2025
    bart <bc@freeuk.com> writes:
    On 09/10/2025 04:49, BGB wrote:
    [...]

    Nobody cares about C syntax.

    That is so manifestly untrue that I can't imagine what you actually
    meant.

    Many of us, myself included, don't particularly like some aspects of C
    syntax, but that's not the same as not caring about it.

Learning all its ins and outs seems to be a
rite of passage.

    Perhaps. It's also necessary if you want to work with the language.

    The trouble is that C-style is so dominant, few people would know what
a decent syntax looks like. Or, more likely, they associate a clean, well-designed syntax with toy or scripting languages, and can't take
    it seriously.

    But if it looks as hairy as C++ then it must be the business!

    C syntax has survived and been propagated to other languages because
    it's well known, not, I think, because anybody really likes it.

    [...]

    One merit is if code can be copy-pasted, but if one has to change
    all instances of:
  char *s0, *s1;
To:
  char* s0, s1;
    Well, this is likely to get old, unless it still uses, or allows C
    style declaration syntax in this case.

    That one's been fixed (50 years late): you instead write:

    typeof(char*) s0, s1;

    But you will need an extension if it's not part of C23.

    Yes, that will work in C23, but it would never occur to me to
    write that. I'd just write `char *s0, *s1;` or, far more likely,
    define s0 and s1 on separate lines. Using typeof that way triggers
    my WTF filter.

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Kaz Kylheku@3:633/10 to All on Fri Oct 10 04:20:41 2025
    On 2025-10-10, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    Several implementations I've tried (gcc, clang, tcc) implement the
    offsetof macro via "__builtin_offsetof". Whatever compiler magic
    is used to implement "__builtin_offsetof" typically works correctly
    for Kaz's example (which is of course one of the possible results of undefined behavior).

    It is a documented extension, because GCC parses it, and the reason is
    given in GCC's documented grammar for the feature:

    primary:
    "__builtin_offsetof" "(" typename "," offsetof_member_designator ")"

    offsetof_member_designator:
    identifier
    | offsetof_member_designator "." identifier
    | offsetof_member_designator "[" expr "]"

    Accompanied by the remark: "In either case, member may consist of a
    single identifier, or a sequence of member accesses and array
    references."

    This is a section under: https://gcc.gnu.org/onlinedocs/gcc/Syntax-Extensions.html

    Expr is not constrained to be constant.

    It is not explained what the semantics is of the extended designator,
    but it can be reasonably inferred.

    There are probably code bases out there which perpetrate that trick;
    GCC has been in many places and had lots of people hack on it.

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca

    --- PyGate Windows v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From BGB@3:633/10 to All on Fri Oct 10 01:27:34 2025
    On 10/9/2025 10:59 PM, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    On 09/10/2025 04:49, BGB wrote:
    [...]

    Nobody cares about C syntax.

    That is so manifestly untrue that I can't imagine what you actually
    meant.

    Many of us, myself included, don't particularly like some aspects of C syntax, but that's not the same as not caring about it.


    Yes.


Learning all its ins and outs seems to be a
    rite of passage.

    Perhaps. It's also necessary if you want to work with the language.

    The trouble is that C-style is so dominant, few people would know what
a decent syntax looks like. Or, more likely, they associate a clean,
    well-designed syntax with toy or scripting languages, and can't take
    it seriously.

    But if it looks as hairy as C++ then it must be the business!

    C syntax has survived and been propagated to other languages because
    it's well known, not, I think, because anybody really likes it.


    I would gladly pick C style syntax over PASCAL, FORTRAN, or COBOL.


    [...]

    One merit is if code can be copy-pasted, but if one has to change
    all instances of:
  char *s0, *s1;
To:
  char* s0, s1;
    Well, this is likely to get old, unless it still uses, or allows C
    style declaration syntax in this case.

    That one's been fixed (50 years late): you instead write:

    typeof(char*) s0, s1;

    But you will need an extension if it's not part of C23.

    Yes, that will work in C23, but it would never occur to me to
    write that. I'd just write `char *s0, *s1;` or, far more likely,
    define s0 and s1 on separate lines. Using typeof that way triggers
    my WTF filter.


    Agreed.



I think it can be contrasted with C# style syntax (with "unsafe") where
    one would write:
    char* s0, s1;
    Though, imagining a world where probably char is an unsigned byte, so
    that UTF-8 makes sense.



    So, say, if we had types (for a hypothetical language) like:
    sbyte, ubyte: 8-bits, signed/unsigned
    byte: 8-bits, probably unsigned
    char: 8-bits, probably unsigned (UTF-8)
    wchar: 16-bits, unsigned (UTF-16)
    short: 16-bits, signed
    ushort: 16-bits, unsigned
    int: 32-bits, signed
    uint: 32-bits, unsigned
    long: 64-bits, signed
    ulong: 64-bits, unsigned

    Maybe some more, but with explicit bit sizes.
    int8/int16/int32/int64/int128
    uint8/uint16/uint32/uint64/uint128
    But, no separate "unsigned".
    Core type name is always a single identifier, unlike C.

    And, special types:
    string: String, UTF-8
    wstring: String, UTF-16
    ...
    While a string is (effectively) a pointer to the first character, the
    type can be seen as distinct from that of 'char*'. Nominal
    representation would be as a series of codepoints terminated with a NUL
    byte.

    Default string type can be UTF-8 because, most of the time, UTF-16 would
    be a waste of memory (but, can be kept for "those that have that
    preference").


    And, floating point:
    float: Binary32
    double: Binary64
    half: Binary16
    Maybe explicit sized names:
    float16/float32/float64/float128

    Maybe vectors (optional):
    vec2f 2x float
    vec4f 4x float
    vec4h 4x half
    vec2d 2x double
    quatf quaternion (float)
    quath quaternion (half)

    Untyped bit-blobs:
    m8, m16, m32, m64, m128
    Say:
    ulong li;
    double f;
    li=(m64)f; //cast double->ulong as bit pattern
    f=(m64)li; //cast ulong->double as bit pattern
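
For comparison, standard C has to spell such a bit-preserving cast out
explicitly, e.g. via memcpy (a minimal sketch of the same idea, nothing
hypothetical about it):

  #include <stdint.h>
  #include <string.h>

  /* Standard-C equivalent of the hypothetical (m64) bit-cast above:
     copy the representation instead of converting the value. */
  static uint64_t bits_of_double(double f)
  {
      uint64_t li;
      memcpy(&li, &f, sizeof li);   /* reinterpret, no value conversion */
      return li;
  }

  static double double_of_bits(uint64_t li)
  {
      double f;
      memcpy(&f, &li, sizeof f);
      return f;
  }
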


    With basic decl syntax like:
    int i; //normal variable
    int[16] ia; //fixed array (inline)
    int* pi; //pointer
    int[] ia2; //flexible array (reference, hosted only)
    int*[16] pia; //fixed array of pointers

    This would apply to primitive types and structs.

    Object types having different suffixes (hosted only):
    Foo obj1; //basic object
    Foo! obj2; //automatic
    Foo^ obj3; //refcount
    Foo(Z) obj4; //zone
    Foo[] aobj1; //flexible array of Foo
    Foo[16] aobj2; //fixed array of Foo
    ...

    Where, struct and class implicitly declare type, so:
    struct Str1 {
    int x, y;
    }
    class Foo:Bar { //class Foo, extends Bar
    ...
    }
    interface IBaz { //interface
    ...
    }

    Where, likely flexible arrays, classes, and interfaces, only exist:
    If the implementation is hosted;
    If a non-hosted implementation provides a full memory manager.


    Would eliminate some more obscure patterns from C syntax, like:
    int (*arr[16])[16]; //like, WUT?...
    Just sorta say, stuff like this doesn't exist.
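
For reference, that declarator is standard C and reads "array of 16
pointers to arrays of 16 int"; split with a typedef it stops being a
puzzle (names here are made up for illustration):

  /* The obscure form: */
  int (*arr[16])[16];     /* arr: array[16] of pointer to array[16] of int */

  /* The same type, decomposed: */
  typedef int row16[16];  /* an array of 16 int          */
  row16 *arr2[16];        /* 16 pointers to such arrays  */
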

    Might consider doing function pointers like:
    delegate void FuncT();
    ...
    FuncT fun;

    Well, in contrast to some of my own languages which had used
    typedef void FuncT();
    For this, and 'delegate' as a scope modifier (where identifier lookups
    may look into the variable). But, nothing stops using delegate for both purposes (based on whether it is followed by a prototype or object
    variable declaration).



    Scoping could look like:
    global top-level (behaves like C top-level);
    namespaces.

Probably, things like function overloading are only allowed inside
    classes or namespaces. At the global toplevel, no overloading is allowed
    (so, it is like C++ 'extern "C"' by default).

So, say:
  namespace foo {
    using c.stdio;

    //why not?...
    int func(int x, int y)
    {
      printf("yeah...\n");  //via stdio
      return x+y;
    }
  }

    Where, say, we don't need a bunch of wrapper classes for file IO and
    printing, because C's "stdio.h" already does this.

    Here, there can be some magic behind the scenes, where the compiler can
    use namespaces and metadata rather than bulk textual inclusion.

    One option is that when the compiler compiles stuff, like the runtime
    library, it also generates "manifests" that the compiler can use to find declarations. Likely, the manifest files could exist like a sort of hierarchical database partly mapped onto a virtual filesystem, with
    search paths and similar (sorta like the class path in the JVM). Just,
    this metadata will only exist for compiling stuff.

    Why not textual inclusion?: Because it wastes a lot of CPU time and RAM
    to generate and parse a mountain of random stuff for every translation
    unit (the amount of text pulled in from headers is typically several
    orders of magnitude larger than the actual code for the translation unit itself). Also precompiled headers are a poor solution to this.



    May or may not do varargs differently.
    One possible interpretation could be, say:
  void vafunc(char* s, va...)
  {
    char* s1;
    long x, y;
    x=va[0];          //first variable argument
    y=va[1];          //second variable argument
    s1=(char*)va[2];  //third argument (string)
  }
    Where, the exact element type depends on the target, but probably 'long'
    or something (or 'int' on a 32-bit target).

    ABI rule then would be:
    If the ABI would otherwise have distinction based on argument types,
    vararg functions will receive all arguments as native machine words (int
    or long or similar). If they are passed in registers, they will be
    spilled to memory, and this memory will be returned as the argument list.

    Then, say, C style va_list could be faked as, say:
  long* vap=va;
  x=*vap++;
  y=*vap++;
  s1=(char*)(*vap++);

    ...
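
For contrast, a minimal sketch of the same function written against
standard <stdarg.h> (the va[i] indexing above being the hypothetical
part; this is what plain C requires today):

  #include <stdarg.h>

  void vafunc(char *s, ...)
  {
      va_list ap;
      long x, y;
      char *s1;

      va_start(ap, s);
      x  = va_arg(ap, long);     /* first variable argument  */
      y  = va_arg(ap, long);     /* second variable argument */
      s1 = va_arg(ap, char *);   /* third argument (string)  */
      va_end(ap);

      (void)s; (void)x; (void)y; (void)s1;
  }
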



    As for whether or not this offers enough to actually be worth bothering
    with, vs "just use C", dunno...

    I am seemingly one of the few people considering ideas for a
    hypothetical C replacement that actually likes C.

    ...



    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Fri Oct 10 12:06:10 2025
    On 10/10/2025 08:27, BGB wrote:
    On 10/9/2025 10:59 PM, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:


    One merit is if code can be copy-pasted, but if one has to change
    all instances of:
  char *s0, *s1;
To:
  char* s0, s1;
    Well, this is likely to get old, unless it still uses, or allows C
    style declaration syntax in this case.

    That one's been fixed (50 years late): you instead write:

  typeof(char*) s0, s1;

    But you will need an extension if it's not part of C23.

    Yes, that will work in C23, but it would never occur to me to
write that. I'd just write `char *s0, *s1;` or, far more likely,
define s0 and s1 on separate lines. Using typeof that way triggers
    my WTF filter.


    Agreed.



I think it can be contrasted with C# style syntax (with "unsafe") where
one would write:
  char* s0, s1;

    Does C# treat s1 as "char*" in this case? That sounds like an
    extraordinarily bad design decision - having a syntax that is very like
    the dominant C syntax yet subtly different.

    Issues like this have been "solved" for decades - in the sense that
    people who care about their code don't make mistakes from mixups of
    "char" and "char*" declarations. There are a dozen different ways to be
    sure it is not an issue. Simplest of all is a style rule - never
    declare identifiers of different types in the same declaration. I'd
    have preferred that to be a rule baked into the language from the start,
    but we all have things we dislike about the C syntax.


    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Fri Oct 10 11:25:39 2025
    On 10/10/2025 02:54, Kaz Kylheku wrote:
    On 2025-10-10, bart <bc@freeuk.com> wrote:
    On 09/10/2025 04:49, BGB wrote:
    On 10/8/2025 2:04 PM, Janis Papanagnou wrote:
    On 08.10.2025 19:29, BGB wrote:

    Though, similar was often a problem in my other language design
    attempts: The most efficient way to do things was often also the C way.



    The only real exception I have found to this rule basically being in
    relation to some features I have borrowed from languages like GLSL and
    Verilog. But, some of this stuff isn't so much making the language
    "higher level" as much as "being easier to map to ISA features and
    optimize".

    Say:
  vd[62:52]=vs[20:10];
Being easier to optimize than, say:
  vd=(vd&(~(2047ULL<<52)))|(((vs>>10)&2047ULL)<<52);

    Using special bit-features makes it easier to generate decent code for a
    simple compiler.

But gcc for example has no trouble optimising that masking/shifting version.
    (It can do it in four x64 instructions, whereas I need nine working from
    vd.[62..52] := vs.[20..10]. It could be improved though; I don't need to
    extract the data to bits 10..0 first for example.)

    The main advantage is that it is a LOT easier to write, read and
    understand. The C would need macros to make it practical.

    I'm skeptical that the C macro system is powerful enough to actually
    create an operand like

    bits(vd, 52, 62)

    such that this constitutes an lvalue that can be assigned,
    such that those range of bits will receive the value.

    The closest C mechanism to that is the bitfield, which has
    compartments decided at compile-time.

    What if 52 and 62 could be variables?


    You wouldn't write the macros like that. There'd be Get/Set versions,
    and probably separate versions for individual bits and bitfields.

    The Set ones wouldn't take a reference either, but lvalues only.

    So the example might become:

    vd = SETBF(vd, 52, 62, GETBF(vs, 10, 20));

You'd need to decide on which order indices are in. In my version, I can
do A.[0..7] or A.[7..0] if the values are constants (the compiler will reorder), or if they are implemented by a function (it will have code to swap if needed).

    But if the choice isn't there, I decided they would be ordered like conventional array indices and range syntax, so increasing LTR. Even
    though bit indices normally increase RTL.


    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Fri Oct 10 17:28:02 2025
    On Fri, 10 Oct 2025 12:06:10 +0200
    David Brown <david.brown@hesbynett.no> wrote:

    On 10/10/2025 08:27, BGB wrote:
    On 10/9/2025 10:59 PM, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:


    One merit is if code can be copy-pasted, but if one has to change
    all instances of:
  char *s0, *s1;
To:
  char* s0, s1;
    Well, this is likely to get old, unless it still uses, or allows
    C style declaration syntax in this case.

    That one's been fixed (50 years late): you instead write:

  typeof(char*) s0, s1;

    But you will need an extension if it's not part of C23.

    Yes, that will work in C23, but it would never occur to me to
write that. I'd just write `char *s0, *s1;` or, far more likely,
define s0 and s1 on separate lines. Using typeof that way triggers
    my WTF filter.


    Agreed.



I think it can be contrasted with C# style syntax (with "unsafe")
where one would write:
  char* s0, s1;

    Does C# treat s1 as "char*" in this case? That sounds like an extraordinarily bad design decision - having a syntax that is very
    like the dominant C syntax yet subtly different.


    Generally, I disagree with your rule. Not that it makes no sense at
    all, but sometimes a violation has more sense. For example, I strongly
    prefer for otherwise C-like languages to parse 011 literal as decimal
    11 rather than 9.

    In this particular case it's more subtle.
    What makes it a non-issue in practice is the fact that pointers is C# is
    very rarely used expert-level feature, especially so after 7 or 8
    years ago the language got slices (Span<T>).
    A person that decides to use C# pointers has to understand at least
    half a dozen of more arcane things than this one.
    Also it's very unlikely in case somebody made such mistake that his
    code will pass compilation. After all, we're talking about C# here, not something like Python.



    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Fri Oct 10 17:47:57 2025
    On 10/10/2025 16:28, Michael S wrote:
    On Fri, 10 Oct 2025 12:06:10 +0200
    David Brown <david.brown@hesbynett.no> wrote:

    On 10/10/2025 08:27, BGB wrote:
    On 10/9/2025 10:59 PM, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:


    One merit is if code can be copy-pasted, but if one has to change
    all instances of:
  char *s0, *s1;
To:
  char* s0, s1;
    Well, this is likely to get old, unless it still uses, or allows
    C style declaration syntax in this case.

    That one's been fixed (50 years late): you instead write:

  typeof(char*) s0, s1;

    But you will need an extension if it's not part of C23.

    Yes, that will work in C23, but it would never occur to me to
write that. I'd just write `char *s0, *s1;` or, far more likely,
define s0 and s1 on separate lines. Using typeof that way triggers
    my WTF filter.


    Agreed.



I think it can be contrasted with C# style syntax (with "unsafe")
where one would write:
  char* s0, s1;

    Does C# treat s1 as "char*" in this case? That sounds like an
    extraordinarily bad design decision - having a syntax that is very
    like the dominant C syntax yet subtly different.


    Generally, I disagree with your rule. Not that it makes no sense at
    all, but sometimes a violation has more sense. For example, I strongly
    prefer for otherwise C-like languages to parse 011 literal as decimal
    11 rather than 9.

    I did not intend to describe a general rule (and I agree with you in
    regard to octal).


    In this particular case it's more subtle.
What makes it a non-issue in practice is the fact that pointers in C# are a
very rarely used, expert-level feature, especially so after 7 or 8
    years ago the language got slices (Span<T>).
    A person that decides to use C# pointers has to understand at least
    half a dozen of more arcane things than this one.
    Also it's very unlikely in case somebody made such mistake that his
    code will pass compilation. After all, we're talking about C# here, not something like Python.


    Sure.

    It would seem to me, however, that it would have been better for the C# designers to pick a different syntax here rather than something that
    looks like C, but has subtle differences that are going to cause newbies confusion when they try to google for explanations for their problems.
    For example, if raw pointers are rarely used, then they should perhaps
    be accessible using a more verbose syntax than a punctuation mark -
    "ptr<char> s0, s1;" might work.

    However, I have no experience with C#, and don't know the reasons for
    its syntax choices.



    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From BGB@3:633/10 to All on Fri Oct 10 15:01:51 2025
    On 10/10/2025 5:06 AM, David Brown wrote:
    On 10/10/2025 08:27, BGB wrote:
    On 10/9/2025 10:59 PM, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:


    One merit is if code can be copy-pasted, but if one has to change
    all instances of:
  char *s0, *s1;
To:
  char* s0, s1;
    Well, this is likely to get old, unless it still uses, or allows C
    style declaration syntax in this case.

    That one's been fixed (50 years late): you instead write:

  typeof(char*) s0, s1;

    But you will need an extension if it's not part of C23.

    Yes, that will work in C23, but it would never occur to me to
write that. I'd just write `char *s0, *s1;` or, far more likely,
define s0 and s1 on separate lines. Using typeof that way triggers
    my WTF filter.


    Agreed.



I think it can be contrasted with C# style syntax (with "unsafe") where
one would write:
  char* s0, s1;

Does C# treat s1 as "char*" in this case? That sounds like an extraordinarily bad design decision - having a syntax that is very like
    the dominant C syntax yet subtly different.


    Yes. In this case, things like "*" or "[]" are associated with the type
    rather than the declarator.


    Issues like this have been "solved" for decades - in the sense that
    people who care about their code don't make mistakes from mixups of
    "char" and "char*" declarations.’ There are a dozen different ways to be sure it is not an issue.’ Simplest of all is a style rule - never
    declare identifiers of different types in the same declaration.’ I'd
    have preferred that to be a rule baked into the language from the start,
    but we all have things we dislike about the C syntax.


    The partial reason for some of the differences is that it allows a
    parser that does not need to know about previous typedefs and declarations.

    In C, you need to know prior typedefs to parse correctly.
    In C++, you also need to know previous template declarations, etc.
    With classes/structs/etc adding implicit typedefs.


    Avoiding the need to know typedefs in advance allows for a parser where
    there either is no preprocessor (Java), or the preprocessor still exists
    but its use is far more limited in scope and mostly unused (C#).

    Also typically, things like the type-system are handled later in the
    pipeline (in .NET, it was closer to what would be considered the linker
    stage in a traditional compiler).

    In effect, the front-end process works with relatively incomplete
    information, producing IL bytecode that specifies where to look for
    things and what to look for, but not the complete information. When an
    EXE or DLL is produced, it would resolve things for what exists within
    the current "assembly" (roughly equal to the EXE or DLL being compiled),
    with the ".NET runtime" needing to sort out the rest (typically AOT
    compiling the binaries into some internal form).

    However, I would assume not having a "runtime" here, meaning the linker
    would need to produce native code binaries.


    FWIW: BGBCC also generally uses a bytecode representation internally,
    and then produces native binaries as output. Though, the way the
    bytecode is structured and works differs from that of .NET bytecode.
    However, in both cases, they are using implicitly-typed stack machines
    at the IL stage. In BGBCC, for the backend stage, the bytecode IL is translated into "Three-Address-Code" roughly in "SSA Form" (though, not exactly the same as in LLVM; as it typically uses a combination of
    variable-ID and sequence-number, rather than creating a new "register"
    every time; also typically the "phi" operations are implicit).

    Can note that it does support ASM, but the handling is generally that
    any ASM code is preprocessed and then passed through the IL stage as
    string blobs (then assembled in the backend stage).

    Note, while it is possible to go more directly from a stack IL to native
    code (without going through 3AC/SSA), the generated code is garbage.

    Also, while it is possible to have a compiler that uses SSA as an
    on-disk IR format (like Clang), IMO this creates a lot of pain and
    exposes too much of the backend machinery (it would be very much a pain
    to use LLVM bitcode in anything other than LLVM).

    So, seemingly, a stack-oriented bytecode is the "least pain" option.
    Well... Unless they do it like WASM and find other creative ways to
    screw it up...



    Can note that in the case of a language like C#, the visibility of types
    and similar comes through the use of namespaces (which partly take on a similar role to headers in C or C++, or packages in Java or ActionScript).

    Where, say:
    namespace foo { using bar.baz; } //C# style
    namespace foo { using namespace bar::baz; } //C++ style
    package foo { import bar.baz; } //ActionScript style

    Though:
    import bar.baz.*; //Java

    But, Java differs here in that the code structure (and packages) are
    directly tied to organization of files in the filesystem (typically with
    one class per file).

    Contrast, .NET and C# used "assemblies" as the organizing principle; or, generally, everything that is being compiled together to become a given
    EXE or DLL is lumped into a single unit.


    Though, one option could be to organize code instead by namespace, with
    the toplevel tied to each location in the search path.

    Though, with such a compiler, rather than specifying a list of
    individual source files, one might specify directories and the compiler figures things out on its own (basically compiling everything in a given directory).

    One way to handle things like static libraries would be to build a blob
    of intermediate bytecode (and/or native-code COFF or ELF objects) along
    with a manifest database. The bytecode blob would contain all of the IR
    for the library (or machine-code if native), and the manifest would only contain declarations (preferably in a semi-compact form that is
    reasonably efficient to search). The manifests could then partly be used
    for knowing about declarations, and also for which objects or libraries
    to pull into the program being compiled (rather than giving them
    individually on the command-line).


    This approach would differ from .NET which embeds all of the metadata
    into the object-files and distributable binaries. But, here I am
    assuming that the final binary is a bare native EXE or DLL image here;
    meaning that any manifest data for a DLL would need to be handled more
    like an "import library".

In .NET, generally the EXE or DLL was merely being used as an external packaging scheme for holding the VM's IR image (typically with no actual machine code in the EXE/DLL; or for EXE's merely a stub to try to launch
    the .NET runtime).




    Can note that JVM uses JAR files that repurpose the ZIP format, but ZIP
    is a high-overhead format when used in this way. For my own uses, I
    typically used a variant of the WAD2 format, or a custom format I called
    WAD4, which also has lower overheads if compared with ZIP.

Can note:
  WAD2: Originated with Quake
    Typically has 16-byte names, no directories.
    I have a variant that adds directories,
      but names drop to 14 chars if non-root.
      Encodes directories by each entry encoding its parent directory.
    Names are typically stored without file extensions.
  WAD4: Custom, used in my project some for data packaging and small VFS
    Similar to WAD2, but with directory trees;
    Name size is expanded to 32 bytes;
    Also had some amount of Unix style file metadata.
      Mostly for when used as a VFS.
      Namely: UID/GID/Mode.

    Contrast to an actual filesystem, which typically has a more complex structure. But, a format like WAD2 or WAD4 can keep overhead low (and
    both are more versatile than the IWAD/PWAD format used in Doom; which
    only stored a flat list of 8-byte names).
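
For reference, the classic Quake WAD2 on-disk layout looks roughly like
this (reconstructed from memory of the Quake sources, so treat the field
names and order as approximate; the WAD4 variant described above would
presumably widen the name field and add the Unix-style metadata):

  /* Header at the start of the file. */
  typedef struct {
      char identification[4];   /* "WAD2" magic                  */
      int  numlumps;            /* number of directory entries   */
      int  infotableofs;        /* file offset of the directory  */
  } wadinfo_t;

  /* One directory entry per lump. */
  typedef struct {
      int  filepos;             /* offset of the lump's data     */
      int  disksize;            /* size on disk                  */
      int  size;                /* uncompressed size             */
      char type;                /* lump type tag                 */
      char compression;         /* 0 = none                      */
      char pad1, pad2;
      char name[16];            /* NUL-padded name, no extension */
  } lumpinfo_t;
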


    Decided to leave out a bunch of stuff (for conciseness) and note that
    the most likely option ATM could be to use a further modified form of
    WAD2 for manifests, where:
    If payload data is small enough, it may be stored inline in the
    directory entry;
    If the name is too large for the name field, it is stored externally
    (similar to payload data).

    Metadata would likely be structured in a way that is superficially
    similar to the "Windows System Registry". Within my compiler, I had
    already used a similar system, though in the past the metadata had been expressed in a textual form (based similar to the REG format used by
    Windows, usually when installing stuff; itself derived from the INI
    format). In this way, the WAD lumps being used more as key/values or
    data blobs rather than in a file-like way.


    Can note that my considered format is somewhat different from the "hive" format used by Windows, but that format would be needlessly bulky if
    used for compiler metadata (and more suited for HDD based access, not so
    much for blobs to be read into RAM buffers).


    All this stuff could likely differ from one compiler implementation to
    another though (similar to how compilers may differ as to which format
    they use for object files and static libraries, and mostly no one
    notices or cares).

    ...



    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From BGB@3:633/10 to All on Fri Oct 10 16:32:09 2025
    On 10/10/2025 10:47 AM, David Brown wrote:
    On 10/10/2025 16:28, Michael S wrote:
    On Fri, 10 Oct 2025 12:06:10 +0200
    David Brown <david.brown@hesbynett.no> wrote:

    On 10/10/2025 08:27, BGB wrote:
    On 10/9/2025 10:59 PM, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:

One merit is if code can be copy-pasted, but if one has to change all instances of:
  char *s0, *s1;
To:
  char* s0, s1;
Well, this is likely to get old, unless it still uses, or allows C style declaration syntax in this case.

    That one's been fixed (50 years late): you instead write:

  typeof(char*) s0, s1;

    But you will need an extension if it's not part of C23.

    Yes, that will work in C23, but it would never occur to me to
write that. I'd just write `char *s0, *s1;` or, far more likely,
define s0 and s1 on separate lines. Using typeof that way triggers
    my WTF filter.

    Agreed.



I think it can be contrasted with C# style syntax (with "unsafe")
where one would write:
  char* s0, s1;

Does C# treat s1 as "char*" in this case? That sounds like an
    extraordinarily bad design decision - having a syntax that is very
    like the dominant C syntax yet subtly different.


    Generally, I disagree with your rule. Not that it makes no sense at
    all, but sometimes a violation has more sense. For example, I strongly
    prefer for otherwise C-like languages to parse 011 literal as decimal
    11 rather than 9.

    I did not intend to describe a general rule (and I agree with you in
    regard to octal).


    Yeah, '0' by itself indicating octal is weird, so I might agree here.
    123 //decimal
    0123 //maybe reinterpret as decimal?
    0o123 //octal
    0x123 //hexadecimal
    0b101 //binary
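
    For comparison, this is how the leading zero behaves in current C
    (easy to check with any compiler):

      #include <stdio.h>

      int main(void)
      {
          printf("%d\n", 123);    /* 123 */
          printf("%d\n", 0123);   /* 83, the literal is read as octal */
          printf("%d\n", 0x123);  /* 291 */
          return 0;
      }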

    In BGBCC, had defined some additional handling for suffixes:
    iNN, where NN is an integer, specifies a number of bits.
    uNN or uiNN, specifies a number of bits, but unsigned.
    Types could specify non-power-of-2 widths (understood as _BitInt).


    Though, there was also the wonk that these literals could also allow X
    and Z in place of bits or hex digits, but this was more a side-effect of
    a fizzled effort to try to add Verilog support to BGBCC (which was also
    sort of where the bit notation came from).

    Though, generally, X and Z have no real purpose in C code (and
    may not exist in actual integer values), so would be little more than a
    curiosity (with some of this more as stuff intended to test out
    functionality being added for the sake of trying to support Verilog).

    But, as noted, in a few cases, the Verilog mechanisms can offer a
    performance advantage over traditional C constructs. In other cases, not
    so much....

    This was being worked on at one point as I sometimes face frustration at
    the almost non-existent debugging features in Verilator (you basically
    have to do a more awkward form of printf debugging; would kinda be nice sometimes if one could set breakpoints and inspect variables, ...).

    But, what passes for control-flow in Verilog doesn't really map over so
    well (basically need to update stuff based on "sensitivity graph" mostly driven by clock signals and similar).



    In this particular case it's more subtle.
    What makes it a non-issue in practice is the fact that pointers in C# are
    a very rarely used expert-level feature, especially since the language
    got slices (Span<T>) 7 or 8 years ago.
    A person that decides to use C# pointers has to understand at least
    half a dozen of more arcane things than this one.
    Also, if somebody makes such a mistake, it's very unlikely that his
    code will pass compilation. After all, we're talking about C# here, not
    something like Python.


    Sure.

    It would seem to me, however, that it would have been better for the C# designers to pick a different syntax here rather than something that
    looks like C, but has subtle differences that are going to cause newbies confusion when they try to google for explanations for their problems.
    For example, if raw pointers are rarely used, then they should perhaps
    be accessible using a more verbose syntax than a punctuation mark - "ptr<char> s0, s1;" might work.

    However, I have no experience with C#, and don't know the reasons for
    its syntax choices.


    Early on, it didn't have generics and so wouldn't use that syntax.

    Unlike C++, it doesn't have templates, so "ptr<char>" would not make so
    much sense, and then 'ptr' would be limited to being a class instance,
    which are always by-reference. Also early on, no operator overloading
    either (as with generics, this part was added later).


    Also the language discouraged pointers anyways, so you had to opt-in by
    using the 'unsafe' keyword before the compiler would allow them (and
    then, only for 'trusted' executables).

    Though, for a hybrid language, would likely drop the concept of trusted executables (or, allow it, maybe with the added constraint that object lifetimes be statically-provable; maybe asking too much though).

    The concept of trusted executables doesn't make as much sense with
    native-code compilation.





    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Kaz Kylheku@3:633/10 to All on Fri Oct 10 23:45:32 2025
    On 2025-10-10, David Brown <david.brown@hesbynett.no> wrote:
    On 10/10/2025 08:27, BGB wrote:
    On 10/9/2025 10:59 PM, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:


    One merit is if code can be copy-pasted, but if one has to change
    all instances of:
      char *s0, *s1;
    To:
      char* s0, s1;
    Well, this is likely to get old, unless it still uses, or allows C
    style declaration syntax in this case.

    That one's been fixed (50 years late): you instead write:

      typeof(char*) s0, s1;

    But you will need an extension if it's not part of C23.

    Yes, that will work in C23, but it would never occur to me to
    write that.  I'd just write `char *s0, *s1;` or, far more likely,
    define s0 and s1 on separate lines.  Using typeof that way triggers
    my WTF filter.


    Agreed.



    I think it can be contrasted with C# style syntax (with "unsafe") where
    one would write:
      char* s0, s1;

    Does C# treat s1 as "char*" in this case? That sounds like an extraordinarily bad design decision - having a syntax that is very like
    the dominant C syntax yet subtly different.

    The detailed properties of C syntax do not have that much mind share
    in the kind of development done in C# and its ilk.

    Only a minority of developers within a minority moving between
    C and C# would suffer from confusion.


    Issues like this have been "solved" for decades - in the sense that
    people who care about their code don't make mistakes from mixups of
    "char" and "char*" declarations. There are a dozen different ways to be sure it is not an issue. Simplest of all is a style rule - never
    declare identifiers of different types in the same declaration.
    I would have preferred that to be a rule baked into the language from the start,
    but we all have things we dislike about the C syntax.

    But the C syntax lets us factor out a common, complex part of the type
    between two declared entities into the stem, so that we then highlight
    what is different between them, without using a typedef alias. And the
    fact that they are in the declaration, shows they are related.

    struct foo {
    /* lotsa members */

    } x[42], *px = x, *px_end = x + 42;

    I /think/ that Java goes further in that you can factor out array
    derivation into the stem:

    int[3] a[4], b; // don't quote me on it

    But something occurs to me. typedef shouldn't be a storage class;
    that is silly. typedef should be something you can derive in a
    declarator. Then you could do this:

    struct {
    /* lotsa members */

    } typedef(foo), x[42], *px = x, *px_end = x + 42;

    How about a two-argument variant of typedef for use in
    any part of a declarator:

    int typedef(typedef(*, ptr_t)foo[42], array_t);

    This is just

    int *foo[42];

    in which the pointer to int is typedefed as ptr_t, the array of 42
    of those as array_t, and foo is declared as an object of that
    array type.

    Maybe :typedef syntax could be better.

    struct {
    /* lotsa members */

    } foo : typedef, x[42], *px = x, *px_end = x + 42;

    and

    int *:typedef(ptr_t) foo[42]:typedef(array_t);

    Same thing: when the pointer is derived via *, the :typedef(name) syntax
    takes a snapshot of that type and stores it into the scope under that
    typedef name. Same thing with :typedef(arg) after the array declarator.

    :)

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca

    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Kaz Kylheku@3:633/10 to All on Sat Oct 11 00:02:58 2025
    On 2025-10-10, BGB <cr88192@gmail.com> wrote:
    Yeah, '0' by itself indicating octal is weird, so I might agree here.
    123 //decimal
    0123 //maybe reinterpret as decimal?
    0o123 //octal
    0x123 //hexadecimal
    0b101 //binary

    Lisp people worked this out before the end of the 80s:

    777
    777
    00777
    777
    #o777
    511
    #x777
    1911
    #b1001
    9

    Leading zeros changing base is really a sneaky stupidity, and causes
    problems in shell scripts also, from time to time.

    $ printf "%d\n" 0777
    511
    $
    $ echo $(( 0777 + 0 ))
    511

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca

    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Janis Papanagnou@3:633/10 to All on Mon Oct 13 06:20:25 2025
    On 11.10.2025 02:02, Kaz Kylheku wrote:
    On 2025-10-10, BGB <cr88192@gmail.com> wrote:
    Yeah, '0' by itself indicating octal is weird, so I might agree here.
    123 //decimal
    0123 //maybe reinterpret as decimal?
    0o123 //octal
    0x123 //hexadecimal
    0b101 //binary

    Lisp people worked this out before the end of the 80s:

    777
    777
    00777
    777
    #o777
    511
    #x777
    1911
    #b1001
    9

    Leading zeros changing base is really a sneaky stupidity, and causes
    problems in shell scripts also, from time to time.

    $ printf "%d\n" 0777
    511
    $
    $ echo $(( 0777 + 0 ))
    511


    Yes, indeed. And behavior between shells and versions differs as well.

    $ dash -c 'printf "%d\n" 077'
    63
    $ ksh93u -c 'printf "%d\n" 077'
    63
    $ ksh93u+ -c 'printf "%d\n" 077'
    77

    Now, is that good that ksh has fixed that? (I have my doubts.)

    Also if you get actual values from variable expansion (as opposed to
    constant literals) you may get surprises.

    At some point I used [in Kornshell] often explicit "base specifiers"
    (which is not generally available in shells), base#number

    $ ksh93u -c 'printf "%d\n" 10#077'
    77

    especially sensible if used with variables containing arbitrary number
    formats where the leading zero is hidden in 'var'.

    $ ksh93u -c 'var=077; printf "%d\n" 10#$var $var'
    77
    63

    Leading-zero octals are yet another badly designed language feature,
    and not only in the shell language - but that ship has sailed...

    Janis


    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Janis Papanagnou@3:633/10 to All on Tue Oct 14 06:29:32 2025
    (Sorry for the delayed reply; your ~450 lines post was too long for
    me to consider a timely reply.)

    On 09.10.2025 05:49, BGB wrote:
    On 10/8/2025 2:04 PM, Janis Papanagnou wrote:
    On 08.10.2025 19:29, BGB wrote:
    On 10/8/2025 8:59 AM, Janis Papanagnou wrote:


    Throughout much of my life, C++ has been around, but using it has often turned into a footgun. Early on the code had a bad habit of breaking
    from one compiler version to another, or the ability to compile C++ code
    in general would be broken (primarily with Cygwin and MinGW; where
    whether or not "g++" worked on a given install attempt, or with a given program, was very hit or miss).

    I used it early on on various Unix platforms; all had some details
    different - like the way how templates worked in the development
    environment - but nothing was really an issue; as with current
    configuration settings this was covered and handled by the build
    system.

    It doesn't astonish me the least if you've faced specific problems
    on the Windows platforms.

    [...]

    In most cases, it left C as a more preferable option.
    C can be made to do the same stuff at similar performance, with often
    only minimal difference in expressive power.

    The problem is, IMO, rather that "C", in the first place, doesn't
    compare to C++ in its level of "expressive power".


    And, the main "powerful" tool of C++, templates,

    (IMO, the main powerful tool was primarily classes, polymorphisms,
    also [real] references.)

    tending to do bad
    things to build times and result in excessive code bloat.

    I recall that initially we had issues with code bloat, but I don't
    recall that it would have been a problem; we handled that (but,
    after that long time, don't ask me how).


    And, if one tries to avoid C++'s drawbacks, the result was mostly code
    that still looks mostly like C.

    (That sounds as if you haven't used OO designs, reference parameters, overloading, and so on, obviously.)


    Though, similar was often a problem in my other language design
    attempts: The most efficient way to do things was often also the C way.

    IME, *writing* software in "C" requires much more time than in C++;
    presuming you meant that with "most efficient way to do things".

    (Saving a few seconds in "C" compared to C++ programs can hardly be
    relevant, I'd say; unless you were not really familiar with C++ ?
    Or have special application areas, as I read below in the post.)

    [...]

    Some amount of my stuff recently has involved various niche stuff.
    Interfacing with hardware;
    Motor controls;
    Implementing things like an OpenGL back-end or similar;
    Being used for a Boot ROM and OS kernel;
    Sometimes neural nets.

    "Nice. - I've done Neural Net simulations with C++ back these days.)


    Few traditional languages other than C work well at a lot of this.


    A usual argued weakness of C is that it requires manual memory
    management. But, OTOH, you *really* don't want a GC in motor controls or
    an OS kernel or similar.

    Like, if the GC triggers, and an interrupt handler happens at a bad
    time, then you have a problem.

    Or, if you have a 1us timing tolerance for motor controls and this gets
    blown because the GC takes 75ms, etc...

    Sure, you should know where to use static memory, dynamic management
    organized yourself, or "I-don't-want-to-care" and use GC management,
    or a sensible deliberate mixture of that (if the language allows).

    (I've never used GC with C++; is that meanwhile possible?)

    [...]

    Maybe C will be around indefinitely for all I know.

    Not unlikely.


    Like, the passage of time still hasn't totally eliminated FORTRAN and
    COBOL.

    There's obviously some demand. *shrug* - I don't care much. - My last
    "contact" with FORTRAN was when one of my children was asked to handle
    some legacy library code; my suggestion was to get rid of that task.

    And, C is far more commonly used than either.

    Unless maybe something can come along that is a better C than C...

    There's so many languages meanwhile - frankly, there were already a
    lot back then, four decades ago! - so I don't think the proliferation
    will stop; I don't think that evolution is a good thing. It seems that
    often the inventors have their own agenda and the success of languages
    depends mainly on the marketing efforts and the number of fan-people
    that got triggered by newly invented buzzwords, and an own invented
    terminology [for already existing old concepts]!

    [...]

    I certainly agree to what a "clean language" can be.

    My opinion on that is, though, that the "C" base of C++ is part of
    the problem. Which doesn't let it appear to me "C" to be "better"
    than C++, but that the "C" base is part of C++'s problem. (Here
    I'm not speaking about "C++"'s own problems that probably entered
    about with C++0x/C++11, IMO. - Mileages certainly vary.)


    Possibly.


    A new C-like language need not necessarily be strictly C based.

    (There's a couple things I like in "C". But if I'd have to invent a
    language it would certainly not be "C-like". I'd took a higher-level
    [better designed] language as paragon and support the "C" features I
    like, if not already present in that language.)


    My thinking would be likely keeping a similar basic syntax though,
    though likely more syntactically similar to C#,

    (But the syntax is one of C's and descendants' problem, IMO. - Part
    of what was described in existing "C-like" languages is either the
    less-desired elements or deviations, but the latter will probably
    just add to confusion if details are subtle. It's already bad enough
    with subtle differences between different "C" standards it seems.)

    but retaining more in
    terms of implementation with C and C++.

    (But weren't exactly these languages already [partly] invented with
    such an agenda?)


    Would likely simplify or eliminate some infrequently used features in C.

    Possibly:
    Preprocessor, still exists, but its role is reduced.
    Its role can be partly replaced by compiler metadata.
    Trigraphs and digraphs: Gone;
    K&R style declarations, also gone;
    Parser should not depend on previous declarations;
    Non trivial types and declarator syntax: Eliminate;
    ...

    Sounds all reasonable to me.


    Possibly:
    Pointers and arrays can be specified on the type rather than declarator
    (so, more like C# here)

    (Yeah, but mind the comments on effects of "subtle differences".)

    [...]

    Though, the harder problem here isn't necessarily that of designing or implementing it, but more in how to make its use preferable to just
    staying with C.

    Well, as formulated, that's an individual thing. Meanwhile I have the
    freedom to use what I like in my recreational activities, but if we
    consider professional projects there's conditions and requirements to
    take into account.


    One merit is if code can be copy-pasted, but if one has to change all instances of:
    char *s0, *s1;
    To:
    char* s0, s1;

    Such changes would be annoying. (And I say that with a strong aversion
    of C's declaration syntax.) - For me, "C" is not a good base; neither
    to keep its bad syntax nor to have to change it alike in subtle ways.

    My style is anyway another; [mostly] separate declarations, and those initialized, as in

    char * s0 = some_alloc (...);
    char * s1 = 0;

    More important is that such declarations may appear anywhere not just
    at the beginning of a block. (I'm still traumatized by K&R, I suppose.)

    [...]

    Java and C# had made 'char' 16-bit, but I now suspect this may have been
    a mistake. It may be preferable instead keep 'char' as 8 bits and make
    UTF-8 the default string format. In the vast majority of cases, strings
    hold primarily or entirely ASCII characters.

    I think we should be careful here! An Unicode "character" may require
    even 32 bit, but UTF-8 is just an "encoding" (in units of an octet).
    If we want a sensible type system defined we should be aware of that difference. The question is; what shall be expressed by a 'char' type;
    the semantic entity or the transfer syntax. (This question is similar
    to the Unix file system, also based on octets; that made it possible
    to represent any international multi-octet characters. There's some
    layer necessary to get from the "transfer-syntax" (the encoding) to
    the representation.) - What will, say, a "C" user expect from 'char';
    just move it around or represent it on some output (or input) medium.


    Also, can probably have a string type:
    string str="Some String";
    But, then allow that string is freely cast to "char*", ...

    (Wasn't that so in C++? - And in addition there's the corresponding
    template classes, IIRC. - But I don't recall all the gory details.)

    Well, and that the underlying representation of a string is still as a pointer into a string-table or similar.

    Also the design of the standard library should remain conservative and
    not add piles of needless wrappers or cruft.

    Not sure what you have in mind here.

    Personally, despite some resentment on some of the complex syntax
    and constructs necessary, I liked the C++ STL; its orthogonality
    and concepts in principle. (And especially if compared to some
    other languages' ad hoc "tool-chest" libraries I stumbled across.)

    [...]

    Like, one can throw out the whole mess that is dealing with
    Multiple-Inheritance

    Well, when I started with C++ there wasn't multiple-inheritance
    available. Personally thinking its omission would be a mistake;
    I missed it back these day.

    I'm not sure what "mess" you have in mind. - Explicit qualification
    isn't a hindrance. Weakening the independence of classes in complex
    multi-level class-topologies is something under control of the
    program designer. - So it's fine to have it with all design options
    it opens.

    There is both implementation complexity of MI, and also some added
    complexity with using it. The complexity gets messy.

    (Okay, if that's what you took from it, I of course accept it.
    But I'd have more expected that you might have dislike of some
    STL parts than [multiple] inheritance.)



    The SI + Interfaces model can reduce both.

    I've used classes with only "pure virtual" functions to achieve
    the interface abstraction; since I could easily design what I
    needed with standard features and practically no overhead I thus
    wasn't missing the 'interface' feature.

    (But of course I can see the implementation argument you make.)

    Granted, these can grow their own warts (like default methods or
    similar), but arguably still not as bad as MI.

    (Well, I appreciated it to have that feature available in C++,
    even though my first OO language, Simula, didn't support it, so
    I was used to not having it when I got into C++ and liked it.)


    I am more thinking from the perspective of implementing a compiler.

    Hah! Yeah. - Recently in another NG someone disliked a feature
    because he had suffered from troubles implementing it. (It was
    not MI but formatted I/O in that case.) - I'm not implementing
    complex languages, so I guess I can feel lucky if someone else
    did the language implementation job and I can just use it.

    [ implementation issues snipped and gracefully skipped ]

    [...]
    Virtual inheritance still means one can't just call the copy logic for
    each parent class when copying a derived class;

    (I don't think I agree here. - Or are you still talking of the
    implementers' challenges? - But never mind. Programming in C++
    I could model everything I liked. That was really nice.)

    Janis

    [...]



    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From BGB@3:633/10 to All on Tue Oct 14 20:13:06 2025
    On 10/13/2025 11:29 PM, Janis Papanagnou wrote:
    (Sorry for the delayed reply; your ~450 lines post was too long for
    me to consider a timely reply.)

    On 09.10.2025 05:49, BGB wrote:
    On 10/8/2025 2:04 PM, Janis Papanagnou wrote:
    On 08.10.2025 19:29, BGB wrote:
    On 10/8/2025 8:59 AM, Janis Papanagnou wrote:


    Throughout much of my life, C++ has been around, but using it has often
    turned into a footgun. Early on the code had a bad habit of breaking
    from one compiler version to another, or the ability to compile C++ code
    in general would be broken (primarily with Cygwin and MinGW; where
    whether or not "g++" worked on a given install attempt, or with a given
    program, was very hit or miss).

    I used it early on on various Unix platforms; all had some details
    different - like the way how templates worked in the development
    environment - but nothing was really an issue; as with current
    configuration settings this was covered and handled by the build
    system.

    It doesn't astonish me the least if you've faced specific problems
    on the Windows platforms.


    It was pretty variable, but the usual thing was that trying to build any
    kind of C++ code (even a trivial "Hello World") would, on some installs
    of these compilers, simply die in a storm of error messages.

    Well, and for a given Cygwin install attempt, whether or not "g++" would
    work, etc, was a bit like playing roulette.


    After switching to MSVC, things were a little more stable here.
    But, by then there were other issues.


    [...]

    In most cases, it left C as a more preferable option.
    C can be made to do the same stuff at similar performance, with often
    only minimal difference in expressive power.

    The problem is, IMO, rather that "C", in the first place, doesn't
    compare to C++ in its level of "expressive power".


    ?...

    I have yet to find much that can be expressed in C++ but is not also expressible in C.


    The main things that are fundamentally different, are things like
    Exceptions and RTTI, but even in C++, these don't come free.

    Though, if exceptions are implemented using an approach similar to VEH
    in the Windows X64 ABI, it is at least modest.



    And, the main "powerful" tool of C++, templates,

    (IMO, the main powerful tool was primarily classes, polymorphisms,
    also [real] references.)


    These can be done in C via manually written vtables, and passing the
    address of a variable.


    tending to do bad
    things to build times and result in excessive code bloat.

    I recall that initially we had issues with code bloat, but I don't
    recall that it would have been a problem; we handled that (but,
    after that long time, don't ask me how).


    And, if one tries to avoid C++'s drawbacks, the result was mostly code
    that still looks mostly like C.

    (That sounds as if you haven't used OO designs, reference parameters, overloading, and so on, obviously.)


    We can do OO, just using a different approach, say:
    typedef struct FooObj_s FooObj;
    typedef struct FooObj_vt_s FooObj_vt;
    struct FooObj_vt_s {
    void *resv1;
    void *resv2;
    void *resv3;
    void *resv4;
    int (*Method1)(FooObj *self, int x, int y);
    int (*Method2)(FooObj *self, int x, int y, int z);
    ...
    };
    struct FooObj_s {
    FooObj_vt *vt;
    int w;
    ...
    };

    And, references as:
    int someFunction(int *rvar);
    ...
    someFunction(&somevar);

    It all works, and doesn't require significantly more LOC than it would
    have in C++.
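
    For example, filling in a method and a shared vtable instance, and
    then calling through it, might look like this (a minimal sketch
    assuming the declarations above; the init function and method body
    are made up for illustration):

      #include <stdlib.h>

      static int FooObj_Method1(FooObj *self, int x, int y)
          { return self->w + x + y; }

      /* one shared vtable instance per "class" */
      static FooObj_vt fooObj_vtab = { .Method1 = FooObj_Method1 };

      FooObj *FooObj_New(int w)
      {
          FooObj *obj = calloc(1, sizeof(FooObj));
          if (obj) { obj->vt = &fooObj_vtab; obj->w = w; }
          return obj;
      }

      /* call site; roughly what "obj->Method1(3, 4)" would be in C++ */
      int use_foo(FooObj *obj) { return obj->vt->Method1(obj, 3, 4); }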



    Though, similar was often a problem in my other language design
    attempts: The most efficient way to do things was often also the C way.

    IME, *writing* software in "C" requires much more time than in C++;
    presuming you meant that with "most efficient way to do things".

    (Saving a few seconds in "C" compared to C++ programs can hardly be
    relevant, I'd say; unless you were not really familiar with C++ ?
    Or have special application areas, as I read below in the post.)


    The main limiting factor at present is that writing a non-trivial C++ compiler is a much harder task.

    I could write C++ code, but then it isn't really portable outside
    running on my PC or similar.


    Though, I have a mostly usable C compiler at least.
    At least, usable for porting single programs.
    Trying to port something like the Linux userland, not so much.
    Too much stuff here is written to assume GCC.

    Some simple programs worked with "./configure" scripts and getting it to
    mimic GCC enough that configure will try to use it as a cross compiler,
    but then programs invariably break when trying to use various GCC'isms
    or trying to rely on glibc specific stuff or other Linux specific
    headers or so on.

    So, yeah, nowhere near up to the level of trying to deal with trying to
    port "bash" and "coreutils" and similar.

    But, was able to experimentally port things like "Quake 3 Arena" and
    similar, though Q3A is a little impractical on a 50MHz CPU; but Doom
    runs well.


    Granted, a new language would not really address any of the "make
    existing software work" issues.


    [...]

    Some amount of my stuff recently has involved various niche stuff.
    Interfacing with hardware;
    Motor controls;
    Implementing things like an OpenGL back-end or similar;
    Being used for a Boot ROM and OS kernel;
    Sometimes neural nets.

    "Nice. - I've done Neural Net simulations with C++ back these days.)


    I have experimented with some, but in this case mostly using a lot of SIMD.


    I had noted that in some cases, like primarily SIMD heavy NN code, my
    50MHz FPGA soft-processor could compete surprisingly well with an early
    2000s laptop.


    But, then again, also noted by benchmarking said laptop:
    memcpy: ~ 450 MB/sec;
    x87 multiply-accumulate: ~ 60 MFLOP.
    CPU speed: 1400 MHz, 32-bit x86.
    Has MMX and similar, but was not using MMX.
    Had noted the process is mostly bandwidth limited.


    The SIMD unit on my soft-processor has a theoretical hard-limit of 200
    MFLOP, but if using compact formats (mostly FP8 for storage, FP16
    internally) and careful pipelining, can approach similar performance to
    the laptop at this task.

    I had experimented some with more compact encodings for weights, for
    example:
    FP8U A/B (E4.M4)
    3-bit per value: S,FF (S=Sign)
    A<=B: Interpolated: A, B, (5/8)A+(3/8)B, (5/8)B+(3/8)A
    A> B: Similar, but '111' encodes 0, ...
    With interpolation as bytes, results unpacked into vectors of 4x
    Binary16 (with 4 weight vectors in 64 bits).

    Could also be used as:
    Monochrome HDR format with 16 texels per block;
    A color HDR format with 4 texels per block.
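
    As a rough scalar sketch, the 3-bit decode described above might look
    something like this (with A and B already converted from FP8 to
    float; the bit packing, the FP16 output step, and any behavior of the
    A>B variant beyond the '111'==0 case are assumptions or omitted):

      /* code: 3-bit selector, bit 2 = sign, bits 1..0 = interpolation index */
      static float decode_w3(unsigned code, float a, float b)
      {
          unsigned sign = (code >> 2) & 1;
          unsigned idx  =  code       & 3;
          float v;

          if (a > b && code == 7)     /* A>B variant: '111' encodes exact 0 */
              return 0.0f;

          switch (idx) {
          case 0:  v = a; break;
          case 1:  v = b; break;
          case 2:  v = (5.0f/8)*a + (3.0f/8)*b; break;
          default: v = (5.0f/8)*b + (3.0f/8)*a; break;
          }
          return sign ? -v : v;
      }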


    In this case, there is a big gap in main RAM bandwidth, which seems to
    be a big issue with this task. But, the RAM bandwidth gap is reduced by
    using more compact storage for values (with special-purpose
    instructions, like a Load that also performs the 4xFP8 to 4xFP16
    conversion, ...).


    But, yeah, getting "even anywhere close" is kinda notable given the
    laptop has a 28x clock-speed advantage.

    Though, the laptop is far more powerful at running Quake and similar (no contest regarding Quake performance).



    The FPGA soft-processor could maybe compete better if it could do
    Binary16 SIMD multiply-accumulate operations, but the latency would be
    too high and I couldn't pipeline it.

    Getting this result requires writing ASM and manually scheduling the
    pipeline though.




    Few traditional languages other than C work well at a lot of this.


    A usual argued weakness of C is that it requires manual memory
    management. But, OTOH, you *really* don't want a GC in motor controls or
    an OS kernel or similar.

    Like, if the GC triggers, and an interrupt handler happens at a bad
    time, then you have a problem.

    Or, if you have a 1us timing tolerance for motor controls and this gets
    blown because the GC takes 75ms, etc...

    Sure, you should know where to use static memory, dynamic management organized yourself, or "I-don't-want-to-care" and use GC management,
    or a sensible deliberate mixture of that (if the language allows).

    (I've never used GC with C++; is that meanwhile possible?)


    It is possible to use conservative mark/sweep collectors in C.
    But performance leaves something to be desired.

    Younger me tried to do this, but even for things like 3D engines, I
    ended up trying more to find ways to avoid needing to run the GC.


    [...]

    Maybe C will be around indefinitely for all I know.

    Not unlikely.


    Like, the passage of time still hasn't totally eliminated FORTRAN and
    COBOL.

    There's obviously some demand. *shrug* - I don't care much. - My last "contact" with FORTRAN was when one of my children was asked to handle
    some legacy library code; my suggestion was to get rid of that task.


    In my case, I don't have any descendants.

    Apparently they still exist in some places, mostly as languages that no
    one uses.

    Seemingly a lot of businesses made a migration from COBOL to Java.


    And, C is far more commonly used than either.

    Unless maybe something can come along that is a better C than C...

    There's so many languages meanwhile - frankly, there were already a
    lot back then, four decades ago! - so I don't think the proliferation
    will stop; I don't think that evolution is a good thing. It seems that
    often the inventors have their own agenda and the success of languages depends mainly on the marketing efforts and the number of fan-people
    that got triggered by newly invented buzzwords, and an own invented terminology [for already existing old concepts]!


    Apparently the languages people are trying to push as C replacements are mostly Rust, Zig, and Go.

    None of these particularly compel me though.
    They seem more like needless deviations from C than a true successor.


    I guess the older generations mostly had Pascal and Ada.

    There was ALGOL, but both C and Pascal descended from ALGOL.


    [...]

    I certainly agree to what a "clean language" can be.

    My opinion on that is, though, that the "C" base of C++ is part of
    the problem. Which doesn't let it appear to me "C" to be "better"
    than C++, but that the "C" base is part of C++'s problem. (Here
    I'm not speaking about "C++"'s own problems that probably entered
    about with C++0x/C++11, IMO. - Mileages certainly vary.)


    Possibly.


    A new C-like language need not necessarily be strictly C based.

    (There's a couple things I like in "C". But if I'd have to invent a
    language it would certainly not be "C-like". I'd took a higher-level
    [better designed] language as paragon and support the "C" features I
    like, if not already present in that language.)


    I would think some major goals might be:
    Allowing for a compiler with a smaller code footprint.
    Though, the backend is often a big source of pain here.
    Language is reasonably clean and orthogonal;
    Is amendable to efficient code generation;
    Low requirements for implementation overhead.
    Should aim for similar hard constraints to C.
    Should still be usable for bare-metal and firmware.
    And for OS kernel programming.

    Sadly, cleaning up the frontend language won't do as much to simplify the backend.

    Cleaning up the backend mostly means needing to limit complexity in
    areas that affect code generation:
    Corner cases in data representation and the type-system;
    Corner cases in the native ABI;
    ...

    My preference is to keep a C family syntax, sorta like C# or GLSL.



    As noted elsewhere, my thinking is partly that pipeline looks like:
    Preprocessor (basic or optional, C like)
    Parser (Context-independent, generates ASTs)
    Front end compiler: Compiles ASTs to a stack IL.
    With front-end semi-type-aware.

    Core language should only require frontend to understand primitive types
    (like in C# and Java; with complex types offloaded to backend). Would
    aim to eliminate headers mostly because headers add considerable bulk to
    the ASTs (far more time and memory often spent dealing with header stuff
    than the actual code in the translation units).

    Backend:
    IL -> 3AC/SSA;
    Does code generation and similar.

    Likely, most packaging (for IL object files and static libraries) would
    be based around a variation of the WAD format (probably WAD2 based;
    though in simple cases the Doom IWAD/PWAD format works well).


    One of my past (stalled) attempts at doing a smaller C compiler had been
    using a modified WAD in place of the COFF format, though in this case it
    is debatable how much was really saved by using WAD in place of COFF
    here (and some of the tables partly derived from ELF as well).


    It is kinda pros/cons between modified WAD and RIFF-style TLV formats.
    RIFF is more traditional;
    But, WAD sometimes fits use-patterns better, and can be adapted to
    different contexts.

    One use is to add an additional magic to encode the use-case of the header.


    Would be nice if I could manage to fit a full-featured
    compiler in under 40k lines.



    My thinking would be likely keeping a similar basic syntax though,
    though likely more syntactically similar to C#,

    (But the syntax is one of C's and descendants' problem, IMO. - Part
    of what was described in existing "C-like" languages is either the less-desired elements or deviations, but the latter will probably
    just add to confusion if details are subtle. It's already bad enough
    with subtle differences between different "C" standards it seems.)


    Some simplification is possible, particularly regarding things like declarations; without drastically changing the look of the language.

    So, the language may still look like C, but be a little easier to parse.

    Keeping the general syntax intact helps with familiarity and ease of
    writing code for those who already know similar languages. Though, yes, looking mostly similar to C, but not exactly, could annoy some people.

    Though, mostly, the syntax could follow similar patterns to C# and Java.
    Would differ from Java mostly in the avoidance of needless verbosity;
    and allowing a more free-form program structure.


    It is also possible to allow for a subset of code that is valid in both languages.


    but retaining more in
    terms of implementation with C and C++.

    (But weren't exactly these languages already [partly] invented with
    such an agenda?)


    ?...


    I am imagining something that basically does similar stuff to what C
    already does, and can ideally be used in a similar context.

    The main downside is that C and C++ are more complicated than ideal in
    many areas. This has a detrimental effect on compilers.

    Not so much intending to make a language that tries to be more intuitive
    or hand-holding though. However, if it is possible to make provisions
    for things like static-analysis or bounds-checked arrays (in a way that ideally doesn't adversely affect performance), this can be nice.

    In some cases, one can try to pass some compile-time metadata through the type-system, but this has the downside of adding complexity for the
    compiler. Though, it could be allowed in cases where it is mostly relevant for static analysis but without actually adding new requirements for the implementation.

    Say, for example:
    "int*" and "int[]" are equivalent for the ABI and for a minimal implementation, but a more advanced implementation is allowed to
    constrain the allowed semantics for "int[]" in ways that would not
    necessarily be valid for "int*".

    Well, and you could "have your cake and eat it too", say, by having
    "int[]" and friends allow for very aggressive TBAA (type-based alias
    analysis) but "int*" is assumed to alias readily.

    Also, "int[]" can be assumed to potentially convey implicit array
    bounds, whereas "int*" can be assumed to not convey array bounds (even
    if the compiler represents both, at the machine level, as a bare pointer
    to the first element of the array).



    So, for example:
    Foo obj1;
    Bar obj2;
    Where obj1 and obj2 may only alias in the case of subclass/superclass relationships (but, if not potentially the same class instance; can be
    assumed that no alias is possible).


    Yeah, I am aware there is the "provenance" thing, but personally I fail
    to understand how exactly the provenance model is supposed to work (so
    it makes more sense to me to operate within the limits of more
    conventional aliasing semantics; and defining rules for when it is safe,
    and when it is not safe, to assume non-alias based on types or similar).

    Well, and personally I feel "assume TBAA may wreck your day; so just use memcpy()" to be a crappy solution. I also sympathize with the desire to
    not ask people to put "restrict" or similar all over the place (and to
    be able to optimize stuff by assuming that non-aliasing things don't alias).
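
    For reference, the existing escape hatch in C looks like this
    (standard 'restrict'; the function itself is just an illustration):

      /* The compiler may assume dst and src never alias, enabling the kind
         of optimization discussed above; but the qualifier has to be
         written out at every such interface. */
      void scale_add(int *restrict dst, const int *restrict src, int n)
      {
          for (int i = 0; i < n; i++)
              dst[i] += 2 * src[i];
      }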


    Realistically, asking the compiler to infer any value flow outside of a
    single frame is asking too much. So my model would likely assume that
    each function exists as its own island in terms of pointer aliasing.

    Well, and a simpler model (used by BGBCC) where taking the address of something, etc, effectively nukes the ability to assume non-alias.




    Would likely simplify or eliminate some infrequently used features in C.

    Possibly:
    Preprocessor, still exists, but its role is reduced.
    Its role can be partly replaced by compiler metadata.
    Trigraphs and digraphs: Gone;
    K&R style declarations, also gone;
    Parser should not depend on previous declarations;
    Non trivial types and declarator syntax: Eliminate;
    ...

    Sounds all reasonable to me.


    Possibly:
    Pointers and arrays can be specified on the type rather than declarator
    (so, more like C# here)

    (Yeah, but mind the comments on effects of "subtle differences".)


    Possible, though if there is a type mismatch here, most likely the
    compiler will error-out.


    [...]

    Though, the harder problem here isn't necessarily that of designing or
    implementing it, but more in how to make its use preferable to just
    staying with C.

    Well, as formulated, that's an individual thing. Meanwhile I have the
    freedom to use what I like in my recreational activities, but if we
    consider professional projects there's conditions and requirements to
    take into account.


    Probably.

    I would like something to be a "good" alternative to C, while:
    Allowing cheap/simple compiler;
    Rules tuned to make static analysis less of a pain;
    Doesn't overly hinder a memory-safe implementation;
    But, also can be used for machine-level development;
    Shouldn't be overly unfamiliar to those who know similar languages.




    One merit is if code can be copy-pasted, but if one has to change all
    instances of:
    char *s0, *s1;
    To:
    char* s0, s1;

    Such changes would be annoying. (And I say that with a strong aversion
    of C's declaration syntax.) - For me, "C" is not a good base; neither
    to keep its bad syntax nor to have to change it alike in subtle ways.

    My style is anyway another; [mostly] separate declarations, and those initialized, as in

    char * s0 = some_alloc (...);
    char * s1 = 0;

    More important is that such declarations may appear anywhere not just
    at the beginning of a block. (I'm still traumatized by K&R, I suppose.)


    Yeah.

    I would assume allowing putting declarations wherever.




    [...]

    Java and C# had made 'char' 16-bit, but I now suspect this may have been
    a mistake. It may be preferable instead keep 'char' as 8 bits and make
    UTF-8 the default string format. In the vast majority of cases, strings
    hold primarily or entirely ASCII characters.

    I think we should be careful here! An Unicode "character" may require
    even 32 bit, but UTF-8 is just an "encoding" (in units of an octet).
    If we want a sensible type system defined we should be aware of that difference. The question is; what shall be expressed by a 'char' type;
    the semantic entity or the transfer syntax. (This question is similar
    to the Unix file system, also based on octets; that made it possible
    to represent any international multi-octet characters. There's some
    layer necessary to get from the "transfer-syntax" (the encoding) to
    the representation.) - What will, say, a "C" user expect from 'char';
    just move it around or represent it on some output (or input) medium.


    It is a tradeoff.
    But, if "char*" can point to a string, then "char" needs to be the same
    size as an item in memory (thus, probably a byte).

    Otherwise, it would make sense to have "char" as an alias to "int" and
    require "ubyte*" for use as strings. For consistency with C, makes more
    sense to assume char to be a byte.


    Also, can probably have a string type:
    string str="Some String";
    But, then allow that string is freely cast to "char*", ...

    (Wasn't that so in C++? - And in addition there's the corresponding
    template classes, IIRC. - But I don't recall all the gory details.)


    C++ string seemingly assumes some sort of object representation (that
    could be cast to a pointer).


    I am more assuming that it is an implementation type which would be represented as basically the equivalent of "const char *restrict".

    But, with the compiler able to assume that it is a string type, so one
    of either:
    A pointer to a string literal in some presumably read-only memory area;
    A character array or buffer that was interned into a string table.

    But not "a pointer to a modifiable character buffer".

    In the latter case, "char*" or "char[]" would be considered the correct
    types to use.

    Nominally, "string" would likely not allow pointer arithmetic, but could
    decay into "const char *" or similar, which would allow pointer arithmetic.


    While object-based strings are a perennial feature in many languages,
    having them as anything much more complex than a pointer to a string
    table adds overhead.

    One can argue that one merit of object-based representations is that
    then you don't have to use a generic "strlen()"; but for constant
    strings and string tables, there is a workaround that I have used in
    some of my own past languages:
    Look at the preceding byte:
    00: Raw string, you will need to "strlen()" it;
    01..BF: We are not looking at the start of a string.
    C0..EF: Length-prefix present
    Encoded as a byte-transposed UTF-8 value.

    The prefix can also be used to encode the character encoding, but for
    this case I will assume it is always UTF-8.


    Well, and that the underlying representation of a string is still as a
    pointer into a string-table or similar.

    Also the design of the standard library should remain conservative and
    not add piles of needless wrappers or cruft.

    Not sure what you have in mind here.

    Personally, despite some resentment on some of the complex syntax
    and constructs necessary, I liked the C++ STL; its orthogonality
    and concepts in principle. (And especially if compared to some
    other languages' ad hoc "tool-chest" libraries I stumbled across.)


    I was primarily thinking of Java and its excessive piles of wrapper
    classes. Like, C gives you the stdio functions, which are basic but
    effective.

    Java has:
    WhateverInputStream, WhateverOutputStream,
    WhateverRandomAccessWhateverStream, etc.

    We don't need this. Java just sort of ran with it, creating piles of
    random wrapper classes whose existence serves almost no practical
    purpose (and would have been much better served, say, by simply
    providing a File class that holds a mock-up of C's stdio interface;
    which is, ironically, closer to the approach C# had taken here).


    The great sin here of C++ is mostly things like iostream.


    I would in any case assume not following Java's pattern of "overly bureaucratic boilerplate". Well, or assuming that programmers can't
    think for themselves and will just look through a word-salad list until
    they find whatever class has the combination of words describing the
    specific task they intend to do.



    [...]

    Like, one can throw out the whole mess that is dealing with
    Multiple-Inheritance

    Well, when I started with C++ there wasn't multiple-inheritance
    available. Personally thinking its omission would be a mistake;
    I missed it back these day.

    I'm not sure what "mess" you have in mind. - Explicit qualification
    isn't a hindrance. Weakening the independence of classes in complex
    multi-level class-topologies is something under control of the
    program designer. - So it's fine to have it with all design options
    it opens.

    There is both implementation complexity of MI, and also some added
    complexity with using it. The complexity gets messy.

    (Okay, if that's what you took from it, I of course accept it.
    But I'd have more expected that you might have dislike of some
    STL parts than [multiple] inheritance.)


    Not exactly a fan of STL either, but these are different.

    As noted, my concern here is more for compiler complexity, and MI is
    more a big thorn in the side to anyone who wants to write their own
    compiler.




    The SI + Interfaces model can reduce both.

    I've used classes with only "pure virtual" functions to achieve
    the interface abstraction; since I could easily design what I
    needed with standard features and practically no overhead I thus
    wasn't missing the 'interface' feature.

    (But of course I can see the implementation argument you make.)


    Yeah.

    An "abstract base class" can be inferred to be an interface.

    So, one could also end up with a C++ implementation that allows:
    single inheritance;
    abstract base classes.

    But doesn't allow true MI.
    Ironically this is closer to how BGBCC's attempt at C++ turned out.



    Granted, these can grow their own warts (like default methods or
    similar), but arguably still not as bad as MI.

    (Well, I appreciated it to have that feature available in C++,
    even though my first OO language, Simula, didn't support it, so
    I was used to not having it when I got into C++ and liked it.)


    I am more thinking from the perspective of implementing a compiler.

    Hah! Yeah. - Recently in another NG someone disliked a feature
    because he had suffered from troubles implementing it. (It was
    not MI but formatted I/O in that case.) - I'm not implementing
    complex languages, so I guess I can feel lucky if someone else
    did the language implementation job and I can just use it.


    I am writing from the POV of someone who did start making an attempt to implement C++ support, and mostly gave up at roughly an early 1990s
    feature level.

    If you dropped MI, templates, and pretty much everything following from
    these, stuff would be a lot easier.



    [ implementation issues snipped and gracefully skipped ]

    [...]
    Virtual inheritance still means one can't just call the copy logic for
    each parent class when copying a derived class;

    (I don't think I agree here. - Or are you still talking of the
    implementers' challenges? - But never mind. Programming in C++
    I could model everything I liked. That was really nice.)


    Still implementation.


    Theoretically, any combination of features that is allowed in the
    language should also be allowed by the compiler.

    With simpler "POD" classes, it is mostly a "memcpy()" internally.

    With things like virtual inheritance and non-trivial inheritance
    patterns, "all hell breaks loose".


    It solves "the diamond inheritance problem" from the perspective of the
    user, but creates a new problem for the implementation:
    Now the in-memory layout of the parent classes depends on how they are
    used within the derived class.

    Combined with the ability to assign classes by-value, as far as compiler implementation goes, now you have a mess on your hands.



    Janis

    [...]




    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Wed Oct 15 11:26:52 2025
    On 15/10/2025 02:13, BGB wrote:

    Apparently the languages people are trying to push as C replacements are mostly Rust, Zig, and Go.

    None of these particularly compel me though.
    They seem more like needless deviations from C than a true successor.

    So what would a true successor look like?



    I guess the older generations mostly had Pascal and Ada.

    There was ALGOL, but both C and Pascal descended from ALGOL.

    I've heard that before that C was somehow derived from Algol and even
    Algol 68.

    But it is so utterly unlike either of those, that if it's from the same family, then it must have been adopted.


    As noted elsewhere, my thinking is partly that pipeline looks like:
    Preprocessor (basic or optional, C like)
    Parser (Context-independent, generates ASTs)
    Front end compiler: Compiles ASTs to a stack IL.

    Backend:
    IL -> 3AC/SSA;

    That's odd: you're going from a stack IL to a 3AC non-stack IR/IL?

    Why not go straight to 3AC?

    (I've tried both stack and 3AC ILs, but not both in the same compiler! I finally decided to stay with stack; 3AC code *always* got too fiddly to
    deal with.

    So stack IL is directly translated to register-based, unoptimised native
    code, which is reasonably efficient. Performance is usually somewhere in
    between Tiny C and gcc-O2.)



    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From BGB@3:633/10 to All on Wed Oct 15 13:00:19 2025
    On 10/15/2025 5:26 AM, bart wrote:
    On 15/10/2025 02:13, BGB wrote:

    Apparently the languages people are trying to push as C replacements
    are mostly Rust, Zig, and Go.

    None of these particularly compel me though.
    They seem more like needless deviations from C than a true successor.

    So what would a true successor look like?


    Probably sorta like C with a few vaguely C++ like features, but with a
    cleaner and simpler design.

    Should ideally be usable for similar stuff to C.
    Not drastically or needlessly different.

    Looking around, it seems like the CMU C0 and C1 teaching languages are
    also in the general area design-wise, though they exist more as limited
    C-like subset languages intended more for introductory programming for
    CS courses.

    Could make sense to have some C++ style functionality, but with an aim
    of not going down the rabbit hole of adding excessive implementation complexity.




    I guess the older generations mostly had Pascal and Ada.

    There was ALGOL, but both C and Pascal descended from ALGOL.

    I've heard that before that C was somehow derived from Algol and even
    Algol 68.

    But it is so utterly unlike either of those, that if it's from the same family, then it must have been adopted.


    Idea is that it went ALGOL -> BCPL -> B -> C.
    Going the other way, ALGOL was derived from FORTRAN.

    ALGOL was also the ancestor of Pascal and Ada, so there was a bit of
    mutation there.



    As noted elsewhere, my thinking is partly that pipeline looks like:
    Preprocessor (basic or optional, C like)
    Parser (Context-independent, generates ASTs)
    Front end compiler: Compiles ASTs to a stack IL.

    Backend:
    IL -> 3AC/SSA;

    That's odd: you're going from a stack IL to a 3AC non-stack IR/IL?

    Why not go straight to 3AC?

    (I've tried both stack and 3AC ILs, but not both in the same compiler! I finally decided to stay with stack; 3AC code *always* got too fiddly to
    deal with.


    Well, the downside of 3AC (as an IL) is that it tends to be fiddly and
    often is much more specific to the design choices of the frontend and
    backend that produced it.

    Also, going from a Stack IL to 3AC is fairly easy, and generally less of
    a mess than dealing with a 3AC IL here. Also with 3AC one has to decide
    on things like whether or not it is in SSA form, as SSA vs non-SSA
    follow different rules.


    Downside is that a stack IL is often further from the code you "actually
    want to generate" than a 3AC IL would have been (and to generate more efficient 3AC you may need to generate less-concise stack code, such as
    my having the frontend manually use temporary variables, partly negating
    some of the conceptual benefits of a stack IR, but alas).

    But, on the positive side, the stack manipulations/etc map readily to
    SSA form.


    A stack IL that makes sense for a compiler might look like:
    Stack ops for each major operator;
    No explicit types in most instructions.
    Type can be carried along the stack.
    The .NET IL also did this.
    Control flow is via labels and conditional branches.
    Typically no items on the stack during a branch.
    May make sense to combine common stack-ops with storing to a variable.
    Say: "ADD; STORE n" => "ADD_ST n"
    Rationale being that this is less work for the backend.
    Types can be identified by signature strings.


    Granted, one can note that a stack IL typically needs around 70% more operations than you would need for a 3AC, but most of these operations
    will disappear in the conversion process.

    One semi-unresolved design issue is whether it is better to have a
    single unified numbering space for local variables, like in the JVM and similar, or several different numbering spaces (arguments, locals, and temporary variables). In my ILs, I have often ended up going for the latter.

    Say, for example, you can encode the "name"/"symbol" for Load/Store/Etc
    as a VLN, say:
    0xxxxxxx: 0..127
    10xxxxxx xxxxxxxx 128..16383
    110xxxxx ...: 16384..2M
    ...
    And then use a tagging scheme to encode variable IDs, say:
    ...xxxx00 Local
    ...xxxx10 Temporary
    ...xxx001 Argument
    ...xxx101 Int32 Literal
    ...xx0011 Global Variable
    ...xx1011 String Literal

    Where Locals and Temporaries are given the shortest code as these are
    more common and preferably have shorter (single byte) encodings when
    possible (so, for example, the first 32 local variables can be single
    byte, etc).

    For integer literals, one can additionally use a zigzag coding
    (0,-1,1,-2,2, ...). String literals can be encoded as an offset into a
    string table.
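
A minimal sketch of what such an encoder could look like, just following
the prefix widths and tag bits from the tables above (illustrative only;
the function names are made up and this is not an actual BGBCC or IL
format):

#include <stdint.h>
#include <stddef.h>

/* Emit a variable-length number (VLN), per the scheme above:
     0xxxxxxx                    -> 0..127
     10xxxxxx xxxxxxxx           -> 128..16383
     110xxxxx xxxxxxxx xxxxxxxx  -> 16384..~2M                        */
static size_t emit_vln(uint8_t *out, uint32_t v)
{
    if (v < 128) {
        out[0] = (uint8_t)v;
        return 1;
    } else if (v < 16384) {
        out[0] = 0x80 | (uint8_t)(v >> 8);
        out[1] = (uint8_t)v;
        return 2;
    } else {
        out[0] = 0xC0 | (uint8_t)(v >> 16);
        out[1] = (uint8_t)(v >> 8);
        out[2] = (uint8_t)v;
        return 3;
    }
}

/* Tag an ID per the table: ...00=local, ...10=temporary, ...001=argument,
   ...101=int32 literal, ...0011=global, ...1011=string literal.          */
static uint32_t tag_local (uint32_t i) { return (i << 2) | 0x0; }
static uint32_t tag_temp  (uint32_t i) { return (i << 2) | 0x2; }
static uint32_t tag_arg   (uint32_t i) { return (i << 3) | 0x1; }
static uint32_t tag_global(uint32_t i) { return (i << 4) | 0x3; }
static uint32_t tag_string(uint32_t i) { return (i << 4) | 0xB; }

/* Integer literals: zigzag fold (0,-1,1,-2,2,...) and then tag. */
static uint32_t tag_int32(int32_t v)
{
    uint32_t u  = (uint32_t)v;
    uint32_t zz = (u << 1) ^ (uint32_t)-(int32_t)(u >> 31);
    return (zz << 3) | 0x5;
}

So, for example, "STORE local 5" would carry emit_vln(buf, tag_local(5)),
which is a single byte; the first 32 locals and temporaries stay
single-byte, as described above.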

For something like a typecast operator, you might encode an offset into
a string table for a type-signature string.

    ...

Well, sorta; the IL used in BGBCC isn't quite so clean.
It instead encodes strings and symbols inline, and uses a sliding table
to refer back to them when they repeat. This also works, but is uglier
than encoding IDs and using a string table might have been.

    But, string tables make more sense for an externally-structured format.


Ironically, I came up with a possible format for manifest files (loosely
WAD based) that could also make sense as an IL packaging format.

I ended up going back and forth between having it be WAD2 or WAD4 based,
and instead settled on a compromise of supporting mixed 32- and 64-byte
entries. It would have a tree structure similar to WAD4, but with the
downside that for the 32-byte entries names are reduced to 10 bytes (vs
32 bytes for the 64-byte entries; or 16 bytes in the original WAD2 format).

But, one can debate whether or not this makes sense in terms of space
efficiency. The design is more focused on semi-efficient random access
rather than compactness (whereas bytecode IL packaging is typically more
focused on being compact).

Though, compactness may not matter as much for things like object files,
which are less likely to be used to actually distribute code.


    Though, one merit is that it could more easily allow for a compiler that decodes stack-IR into 3AC one function at a time, or demand-loads parts
    of the image, rather than needing to load everything for the whole
    program in advance (and burning a lot of RAM this way).

    Annoyingly, even a simple format like IWAD would still end up needing 16
    bytes per entry.

But, it can offer more flexibility (and not need an additional
mechanism to look things up by QName), say, if compared with a format
like RIFF (which has an 8-byte minimum overhead per lump). Well, and the
scheme as-is allows lumps with <= 12 bytes of payload to be encoded
inline in a 32-byte entry, so... Might not be too far behind; may just
make sense to use it, and then LZ compress it if it needs to be more
compact.
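
Just to make the size budget concrete, a 32-byte entry along those lines
could look roughly like this (a hypothetical layout; the struct and field
names are only chosen to match the numbers mentioned above, not any
actual format):

#include <stdint.h>

/* Hypothetical 32-byte directory entry for a WAD-like container.
   4+4+2+10+12 = 32 bytes; on typical ABIs this lays out with no padding. */
struct wad_dirent32 {
    uint32_t offset;          /* file offset of lump data (if not inline) */
    uint32_t size;            /* payload size in bytes                    */
    uint16_t flags;           /* entry kind, "payload stored inline" bit  */
    char     name[10];        /* short name, zero padded (10 bytes)       */
    uint8_t  inline_data[12]; /* payloads <= 12 bytes stored here         */
};

A 64-byte variant would then mostly just widen the name field (to the 32
bytes mentioned above) and leave more room for offsets and metadata.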



So stack IL is directly translated to register-based, unoptimised native
code, which is reasonably efficient. Performance is usually somewhere in
between Tiny C and gcc-O2.)



    I usually go stack -> 3AC, and then 3AC -> Native.
    In BGBCC, as-is, there was no separate assembler step, but I now realize
    this probably isn't ideal (and I still end up needing an assembler
    anyways, just now it is a little more of a mess as it isn't really
    cleanly separated from the rest of the 3AC->Native backend).

Long ago, when I originally wrote it, there was an x86 backend, which
didn't use an assembler. For the SuperH backend, I initially skipped
having an assembler, and my current backend (targeting my BJX2 ISA and
also RISC-V) was derived from a fork off the SuperH backend.

    So it continued in not having an assembler, and generating instructions
    with big "switch()" blocks, which also scales poorly, but sorta makes
    sense when the ISA is smallish.

    The stalled out new compiler would have used an assembler, and more so,
    an assembler driven by an instruction listing table.


    It stalled out though because code footprint quickly exceeded my
    original target and still wasn't done enough to be useful.

    I was also trying for a more traditional compiler design (one
    translation unit at a time, producing native object files as an
    intermediate step, with a native-code linker). I suspect now this may
    have been a mistake, and Frontend->IL + IL->Native may be a better option.



    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Janis Papanagnou@3:633/10 to All on Thu Oct 16 06:37:34 2025
    On 15.10.2025 03:13, BGB wrote:
    On 10/13/2025 11:29 PM, Janis Papanagnou wrote:
    (Sorry for the delayed reply; your ~450 lines post was too long for
    me to consider a timely reply.)

    (Now ~800 lines; it escalates.)


    On 09.10.2025 05:49, BGB wrote:
    On 10/8/2025 2:04 PM, Janis Papanagnou wrote:
    On 08.10.2025 19:29, BGB wrote:
    On 10/8/2025 8:59 AM, Janis Papanagnou wrote:

    [...]

    Well, and for a given Cygwin install attempt, whether or not "g++" would work, etc, was a bit like playing roulette.

    I didn't "like" Cygwin, but also never had any "roulette" experience.


    [...]

    In most cases, it left C as a more preferable option.
    C can be made to do the same stuff at similar performance, with often
    only minimal difference in expressive power.

    The problem is, IMO, rather that "C", in the first place, doesn't
    compare to C++ in its level of "expressive power".

    ?...

    I have yet to find much that can be expressed in C++ but is not also expressible in C.

    You may adhere to another sort of expressiveness than I. (For me
    assembler, for example, is not more expressive than "C".) It's all
    about expressing "complex" things in easy ways.

    The main things that are fundamentally different, are things like
    Exceptions and RTTI, but even in C++, these don't come free.

Back then they said that exceptions come for "almost free" (or so);
I've never counted the seconds of difference, since our project goals
and priorities lay with other factors.

RTTI, OTOH, I rarely used in the first place. Part of it was due to
my design principle to avoid casts; here (WRT RTTI), dynamic casts.
This feature wasn't often used in our projects.

    Though, if exceptions are implemented using an approach similar to VEH
    in the Windows X64 ABI, it is at least modest.



    And, the main "powerful" tool of C++, templates,

    (IMO, the main powerful tool was primarily classes, polymorphisms,
    also [real] references.)

    These can be done in C via manually written vtables, and passing the
    address of a variable.

    (Yes, and you can also do it in assembler. - But that's not the point
    of using higher level structuring features. - Frankly, I'm so stumped
    that you wrote such a strange thing that I suppose it makes no sense
    to discuss that point further with you; our views, it seems, are here
    so fundamentally different, obviously.)

    [...]

    We can do OO, just using a different approach, say:
    [...]

    *shudder*

    It all works, and doesn't require significantly more LOC than it would
    have in C++.


    Though, similar was often a problem in my other language design
    attempts: The most efficient way to do things was often also the C way.

    IME, *writing* software in "C" requires much more time than in C++;
    presuming you meant that with "most efficient way to do things".

    (Saving a few seconds in "C" compared to C++ programs can hardly be
    relevant, I'd say; unless you were not really familiar with C++ ?
    Or have special application areas, as I read below in the post.)


Main limiting factor at present is that writing a non-trivial C++
compiler is a harder problem.

    I could write C++ code, but then it isn't really portable outside
    running on my PC or similar.

    (We've used it in professional contexts on various platforms for
    different customers without problem. - I cannot comment on your
    opinion or experiences.)

    [ snip "own compiler", speed and other topics ]


    Like, the passage of time still hasn't totally eliminated FORTRAN and
    COBOL.

    There's obviously some demand. *shrug* - I don't care much. - My last
    "contact" with FORTRAN was when one of my children was asked to handle
    some legacy library code; my suggestion was to get rid of that task.

    In my case, I don't have any descendants.

    Apparently they still exist in some places, mostly as languages that no
    one uses.

    (In scientific areas FORTRAN is obviously still widely used. And
    this is no "[geographically] local phenomenon" as I learned.)

    [...]

    [...]

    Apparently the languages people are trying to push as C replacements are mostly Rust, Zig, and Go.

    I've heard so. (But don't care much.)


    None of these particularly compel me though.
    They seem more like needless deviations from C than a true successor.

    I guess the older generations mostly had Pascal and Ada.

    Not sure what you are thinking here.

While I knew of Pascal programs used even in professional projects
(like in a nuclear reprocessing plant), it never appeared to me
that it is well usable for larger real-world programs; at least in
its standardized form back then; Pascal successors addressed these
shortcomings to some degree, though. And Ada is (I think still) used
in avionics, space travel, and some military areas. (Myself [an older
generation] I had never programmed in Ada, or "professionally" in
Pascal.)

    [...]

    [...]

    A new C-like language need not necessarily be strictly C based.

(There's a couple of things I like in "C". But if I'd have to invent a
language it would certainly not be "C-like". I'd take a higher-level
[better designed] language as a paragon and support the "C" features I
like, if not already present in that language.)


    [ ruminations about such new language snipped ]


    but retaining more in
    terms of implementation with C and C++.

    (But weren't exactly these languages already [partly] invented with
    such an agenda?)

    [...]

    I am imagining something that basically does similar stuff to what C
    already does, and can ideally be used in a similar context.

The main downside is that C and C++ are more complicated than ideal in
many areas. This has a detrimental effect on compilers.

Not so much intending to make a language that tries to be more intuitive
or hand-holding though. However, if it is possible to make provisions
for things like static analysis or bounds-checked arrays (in a way that
ideally doesn't adversely affect performance), this can be nice.

    I see.

    [...]

    [...]

Java and C# had made 'char' 16-bit, but I now suspect this may have been
a mistake. It may be preferable to instead keep 'char' as 8 bits and make
UTF-8 the default string format. In the vast majority of cases, strings
hold primarily or entirely ASCII characters.

I think we should be careful here! A Unicode "character" may require
even 32 bits, but UTF-8 is just an "encoding" (in units of an octet).
    If we want a sensible type system defined we should be aware of that
    difference. The question is; what shall be expressed by a 'char' type;
    the semantic entity or the transfer syntax. (This question is similar
    to the Unix file system, also based on octets; that made it possible
    to represent any international multi-octet characters. There's some
    layer necessary to get from the "transfer-syntax" (the encoding) to
    the representation.) - What will, say, a "C" user expect from 'char';
    just move it around or represent it on some output (or input) medium.

    It is a tradeoff.
    But, if "char*" can point to a string, then "char" needs to be the same
    size as an item in memory (thus, probably a byte).

    Otherwise, it would make sense to have "char" as an alias to "int" and require "ubyte*" for use as strings. For consistency with C, makes more
    sense to assume char to be a byte.

    (I don't think that addresses what I was pointing at. But never mind.)

    [...]

    Well, and that the underlying representation of a string is still as a
    pointer into a string-table or similar.

    Also the design of the standard library should remain conservative and
    not add piles of needless wrappers or cruft.

    Not sure what you have in mind here.

    Personally, despite some resentment on some of the complex syntax
    and constructs necessary, I liked the C++ STL; its orthogonality
    and concepts in principle. (And especially if compared to some
    other languages' ad hoc "tool-chest" libraries I stumbled across.)


    I was primarily thinking of Java and its excessive piles of wrapper
    classes. Like, C gives you the stdio functions, which are basic but effective.

    Yes, Java follows one (quite common) way, "C" another (primitive).
    But "tertium datur"! One other example is the STL.


    Java has:
    [...]
    We don't need this. Java just sort of ran with it, creating piles of
    random wrapper classes whose existence serves almost no practical
    purpose (and would have been much better served, say, by simply
    providing a File class that holds a mock-up of C's stdio interface;
    which is, ironically, closer to the approach C# had taken here).

    To be fair, and despite my dislike, there's of course a rationale
    for Java's approach. Its concepts can often very flexibly be used.

    (I recall I once wanted to use Regexps, found a simple Apache (I
    think) library that was clearly and sensibly defined and could be
    used easily and in a readable way. There was another flexible and
    bulky library with a three-level object hierarchy (or so). - Guess
    what became Java standard some year later!


    The great sin here of C++ is mostly things like iostream.

    It may appear so at first glance. But beyond some unfortunate design
    details it allows very flexible and powerful (yet readable) software
    designs in complex software architectures. (The problem is that folks
    seem to watch and stumble over the pebbles and thereby missing the
    landscape; figuratively formulated.)

    [...]


    I am more thinking from the perspective of implementing a compiler.

    Hah! Yeah. - Recently in another NG someone disliked a feature
    because he had suffered from troubles implementing it. (It was
    not MI but formatted I/O in that case.) - I'm not implementing
    complex languages, so I guess I can feel lucky if someone else
    did the language implementation job and I can just use it.


    I am writing from the POV of someone who did start making an attempt to implement C++ support, and mostly gave up at roughly an early 1990s
    feature level.

    If you dropped MI, templates, and pretty much everything following from these, stuff would be a lot easier.

As a student in a more radical mood I considered templates to be less
important compared to inheritance; you can emulate them. But it's a
lot simpler writing nice code if you have support for parameterized
classes (templates), I had to admit back then; I wouldn't have wanted
to miss them. (Stroustrup, BTW, considered not inventing templates
earlier a mistake.)

    Janis

    [...]



    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Janis Papanagnou@3:633/10 to All on Thu Oct 16 06:45:51 2025
    On 15.10.2025 12:26, bart wrote:
    On 15/10/2025 02:13, BGB wrote:

    There was ALGOL, but both C and Pascal descended from ALGOL.

I've heard before that C was somehow derived from Algol and even
Algol 68.

    But it is so utterly unlike either of those, that if it's from the same family, then it must have been adopted.

    It's about some adopted language concepts (technical and semantical),
    not so much about its general appearance (or C's specific quirks).

    Janis


    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From BGB@3:633/10 to All on Thu Oct 16 04:43:24 2025
    On 10/15/2025 11:37 PM, Janis Papanagnou wrote:
    On 15.10.2025 03:13, BGB wrote:
    On 10/13/2025 11:29 PM, Janis Papanagnou wrote:
    (Sorry for the delayed reply; your ~450 lines post was too long for
    me to consider a timely reply.)

    (Now ~800 lines; it escalates.)


    On 09.10.2025 05:49, BGB wrote:
    On 10/8/2025 2:04 PM, Janis Papanagnou wrote:
    On 08.10.2025 19:29, BGB wrote:
    On 10/8/2025 8:59 AM, Janis Papanagnou wrote:

    [...]

    Well, and for a given Cygwin install attempt, whether or not "g++" would
    work, etc, was a bit like playing roulette.

    I didn't "like" Cygwin, but also never had any "roulette" experience.


    Can't say, but I had issues with "g++" on Cygwin.
    Usually, "gcc" still worked fine, so long as it was used to compile C.


    [...]

    In most cases, it left C as a more preferable option.
    C can be made to do the same stuff at similar performance, with often
    only minimal difference in expressive power.

    The problem is, IMO, rather that "C", in the first place, doesn't
    compare to C++ in its level of "expressive power".

    ?...

    I have yet to find much that can be expressed in C++ but is not also
    expressible in C.

    You may adhere to another sort of expressiveness than I. (For me
    assembler, for example, is not more expressive than "C".) It's all
    about expressing "complex" things in easy ways.


    Assembler is very expressive.
    Concise, or portable, it is not...


A contrast is, say, BASIC, where the things it can do can often be done
in comparably few lines, but it often doesn't take long to run into
limits as to what can be expressed in the language.


Well, except in one project where I had made some funky extensions to
BASIC and then proceeded to use it for CSG tasks. Though, in that
context, I already had a BASIC interpreter and wanted to do something
similar to OpenSCAD; it was a lot more of a "quick and dirty" solution
to glue OpenSCAD functionality onto BASIC than to write a SCAD
interpreter.

    Ironically, the BASIC code ended up being more concise than the SCAD as
    well, but "very weird".

    This dialect ended up with wackiness like:
    box1n=(csgaabb mins=[-15,-4,-4],maxs=[15,4,4],color=0x55AAFF)
    box3=(csgunion box1,box2)
    And:
    temp neck1=(gosub neck0 offs=[0,0,48])

    With the parenthesis allowing some statement-context keywords to be used
    as expressions (such as GOSUB).

No proper functions or structured programming, mostly:
    GOTO label
    GOSUB label
    IF cond THEN GOTO label

    Also ended up being dynamically scoped, with each environment frame
    existing between GOSUB/RETURN pairs (contrast traditional BASIC being global-scoped). Though, ended up label-based rather than using line numbers.

    Why BASIC? Mostly because the language is well suited to doing a small interpreter (in this case, roughly 1500 lines of C).

    IIRC, was initially around 1000 lines, but the stuff for statements-as-expressions, dynamic scope, etc, added roughly 500 lines.

    Doing a structured BASIC (more like QBasic) would have required a more
    complex interpreter design.



    The main things that are fundamentally different, are things like
    Exceptions and RTTI, but even in C++, these don't come free.

Back then they said that exceptions come for "almost free" (or so);
I've never counted the seconds of difference, since our project goals
and priorities lay with other factors.

RTTI, OTOH, I rarely used in the first place. Part of it was due to
my design principle to avoid casts; here (WRT RTTI), dynamic casts.
This feature wasn't often used in our projects.


    The usual cost of exceptions is, say:
    You need a table to map program locations to exception entry points;
    You need exception unwinders, even if mostly no-op stubs;
    Though, you might also need the handlers to call destructors, etc.
    Some provisions need to be made for unwinding the stack.

    The performance cost is usually fairly small, but it has a non-zero cost
    on code footprint.

    So, say, there is a delta of a few % or so on the size of the binary
    based on whether or not exceptions are disabled.


    The cost of RTTI is usually that it adds additional metadata structures
    that are typically pointed to by the first entry of the VTable (with the second entry usually holding an offset to be added to the object-address during method calls to get to the proper address for 'this' within the
    called method, *).


    *: Usually still applies to interfaces in a single-inheritance model,
    but is N/A for normal base classes. In C++ this would generally be
    required for all method calls because any given base-class could
    potentially be used as a secondary base class in a multiple-inheritance pattern.

    This potentially adds the cost of a memory load and an ADD instruction
    or similar to each virtual method call.
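
To make that layout concrete, a hand-rolled C version of the scheme
described above might look roughly like this (illustrative only; the slot
order and struct names are assumptions for the sketch, not the layout of
any particular compiler or ABI):

#include <stddef.h>

struct rtti_info {
    const char       *name;       /* type name / signature string         */
    struct rtti_info *base;       /* base class info, if any              */
};

struct vtable {
    struct rtti_info *rtti;       /* slot 0: RTTI metadata                */
    ptrdiff_t         this_offs;  /* slot 1: offset added to the object   */
                                  /* pointer to get the correct 'this'    */
                                  /* when called via an interface/base    */
    void (*methods[])(void *self);  /* remaining slots: method pointers   */
};

struct object {
    struct vtable *vt;            /* first field points at the vtable     */
    /* ... instance fields ... */
};

/* A virtual call: load the vtable, adjust the object pointer, call slot. */
static void call_method0(struct object *obj)
{
    struct vtable *vt = obj->vt;
    void *self = (char *)obj + vt->this_offs;
    vt->methods[0](self);
}

The load of this_offs and the pointer adjustment are the extra memory
load and ADD mentioned above; for a plain (non-interface) call the
offset is simply zero.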


    Though, if exceptions are implemented using an approach similar to VEH
    in the Windows X64 ABI, it is at least modest.



    And, the main "powerful" tool of C++, templates,

    (IMO, the main powerful tool was primarily classes, polymorphisms,
    also [real] references.)

    These can be done in C via manually written vtables, and passing the
    address of a variable.

    (Yes, and you can also do it in assembler. - But that's not the point
    of using higher level structuring features. - Frankly, I'm so stumped
    that you wrote such a strange thing that I suppose it makes no sense
    to discuss that point further with you; our views, it seems, are here
    so fundamentally different, obviously.)


    Possibly.


Often there does end up being a non-zero amount of dealing with OO stuff
in assembler. For example, in my project I had used some COM-like
interfaces for part of the system-call mechanism. Some of the logic for
mapping COM method calls over a system call is written in assembler.

Well, actually, there are several versions, mostly due to there being
several ISAs in use (both my own ISAs and RISC-V).
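
For reference, this sort of COM-like interface in plain C boils down to
something like the following (a generic sketch; the interface name and
methods here are invented, not the actual system-call interfaces in
question):

#include <stdint.h>

typedef struct IBlockDevice IBlockDevice;

/* The vtable: one function pointer per method, each taking the interface
   pointer itself as 'self'.                                              */
struct IBlockDeviceVtbl {
    int (*Read) (IBlockDevice *self, uint64_t lba, void *buf, int count);
    int (*Write)(IBlockDevice *self, uint64_t lba, const void *buf, int count);
};

struct IBlockDevice {
    const struct IBlockDeviceVtbl *vt;   /* only visible member           */
};

/* A call site just dispatches through the vtable; because each method is
   a fixed slot index, such calls are also comparatively easy to marshal
   across a syscall boundary (pass the object and the slot number).       */
static int blk_read(IBlockDevice *dev, uint64_t lba, void *buf, int count)
{
    return dev->vt->Read(dev, lba, buf, count);
}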


    Granted, one wouldn't usually want to do OO programming in assembler in
    most other contexts.

Where, say, one merit of C here is its higher level of portability
compared with assembler.



    [...]

    We can do OO, just using a different approach, say:
    [...]

    *shudder*

    It all works, and doesn't require significantly more LOC than it would
    have in C++.


    Though, similar was often a problem in my other language design
attempts: The most efficient way to do things was often also the C way.
    IME, *writing* software in "C" requires much more time than in C++;
    presuming you meant that with "most efficient way to do things".

    (Saving a few seconds in "C" compared to C++ programs can hardly be
    relevant, I'd say; unless you were not really familiar with C++ ?
    Or have special application areas, as I read below in the post.)


Main limiting factor at present is that writing a non-trivial C++
compiler is a harder problem.

    I could write C++ code, but then it isn't really portable outside
    running on my PC or similar.

    (We've used it in professional contexts on various platforms for
    different customers without problem. - I cannot comment on your
    opinion or experiences.)


    You have, say:
Desktop PCs;
    Other targets which are supported by GCC or similar;
    Say, RasPi, Cortex-M class microcontrollers, etc.
    ...

    If going outside of the realm of targets with GCC support and similar,
    stuff can break down.

In terms of development experience, I would likely class something like
a Raspberry Pi as being in the same category as a desktop PC.


    Whereas, say, something like a full custom ISA (with a custom C
    compiler), or an older lesser used ISA (such as 6502) would not.

    Whereas, you can (mostly) still run C on a 6502 (though usually with the
    pain of a 64K address space and wackiness like bank-switching). But, a
    lot of people on the 6502 were making heavy use of ASM.

    For a new language design, would likely consider targets like 6502 to be out-of-scope though.


    [ snip "own compiler", speed and other topics ]


    Like, the passage of time still hasn't totally eliminated FORTRAN and
    COBOL.

    There's obviously some demand. *shrug* - I don't care much. - My last
    "contact" with FORTRAN was when one of my children was asked to handle
    some legacy library code; my suggestion was to get rid of that task.

    In my case, I don't have any descendants.

    Apparently they still exist in some places, mostly as languages that no
    one uses.

    (In scientific areas FORTRAN is obviously still widely used. And
    this is no "[geographically] local phenomenon" as I learned.)


    Seems I could have worded this better.

    Basically, where some people still use FORTRAN and COBOL, but relatively
    few people know how to write or maintain code in these languages (so
    often the only hope is to try to get people to rewrite it in some other language).

    Apparently as well, there have been recent cases of people trying to use
    "vibe coding" (LLM AI) to translate old COBOL programs into Java and
    similar, with mixed results.


    [...]

    [...]

    Apparently the languages people are trying to push as C replacements are
    mostly Rust, Zig, and Go.

    I've heard so. (But don't care much.)


    I am not a great fan of these languages either.



    None of these particularly compel me though.
    They seem more like needless deviations from C than a true successor.

    I guess the older generations mostly had Pascal and Ada.

    Not sure what you are thinking here.

While I knew of Pascal programs used even in professional projects
(like in a nuclear reprocessing plant), it never appeared to me
that it is well usable for larger real-world programs; at least in
its standardized form back then; Pascal successors addressed these
shortcomings to some degree, though. And Ada is (I think still) used
in avionics, space travel, and some military areas. (Myself [an older
generation] I had never programmed in Ada, or "professionally" in
Pascal.)


    I think (long ago) I was in elementary school and someone showed me
    Pascal, but it didn't go anywhere.

    I later learned C, and mostly stuck with C. I sorta learned programming
    in C by poking around at C code (mostly stuff released by "id Software"
    and trying to understand how it works). But, in these early years, my
    attempts at messing with the code tended to cause it to fall apart into
    an unusable mess of bugs.

    And, at the time, the experiences with "g++" in Cygwin were off-putting
    (by late elementary school, was mostly in the Win9x era).

    Sometime around middle school, I tried using Java, but the experience
    sucked bad enough that I went back to C (well, and was also weird by preferring to run WinNT4 rather than Win98).

    There was C# not that long after, but I didn't personally have a strong
    reason to use C# over C.

I later took some college classes as a CS major, and they were leaning
mostly into using C# for everything. While arguably C# was a nice
language for GUI programs and similar, I didn't want to be stuck with a
language that was mostly limited to Windows (or Mono on Linux, but the
experience was generally worse than continuing to write code in C and
similar).

    Where, say, I had often ended up running Linux on secondary computers,
    but not usually as a main OS on my main PC (even if at the moment, this
    is starting to look like a more attractive option).


    In all of this, generally C had remained as the option that presented
    the fewest issues across the range of things that I do.

    Granted, I still use C++ sometimes, but it hasn't really displaced the
    amount of stuff I end up staying with C for.

    Where:
    C : Can use pretty much anywhere;
    C++ : Can use on mainstream targets (Windows, Linux, ...);
    C# : Can use on Windows (pain on Linux, unusable on bare metal);
    Java: Kinda sucks everywhere.

    Meanwhile:
Pascal, Ada: Sorta exist, but seemingly hardly anyone uses them, so
rarely seen in the wild.


    [...]

    [...]

    A new C-like language need not necessarily be strictly C based.

(There's a couple of things I like in "C". But if I'd have to invent a
language it would certainly not be "C-like". I'd take a higher-level
[better designed] language as a paragon and support the "C" features I
like, if not already present in that language.)


    [ ruminations about such new language snipped ]


    but retaining more in
    terms of implementation with C and C++.

    (But weren't exactly these languages already [partly] invented with
    such an agenda?)

    [...]

    I am imagining something that basically does similar stuff to what C
    already does, and can ideally be used in a similar context.

The main downside is that C and C++ are more complicated than ideal in
many areas. This has a detrimental effect on compilers.

Not so much intending to make a language that tries to be more intuitive
or hand-holding though. However, if it is possible to make provisions
for things like static analysis or bounds-checked arrays (in a way that
ideally doesn't adversely affect performance), this can be nice.

    I see.

    [...]

    [...]

Java and C# had made 'char' 16-bit, but I now suspect this may have been
a mistake. It may be preferable to instead keep 'char' as 8 bits and make
UTF-8 the default string format. In the vast majority of cases, strings
hold primarily or entirely ASCII characters.

I think we should be careful here! A Unicode "character" may require
even 32 bits, but UTF-8 is just an "encoding" (in units of an octet).
    If we want a sensible type system defined we should be aware of that
    difference. The question is; what shall be expressed by a 'char' type;
    the semantic entity or the transfer syntax. (This question is similar
    to the Unix file system, also based on octets; that made it possible
    to represent any international multi-octet characters. There's some
    layer necessary to get from the "transfer-syntax" (the encoding) to
    the representation.) - What will, say, a "C" user expect from 'char';
    just move it around or represent it on some output (or input) medium.

    It is a tradeoff.
    But, if "char*" can point to a string, then "char" needs to be the same
    size as an item in memory (thus, probably a byte).

    Otherwise, it would make sense to have "char" as an alias to "int" and
    require "ubyte*" for use as strings. For consistency with C, makes more
    sense to assume char to be a byte.

    (I don't think that addresses what I was pointing at. But never mind.)

    [...]

Well, and that the underlying representation of a string is still as a
pointer into a string-table or similar.

Also the design of the standard library should remain conservative and
not add piles of needless wrappers or cruft.

    Not sure what you have in mind here.

    Personally, despite some resentment on some of the complex syntax
    and constructs necessary, I liked the C++ STL; its orthogonality
    and concepts in principle. (And especially if compared to some
    other languages' ad hoc "tool-chest" libraries I stumbled across.)


    I was primarily thinking of Java and its excessive piles of wrapper
    classes. Like, C gives you the stdio functions, which are basic but
    effective.

    Yes, Java follows one (quite common) way, "C" another (primitive).
    But "tertium datur"! One other example is the STL.


    Java has:
    [...]
    We don't need this. Java just sort of ran with it, creating piles of
    random wrapper classes whose existence serves almost no practical
    purpose (and would have been much better served, say, by simply
    providing a File class that holds a mock-up of C's stdio interface;
    which is, ironically, closer to the approach C# had taken here).

    To be fair, and despite my dislike, there's of course a rationale
    for Java's approach. Its concepts can often very flexibly be used.

    (I recall I once wanted to use Regexps, found a simple Apache (I
    think) library that was clearly and sensibly defined and could be
    used easily and in a readable way. There was another flexible and
    bulky library with a three-level object hierarchy (or so). - Guess
    what became Java standard some year later!


    Yeah...

    Contrast is the C approach:
    Give people the basic tools, then they do it themselves.

    How much things like 3rd party libraries are used is prone to vary.


    Like, the usual sort of tradeoff:
    Person wants to import and export JPEG images;
    Do they use "libjpeg" or similar, or write out like 2000 lines or so of
    C for the JPEG importer/exporter and then end up copy/pasting it into subsequent projects (because copy/pasting a 2kLOC blob of code is less
    hassle than dealing with a libjpeg dependency, or copy/pasting libjpeg).

    Wouldn't be surprised if instead Java has JPEG code as part of the class library (would have to look).



Well, and then one ends up with a toolbox of image decoders/encoders:
  T.81 JPEG: Because we often need T.81 JPEG...
    -: JPEG decoding isn't very fast.
       (On a PC, hard to get much over 150 MPix/sec, single threaded)
  PNG: Because we often need PNG
    -: Depends on also having Deflate code.
    -: PNG eats a lot of RAM and is slow
       (typically worse than an optimized JPEG decoder).
  QOI:
    +: Reasonable stand-in for PNG (lossless RGBA)
    +: Reasonably cheap/fast to decode.
    -: Worse compression than PNG
    -: Less commonly supported than PNG
  UPIC: A custom format of mine (*)
    +: Can do both PNG and JPG use-cases.
    +: Typically faster decoding than either
    +: Roughly JPEG-competitive Q/bpp for lossy.
    +: Often slightly beats PNG for lossless.
    -: 3rd party software support is non-existent.

*: UPIC is basically a similar structure to JPG, but with a few changes:
  Huffman -> STF+AdRice
    + Fewer lines of code
    + Less memory and cheaper setup
    - Slightly worse compression
      Though, depends on payload size.
      STF+AdRice can beat Huffman for small data.
    - Potentially slower
      Huffman with a 12 or 13-bit length limit can be faster.
      Huffman with a 15 or 16-bit length limit is slower.
      Though, it is significantly faster for small payloads.
      Where, setting up a Huffman decoder table is slow.
  IDCT -> Block Haar Transform
    + Often better for synthetic images.
    + Faster (beats both IDCT and WHT for speed)
    + Fully reversible (lossless)
    - Worse compression for photos.
  YCbCr -> RCT
    + Reversible (Lossless)
    + Faster
    - Worse for compression.
  Slightly different VLC encoding scheme.
    Z3.V5,
    Values encoded similar to Deflate distances,
    with zigzag sign folding.
  Uses a different sort of TLV packaging scheme.

    Though, when dealing with both lossy and lossless use-cases, etc, it
    ends up with a similar code footprint to JPEG. Though the decoder has a
    lower RAM footprint. Decoder only needs a few kB of working memory, so
    most of the RAM use is for the input and output image buffers (generally operating on blocks of 16x16 pixels). For photo-like images Q/bpp was
    not significantly worse than JPEG, so "good enough".

Often for JPEG-like formats, the color transform ends up as a major
bottleneck, and RCT is at least slightly faster than most other YUV
variants (while being reversible). Also tested YCoCg, but it failed to
beat out RCT. Though GDbDr can be faster than RCT, it is worse for
compression.
    RCT: Y=(2*G+R+B)/4, U=B-G, V=R-G; G=Y-(U+V)/4, B=G+U, R=G+V;
    GDbDr: Y=G, U=B-G, V=R-G; G=Y, B=G+U, R=G+V;
    ...
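
Transcribed into C, the RCT pair above comes out as follows (a direct
transcription of the formulas; it assumes the usual arithmetic right
shift for negative values, which is what makes the floor-division
rounding line up between the two directions):

/* Forward RCT: RGB -> Y/U/V, lossless. */
static void rct_fwd(int r, int g, int b, int *y, int *u, int *v)
{
    *u = b - g;
    *v = r - g;
    *y = g + ((*u + *v) >> 2);   /* == floor((2*g + r + b) / 4) */
}

/* Inverse RCT: exactly recovers the original RGB. */
static void rct_inv(int y, int u, int v, int *r, int *g, int *b)
{
    *g = y - ((u + v) >> 2);
    *b = *g + u;
    *r = *g + v;
}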

    But, for some other cases, one might instead find that they are
    hard-pressed to beat out just using a 16 color BMP image or similar (particularly for low-res pixel-art type graphics).


    But, sadly, there isn't any single universally good option here (even
    for something as cut and dry as image storage).


    Well, nevermind that if, for audio, I had often ended up using the
    unpopular option of using ADPCM variants.

    ...



    The great sin here of C++ is mostly things like iostream.

    It may appear so at first glance. But beyond some unfortunate design
    details it allows very flexible and powerful (yet readable) software
    designs in complex software architectures. (The problem is that folks
    seem to watch and stumble over the pebbles and thereby missing the
    landscape; figuratively formulated.)


While it is in principle easier to extend with new types, etc., using it
does hurt build times pretty bad.

    When using C++, usually just stuck with "printf()" and friends.

    Nevermind if printing a custom type or similar requires writing a
    function to first convert it into a string or similar (and then needing
    to use a temporary buffer).


    [...]


    I am more thinking from the perspective of implementing a compiler.

    Hah! Yeah. - Recently in another NG someone disliked a feature
    because he had suffered from troubles implementing it. (It was
    not MI but formatted I/O in that case.) - I'm not implementing
    complex languages, so I guess I can feel lucky if someone else
    did the language implementation job and I can just use it.


    I am writing from the POV of someone who did start making an attempt to
    implement C++ support, and mostly gave up at roughly an early 1990s
    feature level.

    If you dropped MI, templates, and pretty much everything following from
    these, stuff would be a lot easier.

As a student in a more radical mood I considered templates to be less
important compared to inheritance; you can emulate them. But it's a
lot simpler writing nice code if you have support for parameterized
classes (templates), I had to admit back then; I wouldn't have wanted
to miss them. (Stroustrup, BTW, considered not inventing templates
earlier a mistake.)


    Possibly.

    A possible alternative to templates could have been to trick out C
    macros similar to the macro-processing in many assemblers (with
    multi-line macros and conditional logic inside of macros, ...). Still
    ugly, but could have been cheaper to implement.
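
For example, even plain C-style macros get part of the way there;
something like the following is a common (if ugly) pattern (a generic
sketch with invented names, error handling omitted):

#include <stdlib.h>

/* Poor man's template: one macro expansion per element type. */
#define DEF_VEC(T, Name)                                        \
    typedef struct { T *ptr; int count, max; } Name;            \
    static void Name##_push(Name *v, T val) {                   \
        if (v->count >= v->max) {                               \
            v->max = v->max ? v->max * 2 : 16;                  \
            v->ptr = realloc(v->ptr, v->max * sizeof(T));       \
        }                                                       \
        v->ptr[v->count++] = val;                               \
    }

DEF_VEC(int, IntVec)       /* "instantiate" for int    */
DEF_VEC(double, DblVec)    /* "instantiate" for double */

Usage then being roughly "IntVec v = {0}; IntVec_push(&v, 123);".
Assembler-style macro systems (multi-line bodies, conditionals, loops
inside the macro) would mostly make this kind of thing less painful to
write and read.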

    Templates are not pretty in terms of what they will do to the compiler.

    Sadly, the languages that tried to avoid them often still ended up with generics.


    In some cases, my preference is to instead allow "auto" types in some
    contexts where they are not allowed. In this case, trying to invoke a
    function or method with an auto-type argument could cause the compiler
    to instantiate the function or method and fill in the rest with type inference.

    But, arguably, this is its own flavor of evil, and possibly doesn't save anything over generics.

    One other option is to allow for dynamic types; but these suck in a
    different way.


    Janis

    [...]




    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)