• Re: else ladders practice

    From David Brown@3:633/280.2 to All on Mon Nov 4 18:18:34 2024
    On 04/11/2024 05:00, Tim Rentsch wrote:
    fir <fir@grunge.pl> writes:

    Tim Rentsch wrote:

    fir <fir@grunge.pl> writes:

    Bart wrote:

several clear patterns here: you're testing the same variable 'n'
    against several mutually exclusive alternatives, which also happen
    to be consecutive values.

    C is short of ways to express this, if you want to keep those
    'somethings' as inline code (otherwise arrays of function pointers
or even label pointers could be used)

so in short this group seems to have no conclusion but is tolerant
of various approaches, as it seems

imo the else ladder is probably the most proper, but i don't like it
optically; switch/case i also don't like (as far as i remember i never
use it in my code - for years i haven't used even one)

so i personally would use bare ifs and maybe the occasional else
(and switch should be mended, but it's not at all clear how)

    I think you should have confidence in your own opinion. All
    you're getting from other people is their opinion about what is
    easier to understand, or "clear", or "readable", etc. As long as
    the code is logically correct you are free to choose either
    style, and it's perfectly okay to choose the one that you find
    more appealing.

    There is a case where using 'else' is necessary, when there is a
    catchall action for circumstances matching "none of the above".
    Alternatively a 'break' or 'continue' or 'goto' or 'return' may
    be used to bypass subsequent cases, but you get the idea.

    With the understanding that I am offering more than my own opinion,
    I can say that I might use any of the patterns mentioned, depending
    on circumstances. I don't think any one approach is either always
    right or always wrong.

maybe, but some may have some strong arguments (for using this and not
that) that i may overlook

    I acknowledge the point, but you haven't gotten any arguments,
    only opinions.

    There have been /some/ justifications for some of the opinions - but
    much of it has been merely opinions. And other people's opinions and
    thoughts can be inspirational in forming your own opinions.

    Once the OP (or anyone else) has looked at these, and garnered the ideas floated around, he might then modify his own opinions and preferences as
    a result. In the end, however, you are right that it is the OP's own
    opinions and preferences that should guide the style of the code - only
    he knows what the real code is, and what might suit best for the task in
    hand.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Mon Nov 4 22:56:03 2024
    On 04/11/2024 04:00, Tim Rentsch wrote:
    fir <fir@grunge.pl> writes:

    Tim Rentsch wrote:

    With the understanding that I am offering more than my own opinion,
    I can say that I might use any of the patterns mentioned, depending
    on circumstances. I don't think any one approach is either always
    right or always wrong.

maybe, but some may have some strong arguments (for using this and not
that) that i may overlook

    I acknowledge the point, but you haven't gotten any arguments,
    only opinions.

    Pretty much everything about PL design is somebody's opinion.

    Somebody may try to argue about a particular approach or feature being
    more readable, easier to understand, to implement, more ergonomic, more intuitive, more efficient, more maintainable etc, but they are never
    going to convince anyone who has a different view or who is too used to another approach.

    In this case, it was about how to express a coding pattern in a
    particular language, as apparently the OP didn't like writing the 'else'
    in 'else if', and they didn't like using 'switch'.

    You are trying to argue against somebody's personal preference; that's
    never going to go well. Even when you use actual facts, such as having
    the wrong behaviour when those 'somethings' do certain things.

    Here, the question was, can:

    if (c1) s1;
    else if (c2) s2;

    always be rewritten as:

    if (c1) s1;
    if (c2) s2;

In general, the answer has to be No. But when the OP doesn't like that
    answer, what can you do?
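
For example (a minimal sketch with made-up conditions, not the OP's code), the
two forms differ as soon as s1 changes what c2 tests:

#include <stdio.h>

int main(void) {
    int n = 1;

    /* else-if chain: at most one branch runs, so only "one" is printed */
    if (n == 1) { puts("one"); n = 2; }
    else if (n == 2) { puts("two"); }

    n = 1;

    /* separate ifs: the first branch sets n to 2, so the second also fires */
    if (n == 1) { puts("one"); n = 2; }
    if (n == 2) { puts("two"); }
}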

    Even when the behaviour is the same for a particular set of c1/c2/s1/s2,
the question then was: is it always going to be as efficient (since c2
may sometimes be evaluated unnecessarily). Then it depends on the quality
    of implementation, another ill-defined factor.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Mon Nov 4 23:29:09 2024
    On 04.11.2024 12:56, Bart wrote:
    [...]

    Here, the question was, can:

    if (c1) s1;
    else if (c2) s2;

    always be rewritten as:

    if (c1) s1;
    if (c2) s2;

    Erm, no. The question was even more specific. It had (per example)
    not only all ci disjunct but also defined as a linear sequence of
    natural numbers! - In other languages [than "C"] this may be more
    important since [historically] there were specific constructs for
    that case; see e.g. 'switch' definitions in Simula, or the 'case'
    statement of Algol 68, both mapping elements onto an array[1..N];
    labels in the first case, and expressions in the latter case. So
    in "C" we could at least consider using something similar, like,
    say, arrays of function pointers indexed by those 'n'. (Not that
    I'd suggest that by just pointing it out.)
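
Purely as an illustration of that last remark (a hypothetical sketch, not code
from the thread): a table of function pointers indexed by 'n', with a range
check standing in for the final 'else'.

#include <stdio.h>

static void do_one(void)   { puts("one");   }
static void do_two(void)   { puts("two");   }
static void do_three(void) { puts("three"); }

static void (*const handler[])(void) = { do_one, do_two, do_three };

static void dispatch(int n) {
    if (n >= 1 && n <= 3)
        handler[n - 1]();          /* selects and calls exactly one "case" */
    else
        puts("none of the above"); /* the catch-all path */
}

int main(void) {
    for (int n = 0; n <= 4; ++n)
        dispatch(n);
}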

    I'm a bit astonished, BTW, about this huge emphasis on the topic
    "opinions" in later posts of this thread. The OP asked (even in
    the subject) about "practice" which actually invites if not asks
    for providing opinions (besides practical experiences).

    (He also asked about two specific aspects; performance and terse
    code. Answers to that can already be derived from various posts'
    answers.)

    Janis

    [...]


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Mon Nov 4 23:38:06 2024
    On 04/11/2024 12:29, Janis Papanagnou wrote:
    On 04.11.2024 12:56, Bart wrote:
    [...]

    Here, the question was, can:

    if (c1) s1;
    else if (c2) s2;

    always be rewritten as:

    if (c1) s1;
    if (c2) s2;

    Erm, no. The question was even more specific.

    I mean that the question came down to this. After all he had already
    decided on that second form rather than the first, and had acknowledged
    that the 'else's were missing.

    That the OP's example contained some clear patterns has already been
    covered (I did so anyway).


    It had (per example)
    not only all ci disjunct but also defined as a linear sequence of
    natural numbers! - In other languages [than "C"] this may be more
    important since [historically] there were specific constructs for
    that case; see e.g. 'switch' definitions in Simula, or the 'case'
    statement of Algol 68, both mapping elements onto an array[1..N];
    labels in the first case, and expressions in the latter case. So
    in "C" we could at least consider using something similar, like,
    say, arrays of function pointers indexed by those 'n'.

    That too!

(Not that I'd suggest that by just pointing it out.)

    I'm a bit astonished, BTW, about this huge emphasis on the topic
    "opinions" in later posts of this thread. The OP asked (even in
    the subject) about "practice" which actually invites if not asks
    for providing opinions (besides practical experiences).



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Mon Nov 4 23:40:48 2024
    On 02.11.2024 19:09, Tim Rentsch wrote:

    [...] As long as
    the code is logically correct you are free to choose either
    style, and it's perfectly okay to choose the one that you find
    more appealing.

    This is certainly true for one-man-shows. Hardly suited for most
    professional contexts I worked in. (Just my experience, of course.
    And people are free to learn things the Hard Way, if they prefer.)

    Janis


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Mon Nov 4 23:46:34 2024
    On 04.11.2024 13:38, Bart wrote:

    That the OP's example contained some clear patterns has already been
    covered (I did so anyway).

    I haven't read every post, even if I occasionally take some time
    to catch up.[*]

    Janis

[*] Threads in this group, even for trivial things, tend to turn into
tapeworms, and individual posts often get very long.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From fir@3:633/280.2 to Bart on Tue Nov 5 02:02:16 2024
    To: Bart <bc@freeuk.com>

    Bart wrote:
    On 04/11/2024 04:00, Tim Rentsch wrote:
    fir <fir@grunge.pl> writes:

    Tim Rentsch wrote:

    With the understanding that I am offering more than my own opinion,
    I can say that I might use any of the patterns mentioned, depending
    on circumstances. I don't think any one approach is either always
    right or always wrong.

maybe, but some may have some strong arguments (for using this and not
that) that i may overlook

    I acknowledge the point, but you haven't gotten any arguments,
    only opinions.

    Pretty much everything about PL design is somebody's opinion.

overall, when you think about and discuss such a thing, some conclusions may
appear - and some often do for me, though they are not always very clear
or 'hard'

overall, from this thread i noted that switch (which i already didn't
like) is bad.. note that those two elements of switch, the "switch"
and the "case", are in a weird, not obvious relation in c (and how will it
work when you mix them, etc)

what i concluded was that if you do the thing this way


a { }  //this is an analogue of case - a named block
b { }  //this is an analogue of case - a named block
n()    // here by "()" i mean a call through some variable that may yield a
'call' to a, b, c, d, e, f  //(in that case n would be some enum or
pointer)
c { }  //this is an analogue of case - a named block
d { }  //this is an analogue of case - a named block


then everything is clear - the call just selects and calls a block, and
the blocks themselves are just definitions and are skipped in execution until
"called"


this is an example of a conclusion i drew from this thread - and i think
code such as my own initial example should probably be done this
way (though it is not c, i know)
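
For illustration, a rough C approximation of that idea (hypothetical names;
plain C needs small functions here rather than true named blocks):

#include <stdio.h>

static void a(void) { puts("block a"); }
static void b(void) { puts("block b"); }
static void c(void) { puts("block c"); }

int main(void) {
    void (*n)(void) = b;   /* n selects one of the named blocks */
    n();                   /* "calling" n runs only the selected block */
}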








    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: i2pn2 (i2pn.org) (3:633/280.2@fidonet)
  • From fir@3:633/280.2 to Bart on Tue Nov 5 02:06:56 2024
    To: Bart <bc@freeuk.com>

    fir wrote:
    Bart wrote:
    On 04/11/2024 04:00, Tim Rentsch wrote:
    fir <fir@grunge.pl> writes:

    Tim Rentsch wrote:

    With the understanding that I am offering more than my own opinion,
    I can say that I might use any of the patterns mentioned, depending
    on circumstances. I don't think any one approach is either always
    right or always wrong.

maybe, but some may have some strong arguments (for using this and not
that) that i may overlook

    I acknowledge the point, but you haven't gotten any arguments,
    only opinions.

    Pretty much everything about PL design is somebody's opinion.

    overally when you think and discuss such thing some conclusions may do
    appear - and often soem do for me, though they are not always very clear
    or 'hard'

    overally from this thread i noted that switch (which i already dont
    liked) is bad.. note those two elements of switch it is "switch"
    and "Case" are in weird not obvious relation in c (and what will it
    work when you mix it etc)

    what i concluded was than if you do thing such way


    a { } //this is analogon to case - named block
    b { } //this is analogon to case - named block
    n() // here by "()" i noted call of some wariable that mey yeild
    'call' to a ,b, c, d, e, f //(in that case na would be soem enum or
    pointer)
    c( ) //this is analogon to case - named block
    d( ) //this is analogon to case - named block


    then everything is clear - this call just selects and calls block , and
    block itself are just definitions and are skipped in execution until
    "called"


    this is example of some conclusion for me from thsi thread - and i think
    such codes as this my own initial example should be probably done such
    way (though it is not c, i know


note that in fact both array usage like tab[5] and a function call like foo()
are analogues of switch/case - when you call functions, the call is like the switch and the set of function definitions are the 'cases'


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: i2pn2 (i2pn.org) (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Tue Nov 5 02:21:37 2024
    On 04/11/2024 15:06, fir wrote:
    fir wrote:
    Bart wrote:
    On 04/11/2024 04:00, Tim Rentsch wrote:
    fir <fir@grunge.pl> writes:

    Tim Rentsch wrote:

With the understanding that I am offering more than my own opinion,
I can say that I might use any of the patterns mentioned, depending
on circumstances. I don't think any one approach is either always
right or always wrong.

maybe, but some may have some strong arguments (for using this and not
that) that i may overlook

    I acknowledge the point, but you haven't gotten any arguments,
    only opinions.

    Pretty much everything about PL design is somebody's opinion.

    overally when you think and discuss such thing some conclusions may do
    appear - and often soem do for me, though they are not always very clear
    or 'hard'

    overally from this thread i noted that switch (which i already dont
    liked) is bad.. note those two elements of switch it is "switch"
    and "Case" are in weird not obvious relation in c (and what will it
    work when you mix it etc)

    what i concluded was than if you do thing such way


a { }  //this is analogon to case - named block
b { }  //this is analogon to case - named block
n()   // here by "()" i noted call of some wariable that mey yeild
'call' to a ,b, c, d, e, f  //(in that case na would be soem enum or
pointer)
c( ) //this is analogon to case - named block
d( ) //this is analogon to case - named block


    then everything is clear - this call just selects and calls block , and
    block itself are just definitions and are skipped in execution until
    "called"


    this is example of some conclusion for me from thsi thread - and i think
    such codes as this my own initial example should be probably done such
    way (though it is not c, i know


    note in fact both array usage like tab[5] and fuunction call like foo()
    are analogues to swich case - as when you call fuctions the call is like switch and function definition sets are 'cases'


    Yes, switch could be implemented via a table of label pointers, but it
    needs a GNU extension.

    For example this switch:

    #include <stdio.h>

int main(void) {
    for (int i=0; i<10; ++i) {
        switch(i) {
        case 7: case 2: puts("two or seven"); break;
        case 5: puts("five"); break;
        default: puts("other");
        }
    }
}


    Could also be written like this:

    #include <stdio.h>

int main(void) {
    void* table[] = {
        &&Lother, &&Lother, &&L27, &&Lother, &&Lother, &&L5,
        &&Lother, &&L27, &&Lother, &&Lother};

    for (int i=0; i<10; ++i) {
        goto *table[i];

        L27:    puts("two or seven"); goto Lend;
        L5:     puts("five"); goto Lend;
        Lother: puts("other");
        Lend:   ;
    }
}

(A compiler may generate something like this, although it will be
range-checked if needed. In practice, small numbers of cases, or cases where
the values are too spread out, might be implemented as if-else chains.)



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From fir@3:633/280.2 to Bart on Tue Nov 5 02:34:46 2024
    To: Bart <bc@freeuk.com>

    fir wrote:
    Bart wrote:
    On 04/11/2024 04:00, Tim Rentsch wrote:
    fir <fir@grunge.pl> writes:

    Tim Rentsch wrote:

    With the understanding that I am offering more than my own opinion,
    I can say that I might use any of the patterns mentioned, depending
    on circumstances. I don't think any one approach is either always
    right or always wrong.

maybe, but some may have some strong arguments (for using this and not
that) that i may overlook

    I acknowledge the point, but you haven't gotten any arguments,
    only opinions.

    Pretty much everything about PL design is somebody's opinion.

    overally when you think and discuss such thing some conclusions may do
    appear - and often soem do for me, though they are not always very clear
    or 'hard'

    overally from this thread i noted that switch (which i already dont
    liked) is bad.. note those two elements of switch it is "switch"
    and "Case" are in weird not obvious relation in c (and what will it
    work when you mix it etc)

    what i concluded was than if you do thing such way


    a { } //this is analogon to case - named block
    b { } //this is analogon to case - named block
    n() // here by "()" i noted call of some wariable that mey yeild
    'call' to a ,b, c, d, e, f //(in that case na would be soem enum or
    pointer)
    c( ) //this is analogon to case - named block
    d( ) //this is analogon to case - named block


a second version would be the one based on labels and goto

    a:
    b:
    n!
    c:
    d:

here n! would symbolize goto n, and the different operator marks the difference
between "call" and "jmp" at the assembly level, and the lack of a block
would denote the lack of a ret at the assembly level


i'm not sure, but maybe those two versions span all that is needed
(not sure about this, but as said, one expresses jumps and the other calls
at the assembly level)


    then everything is clear - this call just selects and calls block , and
    block itself are just definitions and are skipped in execution until
    "called"


    this is example of some conclusion for me from thsi thread - and i think
    such codes as this my own initial example should be probably done such
    way (though it is not c, i know









    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: i2pn2 (i2pn.org) (3:633/280.2@fidonet)
  • From fir@3:633/280.2 to Bart on Tue Nov 5 02:52:17 2024
    To: Bart <bc@freeuk.com>

    Bart wrote:
    On 04/11/2024 15:06, fir wrote:
    fir wrote:
    Bart wrote:
    On 04/11/2024 04:00, Tim Rentsch wrote:
    fir <fir@grunge.pl> writes:

    Tim Rentsch wrote:

With the understanding that I am offering more than my own opinion,
I can say that I might use any of the patterns mentioned, depending
on circumstances. I don't think any one approach is either always
right or always wrong.

maybe, but some may have some strong arguments (for using this and not
that) that i may overlook

    I acknowledge the point, but you haven't gotten any arguments,
    only opinions.

    Pretty much everything about PL design is somebody's opinion.

    overally when you think and discuss such thing some conclusions may do
appear - and often soem do for me, though they are not always very clear
or 'hard'

    overally from this thread i noted that switch (which i already dont
    liked) is bad.. note those two elements of switch it is "switch"
    and "Case" are in weird not obvious relation in c (and what will it
    work when you mix it etc)

    what i concluded was than if you do thing such way


    a { } //this is analogon to case - named block
    b { } //this is analogon to case - named block
    n() // here by "()" i noted call of some wariable that mey yeild
    'call' to a ,b, c, d, e, f //(in that case na would be soem enum or
    pointer)
    c( ) //this is analogon to case - named block
    d( ) //this is analogon to case - named block


    then everything is clear - this call just selects and calls block , and
    block itself are just definitions and are skipped in execution until
    "called"


this is example of some conclusion for me from thsi thread - and i think
such codes as this my own initial example should be probably done such
    way (though it is not c, i know


    note in fact both array usage like tab[5] and fuunction call like foo()
    are analogues to swich case - as when you call fuctions the call is
    like switch and function definition sets are 'cases'


    Yes, switch could be implemented via a table of label pointers, but it
    needs a GNU extension.

    For example this switch:

    #include <stdio.h>

int main(void) {
    for (int i=0; i<10; ++i) {
        switch(i) {
        case 7: case 2: puts("two or seven"); break;
        case 5: puts("five"); break;
        default: puts("other");
        }
    }
}


    Could also be written like this:

    #include <stdio.h>

int main(void) {
    void* table[] = {
        &&Lother, &&Lother, &&L27, &&Lother, &&Lother, &&L5,
        &&Lother, &&L27, &&Lother, &&Lother};

    for (int i=0; i<10; ++i) {
        goto *table[i];

        L27:    puts("two or seven"); goto Lend;
        L5:     puts("five"); goto Lend;
        Lother: puts("other");
        Lend:   ;
    }
}

    (A compiler may generate something like this, although it will be range-checked if need. In practice, small numbers of cases, or where the
    case values are too spread out, might be implemented as if-else chains.)


probably switch is implemented like

push __out__  // to simulate a return to the __out__ address
    cmp eax, "A"
    je __A__
    cmp eax, "B"
    je __B__
    cmp eax, "C"
    je __C__
__out__:
    ....
    ....
    ....

an if/else ladder would do the same i guess,
and a sequence of plain ifs would not push __out__ unless it is
detected that those cases for sure may not appear together

it's a waste to check a long sequence of compares if someone is unlucky,
though if the argument of the switch is, say, 8 bits wide it is probably no
problem to put the labels in a table and call via the table



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: i2pn2 (i2pn.org) (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Tue Nov 5 03:35:44 2024
    On 03/11/2024 21:00, Bart wrote:
    On 03/11/2024 17:00, David Brown wrote:
    On 02/11/2024 21:44, Bart wrote:

I would disagree on that definition, yes. A "multi-way selection"
would mean, to me, a selection of one of N possible things - nothing
more than that. It is far too general a phrase to say that it must
    involve branching of some sort ("notional" or otherwise).

Not really. If the possible options involve actions written in-line,
    and you only want one of those executed, then you need to branch around
    the others!


    And if it does /not/ involve actions "in-line", or if the semantics of
    the selection say that all parts are evaluated before the selection,
    then it would /not/ involve branching. I did not say that multi-way selections cannot involve branching - I said that the phrase "multi-way selection" is too vague to say that branches are necessary.

And it is too general to say if you are selecting one of many things
    to do, or doing many things and selecting one.


    Sorry, but this is the key part. You are not evaluating N things and selecting one; you are evaluating ONLY one of N things.

    I understand that this is key to what /you/ mean by "multi-way
    selection". And if I thought that was what that phrase meant, then I'd
    agree with you on many of your other points.

    If you have some objective justification for insisting that the phrase
    has a particular common meaning that rules out the possibility of first creating N "things" and then selecting from them, then I would like to
    hear about it. Until then, I will continue to treat it as a vague
    phrase without a specific meaning, and repeating your assertions won't
    change my mind.

    To my mind, this is a type of "multi-way selection" :

    (const int []){ a, b, c }[n];

    I can't see any good reason to exclude it as fitting the descriptive
    phrase. And if "a", "b" and "c" are not constant, but require
    evaluation of some sort, it does not change things. Of course if these required significant effort to evaluate, or had side-effects, then you
    would most likely want a "multi-way selection" construction that did the selection first, then the evaluation - but that's a matter of programmer choice, and does not change the terms. (For some situations, such as
    vector processing or SIMD work, doing the calculations before the
    selection may be more time-efficient even if most of the results are
    then discarded.)
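
To make that concrete, here is a small self-contained example of the
compound-literal form (the values are invented):

#include <stdio.h>

int main(void) {
    int a = 10, b = 20, c = 30;
    int n = 1;                            /* must be 0, 1 or 2 here */

    /* all three elements are evaluated, then element n is selected */
    int x = (const int []){ a, b, c }[n];

    printf("%d\n", x);                    /* prints 20 */
}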



    For X, it builds a list by evaluating all the elements, and returns the value of the last. For Y, it evaluates only ONE element (using internal switch, so branching), which again is the last.

    You don't seem keen on keeping these concepts distinct?

    I am very keen on keeping the concepts distinct in cases where it
    matters. So they should be given distinct names or terms - or at least,
    clear descriptive phrases should be used to distinguish them.

At the moment, you are saying that a "pet" is a four-legged creature
that purrs, and getting worked up when I say some pets are dogs. It doesn't matter how much of a cat person you are, there /are/ other kinds of pets.

    It doesn't matter how keen you are on making the selection before the evaluation, or how often it is the better choice, you can't impose
    arbitrary restrictions on a general phrase.


    The whole construct may or may not return a value. If it does, then
    one of the N paths must be a default path.


No, that is simply incorrect. For one thing, you can say that it is
    perfectly fine for the selection construct to return a value sometimes
    and not at other times.

    How on earth is that going to satisfy the type system? You're saying
    it's OK to have this:

   int x = if (randomfloat()<0.5) 42;


    In C, no. But when we have spread to other languages, including
    hypothetical languages, there's nothing to stop that. Not only could it
    be supported by the run-time type system, but it would be possible to
    have compile-time types that are more flexible and only need to be "solidified" during code generation. That might allow the language to
    track things like "uninitialised" or "no value" during compilation
    without having them part of a real type (such as std::optional<> or a C
    struct with a "valid" field). All sorts of things are possible in a programming language when you don't always think in terms of direct translation from source to assembly.

    Or even this, which was discussed recently, and which is apparently
    valid C:

   int F(void) {
       if (randomfloat()<0.5) return 42;


    Presumably you meant to add the closing } here ? Yes, that is valid C,
    but it is undefined behaviour to use the value of F() if a value was not returned.

    In the first example, you could claim that no assignment takes place
    with a false condition (so x contains garbage). In the second example,
    what value does F return when the condition is false?


    It doesn't return a value. That is why it is UB to try to use that non-existent value.

    You can't hide behind your vast hyper-optimising compiler; the language needs to say something about it.


    I am not suggesting any kind of "hyper-optimising" compiler. I am
    suggesting that it is perfectly possible for a language to be defined in
    a way that is different from your limited ideas (noting that your style
    of language is not hugely different from C, at least in this aspect).


    My language will not allow it. Most people would say that that's a good thing. You seem to want to take the perverse view that such code should
    be allowed to return garbage values or have undefined behaviour.

    Is your idea of "most people" based on a survey of more than one person?

    Note that I have not suggested returning garbage values - I have
    suggested that a language might support handling "no value" in a
    convenient and safe manner. Many languages already do, though of course
    it is debatable how safe, convenient or efficient the chosen solution
    is. I've already given examples of std::optional<> in C++, Maybe types
    in Haskell, null pointers in C, and you can add exceptions to that list
    as a very different way of allowing functions to exit without returning
    a value.

    Totally independent of and orthogonal to that, I strongly believe that
    there is no point in trying to define behaviour for something that
    cannot happen, or for situations where there is no correct behaviour.
    The principle of "garbage in, garbage out" was established by Babbage's
    time, and the concept of functions that do not have defined values for
    all inputs is as at least as old as the concept of mathematical function
    - it goes back to the first person who realised you can't divide by
    zero. The concept of UB is no more and no less than this.


    After all, this is C! But please tell me, what would be the downside of
    not allowing it?

    Are you asking what are the downsides of always requiring a returned
    value of a specific type? Do you mean in addition to the things I have already written?


It's fine if it never returns at all for some
cases. It's fine to give selection choices for all possible inputs.
    It's fine to say that the input must be a value for which there is a
    choice.

    What I see here is that you don't like C's constructs (that may be for
    good reasons, it may be from your many misunderstandings about C, or
    it may be from your knee-jerk dislike of everything C related).

    With justification. 0010 means 8 in C? Jesus.


    I think the word "neighbour" is counter-intuitive to spell. Therefore
    we should throw out the English language, because it is all terrible,
    and it only still exists because some people insist on using it rather
    than my own personal language of gobbledegook.

    That's the knee-jerk "logic" you use in discussions about C. (Actually,
    it's worse than that - you'd reject English because you think the word "neighbour" is spelt with three "u's", or because you once saw it misspelt.)

    It's hardly knee-jerk either since I first looked at it in 1982, when my
    own language barely existed. My opinion has not improved.


    It's been knee-jerk all the time I have known you in this group.

    Of course some of your criticisms of the language will be shared by
    others - that's true of any language that has ever been used. And
    different people will dislike different aspects of the language. But
    you are unique in hating everything about C simply because it is in C.


You have some different selection constructs in your own language,
which you /do/ like. (It would be very strange for you to have
    constructs that you don't like in your own personal one-man language.)

    It's a one-man language but most of its constructs and features are universal. And therefore can be used for comparison.


    Once a thread here has wandered this far off-topic, it is perhaps not unreasonable to draw comparisons with your one-man language. But it is
    not as useful as comparisons to real languages that other people might
    be familiar with, or can at least read a little about.

    The real problem with your language is that you think it is perfect, and
    that everyone else should agree that it is perfect, and that any
    language that does something differently is clearly wrong and inferior.
This hinders you from thinking outside the box you have built for yourself.


    One feature of my concept of 'multi-way select' is that there is one
    or more controlling expressions which determine which path is followed.


    Okay, that's fine for /your/ language's multi-way select construct.
    But other people and other languages may do things differently.

    FGS, /how/ different? To select WHICH path or which element requires
    some input. That's the controlling expression.

    Have you been following this thread at all? Clearly a "multi-way
    select" must have an input to choose the selection. But it does /not/
    have to be a choice of a path for execution or evaluation.

    When someone disagrees with a statement you made, please try to think a
    little about which part of it they disagree with.


    Or maybe with your ideal language, you can select an element of an array without bothering to provide an index!

    There are plenty of C programmers - including me - who would have
    preferred to have "switch" be a more structured construct which could
not be intertwined with other constructs in this way. That does not
    mean "switch" is not clearly defined - nor does it hinder almost every
    real-world use of "switch" from being reasonably clear and structured.
    It does, however, /allow/ people to use "switch" in more complex and
    less clear ways.

    Try and write a program which takes any arbitrary switch construct (that usually means written by someone else, because obviously all yours will
    be sensible), and cleanly isolates all the branches including the
    default branch.


    No. I am well aware that the flexibility of C's switch, and the
    fall-through mechanism, make it more effort to parse and handle algorithmically than if it were more structured. That has no bearing on whether or not the meaning is clearly defined, or whether the majority
    of real-world uses of "switch" are fairly easy to follow.

    Hint: the lack of 'break' in a non-empty span between two case labels
    will blur the line. So will a conditional break (example below unless
    it's been culled).

    You are confusing "this makes it possible to write messy code" with a
belief that messy code is inevitable or required. And you are
    forgetting that it is always possible to write messy or
    incomprehensible code in any language, with any construct.

    I can't write that randomfloat example in my language.

    Okay.

    I can't leave out
    a 'break' in a switch statement (it's not meaningful). It is impossible
    to do the crazy things you can do with switch in C.

    Okay - although I can't see why you'd have a "break" statement here in
    the first place.

    As I've said many times, I'd prefer it if C's switches were more structured.

    None of that has any bearing on other types of multi-way selection
    constructs.


    Yes, with most languages you can write nonsense programs, but that
    doesn't give the language a licence to forget basic rules and common
    sense, and just allow any old rubbish even if clearly wrong:

   int F() {
       F(1, 2.3, "four", F,F,F,F(),F(F()));
       F(42);
   }

    This is apparently valid C. It is impossible to write this in my language.

    It is undefined behaviour in C. Programmers are expected to write
    sensible code.

    I am confident that if I knew your language, I could write something meaningless. But just as with C, doing so would be pointless.


    You can't use such a statement as a solid basis for a multi-way
    construct that returns a value, since it is, in general, impossible
    to sensibly enumerate the N branches.


    It is simple and obvious to enumerate the branches in almost all
real-world cases of switch statements. (And /please/ don't faff
    around with cherry-picked examples you have found somewhere as if they
    were representative of anything.)

    Oh, right. I'm not allowed to use counter-examples to lend weight to my comments. In that case, perhaps you shouldn't be allowed to use your sensible examples either. After all we don't know what someone will feed
    to a compiler.

    We /do/ know that most people would feed sensible code to compilers.


    But, suppose C was upgraded so that switch could return a value. For
    that, you'd need the value at the end of each branch. OK, here's a
    simple one:

    y = switch (x) {
        case 12:
            if (c) case 14: break;
            100;
        case 13:
            200;
            break;
        }

Any ideas? I will guess that x=12/c=false or x=13 will yield 200. What
about x=12/c=true, or x=14, or x = anything else?


    What exactly is your point here? Am I supposed to be impressed that you
    can add something to C and then write meaningless code with that extension?


    So if I understand correctly, you are saying that chains of if/else,
an imaginary version of "switch", and the C ternary operator all
    evaluate the same things in the same way, while with C's switch you
    have no idea what happens?

    Yes. With C's switch, you can't /in-general/ isolate things into
    distinct blocks. You might have a stab if you stick to a subset of C and follow various guidelines, in an effort to make 'switch' look normal.

    See the example above.

    You /can/ isolate things into distinct blocks, with occasional
    fall-throughs, when you look at code people actually write. No one
    writes code like your example above, so no one needs to be able to
    interpret it.

    Occasionally, people use "switch" statements in C for fancy things, like coroutines. Then the logic flow can be harder to follow, but it is for
    niche cases. People don't randomly mix switches with other structures.



That is true, if you cherry-pick what you choose to ignore in each
    case until it fits your pre-conceived ideas.

    You're the one who's cherry-picking examples of C!

    I haven't even given any examples.

    Here is my attempt at
    converting the above switch into my syntax (using a tool derived from my
    C compiler):

    switch x
    when 12 then
        if c then

        fi
        100
        fallthrough
    when 13 then
        200
    end switch

    It doesn't attempt to deal with fallthrough, and misses out that
    14-case, and that conditional break. It's not easy; I might have better
    luck with assembly!



No, what you call "natural" is entirely subjective. You have looked
    at a microscopic fraction of code written in a tiny proportion of
    programming languages within a very narrow set of programming fields.

    I've worked with systems programming and have done A LOT in the 15 years until the mid 90s. That included pretty much everything involved in
    writing graphical applications given only a text-based disk OS that
    provided file-handling.

    I know you have done a fair bit of programming. That does not change
    what I said. (And I am not claiming that I have programmed in a wider
    range of fields than you.)


Plus of course devising and implementing everything needed to run my own systems language. (After mid 90s, Windows took over half the work.)

    That's not criticism - few people have looked at anything more.

    Very few people use their own languages, especially over such a long
    period, also use them to write commercial applications, or create
    languages for others to use.


    When you use your own language, that inevitably /restricts/ your
    experience with other programmers and other code. It is not a positive
    thing in this context.



What I /do/ criticise is your assumption that this almost
    negligible experience gives you the right to decide what is "natural"
    or "true", or how programming languages or tools "should" work.

So, in your opinion, 'switch' should work how it works in C? That is the most intuitive and natural way of implementing it?

    No, I think there is quite a bit wrong with the way C's "switch"
    statement works.

    I don't think there is a single "most intuitive" or "most natural" way
    to achieve a multi-way execution path selection statement in a language
    - because "intuitive" and "natural" are highly subjective. There are syntaxes, features and limitations that I think would be a reasonable
    fit in C, but those could well be very different in other languages.



You need to learn that other people have different ideas, needs,
    opinions or preferences.

    Most people haven't got a clue about devising PLs.


    I think you'd be surprised. Designing a general-purpose programming
    language is not a small or easy task, and making a compiler is certainly
    a big job. But you'd search far and wide to find an experienced
    programmer who doesn't have opinions or ideas about languages and how
    they might like to change them.

I'd question the whole idea of having a construct that can
    evaluate to something of different types in the first place, whether
    or not it returns a value, but that's your choice.

    If the result of a multi-way execution doesn't yield a value to be
    used, then the types don't matter.


    Of course they do.

    Of course they don't! Here, F, G and H return int, float and void* respectively:

        if (c1) F();
   else if (c2) G();
   else         H();

    C will not complain that those branches yield different types. But you
    say it should do? Why?


    Those branches don't yield different types in C. In C, branches don't
    "yield" anything. Any results from calling these functions are, in
    effect, cast to void.

    You're just being contradictory for the sake of it aren't you?!


    No, but I think you are having great difficulty understanding what I
    write. Maybe that's my fault as much as yours.


    This is just common sense; I don't know why you're questioning it.
    (I'd quite like to see a language of your design!)

    def foo(n) :
     if n == 1 : return 10
     if n == 2 : return 20
     if n == 3 : return

    That's Python, quite happily having a multiple choice selection that
    sometimes does not return a value.

    Python /always/ returns some value. If one isn't provided, it returns
    None. Which means checking that a function returns an explicit value
    goes out the window. Delete the 10 and 20 (or the entire body), and it
    still 'works'.


    "None" is the Python equivalent of "no value".

    Maybe you are thinking about returning an unspecified value of a type
    such as "int", rather than returning no value?


Yes, that is a dynamically typed language, not a statically typed
    language.

    std::optional<int> foo(int n) {
    if (n == 1) return 10;
    if (n == 2) return 20;
    if (n == 3) return {};
    }

    That's C++, a statically typed language, with a multiple choice
    selection that sometimes does not return a value - the return type
    supports values of type "int" and non-values.

    So what happens when n is 4? Does it return garbage (so that's bad).

    It is undefined behaviour, as you would expect. (In my hypothetical
    language that had better handling for "no value", falling off the end of
    the function would return "no value" - in C++, that's std::nullopt,
    which is what you get with "return {};" here.)

    Does it arrange to return some special value of 'optional' that means no value?

    No. C++ rules for function returns are similar to C's, but a little
    stricter - you are not allowed to fall off the end of a non-void
    function (excluding main(), constructors, destructors and coroutines).
    If you break the rules, there is no defined behaviour.

    The "return {};" returns the special "std::nullopt;" value (converted to
    the actual std::optional<T> type) that means "no value".

    Roughly speaking, a C++ std::optional<T> is like a C struct:

struct {
    bool valid;
    T value;
    }
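
For illustration, a rough C analogue of the foo() example above built on such
a struct (a hypothetical sketch, not std::optional itself):

#include <stdbool.h>
#include <stdio.h>

typedef struct {
    bool valid;
    int  value;
} opt_int;

static opt_int foo(int n) {
    if (n == 1) return (opt_int){ true, 10 };
    if (n == 2) return (opt_int){ true, 20 };
    return (opt_int){ false, 0 };   /* explicit "no value" default path */
}

int main(void) {
    for (int n = 1; n <= 4; ++n) {
        opt_int r = foo(n);
        if (r.valid) printf("foo(%d) = %d\n", n, r.value);
        else         printf("foo(%d) has no value\n", n);
    }
}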


    In that case, the type still does matter, but the language is
    providing that default path for you.


    X Y A B are arbitrary expressions. The need for 'else' is determined
    during type analysis. Whether it will ever execute the default path
    would be up to extra analysis, that I don't do, and would anyway be
    done later.


But if it is not possible for neither X nor Y to be true, then how
would you test the "else" clause? Surely you are not proposing that
    programmers be required to write lines of code that will never be
    executed and cannot be tested?

    Why not? They still have to write 'end', or do you propose that can be
    left out if control never reaches the end of the function?!

    I'm guessing that "end" here is part of the syntax of your function definitions in your language. That's not executable code, but part of
    the syntax.


    (In earlier versions of my dynamic language, the compiler would insert
    an 'else' branch if one was needed, returning 'void'.

    I decided that requiring an explicit 'else' branch was better and more failsafe.)


    You can't design a language like this where valid syntax depends on
    compiler and what it might or might not discover when analysing the
    code.


Why not? It is entirely reasonable to say that a compiler for a
    language has to be able to do certain types of analysis.

    This was the first part of your example:

const char * flag_to_text_A(bool b) {
    if (b == true) {
        return "It's true!";
    } else if (b == false) {
        return "It's false!";

    /I/ would question why you'd want to make the second branch conditional
    in the first place. Write an 'else' there, and the issue doesn't arise.


    Perhaps I want to put it there for symmetry.

    Because I can't see the point of deliberately writing code that usually takes two paths, when either:

 (1) you know that one will never be taken, or
 (2) you're not sure, but don't make any provision in case it is

Fix that first rather than relying on compiler writers to take care of your
    badly written code.

    I am not expecting anything from compiler writers here. I am asking
    /you/ why you want to force /programmers/ to write extra code that they
    know is useless.


    And also, you keep belittling my abilities and my language, when C allows:

   int F(void) {}

    How about getting your house in order first.


    If I were the designer of the C language and the maintainer of the C standards, you might have a point. C is not /my/ language.

    Anyone who is convinced that their own personal preferences are more
    "natural" or inherently superior to all other alternatives, and can't
    justify their claims other than saying that everything else is "a
    mess", is just navel-gazing.

    I wrote more here but the post is already too long.

    Ah, a point that we can agree on 100% :-)

Let's just say that
    'messy' is a fair assessment of C's conditional features, since you can write this:

    No, let's not just say that.

    We can agree that C /lets/ people write messy code. It does not
    /require/ it. And I have never found a programming language that stops
    people writing messy code.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Tue Nov 5 06:50:40 2024
    On 04/11/2024 16:35, David Brown wrote:
    On 03/11/2024 21:00, Bart wrote:

    To my mind, this is a type of "multi-way selection" :

    (const int []){ a, b, c }[n];

    I can't see any good reason to exclude it as fitting the descriptive
    phrase.


    And if "a", "b" and "c" are not constant, but require
evaluation of some sort, it does not change things. Of course if these required significant effort to evaluate,

    Or you had a hundred of them.

    or had side-effects, then you
    would most likely want a "multi-way selection" construction that did the selection first, then the evaluation - but that's a matter of programmer choice, and does not change the terms.


    You still don't get how different the concepts are. Everybody is
familiar with N-way selection when it involves actions, e.g. statements, because they will be in the form of a switch statement or an if-else chain.

They will expect one branch only to be evaluated. Otherwise, there's no
    point in a selection or condition, if all will be evaluated anyway!

    But I think they are less familiar with the concept when it mainly
    involves expressions, and the whole thing returns a value.

    The only such things in C are the ?: operator, and those compound
    literals. And even then, those do not allow arbitrary statements.

    Here is a summary of C vs my language.

    In C, 0 or 1 branches will be evaluated (except for ?: where it is
    always 1.)

    In M, 0 or 1 branches are evaluated, unless it yields a value or lvalue,
    then it must be 1 (and those need an 'else' branch):

                                          C   M

    if-else branches can be exprs/stmts   Y   Y
    if-else can yield a value             N   Y
    if-else can be an lvalue              N   Y

    ?: branches can be exprs/stmts        Y   Y   (M's is a form of if)
    ?: can yield a value                  Y   Y
    ?: can be an lvalue                   N   Y   (Only possible in C++)

    switch branches can have exprs/stmts  Y   Y
    switch can yield a value              N   Y
    switch can be an lvalue               N   Y

    select can have exprs/stmts           -   Y   (Does not exist in C)
    select can yield a value              -   Y
    select can be an lvalue               -   Y

    case-select has exprs/stmts           -   Y
    case-select can yield a value         -   Y
    case-select can be an lvalue          -   Y

15 Ys in the M column, vs 4 Ys in the C column, with only 1 for
value-returning. You can see why C users might be less familiar with the
concepts.

    I am very keen on keeping the concepts distinct in cases where it
    matters.

    I know, you like to mix things up. I like clear lines:

    func F:int ... Always returns a value
    proc P ... Never returns a value


    int x = if (randomfloat()<0.5) 42;


In C, no. But when we have spread to other languages, including hypothetical languages, there's nothing to stop that. Not only could it
    be supported by the run-time type system, but it would be possible to
    have compile-time types that are more flexible

    This is a program from my 1990s scripting language which was part of my
    CAD application:

    n := 999
    x := (n | 10, 20, 30)
    println x

    This uses N-way select (and evaluating only one option!). But there is
    no default path (it is added by the bytecode compiler).

    The output, since n is out of range, is this:

    <Void>

    In most arithmetic, using a void value is an error, so it's likely to
    soon go wrong. I now require a default branch, as that is safer.


    and only need to be
    "solidified" during code generation.ÿ That might allow the language to
    track things like "uninitialised" or "no value" during compilation
    without having them part of a real type (such as std::optional<> or a C

    But you are always returning an actual type in agreement with the
    language. That is my point. You're not choosing to just fall off that
    cliff and return garbage or just crash.

    However, your example with std::optional did just that, despite having
    that type available.

It doesn't return a value. That is why it is UB to try to use that non-existent value.

    And why it is so easy to avoid that UB.

    My language will not allow it. Most people would say that that's a
    good thing. You seem to want to take the perverse view that such code
    should be allowed to return garbage values or have undefined behaviour.

    Is your idea of "most people" based on a survey of more than one person?

    So, you're suggesting that "most people" would prefer a language that
    lets you do crazy, unsafe things for no good reason? That is, unless you prefer to fall off that cliff I keep talking about.

    The fact is, I spend a lot of time implementing this stuff, but I
    wouldn't even know how to express some of the odd things in C. My
    current C compiler uses a stack-based IL. Given this:

    #include <stdio.h>

    int F(void){}

    int main(void) {
    int a;
    a = F();
    printf("%d\n", a);
    }

    It just about works when generating native code (I'm not quite sure
    how); but it just returns whatever garbage is in the register:

    c:\cxp>cc -run t # here runs t.c as native code in memory
    1900545

    But the new compiler can also directly interpret that stack IL:

    c:\cxp>cc -runp t
    PC Exec error: RETF/SP mismatch: old=3 curr=2 seqno: 7

    The problem is that the call-function handling expects a return value to
    have been pushed. But nothing has been pushed in this case. And the
    language doesn't allow me to detect that.

(My compiler could detect some cases, but not all, and even if it could,
    it would report false positives of a missing return, for functions that
    did always return early.)

    So this is a discontinuity in the language, a schism, an exception that shouldn't be there. It's unnatural. It looked off to me, and it is off
    in practice, so it's not just an opinion.

    To fix this would require my always pushing some dummy value at the
    closing } of the function, if the operand stack is empty at that point.

    Which is sort of what you are claiming you don't want to impose on the programmer. But it looks like it's needed anyway, otherwise the function
    is out of kilter.


    Note that I have not suggested returning garbage values - I have
    suggested that a language might support handling "no value" in a
    convenient and safe manner.

But in C it is garbage. And I've shown an example of my language handling
    'no value' in a scheme from the 1990s; I decided to require an explicit
    'else' branch, which you seem to think is some kind of imposition.

    Well, it won't kill you, and it can make programs more failsafe. It is
    also friendly to compilers that aren't 100MB monsters.

    Totally independent of and orthogonal to that, I strongly believe that
    there is no point in trying to define behaviour for something that
    cannot happen,

    But it could for n==4.

    With justification. 0010 means 8 in C? Jesus.


    I think the word "neighbour" is counter-intuitive to spell.

    EVERYBODY agrees that leading zero octals in C were a terrible idea. You
can't say it's just me who thinks that!

    Once a thread here has wandered this far off-topic, it is perhaps not unreasonable to draw comparisons with your one-man language.

Suppose I'd made my own hammer. The things I'd use it for are not going
to be that different: hammering in nails, pulling them out, or generally
    bashing things about.

    As I said, the things my language does are universal. The way it does
    them are better thought out and tidier.

    The real problem with your language is that you think it is perfect

Compared with C, it's a huge improvement. Compared with most other modern languages, 95% of what people expect now is missing.

    int F() {
        F(1, 2.3, "four", F,F,F,F(),F(F()));
        F(42);

It is undefined behaviour in C. Programmers are expected to write
    sensible code.

    But it would be nice if the language stopped people writing such things,
    yes?

    Can you tell me which other current languages, other than C++ and
    assembly, allow such nonsense?

    None? So it's not just me and my language then! Mine is lower level and
    still plenty unsafe, but it has somewhat higher standards.

If I were the designer of the C language and the maintainer of the C standards, you might have a point. C is not /my/ language.

    You do like to defend it though.


We can agree that C /lets/ people write messy code. It does not
/require/ it. And I have never found a programming language that stops people writing messy code.

    I had included half a dozen points that made C's 'if' error prone and confusing, that would not occur in my syntax because it is better designed.

You seem to be incapable of drawing a line between what a language can
    enforce, and what a programmer is free to express.

    Or rather, because a programmer has so much freedom anyway, let's not
    bother with any lines at all! Just have a language that simply doesn't care.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Tue Nov 5 08:48:12 2024
    On 04/11/2024 20:50, Bart wrote:
    On 04/11/2024 16:35, David Brown wrote:
    On 03/11/2024 21:00, Bart wrote:

    To my mind, this is a type of "multi-way selection" :

         (const int []){ a, b, c }[n];

    I can't see any good reason to exclude it as fitting the descriptive
    phrase.


    And if "a", "b" and "c" are not constant, but require evaluation of
    some sort, it does not change things.  Of course if these required
    significant effort to evaluate,

    Or you had a hundred of them.

    or had side-effects, then you would most likely want a "multi-way
    selection" construction that did the selection first, then the
    evaluation - but that's a matter of programmer choice, and does not
    change the terms.


    You still don't get how different the concepts are.

    Yes, I do. I also understand how they are sometimes exactly the same
    thing, depending on the language, and how they can often have the same
    end result, depending on the details, and how they can often be
    different, especially in the face of side-effects or efficiency concerns.

    Look, it's really /very/ simple.

    A) You can have a construct that says "choose one of these N things to
    execute and evaluate, and return that value (if any)".

    B) You can have a construct that says "here are N things, select one of
    them to return as a value".

    Both of these can reasonably be called "multi-way selection" constructs.
    Some languages can have one as a common construct, other languages may
    have the other, and many support both in some way. Pretty much any
    language that allows the programmer to have control over execution order
    will let you do both in some way, even if there is not a clear language construct for it and you have to write it manually in code.

    Mostly type A will be most efficient if there is a lot of effort
    involved in putting together the things to select. Type B is likely to
    be most efficient if you already have the collection of things to choose
    from (it can be as simple as an array lookup), if the creation of the collection can be done in parallel (such as in some SIMD uses), or if
    the cpu can generate them all before it has established the selection index.

    Sometimes type A will be the simplest and clearest in the code,
    sometimes type B will be the simplest and clearest in the code.

    Both of these constructs are "multi-way selections".
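
    As a concrete sketch of the two constructs in C (an illustration added
    here, not from the original post; the names f1/f2/f3 are invented):

        #include <stdio.h>

        static int f1(void) { puts("f1 evaluated"); return 10; }
        static int f2(void) { puts("f2 evaluated"); return 20; }
        static int f3(void) { puts("f3 evaluated"); return 30; }

        int main(void) {
            int n = 2;

            // Type A: select first, then evaluate only the chosen branch
            // (prints "f2 evaluated" only).
            int a = (n == 1) ? f1() : (n == 2) ? f2() : f3();

            // Type B: evaluate all the candidates, then pick one by index
            // (C99 compound literal; prints all three messages, in an
            // unspecified order).
            int b = (int[]){ f1(), f2(), f3() }[n - 1];

            printf("a=%d b=%d\n", a, b);    // a=20 b=20
            return 0;
        }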


    Your mistake is in thinking that type A is all there is and all that
    matters, possibly because you feel you have a better implementation for
    it than C has. (I think that you /do/ have a nicer switch than C, but
    that does not justify limiting your thinking to it.)



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Tue Nov 5 09:25:32 2024
    On 04/11/2024 20:50, Bart wrote:
    On 04/11/2024 16:35, David Brown wrote:
    On 03/11/2024 21:00, Bart wrote:


    Here is a summary of C vs my language.


    <snip the irrelevant stuff>


    I am very keen on keeping the concepts distinct in cases where it
    matters.

    I know, you like to mix things up. I like clear lines:

      func F:int ...              Always returns a value
      proc P  ...                 Never returns a value



    Oh, you /know/ that, do you? And how do you "know" that? Is that
    because you still think I am personally responsible for the C language,
    and that I think C is the be-all and end-all of perfect languages?

    I agree that it can make sense to divide different types of "function".
    I disagree that whether or not a value is returned has any significant relevance. I see no difference, other than minor syntactic issues,
    between "int foo(...)" and "void foo(int * result, ...)".

    A much more useful distinction would be between Malcolm-functions and Malcolm-procedures. "Malcolm-functions" are "__attribute__((const))" in
    gcc terms or "[[unsequenced]]" in C23 terms (don't blame me for the
    names here). In other words, they have no side-effects and their
    result(s) are based entirely on their inputs. "Malcolm-procedures" can
    have side-effects and interact with external data. I would possibly add
    to that "meta-functions" that deal with compile-time information -
    reflection, types, functions, etc.
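
    For illustration, a minimal sketch of that distinction using the
    attributes mentioned above (assumes gcc; the function names are made up):

        #include <stdio.h>

        // "Malcolm-function": result depends only on its inputs, no side
        // effects (C23 would spell the attribute [[unsequenced]]).
        __attribute__((const)) int square(int x) { return x * x; }

        // "Malcolm-procedure": interacts with external state.
        void log_value(int x) { printf("x = %d\n", x); }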


    and only need to be "solidified" during code generation.ÿ That might
    allow the language to track things like "uninitialised" or "no value"
    during compilation without having them part of a real type (such as
    std::optional<> or a C

    But you are always returning an actual type in agreement with the
    language. That is my point. You're not choosing to just fall off that
    cliff and return garbage or just crash.

    However, your example with std::optional did just that, despite having
    that type available.

    It doesn't return a value.  That is why it is UB to try to use that
    non-existent value.

    And why it is so easy to avoid that UB.

    I agree. I think C gets this wrong. That's why I, and pretty much all
    other C programmers, use a subset of C that disallows falling off the
    end of a function with a non-void return type. Thus we avoid that UB.

    (The only reason it is acceptable syntax in C, AFAIK, is because early versions of C had "default int" everywhere - there were no "void"
    functions.)
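
    (A sketch of how that subset can be enforced mechanically, assuming gcc
    or clang: promote the warning shown elsewhere in this thread to an error
    with "-Werror=return-type", and a definition like the one below is then
    rejected at compile time.)

        int sign(int x) {
            if (x > 0) return 1;
            if (x < 0) return -1;
        }   // error: control reaches end of non-void function [-Wreturn-type]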


    Note that I have not suggested returning garbage values - I have
    suggested that a language might support handling "no value" in a
    convenient and safe manner.

    But in C it is garbage.

    Note that /I/ have not suggested returning garbage values.

    I have not said that I think C is defined in a good way here. You are,
    as so often, mixing up what people say they like with what C does (or
    what you /think/ C does, as you are often wrong). And as usual you mix
    up people telling you what C does with what people think is a good idea
    in a language.


    Totally independent of and orthogonal to that, I strongly believe that
    there is no point in trying to define behaviour for something that
    cannot happen,

    But it could for n==4.

    Again, you /completely/ miss the point.

    If you have a function (or construct) that returns a correct value for
    inputs 1, 2 and 3, and you never pass it the value 4 (or anything else),
    then there is no undefined behaviour no matter what the code looks like
    for values other than 1, 2 and 3. If someone calls that function with
    input 4, then /their/ code has the error - not the code that doesn't
    handle an input 4.


    EVERYBODY agrees that leading zero octals in C were a terrible idea. You can't say it's just me who thinks that!

    I agree that this a terrible idea. <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60523>

    But picking one terrible idea in C does not mean /everything/ in C is a terrible idea! /That/ is what you got wrong, as you do so often.



        int F() {
            F(1, 2.3, "four", F,F,F,F(),F(F()));
            F(42);

    It is undefined behaviour in C.  Programmers are expected to write
    sensible code.

    But it would be nice if the language stopped people writing such things, yes?

    Sure. That's why sane programmers use decent tools - the language might
    not stop them writing this, but the tools do.


    Can you tell me which other current languages, other than C++ and
    assembly, allow such nonsense?

    Python.

    Of course, it is equally meaningless in Python as it is in C.



    None? So it's not just me and my language then! Mine is lower level and still plenty unsafe, but it has somewhat higher standards.

    If I were the designer of the C language and the maintainer of the C
    standards, you might have a point.  C is not /my/ language.

    You do like to defend it though.

    I defend it if that is appropriate. Mostly, I /explain/ it to you. It
    is bizarre that people need to do that for someone who claims to have
    written a C compiler, but there it is.



    We can agree that C /lets/ people write messy code.  It does not
    /require/ it.  And I have never found a programming language that
    stops people writing messy code.

    I had included half a dozen points that made C's 'if' error prone and confusing, that would not occur in my syntax because it is better designed.


    I'm glad you didn't - it would be a waste of effort.

    You seem to be incapable of drawing a line between what a language can enforce, and what a programmer is free to express.


    I can't see how you could reach that conclusion.

    Or rather, because a programmer has so much freedom anyway, let's not
    bother with any lines at all! Just have a language that simply doesn't
    care.


    You /do/ understand that I use top-quality tools with carefully chosen warnings, set to throw fatal errors, precisely because I want a language
    that has a lot more "lines" and restrictions than your little tools?
    /Every/ C programmer uses a restricted subset of C - some more
    restricted than others. I choose to use a very strict subset of C for
    my work, because it is the best language for the tasks I need to do. (I
    also use a very strict subset of C++ when it is a better choice.)



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Tue Nov 5 10:44:34 2024
    On 04/11/2024 22:25, David Brown wrote:
    On 04/11/2024 20:50, Bart wrote:

    But it could for n==4.

    Again, you /completely/ miss the point.

    If you have a function (or construct) that returns a correct value for inputs 1, 2 and 3, and you never pass it the value 4 (or anything else), then there is no undefined behaviour no matter what the code looks like
    for values other than 1, 2 and 3.  If someone calls that function with
    input 4, then /their/ code has the error - not the code that doesn't
    handle an input 4.

    This is the wrong kind of thinking.

    If this was a library function then, sure, you can stipulate a set of
    input values, but that's at a different level, where you are writing
    code on top of a working, well-specified language.

    You don't make use of holes in the language, one that can cause a crash.
    That is, by allowing a function to run into an internal RET op with no provision for a result. That's if there even is a RET; perhaps your
    compilers are so confident that that path is not taken, or you hint it
    won't be, that they won't bother!

    It will start executing whatever random bytes follow the function.

    As I said in my last post, a missing return value caused an internal
    error in one of my C implementations because a pushed return value was missing.

    How should that be fixed, via a hack in the implementation which pushes
    some random value to avoid an immediate crash? And then what?

    Let the user - the author of the function - explicitly provide that
    value; then at least that can be documented: if N isn't in 1..3, then F
    returns so and so.

    You know that makes perfect sense, but because you've got used to that dangerous feature in C you think it's acceptable.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Tue Nov 5 13:11:46 2024
    On 04/11/2024 21:48, David Brown wrote:
    On 04/11/2024 20:50, Bart wrote:
    On 04/11/2024 16:35, David Brown wrote:
    On 03/11/2024 21:00, Bart wrote:

    To my mind, this is a type of "multi-way selection" :

    ÿÿÿÿÿ(const int []){ a, b, c }[n];

    I can't see any good reason to exclude it as fitting the descriptive
    phrase.


    And if "a", "b" and "c" are not constant, but require evaluation of
    some sort, it does not change things.ÿ Of course if these required
    significant effort to evaluate,

    Or you had a hundred of them.

    or had side-effects, then you would most likely want a "multi-way
    selection" construction that did the selection first, then the
    evaluation - but that's a matter of programmer choice, and does not
    change the terms.


    You still don't get how different the concepts are.

    Yes, I do.  I also understand how they are sometimes exactly the same
    thing, depending on the language, and how they can often have the same
    end result, depending on the details, and how they can often be
    different, especially in the face of side-effects or efficiency concerns.

    Look, it's really /very/ simple.

    A) You can have a construct that says "choose one of these N things to execute and evaluate, and return that value (if any)".

    B) You can have a construct that says "here are N things, select one of
    them to return as a value".

    Both of these can reasonably be called "multi-way selection" constructs.
    Some languages can have one as a common construct, other languages may have the other, and many support both in some way.  Pretty much any
    language that allows the programmer to have control over execution order will let you do both in some way, even if there is not a clear language construct for it and you have to write it manually in code.

    Mostly type A will be most efficient if there is a lot of effort
    involved in putting together the things to select.  Type B is likely to
    be most efficient if you already have the collection of things to choose from (it can be as simple as an array lookup), if the creation of the collection can be done in parallel (such as in some SIMD uses), or if
    the cpu can generate them all before it has established the selection
    index.

    Sometimes type A will be the simplest and clearest in the code,
    sometimes type B will be the simplest and clearest in the code.

    Both of these constructs are "multi-way selections".


    Your mistake is in thinking that type A is all there is and all that matters, possibly because you feel you have a better implementation for
    it than C has.  (I think that you /do/ have a nicer switch than C, but
    that does not justify limiting your thinking to it.)


    You STILL don't get it. Suppose this wasn't about returning a value, but executing one piece of code from a conditional set of statements.

    In C that might be using an if/else chain, or switch. Other languages
    might use a match statement.

    Universally only one of those pieces of code will be evaluated. Unless
    you can point me to a language where, in IF C THEN A ELSE B, *both* A
    and B statements are executed.

    Do you agree so far? If so call that Class I.

    Do you also agree that languages have data structures, and those often
    have constructors that will build a data structure element by element?
    So all elements necessarily have to be evaluated. (Put aside selecting
    one for now; that is a separate matter).

    Call that Class II.

    What my languages do, is that ALL the constructs in Class I that are
    commonly used to execute one of N branches, can also return values.
    (Which can require each branch to yield a type compatible with all the
    others; another separate matter.)

    Do you now see why it is senseless for my 'multi-way' selections to work
    any other way? It would mean that:

    x := if C then A else B fi

    really could both evaluate A and B whatever the value of C! Whatever
    that IF construct does here, has to do the same even without that 'x :='
    at the start.

    Of course, I support the sorts of indexing, of an existing or
    just-created data structure, that belong in Class II.

    Although it would not be particularly efficient to do this:

    (f1(), f2(), .... f100())[100] # (1-based)

    Since you will execute 100 functions rather than just one. But perhaps
    there is a good reason for it. If that is needed, then the construct exists.

    Another difference between Class I (when used to yield values) and Class
    II is that an out-of-bounds selector in Class II either yields a runtime
    error (or raises an exception), or may just go wrong in my lower-level language.

    But in Class I, the selector is either range-checked or falls off the
    end of a test sequence, and a default value is provided.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Tue Nov 5 19:26:24 2024
    On 05/11/2024 00:44, Bart wrote:
    On 04/11/2024 22:25, David Brown wrote:
    On 04/11/2024 20:50, Bart wrote:

    But it could for n==4.

    Again, you /completely/ miss the point.

    If you have a function (or construct) that returns a correct value for
    inputs 1, 2 and 3, and you never pass it the value 4 (or anything
    else), then there is no undefined behaviour no matter what the code
    looks like for values other than 1, 2 and 3.  If someone calls that
    function with input 4, then /their/ code has the error - not the code
    that doesn't handle an input 4.

    This is the wrong kind of thinking.

    If this was a library function then, sure, you can stipulate a set of
    input values, but that's at a different level, where you are writing
    code on top of a working, well-specified language.

    You don't make use of holes in the language, one that can cause a crash. That is, by allowing a function to run into an internal RET op with no provision for a result. That's if there even is a RET; perhaps your compilers are so confident that that path is not taken, or you hint it
    won't be, that they won't bother!

    It will start executing whatever random bytes follow the function.

    As I said in my last post, a missing return value caused an internal
    error in one of my C implementations because a pushed return value was missing.

    How should that be fixed, via a hack in the implementation which pushes
    some random value to avoid an immediate crash? And then what?

    Let the user - the author of the function - explicitly provide that
    value then at least that can be documented: if N isn't in 1..3, then F returns so and so.

    You know that makes perfect sense, but because you've got used to that dangerous feature in C you think it's acceptable.



    I am a serious programmer. I write code for use by serious programmers.
    I don't write code that is bigger and slower for the benefit of some half-wit coder that won't read the relevant documentation or rub a
    couple of brain cells together. I have no time for hand-holding and spoon-feeding potential users of my functions - if someone wants to use play-dough plastic knives, they should not have become a programmer.

    My programming stems from mathematics, not from C, and from an education
    in developing provably correct code. I don't try to calculate the log
    of 0, and I don't expect the mathematical log function to give me some "default" value if I try. The same applies to my code.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Tue Nov 5 23:42:34 2024
    Bart <bc@freeuk.com> wrote:

    Then we disagree on what 'multi-way' select might mean. I think it means branching, even if notionally, on one-of-N possible code paths.

    OK.

    The whole construct may or may not return a value. If it does, then one
    of the N paths must be a default path.


    You need to cover all input values. This is possible when there
    is a reasonably small number of possibilities. For example, a switch on a
    char variable which covers all possible values does not need a default
    path. A default is needed only when the number of possibilities is too
    large to explicitly give all of them. And some languages allow
    ranges, so that you may be able to cover all values with a small
    number of ranges.
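
    A sketch of that last point (the case-range syntax used here is a
    gcc/clang extension, not standard C):

        int classify(unsigned char c) {
            switch (c) {
            case 0 ... 127:   return 1;   /* lower half */
            case 128 ... 255: return 2;   /* upper half */
            }
            /* All 256 possible values are covered above, so conceptually no
               default is needed; whether a given compiler can prove that and
               stay quiet about the fall-through is another matter. */
            return 0;
        }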

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From fir@3:633/280.2 to Waldek Hebisch on Wed Nov 6 00:23:04 2024
    To: Waldek Hebisch <antispam@fricas.org>

    Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    Then we disagree on what 'multi-way' select might mean. I think it means
    branching, even if notionally, on one-of-N possible code paths.

    OK.

    The whole construct may or may not return a value. If it does, then one
    of the N paths must be a default path.


    You need to cover all input values. This is possible when there
    is reasonably small number of possibilities. For example, switch on
    char variable which covers all possible values does not need default
    path. Default is needed only when number of possibilities is too
    large to explicitely give all of them. And some languages allow
    ranges, so that you may be able to cover all values with small
    number of ranges.


    in fact when you consider it in your mind, or see it at the assembly level, the
    implementation of switch does not necessarily need a "default"
    path (which should be named "other" btw)

    it has two natural ways
    1) ignore them
    2) signal a runtime error

    (both are kinda natural)


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: i2pn2 (i2pn.org) (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Wed Nov 6 00:29:21 2024
    On 05/11/2024 13:42, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    Then we disagree on what 'multi-way' select might mean. I think it means
    branching, even if notionally, on one-of-N possible code paths.

    OK.

    I appreciate this is what Bart means by that phrase, but I don't agree
    with it. I'm not sure if that is covered by "OK" or not!


    The whole construct may or may not return a value. If it does, then one
    of the N paths must be a default path.


    You need to cover all input values. This is possible when there
    is reasonably small number of possibilities. For example, switch on
    char variable which covers all possible values does not need default
    path. Default is needed only when number of possibilities is too
    large to explicitely give all of them. And some languages allow
    ranges, so that you may be able to cover all values with small
    number of ranges.


    I think this is all very dependent on what you mean by "all input values".

    Supposing I declare this function:

    // Return the integer square root of numbers between 0 and 10
    int small_int_sqrt(int x);


    To me, the range of "all input values" is integers from 0 to 10. I
    could implement it as :

    int small_int_sqrt(int x) {
        if (x == 0) return 0;
        if (x < 4) return 1;
        if (x < 9) return 2;
        if (x < 16) return 3;
        unreachable();
    }

    If the user asks for small_int_sqrt(-10) or small_int_sqrt(20), that's
    /their/ fault and /their/ problem. I said nothing about what would
    happen in those cases.

    But some people seem to feel that "all input values" means every
    possible value of the input types, and thus that a function like this
    should return a value even when there is no correct value in and no
    correct value out.

    This is, IMHO, just nonsense and misunderstands the contract between
    function writers and function users.

    Further, I am confident that these people are quite happy to write code
    like :

    // Take a pointer to an array of two ints, add them, and return the sum
    int sum_two_ints(const int * p) {
        return p[0] + p[1];
    }

    Perhaps, in a mistaken belief that it makes the code "safe", they will add :

    if (!p) return 0;

    at the start of the function. But they will not check that "p" actually points to an array of two ints (how could they?), nor will they check
    for integer overflow (and what would they do if it happened?).



    A function should accept all input values - once you have made clear
    what the acceptable input values can be. A "default" case is just a
    short-cut for conveniently handling a wide range of valid input values -
    it is never a tool for handling /invalid/ input values.
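
    A small sketch of that reading of "default" (an added example, not from
    the post): every possible input byte is a valid input, and the default
    merely groups the large remaining set of valid values.

        const char *describe_byte(unsigned char c) {
            switch (c) {
            case 0:    return "NUL";
            case '\n': return "newline";
            default:   return "ordinary byte";  /* the other 254 valid values */
            }
        }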





    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Wed Nov 6 00:50:34 2024
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

    On 02.11.2024 19:09, Tim Rentsch wrote:

    [...] As long as
    the code is logically correct you are free to choose either
    style, and it's perfectly okay to choose the one that you find
    more appealing.

    This is certainly true for one-man-shows.

    The question asked concerned code in an individual programming
    effort. I was addressing the question that was asked.

    Hardly suited for most professional contexts I worked in.

    Note that the pronoun "you" is plural as well as singular. The
    conclusion applies to groups just as it does to individuals.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Wed Nov 6 01:11:18 2024
    Bart <bc@freeuk.com> writes:

    On 04/11/2024 04:00, Tim Rentsch wrote:

    fir <fir@grunge.pl> writes:

    Tim Rentsch wrote:

    With the understanding that I am offering [nothing] more than my
    own opinion, I can say that I might use any of the patterns
    mentioned, depending on circumstances. I don't think any one
    approach is either always right or always wrong.

    maybe, but some may heve some strong arguments (for use this and
    not that) i may overlook

    I acknowledge the point, but you haven't gotten any arguments,
    only opinions.

    Pretty much everything about PL design is somebody's opinion.

    First, the discussion is not about language design but language
    usage.

    Second, the idea that "pretty much everything" about language usage
    is just opinion is simply wrong (that holds for language design
    also). Most of what is offered in newsgroups is just opinion, but
    there are plenty of objective statements that could be made also.
    Posters in the newsgroup here rarely make such statements, mostly I
    think because they don't want to be bothered to make the effort to
    research the issues. But that doesn't mean there isn't much to say
    about such things; there is plenty to say, but for some strange
    reason the people posting in comp.lang.c think their opinions offer
    more value than statements of objective fact.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Wed Nov 6 02:03:54 2024
    On 05/11/2024 12:42, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    Then we disagree on what 'multi-way' select might mean. I think it means
    branching, even if notionally, on one-of-N possible code paths.

    OK.

    The whole construct may or may not return a value. If it does, then one
    of the N paths must be a default path.


    You need to cover all input values. This is possible when there
    is reasonably small number of possibilities. For example, switch on
    char variable which covers all possible values does not need default
    path. Default is needed only when number of possibilities is too
    large to explicitely give all of them. And some languages allow
    ranges, so that you may be able to cover all values with small
    number of ranges.


    What's easier to implement in a language: to have a conditional need for
    an 'else' branch, which is dependent on the compiler performing some arbitrarily complex levels of analysis on some arbitrarily complex set
    of expressions...

    ...or to just always require 'else', with a dummy value if necessary?

    Even if you went with the first, what happens if the compiler can't
    guarantee that all values of a selector are covered; should it report
    that, or say nothing?

    What happens if you do need 'else', but later change things so all bases
    are covered; will the compiler report it as being unnecessary, so that
    you remove it?


    Now, C doesn't have such a feature to test out (ie. that is a construct
    with an optional 'else' branch, the whole of which returns a value). The nearest is function return values:

    int F(int n) {
        if (n==1) return 10;
        if (n==2) return 20;
    }

    Here, neither tcc nor gcc reports that you might run into the end of the function. It will return garbage if called with anything other than 1 or 2.

    gcc will say something with enough warning levels (reaches end of
    non-void function). But it will say the same here:

    int F(unsigned char c) {
        if (c<128) return 10;
        if (c>=128) return 20;
    }
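
    (One way - a suggestion, not Bart's code - to make the full coverage
    obvious to any compiler without requiring range analysis is to make the
    last case unconditional, so that every path visibly returns:)

        int F(unsigned char c) {
            if (c < 128) return 10;
            return 20;              /* everything else, i.e. c >= 128 */
        }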




    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Wed Nov 6 03:02:04 2024
    On 05/11/2024 16:03, Bart wrote:
    On 05/11/2024 12:42, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    Then we disagree on what 'multi-way' select might mean. I think it means
    branching, even if notionally, on one-of-N possible code paths.

    OK.
    The whole construct may or may not return a value. If it does, then one
    of the N paths must be a default path.


    You need to cover all input values.  This is possible when there
    is reasonably small number of possibilities.  For example, switch on
    char variable which covers all possible values does not need default
    path.  Default is needed only when number of possibilities is too
    large to explicitly give all of them.  And some languages allow
    ranges, so that you may be able to cover all values with small
    number of ranges.


    What's easier to implement in a language: to have a conditional need for
    an 'else' branch, which is dependent on the compiler performing some arbitrarily complex levels of analysis on some arbitrarily complex set
    of expressions...

    ...or to just always require 'else', with a dummy value if necessary?

    If this was a discussion on learning about compiler design for newbies,
    that might be a relevant point. Otherwise, what is easier to implement
    in a language tool is completely irrelevant to what is good in a language.

    A language should try to support things that are good for the
    /programmer/, not the compiler. But it does have to be limited by what is practically possible for a compiler. A fair bit of the weaknesses of C
    as a language can be attributed to the limitations of compilers from its
    early days, and thereafter existing practice was hard to change.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Wed Nov 6 06:39:21 2024
    David Brown <david.brown@hesbynett.no> wrote:
    On 05/11/2024 13:42, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    Then we disagree on what 'multi-way' select might mean. I think it means
    branching, even if notionally, on one-of-N possible code paths.

    OK.

    I appreciate this is what Bart means by that phrase, but I don't agree
    with it. I'm not sure if that is covered by "OK" or not!

    You may prefer your own definition, but Bart's is a reasonable one.


    The whole construct may or may not return a value. If it does, then one
    of the N paths must be a default path.


    You need to cover all input values. This is possible when there
    is reasonably small number of possibilities. For example, switch on
    char variable which covers all possible values does not need default
    path. Default is needed only when number of possibilities is too
    large to explicitely give all of them. And some languages allow
    ranges, so that you may be able to cover all values with small
    number of ranges.


    I think this is all very dependent on what you mean by "all input values".

    Supposing I declare this function:

    // Return the integer square root of numbers between 0 and 10
    int small_int_sqrt(int x);


    To me, the range of "all input values" is integers from 0 to 10. I
    could implement it as :

    int small_int_sqrt(int x) {
    if (x == 0) return 0;
    if (x < 4) return 1;
    if (x < 9) return 2;
    if (x < 16) return 3;
    unreachable();
    }

    If the user asks for small_int_sqrt(-10) or small_int_sqrt(20), that's /their/ fault and /their/ problem. I said nothing about what would
    happen in those cases.

    But some people seem to feel that "all input values" means every
    possible value of the input types, and thus that a function like this
    should return a value even when there is no correct value in and no
    correct value out.

    Well, some languages treat types more seriously than C. In Pascal
    the type of your input would be 0..10 and all input values would be
    handled. Sure, when the domain is too complicated to express in a type
    then it could be a documented restriction. Still, it makes sense to
    signal an error if a value goes outside the handled range, so in a sense all
    values of the input type are handled: either you get a valid answer or a
    clear error.

    This is, IMHO, just nonsense and misunderstands the contract between function writers and function users.

    Further, I am confident that these people are quite happen to write code like :

    // Take a pointer to an array of two ints, add them, and return the sum
    int sum_two_ints(const int * p) {
    return p[0] + p[1];
    }

    I do not think that people wanting strong type checking are happy
    with C. Simply, either they use a different language or use C
    without bitching, but are aware of its limitations. I certainly would
    be quite unhappy with the code above. It is possible that I would still
    use it as a compromise (say, if it was desirable to have a single
    prototype but handle points in spaces of various dimensions),
    but my first attempt would be something like:

    typedef struct {int p[2];} two_int;
    .....
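
    One possible completion of that sketch (a guess at the intent, for
    illustration only):

        typedef struct { int p[2]; } two_int;

        /* The parameter type now says "exactly two ints"; the caller must
           construct such an object rather than pass an arbitrary pointer. */
        int sum_two_ints(two_int v) {
            return v.p[0] + v.p[1];
        }

        /* usage:  int s = sum_two_ints((two_int){ .p = { 3, 4 } });   // s == 7 */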

    Perhaps, in a mistaken belief that it makes the code "safe", they will add :

    if (!p) return 0;

    at the start of the function. But they will not check that "p" actually points to an array of two ints (how could they?), nor will they check
    for integer overflow (and what would they do if it happened?).

    I am certainly unhappy with overflow handling in current hardware
    and by extension with overflow handling in C.

    A function should accept all input values - once you have made clear
    what the acceptable input values can be. A "default" case is just a short-cut for conveniently handling a wide range of valid input values -
    it is never a tool for handling /invalid/ input values.

    Well, default can signal an error, which frequently is the right handling
    of invalid input values.

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Wed Nov 6 06:53:12 2024
    Bart <bc@freeuk.com> wrote:
    On 05/11/2024 12:42, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    Then we disagree on what 'multi-way' select might mean. I think it means
    branching, even if notionally, on one-of-N possible code paths.

    OK.

    The whole construct may or may not return a value. If it does, then one
    of the N paths must be a default path.


    You need to cover all input values. This is possible when there
    is reasonably small number of possibilities. For example, switch on
    char variable which covers all possible values does not need default
    path. Default is needed only when number of possibilities is too
    large to explicitely give all of them. And some languages allow
    ranges, so that you may be able to cover all values with small
    number of ranges.


    What's easier to implement in a language: to have a conditional need for
    an 'else' branch, which is dependent on the compiler performing some arbitrarily complex levels of analysis on some arbitrarily complex set
    of expressions...

    ...or to just always require 'else', with a dummy value if necessary?

    Well, frequently it is easier to do a bad job than a good one. However, normally you do not need very complex analysis: if simple analysis
    is not enough, then the first thing to do is to simplify the program.
    And in cases where the problem to solve is really hard and the program can
    not be simplified ("irreducible complexity"), then it is time for
    kludges, for example in the form of a default case. But it should not
    be the norm.

    Even if you went with the first, what happens if the compiler can't guarantee that all values of a selector are covered; should it report
    that, or say nothing?

    Compile time error.

    What happens if you do need 'else', but later change things so all bases
    are covered; will the compiler report it as being unnecesary, so that
    you remove it?

    When practical, yes.

    Now, C doesn't have such a feature to test out (ie. that is a construct
    with an optional 'else' branch, the whole of which returns a value). The nearest is function return values:

    int F(int n) {
    if (n==1) return 10;
    if (n==2) return 20;
    }

    Here, neither tcc nor gcc reports that you might run into the end of the function. It will return garbage if called with anything other than 1 or 2.

    Hmm, using gcc-12 with your code in "foo.c":

    gcc -Wall -O3 -c foo.c
    foo.c: In function ‘F’:
    foo.c:4:1: warning: control reaches end of non-void function [-Wreturn-type]
    4 | }
    | ^


    gcc will say something with enough warning levels (reaches end of
    non-void function). But it will say the same here:

    int F(unsigned char c) {
    if (c<128) return 10;
    if (c>=128) return 20;
    }

    Indeed, it says the same. Somebody should report this as a bug.
    IIUC gcc has all machinery needed to detect that all cases are
    covered.

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Wed Nov 6 07:33:55 2024
    On 05/11/2024 20:39, Waldek Hebisch wrote:
    David Brown <david.brown@hesbynett.no> wrote:
    On 05/11/2024 13:42, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    Then we disagree on what 'multi-way' select might mean. I think it means
    branching, even if notionally, on one-of-N possible code paths.

    OK.

    I appreciate this is what Bart means by that phrase, but I don't agree
    with it. I'm not sure if that is covered by "OK" or not!

    You may prefer your own definition, but Bart's is a reasonable one.

    The only argument I can make here is that I have not seen "multi-way
    select" as a defined phrase with a particular established meaning. So
    it simply means what the constituent words mean - selecting something
    from multiple choices. There are no words in that phrase that talk
    about "branching", or imply a specific order to events. It is a very
    general and vague phrase, and I cannot see a reason to assume it has
    such a specific meaning as Bart wants to assign to it. And as I have
    pointed out in other posts, there are constructs in many languages
    (including C) that fit the idea of a selection from one of many things,
    but which do not fit Bart's specific interpretation of the phrase.

    Bart's interpretation is "reasonable" in the sense of being definable
    and consistent, or at least close enough to that to be useable in a discussion. But neither he, I, or anyone else gets to simply pick a
    meaning for such a phrase and claim it is /the/ definition. Write a
    popular and influential book with this as a key phrase, and /then/ you
    can start calling your personal definition "the correct" definition.



    The whole construct may or may not return a value. If it does, then one
    of the N paths must be a default path.


    You need to cover all input values. This is possible when there
    is reasonably small number of possibilities. For example, switch on
    char variable which covers all possible values does not need default
    path. Default is needed only when number of possibilities is too
    large to explicitely give all of them. And some languages allow
    ranges, so that you may be able to cover all values with small
    number of ranges.


    I think this is all very dependent on what you mean by "all input values".
    Supposing I declare this function:

    // Return the integer square root of numbers between 0 and 10
    int small_int_sqrt(int x);


    To me, the range of "all input values" is integers from 0 to 10. I
    could implement it as :

    int small_int_sqrt(int x) {
        if (x == 0) return 0;
        if (x < 4) return 1;
        if (x < 9) return 2;
        if (x < 16) return 3;
        unreachable();
    }

    If the user asks for small_int_sqrt(-10) or small_int_sqrt(20), that's
    /their/ fault and /their/ problem. I said nothing about what would
    happen in those cases.

    But some people seem to feel that "all input values" means every
    possible value of the input types, and thus that a function like this
    should return a value even when there is no correct value in and no
    correct value out.

    Well, some languages treat types more seriously than C. In Pascal
    type of your input would be 0..10 and all input values would be
    handled. Sure, when domain is too complicated to express in type
    than it could be documented restriction. Still, it makes sense to
    signal error if value goes outside handled rage, so in a sense all
    values of input type are handled: either you get valid answer or
    clear error.

    No, it does not make sense to do that. Just because the C language does
    not currently (maybe once C++ gets contracts, C will copy them) have a
    way to specify input sets other than by types, does not mean that
    functions in C always have a domain matching all possible combinations
    of bits in the underlying representation of the parameter's types.

    It might be a useful fault-finding aid temporarily to add error messages
    for inputs that are invalid but can physically be squeezed into the parameters. That won't stop people making incorrect declarations of the function and passing completely different parameter types to it, or
    finding other ways to break the requirements of the function.

    And in general there is no way to check the validity of the inputs - you usually have no choice but to trust the caller. It's only in simple
    cases, like the example above, that it would be feasible at all.


    There are, of course, situations where the person calling the function
    is likely to be incompetent, malicious, or both, and where there can be serious consequences for what you might prefer to consider as invalid
    input values. You have that for things like OS system calls - it's no different than dealing with user inputs or data from external sources.
    But you handle that by extending the function - increase the range of
    valid inputs and appropriate outputs. You no longer have a function
    that takes a number between 0 and 10 and returns the integer square root
    - you now have a function that takes a number between -(2 ^ 31 + 1) and
    (2 ^ 31) and returns the integer square root if the input is in the
    range 0 to 10 or halts the program with an error message for other
    inputs in the wider range. It's a different function, with a wider set
    of inputs - and again, it is specified to give particular results for particular inputs.



    This is, IMHO, just nonsense and misunderstands the contract between
    function writers and function users.

    Further, I am confident that these people are quite happen to write code
    like :

    // Take a pointer to an array of two ints, add them, and return the sum
    int sum_two_ints(const int * p) {
    return p[0] + p[1];
    }

    I do not think that people wanting strong type checking are happy
    with C. Simply, either they use different language or use C
    without bitching, but aware of its limitations.

    Sure. C doesn't give as much help to writing correct programs as some
    other languages. That doesn't mean the programmer can't do the right thing.

    I certainly would
    be quite unhappy with code above. It is possible that I would still
    use it as a compromise (say if it was desirable to have single
    prototype but handle points in spaces of various dimensions),
    but my first attempt would be something like:

    typedef struct {int p[2];} two_int;
    ....


    I think you'd quickly find that limiting and awkward in C (but it might
    be appropriate in other languages). But don't misunderstand me - I am
    all in favour of finding ways in code that make input requirements
    clearer or enforceable within the language - never put anything in
    comments if you can do it in code. You could reasonably do this in C
    for the first example :


    // Do not use this directly
    extern int small_int_sqrt_implementation(int x);


    // Return the integer square root of numbers between 0 and 10
    static inline int small_int_sqrt(int x) {
        assert(x >= 0 && x <= 10);
        return small_int_sqrt_implementation(x);
    }


    There is no way to check the validity of pointers in C, but you might at
    least be able to use implementation-specific extensions to declare the function with the requirement that the pointer not be null.


    Perhaps, in a mistaken belief that it makes the code "safe", they will add :
    if (!p) return 0;

    at the start of the function. But they will not check that "p" actually
    points to an array of two ints (how could they?), nor will they check
    for integer overflow (and what would they do if it happened?).

    I am certainly unhappy with overflow handling in current hardware
    an by extention with overflow handling in C.

    A function should accept all input values - once you have made clear
    what the acceptable input values can be. A "default" case is just a
    short-cut for conveniently handling a wide range of valid input values -
    it is never a tool for handling /invalid/ input values.

    Well, default can signal error which frequently is right handling
    of invalid input values.


    Will that somehow fix the bug in the code that calls the function?

    It can be a useful debugging and testing aid, certainly, but it does not
    make the code "correct" or "safe" in any sense.




    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Wed Nov 6 09:48:28 2024
    On 05/11/2024 13:29, David Brown wrote:
    On 05/11/2024 13:42, Waldek Hebisch wrote:


    Supposing I declare this function:

    // Return the integer square root of numbers between 0 and 10
    int small_int_sqrt(int x);


    To me, the range of "all input values" is integers from 0 to 10.  I
    could implement it as :

    int small_int_sqrt(int x) {
        if (x == 0) return 0;
        if (x < 4) return 1;
        if (x < 9) return 2;
        if (x < 16) return 3;
        unreachable();
    }


    If the user asks for small_int_sqrt(-10) or small_int_sqrt(20), that's /their/ fault and /their/ problem.  I said nothing about what would
    happen in those cases.

    But some people seem to feel that "all input values" means every
    possible value of the input types, and thus that a function like this
    should return a value even when there is no correct value in and no
    correct value out.

    Your example is an improvement on your previous ones. At least it
    attempts to deal with out-of-range conditions!

    However there is still the question of providing that return type. If 'unreachable' is not a special language feature, then this can fail
    either if the language requires the 'return' keyword, or 'unreachable'
    doesn't yield a compatible type (even if it never returns because it's
    an error handler).

    Getting that right will satisfy both the language (if it cared more
    about such matters than C apparently does), and the casual reader
    curious about how the function contract is met (that is, supplying that promised return int type if or when it returns).

    // Take a pointer to an array of two ints, add them, and return the sum
    int sum_two_ints(const int * p) {
        return p[0] + p[1];
    }

    Perhaps, in a mistaken belief that it makes the code "safe", they will
    add :

        if (!p) return 0;

    at the start of the function.  But they will not check that "p" actually points to an array of two ints (how could they?), nor will they check
    for integer overflow (and what would they do if it happened?).

    This is a different category of error.

    Here's a related example of what I'd class as a language error:

    int a;
    a = (exit(0), &a);

    A type mismatch error is usually reported. However, the assignment is
    never done because it never returns from that exit() call.

    I expect you wouldn't think much of a compiler that didn't report such
    an error because that code is never executed.

    But to me that is little different from running into the end of a function without the proper provisions for a valid return value.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Wed Nov 6 10:01:44 2024
    On 05/11/2024 20:33, David Brown wrote:
    On 05/11/2024 20:39, Waldek Hebisch wrote:
    David Brown <david.brown@hesbynett.no> wrote:
    On 05/11/2024 13:42, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    Then we disagree on what 'multi-way' select might mean. I think it means
    branching, even if notionally, on one-of-N possible code paths.

    OK.

    I appreciate this is what Bart means by that phrase, but I don't agree
    with it.  I'm not sure if that is covered by "OK" or not!

    You may prefer your own definition, but Bart's is a reasonable one.

    The only argument I can make here is that I have not seen "multi-way
    select" as a defined phrase with a particular established meaning.

    Well, it started off as 2-way select, meaning constructs like this:

    x = c ? a : b;
    x := (c | a | b)

    Where one of two branches is evaluated. I extended the latter to N-way
    select:

    x := (n | a, b, c, ... | z)

    Where again one of these elements is evaluated, selected by n (here
    having the values of 1, 2, 3, ... compared with true, false above, but
    there need to be at least 2 elements inside |...| to distinguish them).

    I applied it also to other statements that can provide values, such
    as if-elsif chains and switch, but there the selection might be
    different (eg. a series of tests are done sequentially until a true one).

    I don't know how it got turned into 'multi-way'.

    Notice that each starts with an assignment (or the value is used in
    other ways like passing to a function), so provision has to be made for
    some value always to be returned.

    Such N-way selections can be emulated, for example:

    if (c)
        x = a;
    else
        x = b;

    But because the assignment has been brought inside (a dedicated one for
    each branch), the issue of a default path doesn't arise. You can leave
    out the 'else' for example; x is just left unchanged.

    This doesn't work however when the result is passed to a function:

    f(if (c) a);

    what is passed when c is false?
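
    The nearest C equivalent makes the same point: the conditional expression
    has no one-armed form, so a value for the false case must be supplied even
    if it is only a dummy (a fragment for illustration):

        f(c ? a : 0);      /* fine: 0 is the explicit "else" value     */
        /* f(c ? a);          not C: there is no one-armed ?: operator */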



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Wed Nov 6 10:15:35 2024
    On 05/11/2024 19:53, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:
    On 05/11/2024 12:42, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    Then we disagree on what 'multi-way' select might mean. I think it means
    branching, even if notionally, on one-of-N possible code paths.

    OK.

    The whole construct may or may not return a value. If it does, then one
    of the N paths must be a default path.


    You need to cover all input values. This is possible when there
    is reasonably small number of possibilities. For example, switch on
    char variable which covers all possible values does not need default
    path. Default is needed only when number of possibilities is too
    large to explicitely give all of them. And some languages allow
    ranges, so that you may be able to cover all values with small
    number of ranges.


    What's easier to implement in a language: to have a conditional need for
    an 'else' branch, which is dependent on the compiler performing some
    arbitrarily complex levels of analysis on some arbitrarily complex set
    of expressions...

    ...or to just always require 'else', with a dummy value if necessary?

    Well, frequently it is easier to do bad job, than a good one.

    I assume that you consider the simple solution the 'bad' one?

    I would consider a much more elaborate one, putting the onus on external
    tools, and still having an unpredictable result, to be the poorer of the two.

    You want to create a language that is easily compilable, no matter how
    complex the input.

    With the simple solution, the worst that can happen is that you have to
    write a dummy 'else' branch, perhaps with a dummy zero value.

    If control never reaches that point, it will never be executed (at
    worst, it may need to skip an instruction).

    But if the compiler is clever enough (optionally clever, it is not a requirement!), then it could eliminate that code.

    A bonus is that when debugging, you can comment out all or part of the previous lines, but the 'else' now catches those untested cases.

    normally you do not need very complex analysis:

    I don't want to do any analysis at all! I just want a mechanical
    translation as effortlessly as possible.

    I don't like unbalanced code within a function because it's wrong and
    can cause problems.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Kaz Kylheku@3:633/280.2 to All on Wed Nov 6 18:26:25 2024
    On 2024-11-05, Bart <bc@freeuk.com> wrote:
    On 05/11/2024 20:33, David Brown wrote:
    On 05/11/2024 20:39, Waldek Hebisch wrote:
    David Brown <david.brown@hesbynett.no> wrote:
    On 05/11/2024 13:42, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    Then we disagree on what 'multi-way' select might mean. I think it means
    branching, even if notionally, on one-of-N possible code paths.

    OK.

    I appreciate this is what Bart means by that phrase, but I don't agree
    with it.  I'm not sure if that is covered by "OK" or not!

    You may prefer your own definition, but Bart's is a reasonable one.

    The only argument I can make here is that I have not seen "multi-way
    select" as a defined phrase with a particular established meaning.

    Well, it started off as 2-way select, meaning constructs like this:

    x = c ? a : b;
    x := (c | a | b)

    Where one of two branches is evaluated. I extended the latter to N-way select:

    x := (n | a, b, c, ... | z)

    This looks quite error-prone. You have to count carefully that
    the cases match the intended values. If an entry is
    inserted, all the remaining ones shift to a higher value.

    You've basically taken a case construct and auto-generated
    the labels starting from 1.

    If that was someone's Lisp macro, I would prefer they confine
    it to their own program. :)

    (defmacro nsel (expr . clauses)
      ^(caseql ,expr ,*[mapcar list 1 clauses]))
    nsel
    (nsel 1 (prinl "one") (prinl "two") (prinl "three"))
    "one"
    "one"
    (nsel (+ 1 1) (prinl "one") (prinl "two") (prinl "three"))
    "two"
    "two"
    (nsel (+ 1 3) (prinl "one") (prinl "two") (prinl "three"))
    nil
    (nsel (+ 1 2) (prinl "one") (prinl "two") (prinl "three"))
    "three"
    "three"
    nil
    (macroexpand-1 '(nsel x a b c d))
    (caseql x (1 a)
            (2 b) (3 c)
            (4 d))

    Yawn ...

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Wed Nov 6 18:38:47 2024
    On 06/11/2024 00:01, Bart wrote:
    On 05/11/2024 20:33, David Brown wrote:
    On 05/11/2024 20:39, Waldek Hebisch wrote:
    David Brown <david.brown@hesbynett.no> wrote:
    On 05/11/2024 13:42, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    Then we disagree on what 'multi-way' select might mean. I think it means
    branching, even if notionally, on one-of-N possible code paths.

    OK.

    I appreciate this is what Bart means by that phrase, but I don't agree
    with it. I'm not sure if that is covered by "OK" or not!

    You may prefer your own definition, but Bart's is a reasonable one.

    The only argument I can make here is that I have not seen "multi-way
    select" as a defined phrase with a particular established meaning.

    Well, it started off as 2-way select, meaning constructs like this:

       x = c ? a : b;
       x := (c | a | b)

    Where one of two branches is evaluated. I extended the latter to N-way select:

       x := (n | a, b, c, ... | z)


    I appreciate that this is what you have in your language as a "multi-way select". I can see it being a potentially useful construct (though
    personally I don't like the syntax at all).

    The only thing I have disagreed with is your assertions that what you
    have there is somehow the only "true" or "correct" concept of a
    "multi-way selection".



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Wed Nov 6 21:01:16 2024
    On 06/11/2024 07:26, Kaz Kylheku wrote:
    On 2024-11-05, Bart <bc@freeuk.com> wrote:

    Well, it started off as 2-way select, meaning constructs like this:

    x = c ? a : b;
    x := (c | a | b)

    Where one of two branches is evaluated. I extended the latter to N-way
    select:

    x := (n | a, b, c, ... | z)

    This looks quite error-prone. You have to count carefully that
    the cases match the intended values. If an entry is
    inserted, all the remaining ones shift to a higher value.

    You've basically taken a case construct and auto-generated
    the labels starting from 1.

    It's a version of Algol68's case construct:

    x := CASE n IN a, b, c OUT z ESAC

    which also has the same compact form I use. I only use the compact
    version because n is usually small, and it is intended to be used within
    an expression: print (n | "One", "Two", "Three" | "Other").
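
    For readers more used to C, a hedged sketch of what the compact select
    corresponds to (the function and names are made up for illustration; note
    that in real C every argument would be evaluated before the call, whereas
    the construct evaluates only the chosen branch):

        const char *pick3(int n, const char *a, const char *b,
                          const char *c, const char *z) {
            switch (n) {
                case 1:  return a;
                case 2:  return b;
                case 3:  return c;
                default: return z;   /* the '| z' catch-all */
            }
        }

        /* print (n | "One", "Two", "Three" | "Other") becomes roughly: */
        /*   puts(pick3(n, "One", "Two", "Three", "Other"));            */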

    This is an actual example (from my first scripting language; not written by
    me):

    Crd[i].z := (BendAssen |P.x, P.y, P.z)

    An out-of-bounds index yields 'void' (via a '| void' part inserted by
    the compiler). This is one of my examples from that era:

    xt := (messa | 1,1,1, 2,2,2, 3,3,3)
    yt := (messa | 3,2,1, 3,2,1, 3,2,1)

    Algol68 didn't have 'switch', but I do, as well as a separate
    case...esac statement that is more general. Those are better for
    multi-line constructs.

    As for being error prone because values can get out of step, so is a
    function call like this:

    f(a, b, c, d, e)

    But I also have keyword arguments.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Thu Nov 7 01:40:52 2024
    On 04/11/2024 22:25, David Brown wrote:
    On 04/11/2024 20:50, Bart wrote:
    On 04/11/2024 16:35, David Brown wrote:
    On 03/11/2024 21:00, Bart wrote:


    Here is a summary of C vs my language.


    <snip the irrelevant stuff>


    I am very keen on keeping the concepts distinct in cases where it
    matters.

    I know, you like to mix things up. I like clear lines:

       func F:int ...             Always returns a value
       proc P  ...                Never returns a value



    Oh, you /know/ that, do you? And how do you "know" that? Is that
    because you still think I am personally responsible for the C language,
    and that I think C is the be-all and end-all of perfect languages?

    I agree that it can make sense to divide different types of "function".
    I disagree that whether or not a value is returned has any significant relevance. I see no difference, other than minor syntactic issues,
    between "int foo(...)" and "void foo(int * result, ...)".

    I don't use functional concepts; my functions may or may not be pure.

    But the difference between value-returning and non-value returning
    functions to me is significant:

                       Func  Proc
    return x;          Y     N
    return;            N     Y
    hit final }        N     Y
    Pure               ?     Unlikely
    Side-effects       ?     Likely
    Call within expr   Y     N
    Call standalone    ?     Y

    Having a clear distinction helps me focus more precisely on how a
    routine has to work.

    In C, the syntax is dreadful: not only can you barely distinguish a
    function from a procedure (even without attributes, user types and
    macros add in), but you can hardly tell them apart from variable
    declarations.

    In fact, function declarations can even be declared in the middle of a
    set of variable declarations.

    You can learn a lot about the underlying structure of a language by implementing it. So when I generate IL from C for example, I found the
    need to have separate instructions to call functions and procedures, and separate return instructions too.

    If you have a function (or construct) that returns a correct value for inputs 1, 2 and 3, and you never pass it the value 4 (or anything else), then there is no undefined behaviour no matter what the code looks like
    for values other than 1, 2 and 3. If someone calls that function with
    input 4, then /their/ code has the error - not the code that doesn't
    handle an input 4.

    No. The function they are calling is badly formed. There should never be
    any circumstance where a value-returning function terminates (hopefully
    by running into RET) without an explicit set return value.


    I agree that this is a terrible idea. <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60523>

    But picking one terrible idea in C does not mean /everything/ in C is a terrible idea! /That/ is what you got wrong, as you do so often.

    What the language does is generally fine. /How/ it does it is generally
    terrible. (Type syntax; no 'fun' keyword; = vs ==; operator precedence;
    format codes; 'break' in switch; export by default; struct T vs typedef
    T; dangling 'else'; optional braces; ... there's reams of this stuff!)

    So actually, I'm not wrong. There have been discussions about all of
    these and a lot more.

    Can you tell me which other current languages, other than C++ and
    assembly, allow such nonsense?

    Python.

    Of course, it is equally meaningless in Python as it is in C.

    Python at least can trap the errors. Once you fix the unlimited
    recursion, it will detect the wrong number of arguments. In C, before
    C23 anyway, any number and types of arguments is legal in that example.


    I defend it if that is appropriate. Mostly, I /explain/ it to you. It
    is bizarre that people need to do that for someone who claims to have written a C compiler, but there it is.

    It is bizarre that the ins and outs of C, a supposedly simple language,
    are so hard to understand. Like the rules for how many {} you can leave
    out when initialising a nested data structure. Or how many extra ones
    you can have; this is OK:

    int a = {0};

    but not {{0}} (tcc accepts it though, so which set of rules is it using?).
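
    A hedged illustration of the brace rules in question (the struct types
    are invented for the example):

        struct point { int x, y; };
        struct rect  { struct point tl, br; };

        struct rect r1 = { { 0, 0 }, { 10, 20 } };  /* fully braced          */
        struct rect r2 = { 0, 0, 10, 20 };          /* inner braces elided   */
        int a = { 0 };                              /* braced scalar - legal */
        /* int b = { { 0 } };   extra braces around a scalar: a constraint
           violation, although some compilers let it pass. */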

    Or whether it is a static followed by a non-static declaration that is
    OK, or whether it's the other way around.
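
    For reference, a hedged sketch of that ordering rule (the function names
    are invented):

        static int f(void);          /* internal linkage                    */
        int f(void) { return 1; }    /* OK: the later declaration keeps the
                                        earlier internal linkage            */

        int g(void);                 /* external linkage                    */
        /* static int g(void);         not OK: static after non-static      */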

    I'm glad you didn't - it would be a waste of effort.

    I guessed that. You seemingly don't care that C is a messy language with
    many quirks; you just work around it by using a subset, with some help
    from your compiler in enforcing that subset.

    So you're using a strict dialect. The trouble is that everyone else
    using C will either be using their own dialect incompatible with yours,
    or are stuck using the messy language and laid-back compilers operating
    in lax mode by default.

    I'm interested in fixing things at source - within a language.

    You /do/ understand that I use top-quality tools with carefully chosen warnings, set to throw fatal errors, precisely because I want a language that has a lot more "lines" and restrictions than your little tools?
    /Every/ C programmer uses a restricted subset of C - some more
    restricted than others. I choose to use a very strict subset of C for
    my work, because it is the best language for the tasks I need to do. (I also use a very strict subset of C++ when it is a better choice.)

    I'd guess only 1% of your work with C involves the actual language, and
    99% using additional tooling.

    With me it's mostly about the language.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Thu Nov 7 01:50:21 2024
    On 05/11/2024 23:48, Bart wrote:
    On 05/11/2024 13:29, David Brown wrote:
    On 05/11/2024 13:42, Waldek Hebisch wrote:


    Supposing I declare this function:

    // Return the integer square root of numbers between 0 and 10
    int small_int_sqrt(int x);


    To me, the range of "all input values" is integers from 0 to 10. I
    could implement it as :

    int small_int_sqrt(int x) {
         if (x == 0) return 0;
         if (x < 4) return 1;
         if (x < 9) return 2;
         if (x < 16) return 3;
         unreachable();
    }


    If the user asks for small_int_sqrt(-10) or small_int_sqrt(20), that's
    /their/ fault and /their/ problem. I said nothing about what would
    happen in those cases.

    But some people seem to feel that "all input values" means every
    possible value of the input types, and thus that a function like this
    should return a value even when there is no correct value in and no
    correct value out.

    Your example is an improvement on your previous ones. At least it
    attempts to deal with out-of-range conditions!

    No, it does not. The fact that some invalid inputs also give
    deterministic results is a coincidence of the implementation, not an indication that the function is specified for those additional inputs or
    that it does any checking. I intentionally structured the example this
    way to show this - sometimes undefined behaviour gives you results you
    might like, but it is still undefined behaviour. This function has no
    defined behaviour for inputs outside the range 0 to 10, because I gave
    no definition of its behaviour - the effect of particular
    implementations of the function is irrelevant to that.

    As I suspected it might, this apparently confused you.


    However there is still the question of providing that return type. If 'unreachable' is not a special language feature, then this can fail
    either if the language requires the 'return' keyword, or 'unreachable' doesn't yield a compatible type (even if it never returns because it's
    an error handler).

    "unreachable()" is a C23 standardisation of a feature found in most
    high-end compilers. For gcc and clang, there is
    __builtin_unreachable(), and MSVC has its version. The functions are
    handled by the compilers as "undefined behaviour". (I mean that quite literally - gcc and clang turn it into an "UB" instruction in their
    internal representations.)

    Clearly, "unreachable()" has no return type - it does not in any sense "return". And since the compiler knows it will never be "executed", it
    knows control will never fall off the end of that function. You don't
    need a type for something that can never happen (it's like if I say
    "this is a length of 0" and you ask "was that 0 metres, or 0 inches?" -
    the question is meaningless).
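
    (A hedged illustration of how the hint gets used - the function below is
    invented, and the exact optimisation is compiler-dependent. In C23 the
    macro comes from <stddef.h>; with gcc/clang pre-C23 you would write
    __builtin_unreachable() instead:

        #include <stddef.h>

        int wrap4(int n) {
            if (n < 0 || n > 3)
                unreachable();   /* promise to the compiler: n is 0..3 */
            return n % 4;        /* given that promise, a compiler may
                                    simplify this to just 'return n;'  */
        }
    )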



    Getting that right will satisfy both the language (if it cared more
    about such matters than C apparently does), and the casual reader
    curious about how the function contract is met (that is, supplying that promised return int type if or when it returns).

    C gets it right here. There is no need for a return type when there is
    no return - indeed, trying to force some sort of type or "default" value
    would be counterproductive. It would be confusing to the reader, add untestable and unexecutable source code, make code flow more
    complicated, break invariants, cripple correctness analysis of the rest
    of the code, and make the generated object code inefficient.

    Remember how the function is specified. All you have to do is use it correctly - go outside the specifications, and I make no promises or guarantees about what will happen. If you step outside /your/ side of
    the bargain by giving it an input outside 0 to 10, then I give you no
    more guarantees that it will return an int of any sort than I give you a guarantee that it would be a great sales gimmick if printed on a t-shirt.

    But what I /can/ give you is something that can be very useful in being
    sure the rest of your code is correct, and which is impossible for a
    function with "default" values or other such irrelevant parts. I can guarantee you that:

    int y = small_int_sqrt(x);

    assert(y * y <= x);
    assert ((y + 1) * (y + 1) > x);


    That is to say - I can guarantee that the function works and gives you
    the correct results.

    But supposing I had replaced the "unreachable();" with a return of a
    default value - let's say 42, since that's the right answer even if you
    don't know the question. What does the user of small_int_sqrt() know now?

    Now you know that "y" is an int. You have no idea if it is a correct or useful result, unless you have first checked that x is in the specified
    range of 0 to 10.

    If you /have/ checked (in some way) that x is valid, then why would you
    bother calling the function when x is invalid? And thus why would you
    care what the function does or does not do when x is invalid?

    And if you haven't checked that x is valid, why would you bother calling
    the function if you have no idea whether or not it results in something
    useful and correct?


    So we have now established that returning a default int value is worse
    than useless - there are no circumstances in which it can be helpful,
    and it ruins the guarantees you want in order to be sure that the
    calling code is correct.


    Let's now look at another alternative - have the function check for
    validity, and return some kind of error signal if the input is invalid.
    There are two ways to do this - we can have a value of the main return
    type acting as an error signal, or we can have an additional return value.

    If we pick the first one - say, return -1 on error - then we have a
    compact solution that is easy to check for the calling function. But
    now we have a check for validity of the input whether we need it or not
    (since the callee function does the checking, even if the caller
    function knows the values are valid), and the caller function has to add
    a check for error return values. The return may still be an
    "int", but it is no longer representative of an integer value - it
    multiplexes two different concepts. We have lost the critical
    correctness equations that we previously had. And it won't work at all
    if there is no choice of an error indicator.

    If we pick the second one, we need to return two values. The checking
    is all the same, but at least the concepts of validity and value are separated. Now we have either a struct return with its added efficiency costs, or a monstrosity from the dark ages where the function must take
    a pointer parameter for where to store the results. (And how is the
    function going to check the validity of that pointer? Or is it somehow
    okay to skip that check while insisting that a check of the other input
    is vital?) This has most of the disadvantages of the first choice, plus
    extra efficiency costs.
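
    To make the two alternatives above concrete, a hedged sketch (the names,
    the choice of -1, and the struct layout are invented for illustration,
    not a recommendation):

        #include <stdbool.h>

        /* Option 1: in-band error value (-1 signals "invalid input") */
        int small_int_sqrt_e(int x) {
            if (x < 0 || x > 10) return -1;
            if (x == 0) return 0;
            if (x < 4)  return 1;
            if (x < 9)  return 2;
            return 3;
        }

        /* Option 2: separate validity flag and value */
        typedef struct { bool ok; int value; } isqrt_result;

        isqrt_result small_int_sqrt_r(int x) {
            if (x < 0 || x > 10) return (isqrt_result){ false, 0 };
            return (isqrt_result){ true, small_int_sqrt_e(x) };
        }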


    All in all, we have significant costs in various aspects, with no real benefit, all in the name of a mistaken belief that we are avoiding
    undefined behaviour.



    // Take a pointer to an array of two ints, add them, and return the sum
    int sum_two_ints(const int * p) {
         return p[0] + p[1];
    }

    Perhaps, in a mistaken belief that it makes the code "safe", they will
    add :

         if (!p) return 0;

    at the start of the function. But they will not check that "p"
    actually points to an array of two ints (how could they?), nor will
    they check for integer overflow (and what would they do if it happened?).

    This is a different category of error.


    No, it is not. It is just another case of a function having
    preconditions on the input, and whether or not the called function
    should check those preconditions. You can say you think it is vital for functions to do these checks themselves, or you can accept that it is the responsibility of the calling code to provide valid inputs. But you
    don't get to say it is vital to check /some/ types of inputs, but other
    types are fine to take on trust.

    Here's a related example of what I'd class as a language error:

       int a;
       a = (exit(0), &a);

    A type mismatch error is usually reported. However, the assignment is
    never done because it never returns from that exit() call.

    I expect you wouldn't think much of a compiler that didn't report such
    an error because that code is never executed.

    I would expect the compiler to know that "exit()" can't return, so the
    value of "a" is never used and it can be optimised away. But I do also
    expect that the compiler will enforce the rules of the language - syntax
    and grammar rules, along with constraints and anything else it is able
    to check. And even if I said it was reasonable for a language to say
    this "assignment" is not an error since it can't be executed, I think
    trying to put that level of detail into a language definition (and corresponding compilers) would quickly be a major complexity for no
    real-world gain.


    But to me that is little different from running into the end of a function without the proper provisions for a valid return value.


    Yes, I think so too.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Thu Nov 7 02:47:50 2024
    On 06/11/2024 15:40, Bart wrote:
    On 04/11/2024 22:25, David Brown wrote:
    On 04/11/2024 20:50, Bart wrote:
    On 04/11/2024 16:35, David Brown wrote:
    On 03/11/2024 21:00, Bart wrote:


    Here is a summary of C vs my language.


    <snip the irrelevant stuff>


    I am very keen on keeping the concepts distinct in cases where it
    matters.

    I know, you like to mix things up. I like clear lines:

       func F:int ...             Always returns a value
       proc P  ...                Never returns a value



    Oh, you /know/ that, do you? And how do you "know" that? Is that
    because you still think I am personally responsible for the C
    language, and that I think C is the be-all and end-all of perfect
    languages?

    I agree that it can make sense to divide different types of
    "function". I disagree that whether or not a value is returned has any
    significant relevance. I see no difference, other than minor
    syntactic issues, between "int foo(...)" and "void foo(int * result,
    ...)".

    I don't use functional concepts; my functions may or may not be pure.


    OK. You are not alone in that. (Standard C didn't support a difference
    there until C23.)

    But the difference between value-returning and non-value returning
    functions to me is significant:

                       Func  Proc
    return x;          Y     N
    return;            N     Y
    hit final }        N     Y
    Pure               ?     Unlikely
    Side-effects       ?     Likely
    Call within expr   Y     N
    Call standalone    ?     Y


    There are irrelevant differences in syntax, which could easily disappear entirely if a language supported a default initialisation value when a
    return gives no explicit value. (i.e., "T foo() { return; }; T x =
    foo();" could be treated in the same way as "T x;" in a static
    initialisation context.) /Your/ language does not support that, but
    other languages could.

    Then you list some things that may or may not happen, which are of
    course totally irrelevant. If you list the differences between bikes
    and cars, you don't include "some cars are red" and "bikes are unlikely
    to be blue".


    Having a clear distinction helps me focus more precisely on how a
    routine has to work.

    It's a pointless distinction. Any function or procedure can be morphed
    into the other form without any difference in the semantic meaning of
    the code, requiring just a bit of re-arrangement at the caller site:

    int foo(int x) { int y = ...; return y; }

    void foo(int * res, int x) { int y = ...; *res = y; }


    void foo(int x) { ... ; return; }

    int foo(int x) { ... ; return 0; }


    There is no relevance in the division here, which is why most languages
    don't make a distinction unless they do so simply for syntactic reasons.



    In C, the syntax is dreadful: not only can you barely distinguish a
    function from a procedure (even without attributes, user types and
    macros add in), but you can hardly tell them apart from variable declarations.

    As always, you are trying to make your limited ideas of programming
    languages appear to be correct, universal, obvious or "natural" by
    saying things that you think are flaws in C. That's not how a
    discussion works, and it is not a way to convince anyone of anything.
    The fact that C does not have a keyword used in the declaration or
    definition of a function does not in any way mean that there is the
    slightest point in your artificial split between "func" and "proc"
    functions.


    (It doesn't matter that I too prefer a clear keyword for defining
    functions in a language.)


    In fact, function declarations can even be declared in the middle of a
    set of variable declarations.

    You can learn a lot about the underlying structure of a language by implementing it. So when I generate IL from C for example, I found the
    need to have separate instructions to call functions and procedures, and separate return instructions too.


    That is solely from your choice of an IL.

    If you have a function (or construct) that returns a correct value for
    inputs 1, 2 and 3, and you never pass it the value 4 (or anything
    else), then there is no undefined behaviour no matter what the code
    looks like for values other than 1, 2 and 3. If someone calls that
    function with input 4, then /their/ code has the error - not the code
    that doesn't handle an input 4.

    No. The function they are calling is badly formed. There should never be
    any circumstance where a value-returning function terminates (hopefully
    by running into RET) without an explicit set return value.


    There are no circumstances where you can use the function correctly and
    it does not return the correct answer. If you want to consider when
    people to use a function /incorrectly/, then there are no limits to how
    wrong they can be.


    I agree that this is a terrible idea.
    <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60523>

    But picking one terrible idea in C does not mean /everything/ in C is
    a terrible idea! /That/ is what you got wrong, as you do so often.

    What the language does is generally fine. /How/ it does it is generally terrible. (Type syntax; no 'fun' keyword; = vs ==; operator precedence; format codes; 'break' in switch; export by default; struct T vs typedef
    T; dangling 'else'; optional braces; ... there's reams of this stuff!)


    Making the same mistake again does not help your argument.

    So actually, I'm not wrong. There have been discussions about all of
    these and a lot more.


    Of course you are wrong!

    You have failed to grasp the key concept of programming - it is based on contracts and agreements. Tasks are broken down into subtasks, and for
    each subtask there is a requirement for what gets put into the subtask
    and a requirement for what comes out of it. The calling task is
    responsible for fulfilling the input requirements, the callee subtask is responsible for fulfilling the output requirements. The caller does not
    need to check that the outputs are correct, and the callee does not need
    to check that the inputs are correct. That is the division of responsibilities - and doing anything else is, at best, wasted duplicate effort.

    You are right that C has its flaws - every language does. I agree with
    you in many cases where you think C has poor design choices.

    But can you not understand that repeating things that you dislike about
    C - things we have all heard countless times - does not excuse your
    tunnel vision about programming concepts or change your misunderstandings?


    Can you tell me which other current languages, other than C++ and
    assembly, allow such nonsense?

    Python.

    Of course, it is equally meaningless in Python as it is in C.

    Python at least can trap the errors. Once you fix the unlimited
    recursion, it will detect the wrong number of arguments. In C, before
    C23 anyway, any number and types of arguments is legal in that example.


    It is syntactically legal, but semantically undefined behaviour (look it
    up in the C standards). That means it is wrong, but the language
    standards don't insist that compilers diagnose it as an error.


    I defend it if that is appropriate. Mostly, I /explain/ it to you.
    It is bizarre that people need to do that for someone who claims to
    have written a C compiler, but there it is.

    It is bizarre that the ins and outs of C, a supposedly simple language,
    are so hard to understand.

    Have you ever played Go? It is a game with very simple rules, and extraordinarily complicated gameplay.

    Compared to most general purpose languages, C /is/ small and simple.
    But that is a relative rating, not an absolute rating.


    I'm glad you didn't - it would be a waste of effort.

    I guessed that. You seemingly don't care that C is a messy language with many quirks; you just work around it by using a subset, with some help
    from your compiler in enforcing that subset.

    Yes.

    If there was an alternative language that I thought would be better for
    the tasks I have, I'd use that. (Actually, a subset of C++ is often
    better, so I use that when I can.)

    What do you think I should do instead? Whine in newsgroups to people
    that don't write language standards (for C or anything else) and don't
    make compilers? Make my own personal language that is useless to
    everyone else and holds my customers to ransom by being the only person
    that can work with their code? Perhaps that is fine for the type of
    customers you have, but not for my customers.

    I /do/ understand that C has its flaws (from /my/ viewpoint, for /my/
    needs). So I work around those.


    So you're using a strict dialect. The trouble is that everyone else
    using C will either be using their own dialect incompatible with yours,
    or are stuck using the messy language and laid-back compilers operating
    in lax mode by default.

    I'm interested in fixing things at source - within a language.

    You haven't fixed a thing.

    (I'm not claiming /I/ have fixed anything either.)


    You /do/ understand that I use top-quality tools with carefully chosen
    warnings, set to throw fatal errors, precisely because I want a
    language that has a lot more "lines" and restrictions than your little
    tools? /Every/ C programmer uses a restricted subset of C - some more
    restricted than others. I choose to use a very strict subset of C for
    my work, because it is the best language for the tasks I need to do.
    (I also use a very strict subset of C++ when it is a better choice.)

    I'd guess only 1% of your work with C involves the actual language, and
    99% using additional tooling.


    What a weird thing to guess.

    With me it's mostly about the language.


    An even weirder thing to say from someone who made his own tools.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Thu Nov 7 06:38:09 2024
    On 06/11/2024 15:47, David Brown wrote:
    On 06/11/2024 15:40, Bart wrote:

    There are irrelevant differences in syntax, which could easily disappear entirely if a language supported a default initialisation value when a return gives no explicit value. (i.e., "T foo() { return; }; T x =
    foo();" could be treated in the same way as "T x;" in a static initialisation context.)

    You wrote:

    T foo () {return;} # definition?

    T x = foo(); # call?

    I'm not quite sure what you're saying here. That a missing return value
    in non-void function would default to all-zeros?

    Maybe. A rather pointless feature just to avoid writing '0', and which
    now introduces a new opportunity for a silent error (accidentally
    forgetting a return value).

    It's not quite the same as a static initialisation, which is zeroed
    when a program starts.


    Then you list some things that may or may not happen, which are of
    course totally irrelevant. If you list the differences between bikes
    and cars, you don't include "some cars are red" and "bikes are unlikely
    to be blue".

    Yes; if you're using a vehicle, or planning a journey or any related
    thing, it helps to remember if it's a bike or a car! At least here you acknowledge the difference.

    But I guess you find those likely/unlikely macros of gcc pointless too.
    If I know something is a procedure, then I also know it is likely to
    change global state, that I might need to deal with a return value, and
    a bunch of other stuff.

    Boldly separating the two with either FUNC or PROC denotations I find
    helps tremendously. YM-obviously-V, but you can't have a go at me for my
    view.

    If I really found it a waste of time, the distinction would have been
    dropped decades ago.

    It's a pointless distinction. Any function or procedure can be morphed
    into the other form without any difference in the semantic meaning of
    the code, requiring just a bit of re-arrangement at the caller site:

        int foo(int x) { int y = ...; return y; }

        void foo(int * res, int x) { int y = ...; *res = y; }


        void foo(int x) { ... ; return; }

        int foo(int x) { ... ; return 0; }


    There is no relevance in the division here, which is why most languages don't make a distinction unless they do so simply for syntactic reasons.

    As I said, you like to mix things up. You disagreed. I'm not surprised.

    Here you've demonstrated how a function that returns results by value
    can be turned into a procedure that returns a result by reference.

    So now, by-value and by-reference are the same thing?

    I listed seven practical points of difference between functions and procedures, and above is an eighth point, but you just dismiss them.
    Is there any point in this?

    I do like taking what some think as a single feature and having
    dedicated versions, because I find it helpful.

    That includes functions, loops, control flow and selections.


    In C, the syntax is dreadful: not only can you barely distinguish a
    function from a procedure (even without attributes, user types and
    macros add in), but you can hardly tell them apart from variable
    declarations.

    As always, you are trying to make your limited ideas of programming languages appear to be correct, universal, obvious or "natural" by
    saying things that you think are flaws in C. That's not how a
    discussion works, and it is not a way to convince anyone of anything.
    The fact that C does not have a keyword used in the declaration or definition of a function does not in any way mean that there is the slightest point in your artificial split between "func" and "proc" functions.


    void F();
    void (*G);
    void *H();
    void (*I)();

    OK, 4 things declared here. Are they procedures, functions, variables,
    or pointers to functions? (I avoided using a typedef in place of 'void'
    to make things easier.)

    I /think/ they are as follows: procedure, pointer variable, function (returning void*), and pointer to a procedure. But I had to work at it,
    even though the examples are very simple.

    I don't know about you, but I prefer syntax like this:

    proc F
    ref void G
    ref proc H
    func I -> ref void

    Now come on, scream at me again for preferring a nice syntax for
    programming, one which just tells me at a glance what it means without
    having to work it out.



    (It doesn't matter that I too prefer a clear keyword for defining
    functions in a language.)

    Why? Don't your smart tools tell you all that anyway?


    That is solely from your choice of an IL.

    The IL design also falls into place from the natural way these things
    have to work.

    Of course you are wrong!

    You keep saying that. But then you also keep saying, from time to time,
    that you agree that something in C was a bad idea. So I'm still wrong
    when calling out the same thing?



    If there was an alternative language that I thought would be better for
    the tasks I have, I'd use that. (Actually, a subset of C++ is often
    better, so I use that when I can.)

    What do you think I should do instead? Whine in newsgroups to people
    that don't write language standards (for C or anything else) and don't
    make compilers?

    What makes you think I'm whining? The thread opened up a discussion
    about multi-way selections, and it got into how it could be done with
    features from other languages.

    I gave some examples from mine, as I'm very familiar with that, and it
    uses simple features that are easy to grasp and appreciate. You could
    have done the same from ones you know.

    But you just hate the idea that I have my own language to draw on, whose syntax is very sweet ('serious' languages hate such syntax for some
    reason, and it is usually relegated to scripting languages).

    I guess then you just have to belittle and insult me, my languages and
    my views at every opportunity.

    Make my own personal language that is useless to
    everyone else and holds my customers to ransom by being the only person
    that can work with their code?

    Plenty of companies use DSLs. But isn't that sort of what you do? That
    is, using 'C' with a particular interpretation or enforcement of the
    rules, which needs to go in hand with a particular compiler, version,
    sets of options and assorted makefiles.

    I for one would never be able to build one of your programs. It might as
    well be written in your in-house language with proprietary tools.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Thu Nov 7 23:23:04 2024
    On 06/11/2024 14:50, David Brown wrote:
    On 05/11/2024 23:48, Bart wrote:
    On 05/11/2024 13:29, David Brown wrote:

    int small_int_sqrt(int x) {
         if (x == 0) return 0;
         if (x < 4) return 1;
         if (x < 9) return 2;
         if (x < 16) return 3;
         unreachable();
    }

    "unreachable()" is a C23 standardisation of a feature found in most
    high-end compilers. For gcc and clang, there is
    __builtin_unreachable(), and MSVC has its version.

    So it's a kludge. Cool, I can create one of those too:

    func smallsqrt(int x)int =
        if
        elsif x=0 then 0
        elsif x<4 then 1
        elsif x<9 then 2
        elsif x<16 then 3
        dummyelse int.min
        fi
    end

    'dummyelse' is a special version of 'else' that tells the compiler that control will (should) never arrive there. ATM it does nothing but inform
    the reader of that and to remind the author. But later stages of the
    compiler can choose not to generate code for it, or to generate error-reporting code.

    (A couple of things about this: the first 'if' condition and branch can
    be omitted; it starts at elsif. This removes the special-casing for the
    first of an if-elsif chain, so as to allow easier maintenance and better alignment.

    Second is that, unlike your C, the whole if-fi construct is a single expression term that yields the function return value. Hence the need
    for all branches to be present and balanced regarding their common type.

    This could have been handled internally (compiler adds 'dummyelse <empty
    value for type>'), but I think it's better that it is explicit (user
    might forget to add that branch).

    That int.min is something I sometimes use for in-band signalling. Here
    that is the value -9223372036854775808, so it's quite a wide band!
    Actually it is out-of-band if the user expects only results within an i32
    range.

    BTW your example lets through negative values; I haven't fixed that.)

    Getting that right will satisfy both the language (if it cared more
    about such matters than C apparently does), and the casual reader
    curious about how the function contract is met (that is, supplying
    that promised return int type if or when it returns).

    C gets it right here. There is no need for a return type when there is
    no return

    There is no return for only half the function! A function with a return
    type is a function that CAN return. If it can't ever return, then make
    it a procedure.

    Take this function where N can never be zero; is this the right way to
    write it in C:

    int F(int N) {
        if (N==0) unreachable();
        return abc/N;   // abc is a global with value 100
    }

    It doesn't look right. If I compile it with gcc (using
    __builtin_unreachable), and call F(0), then it crashes. So it doesn't do
    much, does it?!

    indeed, trying to force some sort of type or "default" value
    would be counterproductive. It would be confusing to the reader, add untestable and unexecutable source code,

    But it IS confusing, since it quite clearly IS reachable. There's a
    difference between covering all possible values of N, so that is
    genuinely is unreachable, and having code that COULD be reachable.

    Let's now look at another alternative - have the function check for validity, and return some kind of error signal if the input is invalid. There are two ways to do this - we can have a value of the main return
    type acting as an error signal, or we can have an additional return value.
    ....
    All in all, we have significant costs in various aspects, with no real benefit, all in the name of a mistaken belief that we are avoiding
    undefined behaviour.

    This is all a large and complex subject. But it's not really the point
    of the discussion.

    I'm not talking about what happens when running a program, but what
    happens at compilation, and satisfying the needs of the language.

    C here is less strict in being happy to have parts of a function body as
    no-go areas where various requirements can be ignored, like a function
    with a designed return type T, being allowed to return without
    satisfying that need.

    Here, you demonstrated bolted-on hacks that are not part of the language,
    like the snappy __builtin_unreachable (the () are apparently optional).
    I can't see however that it does much.

    It is a fact C as a language allows this:

    T F() {} // T is not void

    (I've had to qualify T - point number 9 in procedures vs. function.)

    All that C says is that control flow running into that closing },
    without encountering a 'return x', is UB.

    IMV, sloppy. My language simply doesn't allow it.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Fri Nov 8 02:08:34 2024
    On 07/11/2024 12:23, Bart wrote:
    On 06/11/2024 14:50, David Brown wrote:

    C gets it right here. There is no need for a return type when there
    is no return

    There is no return for only half the function! A function with a return
    type is a function that CAN return. If it can't ever return, then make
    it a procedure.

    Take this function where N can never be zero; is this the right way to
    write it in C:

       int F(int N) {
           if (N==0) unreachable();
           return abc/N;             // abc is a global with value 100
       }

    It doesn't look right. If I compile it with gcc (using __builtin_unreachable), and call F(0), then it crashes. So it doesn't do much, does it?!

    It looks like it needs 'else' here. If I put that in, then F(0) returns
    either 0 or 1, so it returns garbage, whether or not 'unreachable' is
    used in the branch.

    So I'm struggling to see the point of it. Is it just to quieten the
    'reaches end of non-void function' warning when used before the final '}'?

    In any case, 'unreachable' is a misnomer. 'shouldnt_be_reachable' is
    more accurate.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Fri Nov 8 03:23:54 2024
    On 06/11/2024 20:38, Bart wrote:
    On 06/11/2024 15:47, David Brown wrote:
    On 06/11/2024 15:40, Bart wrote:

    There are irrelevant differences in syntax, which could easily
    disappear entirely if a language supported a default initialisation
    value when a return gives no explicit value. (i.e., "T foo() {
    return; }; T x = foo();" could be treated in the same way as "T x;" in
    a static initialisation context.)

    You wrote:

      T foo () {return;}        # definition?

      T x = foo();              # call?

    I'm not quite sure what you're saying here. That a missing return value
    in non-void function would default to all-zeros?


    It would not necessarily mean all zeros, but yes, that's the idea. You
    could easily say that returning from a non-void function without an
    explicit value, or falling off the end of it, returned the default value
    for the type in the same sense as you can have a default initialisation
    of non-stack objects in a language. (In C, this pretty much always
    means all zeros - in a more advanced language with object support, it
    would typically mean default construction.)
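
    (As a reminder of the default values C already defines for objects with
    static storage duration - a hedged aside, the declarations below are just
    examples:

        static int    counter;    /* starts at 0               */
        static double table[4];   /* all elements start at 0.0  */
        static char  *name;       /* starts as a null pointer   */
    )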

    Equally, you could say that in a void function, "return x;" simply casts
    "x" to void - just like writing "x;" as a statement does.

    I'm not suggesting that either of these things are a particularly good
    idea - I am merely saying that with a minor syntactic change to the
    language (your language, C, or anything similar) most of the rest of the differences between your "proc" and your "func" disappear.

    All you are left with is that "func" can be used in an expression, and
    "proc" cannot. For me, that is not sufficient reason to distinguish
    them as concepts.

    Maybe. A rather pointless feature just to avoid writing '0', and which
    now introduces a new opportunity for a silent error (accidentally
    forgetting a return value).


    Sure. As I say, I don't think it is a particularly good idea - at
    least, not as an addition to C (or, presumably, your language).

    It's not quite the same as a static initialisation, which is zeroed
    when a program starts.


    Of course. (Theoretically in C, pointers are initialised to null
    pointers which don't have to be all zeros. But I don't know of any implementation which has something different.) I was just using that to
    show how some languages - like C - have a default value available.


    Then you list some things that may or may not happen, which are of
    course totally irrelevant. If you list the differences between bikes
    and cars, you don't include "some cars are red" and "bikes are
    unlikely to be blue".

    Yes; if you're using a vehicle, or planning a journey or any related
    thing, it helps to remember if it's a bike or a car! At least here you acknowledge the difference.


    There's a difference between cars and bikes - not between procs and funcs.

    Remember, if you are going to make such a distinction between two
    concepts, it has to be absolute - "likely" or "unlikely" does not help.
    You can't distinguish between your procs and funcs by looking at the
    existence of side-effects, since a code block that has side-effects
    might return a value or might not. It's like looking at a vehicle and
    seeing that it is red - it won't tell you if it is a bike or a car.

    This is why I say distinguishing between "func" and "proc" by your
    criteria - the existence or absence of a return type - gives no useful information to the programmer or the compiler that can't be equally well
    given by writing a return type of "void".

    But I guess you find those likely/unlikely macros of gcc pointless too.

    How is that even remotely relevant to the discussion? (Not that gcc has macros by those names.)
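
    (For reference - a hedged aside - "likely"/"unlikely" are conventionally
    user-defined macros wrapping gcc's __builtin_expect, for example as in
    the Linux kernel:

        #define likely(x)   __builtin_expect(!!(x), 1)
        #define unlikely(x) __builtin_expect(!!(x), 0)
    )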

    If I know something is a procedure, then I also know it is likely to
    change global state, that I might need to deal with a return value, and
    a bunch of other stuff.

    That's useless information - both to the programmer, and to the
    compiler. (I am never sure which viewpoint you are taking - it would be helpful if you were clear there.) If the compiler /knows/ global state
    cannot be changed, and the function only uses data from its input
    values, then it can do a lot with that information - shuffling around
    calls, removing duplicates, pre-calculating constant data at compile
    time, or whatever. Similarly, if the programmer /knows/ global state
    cannot be changed in a function, then that can make it easier to
    understand what is going on in the code, or what is going wrong in it.

    But if you only know that it is /likely/ to be one thing or the other,
    you know nothing of use.


    Boldly separating the two with either FUNC or PROC denotations I find
    helps tremendously. YM-obviously-V, but you can't have a go at me for my view.


    I can have a go at you for not thinking! I believe that if you think
    more carefully about this, you will understand how little your
    distinction helps anyone. You might find the distinction I made -
    between being allowed to interact with global state (a "procedure") and
    having no possibility of interacting with global state (a "function") -
    to be useful. In my distinction, there is no grey area of "likely" or "unlikely" - it is absolute, and therefore gives potentially useful information. Of course it is then up to you to decide if it is worth
    the effort or not.

    Let me tempt you with this - whatever syntax or terms you use here,
    you'll be able to brag that it is nicer than C23's "[[unsequenced]]"
    attribute for pure functions!

    If I really found it a waste of time, the distinction would have been dropped decades ago.


    Why? Once you've put it in the language, there is no motivation to drop
    it. Pascal has the same procedure / function distinction you do. Just because it adds little of use to the language, does not mean that you'd want
    to drop it and make your tools incompatible between language versions.

    It's a pointless distinction. Any function or procedure can be
    morphed into the other form without any difference in the semantic
    meaning of the code, requiring just a bit of re-arrangement at the
    caller site:

         int foo(int x) { int y = ...; return y; }

         void foo(int * res, int x) { int y = ...; *res = y; }


         void foo(int x) { ... ; return; }

         int foo(int x) { ... ; return 0; }


    There is no relevance in the division here, which is why most
    languages don't make a distinction unless they do so simply for
    syntactic reasons.

    As I said, you like to mix things up. You disagreed. I'm not surprised.

    Here you've demonstrated how a function that returns results by value
    can be turned into a procedure that returns a result by reference.

    So now, by-value and by-reference are the same thing?

    Returning something from a function by returning a value, or by having
    the caller pass a pointer (or mutable reference, if you prefer that
    term) and having the function pass its results via that pointer are not
    really very different. Sure, there are details of the syntax and the
    ABI that will differ, but not the meaning of the code.

    Remember that this is precisely what C compilers do when returning a
    struct that is too big to fit neatly in a register or two - the caller
    makes space for the return struct on the stack and passes a pointer to
    it as a hidden parameter to the function. The function has no normal
    return value. And yet the struct return is syntactically and
    semantically identical whether it is returned in registers or via a
    hidden pointer.
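
    (A hedged sketch of that point - the type and names are invented, and
    the exact ABI behaviour varies between platforms:

        typedef struct { double m[4][4]; } Matrix;

        Matrix identity(void);          /* what the programmer writes        */

        /* what many ABIs effectively compile the call into:                 */
        /*   void identity(Matrix *hidden_result);                           */
        /* with the caller providing the space for the result.               */
    )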


    I listed seven practical points of difference between functions and procedures, and above is an eighth point, but you just dismiss them.
    Is there any point in this?


    Maybe not, if you can't understand /why/ I am dismissing them. The only difference you listed that is real and has potential consequences for
    people using the language is that functions returning a value can be
    used in expressions - all the rest is minor detail or wishy-washy "maybes".

    I do like taking what some think as a single feature and having
    dedicated versions, because I find it helpful.

    That includes functions, loops, control flow and selections.


    If it ultimately comes down to just the word you want to use, then I
    guess that's fair enough. It is the /reasoning/ you gave that I am
    arguing with.

    If your language has "do ... until" and "do ... while" loops, and you
    justify it by saying you simply find it nicer to write some tests as
    positives and some tests as negatives, then I think that is reasonable.
    If you claim it is because they are fundamentally distinct and do
    different things because one is likely to loop more than three times and
    the other is unlikely to do so, then I'd argue against that claim.


    In C, the syntax is dreadful: not only can you barely distinguish a
    function from a procedure (even without attributes, user types and
    macros add in), but you can hardly tell them apart from variable
    declarations.

    As always, you are trying to make your limited ideas of programming
    languages appear to be correct, universal, obvious or "natural" by
    saying things that you think are flaws in C. That's not how a
    discussion works, and it is not a way to convince anyone of anything.
    The fact that C does not have a keyword used in the declaration or
    definition of a function does not in any way mean that there is the
    slightest point in your artificial split between "func" and "proc"
    functions.


      void F();
      void (*G);
      void *H();
      void (*I)();

    OK, 4 things declared here. Are they procedures, functions, variables,
    or pointers to functions? (I avoided using a typedef in place of 'void'
    to make things easier.)

    I /think/ they are as follows: procedure, pointer variable, function (returning void*), and pointer to a procedure. But I had to work at it,
    even though the examples are very simple.

    I don't know about you, but I prefer syntax like this:

       proc F
       ref void G
       ref proc H
       func I -> ref void

    Now come on, scream at me again for preferring a nice syntax for
    programming, one which just tells me at a glance what it means without having to work it out.


    I quite agree that your syntax is clearer that the example in C for this
    kind of thing. I rarely see the C syntax as complicated - for my own
    code - because I use typedefs and spacing that makes it clear. But I
    fully agree that it is clearer in a language if it distinguishes better between declarations of variables and declarations of functions.
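
    (A hedged sketch of the typedef approach mentioned, applied to the
    earlier F/G/H/I declarations - the typedef names are invented:

        typedef void  proc_t(void);     /* a "procedure" type               */
        typedef void *vfunc_t(void);    /* a function returning void *      */

        proc_t   F;      /* declares:  void F(void);                        */
        void    *G;      /* a pointer variable                              */
        vfunc_t  H;      /* declares:  void *H(void);                       */
        proc_t  *I;      /* a pointer to a procedure                        */
    )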

    However, I don't think it would make a huge difference to the clarity of
    your syntax if you had written :

    func F -> void
    ref void G
    ref func H -> void
    func I -> ref void

    or

    func F
    ref void G
    ref func H
    func I -> ref void


    It is not the use of a keyword for functions that I disagree with, nor
    am I arguing for C's syntax or against your use of "ref" or ordering. I simply don't think there is much to be gained by using "proc F" instead
    of "func F -> void" (assuming that's the right syntax) - or just "func F".

    But I think there is quite a bit to be gained if the func/proc
    distinction told us something useful and new, rather than just the
    existence or lack of a return type.



    (It doesn't matter that I too prefer a clear keyword for defining
    functions in a language.)

    Why? Don't your smart tools tell you all that anyway?


    Yes, they can. But it would be nicer with a keyword. Where possible, I prefer clear language constructs /and/ nice syntax highlighting and
    indexing from tools. Call me greedy if you like!


    That is solely from your choice of an IL.

    The IL design also falls into place from the natural way these things
    have to work.

    Of course you are wrong!

    You keep saying that. But then you also keep saying, from time to time,
    that you agree that something in C was a bad idea. So I'm still wrong
    when calling out the same thing?


    I can agree with you about some of the things you say about C, while
    still disagreeing with other things (about C or programming in general).



    If there was an alternative language that I thought would be better
    for the tasks I have, I'd use that. (Actually, a subset of C++ is
    often better, so I use that when I can.)

    What do you think I should do instead? Whine in newsgroups to people
    that don't write language standards (for C or anything else) and don't
    make compilers?

    What makes you think I'm whining? The thread opened up a discussion
    about multi-way selections, and it got into how it could be done with features from other languages.

    You /do/ whine a lot. But here I was asking, rhetorically, if you
    thought that was a good alternative to finding ways to make C work well
    for me.


    I gave some examples from mine, as I'm very familiar with that, and it
    uses simple features that are easy to grasp and appreciate. You could
    have done the same from ones you know.

    But you just hate the idea that I have my own language to draw on, whose syntax is very sweet ('serious' languages hate such syntax for some
    reason, and it is usually relegated to scripting languages).

    I guess then you just have to belittle and insult me, my languages and
    my views at every opportunity.

    I haven't been berating or belittling your language here - I have been
    arguing against some of the justification you have for some design
    decisions, and suggesting something that I think would be better.


    Make my own personal language that is useless to everyone else and
    holds my customers to ransom by being the only person that can work
    with their code?

    Plenty of companies use DSLs. But isn't that sort of what you do? That
    is, using 'C' with a particular interpretation or enforcement of the
    rules, which needs to go in hand with a particular compiler, version,
    sets of options and assorted makefiles.


    No.

    I for one would never be able to build one of your programs. It might as well be written in your in-house language with proprietary tools.


    Pretty much every professional in my field could manage it. But
    software development is a wide discipline, with many niche areas.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Fri Nov 8 04:51:09 2024
    On 07/11/2024 16:23, David Brown wrote:
    On 06/11/2024 20:38, Bart wrote:

    [Functions vs. procedures]

        void F();
        void (*G);
        void *H();
        void (*I)();

    OK, 4 things declared here. Are they procedures, functions, variables,
    or pointers to functions? (I avoided using a typedef in place of
    'void' to make things easier.)

    I /think/ they are as follows: procedure, pointer variable, function
    (returning void*), and pointer to a procedure. But I had to work at
    it, even though the examples are very simple.

    I don't know about you, but I prefer syntax like this:

        proc F
        ref void G
        ref proc H
        func I -> ref void

    (The last two might be wrong interpretations of the C. I've stared at
    the C code for a minute and I'm still not sure.

    If I put it through my C compiler and examine the ST listing, it seems
    I'd just swapped the last two:

    func H -> ref void
    ref proc I

    But you shouldn't need to employ a tool to figure out if a declaration
    is even a function, let alone whether it is also a procedure. That
    syntax is not fit for purpose. This is an HLL, so let's have some HL syntax, not gobbledygook.)
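
    For the record, here are the four declarations again, each with its
    reading spelled out in a comment:

        void F();      /* function (unspecified parameters) returning void  */
        void (*G);     /* an object of type void* -- same as:  void *G;     */
        void *H();     /* function returning void*                          */
        void (*I)();   /* pointer to a function returning void              */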

    It is not the use of a keyword for functions that I disagree with, nor
    am I arguing for C's syntax or against your use of "ref" or ordering. I simply don't think there is much to be gained by using "proc F" instead
    of "func F -> void" (assuming that's the right syntax) - or just "func F".

    But I think there is quite a bit to be gained if the func/proc
    distinction told us something useful and new, rather than just the
    existence or lack of a return type.

    I use the same syntax for my dynamic language where type annotations are
    not used, including indicating a return type for a function. That means
    that without distinct keywords here:

    func F =
    end

    proc G =
    end

    I can't tell whether each returns a value or not. So 'func'/'proc' is
    useful to me, to readers, and makes it possible to detect errors and omissions:

    - 'return' without a value in functions
    - 'return x' used in procedures
    - A missing return or missing return value in functions (since this
    is also expression-based and the "return" keyword is optional in the
    last statement/expression)
    - A missing 'else' clause of multi-way constructs within functions
    - Trying to use the value of a function call when that is not a
    function.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Fri Nov 8 19:45:28 2024
    On 07/11/2024 13:23, Bart wrote:
    On 06/11/2024 14:50, David Brown wrote:
    On 05/11/2024 23:48, Bart wrote:
    On 05/11/2024 13:29, David Brown wrote:

    int small_int_sqrt(int x) {
        if (x == 0) return 0;
        if (x < 4) return 1;
        if (x < 9) return 2;
        if (x < 16) return 3;
        unreachable();
    }

    "unreachable()" is a C23 standardisation of a feature found in most
    high-end compilers. For gcc and clang, there is
    __builtin_unreachable(), and MSVC has its version.

    So it's a kludge.

    You mean it is something you don't understand? Think of this as an opportunity to learn something new.


    Cool, I can create one of those too:

    func smallsqrt(int x)int =
        if
        elsif x=0 then  0
        elsif x<4 then  1
        elsif x<9 then  2
        elsif x<16 then 3
        dummyelse       int.min
        fi
    end

    'dummyelse' is a special version of 'else' that tells the compiler that control will (should) never arrive there. ATM it does nothing but inform
    the reader of that and to remind the author. But later stages of the compiler can choose not to generate code for it, or to generate error-reporting code.


    You are missing the point - that is shown clearly by the "int.min".

    Do you /really/ not understand when and why it can be useful to tell the compiler that something cannot happen?


    (BTW your example lets through negative values; I haven't fixed that.)


    Again, you are missing the point.

    This is all a large and complex subject. But it's not really the point
    of the discussion.


    You haven't followed the discussion or considered it to have a point.
    To you, the "point" of /all/ discussions here is that you hate
    everything about C, think that everyone else loves everything about C,
    and see it as your job to prove them "wrong".

    You have your way of doing things, and have no interest in learning
    anything else or even bothering to listen or think. Your bizarre hatred
    of C is overpowering for you - it doesn't matter what anyone writes.
    All that matters to you is how you can use it as an excuse to fit it
    into your world-view that everything about C, and everything written in
    C, is terrible. You don't even appear to care about your own languages
    beyond the fact that they are not C.

    It is time to give up for now.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Sat Nov 9 04:37:20 2024
    On 08/11/2024 08:45, David Brown wrote:
    On 07/11/2024 13:23, Bart wrote:
    On 06/11/2024 14:50, David Brown wrote:
    On 05/11/2024 23:48, Bart wrote:
    On 05/11/2024 13:29, David Brown wrote:

    int small_int_sqrt(int x) {
        if (x == 0) return 0;
        if (x < 4) return 1;
        if (x < 9) return 2;
        if (x < 16) return 3;
        unreachable();
    }

    "unreachable()" is a C23 standardisation of a feature found in most
    high-end compilers. For gcc and clang, there is
    __builtin_unreachable(), and MSVC has its version.

    So it's a kludge.

    You mean it is something you don't understand? Think of this as an opportunity to learn something new.

    You don't seem to understand what a 'kludge' is. Think of it as a 'hack',
    something bolted-on to a language.

    This is from Hacker News about 'unreachable':

    "Note that gcc and clang's __builtin_unreachable() are optimization
    pragmas, not assertions. If control actually reaches a __builtin_unreachable(), your program doesn't necessarily abort.

    Terrible things can happen such as switch statements jumping into random addresses or functions running off the end without returning:"

    "Sure, these aren't for defensive programming—they're for places where
    you know a location is unreachable, but your compiler can't prove it for
    you."

    'dummyelse' is a special version of 'else' that tells the compiler
    that control will (should) never arrive there. ATM it does nothing but
    inform the reader of that and to remind the author. But later stages
    of the compiler can choose not to generate code for it, or to generate
    error-reporting code.


    You are missing the point - that is shown clearly by the "int.min".

    At least my code will never 'run off the end of a function'.

    But, it looks like you're happy with ensuring C programs don't do that,
    by the proven expedient of keeping your fingers crossed.



    You have your way of doing things, and have no interest in learning
    anything else or even bothering to listen or think.

    Ditto for you.

    Your bizarre hatred
    of C is overpowering for you

    Ditto for your hatred of my stuff.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Sat Nov 9 04:37:26 2024
    On 03.11.2024 18:00, David Brown wrote:
    On 02/11/2024 21:44, Bart wrote:

    (Note that the '|' in my example is not 'or'; it means 'then':

    ( c | a ) # these are exactly equivalent
    if c then a fi

    ( c | a | b ) # so are these
    if c then a else b fi

    There is no restriction on what a and b are, statements or
    expressions, unless the whole returns some value.)

    Ah, so your language has a disastrous choice of syntax here so that
    sometimes "a | b" means "or", and sometimes it means "then" or
    "implies", and sometimes it means "else".

    (I can't comment on the "other use" of the same syntax in the
    "poster's language" since it's not quoted here.)

    But it's not uncommon in programming languages that operators
    are context specific, and may mean different things depending
    on context.

    You are saying "disastrous choice of syntax". - Wow! Hard stuff.
    I suggest to cool down before continuing reading further. :-)

    Incidentally above syntax is what Algol 68 supports; you have
    the choice to write conditionals with 'if' or with parenthesis.
    As opposed to "C", where you have also *two* conditionals, one
    for statements (if-then-else) and one for expressions ( ? : ),
    in Algol 68 you can use both forms (sort of) "anywhere", e.g.
    IF a THEN b ELSE c FI
    x := IF a THEN b ELSE c FI
    IF a THEN b ELSE c FI := x
    or using the respective alternative forms with ( a | b | c) ,
    or ( a | b ) where no 'ELSE' is required. (And there's also
    the 'ELIF' and the '|:' as alternative form available.)
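
    For comparison, a small C sketch: the "x := IF a THEN b ELSE c FI" form
    maps directly onto C's ?:, while the form with the conditional on the
    left of := can only be approximated in C through pointers (assuming b
    and c are lvalues of the same type):

        #include <stdio.h>

        int main(void) {
            int a = 1, b = 2, c = 3, x = 42;

            x = a ? b : c;          /* conditional as an expression              */
            *(a ? &b : &c) = x;     /* approximates "IF a THEN b ELSE c FI := x" */

            printf("b=%d c=%d x=%d\n", b, c, x);    /* prints: b=2 c=3 x=2 */
            return 0;
        }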

    BTW, the same symbols can also be used as an alternative form
    of the 'case' statement; the semantic distinction is made by
    context, e.g. the types involved in the construct.
    I can understand if this sounds strange and feels unfamiliar.

    Why have a second syntax with
    a confusing choice of operators when you have a perfectly good "if /
    then / else" syntax?

    Because, depending on the program context, that may not be as
    legible as the other, simpler construct.

    Personally I use both forms depending on application context.
    In some cases one syntax is better legible, in other cases the
    other one.[*]

    In complex expressions it may even be worthwhile to mix(!) both
    forms; use 'if' on outer levels and parenthesis on inner levels.
    (Try an example and see, before dismissing the thought too quickly.)

    Or if you feel an operator adds a lot to the
    language here, why not choose one that would make sense to people, such
    as "=>" - the common mathematical symbol for "implies".

    This is, as an opinion, of course arguable. It's certainly also
    influenced where one is coming from (i.e. personal expertise
    from other languages). The detail of what symbols are used is
    not that important to me, if it fits to the overall language
    design.

    From the high-level languages I used in my life I was almost
    always "missing" something with conditional expressions. I
    don't want separate and restricted syntaxes (plural!) in "C"
    (for statements and expressions respectively), for example.
    Some are lacking conditional expressions completely. Others
    support the syntax with a 'fi' end-terminator and simplify
    structures (and add to maintainability) supporting 'else-if'.
    And few allow 'if' expressions on the left-hand side of an
    assignment. (Algol 68 happens to support everything I need.
    Unfortunately it's a language I never used professionally.)

    I'm positive that folks who use languages that support those
    syntactic forms wouldn't like to miss them. (Me for sure.)

    ("disastrous syntax" - I'm still laughing... :-)

    Bart, out of interest; have you invented that syntax for your
    language yourself of borrowed it from another language (like
    Algol 68)?

    Janis

    [*] BTW, in Unix shell I also use the '||' and '&&' syntax
    shortcuts occasionally, in addition to the if/then/else/fi
    constructs, depending on the application context.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Sat Nov 9 04:52:01 2024
    On 06.11.2024 00:01, Bart wrote:

    Well, it started off as 2-way select, meaning constructs like this:

    x = c ? a : b;
    x := (c | a | b)

    Where one of two branches is evaluated. I extended the latter to N-way select:

    x := (n | a, b, c, ... | z)

    Where again one of these elements is evaluated, selected by n (here
    having the values of 1, 2, 3, ... compared with true, false above, but
    there need to be at least 2 elements inside |...| to distinguish them).

    I suppose you borrowed that syntax from Algol 68, or is that just
    coincidence?

    Algol 68's 'CASE' statement has the abbreviated form you depicted
    above. (There's also some nesting supported with the '|:' operator,
    similar to the 'IF' syntax [in Algol 68].) - Personally, though,
    I use that only very rarely because of the restriction to support
    only integral numbers as branch selector.


    I applied it also to other statements that can provide values, such
    as if-elsif chains and switch, but there the selection might be
    different (eg. a series of tests are done sequentially until a true one).

    I don't know how it got turned into 'multi-way'.

    [...]

    Janis


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Sat Nov 9 04:53:52 2024
    On 06.11.2024 11:01, Bart wrote:

    x := (n | a, b, c, ... | z)

    It's a version of Algol68's case construct:

    x := CASE n IN a, b, c OUT z ESAC

    which also has the same compact form I use. I only use the compact
    version because n is usually small, and it is intended to be used within
    an expression: print (n | "One", "Two", "Three" | "Other").

    Which answers my upthread raised questions. :-)

    Thanks.

    Janis


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Sat Nov 9 05:03:54 2024
    On 03/11/2024 17:00, David Brown wrote:
    On 02/11/2024 21:44, Bart wrote:

    (Note that the '|' in my example is not 'or'; it means 'then':

        (  c |    a )            # these are exactly equivalent
        if c then a fi

        (  c |    a |    b )     # so are these [fixed]
        if c then a else b fi

    There is no restriction on what a and b are, statements or
    expressions, unless the whole returns some value.)

    Ah, so your language has a disastrous choice of syntax here so that sometimes "a | b" means "or", and sometimes it means "then" or
    "implies", and sometimes it means "else".

    I missed this part of a very long post until JP commented on it.

    As I mentioned above, "|" here doesn't mean 'or' at all. In "( ... | ...
    | ... )", the first means "then" and the second "else". (It also wasn't
    my idea, it was taken from Algol 68.)


    Why have a second syntax with
    a confusing choice of operators when you have a perfectly good "if /
    then / else" syntax?

    if/then/else suits multi-line statements. (||) suits terms that are part
    of a larger one-line expression.

    I might as well ask why C uses ?: when it has if-else, or why it needs
    P->m when it has (*P).m.




    Or if you feel an operator adds a lot to the
    language here, why not choose one that would make sense to people, such
    as "=>" - the common mathematical symbol for "implies".

    It is not an operator, it is part of '(x | x,x,x | x)' syntax.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Sat Nov 9 05:18:26 2024
    On 08/11/2024 18:37, Janis Papanagnou wrote:
    On 03.11.2024 18:00, David Brown wrote:
    On 02/11/2024 21:44, Bart wrote:

    (Note that the '|' in my example is not 'or'; it means 'then':

    ( c | a ) # these are exactly equivalent
    if c then a fi

    ( c | a | b ) # so are these
    if c then a else b fi

    There is no restriction on what a and b are, statements or
    expressions, unless the whole returns some value.)

    Ah, so your language has a disastrous choice of syntax here so that
    sometimes "a | b" means "or", and sometimes it means "then" or
    "implies", and sometimes it means "else".

    (I can't comment on the "other use" of the same syntax in the
    "poster's language" since it's not quoted here.)

    But it's not uncommon in programming languages that operators
    are context specific, and may mean different things depending
    on context.


    Sure. Just look at the comma for an overloaded syntax in many languages.

    You are saying "disastrous choice of syntax". - Wow! Hard stuff.
    I suggest to cool down before continuing reading further. :-)


    The | operator means "or" in the OP's language (AFAIK - only he actually
    knows the language). So "(a | b | c)" in that language will sometimes
    mean the same as "(a | b | c)" in C, and sometimes it will mean the same
    as "(a ? b : c)" in C.

    There may be some clear distinguishing feature that disambiguates these
    uses. But this is a one-man language - there is no need for a clear
    syntax or grammar, documentation, consistency in the language, or a consideration for corner cases or unusual uses.

    Incidentally above syntax is what Algol 68 supports;

    Yes, he said later that Algol 68 was the inspiration for it. Algol 68
    was very successful in its day - but there are good reasons why many of
    its design choices were left behind long ago in newer languages.


    Or if you feel an operator adds a lot to the
    language here, why not choose one that would make sense to people, such
    as "=>" - the common mathematical symbol for "implies".

    This is as opinion of course arguable. It's certainly also
    influenced where one is coming from (i.e. personal expertise
    from other languages).

    The language here is "mathematics". I would not expect anyone who even considers designing a programming language to be unfamiliar with that
    symbol.

    The detail of what symbols are used is
    not that important to me, if it fits to the overall language
    design.

    I am quite happy with the same symbol being used for very different
    meanings in different contexts. C's use of "*" for indirection and for multiplication is rarely confusing. Using | for "bitwise or" and also
    using it for a "pipe" operator would probably be fine - only one
    operation makes sense for the types involved. But here the two
    operations - "bitwise or" (or logical or) and "choice" can apply to to
    the same types of operands. That's what makes it a very poor choice of syntax.

    (For comparison, Algol 68 uses "OR", "∨" or "\/" for the "or" operator,
    thus it does not have this confusion.)


    From the high-level languages I used in my life I was almost
    always "missing" something with conditional expressions. I
    don't want separate and restricted syntaxes (plural!) in "C"
    (for statements and expressions respectively), for example.
    Some are lacking conditional expressions completely. Others
    support the syntax with a 'fi' end-terminator and simplify
    structures (and add to maintainability) supporting 'else-if'.
    And few allow 'if' expressions on the left-hand side of an
    assignment. (Algol 68 happens to support everything I need.
    Unfortunately it's a language I never used professionally.)

    I'm positive that folks who use languages that support those
    syntactic forms wouldn't like to miss them. (Me for sure.)

    I've nothing (much) against the operation - it's the choice of operator
    that is wrong.


    ("disastrous syntax" - I'm still laughing... :-)




    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Sat Nov 9 09:24:44 2024
    On 08/11/2024 17:37, Janis Papanagnou wrote:
    On 03.11.2024 18:00, David Brown wrote:

    or using the respective alternative forms with ( a | b | c) ,
    or ( a | b ) where no 'ELSE' is required. (And there's also
    the 'ELIF' and the '|:' as alternative form available.)



    BTW, the same symbols can also be used as an alternative form
    of the 'case' statement; the semantic distinction is made by
    context, e.g. the types involved in the construct.

    You mean whether the 'a' in '(a | b... | c)' has type Bool rather than Int?

    I've always discriminated on the number of terms between the two |s:
    either 1, or more than 1.

    It would be uncommon to select one-of-N when N is only 1! It does make
    for an untidy exception in the language, but which has never bothered me
    (I don't think I've even thought about it until now.)

    Bart, out of interest; have you invented that syntax for your
    language yourself or borrowed it from another language (like
    Algol 68)?

    It was heavily inspired by the syntax (not the semantics) of Algol68,
    even though I'd never used it at that point.

    I like that it solved the annoying begin-end aspect of Algol60/Pascal
    syntax where you have to write the clunky:

    if cond then begin s1; s2 end else begin s3; s4 end;

    You see it also with braces:

    if (cond) {s1; s2; } else { s3; s4; }

    With Algol68 it became:

    IF cond THEN s1; s2 ELSE s3; s4 FI;

    I enhanced it by not needing stropping (and so not allowing embedded
    spaces within names); allowing redundant semicolons while at the same
    time, turning newlines into semicolons when a line obviously didn't
    continue; plus allowing ordinary 'end' or 'end if' to be used as well as
    'fi'.

    My version then can look like this, a bit less forbidding than Algol68:

    if cond then
    s1
    s2
    else
    s3
    s4
    end



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Sat Nov 9 14:57:05 2024
    On 08.11.2024 23:24, Bart wrote:
    On 08/11/2024 17:37, Janis Papanagnou wrote:

    BTW, the same symbols can also be used as an alternative form
    of the 'case' statement; the semantic distinction is made by
    context, e.g. the types involved in the construct.

    You mean whether the 'a' in '(a | b... | c)' has type Bool rather than Int?

    I've always discriminated on the number of terms between the two |s:
    either 1, or more than 1.

    I suppose in a [historic] "C" like language it's impossible to
    distinguish on type here (given that there was no 'bool' type
    [in former times] in "C"). - But I'm not quite sure whether
    you're speaking here about your "C"-like language or some other
    language you implemented.


    Bart, out of interest; have you invented that syntax for your
    language yourself or borrowed it from another language (like
    Algol 68)?

    It was heavily inspired by the syntax (not the semantics) of Algol68,

    (Sure.)

    even though I'd never used it at that point.

    I like that it solved the annoying begin-end aspect of Algol60/Pascal
    syntax where you have to write the clunky:
    [snip examples]

    Well, annoying would be a strong word [for me] here, but yes,
    that's what I also find suboptimal. Quite some languages have
    adopted the if/fi or if/end forms (and for good reasons, IMO).


    I enhanced it by not needing stropping (and so not allowing embedded
    spaces within names); allowing redundant semicolons while at the same
    time, turning newlines into semicolons when a line obviously didn't
    continue; plus allowing ordinary 'end' or 'end if' to be used as well as 'fi'.

    My version then can look like this, a bit less forbidding than Algol68:

    if cond then
    s1
    s2
    else
    s3
    s4
    end

    (Looks a lot more like a scripting language without semicolons.)

    Not sure what you mean by "less forbidding", though. - Algol 68
    never appeared to me to restrict me. And it allows more flexible
    and coherent application of its concepts (and in a safe way) than
    in a lot other common languages.

    Janis


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Sat Nov 9 15:51:54 2024
    On 08.11.2024 19:18, David Brown wrote:
    On 08/11/2024 18:37, Janis Papanagnou wrote:

    The | operator means "or" in the OP's language (AFAIK - only he actually knows the language). So "(a | b | c)" in that language will sometimes
    mean the same as "(a | b | c)" in C, and sometimes it will mean the same
    as "(a ? b : c)" in C.

    As said ("I can't comment on the "other use" of the same syntax"),
    I don't know Bart's language, so cannot comment on that.

    And, frankly, some personal language projects are not of interest
    to me, apart from experiences the implementer (Bart) has gotten
    from his projects that might be worthwhile to consider for other
    languages' evolution or design. This is why I got interested in
    the thread and his posts.


    There may be some clear distinguishing feature that disambiguates these
    uses. But this is a one-man language - there is no need for a clear
    syntax or grammar, documentation, consistency in the language, or a consideration for corner cases or unusual uses.

    Incidentally above syntax is what Algol 68 supports;

    Yes, he said later that Algol 68 was the inspiration for it. Algol 68
    was very successful in its day - but there are good reasons why many of
    its design choices were left behind long ago in newer languages.

    Myself I've never seen Algol 68 code outside of education and
    specification. (But that's normal due my naturally restricted
    view on what happens all over the world. So if you have some
    examples for practical successes of Algol 68 I'd be interested
    to hear about.)

    Some design decisions of Algol 68 are arguable, indeed, and we
    can observe that from the reports those days. (But that's not
    surprising given that there have been a lot of different (and
    strong) characters, university professors and scientists from
    all over the world, in the committees and working group.) It's
    obvious that quite some members left and introduced their own
    languages; those languages were of course also not unopposed.

    I don't think, though, that this natural segregation process or
    any design decisions of some later developed languages would
    give evidence for a clear negative valuation of any specific
    language details (or for the language as a whole). Contrary,
    a lot of later languages even ignored outstanding and important
    concepts of languages these days. (The market and politics have
    their own logic and dynamics.)


    This is as opinion of course arguable. It's certainly also
    influenced where one is coming from (i.e. personal expertise
    from other languages).

    The language here is "mathematics". I would not expect anyone who even considers designing a programming language to be unfamiliar with that
    symbol.

    Mathematics, unfortunately, [too] often has several symbols for
    the same thing. (It's in that respect not very different from
    programming languages, where you can [somewhat] rely on + - * /
    but beyond that it's getting more tight.)

    Programming languages have the additional problem that you don't
    have all necessary symbols available, so language designers have
    to map them onto existing symbols. (Also Unicode in modern times
    do not solve that fact, since languages typically rely on ASCII,
    or some 8-bit extension, at most; full Unicode support, I think,
    is rare, especially on the lexical language level. Some allow
    them in strings, some in identifiers; but in language keywords?)

    BTW, in Algol 68 you can define operators, so you can define
    "OP V" or "OP ^" (for 'or' and 'and', respectively, but we cannot
    define (e.g.) "OP ú" (a middle dot, e.g. for multiplication).[*]


    The detail of what symbols are used is
    not that important to me, if it fits to the overall language
    design.

    I am quite happy with the same symbol being used for very different
    meanings in different contexts. C's use of "*" for indirection and for multiplication is rarely confusing. Using | for "bitwise or" and also
    using it for a "pipe" operator would probably be fine - only one
    operation makes sense for the types involved. But here the two
    operations - "bitwise or" (or logical or) and "choice" can apply to to
    the same types of operands. That's what makes it a very poor choice of syntax.

    Well, I'm more used (from mathematics) to 'v' and '^' than to '|'
    and '&', respectively. But that doesn't prevent me from accepting
    other symbols like '|' to have some [mathematical] meaning, or
    even different meanings depending on context. In mathematics it's
    not different; same symbols are used in different contexts with
    different semantics. (And there's also the mentioned problem of
    non-coherent literature WRT used mathematics' symbols.)


    (For comparison, Algol 68 uses "OR", "∨" or "\/" for the "or" operator, thus it does not have this confusion.)

    Actually, while I like Algol 68's flexibility, there's in some
    cases (to my liking) too many variants. This had partly been
    necessary, of course, due to the (even more) restricted character
    sets (e.g. 6-bit characters) available in the 1960's.

    The two options for conditionals I consider very useful, though,
    and it also produces very legible and easily understandable code.

    [...]

    I've nothing (much) against the operation - it's the choice of operator
    that is wrong.

    Well, on opinions there's nothing more to discuss, I suppose.

    Janis

    [*] Note: I'm using the "Genie" compiler for tests.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Sat Nov 9 17:00:07 2024
    On 03.11.2024 21:00, Bart wrote:

    This was the first part of your example:

    const char * flag_to_text_A(bool b) {
        if (b == true) {
            return "It's true!";
        } else if (b == false) {
            return "It's false!";

    /I/ would question why you'd want to make the second branch conditional
    in the first place.

    You might want to read about Dijkstra's Guards; it might provide
    some answers, rationales, and insights for this question. (Don't
    get repelled or confused by the "calculate all conditions" aspect
    or the non-determinism; think more about, e.g., the safety of full specification, automated optimization runs, and other [positive]
    implications.)

    (Though if you're only focused on programmer-optimized structures
    Dijkstra's concept and ideas probably won't help you.)

    Incidentally, Dijkstra's Guards cover also an aspect of the OP's
    original question.

    Janis

    Write an 'else' there, and the issue doesn't arise.

    Because I can't see the point of deliberately writing code that usually
    takes two paths, when either:

    (1) you know that one will never be taken, or
    (2) you're not sure, but don't make any provision in case it is

    Fix that first rather than relying on compiler writers to take care of your
    badly written code.
    [...]



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Sat Nov 9 17:54:44 2024
    On 04.11.2024 23:25, David Brown wrote:

    If you have a function (or construct) that returns a correct value for
    inputs 1, 2 and 3, and you never pass it the value 4 (or anything else),
    then there is no undefined behaviour no matter what the code looks like
    for values other than 1, 2 and 3. If someone calls that function with
    input 4, then /their/ code has the error - not the code that doesn't
    handle an input 4.

    Well, it's a software system design decision whether you want to
    make the caller test the preconditions for every function call,
    or let the callee take care of unexpected input, or both.

    We had always followed the convention to avoid all undefined
    situations and always define every 'else' case by some sensible
    behavior, at least writing a notice into a log-file, but also
    to "fix" the runtime situation to be able to continue operating.
    (Note, I was mainly writing server-side software where this was
    especially important.)

    That's one reason why (as elsethread mentioned) I dislike 'else'
    to handle a defined value; I prefer an explicit 'if' and use the
    else for reporting unexpected situations (that practically never
    appear, or, with the diagnostics QA-evaluated, asymptotically
    disappearing).

    (For pure binary predicates there's no errors branch, of course.)

    Janis

    PS: One of my favorite IT-gotchas is the plane crash where the
    code specified landing procedure functions for height < 50.0 ft
    and for height > 50.0 ft conditions, which mostly worked since
    the height got polled only every couple seconds, and the case
    height = 50.0 ft happened only very rarely due to the typical
    descent characteristics during landing.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From fir@3:633/280.2 to Bart on Sat Nov 9 21:08:05 2024
    To: Bart <bc@freeuk.com>

    Bart wrote:
    On 06/11/2024 07:26, Kaz Kylheku wrote:
    On 2024-11-05, Bart <bc@freeuk.com> wrote:

    Well, it started off as 2-way select, meaning constructs like this:

    x = c ? a : b;
    x := (c | a | b)

    Where one of two branches is evaluated. I extended the latter to N-way
    select:

    x := (n | a, b, c, ... | z)

    This looks quite error-prone. You have to count carefully that
    the cases match the intended values. If an entry is
    inserted, all the remaining ones shift to a higher value.

    You've basically taken a case construct and auto-generated
    the labels starting from 1.

    It's a version of Algol68's case construct:

    x := CASE n IN a, b, c OUT z ESAC

    which also has the same compact form I use. I only use the compact
    version because n is usually small, and it is intended to be used within
    an expression: print (n | "One", "Two", "Three" | "Other").

    This is an actual example (from my first scripting language; not written by
    me):

    Crd[i].z := (BendAssen |P.x, P.y, P.z)

    An out-of-bounds index yields 'void' (via a '| void' part inserted by
    the compiler). This is one of my examples from that era:

    xt := (messa | 1,1,1, 2,2,2, 3,3,3)
    yt := (messa | 3,2,1, 3,2,1, 3,2,1)


    still the more C-compatible version would look better imo

    xt = {1,1,1, 2,2,2, 3,3,3}[messa];
    yt = {3,2,1, 3,2,1, 3,2,1}[messa];

    especially if it were also allowed to use [] on the left side

    and

    t = {1,3, 1,2, 1,1, 2,3, 2,2, 2,1, 3,3, 3,2, 3,1} [messa]

    where t is struct {x,y}

    could be maybe faster
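
    (As an aside: something quite close to this is already legal C99 on the
    right-hand side, via compound literals - a minimal sketch, 0-based and
    without a bounds check, so messa is assumed to be in range:)

        #include <stdio.h>

        int main(void) {
            int messa = 4;      /* assumed to be 0..8 */

            int xt = (int[]){1,1,1, 2,2,2, 3,3,3}[messa];
            int yt = (int[]){3,2,1, 3,2,1, 3,2,1}[messa];

            printf("xt=%d yt=%d\n", xt, yt);    /* prints: xt=2 yt=2 */
            return 0;
        }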


    Algol68 didn't have 'switch', but I do, as well as a separate
    case...esac statement that is more general. Those are better for
    multi-line constructs.

    As for being error prone because values can get out of step, so is a
    function call like this:

    f(a, b, c, d, e)

    But I also have keyword arguments.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: i2pn2 (i2pn.org) (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Sat Nov 9 22:06:21 2024
    On 09/11/2024 07:54, Janis Papanagnou wrote:
    On 04.11.2024 23:25, David Brown wrote:

    If you have a function (or construct) that returns a correct value for
    inputs 1, 2 and 3, and you never pass it the value 4 (or anything else),
    then there is no undefined behaviour no matter what the code looks like
    for values other than 1, 2 and 3. If someone calls that function with
    input 4, then /their/ code has the error - not the code that doesn't
    handle an input 4.

    Well, it's a software system design decision whether you want to
    make the caller test the preconditions for every function call,
    or let the callee take care of unexpected input, or both.


    Well, I suppose it is their decision - they can do the right thing, or
    the wrong thing, or both.

    I believe I explained in previous posts why it is the /caller's/ responsibility to ensure pre-conditions are fulfilled, and why anything
    else is simply guaranteeing extra overheads while giving you less
    information for checking code correctness. But I realise that could
    have been lost in the mass of posts, so I can go through it again if you
    want.


    (On security boundaries, system call interfaces, etc., where the caller
    could be malicious or incompetent in a way that damages something other
    than their own program, you have to treat all inputs as dangerous and
    sanitize them, just like data from external sources. That's a different matter, and not the real focus here.)



    We had always followed the convention to avoid all undefined
    situations and always define every 'else' case by some sensible
    behavior, at least writing a notice into a log-file, but also
    to "fix" the runtime situation to be able to continue operating.
    (Note, I was mainly writing server-side software where this was
    especially important.)

    You can't "fix" bugs in the caller code by writing to a log file.
    Sometimes you can limit the damage, however.

    If you can't trust the people writing the calling code, then that should
    be the focus of your development process - find a way to be sure that
    the caller code is right. That's where you want your conventions, or to
    focus code reviews, training, automatic test systems - whatever is
    appropriate for your team and project. Make sure callers pass correct
    data to the function, and the function can do its job properly.

    Sometimes it makes sense to specify functions differently, and accept a
    wider input. Maybe instead of saying "this function will return the
    integer square root of numbers between 0 and 10", you say "this function
    will return the integer square root if given a number between 0 and 10,
    and will log a message and return -1 for other int values". Fair enough
    - now you've got a new function where it is very easy for the caller to
    ensure the preconditions are satisfied. But be very aware of the costs
    - you have now destroyed the "purity" of the function, and lost the key mathematical relation between the input and output. (You have also made everything much less efficient.)
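
    A minimal sketch of that "wide" variant (hypothetical name, logging to
    stderr just for illustration):

        #include <stdio.h>

        /* Defined for every int: 0..10 give the integer square root,
           anything else is reported and mapped to -1. */
        int small_int_sqrt_wide(int x) {
            if (x < 0 || x > 10) {
                fprintf(stderr, "small_int_sqrt_wide: %d out of range\n", x);
                return -1;
            }
            if (x == 0) return 0;
            if (x < 4)  return 1;
            if (x < 9)  return 2;
            return 3;               /* 9 and 10 */
        }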

    In terms of development practices, for large code bases you should
    divide things up into modules with clear boundaries. And then you might
    say that the teams working on other modules that call yours are muppets
    that can't read a function specification and can't get their code right.
    So these boundary functions have to accept as wide a range of inputs
    as possible, and check them as well as possible. But you only do that
    for these externally accessible interfaces, not your internal code.


    That's one reason why (as elsethread mentioned) I dislike 'else'
    to handle a defined value; I prefer an explicit 'if' and use the
    else for reporting unexpected situations (that practically never
    appear, or, with the diagnostics QA-evaluated, asymptotically
    disappearing).

    (For pure binary predicates there's no errors branch, of course.)

    Janis

    PS: One of my favorite IT-gotchas is the plane crash where the
    code specified landing procedure functions for height < 50.0 ft
    and for height > 50.0 ft conditions, which mostly worked since
    the height got polled only every couple seconds, and the case
    height = 50.0 ft happened only very rarely due to the typical
    descent characteristics during landing.





    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Sat Nov 9 23:21:47 2024
    On 09/11/2024 03:57, Janis Papanagnou wrote:
    On 08.11.2024 23:24, Bart wrote:
    On 08/11/2024 17:37, Janis Papanagnou wrote:

    BTW, the same symbols can also be used as an alternative form
    of the 'case' statement; the semantic distinction is made by
    context, e.g. the types involved in the construct.

    You mean whether the 'a' in '(a | b... | c)' has type Bool rather than Int?
    I've always discriminated on the number of terms between the two |s:
    either 1, or more than 1.

    I suppose in a [historic] "C" like language it's impossible to
    distinguish on type here (given that there was no 'bool' type
    [in former times] in "C"). - But I'm not quite sure whether
    you're speaking here about your "C"-like language or some other
    language you implemented.

    I currently have three HLL implementations:

    * For my C subset language (originally I had some enhancements, now
    dropped)

    * For my 'M' systems language inspired by A68 syntax

    * For my 'Q' scripting language, with the same syntax, more or less

    The remark was about those last two.

    if cond then
    s1
    s2
    else
    s3
    s4
    end

    (Looks a lot more like a scripting language without semicolons.)

    This is what I've long suspected: that people associate clear, pseudo-code-like syntax with scripting languages.

    'Serious' ones apparently need to look the business with a lot of extra punctuation. The more clutter the better!

    By that criteria, C++ is obviously more advanced than C:

    C: #include <stdio.h>
    printf("A=%d B=%d\n", a, b);

    C++ #include <iostream>
    std::cout << "A=" << a << " " << "B=" << b << std::endl;

    Maybe Zig even more so (normally you'd create a shorter alias to that
    print):

    Zig: @import("std").debug.print("A={d} B={d}\n", .{a, b});

    By that measure, mine probably looks like a toy:

    M: println =a, =b





    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Sun Nov 10 03:27:13 2024
    On 09/11/2024 05:51, Janis Papanagnou wrote:
    On 08.11.2024 19:18, David Brown wrote:
    On 08/11/2024 18:37, Janis Papanagnou wrote:


    The language here is "mathematics". I would not expect anyone who even
    considers designing a programming language to be unfamiliar with that
    symbol.

    Mathematics, unfortunately, [too] often has several symbols for
    the same thing. (It's in that respect not very different from
    programming languages, where you can [somewhat] rely on + - * /
    but beyond that it's getting more tight.)

    Programming languages have the additional problem that you don't
    have all necessary symbols available, so language designers have
    to map them onto existing symbols. (Also Unicode in modern times
    do not solve that fact, since languages typically rely on ASCII,
    or some 8-bit extension, at most; full Unicode support, I think,
    is rare, especially on the lexical language level. Some allow
    them in strings, some in identifiers; but in language keywords?)


    Sure, I appreciate all this. We must do the best we can - I am simply
    saying that using | for this operation is far from the best choice.

    BTW, in Algol 68 you can define operators, so you can define
    "OP V" or "OP ^" (for 'or' and 'and', respectively, but we cannot
    define (e.g.) "OP ú" (a middle dot, e.g. for multiplication).[*]


    The detail of what symbols are used is
    not that important to me, if it fits to the overall language
    design.

    I am quite happy with the same symbol being used for very different
    meanings in different contexts. C's use of "*" for indirection and for
    multiplication is rarely confusing. Using | for "bitwise or" and also
    using it for a "pipe" operator would probably be fine - only one
    operation makes sense for the types involved. But here the two
    operations - "bitwise or" (or logical or) and "choice" can apply to to
    the same types of operands. That's what makes it a very poor choice of
    syntax.

    Well, I'm more used (from mathematics) to 'v' and '^' than to '|'
    and '&', respectively. But that doesn't prevent me from accepting
    other symbols like '|' to have some [mathematical] meaning, or
    even different meanings depending on context. In mathematics it's
    not different; same symbols are used in different contexts with
    different semantics. (And there's also the mentioned problem of
    non-coherent literature WRT used mathematics' symbols.)


    We are - unfortunately, perhaps - constrained by common keyboards and
    ASCII (for the most part). "v" and "^" are poor choices for "or" and
    "and" - "∨" and "∧" would be much nicer, but are hard to type. For
    better or worse, the programming world has settled on "|" and "&" as
    practical alternatives. ("+" and "." are often used in boolean logic,
    and can be typed on normal keyboards, but would quickly be confused with
    other uses of those symbols.)


    (For comparison, Algol 68 uses "OR", "∨" or "\/" for the "or" operator,
    thus it does not have this confusion.)

    Actually, while I like Algol 68's flexibility, there's in some
    cases (to my liking) too many variants. This had partly been
    necessary, of course, due to the (even more) restricted character
    sets (e.g. 6-bit characters) available in the 1960's.

    The two options for conditionals I consider very useful, though,
    and it also produces very legible and easily understandable code.

    [...]

    I've nothing (much) against the operation - it's the choice of operator
    that is wrong.

    Well, on opinions there's nothing more to discuss, I suppose.


    Opinions can be justified, and that discussion can be interesting.
    Purely subjective opinion is less interesting.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Sun Nov 10 15:01:44 2024
    On 09.11.2024 11:08, fir wrote:
    Bart wrote:
    On 06/11/2024 07:26, Kaz Kylheku wrote:
    On 2024-11-05, Bart <bc@freeuk.com> wrote:

    [...] I extended the latter to N-way select:

    x := (n | a, b, c, ... | z)

    This looks quite error-prone. You have to count carefully that
    the cases match the intended values. If an entry is
    inserted, all the remaining ones shift to a higher value.

    You've basically taken a case construct and auto-generated
    the labels starting from 1.

    It's a version of Algol68's case construct:

    x := CASE n IN a, b, c OUT z ESAC

    which also has the same compact form I use. I only use the compact
    version because n is usually small, and it is intended to be used within
    an expression: print (n | "One", "Two", "Three" | "Other").

    [...]

    An out-of-bounds index yields 'void' (via a '| void' part inserted by
    the compiler). This is one of my examples from that era:

    xt := (messa | 1,1,1, 2,2,2, 3,3,3)
    yt := (messa | 3,2,1, 3,2,1, 3,2,1)


    still the more C-compatible version would look better imo

    xt = {1,1,1, 2,2,2, 3,3,3}[messa];
    yt = {3,2,1, 3,2,1, 3,2,1}[messa];

    [...]

    It might look better - which of course lies in the eyes of the
    beholder - but this would actually need more guaranteed context
    or explicit tests (whether "messa" is within defined bounds) to
    become a safe construct; which then again makes it more clumsy.

    Above you also write about the syntax (which included the 'else'
    case) that "This looks quite error-prone." and that you have to
    "count carefully". Why do you think the "C-like" syntax is less
    error prone and that you wouldn't have to count?

    The biggest problem with such old switch semantics is, IMO, that
    you have to map them on sequence numbers [1..N], or use them just
    in contexts where you naturally have such selectors given. (Not
    that the "C-like" suggestion would address that inherent issue.)

    In "C" I occasionally used a {...}[...] or "..."[...] syntax,
    but rather in this form: {...}[... % n] , where 'n' is the
    determined (constant) number of elements.
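
    For instance (a small illustration of that idiom - the % keeps the
    index inside the string, whatever the selector is):

        #include <stdio.h>

        int main(void) {
            for (int i = 0; i < 20; i++)
                putchar("|/-\\"[i % 4]);    /* "..."[ ... % n ] lookup */
            putchar('\n');
            return 0;
        }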

    Janis


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Sun Nov 10 15:22:21 2024
    On 09.11.2024 12:06, David Brown wrote:
    On 09/11/2024 07:54, Janis Papanagnou wrote:

    Well, it's a software system design decision whether you want to
    make the caller test the preconditions for every function call,
    or let the callee take care of unexpected input, or both.


    Well, I suppose it is their decision - they can do the right thing, or
    the wrong thing, or both.

    I believe I explained in previous posts why it is the /caller's/ responsibility to ensure pre-conditions are fulfilled, and why anything
    else is simply guaranteeing extra overheads while giving you less
    information for checking code correctness. But I realise that could
    have been lost in the mass of posts, so I can go through it again if you want.

    I haven't read all the posts, or rather, I just skipped most posts;
    it's too time consuming.

    Since you explicitly elaborated - thanks! - I will read this one...

    [...]

    (On security boundaries, system call interfaces, etc., where the caller
    could be malicious or incompetent in a way that damages something other
    than their own program, you have to treat all inputs as dangerous and sanitize them, just like data from external sources. That's a different matter, and not the real focus here.)

    We had always followed the convention to avoid all undefined
    situations and always define every 'else' case by some sensible
    behavior, at least writing a notice into a log-file, but also
    to "fix" the runtime situation to be able to continue operating.
    (Note, I was mainly writing server-side software where this was
    especially important.)

    You can't "fix" bugs in the caller code by writing to a log file.
    Sometimes you can limit the damage, however.

    I spoke more generally of fixing situations (not only bugs).


    If you can't trust the people writing the calling code, then that should
    be the focus of your development process - find a way to be sure that
    the caller code is right. That's where you want your conventions, or to focus code reviews, training, automatic test systems - whatever is appropriate for your team and project. Make sure callers pass correct
    data to the function, and the function can do its job properly.

    Yes.


    Sometimes it makes sense to specify functions differently, and accept a
    wider input. Maybe instead of saying "this function will return the
    integer square root of numbers between 0 and 10", you say "this function
    will return the integer square root if given a number between 0 and 10,
    and will log a message and return -1 for other int values". Fair enough
    - now you've got a new function where it is very easy for the caller to ensure the preconditions are satisfied. But be very aware of the costs
    - you have now destroyed the "purity" of the function, and lost the key mathematical relation between the input and output. (You have also made everything much less efficient.)

    I disagree with the "much less" generalization. I also think that when
    weighing performance versus safety my preferences might be different;
    I'm only speaking about a "rule of thumb", not about the actual (IMO) necessity(!) to make these decisions depending on the project context.

    [...]

    Janis


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Sun Nov 10 16:05:02 2024
    On 09.11.2024 17:27, David Brown wrote:
    On 09/11/2024 05:51, Janis Papanagnou wrote:
    [...]

    Sure, I appreciate all this. We must do the best we can - I am simply
    saying that using | for this operation is far from the best choice.

    That's also what I understood. - My point is that preferences (and
    opinions) differ. (And I haven't seen any convincing rationale.)

    Frankly, we're confronted with so much rubbish syntax (in various
    languages, even in the ones we have to or even like to use) that
    I'm at least astonished about your [strong appearing] opinion here.


    Well, I'm more used (from mathematics) to 'v' and '^' than to '|'
    and '&', respectively. But that doesn't prevent me from accepting
    other symbols like '|' to have some [mathematical] meaning, or
    even different meanings depending on context. In mathematics it's
    not different; same symbols are used in different contexts with
    different semantics. (And there's also the mentioned problem of
    non-coherent literature WRT used mathematics' symbols.)


    We are - unfortunately, perhaps - constrained by common keyboards and
    ASCII (for the most part). "v" and "^" are poor choices for "or" and
    "and" - "∨" and "∧" would be much nicer, but are hard to type.

    That was the key what I wanted to express. (I used the approximated
    symbols only for convenience.) - But, as a fact, the symbols I used
    (an alpha-letter and a punctuation character) can [in Algol 68] be
    effectively used as valid operators but the more appropriate Unicode
    characters can't. (In the Genie compiler the 'v' must be used as 'V',
    though.)

    (Yes, it's a pity that we are constrained by keyboards, but not only
    by that. And international use and cooperation don't make sensible, generally applicable solutions any easier.)

    For
    better or worse, the programming world has settled on "|" and "&" as practical alternatives.

    Only a subset of the languages; nowadays vastly those that took "C" -
    to my very astonishment! - as a design paragon.

    Personally I prefer 'and' and 'or' to '&&' and '||', or '&' and '|'.
    (And the others, "∧" and "∨", are out for said reasons.)

    The symbol '|' I associate more with alternatives (BNF, shell syntax,
    etc.). But in Unix shell also with pipes (in former Unixes '^', BTW).
    And I have no problem with it if used as a separator in a conditional,
    where "separator" is of course not the formally appropriate term.

    ("+" and "." are often used in boolean logic,
    and can be typed on normal keyboards, but would quickly be confused with other uses of those symbols.)

    [...]

    Well, on opinions there's nothing more to discuss, I suppose.

    Opinions can be justified, and that discussion can be interesting.
    Purely subjective opinion is less interesting.

    Sure. Yours is appreciated as well.

    Janis


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Sun Nov 10 17:00:19 2024
    Bart <bc@freeuk.com> wrote:
    On 05/11/2024 19:53, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:
    On 05/11/2024 12:42, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    Then we disagree on what 'multi-way' select might mean. I think it means branching, even if notionally, on one-of-N possible code paths.

    OK.

    The whole construct may or may not return a value. If it does, then one of the N paths must be a default path.


    You need to cover all input values. This is possible when there
    is a reasonably small number of possibilities. For example, a switch on
    a char variable which covers all possible values does not need a default
    path. A default is needed only when the number of possibilities is too
    large to explicitly give all of them. And some languages allow
    ranges, so that you may be able to cover all values with a small
    number of ranges.
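
    A common way to get that guarantee in practice - using an enum rather
    than a char for brevity - is a switch that lists every enumerator; with
    gcc or clang, -Wswitch then flags any value left uncovered, so no
    default is needed for exhaustiveness. A small sketch:

        enum colour { RED, GREEN, BLUE };

        const char *colour_name(enum colour c) {
            switch (c) {
            case RED:   return "red";
            case GREEN: return "green";
            case BLUE:  return "blue";
            }
            return "?";    /* not reached for valid enum values */
        }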


    What's easier to implement in a language: to have a conditional need for an 'else' branch, which is dependent on the compiler performing some
    arbitrarily complex levels of analysis on some arbitrarily complex set
    of expressions...

    ...or to just always require 'else', with a dummy value if necessary?

    Well, frequently it is easier to do a bad job than a good one.

    I assume that you consider the simple solution the 'bad' one?

    You wrote about _always_ requiring 'else' regardless of whether it is
    needed or not. Yes, I consider this bad.

    I would consider a much more elaborate one, putting the onus on external
    tools and still having an unpredictable result, to be the poorer of the two.

    You want to create a language that is easily compilable, no matter how complex the input.

    Normally the time spent _using_ a compiler should be bigger than the time
    spent writing the compiler. If a compiler gets enough use, it
    justifies some complexity.

    With the simple solution, the worst that can happen is that you have to write a dummy 'else' branch, perhaps with a dummy zero value.

    If control never reaches that point, it will never be executed (at
    worst, it may need to skip an instruction).

    But if the compiler is clever enough (optionally clever, it is not a requirement!), then it could eliminate that code.

    A bonus is that when debugging, you can comment out all or part of the previous lines, but the 'else' now catches those untested cases.
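
    As a minimal C-style sketch of that pattern (invented example): every
    path returns a value, and the final 'else' both satisfies the rule and
    catches anything commented out while debugging:

    int category(int n)
    {
        if (n == 1)      return 10;
        else if (n == 2) return 20;
        else             return 0;   /* dummy value: never reached for valid n */
    }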

    I am mainly concerned with clarity and correctness of source code.
    Dummy 'else' doing something may hide errors. Dummy 'else' signaling
    error means that something which could be compile time error is
    only detected at runtime.

    A compiler that detects most errors of this sort is IMO better than a
    compiler which makes no effort to detect them. And clearly, once the
    problem is formulated in a sufficiently general way, it becomes
    unsolvable. So I do not expect a general solution, but I do expect a
    reasonable effort.

    normally you do not need very complex analysis:

    I don't want to do any analysis at all! I just want a mechanical
    translation as effortlessly as possible.

    I don't like unbalanced code within a function because it's wrong and
    can cause problems.

    Well, I demand more from compiler than you do...

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Sun Nov 10 17:57:26 2024
    David Brown <david.brown@hesbynett.no> wrote:
    On 05/11/2024 20:39, Waldek Hebisch wrote:
    David Brown <david.brown@hesbynett.no> wrote:
    On 05/11/2024 13:42, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    Then we disagree on what 'multi-way' select might mean. I think it means >>>>> branching, even if notionally, on one-of-N possible code paths.

    OK.

    I appreciate this is what Bart means by that phrase, but I don't agree
    with it. I'm not sure if that is covered by "OK" or not!

    You may prefer your own definition, but Bart's is a reasonable one.

    The only argument I can make here is that I have not seen "multi-way
    select" as a defined phrase with a particular established meaning.

    There is a well-defined concept that appears when studying control structures.
    I am not sure if "multi-way select" is the usual name for it, but with
    Bart's explanation it is very clear that he meant this concept. And
    even without his explanation I would assume that he meant this concept.

    The whole construct may or may not return a value. If it does, then one of the N paths must be a default path.


    You need to cover all input values. This is possible when there
    is a reasonably small number of possibilities. For example, a switch on
    a char variable which covers all possible values does not need a default
    path. A default is needed only when the number of possibilities is too
    large to give all of them explicitly. And some languages allow
    ranges, so that you may be able to cover all values with a small
    number of ranges.


    I think this is all very dependent on what you mean by "all input values".
    Supposing I declare this function:

    // Return the integer square root of numbers between 0 and 10
    int small_int_sqrt(int x);


    To me, the range of "all input values" is integers from 0 to 10. I
    could implement it as :

    int small_int_sqrt(int x) {
        if (x == 0) return 0;
        if (x < 4) return 1;
        if (x < 9) return 2;
        if (x < 16) return 3;
        unreachable();
    }

    If the user asks for small_int_sqrt(-10) or small_int_sqrt(20), that's
    /their/ fault and /their/ problem. I said nothing about what would
    happen in those cases.

    But some people seem to feel that "all input values" means every
    possible value of the input types, and thus that a function like this
    should return a value even when there is no correct value in and no
    correct value out.

    Well, some languages treat types more seriously than C. In Pascal the
    type of your input would be 0..10 and all input values would be
    handled. Sure, when the domain is too complicated to express in a type
    then it could be a documented restriction. Still, it makes sense to
    signal an error if a value goes outside the handled range, so in a sense all
    values of the input type are handled: either you get a valid answer or
    a clear error.
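
    C has no subrange types, but a rough sketch of that "valid answer or
    clear error" behaviour (invented function name, not a prescription)
    might be:

    #include <stdio.h>
    #include <stdlib.h>

    /* Analogue of a Pascal 0..10 parameter: out-of-range input is
       reported as a clear error instead of producing a bogus result. */
    int small_int_sqrt_checked(int x)
    {
        if (x < 0 || x > 10) {
            fprintf(stderr, "small_int_sqrt_checked: %d out of range 0..10\n", x);
            abort();
        }
        if (x == 0) return 0;
        if (x < 4) return 1;
        if (x < 9) return 2;
        return 3;
    }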

    No, it does not make sense to do that. Just because the C language does
    not currently (maybe once C++ gets contracts, C will copy them) have a
    way to specify input sets other than by types, does not mean that
    functions in C always have a domain matching all possible combinations
    of bits in the underlying representation of the parameter's types.

    It might be a useful fault-finding aid temporarily to add error messages
    for inputs that are invalid but can physically be squeezed into the parameters. That won't stop people making incorrect declarations of the function and passing completely different parameter types to it, or
    finding other ways to break the requirements of the function.

    And in general there is no way to check the validity of the inputs - you usually have no choice but to trust the caller. It's only in simple
    cases, like the example above, that it would be feasible at all.


    There are, of course, situations where the person calling the function
    is likely to be incompetent, malicious, or both, and where there can be serious consequences for what you might prefer to consider as invalid
    input values.

    You apparently exclude the possibility of competent persons making a
    mistake. AFAIK industry statistics show that code developed by
    good developers using a rigorous process still contains a substantial
    number of bugs. So, it makes sense to have as much as possible
    verified mechanically. Which in common practice means depending on
    type checks. In less common practice you may have some theorem
    proving framework checking assertions about input arguments;
    then the assertions take the role of types.

    You have that for things like OS system calls - it's no
    different than dealing with user inputs or data from external sources.
    But you handle that by extending the function - increase the range of
    valid inputs and appropriate outputs. You no longer have a function
    that takes a number between 0 and 10 and returns the integer square root
    - you now have a function that takes a number between -(2^31) and
    (2^31 - 1) and returns the integer square root if the input is in the
    range 0 to 10 or halts the program with an error message for other
    inputs in the wider range. It's a different function, with a wider set
    of inputs - and again, it is specified to give particular results for particular inputs.

    It makes sense to extend a definition when such an extension converts a
    function whose use can be verified only by an informal process into
    one with formally verified use.

    I certainly would
    be quite unhappy with code above. It is possible that I would still
    use it as a compromise (say if it was desirable to have single
    prototype but handle points in spaces of various dimensions),
    but my first attempt would be something like:

    typedef struct {int p[2];} two_int;
    ....
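
    A small sketch of how that typedef might be used (invented example):
    the parameter type itself now says "exactly two ints", which a bare
    int* could not express.

    typedef struct { int p[2]; } two_int;

    static int dot2(two_int a, two_int b)
    {
        return a.p[0] * b.p[0] + a.p[1] * b.p[1];
    }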


    I think you'd quickly find that limiting and awkward in C (but it might
    be appropriate in other languages).

    Your snippet handled only two-element arrays. If that is the right assumption
    for the problem, then the typedef above expresses it in an IMO reasonable
    way. Yes, it is more characters to write than the usual C idioms.
    My main "trouble" is that usually I want to handle variable-sized
    arrays. In such a case, beside the pointer there would be a size argument.
    I would probably use a variably modified type in such a case.
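
    For example (a sketch only), a size parameter plus a variably modified
    parameter type lets the prototype itself express the relationship:

    /* The size travels with the pointer; the variably modified type
       documents (and, with some compilers/sanitizers, checks) the bound. */
    double sum(int n, const double a[n])
    {
        double s = 0.0;
        for (int i = 0; i < n; i++)
            s += a[i];
        return s;
    }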

    But don't misunderstand me - I am
    all in favour of finding ways in code that make input requirements
    clearer or enforceable within the language - never put anything in
    comments if you can do it in code. You could reasonably do this in C
    for the first example :


    // Do not use this directly
    extern int small_int_sqrt_implementation(int x);


    // Return the integer square root of numbers between 0 and 10
    static inline int small_int_sqrt(int x) {
        assert(x >= 0 && x <= 10);
        return small_int_sqrt_implementation(x);
    }

    Hmm, why extern implementation and static wrapper? I would do
    the opposite.

    A function should accept all input values - once you have made clear
    what the acceptable input values can be. A "default" case is just a
    short-cut for conveniently handling a wide range of valid input values - it is never a tool for handling /invalid/ input values.

    Well, a default can signal an error, which frequently is the right handling
    of invalid input values.


    Will that somehow fix the bug in the code that calls the function?

    It can be a useful debugging and testing aid, certainly, but it does not make the code "correct" or "safe" in any sense.

    There is a concept of "partial correctness": if the code finishes, it returns
    a correct value. A variation of this is: if the code finishes without
    signaling an error, it returns correct values. Such a condition may be
    much easier to verify than "full correctness" and in many cases
    is almost as useful. In particular, mathematicians are _very_
    unhappy when a program returns incorrect results. But they are used
    to programs which can not deliver results, either because of
    lack of resources or because the needed case was not implemented.

    When dealing with math formulas there are frequently various
    restrictions on parameters, like we can only divide by a nonzero
    quantity. By signaling an error when the restrictions are not
    satisfied we ensure that successful completion means that the
    restrictions were satisfied. Of course that alone does not
    mean that the result is correct, but correctness of the "general"
    case is usually _much_ easier to ensure. In other words,
    failing restrictions are a major source of errors, and signaling
    errors effectively eliminates that source.
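
    A small sketch of that style (invented name, not a prescription):
    successful completion implies the restriction held.

    #include <stdio.h>
    #include <stdlib.h>

    double safe_div(double num, double den)
    {
        if (den == 0.0) {
            fprintf(stderr, "safe_div: division by zero\n");
            exit(EXIT_FAILURE);
        }
        return num / den;
    }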

    In a world of perfect programmers, they would check restrictions
    before calling any function depending on them, or prove that the
    restrictions on the arguments to a function imply the correctness of
    calls made by the function. But the world is imperfect and in the
    real world extra runtime checks are quite useful.

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Sun Nov 10 18:16:22 2024
    On 09.11.2024 13:21, Bart wrote:
    On 09/11/2024 03:57, Janis Papanagnou wrote:

    [...] - But I'm not quite sure whether
    you're speaking here about your "C"-like language or some other
    language you implemented.

    I currently have three HLL implementations:

    * For my C subset language (originally I had some enhancements, now
    dropped)

    * For my 'M' systems language inspired by A68 syntax

    * For my 'Q' scripting language, with the same syntax, more or less

    The remark was about those last two.

    if cond then
        s1
        s2
    else
        s3
        s4
    end

    (Looks a lot more like a scripting language without semicolons.)

    This is what I've long suspected: that people associate clear, pseudo-code-like syntax with scripting languages.

    Most posts from you that I saw were addressing your "C"-like
    language, so I was confused about the actual focus of your post.

    It's helpful to give some hint if posted code is intended as
    pseudo-code. That wasn't clear to me. So thanks for clarifying.

    BTW, I don't consider scripting languages as "bad" - I'm actually
    doing quite a lot of scripting. - My comment doesn't contain any
    valuation and also didn't intend to insinuate one.

    Janis

    [...]


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Mon Nov 11 02:13:21 2024
    On 10/11/2024 05:22, Janis Papanagnou wrote:
    On 09.11.2024 12:06, David Brown wrote:
    On 09/11/2024 07:54, Janis Papanagnou wrote:

    Well, it's a software system design decision whether you want to
    make the caller test the preconditions for every function call,
    or let the callee take care of unexpected input, or both.


    Well, I suppose it is their decision - they can do the right thing, or
    the wrong thing, or both.

    I believe I explained in previous posts why it is the /caller's/
    responsibility to ensure pre-conditions are fulfilled, and why anything
    else is simply guaranteeing extra overheads while giving you less
    information for checking code correctness. But I realise that could
    have been lost in the mass of posts, so I can go through it again if you
    want.

    I haven't read all the posts, or rather, I just skipped most posts;
    it's too time consuming.

    I should probably have skipped /writing/ the posts - it was too time
    consuming :-)


    Since you explicitly elaborated - thanks! - I will read this one...

    [...]

    (On security boundaries, system call interfaces, etc., where the caller
    could be malicious or incompetent in a way that damages something other
    than their own program, you have to treat all inputs as dangerous and
    sanitize them, just like data from external sources. That's a different
    matter, and not the real focus here.)

    We had always followed the convention to avoid all undefined
    situations and always define every 'else' case by some sensible
    behavior, at least writing a notice into a log-file, but also
    to "fix" the runtime situation to be able to continue operating.
    (Note, I was mainly writing server-side software where this was
    especially important.)

    You can't "fix" bugs in the caller code by writing to a log file.
    Sometimes you can limit the damage, however.

    I spoke more generally of fixing situations (not only bugs).

    OK. It can certainly help with /finding/ bugs, that can then be fixed
    later.



    If you can't trust the people writing the calling code, then that should
    be the focus of your development process - find a way to be sure that
    the caller code is right. That's where you want your conventions, or to
    focus code reviews, training, automatic test systems - whatever is
    appropriate for your team and project. Make sure callers pass correct
    data to the function, and the function can do its job properly.

    Yes.


    Sometimes it makes sense to specify functions differently, and accept a
    wider input. Maybe instead of saying "this function will return the
    integer square root of numbers between 0 and 10", you say "this function
    will return the integer square root if given a number between 0 and 10,
    and will log a message and return -1 for other int values". Fair enough
    - now you've got a new function where it is very easy for the caller to
    ensure the preconditions are satisfied. But be very aware of the costs
    - you have now destroyed the "purity" of the function, and lost the key
    mathematical relation between the input and output. (You have also made
    everything much less efficient.)

    I disagree with the "much less" generalization. I also think that when
    weighing performance versus safety my preferences might be different;
    I'm only speaking about a "rule of thumb", not about the actual (IMO) necessity(!) to make these decisions depending on the project context.


    My preferences are very much weighted towards correctness, not
    efficiency. That includes /knowing/ that things are correct, not just
    passing some tests. And key to that is knowing facts about the code
    that can be used to reason about it. If you have a function that has
    clear and specific pre-conditions, you know what you have to do in order
    to use it correctly. It can then give clear and specific
    post-conditions, and you can use these to reason further about your
    code. On the other hand, if the function can, in practice, take any
    input then you have learned little. And if it can do all sorts of
    different things - log a message, return an arbitrary "default" value,
    etc., - then you have nothing to work with for proving or verifying the
    rest of your code.




    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Mon Nov 11 02:38:25 2024
    On 10/11/2024 07:57, Waldek Hebisch wrote:
    David Brown <david.brown@hesbynett.no> wrote:
    On 05/11/2024 20:39, Waldek Hebisch wrote:
    David Brown <david.brown@hesbynett.no> wrote:
    On 05/11/2024 13:42, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:


    It might be a useful fault-finding aid temporarily to add error messages
    for inputs that are invalid but can physically be squeezed into the
    parameters. That won't stop people making incorrect declarations of the
    function and passing completely different parameter types to it, or
    finding other ways to break the requirements of the function.

    And in general there is no way to check the validity of the inputs - you
    usually have no choice but to trust the caller. It's only in simple
    cases, like the example above, that it would be feasible at all.


    There are, of course, situations where the person calling the function
    is likely to be incompetent, malicious, or both, and where there can be
    serious consequences for what you might prefer to consider as invalid
    input values.

    You apparently exclude possibility of competent persons making a
    mistake.

    I didn't do so intentionally. I wasn't trying to be exhaustive here. I
    have several times mentioned that extra checks can be very helpful in fault-finding and debugging - good programmers also make mistakes and
    need to debug their code.

    AFAIK industry statistic shows that code develeped by
    good developers using rigorous process still contains substantial
    number of bugs. So, it makes sense to have as much as possible
    verified mechanically. Which in common practice means depending on
    type checks. In less common practice you may have some theorem
    proving framework checking assertions about input arguments,
    then the assertions take role of types.

    Type checks can be extremely helpful, and strong typing greatly reduces
    the errors in released code by catching them early (at compile time).
    And temporary run-time checks are also helpful during development or debugging.

    But extra run-time checks are costly (and I don't mean just in run-time performance, which is only an issue in a minority of situations). They
    mean more code - which means more scope for errors, and more code that
    must be checked and maintained. Usually this code can't be tested well
    in final products - precisely because it is there to handle a situation
    that never occurs.


    But don't misunderstand me - I am
    all in favour of finding ways in code that make input requirements
    clearer or enforceable within the language - never put anything in
    comments if you can do it in code. You could reasonably do this in C
    for the first example :


    // Do not use this directly
    extern int small_int_sqrt_implementation(int x);


    // Return the integer square root of numbers between 0 and 10
    static inline int small_int_sqrt(int x) {
    assert(x >= 0 && x <= 10);
    return small_int_sqrt_implementation(x);
    }

    Hmm, why extern implementation and static wrapper? I would do
    the opposite.

    I wrote it the way you might have it in a header - the run-time check disappears when it is disabled (or if the compiler can see that the
    check always passes). The real function implementation is hidden away
    in an implementation module.


    A function should accept all input values - once you have made clear
    what the acceptable input values can be. A "default" case is just a
    short-cut for conveniently handling a wide range of valid input values - it is never a tool for handling /invalid/ input values.

    Well, default can signal error which frequently is right handling
    of invalid input values.


    Will that somehow fix the bug in the code that calls the function?

    It can be a useful debugging and testing aid, certainly, but it does not
    make the code "correct" or "safe" in any sense.

    There is concept of "partial correctness": code if it finishes returns correct value. A variation of this is: code if it finishes without
    signaling error returns correct values. Such condition may be
    much easier to verify than "full correctness" and in many case
    is almost as useful. In particular, mathematicians are _very_
    unhappy when program return incorrect results. But they are used
    to programs which can not deliver results, either because of
    lack or resources or because needed case was not implemented.

    When dealing with math formulas there are frequently various
    restrictions on parameters, like we can only divide by nonzero
    quantity. By signaling error when restrictions are not
    satisfied we ensure that sucessful completition means that
    restrictions were satisfied. Of course that alone does not
    mean that result is correct, but correctness of "general"
    case is usually _much_ easier to ensure. In other words,
    failing restrictions are major source of errors, and signaling
    errors effectively eliminates it.


    Yes, out-of-band signalling in some way is a useful way to indicate a
    problem, and can allow parameter checking without losing the useful
    results of a function. This is the principle behind exceptions in many languages - then functions either return normally with correct results,
    or you have a clearly abnormal situation.
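
    In C the closest common idiom is an out-of-band status alongside the
    result - a sketch (names invented) using the earlier square-root
    example:

    #include <stdbool.h>

    /* The status is out of band, so a "false" return is clearly abnormal
       and cannot be confused with a legitimate square root. */
    bool small_int_sqrt2(int x, int *root)
    {
        if (x < 0 || x > 10)
            return false;           /* restriction violated: no result */
        int r = 0;
        while ((r + 1) * (r + 1) <= x)
            r++;
        *root = r;
        return true;
    }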

    In world of prefect programmers, they would check restrictions
    before calling any function depending on them, or prove that
    restrictions on arguments to a function imply correctness of
    calls made by the function. But world is imperfect and in
    real world extra runtime checks are quite useful.


    Runtime checks in a function can be useful if you know the calling code
    might not be perfect and the function is going to take responsibility
    for identifying that situation. Programmers will often be writing both
    the caller and callee code, and put temporary debugging and test checks wherever it is most convenient.

    But I think being too enthusiastic about putting checks in the wrong
    place - the callee function - can hide the real problems, or make the
    callee code writer less careful about getting their part of the code
    correct.

    Real-world programmers are imperfect - that does not mean their code has
    to be.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Tue Nov 12 06:09:08 2024
    David Brown <david.brown@hesbynett.no> wrote:
    On 10/11/2024 07:57, Waldek Hebisch wrote:
    David Brown <david.brown@hesbynett.no> wrote:
    On 05/11/2024 20:39, Waldek Hebisch wrote:
    David Brown <david.brown@hesbynett.no> wrote:
    On 05/11/2024 13:42, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:


    Type checks can be extremely helpful, and strong typing greatly reduces
    the errors in released code by catching them early (at compile time).
    And temporary run-time checks are also helpful during development or debugging.

    But extra run-time checks are costly (and I don't mean just in run-time performance, which is only an issue in a minority of situations). They
    mean more code - which means more scope for errors, and more code that
    must be checked and maintained. Usually this code can't be tested well
    in final products - precisely because it is there to handle a situation
    that never occurs.

    It depends. gcc used to have several accessor macros which could
    perform checks. They were turned off during "production use" (mainly
    because the checks increased runtime), but were "always" present in the
    source code. The "source cost" was moderate: the checking code took hundreds,
    maybe low thousands, of lines in the headers defining the macros.
    Actual use of the macros was the same as if the macros did no checking,
    so there was minimal increase in source complexity.
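
    A sketch of the general pattern (not gcc's actual macros; the names
    here are invented): the accessor checks its argument in a checking
    build and compiles to a plain access otherwise, so call sites look
    the same either way.

    #include <assert.h>

    struct node { int kind; int nkids; struct node *kids[4]; };

    #ifdef ENABLE_CHECKING
      #define NODE_KID(n, i) \
          (assert((n) != 0 && (i) >= 0 && (i) < (n)->nkids), (n)->kids[(i)])
    #else
      #define NODE_KID(n, i) ((n)->kids[(i)])
    #endif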

    Concerning testing, things exposed in an exported interface frequently
    can be tested with reasonable effort. The main issue is generating
    appropriate arguments and possibly replicating global state (but
    I normally have global state only when strictly necessary).

    A function should accept all input values - once you have made clear what the acceptable input values can be. A "default" case is just a short-cut for conveniently handling a wide range of valid input values - it is never a tool for handling /invalid/ input values.

    Well, default can signal error which frequently is right handling
    of invalid input values.


    Will that somehow fix the bug in the code that calls the function?

    It can be a useful debugging and testing aid, certainly, but it does not make the code "correct" or "safe" in any sense.

    There is concept of "partial correctness": code if it finishes returns
    correct value. A variation of this is: code if it finishes without
    signaling error returns correct values. Such condition may be
    much easier to verify than "full correctness" and in many case
    is almost as useful. In particular, mathematicians are _very_
    unhappy when program return incorrect results. But they are used
    to programs which can not deliver results, either because of
    lack or resources or because needed case was not implemented.

    When dealing with math formulas there are frequently various
    restrictions on parameters, like we can only divide by nonzero
    quantity. By signaling error when restrictions are not
    satisfied we ensure that sucessful completition means that
    restrictions were satisfied. Of course that alone does not
    mean that result is correct, but correctness of "general"
    case is usually _much_ easier to ensure. In other words,
    failing restrictions are major source of errors, and signaling
    errors effectively eliminates it.


    Yes, out-of-band signalling in some way is a useful way to indicate a problem, and can allow parameter checking without losing the useful
    results of a function. This is the principle behind exceptions in many languages - then functions either return normally with correct results,
    or you have a clearly abnormal situation.

    In world of prefect programmers, they would check restrictions
    before calling any function depending on them, or prove that
    restrictions on arguments to a function imply correctness of
    calls made by the function. But world is imperfect and in
    real world extra runtime checks are quite useful.


    Runtime checks in a function can be useful if you know the calling code might not be perfect and the function is going to take responsibility
    for identifying that situation. Programmers will often be writing both
    the caller and callee code, and put temporary debugging and test checks wherever it is most convenient.

    But I think being too enthusiastic about putting checks in the wrong
    place - the callee function - can hide the real problems, or make the
    callee code writer less careful about getting their part of the code correct.

    IME the opposite: not having checks in the called function simply delays
    the moment when the error is detected. Getting errors early helps focus on
    tricky problems or misconceptions. And it motivates programmers to
    be more careful.

    Concerning the correct place for checks: one could argue that a check
    should be close to the place where the result of the check matters, which
    frequently is in the called function. And frequently a check requires
    computation that is done by the called function as part of normal
    processing, but would be extra code in the caller.

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Tue Nov 12 08:24:02 2024
    On 10/11/2024 06:00, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    I assume that you consider the simple solution the 'bad' one?

    You wrote about _always_ requiring 'else' regardless if it is
    needed or not. Yes, I consider this bad.

    It is 'needed' by the language because of its rules. It might not be
    needed by a particular function because the author knows that all
    expected values of the 2**64 range of most scalar parameters have been covered.

    The language doesn't know.

    But the rule only applies to value-returning statements; you can choose
    not to use such statements, but more conventional ones like those in C.

    However, the language will still consider the last statement of a value-returning function to be such a statement. So either that one
    needs 'else' (perhaps in multiple branches), or you instead need a dummy 'return x' at the end of the function, one which is never executed.

    I don't think that's too onerous, and it is safer than somehow asking
    the language to disable the requirement. (How would that be done, by
    some special keyword? Then you'd just be writing that keyword instead of 'return'!)


    I'd would consider a much elaborate one putting the onus on external
    tools, and still having an unpredictable result to be the poor of the two.
    You want to create a language that is easily compilable, no matter how
    complex the input.

    Normally time spent _using_ compiler should be bigger than time
    spending writing compiler. If compiler gets enough use, it
    justifies some complexity.

    That doesn't add up: the more the compiler gets used, the slower it
    should get?!

    The sort of analysis you're implying I don't think belongs in the kind
    of compiler I prefer. Even if it did, it would be later on in the
    process than the point where the above restriction is checked, so
    wouldn't exist in one of my compilers anyway.

    I don't like open-ended tasks like this where compilation time could end
    up being anything. If you need to keep recompiling the same module, then
    you don't want to repeat that work each time.


    I am mainly concerned with clarity and correctness of source code.

    So am I. I try to keep my syntax clean and uncluttered.

    Dummy 'else' doing something may hide errors.

    So can 'unreachable'.

    Dummy 'else' signaling
    error means that something which could be compile time error is
    only detected at runtime.

    Compiler that detects most errors of this sort is IMO better than
    compiler which makes no effort to detect them. And clearly, once
    problem is formulated in sufficiently general way, it becomes
    unsolvable. So I do not expect general solution, but expect
    resonable effort.

    So how would David Brown's example work:

    int F(int n) {
        if (n==1) return 10;
        if (n==2) return 20;
    }

    /You/ know that values -2**31 to 0 and 3 to 2**31-1 are impossible; the compiler doesn't. It's likely to tell you that you may run into the end
    of the function.

    So what do you want the compiler to do here? If I try it:

    func F(int n)int =
        if n=1 then return 10 fi
        if n=2 then return 20 fi
    end

    It says 'else needed' (in that last statement). I can also shut it up
    like this:

    func F(int n)int =          # int is i64 here
        if n=1 then return 10 fi
        if n=2 then return 20 fi
        0
    end

    Since now that last statement is the '0' value (any int value will do).
    What should my compiler report instead? What analysis should it be
    doing? What would that save me from typing?


    normally you do not need very complex analysis:

    I don't want to do any analysis at all! I just want a mechanical
    translation as effortlessly as possible.

    I don't like unbalanced code within a function because it's wrong and
    can cause problems.

    Well, I demand more from compiler than you do...

    Perhaps you're happy for it to be bigger and slower too. Most of my
    projects build more or less instantly. Here 'ms' is a version that runs programs directly from source (the first 'ms' is 'ms.exe' and subsequent
    ones are 'ms.m' the lead module):

    c:\bx>ms ms ms ms ms ms ms ms ms ms ms ms ms ms ms ms hello
    Hello World! 21:00:45

    This builds and runs 15 successive generations of itself in memory
    before building and running hello.m; it took 1 second in all. (Now try
    that with gcc!)

    Here:

    c:\cx>tm \bx\mm -runp cc sql
    Compiling cc.m to <pcl>
    Compiling sql.c to sql.exe

    This compiles my C compiler from source but then it /interprets/ the IR produced. This interpreted compiler took 6 seconds to build the 250Kloc
    test file, and it's a very slow interpreter (it's used for testing and debugging).

    (gcc -O0 took a bit longer to build sql.c! About 7 seconds but it is
    using a heftier windows.h.)

    If I run the C compiler from source as native code (\bx\ms cc sql) then building the compiler *and* sql.c takes 1/3 of a second.

    You can't do this stuff with the compilers David Brown uses; I'm
    guessing you can't do it with your preferred ones either.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Tue Nov 12 20:43:54 2024
    On 11/11/2024 20:09, Waldek Hebisch wrote:
    David Brown <david.brown@hesbynett.no> wrote:


    Runtime checks in a function can be useful if you know the calling code
    might not be perfect and the function is going to take responsibility
    for identifying that situation. Programmers will often be writing both
    the caller and callee code, and put temporary debugging and test checks
    wherever it is most convenient.

    But I think being too enthusiastic about putting checks in the wrong
    place - the callee function - can hide the real problems, or make the
    callee code writer less careful about getting their part of the code
    correct.

    IME the opposite: not having checks in called function simply delays
    moment when error is detected. Getting errors early helps focus on
    tricky problems or misconceptions. And motivates programmers to
    be more careful

    I am always in favour of finding errors at the earliest opportunity -
    suitable compiler (and even editor/IDE) warnings, strong types, static assertions, etc., are vital tools. Having temporary extra checks at appropriate points in the code is often useful for debugging.

    I don't share your feeling about what motivates programmers to be more
    careful - however, I have no evidence to back that up.


    Concerning correct place for checks: one could argue that check
    should be close to place where the result of check matters, which
    frequently is in called function.

    No, there I disagree. The correct place for the checks should be close
    to where the error is, and that is in the /calling/ code. If the called function is correctly written, reviewed, tested, documented and
    considered "finished", why would it be appropriate to add extra code to
    that in order to test and debug some completely different part of the code?

    The place where the result of the check /really/ matters, is the calling
    code. And that is also the place where you can most easily find the
    error, since the error is in the calling code, not the called function.
    And it is most likely to be the code that you are working on at the time
    - the called function is already written and tested.

    And frequently check requires
    computation that is done by called function as part of normal
    processing, but would be extra code in the caller.


    It is more likely to be the opposite in practice.

    And for much of the time, the called function has no real practical way
    to check the parameters anyway. A function that takes a pointer
    parameter - not an uncommon situation - generally has no way to check
    the validity of the pointer. It can't check that the pointer actually
    points to useful source data or an appropriate place to store data.

    All it can do is check for a null pointer, which is usually a fairly
    useless thing to do (unless the specifications for the function make the pointer optional). After all, on most (but not all) systems you already
    have a "free" null pointer check - if the caller code has screwed up and passed a null pointer when it should not have done, the program will
    quickly crash when the pointer is used for access. Many compilers
    provide a way to annotate function declarations to say that a pointer
    must not be null, and can then spot at least some such errors at compile
    time. And of course the calling code will very often be passing the
    address of an object in the call - since that can't be null, a check in
    the function is pointless.
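
    For example, gcc and clang accept a declaration attribute for this (a
    sketch; the attribute is a real extension, the function is invented):

    /* The compiler can warn at compile time if a caller passes a literal
       NULL for a parameter declared nonnull; it is not a run-time check. */
    __attribute__((nonnull(1, 2)))
    void copy_point(double *dst, const double *src);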

    Once you get to more complex data structures, the possibility for the
    caller to check the parameters gets steadily less realistic.

    So now your practice of having functions "always" check their parameters leaves the people writing calling code with a false sense of security - usually you /don't/ check the parameters, you only ever do simple checks
    that the caller could (and should!) do if they were realistic. You've
    got the maintenance and cognitive overload of extra source code for your various "asserts" and other checks, regardless of any run-time costs
    (which are often irrelevant, but occasionally very important).


    You will note that much of this - for both sides of the argument - uses
    words like "often", "generally" or "frequently". It is important to appreciate that programming spans a very wide range of situations, and I
    don't want to be too categorical about things. I have already said
    there are situations when parameter checking in called functions can
    make sense. I've no doubt that for some people and some types of
    coding, such cases are a lot more common than what I see in my coding.

    Note also that when you can use tools to automate checks, such as
    "sanitize" options in compilers or different languages that have more
    in-built checks, the balance differs. You will generally pay a run-time
    cost for those checks, but you don't have the same kind of source-level
    costs - your code is still clean, clear, and amenable to correctness
    checking, without hiding the functionality of the code in a mass of unnecessary explicit checks. This is particularly good for debugging,
    and the run-time costs might not be important. (But if run-time costs
    are not important, there's a good chance that C is not the best language
    to be using in the first place.)






    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Sat Nov 16 05:50:43 2024
    David Brown <david.brown@hesbynett.no> wrote:
    On 11/11/2024 20:09, Waldek Hebisch wrote:
    David Brown <david.brown@hesbynett.no> wrote:

    Concerning correct place for checks: one could argue that check
    should be close to place where the result of check matters, which
    frequently is in called function.

    No, there I disagree. The correct place for the checks should be close
    to where the error is, and that is in the /calling/ code. If the called function is correctly written, reviewed, tested, documented and
    considered "finished", why would it be appropriate to add extra code to
    that in order to test and debug some completely different part of the code?

    The place where the result of the check /really/ matters, is the calling code. And that is also the place where you can most easily find the
    error, since the error is in the calling code, not the called function.
    And it is most likely to be the code that you are working on at the time
    - the called function is already written and tested.

    And frequently check requires
    computation that is done by called function as part of normal
    processing, but would be extra code in the caller.


    It is more likely to be the opposite in practice.

    And for much of the time, the called function has no real practical way
    to check the parameters anyway. A function that takes a pointer
    parameter - not an uncommon situation - generally has no way to check
    the validity of the pointer. It can't check that the pointer actually points to useful source data or an appropriate place to store data.

    All it can do is check for a null pointer, which is usually a fairly
    useless thing to do (unless the specifications for the function make the pointer optional). After all, on most (but not all) systems you already have a "free" null pointer check - if the caller code has screwed up and passed a null pointer when it should not have done, the program will
    quickly crash when the pointer is used for access. Many compilers
    provide a way to annotate function declarations to say that a pointer
    must not be null, and can then spot at least some such errors at compile time. And of course the calling code will very often be passing the
    address of an object in the call - since that can't be null, a check in
    the function is pointless.

    Well, in a sense pointers are easy: if you do not play nasty tricks
    with casts then type checks do a significant part of the checking. Of
    course, a pointer may be uninitialized (but compiler warnings help a lot
    here), memory may be overwritten, etc. But overwritten memory is
    rather special: if you checked that the content of memory is correct,
    but it is overwritten after the check, then the earlier check does not
    help. Anyway, the main point is ensuring that the pointed-to data satisfies
    the expected conditions.

    Once you get to more complex data structures, the possibility for the
    caller to check the parameters gets steadily less realistic.

    So now your practice of having functions "always" check their parameters leaves the people writing calling code with a false sense of security - usually you /don't/ check the parameters, you only ever do simple checks that that called could (and should!) do if they were realistic. You've
    got the maintenance and cognitive overload of extra source code for your various "asserts" and other check, regardless of any run-time costs
    (which are often irrelevant, but occasionally very important).


    You will note that much of this - for both sides of the argument - uses words like "often", "generally" or "frequently". It is important to appreciate that programming spans a very wide range of situations, and I don't want to be too categorical about things. I have already said
    there are situations when parameter checking in called functions can
    make sense. I've no doubt that for some people and some types of
    coding, such cases are a lot more common than what I see in my coding.

    Note also that when you can use tools to automate checks, such as
    "sanitize" options in compilers or different languages that have more in-built checks, the balance differs. You will generally pay a run-time cost for those checks, but you don't have the same kind of source-level costs - your code is still clean, clear, and amenable to correctness checking, without hiding the functionality of the code in a mass of unnecessary explicit checks. This is particularly good for debugging,
    and the run-time costs might not be important. (But if run-time costs
    are not important, there's a good chance that C is not the best language
    to be using in the first place.)

    Our experience differs. As a silly example consider a parser
    which produces a parse tree. The caller is supposed to pass a syntactically
    correct string as an argument. However, checking syntactic correctness requires almost the same effort as producing the parse tree, so it
    is usual that the parser both checks correctness and produces the result.
    I have computations that are quite different from parsing but
    in some cases share the same characteristic: checking the correctness of
    the arguments requires complex computation similar to producing the
    actual result. More frequently, the called routine can check various
    invariants which with high probability can detect errors. Doing
    the same check in the caller is impractical.

    Most of my coding is in languages other than C. One of the languages
    that I use essentially forces the programmer to insert checks in
    some places. For example, unions are tagged and one can use a
    specific variant only after checking that this is the current
    variant. Similarly, fall-through control structures may lead
    to a type error at compile time. But signalling an error is considered
    type safe. So code which checks for an unhandled case and signals an
    error is accepted as type correct. Unhandled cases frequently
    lead to type errors. There is some overhead, but IMO it is acceptable.
    The language in question is garbage collected, so many memory-related
    problems go away.
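
    A rough C sketch of that discipline (C itself does not enforce it; the
    names are invented): the tag is checked before a variant is used, and
    the default case signals an error for an unhandled tag instead of
    silently reading the wrong member.

    #include <stdio.h>
    #include <stdlib.h>

    enum tag { T_INT, T_REAL };
    struct value { enum tag tag; union { int i; double r; } u; };

    double as_real(const struct value *v)
    {
        switch (v->tag) {
        case T_INT:  return (double)v->u.i;
        case T_REAL: return v->u.r;
        default:
            fprintf(stderr, "as_real: unhandled tag %d\n", (int)v->tag);
            abort();
        }
    }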

    Frequently checks come as a natural byproduct of computations. When
    handling tree-like structures in C, IME the simplest code
    is recursive with the base case being the null pointer. When the base
    case should not occur we get a check instead of a computation.
    Skipping such checks also puts a cognitive load on the reader:
    the normal pattern has a corresponding case, so the reader does not know
    if the case was omitted by accident or cannot occur. A comment
    may clarify this, but an error check is equally clear.
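
    For example (a sketch with invented names): the natural recursive
    shape has a null-pointer base case, and when that case "cannot happen"
    it becomes an error check rather than a computation.

    #include <stdio.h>
    #include <stdlib.h>

    struct tree { int value; struct tree *left, *right; };

    /* Natural pattern: null is an ordinary base case. */
    int tree_sum(const struct tree *t)
    {
        if (t == 0)
            return 0;
        return t->value + tree_sum(t->left) + tree_sum(t->right);
    }

    /* When null "cannot happen", the base case becomes an error check. */
    int tree_depth(const struct tree *t)
    {
        if (t == 0) {
            fprintf(stderr, "tree_depth: unexpected null node\n");
            abort();
        }
        int dl = t->left  ? tree_depth(t->left)  : 0;
        int dr = t->right ? tree_depth(t->right) : 0;
        return 1 + (dl > dr ? dl : dr);
    }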

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Stefan Ram@3:633/280.2 to All on Sat Nov 16 20:42:49 2024
    Dan Purgert <dan@djph.net> wrote or quoted:
    if (n==0) { printf ("n: %u\n",n); n++;}
    if (n==1) { printf ("n: %u\n",n); n++;}
    if (n==2) { printf ("n: %u\n",n); n++;}
    if (n==3) { printf ("n: %u\n",n); n++;}
    if (n==4) { printf ("n: %u\n",n); n++;}
    printf ("all if completed, n=%u\n",n);

    My bad if the following instruction structure's already been hashed
    out in this thread, but I haven't been following the whole convo!

    In my C 101 classes, after we've covered "if" and "else",
    I always throw this program up on the screen and hit the newbies
    with this curveball: "What's this bad boy going to spit out?".

    Well, it's a blue moon when someone nails it. Most of them fall
    for my little gotcha hook, line, and sinker.

    #include <stdio.h>

    const char * english( int const n )
    { const char * result;
    if( n == 0 )result = "zero";
    if( n == 1 )result = "one";
    if( n == 2 )result = "two";
    if( n == 3 )result = "three";
    else result = "four";
    return result; }

    void print_english( int const n )
    { printf( "%s\n", english( n )); }

    int main( void )
    { print_english( 0 );
    print_english( 1 );
    print_english( 2 );
    print_english( 3 );
    print_english( 4 ); }



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: Stefan Ram (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Sun Nov 17 01:51:34 2024
    On 16/11/2024 09:42, Stefan Ram wrote:
    Dan Purgert <dan@djph.net> wrote or quoted:
    if (n==0) { printf ("n: %u\n",n); n++;}
    if (n==1) { printf ("n: %u\n",n); n++;}
    if (n==2) { printf ("n: %u\n",n); n++;}
    if (n==3) { printf ("n: %u\n",n); n++;}
    if (n==4) { printf ("n: %u\n",n); n++;}
    printf ("all if completed, n=%u\n",n);

    My bad if the following instruction structure's already been hashed
    out in this thread, but I haven't been following the whole convo!

    In my C 101 classes, after we've covered "if" and "else",
    I always throw this program up on the screen and hit the newbies
    with this curveball: "What's this bad boy going to spit out?".

    FGS please turn the 'hip lingo' generator down a few notches!



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From James Kuyper@3:633/280.2 to All on Sun Nov 17 02:14:07 2024
    On 11/16/24 04:42, Stefan Ram wrote:
    ....
    My bad if the following instruction structure's already been hashed
    out in this thread, but I haven't been following the whole convo!

    In my C 101 classes, after we've covered "if" and "else",
    I always throw this program up on the screen and hit the newbies
    with this curveball: "What's this bad boy going to spit out?".

    Well, it's a blue moon when someone nails it. Most of them fall
    for my little gotcha hook, line, and sinker.

    #include <stdio.h>

    const char * english( int const n )
    { const char * result;
    if( n == 0 )result = "zero";
    if( n == 1 )result = "one";
    if( n == 2 )result = "two";
    if( n == 3 )result = "three";
    else result = "four";
    return result; }

    void print_english( int const n )
    { printf( "%s\n", english( n )); }

    int main( void )
    { print_english( 0 );
    print_english( 1 );
    print_english( 2 );
    print_english( 3 );
    print_english( 4 ); }

    Nice. It did take a little while for me to figure out what was wrong,
    but since I knew that something was wrong, I did eventually find it -
    without first running the program.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Lew Pitcher@3:633/280.2 to All on Sun Nov 17 02:37:24 2024
    On Sat, 16 Nov 2024 09:42:49 +0000, Stefan Ram wrote:

    Dan Purgert <dan@djph.net> wrote or quoted:
    if (n==0) { printf ("n: %u\n",n); n++;}
    if (n==1) { printf ("n: %u\n",n); n++;}
    if (n==2) { printf ("n: %u\n",n); n++;}
    if (n==3) { printf ("n: %u\n",n); n++;}
    if (n==4) { printf ("n: %u\n",n); n++;}
    printf ("all if completed, n=%u\n",n);

    My bad if the following instruction structure's already been hashed
    out in this thread, but I haven't been following the whole convo!

    In my C 101 classes, after we've covered "if" and "else",
    I always throw this program up on the screen and hit the newbies
    with this curveball: "What's this bad boy going to spit out?".

    Well, it's a blue moon when someone nails it. Most of them fall
    for my little gotcha hook, line, and sinker.

    #include <stdio.h>

    const char * english( int const n )
    { const char * result;
    if( n == 0 )result = "zero";
    if( n == 1 )result = "one";
    if( n == 2 )result = "two";
    if( n == 3 )result = "three";
    else result = "four";
    return result; }

    void print_english( int const n )
    { printf( "%s\n", english( n )); }

    int main( void )
    { print_english( 0 );
    print_english( 1 );
    print_english( 2 );
    print_english( 3 );
    print_english( 4 ); }

    If I read your code correctly, you have actually included not one,
    but TWO curveballs. Well done!

    --
    Lew Pitcher
    "In Skills We Trust"

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Sun Nov 17 03:29:17 2024
    On 15/11/2024 19:50, Waldek Hebisch wrote:
    David Brown <david.brown@hesbynett.no> wrote:
    On 11/11/2024 20:09, Waldek Hebisch wrote:
    David Brown <david.brown@hesbynett.no> wrote:

    Concerning correct place for checks: one could argue that check
    should be close to place where the result of check matters, which
    frequently is in called function.

    No, there I disagree. The correct place for the checks should be close
    to where the error is, and that is in the /calling/ code. If the called
    function is correctly written, reviewed, tested, documented and
    considered "finished", why would it be appropriate to add extra code to
    that in order to test and debug some completely different part of the code?
    The place where the result of the check /really/ matters, is the calling
    code. And that is also the place where you can most easily find the
    error, since the error is in the calling code, not the called function.
    And it is most likely to be the code that you are working on at the time
    - the called function is already written and tested.

    And frequently check requires
    computation that is done by called function as part of normal
    processing, but would be extra code in the caller.


    It is more likely to be the opposite in practice.

    And for much of the time, the called function has no real practical way
    to check the parameters anyway. A function that takes a pointer
    parameter - not an uncommon situation - generally has no way to check
    the validity of the pointer. It can't check that the pointer actually
    points to useful source data or an appropriate place to store data.

    All it can do is check for a null pointer, which is usually a fairly
    useless thing to do (unless the specifications for the function make the
    pointer optional). After all, on most (but not all) systems you already
    have a "free" null pointer check - if the caller code has screwed up and
    passed a null pointer when it should not have done, the program will
    quickly crash when the pointer is used for access. Many compilers
    provide a way to annotate function declarations to say that a pointer
    must not be null, and can then spot at least some such errors at compile
    time. And of course the calling code will very often be passing the
    address of an object in the call - since that can't be null, a check in
    the function is pointless.

    Well, in a sense pointers are easy: if you do not play nasty tricks
    with casts then type checks do significant part of checking. Of
    course, pointer may be uninitialized (but compiler warnings help a lot
    here), memory may be overwritten, etc. But overwritten memory is
    rather special, if you checked that content of memory is correct,
    but it is overwritten after the check, then earlier check does not
    help. Anyway, main point is ensuring that pointed to data satisfies
    expected conditions.


    That does not match reality. Pointers are far and away the biggest
    source of errors in C code. Use after free, buffer overflows, mixups of
    who "owns" the pointer - the scope for errors is boundless. You are
    correct that type systems can catch many potential types of errors - unfortunately, people /do/ play nasty tricks with type checks.
    Conversions of pointer types are found all over the place in C
    programming, especially conversions back and forth with void* pointers.

    All this means that invalid pointer parameters are very much a real
    issue - but are typically impossible to check in the called function.

    The way you avoid getting errors in your pointers is being careful about having the right data in the first place, so you only call functions
    with valid parameters. You do this by having careful control about the ownership and lifetime of pointers, and what they point to, keeping conventions in the names of your pointers and functions to indicate who
    owns what, and so on. And you use sanitizers and similar tools during
    testing and debugging to distinguish between tests that worked by luck,
    and ones that worked reliably. (And of course you may consider other languages than C that help you express your requirements in a clearer
    manner or with better automatic checking.)
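
    For example, a use-after-free that happens to "work" in a plain build
    is reported immediately under AddressSanitizer - a minimal sketch,
    with the build command shown as a comment (assumes GCC or Clang built
    with ASan support):

        /* Build and run with e.g.: gcc -g -fsanitize=address uaf.c && ./a.out */
        #include <stdlib.h>

        int main(void)
        {
            int *p = malloc(sizeof *p);
            *p = 42;
            free(p);
            return *p;   /* use after free: ASan aborts with a report here */
        }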

    Put the same effort and due diligence into the rest of your code, and
    suddenly you find your checks for other kinds of parameters in functions
    are irrelevant as you are now making sure you call functions with
    appropriate valid inputs.


    Once you get to more complex data structures, the possibility for the
    caller to check the parameters gets steadily less realistic.

    So now your practice of having functions "always" check their parameters
    leaves the people writing calling code with a false sense of security -
    usually you /don't/ check the parameters, you only ever do the simple
    checks that the caller could (and should!) do where such checks are
    realistic. You've got the maintenance and cognitive overhead of extra
    source code for your various "asserts" and other checks, regardless of
    any run-time costs (which are often irrelevant, but occasionally very
    important).


    You will note that much of this - for both sides of the argument - uses
    words like "often", "generally" or "frequently". It is important to
    appreciate that programming spans a very wide range of situations, and I
    don't want to be too categorical about things. I have already said
    there are situations when parameter checking in called functions can
    make sense. I've no doubt that for some people and some types of
    coding, such cases are a lot more common than what I see in my coding.

    Note also that when you can use tools to automate checks, such as
    "sanitize" options in compilers or different languages that have more
    in-built checks, the balance differs. You will generally pay a run-time
    cost for those checks, but you don't have the same kind of source-level
    costs - your code is still clean, clear, and amenable to correctness
    checking, without hiding the functionality of the code in a mass of
    unnecessary explicit checks. This is particularly good for debugging,
    and the run-time costs might not be important. (But if run-time costs
    are not important, there's a good chance that C is not the best language
    to be using in the first place.)

    Our experience differs. As a silly example, consider a parser which
    produces a parse tree. The caller is supposed to pass a syntactically
    correct string as an argument. However, checking syntactic correctness
    requires almost the same effort as producing the parse tree, so it is
    usual that the parser both checks correctness and produces the result.

    The trick here is to avoid producing a syntactically invalid string in
    the first place. Solve the issue at the point where there is a mistake
    in the code!

    (If you are talking about a string that comes from outside the code in
    some way, then of course you need to check it - and if that is most conveniently done during the rest of parsing, then that is fair enough.)
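
    As a toy illustration of the parser point above (a sketch with
    invented names, not anyone's quoted code): the syntax check and the
    tree construction are the same traversal, so a separate up-front
    validity check by the caller would simply duplicate the parser's work.

        #include <stdio.h>
        #include <stdlib.h>
        #include <ctype.h>

        typedef struct Node { int value; struct Node *left, *right; } Node;

        static Node *new_node(int v, Node *l, Node *r)
        {
            Node *n = malloc(sizeof *n);   /* allocation failure ignored in sketch */
            n->value = v; n->left = l; n->right = r;
            return n;
        }

        /* Grammar: expr := digit { '+' digit } .  Returns NULL on a syntax
           error, otherwise the parse tree - the check *is* the parse.
           (Nodes leaked on the error path are ignored in this sketch.) */
        static Node *parse_expr(const char **s)
        {
            if (!isdigit((unsigned char)**s)) return NULL;
            Node *left = new_node(*(*s)++ - '0', NULL, NULL);
            while (**s == '+') {
                ++*s;
                if (!isdigit((unsigned char)**s)) return NULL;  /* error found mid-parse */
                Node *right = new_node(*(*s)++ - '0', NULL, NULL);
                left = new_node('+', left, right);
            }
            return **s == '\0' ? left : NULL;
        }

        int main(void)
        {
            const char *a = "1+2+3", *b = "1++3";
            const char *pa = a, *pb = b;
            printf("%s -> %s\n", a, parse_expr(&pa) ? "tree built" : "syntax error");
            printf("%s -> %s\n", b, parse_expr(&pb) ? "tree built" : "syntax error");
        }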

    I have computations that are quite different from parsing but
    in some cases share the same characteristic: checking the correctness
    of the arguments requires complex computation similar to producing the
    actual result. More frequently, the called routine can check various
    invariants which with high probability detect errors. Doing
    the same check in the caller is impractical.

    I think you are misunderstanding me - maybe I have been unclear. I am
    saying that it is the /caller's/ responsibility to make sure that the parameters it passes are correct, not the /callee's/ responsibility.
    That does not mean that the caller has to add checks to get the
    parameters right - it means the caller has to use correct parameters.


    Think of this like walking near a cliff-edge. Checking parameters
    before the call is like having a barrier at the edge of the cliff. My recommendation is that you know where the cliff edge is, and don't walk
    there. Checking parameters in the called function is like having a
    crash mat at the bottom of the cliff for people who blindly walk off it.


    Most of my coding is in languages other than C. One of the languages
    that I use essentially forces the programmer to insert checks in
    some places. For example, unions are tagged and one can use a
    specific variant only after checking that it is the current
    variant. Similarly, fall-through control structures may lead
    to a type error at compile time. But signalling an error is considered
    type safe, so code which checks for an unhandled case and signals an
    error is accepted as type correct. Unhandled cases frequently
    lead to type errors. There is some overhead, but IMO it is acceptable.
    The language in question is garbage collected, so many memory-related
    problems go away.
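
    That discipline can be imitated in C, although nothing in the language
    enforces it - a sketch (invented names, assuming nothing beyond
    standard C): the tag is checked before a variant is used, and the
    "impossible" case signals an error instead of silently falling through.

        #include <stdio.h>
        #include <stdlib.h>

        typedef enum { SHAPE_CIRCLE, SHAPE_RECT } ShapeTag;

        typedef struct {
            ShapeTag tag;
            union {
                struct { double radius; } circle;
                struct { double w, h; } rect;
            } u;
        } Shape;

        static void fatal(const char *msg)
        {
            fprintf(stderr, "fatal: %s\n", msg);
            exit(EXIT_FAILURE);
        }

        double area(const Shape *s)
        {
            switch (s->tag) {            /* check the tag before touching a variant */
            case SHAPE_CIRCLE: {
                double r = s->u.circle.radius;
                return 3.14159265358979 * r * r;
            }
            case SHAPE_RECT:
                return s->u.rect.w * s->u.rect.h;
            }
            fatal("area: unhandled shape tag");  /* signalling keeps this "type correct" */
            return 0.0;                          /* not reached */
        }

        int main(void)
        {
            Shape c = { SHAPE_CIRCLE, { .circle = { 1.0 } } };
            printf("area = %f\n", area(&c));
        }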

    Frequently, checks come as a natural byproduct of the computation. When
    handling tree-like structures in C, IME the simplest code is usually
    recursive, with the base case being the null pointer. When the base
    case should not occur, we get a check instead of a computation.
    Skipping such checks also puts cognitive load on the reader: the
    normal pattern has a corresponding case, so the reader does not know
    whether the case was omitted by accident or cannot occur. A comment
    may clarify this, but an error check is equally clear.
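
    To make the tree example concrete (a sketch, not code quoted from
    anyone): in the first function the null pointer is an ordinary base
    case; in the second the caller promises a non-empty tree, so the slot
    the recursive pattern leaves for that case is filled by a check rather
    than a computation.

        #include <assert.h>
        #include <stddef.h>

        typedef struct Tree { int value; struct Tree *left, *right; } Tree;

        /* Null is a normal base case: an empty subtree contributes 0. */
        int sum(const Tree *t)
        {
            if (t == NULL) return 0;
            return t->value + sum(t->left) + sum(t->right);
        }

        /* Here the caller promises a non-empty tree, so the base case
           "cannot happen" - the natural recursive pattern leaves a slot
           that becomes a check instead of a computation. */
        int min_depth(const Tree *t)
        {
            assert(t != NULL && "min_depth: caller passed an empty tree");
            if (t->left == NULL && t->right == NULL) return 1;
            if (t->left == NULL)  return 1 + min_depth(t->right);
            if (t->right == NULL) return 1 + min_depth(t->left);
            int l = min_depth(t->left), r = min_depth(t->right);
            return 1 + (l < r ? l : r);
        }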



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Sun Nov 17 03:38:37 2024
    On 16/11/2024 15:51, Bart wrote:
    On 16/11/2024 09:42, Stefan Ram wrote:
    Dan Purgert <dan@djph.net> wrote or quoted:
    if (n==0) { printf ("n: %u\n",n); n++;}
    if (n==1) { printf ("n: %u\n",n); n++;}
    if (n==2) { printf ("n: %u\n",n); n++;}
    if (n==3) { printf ("n: %u\n",n); n++;}
    if (n==4) { printf ("n: %u\n",n); n++;}
    printf ("all if completed, n=%u\n",n);

    My bad if the following instruction structure's already been hashed
    out in this thread, but I haven't been following the whole convo!

    In my C 101 classes, after we've covered "if" and "else",
    I always throw this program up on the screen and hit the newbies
    with this curveball: "What's this bad boy going to spit out?".

    FGS please turn the 'hip lingo' generator down a few notches!



    I wonder what happened to Stefan. He used to make perfectly good posts.
    Then he disappeared for a bit, and came back with this new "style".

    Given that this "new" Stefan can write posts with interesting C content,
    such as this one, and has retained his ugly coding layout and
    non-standard Usenet format, I have to assume it's still the same person
    behind the posts.

    Is he using some "translate to hip lingo" tool? Or has he had a stroke
    or brain tumour that has rendered him incapable of writing text like an
    adult while still being able to write C code?



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Mon Nov 18 00:51:26 2024
    Lew Pitcher <lew.pitcher@digitalfreehold.ca> writes:

    On Sat, 16 Nov 2024 09:42:49 +0000, Stefan Ram wrote:

    Dan Purgert <dan@djph.net> wrote or quoted:

    if (n==0) { printf ("n: %u\n",n); n++;}
    if (n==1) { printf ("n: %u\n",n); n++;}
    if (n==2) { printf ("n: %u\n",n); n++;}
    if (n==3) { printf ("n: %u\n",n); n++;}
    if (n==4) { printf ("n: %u\n",n); n++;}
    printf ("all if completed, n=%u\n",n);

    My bad if the following instruction structure's already been hashed
    out in this thread, but I haven't been following the whole convo!

    In my C 101 classes, after we've covered "if" and "else",
    I always throw this program up on the screen and hit the newbies
    with this curveball: "What's this bad boy going to spit out?".

    Well, it's a blue moon when someone nails it. Most of them fall
    for my little gotcha hook, line, and sinker.

    #include <stdio.h>

    const char * english( int const n )
    { const char * result;
    if( n == 0 )result = "zero";
    if( n == 1 )result = "one";
    if( n == 2 )result = "two";
    if( n == 3 )result = "three";
    else result = "four";
    return result; }

    void print_english( int const n )
    { printf( "%s\n", english( n )); }

    int main( void )
    { print_english( 0 );
    print_english( 1 );
    print_english( 2 );
    print_english( 3 );
    print_english( 4 ); }

    If I read your code correctly, you have actually included not one,
    but TWO curveballs. Well done!

    What's the second curveball?

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Tue Nov 19 12:53:05 2024
    Bart <bc@freeuk.com> wrote:
    On 10/11/2024 06:00, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    I would consider a much more elaborate one, putting the onus on external
    tools and still having an unpredictable result, to be the poorer of the two.
    You want to create a language that is easily compilable, no matter how
    complex the input.

    Normally time spent _using_ compiler should be bigger than time
    spending writing compiler. If compiler gets enough use, it
    justifies some complexity.

    That doesn't add up: the more the compiler gets used, the slower it
    should get?!

    More complicated does not mean slower. Binary search or hash tables
    are more complicated than linear search, but for larger data may
    be much faster. Similarly, a compiler may be simplified by using
    simpler but slower methods, and a more complicated compiler may use
    faster methods. This is particularly relevant here: a simple compiler
    may keep a list of cases or ranges and linearly scan those. A more
    advanced one may use, say, a tree structure.
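
    A sketch of that difference (invented data structures, not any
    particular compiler's internals): both routines map a selector value
    to a branch, but the second assumes the case ranges are kept sorted
    and disjoint so they can be binary-searched.

        #include <stddef.h>

        typedef struct { long lo, hi; int target; } CaseRange;  /* one 'case lo..hi' */

        /* Simple compiler: scan the list of ranges in order. O(n) per lookup. */
        int pick_branch_linear(const CaseRange *r, size_t n, long x)
        {
            for (size_t i = 0; i < n; i++)
                if (x >= r[i].lo && x <= r[i].hi) return r[i].target;
            return -1;                          /* no case matched: 'default' */
        }

        /* More elaborate compiler: keep the ranges sorted and disjoint,
           then binary-search.  O(log n) per lookup - more code, but
           faster on large switches. */
        int pick_branch_bsearch(const CaseRange *r, size_t n, long x)
        {
            size_t lo = 0, hi = n;
            while (lo < hi) {
                size_t mid = lo + (hi - lo) / 2;
                if (x < r[mid].lo)      hi = mid;
                else if (x > r[mid].hi) lo = mid + 1;
                else                    return r[mid].target;
            }
            return -1;
        }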

    More generally, I want to minimize the time spent by the programmer,
    that is, the _sum over all iterations leading to a correct program_ of
    compile time and "think time". A compiler that compiles more slowly,
    but allows fewer iterations due to better diagnostics, may win.
    Also, humans perceive a 0.1s delay almost like no delay at all.
    So it does not matter whether a single compilation step takes 0.1s or
    0.1ms. Modern computers can do a lot of work in 0.1s.

    The sort of analysis you're implying I don't think belongs in the kind
    of compiler I prefer. Even if it did, it would be later on in the
    process than the point where the above restriction is checked, so
    wouldn't exist in one of my compilers anyway.

    Sure, you design your compiler as you like.

    I don't like open-ended tasks like this where compilation time could end
    up being anything. If you need to keep recompiling the same module, then
    you don't want to repeat that work each time.

    Yes. This may lead to some complexity. A simple approach is to
    avoid obviously useless recompilation ('make' is doing this).
    A more complicated approach may keep some intermediate data and
    try to "validate" it first. If the previous analysis is valid,
    then it can be reused. If something significant changes, then
    it needs to be re-done. But many changes have only a very local
    effect, so at least theoretically re-using analyses could
    save substantial time.

    Concerning open-ended, my attitude is that the compiler should make
    an effort which is open-ended in the sense that when a new method is
    discovered the compiler may be extended to do more work.
    OTOH in a "single true compiler" world, the compiler may say "this is
    too difficult, giving up". Of course, when trying something
    very hard the compiler is likely to run out of memory or the user will
    stop it. But the compiler may give up earlier. Of course, this
    is unacceptable for a standardized language, when people move
    programs between different compilers. If a compiler can legally
    reject a program because of its limitations, and is doing this
    with significant probability, then portability between compilers
    is severely limited. But if there is a way to disable the extra
    checks, then this may work. This is one of the reasons why
    'gcc' has so many options: users that want it can get stronger
    checking, but if they want, 'gcc' will accept lousy code
    too.

    I am mainly concerned with clarity and correctness of source code.

    So am I. I try to keep my syntax clean and uncluttered.

    Dummy 'else' doing something may hide errors.

    So can 'unreachable'.

    A dummy 'else' signalling an
    error means that something which could be a compile-time error is
    only detected at runtime.

    A compiler that detects most errors of this sort is IMO better than
    a compiler which makes no effort to detect them. And clearly, once
    the problem is formulated in a sufficiently general way, it becomes
    unsolvable. So I do not expect a general solution, but I do expect a
    reasonable effort.

    So how would David Brown's example work:

    int F(int n) {
    if (n==1) return 10;
    if (n==2) return 20;
    }

    /You/ know that values -2**31 to 0 and 3 to 2**31-1 are impossible; the compiler doesn't. It's likely to tell you that you may run into the end
    of the function.

    So what do you want the compiler to here? If I try it:

    func F(int n)int =
    if n=1 then return 10 fi
    if n=2 then return 20 fi
    end

    It says 'else needed' (in that last statement). I can also shut it up
    like this:

    func F(int n)int = # int is i64 here
    if n=1 then return 10 fi
    if n=2 then return 20 fi
    0
    end

    Since now that last statement is the '0' value (any int value will do).
    What should my compiler report instead? What analysis should it be
    doing? What would that save me from typing?

    Currently, in the typed language that I use, a literal translation of
    the example hits a hole in the checks, that is, the code is accepted.

    Concerning the needed analyses: one thing needed is a representation of
    the type, either a Pascal range type or an enumeration type (the example
    is _very_ unnatural because in modern programming magic numbers
    are avoided and there would be some symbolic representation
    adding meaning to the numbers). Second, the compiler must recognize
    that this is a "multiway switch" and collect the conditions. Once
    you have such a representation (which may be desirable for other
    reasons) it is easy to determine the set of handled values. More
    precisely, in this example we just have a small number of discrete
    values. A more ambitious compiler may have a list of ranges.
    If the type also specifies a list of values or a list of ranges, then
    it is easy to check whether all values of the type are handled.
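
    C compilers already do a limited form of this when the selector has an
    enumeration type: with warnings enabled, a 'switch' without a 'default'
    that misses an enumerator is reported at compile time. A small sketch
    (the type and function names are invented):

        /* Compile with e.g.  gcc -Wall -Wswitch
           (or -Wswitch-enum for a stricter check). */
        typedef enum { RED, GREEN, BLUE } Colour;

        const char *name_of(Colour c)
        {
            switch (c) {                /* no 'default:', deliberately */
            case RED:   return "red";
            case GREEN: return "green";
            }   /* gcc/clang: "enumeration value 'BLUE' not handled in switch" */
            return "?";
        }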

    normally you do not need very complex analysis:

    I don't want to do any analysis at all! I just want a mechanical
    translation as effortlessly as possible.

    I don't like unbalanced code within a function because it's wrong and
    can cause problems.

    Well, I demand more from compiler than you do...

    Perhaps you're happy for it to be bigger and slower too. Most of my
    projects build more or less instantly. Here 'ms' is a version that runs programs directly from source (the first 'ms' is 'ms.exe' and subsequent ones are 'ms.m' the lead module):

    c:\bx>ms ms ms ms ms ms ms ms ms ms ms ms ms ms ms ms hello
    Hello World! 21:00:45

    This builds and runs 15 successive generations of itself in memory
    before building and running hello.m; it took 1 second in all. (Now try
    that with gcc!)

    Here:

    c:\cx>tm \bx\mm -runp cc sql
    Compiling cc.m to <pcl>
    Compiling sql.c to sql.exe

    This compiles my C compiler from source but then it /interprets/ the IR produced. This interpreted compiler took 6 seconds to build the 250Kloc
    test file, and it's a very slow interpreter (it's used for testing and debugging).

    (gcc -O0 took a bit longer to build sql.c! About 7 seconds but it is
    using a heftier windows.h.)

    If I run the C compiler from source as native code (\bx\ms cc sql) then building the compiler *and* sql.c takes 1/3 of a second.

    You can't do this stuff with the compilers David Brown uses; I'm
    guessing you can't do it with your prefered ones either.

    To recompile the typed system I use (about 0.4M lines) on a new fast
    machine I need about 53s. But that is kind of cheating:
    - this time is for a parallel build using 20 logical cores
    - the compiler is not in the language it compiles (but in an untyped
    version of it)
    - actual compilation of the compiler is a small part of the total
    compile time
    On a slow machine the compile time can be as large as 40 minutes.

    An untyped system that I use has about 0.5M lines and recompiles
    itself in 16s on the same machine. This one uses a single core.
    On a slow machine the compile time may be closer to 2 minutes.
    Again, compiler compile time is only a part of the build time.
    Actually, one time-intensive part is creating the index for the included
    documentation. Another is C compilation for a library file
    (the system has image-processing functions and the low-level part of
    image processing is done in C). Recompilation starts from a
    minimal version of the system; rebuilding this minimal
    version takes 3.3s.

    Note that in both cases the line counts are from 'wc'. Both systems
    contain a substantial amount of documentation; I tried to compensate
    for this, but size measured in terms of LOC (that is, excluding
    comments, empty lines and non-code files) would be significantly
    smaller.

    Anyway, I do not need the cascaded recompilation that you present.
    Both systems above have incremental compilation, the second one
    at statement/function level: it offers an interactive prompt
    which takes a statement from the user, compiles it and immediately
    executes it. Such a statement may define a function or perform compilation.
    Even on a _very_ slow machine there is no noticeable delay due to
    compilation, unless you feed the system some oversized statement
    or function (presumably from a file).

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Tue Nov 19 16:37:31 2024
    On 10.11.2024 16:13, David Brown wrote:
    [...]

    My preferences are very much weighted towards correctness, not
    efficiency. That includes /knowing/ that things are correct, not just passing some tests. [...]

    I agree with you. But given what you write I'm also sure you know
    what's achievable in theory, what's an avid wish, and what's really
    possible. Yet there's also projects that don't seem to care, where
    speedy delivery is the primary goal. Guaranteeing formal correctness
    had never been an issue in the industry contexts I worked in, and I
    was always glad when I had a good test environment, with a good test
    coverage, and continuous refinement of tests. Informal documentation,
    factual checks of the arguments, and actual tests was what kept the
    quality of our project deliveries at a high level.

    Janis


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Tue Nov 19 17:25:27 2024
    On 16.11.2024 17:38, David Brown wrote:

    I wonder what happened to Stefan. He used to make perfectly good posts.
    Then he disappeared for a bit, and came back with this new "style".

    Given that this "new" Stefan can write posts with interesting C content,
    such as this one, and has retained his ugly coding layout and
    non-standard Usenet format, I have to assume it's still the same person behind the posts.

    Sorry that I cannot resist asking what you consider "non-standard
    Usenet format", given that your posts don't consider line length.
    (Did the "standards" change during the past three decades maybe?
    Do we use only those parts of the "standards" that we like and
    ignore others? Or does it boil down to Netiquette is no standard?)

    Janis, just curious and no offense intended :-)


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Tue Nov 19 19:19:18 2024
    On 19/11/2024 06:37, Janis Papanagnou wrote:
    On 10.11.2024 16:13, David Brown wrote:
    [...]

    My preferences are very much weighted towards correctness, not
    efficiency. That includes /knowing/ that things are correct, not just
    passing some tests. [...]

    I agree with you. But given what you write I'm also sure you know
    what's achievable in theory, what's an avid wish, and what's really
    possible.

    Sure. I've done my fair share of "write-test-debug" cycling for writing
    code - that's almost inevitable when interacting with something else
    (hardware devices, other programs, users, etc.) that is poorly
    specified. At the other end of the scale, you have things such as race
    conditions, where there is no option but to make sure the code is written
    correctly.

    The original context of this discussion was about small self-contained functions, where correctness is very much achievable in practice - /if/
    you understand that it is something worth aiming at.

    Yet there's also projects that don't seem to care, where
    speedy delivery is the primary goal. Guaranteeing formal correctness
    had never been an issue in the industry contexts I worked in, and I
    was always glad when I had a good test environment, with a good test coverage, and continuous refinement of tests. Informal documentation,
    factual checks of the arguments, and actual tests was what kept the
    quality of our project deliveries at a high level.


    There are a great variety of projects, and the development style differs wildly. Ultimately, you want a cost-benefit balance that makes sense
    for what you are doing, and true formal proof methods are only
    cost-effective in very niche circumstances. In my work, I have rarely
    used any kind of formal methods - but I constantly have the principles
    in mind. When I call a function, I can see that the parameters I use
    are valid - and /could/ be proven valid. I know what the outputs of the function are, and how they fit in with the calling code - and I use that
    to know the validity of the next function called. If I can't see such
    things, it's time to re-factor the code to improve clarity.

    Of course testing is important, at many levels. But the time to test
    your code is when you are confident that it is correct - testing is not
    an alternative to writing code that is as clearly correct as you are
    able to make it.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Tue Nov 19 19:30:19 2024
    On 19/11/2024 07:25, Janis Papanagnou wrote:
    On 16.11.2024 17:38, David Brown wrote:

    I wonder what happened to Stefan. He used to make perfectly good posts.
    Then he disappeared for a bit, and came back with this new "style".

    Given that this "new" Stefan can write posts with interesting C content,
    such as this one, and has retained his ugly coding layout and
    non-standard Usenet format, I have to assume it's still the same person
    behind the posts.

    Sorry that I cannot resist asking what you consider "non-standard
    Usenet format", given that your posts don't consider line length.
    (Did the "standards" change during the past three decades maybe?
    Do we use only those parts of the "standards" that we like and
    ignore others? Or does it boil down to Netiquette is no standard?)

    Janis, just curious and no offense intended :-)


    I hadn't even considered taking offence! And if you are right that my
    line length is wrong, I am glad to be told.

    AFAIK, my posts /do/ follow line length standards. You are using
    Thunderbird like me, I believe - select one of my posts and use ctrl-U
    to see the source, and the lines are split appropriately. But depending
    on the details of posts and clients, and the way lines are split
    (manually or automatically), lines are not always displayed with a 72 character width.

    Stefan's posting format has extra indentation for his prose, but
    additional quoted material (such as code) is outdented. Perhaps that
    does not count as "non-standard Usenet format", but it is certainly a formatting style that is highly unusual and characteristic.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Tue Nov 19 22:21:51 2024
    On Tue, 19 Nov 2024 07:25:27 +0100
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

    On 16.11.2024 17:38, David Brown wrote:

    I wonder what happened to Stefan. He used to make perfectly good
    posts. Then he disappeared for a bit, and came back with this new
    "style".

    Given that this "new" Stefan can write posts with interesting C
    content, such as this one, and has retained his ugly coding layout
    and non-standard Usenet format, I have to assume it's still the
    same person behind the posts.

    Sorry that I cannot resist asking what you consider "non-standard
    Usenet format", given that your posts don't consider line length.
    (Did the "standards" change during the past three decades maybe?
    Do we use only those parts of the "standards" that we like and
    ignore others? Or does it boil down to Netiquette is no standard?)


    It's not that the 'X-No-Archive: Yes' and 'Archive: no' headers used by
    Stefan Ram are not standard. They are just very unusual. He also has an
    'X-No-Archive-Readme' header that indicates that he expects Usenet
    servers to interpret his headers in a way that no real-world
    automatic server software would do. It looks like he expects individual
    treatment by a human being.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Wed Nov 20 00:29:06 2024
    On 19.11.2024 09:19, David Brown wrote:
    [...]

    There are a great variety of projects, [...]

    I don't want the theme to get out of hand, so just one amendment to...

    Of course testing is important, at many levels. But the time to test
    your code is when you are confident that it is correct - testing is not
    an alternative to writing code that is as clearly correct as you are
    able to make it.

    Sounds like early-days practice, where code is written, "defined" at
    some point as "correct", and then tests are written (sometimes
    by the same folks who implemented the code) to prove that the code
    is doing the expected - or the tests are spared because it was
    "clear" that the code is "correct" (sort of).

    Since the 1990's we've had other principles, yes, "on many levels"
    (as you started your paragraph). At all levels there's some sort of specification (or description) that defined the expected outcome
    and behavior; tests [of levels higher than unit-tests] are written
    if not in parallel then usually by separate groups. The decoupling
    is important, the "first implement, then test" serializing certainly
    not.

    Of course every responsible programmer tries to create correct code,
    supported by their own experience and by projects' regulatory means. But
    that doesn't guarantee correct code. Neither do tests guarantee that.
    But tests have been, IME, more effective in supporting correctness
    than being "confident that it is correct" (as you say).

    Janis


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Wed Nov 20 00:41:51 2024
    On 16.11.2024 16:14, James Kuyper wrote:
    On 11/16/24 04:42, Stefan Ram wrote:
    ...
    [...]

    #include <stdio.h>

    const char * english( int const n )
    { const char * result;
    if( n == 0 )result = "zero";
    if( n == 1 )result = "one";
    if( n == 2 )result = "two";
    if( n == 3 )result = "three";
    else result = "four";
    return result; }

    That's indeed a nice example. Where you get fooled by the treacherous
    "trustiness" of formatting[*]. - In syntax we trust! [**]


    void print_english( int const n )
    { printf( "%s\n", english( n )); }

    int main( void )
    { print_english( 0 );
    print_english( 1 );
    print_english( 2 );
    print_english( 3 );
    print_english( 4 ); }

    Nice. It did take a little while for me to figure out what was wrong,
    but since I knew that something was wrong, I did eventually find it -
    without first running the program.

    Same here. :-)

    Janis

    [*] Why do I have to think of Python now? - Never mind. Better
    let sleeping dogs lie.

    [**] As far as I am concerned.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Wed Nov 20 02:51:33 2024
    On 19/11/2024 01:53, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:
    On 10/11/2024 06:00, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    I'd would consider a much elaborate one putting the onus on external
    tools, and still having an unpredictable result to be the poor of the two.
    You want to create a language that is easily compilable, no matter how
    complex the input.

    Normally time spent _using_ compiler should be bigger than time
    spending writing compiler. If compiler gets enough use, it
    justifies some complexity.

    That doesn't add up: the more the compiler gets used, the slower it
    should get?!

    More complicated does not mean slower. Binary search or hash tables
    are more complicated than linear search, but for larger data may
    be much faster.

    That's not the complexity I had in mind. The 100-200MB sizes of
    LLVM-based compilers are not because they use hash-tables over linear
    search.

    More generaly, I want to minimize time spent by the programmer,
    that is _sum over all iterations leading to correct program_ of
    compile time and "think time". Compiler that compiles slower,
    but allows less iterations due to better diagnostics may win.
    Also, humans perceive 0.1s delay almost like no delay at all.
    So it does not matter if single compilation step is 0.1s or
    0.1ms. Modern computers can do a lot of work in 0.1s.

    What's the context of this 0.1 seconds? Do you consider it long or short?

    My tools can generally build my apps from scratch in 0.1 seconds; big compilers tend to take a lot longer. Only Tiny C is in that ballpark.

    So I'm failing to see your point here. Maybe you picked up that 0.1
    seconds from an earlier post of mine and are suggesting I ought to be
    able to do a lot more analysis within that time?

    Yes. This may lead to some complexity. Simple approach is to
    avoid obviously useless recompilation ('make' is doing this).
    More complicated approach may keep some intermediate data and
    try to "validate" them first. If previous analysis is valid,
    then it can be reused. If something significant changes, than
    it needs to be re-done. But many changes only have very local
    effect, so at least theoretically re-using analyses could
    save substantial time.

    I consider compilation - turning textual source code into a form that can
    be run, typically binary native code - to be a completely routine task
    that should be as simple and as quick as flicking a light switch.

    While anything else that might be a deep analysis of that program I
    consider to be a quite different task. I'm not saying there is no place
    for it, but I don't agree it should be integrated into every compiler
    and always invoked.

    Since now that last statement is the '0' value (any int value wil do).
    What should my compiler report instead? What analysis should it be
    doing? What would that save me from typing?

    Currently in typed language that I use literal translation of
    the example hits a hole in checks, that is the code is accepted.

    Concerning needed analyses: one thing needed is representation of
    type, either Pascal range type or enumeration type (the example
    is _very_ unatural because in modern programming magic numbers
    are avoided and there would be some symbolic representation
    adding meaning to the numbers). Second, compiler must recognize
    that this is a "multiway switch" and collect conditions.

    The example came from C. Even if written as a switch, C switches do not
    return values (and also are hard to even analyse as to which branch is
    which).

    In my languages, switches can return values, and a switch written as the
    last statement of a function is considered to do so, even if each branch
    uses an explicit 'return'. Then, it will consider a missing ELSE a 'hole'.

    It will not do any analysis of the range other than what is necessary to implement switch (duplicate values, span of values, range-checking when
    using jump tables).

    So the language may require you to supply a dummy 'else x' or 'return
    x'; so what?

    The alternative appears to be one of:

    * Instead of 'else' or 'return', to write 'unreachable' (sketched
    below), which puts some trust, not in the programmer, but in some
    person calling your function who does not have sight of the source
    code, to avoid calling it with invalid arguments

    * Or relying on the varying capabilities of a compiler 'A', which might
    sometimes be able to determine that some point is not reached, but
    sometimes can't. But when you use compiler 'B', it might have a
    different result.

    I'll stick with my scheme, thanks!
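
    For reference, the 'unreachable' option mentioned above looks like this
    in C - a sketch: C23 provides unreachable() in <stddef.h>, and GCC/Clang
    have long offered __builtin_unreachable(). Whether the trust it
    expresses is justified is exactly what is in dispute here.

        #include <stddef.h>    /* C23: unreachable() */

        int F(int n)
        {
            if (n == 1) return 10;
            if (n == 2) return 20;
            /* The author promises n is always 1 or 2.  If a caller breaks
               that promise, behaviour is undefined - no check, no error. */
            unreachable();     /* pre-C23: __builtin_unreachable(); */
        }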

    Once
    you have such representation (which may be desirable for other
    reasons) it is easy to determine set of handled values. More
    precisely, in this example we just have small number of discrete
    values. More ambitious compiler may have list of ranges.
    If type also specifies list of values or list of ranges, then
    it is easy to check if all values of the type are handled.

    The types are tyically plain integers, with ranges from 2**8 to 2**64.
    The ranges associated with application needs will be more arbitrary.

    If talking about a language with ranged integer types, then there might
    be more point to it, but that is itself a can of worms. (It's hard to do without getting halfway to implementing Ada.)


    You can't do this stuff with the compilers David Brown uses; I'm
    guessing you can't do it with your prefered ones either.

    To recompile the typed system I use (about 0.4M lines) on new fast
    machine I need about 53s. But that is kind of cheating:
    - this time is for parallel build using 20 logical cores
    - the compiler is not in the language it compiles (but in untyped
    vesion of it)
    - actuall compilation of the compiler is small part of total
    compile time
    On slow machine compile time can be as large as 40 minutes.

    40 minutes for 400K lines? That's 160 lines per second; how old is this machine? Is the compiler written in Python?


    An untyped system that I use has about 0.5M lines and recompiles
    itself in 16s on the same machine. This one uses single core.
    On slow machine compile time may be closer to 2 minutes.

    So 4K to 30Klps.

    Again, compiler compile time is only a part of build time.
    Actualy, one time-intensive part is creating index for included documentation.

    Which is not going to be part of a routine build.

    Another is C compilation for a library file
    (system has image-processing functions and low-level part of
    image processing is done in C). Recomplation starts from
    minimal version of the system, rebuilding this minimal
    version takes 3.3s.

    My language tools work on a whole program, where a 'program' is a single
    EXE or DLL file (or a single OBJ file in some cases).

    A 'build' then turns N source files into 1 binary file. This is the task
    I am talking about.

    A complete application may have several such binaries and a bunch of
    other stuff. Maybe some source code is generated by a script. This part
    is open-ended.

    However each of my current projects is a single, self-contained binary
    by design.

    Anyway, I do not need cascaded recompilation than you present.
    Both system above have incermental compilation, the second one
    at statement/function level: it offers interactive prompt
    which takes a statement from the user, compiles it and immediately
    executes. Such statement may define a function or perform compilation.
    Even on _very_ slow machine there is no noticable delay due to
    compilation, unless you feed the system with some oversized statement
    or function (presumably from a file).

    This sounds like a REPL system. There, each line is a new part of the
    program which is processed, executed and discarded. In that regard, it
    is not really what I am talking about, which is AOT compilation of a
    program represented by a bunch of source files.

    Or can a new line redefine something, perhaps a function definition, previously entered amongst the last 100,000 lines? Can a new line
    require compilation of something typed 50,000 lines ago?

    What happens if you change the type of a global; are you saying that
    none of the program code needs revising?

    What I do relies purely on raw compilation speed. No tricks are needed.
    No incremental compilation is needed (the 'granularity' is a
    'program': a single EXE/DLL file, as mentioned above).

    You can change any single part, either local or global, and the whole
    thing is recompiled in an instant.

    However, a 0.5M line project may take a second (unoptimised compiler),
    but it would also generate a 5MB executable, which is quite sizeable.

    Optimising my compiler and choosing to run the interpreter might reduce
    that to half a second (to get to where the app starts to execute). That
    could be done now. Other optimisations could be done to reduce it
    further, but ATM they are not needed.

    The only real example I have is an SQLite3 test, a 250Kloc C program
    (but which has lots of comments and conditional code; preprocessed
    it's 85Kloc).

    My C compiler can run that from source. It takes 0.22 seconds to compile 250Kloc/8MB of source to in-memory native code. Or I can run from source
    via an interpreter, then it takes 1/6th of a second to get from C source
    to IL code:

    c:\cx>cc -runp sql
    Compiling sql.c to 'pcl' # PCL is the name of my IL
    Compile to PCL takes: 157 ms
    SQLite version 3.25.3 2018-11-05 20:37:38
    Enter ".help" for usage hints.
    Connected to a transient in-memory database.
    Use ".open FILENAME" to reopen on a persistent database.
    sqlite> .quit

    Another example, building 40Kloc interpreter from source then running it
    in memory:

    c:\qx>tm \bx\mm -run qq hello
    Compiling qq.m to memory
    Hello, World! 19-Nov-2024 15:38:47
    TM: 0.11

    c:\qx>tm qq hello
    Hello, World! 19-Nov-2024 15:38:49
    TM: 0.05

    The second version runs a precompiled EXE. So building from source added
    only 90ms. Or I can use the interpreter (so interpreting an
    interpreter) to get a 0.08 second timing.

    No tricks are needed. The only thing that might be a cheat here is using
    OS file-caching. But nearly always, you will be building source files
    that have either just been edited, or will have been compiled a few
    seconds before.

    An untyped system

    What do you mean by an untyped system? To me it usually means
    dynamically typed.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Scott Lurndal@3:633/280.2 to All on Wed Nov 20 03:11:51 2024
    Reply-To: slp53@pacbell.net

    Bart <bc@freeuk.com> writes:
    On 19/11/2024 01:53, Waldek Hebisch wrote:

    More complicated does not mean slower. Binary search or hash tables
    are more complicated than linear search, but for larger data may
    be much faster.

    That's not the complexity I had in mind. The 100-200MB sizes of
    LLVM-based compilers are not because they use hash-tables over linear >search.

    You still have this irrational obsession with the amount of disk
    space consumed by a compiler suite - one that is useful to a massive
    number of developers (esp. compared with the user-base of your
    compiler).

    The amount of disk space consumed by a compilation suite is
    a meaningless statistic. 10MByte disks are a relic of the
    distant past.


    My tools can generally build my apps from scratch in 0.1 seconds; big >compilers tend to take a lot longer. Only Tiny C is in that ballpark.

    And Tiny C is useless for the majority of real-world applications.

    How many people are using your compiler to build production applications?

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: UsenetServer - www.usenetserver.com (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Wed Nov 20 03:43:00 2024
    On 19/11/2024 16:11, Scott Lurndal wrote:
    Bart <bc@freeuk.com> writes:
    On 19/11/2024 01:53, Waldek Hebisch wrote:

    More complicated does not mean slower. Binary search or hash tables
    are more complicated than linear search, but for larger data may
    be much faster.

    That's not the complexity I had in mind. The 100-200MB sizes of
    LLVM-based compilers are not because they use hash-tables over linear
    search.

    You still have this irrational obsession with the amount of disk
    space consumed by a compiler suite - one that is useful to a massive
    number of developers (esp. compared with the user-base of your
    compiler).

    The amount of disk space consumed by a compilation suite is
    a meaningless statistic. 10MByte disks are a relic of the
    distant past.

    Yes, it is. But what is NOT meaningless is everything else that goes with
    it: vast complexity, and slow compile times, and that's just for the
    apps you build with the tool. Building LLVM itself can be challenging.


    My tools can generally build my apps from scratch in 0.1 seconds; big
    compilers tend to take a lot longer. Only Tiny C is in that ballpark.

    And Tiny C is useless for the majority of real-world applications.

    How many people are using your compiler to build production applications?

    It doesn't matter. It's enough to illustrate that routine compilation
    CAN be done at up to 100 times faster than those big tools and with a
    program that could fit on a floppy. Presumably at a significant power
    saving as well, as that seems to be a big thing these days.

    If a simple implementation has trouble with big applications, then that
    would need to be looked at.

    But I suspect the trouble doesn't lie within the small compiler.
    Probably those big compilers have had to be endlessly tweaked over
    decades to deal with myriad small problems, perhaps bugs and corner cases
    within the C language, or the need to compile legacy code that is too
    fragile to fix, all sorts of stuff.

    Or, where the compilers were not specially modded, then codebases would
    have headers with conditional blocks that special-case particular
    compilers with tweaks to get around the idiosyncrasies of each.

    Or, the apps depend on C extensions implemented only by a big compiler.

    The end result is that when some upstart comes along with a new,
    streamlined compiler, it will not be able build that codebase.

    But, try creating a NEW real-world application that is primarily
    developed and tested with Tiny C, then you will see two revelations:

    * It *will* build with Tiny C with no problems, unsurprisingly

    * It will also build with any of your big compilers because the code is necessarily conservative.

    Congratulations, you now have a much healthier codebase that works cross-compiler without all those #ifdef blocks.




    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Wed Nov 20 04:31:06 2024
    On 19/11/2024 14:29, Janis Papanagnou wrote:
    On 19.11.2024 09:19, David Brown wrote:
    [...]

    There are a great variety of projects, [...]

    I don't want the theme to get out of hand, so just one amendment to...

    Of course testing is important, at many levels. But the time to test
    your code is when you are confident that it is correct - testing is not
    an alternative to writing code that is as clearly correct as you are
    able to make it.

    Sound like early days practice, where code is written, "defined" at
    some point as "correct", and then tests written (sometimes written
    by the same folks who implemented the code) to prove that the code
    is doing the expected, or the tests have been spared because it was
    "clear" that the code is "correct" (sort of).

    Since the 1990's we've had other principles, yes, "on many levels"
    (as you started your paragraph). At all levels there's some sort of specification (or description) that defined the expected outcome
    and behavior; tests [of levels higher than unit-tests] are written
    if not in parallel then usually by separate groups. The decoupling
    is important, the "first implement, then test" serializing certainly
    not.

    Of course every responsible programmer tries to create correct code, supported by own experience and by projects' regulatory means. But
    that doesn't guarantee correct code. Neither do test guarantee that.
    But tests have been, IME, more effective in supporting correctness
    than being "confident that it is correct" (as you say).


    Both activities are about reducing the risk of incorrect code getting
    through. In some cases, one of them is more practical or more effective
    than the other, while in other situations you want to combine them.

    My argument has never been against testing, nor have I claimed that programmers can be trusted to write infallible code!

    All I have been arguing against is the idea of blindly putting in
    validity tests for parameters in functions, as though it were a habit
    that by itself leads to fewer bugs in code.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Wed Nov 20 06:11:03 2024
    On 19/11/2024 15:51, Bart wrote:
    On 19/11/2024 01:53, Waldek Hebisch wrote:

    Another example, building 40Kloc interpreter from source then running it
    in memory:

    c:\qx>tm \bx\mm -run qq hello
    Compiling qq.m to memory
    Hello, World! 19-Nov-2024 15:38:47
    TM: 0.11

    c:\qx>tm qq hello
    Hello, World! 19-Nov-2024 15:38:49
    TM: 0.05

    The second version runs a precompiled EXE. So building from source added only 90ms.

    Sorry, that should be 60ms. Running that interpreter from source only
    takes 1/16th of a second longer, not 1/11th of a second.

    BTW I didn't remark on the range of your (WH's) figures. They spanned
    from 40 minutes for a build down to instant, but it's not clear which
    languages they are for, which tools are used and which machines. Or how
    much work they have to do to get those faster times, or what work they
    don't do: I'm guessing it's not processing 0.5M lines for that fastest
    time.
    So it was hard to formulate a response.

    All my timings are either for C or my systems language, running on one
    core on the same PC.

    For something that you can compare on your own machines, this is a test
    using a one-file version of Lua adapted from https://github.com/edubart/minilua.

    Timings and EXE sizes are:

                    Seconds      KB

    gcc -O0 -s         3.4      372
    gcc -Os -s         8.5      241
    gcc -O2 -s        11.7      328
    gcc -O3 -s        14.4      378
    tcc 0.9.27         0.14     384
    cc                 0.16     315    (My new C compiler)
    cc                 0.09       -    (Compile to interpretable IL)
    cc                 0.11       -    (Compile to IL then runnable in-mem code)
    mcc                0.28     355    (My old C compiler uses intermediate ASM)

    Since this is one file (of some tens of 1000s of KB; -E output varies),
    any mod involves recompiling the whole thing.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Mark Bourne@3:633/280.2 to All on Wed Nov 20 07:51:47 2024
    Bart wrote:
    On 10/11/2024 06:00, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:
    I'd would consider a much elaborate one putting the onus on external
    tools, and still having an unpredictable result to be the poor of the
    two.

    You want to create a language that is easily compilable, no matter how
    complex the input.

    Normally time spent _using_ compiler should be bigger than time
    spending writing compiler.  If compiler gets enough use, it
    justifies some complexity.

    That doesn't add up: the more the compiler gets used, the slower it
    should get?!

    I may have misunderstood, but I don't think Waldek's comment was a claim
    about how long a single compilation should take / how slow the compiler
    should be made to be. I think it was a statement about the total amount
    of time all users of a compiler can be expected to spend using it in comparison to the time compiler developers spend writing it.

    If a compiler is used by a significant number of people, the total
    amount of time users spend using it is far larger than the total amount
    of time developers spend writing it, regardless of how long a single compilation takes. So overall it's worth the compiler developers
    putting in extra effort to make the compiler more useful, provide better diagnostics, etc. rather than just doing whatever's easiest for them.
    That may only save each user a relatively small amount of time, but
    aggregated over all users of the compiler it adds up to a lot of time saved.

    When a compiler is used by only a small number of people (or even just
    one), it's not worth the compiler developer putting a lot of effort into
    it, when it's only going to save a small number of people a small amount
    of time.

    --
    Mark.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Wed Nov 20 09:40:45 2024
    Bart <bc@freeuk.com> wrote:
    On 19/11/2024 01:53, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:
    On 10/11/2024 06:00, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    I'd would consider a much elaborate one putting the onus on external
    tools, and still having an unpredictable result to be the poor of the two.

    You want to create a language that is easily compilable, no matter how
    complex the input.

    Normally time spent _using_ compiler should be bigger than time
    spending writing compiler. If compiler gets enough use, it
    justifies some complexity.

    That doesn't add up: the more the compiler gets used, the slower it
    should get?!

    More complicated does not mean slower. Binary search or hash tables
    are more complicated than linear search, but for larger data may
    be much faster.

    That's not the complexity I had in mind. The 100-200MB sizes of
    LLVM-based compilers are not because they use hash-tables over linear search.

    It is related: both gcc and LLVM are doing analyses that in the
    past were deemed impractically expensive (both in time and in space).
    Those analyses work now thanks to smart algorithms that
    significantly reduced resource usage. I know that you consider
    this too expensive. But the point is that there are also things
    which are easy to program and are slow, but acceptable for some
    people. You can speed up such things by adding complexity to the
    compiler.

    More generaly, I want to minimize time spent by the programmer,
    that is _sum over all iterations leading to correct program_ of
    compile time and "think time". Compiler that compiles slower,
    but allows less iterations due to better diagnostics may win.
    Also, humans perceive 0.1s delay almost like no delay at all.
    So it does not matter if single compilation step is 0.1s or
    0.1ms. Modern computers can do a lot of work in 0.1s.

    What's the context of this 0.1 seconds? Do you consider it long or short?

    Context is interactive response. It means "pretty fast for interactive
    use".

    My tools can generally build my apps from scratch in 0.1 seconds; big compilers tend to take a lot longer. Only Tiny C is in that ballpark.

    So I'm failing to see your point here. Maybe you picked up that 0.1
    seconds from an earlier post of mine and are suggesting I ought to be
    able to do a lot more analysis within that time?

    This 0.1s is an old thing. My point is that if you are compiling a simple
    change, then you should be able to do more in this time. In normal
    development source files bigger than 10000 lines are relatively
    rare, so once you get into the range of 50000-100000 lines per second,
    making the compiler faster is of marginal utility.

    Yes. This may lead to some complexity. Simple approach is to
    avoid obviously useless recompilation ('make' is doing this).
    More complicated approach may keep some intermediate data and
    try to "validate" them first. If previous analysis is valid,
    then it can be reused. If something significant changes, than
    it needs to be re-done. But many changes only have very local
    effect, so at least theoretically re-using analyses could
    save substantial time.

    I consider compilation: turning textual source code into a form that can
    be run, typically binary native code, to be a completely routine task
    that should be as simple and as quick as flicking a light switch.

    While anything else that might be a deep analysis of that program I
    consider to be a quite different task. I'm not saying there is no place
    for it, but I don't agree it should be integrated into every compiler
    and always invoked.

    We clearly differ on the question of what is routine. Creating a usable
    executable is a rare task; once an executable is created it can be used
    for a long time. OTOH development is routine, and for this one wants
    to know if a change is correct. Extra analyses and diagnostics
    help here. And since normal development works in cycles there is
    a lot of opportunity to re-use results between cycles.

    Since now that last statement is the '0' value (any int value wil do).
    What should my compiler report instead? What analysis should it be
    doing? What would that save me from typing?

    Currently in typed language that I use literal translation of
    the example hits a hole in checks, that is the code is accepted.

    Concerning needed analyses: one thing needed is representation of
    type, either Pascal range type or enumeration type (the example
    is _very_ unatural because in modern programming magic numbers
    are avoided and there would be some symbolic representation
    adding meaning to the numbers). Second, compiler must recognize
    that this is a "multiway switch" and collect conditions.

    The example came from C. Even if written as a switch, C switches do not return values (and also are hard to even analyse as to which branch is which).

    In my languages, switches can return values, and a switch written as the last statement of a function is considered to do so, even if each branch uses an explicit 'return'. Then, it will consider a missing ELSE a 'hole'.

    It will not do any analysis of the range other than what is necessary to implement switch (duplicate values, span of values, range-checking when using jump tables).

    So the language may require you to supply a dummy 'else x' or 'return
    x'; so what?

    The alternative appears to be one of:

    * Instead of 'else' or 'return', to write 'unreachable', which puts some
    trust, not in the programmer, but some person calling your function
    who does not have sight of the source code, to avoid calling it with
    invalid arguments

    Already a simple thing would be an improvement: make the compiler aware
    of an error routine (if you do not have one, add one) so that when you
    signal an error the compiler will know that there is no need for a
    normal return value.
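
    In C, that "error routine the compiler knows about" can be spelled
    directly - a sketch: marking the routine _Noreturn (C11; [[noreturn]]
    in C23) tells the compiler that the error path never produces a value,
    so the "missing return" complaint disappears without a dummy result.

        #include <stdio.h>
        #include <stdlib.h>

        _Noreturn static void fatal(const char *msg)
        {
            fprintf(stderr, "fatal: %s\n", msg);
            exit(EXIT_FAILURE);
        }

        int F(int n)
        {
            if (n == 1) return 10;
            if (n == 2) return 20;
            fatal("F: argument out of range");  /* compiler knows control stops here */
        }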

    Once
    you have such a representation (which may be desirable for other
    reasons) it is easy to determine the set of handled values. More
    precisely, in this example we just have a small number of discrete
    values. A more ambitious compiler may have a list of ranges.
    If the type also specifies a list of values or a list of ranges, then
    it is easy to check whether all values of the type are handled.
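    As a concrete C illustration (the enum and function names here are
    invented): gcc and clang already do a crude form of this check for
    enumeration types via -Wswitch (part of -Wall), warning about
    enumerators that a switch does not handle, as long as no 'default:'
    label swallows them.

    enum colour { RED, GREEN, BLUE };

    const char *colour_name(enum colour c)
    {
        switch (c) {          /* -Wswitch warns: 'BLUE' not handled */
        case RED:   return "red";
        case GREEN: return "green";
        }
        return "?";           /* reached only for out-of-range values */
    }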

    The types are typically plain integers, with ranges from 2**8 to 2**64.
    The ranges associated with application needs will be more arbitrary.

    If talking about a language with ranged integer types, then there might
    be more point to it, but that is itself a can of worms. (It's hard to do without getting halfway to implementing Ada.)

    C has 'enum'. And a lot of languages treat such types much more
    seriously than C.

    You can't do this stuff with the compilers David Brown uses; I'm
    guessing you can't do it with your preferred ones either.

    To recompile the typed system I use (about 0.4M lines) on a new fast
    machine I need about 53s. But that is kind of cheating:
    - this time is for a parallel build using 20 logical cores
    - the compiler is not in the language it compiles (but in an untyped
    version of it)
    - actual compilation of the compiler is a small part of total
    compile time
    On a slow machine compile time can be as large as 40 minutes.

    40 minutes for 400K lines? That's 160 lines per second; how old is this machine? Is the compiler written in Python?

    This is a simple compiler doing rather complex analyses and the time used by
    them may grow exponentially. The compiler is written in an untyped version
    of the language it compiles and generates Lisp (so the actual machine code
    is generated by Lisp).

    Concerning slowness, Atoms that are a few years old are quite slow.

    An untyped system that I use has about 0.5M lines and recompiles
    itself in 16s on the same machine. This one uses a single core.
    On a slow machine compile time may be closer to 2 minutes.

    So 4K to 30Klps.

    Closer to 50Klps, as there are other things taking time.

    Again, compiler compile time is only a part of build time.
    Actually, one time-intensive part is creating the index for the included
    documentation.

    Which is not going to be part of a routine build.

    In a sense a build is not routine. A build is done for two purposes:
    - to install a working system from sources, which includes
    documentation
    - to check that the build works properly after changes; this also
    should check the documentation build.

    Normal development goes without rebuilding the system.

    Another is C compilation for a library file
    (the system has image-processing functions and the low-level part of
    image processing is done in C). Recompilation starts from a
    minimal version of the system; rebuilding this minimal
    version takes 3.3s.

    My language tools work on a whole program, where a 'program' is a single
    EXE or DLL file (or a single OBJ file in some cases).

    A 'build' then turns N source files into 1 binary file. This is the task
    I am talking about.

    I know. But this is not what I do. A build produces multiple
    artifacts, some of them executable, some loadable code (but _not_
    in a form recognized by the operating system), some essentially
    non-executable (like documentation).

    A complete application may have several such binaries and a bunch of
    other stuff. Maybe some source code is generated by a script. This part
    is open-ended.

    However each of my current projects is a single, self-contained binary
    by design.

    Anyway, I do not need the cascaded recompilation that you present.
    Both systems above have incremental compilation, the second one
    at statement/function level: it offers an interactive prompt
    which takes a statement from the user, compiles it and immediately
    executes it. Such a statement may define a function or perform compilation.
    Even on a _very_ slow machine there is no noticeable delay due to
    compilation, unless you feed the system some oversized statement
    or function (presumably from a file).

    This sounds like a REPL system. There, each line is a new part of the program which is processed, executed and discarded.

    First, I am writing about two different systems. Both have a REPL.
    Lines typed at the REPL are "discarded", but their effect may last
    a long time.

    In that regard, it
    is not really what I am talking about, which is AOT compilation of a
    program represented by a bunch of source files.

    The untyped system is intended for "image based development": you
    compile a bunch of routines to memory and dump the result to an
    "image" file. You can load the image file later and use the previously
    compiled routines. This system also has a second compiler which
    outputs an assembler file, and after using the assembler you get an object
    file. If you insist, compilation, assembly and linking can be
    done by a single invocation of the compiler (which calls the assembler
    and linker behind the scenes). But this is not normal use;
    it is mainly used during the system build to build the base executable
    which is later extended with extra functionality (like compilers
    for extra languages) in saved images.

    The typed system distinguishes "library compilation" and "user compilation".
    "Library compilation" is done with module granularity and produces a
    loadable module.

    Compilation is really AOT; you need to compile before use.
    Compiled functions may be replaced by new definitions, but in the
    absence of a new definition the compiled code is used without change.

    Or can a new line redefine something, perhaps a function definition, previously entered amongst the last 100,000 lines? Can a new line
    require compilation of something typed 50,000 lines ago?

    What happens if you change the type of a global; are you saying that
    none of the program codes needs revising?

    In the typed system there are no global "library" variables; all data
    is encapsulated in modules and normally accessed in an abstract way,
    by calling appropriate functions. So, in "clean" code you
    can recompile a single module and the whole system works.
    There is potential trouble with user variables: if data layout
    (representation) changes, old values will lead to trouble.
    There is potential trouble if you remove an exported function.
    All previously compiled modules will assume that such a function
    is present and you will get a runtime error when other modules
    attempt to call it. For efficiency, functions
    from "core" modules may be inlined; if you make a change to one
    of the core modules you may need to recompile the whole system.
    Similarly, some modules depend on the structure of data in other
    modules; if you change data layout you need to recompile
    everything which depends on it (which as I wrote is normally
    a single module, but may be more). In other words, if you
    change data layout or module interfaces, then you may
    need to recompile several modules. But during normal
    development this is much less frequent than changes which
    affect only a single module.

    As an example, I changed the representation of multidimensional arrays;
    that required rebuilding the whole system. OTOH most changes
    are either bug fixes, replacing an existing routine by a faster
    one, or adding new functionality. In those 3 cases there is
    no change in the interface seen by the non-changed part. There are
    also changes to module interfaces; those affect multiple
    modules, but are less frequent.

    The untyped (or if you prefer dynamically typed) system just acts
    on what is in variables; if you put nonsense there you will
    get an error or possibly a crash.

    An untyped system

    What do you mean by an untyped system? To me it usually means
    dynamically typed.

    Well, "untyped" is shorter and in a sense more relevant for
    compiler. '+' is treated as a function call to a function
    named '+' which performs actual work starting from dispatch
    on type tags. OTOH 'fi_+' assume that it is given (tagged)
    integers and is compiled to inline code which in case when
    one argument is a constant may reduce to one or zero instructions
    (zero instructions means that addition may be done as part
    of address mode of load or store). At even lower level
    there is '_add' which adds two things treating them as
    machine integers.
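    A very rough C sketch of this layering (all names are invented and the
    real representation differs; it is only meant to show the three levels):

    #include <stdint.h>
    #include <stdio.h>

    typedef intptr_t value;                 /* one tagged machine word */
    #define FIXNUM_TAG ((intptr_t)1)        /* low bit set = small integer */

    static inline value mkfix(intptr_t n)
    {   /* shift done on the unsigned type to keep it well defined */
        return (value)(((uintptr_t)n << 1) | 1u);
    }
    static inline int isfix(value v) { return (int)(v & FIXNUM_TAG); }
    static inline intptr_t fixval(value v)
    {   /* assumes arithmetic right shift, as on mainstream targets */
        return v >> 1;
    }

    /* '_add' level: raw machine addition, no tags, no checks */
    static inline intptr_t raw_add(intptr_t a, intptr_t b) { return a + b; }

    /* 'fi_+' level: assumes both operands are tagged integers; inlines
       to a couple of instructions (fewer if one side is a constant) */
    static inline value fix_add(value a, value b)
    {
        return mkfix(raw_add(fixval(a), fixval(b)));
    }

    /* '+' level: generic entry point that dispatches on the type tag;
       a real system would go on to floats, bignums, strings, ... */
    value generic_add(value a, value b)
    {
        if (isfix(a) && isfix(b))
            return fix_add(a, b);
        fprintf(stderr, "generic_add: unsupported operand types\n");
        return mkfix(0);
    }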

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Wed Nov 20 10:41:34 2024
    Bart <bc@freeuk.com> wrote:
    On 19/11/2024 15:51, Bart wrote:
    On 19/11/2024 01:53, Waldek Hebisch wrote:

    Another example, building 40Kloc interpreter from source then running it
    in memory:

    c:\qx>tm \bx\mm -run qq hello
    Compiling qq.m to memory
    Hello, World! 19-Nov-2024 15:38:47
    TM: 0.11

    c:\qx>tm qq hello
    Hello, World! 19-Nov-2024 15:38:49
    TM: 0.05

    The second version runs a precompiled EXE. So building from source added
    only 90ms.

    Sorry, that should be 60ms. Running that interpreter from source only
    takes 1/16th of a second longer not 1/11th of a second.

    BTW I didn't remark on the range of your (WH's) figures. They spanned 40 minutes for a build to instant, but it's not clear for which languages
    they are, which tools are used and which machines. Or how much work they
    have to do to get those faster times, or what work they don't do: I'm guessing it's not processing 0.5M lines for that fastest time.

    As I wrote, there are 2 different systems; if interested you can fetch
    them from github. Build time is just running make; for one (the typed
    system) it was

    time make -j 20 > mlogg 2>&1

    so the build used up to 20 jobs, and output went to a file (I am not sure
    if it was important in this case, but there is 15MB of messages
    and the terminal emulator could take some time to print them).
    Of course, this was after all dependencies were installed and after
    running 'configure'. Note that the parallel build saves substantial
    time; otherwise it probably would be somewhat more than 6 minutes.

    For the untyped system it was

    time make > mlogg 2>&1

    Shortest time was

    time make stamp_new_corepop > mlogg3 2>&1

    this rebuilds only one crucial binary (which involves about 100K wc
    lines). This is a mixed language project: there is runtime support in
    C (hard to say how much, as a single file contains functions for
    several OSes but conditionals choose only one OS), and assembler files
    which are macro-processed and passed to the assembler. There are
    header files which are included during multiple compilations.

    My point was that with the machines available to me and with my
    development process "full build" time is not a problem.
    With the typed system the normal thing is to rebuild a single module, and
    for some modules that takes several seconds (most are of the order of
    a second). It would be nice to have a faster compile time.
    OTOH my "think time" frequently is much longer than this,
    so a compiler doing less checking could lead to a longer time
    overall.

    So it was hard to formulate a response.

    All my timings are either for C or my systems language, running on one
    core on the same PC.

    I do not think I will use your system language. And for a C compiler,
    at least currently it does not make a big difference to me if your
    compiler can do 1Mloc or 5Mloc on my machine; both are "pretty fast".
    What matters more is support for debugging output, support for the
    targets that I need (like ARM or Risc-V), good diagnostics
    and optimization. I recently installed TinyC on a small Risc-V
    machine; I think that the available memory (64MB in all, about 20MB available
    to user programs) is too small to run gcc or clang.

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Wed Nov 20 11:16:50 2024
    On 19/11/2024 22:40, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    It is related: both gcc and LLVM are doing analyses that in the
    past were deemed impractically expensive (both in time and in space).
    Those analyses work now thanks to smart algorithms that
    significantly reduced resource usage. I know that you consider
    this too expensive.

    How long would LLVM take to compile itself on one core? (Here I'm not
    even sure what LLVM is; if you download the binary, it's about 2.5GB,
    but a typical LLVM compiler might be 100+ MB. But I guess it will be a while
    in either case.)

    I have a product now that is like a mini-LLVM backend. It can build into a standalone library of under 0.2MB, which can directly produce EXEs, or it
    can interpret. Building that product from scratch takes 60ms.

    That is my kind of product

    What's the context of this 0.1 seconds? Do you consider it long or short?

    Context is interactive response. It means "pretty fast for interactive
    use".

    It's less than the time to press and release the Enter key.


    My tools can generally build my apps from scratch in 0.1 seconds; big
    compilers tend to take a lot longer. Only Tiny C is in that ballpark.

    So I'm failing to see your point here. Maybe you picked up that 0.1
    seconds from an earlier post of mine and are suggesting I ought to be
    able to do a lot more analysis within that time?

    This 0.1s is an old thing. My point is that if you are compiling a simple
    change, then you should be able to do more in this time. In normal
    development, source files bigger than 10000 lines are relatively
    rare, so once you get into the range of 50000-100000 lines per second
    making the compiler faster is of marginal utility.

    I *AM* doing more in that time! It just happens to be stuff you appear
    to have no interest in:

    * I write whole-program compilers: you always process all source files
    of an application. The faster the compiler, the bigger the scale of app
    it becomes practical on.

    * That means no headaches with dependencies (it goes in hand with a
    decent module scheme)

    * I can change one tiny corner of the program, say add an /optional/ argument to a function, which requires compiling all call-sites across
    the program, and the next compilation will take care of everything

    * If I were to do more with optimisation (there is lots that can be done without getting into the heavy stuff), it automatically applies to the
    whole program

    * I can choose to run applications from source code, without generating discrete binary files, just like a script language

    * I can choose (with my new backend) to interpret programs in this
    static language. (Interpretation gives better debugging opportunities)

    * I don't need to faff around with object files or linkers

    Module-based independent compilation and having to link 'object files'
    is stone-age stuff.


    We clearly differ in the question of what is routine. Creating a usable
    executable is a rare task; once an executable is created it can be used
    for a long time. OTOH development is routine and for this one wants
    to know if a change is correct.

    I take it then that you have some other way of doing test runs of a
    program without creating an executable?

    It's difficult to tell from your comments.

    Already a simple thing would be an improvement: make the compiler aware of
    an error routine (if you do not have one, add one) so that when you
    signal an error the compiler will know that there is no need for a normal
    return value.

    OK, but what does that buy me? Saving a few bytes for a return
    instruction in a function? My largest program, which is 0.4MB, already
    only occupies 0.005% of the machine's 8GB.

    Which is not going to be part of a routine build.

    In a sense a build is not routine. A build is done for two purposes:
    - to install a working system from sources, which includes
    documentation
    - to check that the build works properly after changes; this also
    should check the documentation build.

    Normal development goes without rebuilding the system.

    We must be talking at cross-purposes then.

    Either you're developing using interpreted code, or you must have some
    means of converting source code to native code, but for some reason you
    don't use 'compile' or 'build' to describe that process.

    Or maybe your REPL/incremental process can run for days doing
    incremental changes without doing a full compile. It seems quite mysterious.

    I might run my compiler hundreds of times a day (at 0.1 seconds a time,
    600 builds would occupy one whole minute in the day!). I often do it for frivolous purposes, such as trying to get some output lined up just
    right. Or just to make sure something has been recompiled since it's so
    quick it's hard to tell.


    I know. But this is not what I do. A build produces multiple
    artifacts, some of them executable, some loadable code (but _not_
    in a form recognized by the operating system), some essentially
    non-executable (like documentation).

    So, 'build' means something different to you. I use 'build' just as a
    change from writing 'compile'.

    This sounds like a REPL system. There, each line is a new part of the
    program which is processed, executed and discarded.

    First, I am writing about two different systems. Both have REPL.
    Lines typed at REPL are "discarded", but their effect may last
    long time.

    My last big app used a compiled core but most user-facing functionality
    was done using an add-on script language. This meant I could develop
    such modules from within a working application, which provided a rich, persistent environment.

    Changes to the core program required a rebuild and a restart.

    However the whole thing was an application, not a language.


    What happens if you change the type of a global; are you saying that
    none of the program codes needs revising?

    In the typed system there are no global "library" variables; all data
    is encapsulated in modules and normally accessed in an abstract way,
    by calling appropriate functions. So, in "clean" code you
    can recompile a single module and the whole system works.

    I used module-at-a-time compilation until 10-12 years ago. The module
    scheme had to be upgraded at the same time, but it took several goes to
    get it right.

    Now I wouldn't go back. Who cares about compiling a single module that
    may or may not affect a bunch of others? Just compile the lot!

    If a project's scale becomes too big, then it should be split into
    independent program units, for example a core EXE file and a bunch of
    DLLs; that's the new granularity. Or a lot of functionality can be
    off-loaded to scripts, as I used to do.

    (My scripting language code still needs bytecode compilation, and I also
    use whole-program units there, but the bytecode compiler goes up to 2Mlps.)



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Wed Nov 20 12:33:09 2024
    On 19/11/2024 23:41, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    BTW I didn't remark on the range of your (WH's) figures. They spanned 40
    minutes for a build to instant, but it's not clear for which languages
    they are, which tools are used and which machines. Or how much work they
    have to do to get those faster times, or what work they don't do: I'm
    guessing it's not processing 0.5M lines for that fastest time.

    As I wrote, there are 2 different systems; if interested you can fetch
    them from github.

    Do you have a link? Probably I won't attempt to build but I can see what
    it looks like.

    I do not think I will use your system language. And for C compiler
    at least currently it does not make big difference to me if your
    compiler can do 1Mloc or 5Mloc on my machine, both are "pretty fast".
    What matters more is support of debugging output, supporting
    targets that I need (like ARM or Risc-V), good diagnostics
    and optimization.

    It's funny how nobody seems to care about the speed of compilers (which
    can vary by 100:1), but for the generated programs, the 2:1 speedup you
    might get by optimising it is vital!

    Here I might borrow one of your arguments and suggest such a speed-up is
    only necessary on a rare production build.

    I recently installed TinyC on small Risc-V
    machine, I think that available memory (64MB all, about 20MB available
    to user programs) is too small to run gcc or clang.


    Only 20,000KB? My first compilers worked on 64KB systems, not all of
    which was available either.

    None of my recent products will do so now, but they will still fit on a
    floppy disk.

    BTW why don't you use a cross-compiler? That's what David Brown would say.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Dan Purgert@3:633/280.2 to All on Wed Nov 20 23:31:35 2024
    On 2024-11-16, Stefan Ram wrote:
    Dan Purgert <dan@djph.net> wrote or quoted:
    if (n==0) { printf ("n: %u\n",n); n++;}
    if (n==1) { printf ("n: %u\n",n); n++;}
    if (n==2) { printf ("n: %u\n",n); n++;}
    if (n==3) { printf ("n: %u\n",n); n++;}
    if (n==4) { printf ("n: %u\n",n); n++;}
    printf ("all if completed, n=%u\n",n);

    My bad if the following instruction structure's already been hashed
    out in this thread, but I haven't been following the whole convo!

    I honestly lost the plot ages ago; not sure if it was either!


    In my C 101 classes, after we've covered "if" and "else",
    I always throw this program up on the screen and hit the newbies
    with this curveball: "What's this bad boy going to spit out?".

    Segfaults? :D


    Well, it's a blue moon when someone nails it. Most of them fall
    for my little gotcha hook, line, and sinker.

    #include <stdio.h>

    const char * english( int const n )
    { const char * result;
      if( n == 0 )result = "zero";
      if( n == 1 )result = "one";
      if( n == 2 )result = "two";
      if( n == 3 )result = "three";
      else result = "four";
      return result; }

    void print_english( int const n )
    { printf( "%s\n", english( n )); }

    int main( void )
    { print_english( 0 );
      print_english( 1 );
      print_english( 2 );
      print_english( 3 );
      print_english( 4 ); }

    oooh, that's way better at making a point of the hazard than mine was.

    .... almost needed to engage my rubber duckie, before I realized I was
    mentally auto-correcting the 'english()' function while reading it.
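    For the record, the version my brain kept substituting (a proper else
    ladder) would be something like:

    const char * english( int const n )
    { const char * result;
      if( n == 0 )result = "zero";
      else if( n == 1 )result = "one";
      else if( n == 2 )result = "two";
      else if( n == 3 )result = "three";
      else result = "four";
      return result; }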


    --
    |_|O|_|
    |_|_|O| Github: https://github.com/dpurgert
    |O|O|O| PGP: DDAB 23FB 19FA 7D85 1CC1 E067 6D65 70E5 4CE7 2860

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Scott Lurndal@3:633/280.2 to All on Thu Nov 21 00:42:14 2024
    Reply-To: slp53@pacbell.net

    Bart <bc@freeuk.com> writes:
    On 19/11/2024 23:41, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:



    It's funny how nobody seems to care about the speed of compilers (which
    can vary by 100:1), but for the generated programs, the 2:1 speedup you
    might get by optimising it is vital!

    I don't consider it funny at all, rather it is simply the way things
    should be. One compiles once. One's customer runs the resulting
    executable perhaps millions of times.


    Here I might borrow one of your arguments and suggest such a speed-up is
    only necessary on a rare production build.

    And again, you've clearly never worked with any significantly
    large project. Like for instance an operating system.


    I recently installed TinyC on small Risc-V
    machine, I think that available memory (64MB all, about 20MB available
    to user programs) is too small to run gcc or clang.


    Only 20,000KB? My first compilers worked on 64KB systems, not all of
    which was available either.

    My first compilers worked on 4KW PDP-8. Not that I have any
    interest in _ever_ working in such a constrained environment
    ever again.


    None of my recent products will do so now, but they will still fit on a
    floppy disk.

    And, nobody cares.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: UsenetServer - www.usenetserver.com (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Thu Nov 21 00:44:08 2024
    Bart <bc@freeuk.com> wrote:
    On 19/11/2024 23:41, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    BTW I didn't remark on the range of your (WH's) figures. They spanned 40
    minutes for a build to instant, but it's not clear for which languages
    they are, which tools are used and which machines. Or how much work they
    have to do to get those faster times, or what work they don't do: I'm
    guessing it's not processing 0.5M lines for that fastest time.

    As I wrote, there are 2 different systems; if interested you can fetch
    them from github.

    Do you have a link? Probably I won't attempt to build but I can see what
    it looks like.

    I do not think I will use your system language. And for C compiler
    at least currently it does not make big difference to me if your
    compiler can do 1Mloc or 5Mloc on my machine, both are "pretty fast".
    What matters more is support of debugging output, supporting
    targets that I need (like ARM or Risc-V), good diagnostics
    and optimization.

    It's funny how nobody seems to care about the speed of compilers (which
    can vary by 100:1), but for the generated programs, the 2:1 speedup you might get by optimising it is vital!

    Here I might borrow one of your arguments and suggest such a speed-up is only necessary on a rare production build.

    Well, there are some good arguments for using optimizing compilation
    during development:
    - test what will be delivered
    - in gcc important diagnostics like info about uninitialized variables
    are available only when you turn on optimization (see the small sketch
    after this list)
    - with separate compilation compile time usually is acceptable
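    A tiny example of the uninitialized-variable point (hypothetical file
    name; the exact behaviour depends on the gcc version):

    /* uninit.c */
    int f(int n)
    {
        int x;
        if (n > 0)
            x = 2 * n;
        return x;        /* x is uninitialized when n <= 0 */
    }

    With 'gcc -Wall -O2 -c uninit.c' gcc typically reports that 'x' may be
    used uninitialized; with 'gcc -Wall -O0 -c uninit.c' the warning is
    usually absent, because the analysis behind -Wmaybe-uninitialized runs
    as part of the optimization passes.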

    I have some extra factors:
    - the C files on which I am doing development are frequently quite
    small and compile time is reasonable
    - the C code is usually in a slowly changing base part and is recompiled
    only rarely

    I recently installed TinyC on small Risc-V
    machine, I think that available memory (64MB all, about 20MB available
    to user programs) is too small to run gcc or clang.


    Only 20,000KB? My first compilers worked on 64KB systems, not all of
    which was available either.

    I used compilers on a ZX Spectrum, so I know that a compiler is possible
    on such a machine. More to the point, gcc-1.42 worked quite well
    in a 4MB machine; at that time 20MB would be quite big and could support
    several users doing compilation. But porting gcc-1.42 to Risc-V
    is more work than I am willing to do (at least now; I could do this
    if I get an infinite amount of free time).

    None of my recent products will do so now, but they will still fit on a floppy disk.

    BTW why don't you use a cross-compiler? That's what David Brown would say.

    I did use a cross-compiler to compile TinyC. Sometimes a native compiler
    is more convenient: I have non-C code which is hard to cross-build
    and I need to link this code with C code. In cases like this doing
    everything natively is the simplest thing to do (some folks use emulators,
    but when it works a native build is simpler). Second, one reason
    to build natively is to test that the native build works. In the early
    days of Linux I tried a few times to recompile the C library, and my
    attempts failed. Later I learned that at that time the Linux C library
    for i386 was cross-compiled on a Sparc machine. Apparently the native
    build was not tested and tended to fail. The third reason
    to have a native compiler is that machines of this class used to
    come with a C compiler; it was a shame not to have any C compiler
    there, so I got one... To be clear: that was long ago; AFAIK the C
    library is now built natively and IIRC I recompiled it a few times
    (I rarely have reason to do this).

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Thu Nov 21 01:21:35 2024
    On 20/11/2024 13:42, Scott Lurndal wrote:
    Bart <bc@freeuk.com> writes:
    On 19/11/2024 23:41, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:



    It's funny how nobody seems to care about the speed of compilers (which
    can vary by 100:1), but for the generated programs, the 2:1 speedup you
    might get by optimising it is vital!

    I don't consider it funny at all, rather it is simply the way things
    should be. One compiles once.

    Hmm, someone else who develops software, either without needing to
    compile code in order to test it, or they write a 1M-line app and it
    compiles and runs perfectly first time!

    Sounds like some more gaslighting going on: people develop huge
    applications, using slow, cumbersome compilers where max optimisations
    are permanently enabled, and yet they have instant edit-compile-run
    cycles or they apparently don't need to bother with a compiler at all!

    One's customer runs the resulting
    executable perhaps millions of times.

    Sure. That's when you run a production build. I can even do that myself
    on some programs (the ones where my C transpiler still works) and pass
    it through gcc -O3. Then it might run 30% faster.

    However, each of the 1000s of compilations before that point are pretty
    much instant.


    Here I might borrow one of your arguments and suggest such a speed-up is
    only necessary on a rare production build.

    And again, you've clearly never worked with any significantly
    large project. Like for instance an operating system.

    No. And? That's like telling somebody who likes to devise their own
    bicycles that they've never worked on a really large conveyance, like a
    jumbo jet. Unfortunately a bike as big, heavy, expensive and cumbersome
    as an airliner is not really practical.

    Besides, in the 1980s the tools and apps I did write were probably
    larger than the OS. All I can remember is that the OS provided a file
    system and a text display to allow you to launch the application you
    really wanted.

    The funny thing is that it is with large projects that edit-compile-run
    turnaround times become more significant. I've heard horror stories of
    such builds taking minutes or even hours. But everybody here seems to
    have found some magic workaround where compilation times even on -O3
    don't matter at all.


    machine, I think that available memory (64MB all, about 20MB available
    to user programs) is too small to run gcc or clang.


    Only 20,000KB? My first compilers worked on 64KB systems, not all of
    which was available either.

    My first compilers worked on 4KW PDP-8. Not that I have any
    interest in _ever_ working in such a constrained environment
    ever again.

    There could be some lessons to be learned, however, since the amount of
    bloat around now is becoming ridiculous.


    None of my recent products will do so now, but they will still fit on a
    floppy disk.

    And, nobody cares.

    You obviously don't.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Thu Nov 21 01:38:57 2024
    Bart <bc@freeuk.com> wrote:
    On 19/11/2024 22:40, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    It is related: both gcc and LLVM are doing analyses that in the
    past were deemed impractically expensive (both in time and in space).
    Those analyses work now thanks to smart algorithms that
    significantly reduced resource usage. I know that you consider
    this too expensive.

    How long would LLVM take to compile itself on one core? (Here I'm not
    even sure what LLVM is; if you download the binary, it's about 2.5GB,
    but a typical LLVM compiler might be 100+ MB. But I guess it will be a while
    in either case.)

    I do not know, but I would expect some hours. I did compile a not
    so recent gcc version; it was 6.5 min clock time, about 70 min
    CPU time. Recent gcc is bigger and LLVM is of comparable size.

    I have a product now that is like a mini-LLVM backend. It can build into a standalone library of under 0.2MB, which can directly produce EXEs, or it
    can interpret. Building that product from scratch takes 60ms.

    That is my kind of product

    What's the context of this 0.1 seconds? Do you consider it long or short?
    Context is interactive response. It means "pretty fast for interactive
    use".

    It's less than the time to press and release the Enter key.


    My tools can generally build my apps from scratch in 0.1 seconds; big
    compilers tend to take a lot longer. Only Tiny C is in that ballpark.

    So I'm failing to see your point here. Maybe you picked up that 0.1
    seconds from an earlier post of mine and are suggesting I ought to be
    able to do a lot more analysis within that time?

    This 0.1s is an old thing. My point is that if you are compiling a simple
    change, then you should be able to do more in this time. In normal
    development, source files bigger than 10000 lines are relatively
    rare, so once you get into the range of 50000-100000 lines per second
    making the compiler faster is of marginal utility.

    I *AM* doing more in that time! It just happens to be stuff you appear
    to have no interest in:

    * I write whole-program compilers: you always process all source files
    of an application. The faster the compiler, the bigger the scale of app
    it becomes practical on.

    * That means no headaches with dependencies (it goes in hand with a
    decent module scheme)

    * I can change one tiny corner of the program, say add an /optional/ argument to a function, which requires compiling all call-sites across
    the program, and the next compilation will take care of everything

    * If I were to do more with optimisation (there is lots that can be done without getting into the heavy stuff), it automatically applies to the
    whole program

    * I can choose to run applications from source code, without generating discrete binary files, just like a script language

    * I can choose (with my new backend) to interpret programs in this
    static language. (Interpretation gives better debugging opportunities)

    * I don't need to faff around with object files or linkers

    Module-based independent compilation and having to link 'object files'
    is stone-age stuff.

    I am not aware of a computer made from stone (silicon is a product of
    quite advanced metallurgy). And while you have an aversion to object
    files, you wrote that you do independent compilation. Only you
    insist that the result of independent compilation must be a DLL.
    How is this different from folks who compile each module to
    a separate DLL?

    We clearly differ in the question of what is routine. Creating a usable
    executable is a rare task; once an executable is created it can be used
    for a long time. OTOH development is routine and for this one wants
    to know if a change is correct.

    I take it then that you have some other way of doing test runs of a
    program without creating an executable?

    It's difficult to tell from your comments.

    Already a simple thing would be an improvement: make the compiler aware of
    an error routine (if you do not have one, add one) so that when you
    signal an error the compiler will know that there is no need for a normal
    return value.

    OK, but what does that buy me? Saving a few bytes for a return
    instruction in a function? My largest program, which is 0.4MB, already
    only occupies 0.005% of the machine's 8GB.

    What it buys is clear expression of intent, easily checkable by the
    compiler/runtime. That is, when you do not signal an error the compiler
    will complain. And if you hit such a case at runtime due to a bug
    you will have clear info.

    Which is not going to be part of a routine build.

    In a sense a build is not routine. A build is done for two purposes:
    - to install a working system from sources, which includes
    documentation
    - to check that the build works properly after changes; this also
    should check the documentation build.

    Normal development goes without rebuilding the system.

    We must be talking at cross-purposes then.

    Either you're developing using interpreted code, or you must have some
    means of converting source code to native code, but for some reason you don't use 'compile' or 'build' to describe that process.

    Or maybe your REPL/incremental process can run for days doing
    incremental changes without doing a full compile.

    Yes.

    It seems quite mysterious.

    There is nothing mysterious here. In the typed system each module has
    a vector (one dimensional array) called the domain vector, containing among
    other things references to called functions. All inter-module calls are
    indirect ones; they take the thing to call from the domain vector. When
    a module starts execution the references point to a runtime routine doing
    work similar to a dynamic linker. The first call goes to the runtime
    support routine, which finds the needed code and replaces the reference in
    the domain vector.

    When a module is recompiled the references in domain vectors are
    reinitialized to point to the runtime. So the searches are run again
    and, if needed, pick up the new routine.

    Note that there is a global table keeping info (including types)
    about all exported routines from all modules. This table is used
    when compiling a module and also by the search process at runtime.

    The effect is that after recompilation of a single module I have a
    runnable executable in memory including the code of the new module.
    If you wonder about compiling the same module many times: the system
    has a garbage collector and unused code is garbage collected.
    So, when the old version is replaced by the new one the old becomes
    garbage and will be collected in due time.

    The other system is similar in principle, but there is no need
    for runtime search and domain vectors.
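    A C sketch of the mechanism (heavily simplified, all names invented):

    /* each module has a "domain vector" of pointers used for its
       inter-module calls; entries start out pointing at a resolver */
    typedef long (*binfn)(long, long);

    static long resolve_add(long a, long b);          /* forward declaration */

    static binfn domain_vec[] = { resolve_add };

    /* the real routine, found by the runtime search on first use */
    static long real_add(long a, long b) { return a + b; }

    /* stand-in for the search over the global table of exported
       routines: it "finds" real_add and patches the domain vector
       so that later calls go there directly */
    static long resolve_add(long a, long b)
    {
        domain_vec[0] = real_add;
        return real_add(a, b);
    }

    /* all inter-module calls are indirect through the vector;
       recompiling a module just resets its entries to the resolver */
    long caller(long a, long b) { return domain_vec[0](a, b); }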

    I might run my compiler hundreds of times a day (at 0.1 seconds a time,
    600 builds would occupy one whole minute in the day!). I often do it for frivolous purposes, such as trying to get some output lined up just
    right. Or just to make sure something has been recompiled since it's so quick it's hard to tell.


    I know. But this is not what I do. A build produces multiple
    artifacts, some of them executable, some loadable code (but _not_
    in a form recognized by the operating system), some essentially
    non-executable (like documentation).

    So, 'build' means something different to you. I use 'build' just as a
    change from writing 'compile'.

    Build means creating a new fully-functional system. That involves
    possibly multiple compilations and whatever else is needed.

    This sounds like a REPL system. There, each line is a new part of the
    program which is processed, executed and discarded.

    First, I am writing about two different systems. Both have REPL.
    Lines typed at REPL are "discarded", but their effect may last
    long time.

    My last big app used a compiled core but most user-facing functionality
    was done using an add-on script language. This meant I could develop
    such modules from within a working application, which provided a rich, persistent environment.

    Changes to the core program required a rebuild and a restart.

    However the whole thing was an application, not a language.

    Well, the typed system is an application, which however offers an
    extension language, and the majority of the application code is written
    in this language. And this language is compiled, first to Lisp
    and then from Lisp to machine code (some Lisp compilers compile to
    bytecode, some compile via C, but it is best to use a Lisp compiler
    compiling Lisp directly to machine code).

    The second system is four languages + a collection of "standard"
    routines. There is significantly more than just a compiler
    (for example a text editor with the capability to send e-mail),
    but the languages are at the center.

    What happens if you change the type of a global; are you saying that
    none of the program codes needs revising?

    In the typed system there are no global "library" variables; all data
    is encapsulated in modules and normally accessed in an abstract way,
    by calling appropriate functions. So, in "clean" code you
    can recompile a single module and the whole system works.

    I used module-at-a-time compilation until 10-12 years ago. The module
    scheme had to be upgraded at the same time, but it took several goes to
    get it right.

    Now I wouldn't go back. Who cares about compiling a single module that
    may or may not affect a bunch of others? Just compile the lot!

    If a project's scale becomes too big, then it should be split into independent program units, for example a core EXE file and a bunch of
    DLLs; that's the new granularity. Or a lot of functionality can be off-loaded to scripts, as I used to do.

    (My scripting language code still needs bytecode compilation, and I also
    use whole-program units there, but the bytecode compiler goes up to 2Mlps.)

    In both cases the spirit is similar to scripting languages. It is just
    that the languages are compiled to machine code and have features
    supporting large-scale programming.

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Thu Nov 21 01:49:08 2024
    Bart <bc@freeuk.com> wrote:
    On 19/11/2024 23:41, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    BTW I didn't remark on the range of your (WH's) figures. They spanned 40
    minutes for a build to instant, but it's not clear for which languages
    they are, which tools are used and which machines. Or how much work they
    have to do to get those faster times, or what work they don't do: I'm
    guessing it's not processing 0.5M lines for that fastest time.

    As I wrote, there are 2 different systems; if interested you can fetch
    them from github.

    Do you have a link? Probably I won't attempt to build but I can see what
    it looks like.

    Forgot to put links in another message:

    https://github.com/fricas/fricas

    and

    https://github.com/hebisch/poplog


    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Thu Nov 21 03:00:52 2024
    On 19.11.2024 18:31, David Brown wrote:
    [...]

    All I have been arguing against is the idea of blindly putting in
    validity tests for parameters in functions, as though it were a habit
    that by itself leads to fewer bugs in code.

    Fair enough.

    Janis


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Thu Nov 21 03:15:20 2024
    On 20/11/2024 02:33, Bart wrote:
    On 19/11/2024 23:41, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:


    I do not think I will use your system language. And for C compiler
    at least currently it does not make big difference to me if your
    compiler can do 1Mloc or 5Mloc on my machine, both are "pretty fast".
    What matters more is support of debugging output, supporting
    targets that I need (like ARM or Risc-V), good diagnostics
    and optimization.

    It's funny how nobody seems to care about the speed of compilers (which
    can vary by 100:1), but for the generated programs, the 2:1 speedup you might get by optimising it is vital!

    To understand this, you need to understand the benefits of a program
    running quickly. Let's look at the main ones:

    1. If it is a run-to-finish program, it will finish faster, and you have
    less time waiting for it. A compiler will fall into this category.

    2. If it is a run-continuously (or run often) program, it will use a
    smaller proportion of the computer's resources, less electricity, less
    heat generated, less fan noise, etc. That covers things like your email client, or your OS - things running all the time.

    3. If it is a dedicated embedded system, faster programs can mean
    smaller, cheaper, and lower power processors or microcontrollers for the
    given task. That applies to the countless embedded systems that
    surround us (far outweighing the number of "normal" computers), and the devices I make.

    4. For some programs, running faster means you can have higher quality
    in a similar time-frame. That applies to things like simulators, static analysers, automatic test coverage setups, and of course games.

    5. For interactive programs, running faster makes them nicer to use.

    There is usually a point where a program is "fast enough" - going faster
    makes no difference. No one is ever going to care if a compilation
    takes 1 second or 0.1 seconds, for example.


    It doesn't take much thought to realise that for most developers, the
    speed of their compiler is not actually a major concern in comparison to
    the speed of other programs. And for everyone other than developers, it
    is of no concern at all.

    While writing code, and testing and debugging it, a given build might
    only be run a few times, and compile speed is a bit more relevant.
    Generally, however, most programs are run far more often, and for far
    longer, than their compilation time. (If not, then you should most
    likely have used a higher level language instead of a compiled low-level language.) So compile time is relatively speaking of much lower
    priority than the speed of the result.

    I think it's clear that everyone prefers faster rather than slower. But generally, people want /better/ rather than just faster. One of the
    factors of "better" for compilers is that the resulting executable runs faster, and that is certainly worth a very significant cost in compile time.


    And as usual, you miss out the fact that toy compilers - like yours, or
    TinyC - miss all the other features developers want from their tools. I
    want debugging information, static error checking, good diagnostics,
    support for modern language versions (that's primarily C++ rather than
    C), useful extensions, compact code, correct code generation, and most importantly of all, support for the target devices I want. I wouldn't
    care if your compiler can run at a billion lines per second and gcc took
    an hour to compile - I still wouldn't be interested in your compiler
    because it does not generate code for the devices I use. Even if it
    did, it would be useless to me, because I can trust the code gcc
    generates and I cannot trust the code your tool generates. And even if
    your tool did everything else I need, and you could convince me that it
    is something a professional could rely on, I'd still use gcc for the
    better quality generated code, because that translates to money saved
    for my customers.


    BTW why don't you use a cross-compiler? That's what David Brown would say.


    That is almost certainly what he normally does. It can still be fun to
    play around with things like TinyC, even if it is of no practical use
    for the real development.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Thu Nov 21 05:31:53 2024
    David Brown <david.brown@hesbynett.no> wrote:
    On 15/11/2024 19:50, Waldek Hebisch wrote:
    David Brown <david.brown@hesbynett.no> wrote:
    On 11/11/2024 20:09, Waldek Hebisch wrote:
    David Brown <david.brown@hesbynett.no> wrote:

    Concerning the correct place for checks: one could argue that the check
    should be close to the place where the result of the check matters, which
    frequently is in the called function.

    No, there I disagree. The correct place for the checks should be close
    to where the error is, and that is in the /calling/ code. If the called
    function is correctly written, reviewed, tested, documented and
    considered "finished", why would it be appropriate to add extra code to
    that in order to test and debug some completely different part of the code?

    The place where the result of the check /really/ matters, is the calling
    code. And that is also the place where you can most easily find the
    error, since the error is in the calling code, not the called function.
    And it is most likely to be the code that you are working on at the time
    - the called function is already written and tested.

    And frequently the check requires
    computation that is done by the called function as part of normal
    processing, but would be extra code in the caller.


    It is more likely to be the opposite in practice.

    And for much of the time, the called function has no real practical way
    to check the parameters anyway. A function that takes a pointer
    parameter - not an uncommon situation - generally has no way to check
    the validity of the pointer. It can't check that the pointer actually
    points to useful source data or an appropriate place to store data.

    All it can do is check for a null pointer, which is usually a fairly
    useless thing to do (unless the specifications for the function make the
    pointer optional). After all, on most (but not all) systems you already
    have a "free" null pointer check - if the caller code has screwed up and
    passed a null pointer when it should not have done, the program will
    quickly crash when the pointer is used for access. Many compilers
    provide a way to annotate function declarations to say that a pointer
    must not be null, and can then spot at least some such errors at compile
    time. And of course the calling code will very often be passing the
    address of an object in the call - since that can't be null, a check in
    the function is pointless.

    Well, in a sense pointers are easy: if you do not play nasty tricks
    with casts then type checks do a significant part of the checking. Of
    course, a pointer may be uninitialized (but compiler warnings help a lot
    here), memory may be overwritten, etc. But overwritten memory is
    rather special: if you checked that the content of memory is correct,
    but it is overwritten after the check, then the earlier check does not
    help. Anyway, the main point is ensuring that the pointed-to data
    satisfies the expected conditions.


    That does not match reality. Pointers are far and away the biggest
    source of errors in C code. Use after free, buffer overflows, mixups of
    who "owns" the pointer - the scope for errors is boundless. You are
    correct that type systems can catch many potential types of errors - unfortunately, people /do/ play nasty tricks with type checks.
    Conversions of pointer types are found all over the place in C
    programming, especially conversions back and forth with void* pointers.

    Well, I worked with gcc code. gcc has its own garbage collector,
    so there were no ownership troubles or use after free. There was
    some possibility of buffer overflows, but since most data structures
    that I was using were trees or lists it was limited. gcc did use
    casts, but those were mainly between pointers to a union and
    pointers to its variants. Unions had a tag (at the same place in all
    variants), and there were accessor macros which checked that the tag
    corresponds to the expected variant. It certainly took some effort to
    develop the gcc infrastructure; I just benefited from it. Earlier
    versions of gcc did not have the garbage collector (and probably also
    did not have the checking macros).
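    A minimal sketch of that kind of checked access in C (the names are
    made up; the real gcc tree-checking macros are more elaborate):

    #include <assert.h>

    enum node_kind { NK_INT, NK_PAIR };

    struct node {
        enum node_kind kind;
        union {
            long ival;                              /* NK_INT  */
            struct { struct node *car, *cdr; } p;   /* NK_PAIR */
        } u;
    };

    /* accessor macros that check the tag before handing out the variant */
    #define NODE_INT(n)  (assert((n)->kind == NK_INT),  (n)->u.ival)
    #define NODE_PAIR(n) (assert((n)->kind == NK_PAIR), &(n)->u.p)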

    Also, you say that pointers are a source of errors. In the gcc source
    the usual problem was bad semantics, that is, some function did something
    other than it should. This could manifest as a failed tag check
    (IME the most frequent case), a segfault or wrong generated code.
    And the troublesome cases were the wrong code cases.

    My personal codes were much smaller. In one production case
    all allocated memory was put in a linked list and freed in
    bulk at the end of processing. In my embedded code I do not
    use dynamic allocation. In another case C routines are called
    from a garbage-collected language, so most or all pointers are
    "owned" by the garbage-collected language and the C routines should
    not and can not free them. In still other cases pointer
    usage follows a relatively simple design pattern and is not
    a problem.

    You may have more tricky cases than the ones I handle using
    manual memory management and can not (or do not want to) use a garbage
    collector. I do not know how much checking infrastructure
    you have. I simply reported my experience and how I interpret
    it: I may get a segfault, but a segfault itself is a minor
    trouble. In particular many segfaults can be corrected almost
    immediately. The bigger trouble is when the actual problem is a logic
    error. In non-C coding in a garbage-collected language the "pointer
    errors" that you mention go away, but logic errors are still
    there.

    All this means that invalid pointer parameters are very much a real
    issue - but are typically impossible to check in the called function.

    In gcc you could get a pointer to the wrong variant of a union, but the
    called function could detect it by looking at the tag. One could cast
    a pointer to a completely different type, but this would be a gross
    error, which was rare.

    The way you avoid getting errors in your pointers is being careful about having the right data in the first place, so you only call functions
    with valid parameters. You do this by having careful control about the ownership and lifetime of pointers, and what they point to, keeping conventions in the names of your pointers and functions to indicate who
    owns what, and so on. And you use sanitizers and similar tools during testing and debugging to distinguish between tests that worked by luck,
    and ones that worked reliably. (And of course you may consider other languages than C that help you express your requirements in a clearer
    manner or with better automatic checking.)

    Yes, of course.

    Put the same effort and due diligence into the rest of your code, and suddenly you find your checks for other kinds of parameters in functions
    are irrelevant as you are now making sure you call functions with appropriate valid inputs.

    It depends on the domain (also see below).

    Once you get to more complex data structures, the possibility for the
    caller to check the parameters gets steadily less realistic.

    So now your practice of having functions "always" check their parameters
    leaves the people writing calling code with a false sense of security -
    usually you /don't/ check the parameters, you only ever do simple checks
    that the caller could (and should!) do if they were realistic. You've
    got the maintenance and cognitive overload of extra source code for your
    various "asserts" and other checks, regardless of any run-time costs
    (which are often irrelevant, but occasionally very important).


    You will note that much of this - for both sides of the argument - uses
    words like "often", "generally" or "frequently". It is important to
    appreciate that programming spans a very wide range of situations, and I
    don't want to be too categorical about things. I have already said
    there are situations when parameter checking in called functions can
    make sense. I've no doubt that for some people and some types of
    coding, such cases are a lot more common than what I see in my coding.

    Note also that when you can use tools to automate checks, such as
    "sanitize" options in compilers or different languages that have more
    in-built checks, the balance differs. You will generally pay a run-time
    cost for those checks, but you don't have the same kind of source-level
    costs - your code is still clean, clear, and amenable to correctness
    checking, without hiding the functionality of the code in a mass of
    unnecessary explicit checks. This is particularly good for debugging,
    and the run-time costs might not be important. (But if run-time costs
    are not important, there's a good chance that C is not the best language
    to be using in the first place.)

    Our experience differs. As a silly example, consider a parser
    which produces a parse tree. The caller is supposed to pass a
    syntactically correct string as an argument. However, checking
    syntactic correctness requires almost the same effort as producing
    the parse tree, so it is usual that the parser both checks
    correctness and produces the result.
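
    A tiny C sketch of that point, with a made-up grammar (digits
    separated by '+'): the validity check and the evaluation are the same
    traversal, so the parser reports correctness as a by-product of doing
    its real job.

        #include <ctype.h>

        typedef struct { int ok; long value; } parse_result;

        /* expr := number ('+' number)* ; evaluates instead of building a
           tree, purely to keep the sketch short. */
        parse_result parse_expr(const char *s) {
            parse_result r = { 0, 0 };
            if (!isdigit((unsigned char)*s)) return r;       /* syntax error */
            while (isdigit((unsigned char)*s))
                r.value = r.value * 10 + (*s++ - '0');
            while (*s == '+') {
                s++;
                if (!isdigit((unsigned char)*s)) return r;   /* syntax error */
                long term = 0;
                while (isdigit((unsigned char)*s))
                    term = term * 10 + (*s++ - '0');
                r.value += term;
            }
            r.ok = (*s == '\0');                 /* whole input consumed */
            return r;
        }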

    The trick here is to avoid producing a syntactically invalid string in
    the first place. Solve the issue at the point where there is a mistake
    in the code!

    (If you are talking about a string that comes from outside the code in
    some way, then of course you need to check it - and if that is most conveniently done during the rest of parsing, then that is fair enough.)

    Imagine about 1000 modules containing about 15000 functions. The
    modules form a library and any exported function (about 7000) is
    potentially user-accessible. Functions transform data and do not
    know where their arguments came from: the user or another library
    function. Processing in principle is quite well defined, so
    one could formulate validity conditions for inputs and outputs.
    But the conditions do not compose in a simple way. More precisely,
    in many cases when a given function received correct data and is
    doing the right thing, then all functions it calls will receive
    correct arguments. But the trouble is, what if the function is
    wrong? The natural answer, "write correct code", solves nothing.
    Of course, one makes an effort to write correct code, but bugs
    still appear. So, there are internal checks. And the failing
    check frequently is in the called function, because it can
    detect the error. Of course, if detecting the error in the caller
    were easy, the caller would do the check. But frequently
    it is not easy. Look at a partially made-up example.
    We have a mathematical problem that can be transformed into
    solving linear equations. In general, a system of linear
    equations may have no solution. But one may be able to
    prove that the equations coming from a specific problem
    are always solvable. So we write a routine that transforms
    the input into a system of linear equations. The equation solver
    returns information on whether the system is solvable and, in
    the case of a solvable system, also a description of the solutions.
    Taking your advice literally, we would just access the solutions
    (we proved that the system is solvable, so the solutions must be
    there!). But in the system I use and develop, as written,
    one cannot "just access solutions" without first checking
    (explicitly or implicitly) the return value for the possibility
    of no solution. And what happens when there is no solution?
    An implicit check will signal an error, and if the check is explicit
    the only sensible thing to do is also to signal an error.
    My point here is that there is a natural place to put an extra
    check. If the check fails you know that there is a bug
    (or possibly the input data was wrong). And if there is
    a bug, that is the earliest practical place to discover it.

    BTW: While I did not give a complete example, this is a frequent
    approach to solving math problems.
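
    A compressed C sketch of the shape being described (the API is
    hypothetical, only to show where the check lands): the solver reports
    solvability, and the caller's "cannot happen" branch is the natural,
    earliest place to catch a bug.

        #include <assert.h>

        typedef struct {
            int solvable;                  /* 1 if a unique solution exists */
            double x[2];                   /* the solution, when solvable */
        } solve_result;

        /* Solve the 2x2 system a*x = b by Cramer's rule. */
        static solve_result solve2(const double a[2][2], const double b[2]) {
            solve_result r = { 0, { 0.0, 0.0 } };
            double det = a[0][0]*a[1][1] - a[0][1]*a[1][0];
            if (det == 0.0)
                return r;                  /* no unique solution */
            r.solvable = 1;
            r.x[0] = (b[0]*a[1][1] - b[1]*a[0][1]) / det;
            r.x[1] = (a[0][0]*b[1] - a[1][0]*b[0]) / det;
            return r;
        }

        double first_component(const double a[2][2], const double b[2]) {
            solve_result r = solve2(a, b);
            /* We "proved" our systems are always solvable, but the check
               stays: if it ever fires, a bug (or bad input) is detected
               at the earliest practical place. */
            assert(r.solvable);
            return r.x[0];
        }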

    I have computations that are quite different from parsing but
    in some cases share the same characteristic: checking correctness of
    the arguments requires complex computation similar to producing the
    actual result. More frequently, the called routine can check various
    invariants which with high probability can detect errors. Doing
    the same check in the caller is impractical.

    I think you are misunderstanding me - maybe I have been unclear. I am saying that it is the /caller's/ responsibility to make sure that the parameters it passes are correct, not the /callee's/ responsibility.
    That does not mean that the caller has to add checks to get the
    parameters right - it means the caller has to use correct parameters.

    In this sense I agree. It is simply that life shows checks are needed
    and there are frequently natural places to put checks. And frequently
    those natural places are far from the origin of the data.

    Think of this like walking near a cliff-edge. Checking parameters
    before the call is like having a barrier at the edge of the cliff. My recommendation is that you know where the cliff edge is, and don't walk there.

    That is the easy case. The worst problems are the ones where you do not
    know that there is a cliff edge. With a real cliff edge, once you fall
    the trouble will be obvious (either to you or to the people who find you).
    In programming you may be getting wrong results and not know it,
    possibly making the problem worse. I simply advocate early
    detection of troubles.

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Thu Nov 21 07:17:39 2024
    On 20/11/2024 16:15, David Brown wrote:
    On 20/11/2024 02:33, Bart wrote:

    It's funny how nobody seems to care about the speed of compilers
    (which can vary by 100:1), but for the generated programs, the 2:1
    speedup you might get by optimising it is vital!

    To understand this, you need to understand the benefits of a program
    running quickly.

    As I said, people are preoccupied with that for programs in general. But
    when it comes to compilers, it doesn't apply! Clearly, you are implying
    that those benefits don't matter when the program is a compiler.

    Let's look at the main ones:

    <snip>

    OK. I guess you missed the bits here and in another post, where I
    suggested that enabling optimisation is fine for production builds.

    For the routine ones that I do 100s of times a day, where test runs are generally very short, I don't want to hang about waiting for a
    compiler that is taking 30 times longer than necessary for no good reason.


    There is usually a point where a program is "fast enough" - going faster makes no difference. No one is ever going to care if a compilation
    takes 1 second or 0.1 seconds, for example.

    If you look at all the interactions people have with technology, with
    GUI apps, even with mechanical things, a 1 second latency is generally disastrous.

    A one-second delay between pressing a key and seeing a character appear
    on a display, or any other feedback, would drive most people up the wall.
    But 0.1 seconds is perfectly fine.


    It doesn't take much thought to realise that for most developers, the
    speed of their compiler is not actually a major concern in comparison to
    the speed of other programs.

    Most developers are stuck with what there is. Naturally they will make
    the best of it. Usually by finding 100 ways or 100 reasons to avoid
    running the compiler.

    While writing code, and testing and debugging it, a given build might
    only be run a few times, and compile speed is a bit more relevant. Generally, however, most programs are run far more often, and for far longer, than their compilation time.

    Developing code is the critical bit.

    Even when a test run takes a bit longer as you need to set things up,
    when you do need to change something and run it again, you don't want
    any pointless delay.

    Neither do you want to waste /your/ time pandering to a compiler's
    slowness by writing makefiles and defining dependencies. Or even
    splitting things up into tiny modules. I don't want to care about that
    at all. Here's my bunch of source files, just build the damn thing, and
    do it now!

    And as usual, you miss out the fact that toy compilers - like yours, or TinyC - miss all the other features developers want from their tools. I want debugging information, static error checking, good diagnostics,
    support for modern language versions (that's primarily C++ rather than
    C), useful extensions, compact code, correct code generation, and most importantly of all, support for the target devices I want.

    Sure. But then I'm sure you're aware that most scripting languages
    include a compilation stage where source code might be translated to
    bytecode.

    I guess you're OK with that being as fast as possible so that there is
    no noticeable delay. But I also guess that all those features go out the window, yet people don't seem to care in that case.

    My whole-program compilers (even my C one now) can run programs from
    source code just a like a scripting language.

    So a fast, mechanical compiler that does little checking is good in one
    case, but not in another (specifically, anything created by Bart).



    I wouldn't
    care if your compiler can run at a billion lines per second and gcc took
    an hour to compile - I still wouldn't be interested in your compiler
    because it does not generate code for the devices I use. Even if it
    did, it would be useless to me, because I can trust the code gcc
    generates and I cannot trust the code your tool generates.

    Suppose I had a large C source file, mechanically generated via a
    compiler from another language so that it was fully verified.

    It took a fraction of a second to generate it; all that's needed is a mechanical translation to native code. In that case you can keep your
    compiler that takes one hour to do analyses I don't need; I'll take the million line per second one. (A billion lines is not viable, one million
    is.)


    And even if
    your tool did everything else I need, and you could convince me that it
    is something a professional could rely on, I'd still use gcc for the
    better quality generated code, because that translates to money saved
    for my customers.

    Where have I said you should use my compiler? I'm simply making a case
    for the existence of very fast, baseline tools that do the minimum
    necessary with as little effort or footprint as necessary.

    Here's an interesting test: I took sql.c (a 250Kloc sqlite3 test
    program), and compiled it first to NASM-compatible assembly, and then to
    my own assembly code.

    I compiled the latter with my assembler and it took 1/6th of a second
    (for some 0.3M lines).

    How long do you think NASM took? It was nearly 8 minutes. Or a blazing
    5 minutes if you used -O0 (do only one pass).

    No doubt you will argue that NASM is superior to my product, although
    I'm not sure how much deep analysis you can do of assembly code. And you
    will castigate me for giving it over-large inputs. However that is the
    task that needs to be done here.

    It clearly has a bug, but if I hadn't mentioned it, I'd like to have
    known how sycophantic you would have been towards that product just to
    be able to belittle mine.

    The NASM bug only starts to become obvious above 20Kloc or so. I wonder
    how many more subtle bugs exist in big products that result in
    significantly slower performance, but are not picked up because people
    like you /don't care/. You will just buy a faster machine or chop your application up into even smaller bits.



    BTW why don't you use a cross-compiler? That's what David Brown would
    say.


    That is almost certainly what he normally does. It can still be fun to
    play around with things like TinyC, even if it is of no practical use
    for the real development.




    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Thu Nov 21 10:29:44 2024
    On 20/11/2024 14:38, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    Either you're developing using interpreted code, or you must have some
    means of converting source code to native code, but for some reason you
    don't use 'compile' or 'build' to describe that process.

    Or maybe your REPL/incremental process can run for days doing
    incremental changes without doing a full compile.

    Yes.

    It seems quite mysterious.

    There is nothing mysterious here. In the typed system each module has
    a vector (one-dimensional array) called the domain vector, containing
    among other things references to called functions. All inter-module
    calls are indirect ones; they take the thing to call from the domain
    vector. When a module starts execution the references point to a
    runtime routine doing work similar to a dynamic linker. The first
    call goes to the runtime support routine, which finds the needed code
    and replaces the reference in the domain vector.

    When a module is recompiled, references in domain vectors are
    reinitialized to point to the runtime. So the searches are run again
    and, if needed, pick up the new routine.

    Note that there is a global table keeping info (including types)
    about all exported routines from all modules. This table is used
    when compiling a module and also by the search process at runtime.

    The effect is that after recompilation of a single module I have a
    runnable executable in memory including the code of the new module.
    If you wonder about compiling the same module many times: the system
    has a garbage collector and unused code is garbage collected.
    So, when the old version is replaced by a new one, the old becomes
    garbage and will be collected in due time.
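
    A minimal C sketch of that mechanism (the names are invented; the
    real system is of course far more elaborate): calls go through a
    table of pointers that initially point at a resolver, which patches
    in the current version of the routine on first use.

        #include <stdio.h>

        typedef int (*binop)(int, int);

        static int resolve_add(int a, int b);      /* forward declaration */

        /* Slot 0 of this module's "domain vector"; starts at the resolver. */
        static binop domain[1] = { resolve_add };

        static int real_add(int a, int b) { return a + b; }

        /* The first call lands here: look up the current routine
           (hard-wired in this sketch), patch the vector, then complete
           the original call. */
        static int resolve_add(int a, int b) {
            domain[0] = real_add;
            return domain[0](a, b);
        }

        int main(void) {
            printf("%d\n", domain[0](1, 2));  /* resolves, then calls real_add */
            printf("%d\n", domain[0](3, 4));  /* plain indirect call from now on */
            /* Recompiling a module would reset domain[0] to resolve_add,
               so the next call picks up the new code. */
            return 0;
        }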

    This sounds an intriguing kind of system to implement.

    That is, where program source, code and data structures are kept
    resident, individual functions and variables can be changed, and any
    other functions that might be affected are recompiled, but no others.

    This has some similarities to what I was doing in the 1990s with
    hot-loadable and -modifiable scripts. So a lot more dynamic than the
    stuff I do now.

    The problem is that my current applications are simply too small for it
    to be worth the complexity. Most of them build 100% from scratch in
    under 0.1 seconds, especially if working within a resident application
    (my timings include Windows process start/end overheads.)

    If I was routinely working with programs that were 10 times the scale
    (so needing to wait 0.5 to 1 seconds), then it might be something I'd consider. Or I might just buy a faster machine; my current PC was pretty
    much the cheapest in the shop in 2021.

    The other system is similar in principle, but there is no need
    for runtime search and domain vectors.

    I might run my compiler hundreds of times a day (at 0.1 seconds a time,
    600 builds would occupy one whole minute in the day!). I often do it for
    frivolous purposes, such as trying to get some output lined up just
    right. Or just to make sure something has been recompiled since it's so
    quick it's hard to tell.


    I know. But this is not what I do. A build produces multiple
    artifacts, some of them executable, some loadable code (but _not_
    in a form recognized by the operating system), some essentially
    non-executable (like documentation).

    So, 'build' means something different to you. I use 'build' just as a
    change from writing 'compile'.

    Build means creating a new fully-functional system. That involves
    possibly multiple compilations and whatever else is needed.

    I would call that something else, perhaps based around 'Make' (nothing
    to do with Linux 'make' tools).

    Here is the result of such a process for one of my 1999 apps:

    G:\m7>dir
    10/03/1999 00:57 45,056 M7.DAT
    17/10/2002 19:22 370,288 M7.EXE
    11/10/2021 21:05 7,432 M7.INI
    17/10/2002 19:27 705,376 M7.PCA
    10/03/1999 00:59 8,541 M7CMD.INI

    The PCA file contains a few dozen scripts (at that time, they were
    compiled to bytecode). This was a distribution layout, created by a batch
    file, and ending up on a floppy, or later FTP-ed to a web-site.

    This is not the routine building of either the M7.EXE program unit, or
    those scripts, which are compiled independently from inside M7.EXE.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Fri Nov 22 00:00:04 2024
    On 20/11/2024 21:17, Bart wrote:
    On 20/11/2024 16:15, David Brown wrote:
    On 20/11/2024 02:33, Bart wrote:

    It's funny how nobody seems to care about the speed of compilers
    (which can vary by 100:1), but for the generated programs, the 2:1
    speedup you might get by optimising it is vital!

    To understand this, you need to understand the benefits of a program
    running quickly.

    As I said, people are preoccupied with that for programs in general. But when it comes to compilers, it doesn't apply! Clearly, you are implying
    that those benefits don't matter when the program is a compiler.

    No - you are still stuck with your preconceived ideas, rather than ever bothering to read and think.

    As I have said many times, people will always be happier if their
    compiler runs faster - as long as that does not happen at the cost of
    the functionality and features.

    Thus I expect that whoever compiles the gcc binaries that I use
    (occasionally that is myself, but like every other programmer I usually
    use pre-built compilers), uses a good compiler with solid optimisation
    enabled when building the compiler. And I expect that the gcc (and clang/llvm) developers put effort into making their tools fast - but
    that they prioritise correctness first, then features, and only then
    look at the speed of the tools and their memory usage. (And I don't
    expect disk space to be of the remotest concern to them.)



    Let's look at the main ones:

    <snip>

    OK. I guess you missed the bits here and in another post, where I
    suggested that enabling optimisation is fine for production builds.


    I saw it. But maybe you missed the bit when the discussion was about
    serious software developers. Waldek explained, and I've covered it
    countless times in the past, but since you didn't pay attention then,
    there is little point in repeating it now.

    For the routine ones that I do 100s of times a day, where test runs are generally very short, then I don't want to hang about waiting for a
    compiler that is taking 30 times longer than necessary for no good reason.


    Your development process sounds bad in so many ways it is hard to know
    where to start. I think perhaps the foundation is that you taught
    yourself a bit of programming in the 1970's, convinced yourself at the
    time that you were better at software development than anyone else, and
    have been stuck in that mode and the same methodology for the last 50
    years without ever considering that you could learn something new from
    other people.


    There is usually a point where a program is "fast enough" - going
    faster makes no difference. No one is ever going to care if a
    compilation takes 1 second or 0.1 seconds, for example.

    If you look at all the interactions people have with technology, with
    GUI apps, even with mechanical things, a 1 second latency is generally disastrous.

    A one-second delay between pressing a key and seeing a character appear
    on a display, or any other feedback, would drive most people up the wall.
    But 0.1 seconds is perfectly fine.


    As I said, no one is ever going to care if a compilation takes 1 second
    or 0.1 seconds.


    It doesn't take much thought to realise that for most developers, the
    speed of their compiler is not actually a major concern in comparison
    to the speed of other programs.

    Most developers are stuck with what there is. Naturally they will make
    the best of it. Usually by finding 100 ways or 100 reasons to avoid
    running the compiler.


    So your advice is that developers should be stuck with what they have -
    the imaginary compilers from your nightmares that take hours to run -
    and that they should make a point of always running them as often as
    possible? And presumably you also advise doing so on a bargain basement single-core computer from at least 15 years ago?

    People who do software development seriously are like anyone else who
    does something seriously - they want the best tools for the job, within budget. And if they are being paid for the task, their employer will
    expect efficiency in return for the budget.

    Which do you think an employer (or amateur programmer) would prefer?

    a) A compiler that runs in 0.1 seconds with little static checking
    b) A compiler that runs in 10 seconds but spots errors saving 6 hours debugging time


    Developers don't want to waste time unnecessarily. Good build tools
    mean you get all the benefits of good compilers, without wasting time re-doing the same compilations when nothing has changed.

    I can't understand why you think that's a bad thing - what is the point
    of re-doing a build step when nothing has changed? And a build tool
    file is also the place to hold the details of how to do the build -
    compiler versions, flags, list of sources, varieties of output files, additional pre- or post-processing actions, and so on. I couldn't
    imagine working with anything beyond a "hello, world" without a build tool.

    While writing code, and testing and debugging it, a given build might
    only be run a few times, and compile speed is a bit more relevant.
    Generally, however, most programs are run far more often, and for far
    longer, than their compilation time.

    Developing code is the critical bit.


    Yes.

    I might spend an hour or two writing code (including planning,
    organising, reading references, etc.) and then 5 seconds building it.
    Then there might be anything from a few minutes to a few hours testing
    or debugging. How could that process be improved by a faster compile?
    Even for the most intense code-compile-debug cycles, building rarely
    takes longer than stretching my fingers or taking a mouthful of coffee.

    But using a good compiler saves a substantial amount of developer time
    because I can write better code with a better structure, I can rely on
    the optimisation it does (instead of "hand-optimising" code to get the efficiency I need), and good static checking and good diagnostic
    messages help me fix mistakes before test and debug cycles.

    Even when a test run takes a bit longer as you need to set things up,
    when you do need to change something and run it again, you don't want
    any pointless delay.

    Neither do you want to waste /your/ time pandering to a compiler's
    slowness by writing makefiles and defining dependencies.

    That is not what "make" is for. Speed is a convenient by-product of
    good project management and build tools.

    Or even
    splitting things up into tiny modules.

    Speed is not the reason people write modular, structured code.

    I don't want to care about that
    at all. Here's my bunch of source files, just build the damn thing, and
    do it now!


    You apparently don't want to care about anything much.


    <snip the rest to save time>


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Fri Nov 22 02:20:22 2024
    On 21/11/2024 13:00, David Brown wrote:
    On 20/11/2024 21:17, Bart wrote:

    For the routine ones that I do 100s of times a day, where test runs
    are generally very short, then I don't want to hang about waiting for
    a compiler that is taking 30 times longer than necessary for no good
    reason.


    Your development process sounds bad in so many ways it is hard to know
    where to start. I think perhaps the foundation is that you taught
    yourself a bit of programming in the 1970's,

    1970s builds, especially on mainframes, were dominated by link times.
    You also had to keep an eye on resources (eg. allocated CPU time), as
    they were limited on time-shared systems.

    Above all, you could only do active work from a terminal that you first
    had to book, for one-hour slots.

    I'm surprised you think that my tools and working practices have any connection with the above.

    I've also eliminated linkers; you apparently still use them.

    As I said, no one is ever going to care if a compilation takes 1 second
    or 0.1 seconds.

    And yet, considerable effort IS placed on getting development tools to
    run fast:

    * Presumably, optimisation is applied to a compiler to get it faster
    than otherwise. But why bother if the difference is only a second or so?

    * Tools can now do builds in parallel across multiple cores. Again, why?
    So that 1 second becomes 20 lots of 50ms? Or would that 1 second really
    have been 20 seconds without that feature?

    * People are developing new kinds of linkers (I think there was 'gold',
    and now something else) which are touted as being several times faster
    than traditional ones.

    * All sorts of make and other files are used to define dependency graphs between program modules. Why? Presumably to minimise time spent recompiling.

    * There are various JIT compilation schemes where a rough version of an application can get up and running quickly, with 'hot' functions
    compiled and optimised on demand. Again, why?

    If people really don't care about compilation speed, why this vast effort?

    Getting development tools faster is an active field, and everyone
    benefits including you, but when I do it, it's a pointless waste of time?

    As I said, no one is ever going to care if a compilation takes 1 second
    or 0.1 seconds.

    Have you asked? You must use interactive tools like shells; I guess you wouldn't want a pointless one second delay after each command, which you
    KNOW doesn't warrant such a delay.

    That would surely slow you down if you're used to fluently firing off a rapid sequence of commands.

    The problem is that you don't view use of a compiler as just another interactive command.

    As I said, no one is ever going to care if a compilation takes 1 second
    or 0.1 seconds.


    Here's an actual use-case: I have a transpiler that produces a
    single-file C output of 40K lines. Tiny C can build it in 0.2 seconds.
    gcc -O0 takes 2.2 seconds. However there's no point in using gcc, as the generated code is as poor as Tiny C, so I might as well use that.

    But if I want faster code, gcc -O2 takes 11 seconds.

    For lots of routine builds used for testing, passing the intermediate C through gcc -O2 makes no sense at all. It is just a waste of time,
    destroys my train of thought, and is very frustrating.

    However, if you ran the world, then tools like gcc and its ilk would be
    the only choice!

    So your advice is that developers should be stuck

    I'm saying that most developers don't write their own tools. They will
    use off-the-shelf language implementations. If those happen to be slow,
    then there's little they can do except work within those limitations.

    Or just twiddle their thumbs.




    Which do you think an employer (or amateur programmer) would prefer?

    a) A compiler that runs in 0.1 seconds with little static checking
    b) A compiler that runs in 10 seconds but spots errors saving 6 hours debugging time

    You can have both. You can run a slow compiler that might pick up those errors.

    But sometimes you make a trivial mod (eg. change a prompt); do you
    REALLY need that deep analysis all over again? Do you still need it fully optimised?

    If your answer is YES to both then there's little point in further
    discussion.



    I might spend an hour or two writing code (including planning,
    organising, reading references, etc.) and then 5 seconds building it.
    Then there might be anything from a few minutes to a few hours testing
    or debugging.

    Up to a few hours testing and debugging without needing to rebuild? The
    last time I had to do that, it was a program written on punched cards
    that was submitted as an overnight job. You could compile it only once a
    day.

    And you're accusing ME of being stuck in the 70s!

    But using a good compiler saves a substantial amount of developer time

    A better language too.



    <snip the rest to save time>

    So you snipped my comments about fast bytecode compilers which do zero analysis being perfectly acceptable for scripting languages.

    And my remark about my language edging towards behaving as a scripting language.

    I can see why you wouldn't want to respond to that.

    BTW I'm doing the same with C; given this program:

    int main(void) {
        int a;
        int* p = 0;
        a = *p;
    }

    Here's what happens with my C compiler when told to interpret it:

    c:\cx>cc -i c
    Compiling c.c to c.(int)
    Error: Null ptr access

    Here's what happens with gcc:

    c:\cx>gcc c.c
    c:\cx>a
    <crashes>

    Is there some option to insert such a check with gcc? I've no idea; most people don't.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Scott Lurndal@3:633/280.2 to All on Fri Nov 22 02:50:54 2024
    Reply-To: slp53@pacbell.net

    Bart <bc@freeuk.com> writes:
    On 21/11/2024 13:00, David Brown wrote:
    On 20/11/2024 21:17, Bart wrote:

    For the routine ones that I do 100s of times a day, where test runs
    are generally very short, then I don't want to hang about waiting for
    a compiler that is taking 30 times longer than necessary for no good
    reason.


    Your development process sounds bad in so many ways it is hard to know
    where to start. I think perhaps the foundation is that you taught
    yourself a bit of programming in the 1970's,

    1970s builds, especially on mainframes, were dominated by link times.

    Which mainframe do you have experience on?

    I spent a decade writing a mainframe operating system (the largest
    application we had to compile regularly) and the link time was a
    minor fraction of the overall build time.

    It was so minor that our build system stored the object files
    so that the OS engineers only needed to recompile the object
    associated with the source file being modified rather than
    the entire OS, they'd share the rest of the object files
    with the entire OS team.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: UsenetServer - www.usenetserver.com (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Fri Nov 22 03:05:58 2024
    On 21/11/2024 15:50, Scott Lurndal wrote:
    Bart <bc@freeuk.com> writes:
    On 21/11/2024 13:00, David Brown wrote:
    On 20/11/2024 21:17, Bart wrote:

    For the routine ones that I do 100s of times a day, where test runs
    are generally very short, then I don't want to hang about waiting for
    a compiler that is taking 30 times longer than necessary for no good
    reason.


    Your development process sounds bad in so many ways it is hard to know
    where to start. I think perhaps the foundation is that you taught
    yourself a bit of programming in the 1970's,

    1970s builds, especially on mainframes, were dominated by link times.

    Which mainframe do you have experience on?

    I spent a decade writing a mainframe operating system (the largest application we had to compile regularly) and the link time was a
    minor fraction of the overall build time.

    It was so minor that our build system stored the object files
    so that the OS engineers only needed to recompile the object
    associated with the source file being modified rather than
    the entire OS, they'd share the rest of the object files
    with the entire OS team.


    The one I remember most was 'TKB' I think it was, running on ICL 4/72
    (360 clone). It took up most of the memory. It was used to link my small Fortran programs.

    But linking always seems to have been a big deal in that era, until I had
    to write one for microcomputers, then it was a simple case of loading N
    object files and combining them into one COM file. It was as fast as
    they could be loaded off a floppy.

    (Given that the largest COM might have been a few 10s of KB, and floppy transfer time was some 20KB/s once a sector was located, it wouldn't
    have been long.)

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Scott Lurndal@3:633/280.2 to All on Fri Nov 22 03:10:38 2024
    Reply-To: slp53@pacbell.net

    Bart <bc@freeuk.com> writes:
    On 21/11/2024 15:50, Scott Lurndal wrote:
    Bart <bc@freeuk.com> writes:
    On 21/11/2024 13:00, David Brown wrote:
    On 20/11/2024 21:17, Bart wrote:

    For the routine ones that I do 100s of times a day, where test runs are generally very short, then I don't want to hang about waiting for a compiler that is taking 30 times longer than necessary for no good reason.


    Your development process sounds bad in so many ways it is hard to know where to start. I think perhaps the foundation is that you taught
    yourself a bit of programming in the 1970's,

    1970s builds, especially on mainframes, were dominated by link times.

    Which mainframe do you have experience on?

    I spent a decade writing a mainframe operating system (the largest
    application we had to compile regularly) and the link time was a
    minor fraction of the overall build time.

    It was so minor that our build system stored the object files
    so that the OS engineers only needed to recompile the object
    associated with the source file being modified rather than
    the entire OS, they'd share the rest of the object files
    with the entire OS team.


    The one I remember most was 'TKB' I think it was, running on ICL 4/72
    (360 clone). It took up most of the memory. It was used to link my small Fortran programs.

    So you generalize from your one non-standard experience to the entire ecosystem.

    Typical Bart.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: UsenetServer - www.usenetserver.com (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Fri Nov 22 04:22:31 2024
    On 21/11/2024 16:10, Scott Lurndal wrote:
    Bart <bc@freeuk.com> writes:
    On 21/11/2024 15:50, Scott Lurndal wrote:
    Bart <bc@freeuk.com> writes:
    On 21/11/2024 13:00, David Brown wrote:
    On 20/11/2024 21:17, Bart wrote:

    For the routine ones that I do 100s of times a day, where test runs are generally very short, then I don't want to hang about waiting for a compiler that is taking 30 times longer than necessary for no good reason.


    Your development process sounds bad in so many ways it is hard to know where to start. I think perhaps the foundation is that you taught
    yourself a bit of programming in the 1970's,

    1970s builds, especially on mainframes, were dominated by link times.

    Which mainframe do you have experience on?

    I spent a decade writing a mainframe operating system (the largest
    application we had to compile regularly) and the link time was a
    minor fraction of the overall build time.

    It was so minor that our build system stored the object files
    so that the OS engineers only needed to recompile the object
    associated with the source file being modified rather than
    the entire OS, they'd share the rest of the object files
    with the entire OS team.


    The one I remember most was 'TKB' I think it was, running on ICL 4/72
    (360 clone). It took up most of the memory. It was used to link my small
    Fortran programs.

    So you generalize from your one non-standard experience to the entire ecosystem.

    Typical Bart.


    Typical Scott. Did you post just to do a bit of bart-bashing?

    Have you also considered that your experience of building operating
    systems might itself be non-standard?

    People quite likely used those machines to develop other applications
    than OSes. Then the dynamics could have been different.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Scott Lurndal@3:633/280.2 to All on Fri Nov 22 04:55:01 2024
    Reply-To: slp53@pacbell.net

    Bart <bc@freeuk.com> writes:
    On 21/11/2024 16:10, Scott Lurndal wrote:
    Bart <bc@freeuk.com> writes:
    On 21/11/2024 15:50, Scott Lurndal wrote:
    Bart <bc@freeuk.com> writes:
    On 21/11/2024 13:00, David Brown wrote:
    On 20/11/2024 21:17, Bart wrote:

    For the routine ones that I do 100s of times a day, where test runs are generally very short, then I don't want to hang about waiting for a compiler that is taking 30 times longer than necessary for no good reason.


    Your development process sounds bad in so many ways it is hard to know where to start. I think perhaps the foundation is that you taught yourself a bit of programming in the 1970's,

    1970s builds, especially on mainframes, were dominated by link times.
    Which mainframe do you have experience on?

    I spent a decade writing a mainframe operating system (the largest
    application we had to compile regularly) and the link time was a
    minor fraction of the overall build time.

    It was so minor that our build system stored the object files
    so that the OS engineers only needed to recompile the object
    associated with the source file being modified rather than
    the entire OS, they'd share the rest of the object files
    with the entire OS team.


    The one I remember most was 'TKB' I think it was, running on ICL 4/72
    (360 clone). It took up most of the memory. It was used to link my small Fortran programs.

    So you generalize from your one non-standard experience to the entire ecosystem.

    Typical Bart.


    Typical Scott. Did you post just to do a bit of bart-bashing?

    Have you also considered that your experience of building operating
    systems might itself be non-standard?

    We had a few thousand customers building code using the same
    compilers and, when needed, linkers.

    The vast majority used COBOL, which seldom required an
    explicit link step.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: UsenetServer - www.usenetserver.com (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Fri Nov 22 12:09:03 2024
    On 10/11/2024 06:00, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    ...or to just always require 'else', with a dummy value if necessary?

    Well, frequently it is easier to do bad job, than a good one.

    I assume that you consider the simple solution the 'bad' one?

    You wrote about _always_ requiring 'else' regardless if it is
    needed or not. Yes, I consider this bad.


    I tried the earlier C example in Rust:

    fn fred(n:i32)->i32 {
        if n==1 {return 10;}
        if n==2 {return 20;}
    }

    I get this error:

    Error(s):
    error[E0317]: if may be missing an else clause
    --> 1022687238/source.rs:5:5
    |
    3 | fn fred(n:i32)->i32 {
    | --- expected `i32` because of this return type
    4 | if n==1 {return 10;}
    5 | if n==2 {return 20;}
    | ^^^^^^^^^^^^^^^^^^^^ expected i32, found ()
    |
    = note: expected type `i32`
    found type `()`
    = note: `if` expressions without `else` evaluate to `()`
    = help: consider adding an `else` block that evaluates to the
    expected type

    error: aborting due to previous error

    So Rust here is behaving exactly the same as my language (mine just says
    'else needed').

    Rust is generally a well-regarded and well-designed language. It also
    has clear and helpful error messages.

    Presumably you would regard this as 'bad' too.

    In this case the behaviour is not the easy solution, as Rust compilers
    are even slower and more complex than big C compilers. It is just a
    language choice.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Fri Nov 22 22:05:02 2024
    On 21/11/2024 13:00, David Brown wrote:
    On 20/11/2024 21:17, Bart wrote:

    Your development process sounds bad in so many ways it is hard to know
    where to start. I think perhaps the foundation is that you taught
    yourself a bit of programming in the 1970's,

    I did a CS degree actually. I also spent a year programming, working for
    the ARC and SRC (UK research councils).

    But since you are being so condescending, I think /your/ problem is in
    having to use C. I briefly mentioned that a 'better language' can help.

    While I don't claim that my language is particularly safe, mine is
    somewhat safer than C in its type system, and far less error prone in
    its syntax and its overall design (for example, a function's details are always defined in exactly one place, so less maintenance and fewer
    things to get wrong).

    So, half the options in your C compilers are to help get around those shortcomings.

    You also seem proud that in this example:

    int F(int n) {
        if (n==1) return 10;
        if (n==2) return 20;
    }

    You can use 'unreachable()', a new C feature, to silence compiler
    messages about running into the end of the function, something I
    considered a complete hack.
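
    A hedged sketch of how that looks (assuming a C23 toolchain, where
    unreachable() comes from <stddef.h>):

        #include <stddef.h>

        int F(int n) {
            if (n == 1) return 10;
            if (n == 2) return 20;
            unreachable();   /* promise to the compiler: control never gets
                                here, and undefined behaviour if it ever does */
        }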

    My language requires a valid return value from the last statement. In
    that it's similar to the Rust example I posted 9 hours ago.

    Yet the gaslighting here suggested what I chose to do was completely wrong.

    And presumably you also advise doing so on a bargain basement
    single-core computer from at least 15 years ago?

    Another example of you acknowledging that compilation speed can be a
    problem. So a brute force approach to speed is what counts for you.

    If you found that it took several hours to drive 20 miles from A to B,
    your answer would be to buy a car that goes at 300mph, rather than cutting out the endless detours along the way.

    Or another option is to think about each journey extremely carefully,
    and then only do the trip once a week!



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Fri Nov 22 23:33:29 2024
    Bart <bc@freeuk.com> wrote:

    Sure. That's when you run a production build. I can even do that myself
    on some programs (the ones where my C transpiler still works) and pass
    it through gcc-O3. Then it might run 30% faster.

    On a fast machine running Dhrystone 2.2a I get:

    tcc-0.9.28rc 20000000
    gcc-12.2 -O 64184852
    gcc-12.2 -O2 83194672
    clang-14 -O 83194672
    clang-14 -O2 85763288

    so with -O2 this is more than 4 times faster. Dhrystone correlates
    reasonably with the runtime of tight compute-intensive programs.
    Compilers started to cheat on the original Dhrystone, so there are
    bigger benchmarks like SPEC INT. But Dhrystone 2 has modifications
    to make cheating harder, so I think it is still a reasonable
    benchmark. Actually, the difference may be much bigger; for example
    in image processing both clang and gcc can use vector instructions,
    which may give an additional speedup of order 8-16.

    30% above means that you are much better than tcc or your program
    is behaving badly (I have programs that make intensive use of
    memory; there the effect of optimization would be smaller, but still
    of order 2).

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Fri Nov 22 23:51:27 2024
    Bart <bc@freeuk.com> wrote:

    int main(void) {
    int a;
    int* p = 0;
    a = *p;
    }

    Here's what happens with my C compiler when told to interpret it:

    c:\cx>cc -i c
    Compiling c.c to c.(int)
    Error: Null ptr access

    Here's what happens with gcc:

    c:\cx>gcc c.c
    c:\cx>a
    <crashes>

    Is there some option to insert such a check with gcc? I've no idea; most people don't.

    I would do

    gcc -g c.c
    gdb a.out
    run

    and gdb would show me the place with the bad access. Things like
    bounds-checking array accesses or overflow checking make a big
    difference. Null pointer access is reliably detected by hardware, so
    it is no big deal. Say what your 'cc' will do with the following function:

    int
    foo(int n) {
        int a[10];
        int i;
        int res = 0;
        for(i = 0; i <= 10; i++) {
            a[i] = n + i;
        }
        for(i = 0; i <= 10; i++) {
            res += a[i];
        }
        return res;
    }

    Here gcc at compile time says:

    foo.c: In function ‘foo’:
    foo.c:15:17: warning: iteration 10 invokes undefined behavior [-Waggressive-loop-optimizations]
    15 | res += a[i];
    | ~^~~
    foo.c:14:18: note: within this loop
    14 | for(i = 0; i <= 10; i++) {
    | ~~^~~~~

    Of course, there are also cases like

    void
    bar(int n, int a[n]) {
        int i;
        for(i = 0; i <= n; i++) {
            a[i] = i;
        }
    }

    which are really wrong, but IIUC the C standard considers them OK.
    Still, a good compiler should have an option to flag them either
    at compile time or at runtime.
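
    As a hedged aside (the exact flag set is an assumption about a
    reasonably recent gcc or clang, and is not something either enables by
    default), both compilers can insert this kind of runtime check via the
    sanitizers, for a program containing such functions plus a main:

        gcc -g -fsanitize=address,undefined prog.c
        ./a.out

    With that instrumentation, out-of-bounds accesses like the ones in
    foo()/bar() and null-pointer dereferences can be reported at run time
    with a message pointing at the offending source line, at the cost of a
    slower and larger executable.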

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Sat Nov 23 01:11:51 2024
    On 22/11/2024 12:51, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    int main(void) {
    int a;
    int* p = 0;
    a = *p;
    }

    Here's what happens with my C compiler when told to interpret it:

    c:\cx>cc -i c
    Compiling c.c to c.(int)
    Error: Null ptr access

    Here's what happens with gcc:

    c:\cx>gcc c.c
    c:\cx>a
    <crashes>

    Is there some option to insert such a check with gcc? I've no idea; most
    people don't.

    I would do

    gcc -g c.c
    gdb a.out
    run

    and gdb would show me place with bad access. Things like bound
    checking array access or overflow checking makes a big difference.
    Null pointer access is reliably detected by hardware so no big
    deal. Say what you 'cc' will do with the following function:

    int
    foo(int n) {
    int a[10];
    int i;
    int res = 0;
    for(i = 0; i <= 10; i++) {
    a[i] = n + i;
    }
    for(i = 0; i <= 10; i++) {
    res += a[i];
    }
    res;
    }

    Here gcc at compile time says:

    foo.c: In function ‘foo’:
    foo.c:15:17: warning: iteration 10 invokes undefined behavior [-Waggressive-loop-optimizations]
    15 | res += a[i];
    | ~^~~
    foo.c:14:18: note: within this loop
    14 | for(i = 0; i <= 10; i++) {
    | ~~^~~~~

    My 'cc -i' wouldn't detect it. The -i tells it to run an interpreter on
    the intermediate code. Within the interpreter, some things are easily
    checked, but bounds info on arrays doesn't exist. (The IL supports only pointer operations, not high level array ops.)

    That would need intervention at an earlier stage, but even then, the
    design of C makes that difficult. First, because array types like
    int[10] decay to simple pointers, and ones represented by types like
    int* don't have bounds info at all. (I don't support int[n] params and
    few people use them anyway.)
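
    A small C illustration of that decay (standard behaviour, nothing
    implementation-specific): the bound is visible at the call site but
    gone inside the callee.

        #include <stdio.h>

        void f(int *p) {                     /* sees only int*, no length */
            printf("%zu\n", sizeof *p);      /* size of one int: the bound is gone */
        }

        int main(void) {
            int a[10];
            printf("%zu\n", sizeof a);       /* 10 * sizeof(int): bound known here */
            f(a);                            /* decays to &a[0]; callee cannot recover 10 */
            return 0;
        }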

    In my static language, it would be a little easier because an int[10]
    type doesn't decay; the info persists. C's int* would be ref[]int, still unbounded so has the same problem.

    However the language also allows slices, array pointers that include a
    length, so those can be used for bounds checking. But then, it's not
    really needed in that case, since you tend to write loops like this:

    func foo(slice[]int a)int =
    for x in a do # iterate over values
    ....
    for i in a.bounds do # iterate over bounds
    ....

    Apart from that, I have a higher level, interpreted language that does
    full bounds checking, so algorithms can be tested with that and then ported
    to the static language, a task made simpler by them using the same
    syntax. I just need to add type annotations.

    Getting back to 'cc -i', if I apply it to the program here, it gives an
    error:

    c:\cx>type c.c
    #include <stdio.h>

    int fred() {}

    int main(void) {
        printf("%d\n", fred());
    }

    c:\cx>cc -i c
    Compiling c.c to c.(int)
    PCL Exec error: RETF/SP mismatch: old=2 curr=1 seq: 7

    If I try it with gcc, then nothing much happens:

    c:\cx>gcc c.c
    c:\cx>a
    1

    If optimised, it shows 0 instead of 1, both meaningless values. It's a
    good thing the function wasn't called 'launchmissile()'.

    Trying it with my language:

    c:\mx>type t.m
    func fred:int =
    end

    proc main =
    println fred()
    end

    c:\mx>mm -i t
    Compiling t.m to t.(int)
    TX Type Error:
    ....
    Void expression/return value missing

    It won't compile it, and without needing to figure out which obscure set
    of options is needed to give a hard error.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Sat Nov 23 01:19:05 2024
    On Fri, 22 Nov 2024 12:33:29 -0000 (UTC)
    antispam@fricas.org (Waldek Hebisch) wrote:

    Bart <bc@freeuk.com> wrote:

    Sure. That's when you run a production build. I can even do that
    myself on some programs (the ones where my C transpiler still
    works) and pass it through gcc-O3. Then it might run 30% faster.

    On fast machine running Dhrystone 2.2a I get:

    tcc-0.9.28rc 20000000
    gcc-12.2 -O 64184852
    gcc-12.2 -O2 83194672
    clang-14 -O 83194672
    clang-14 -O2 85763288

    so with 02 this is more than 4 times faster. Dhrystone correlated
    resonably with runtime of tight compute-intensive programs.
    Compiler started to cheat on original Dhrystone, so there are
    bigger benchmarks like SPEC INT. But Dhrystone 2 has modifications
    to make cheating harder, so I think it is still reasonable
    benchmark. Actually, difference may be much bigger, for example
    in image processing both clang and gcc can use vector intructions,
    with may give additional speedup of order 8-16.

    30% above means that you are much better than tcc or your program
    is badly behaving (I have programs that make intensive use of
    memory, here effect of optimization would be smaller, but still
    of order 2).


    gcc -O is not what Bart was talking about. It is quite similar to -O1.
    Try gcc -O0.
    With regard to speedup, I had run only one or two benchmarks with tcc
    and my results were close to those of Bart. gcc -O0 is very similar to
    tcc in the speed of the exe, but compiles several times slower. The gcc
    -O2 exe is about 2.5 times faster.
    I'd guess I could construct a case where gcc successfully vectorizes
    some floating-point loop calculation and shows a 10x speed-up vs tcc on
    modern Zen5 hardware. But that would not be typical.







    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Sat Nov 23 02:00:51 2024
    On 22/11/2024 12:33, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    Sure. That's when you run a production build. I can even do that myself
    on some programs (the ones where my C transpiler still works) and pass
    it through gcc-O3. Then it might run 30% faster.

    On fast machine running Dhrystone 2.2a I get:

    tcc-0.9.28rc 20000000
    gcc-12.2 -O 64184852
    gcc-12.2 -O2 83194672
    clang-14 -O 83194672
    clang-14 -O2 85763288

    so with 02 this is more than 4 times faster. Dhrystone correlated
    resonably with runtime of tight compute-intensive programs.
    Compiler started to cheat on original Dhrystone, so there are
    bigger benchmarks like SPEC INT. But Dhrystone 2 has modifications
    to make cheating harder, so I think it is still reasonable
    benchmark. Actually, difference may be much bigger, for example
    in image processing both clang and gcc can use vector intructions,
    with may give additional speedup of order 8-16.

    30% above means that you are much better than tcc or your program
    is badly behaving (I have programs that make intensive use of
    memory, here effect of optimization would be smaller, but still
    of order 2).

    The 30% applies to my typical programs, not benchmarks. Sure, gcc -O3
    can do a lot of aggressive optimisations when everything is contained
    within one short module and most runtime is spent in clear bottlenecks.

    Real apps, like say my compilers, are different. They tend to use
    globals more, program flow is more disseminated. The bottlenecks are
    harder to pin down.

    But, OK, here's the first sizeable benchmark that I thought of (I can't
    find a reliable Dhrystone one; perhaps you can post a link).

    It's called Deltablue.c, copied to db.c below for convenience. I've no
    idea what it does, but the last figure shown is the runtime, so smaller
    is better:

    c:\cx>cc -r db
    Compiling db.c to db.(run)
    DeltaBlue C <S:> 1000x 0.517ms

    c:\cx>tcc -run db.c
    DeltaBlue C <S:> 1000x 0.546ms

    c:\cx>gcc db.c && a
    DeltaBlue C <S:> 1000x 0.502ms

    c:\cx>gcc -O3 db.c && a
    DeltaBlue C <S:> 1000x 0.314ms

    So here gcc is 64% faster than my product. However my 'cc' doesn't yet
    have the register allocator of the older 'mcc' compiler (which simply
    keeps some locals in registers). That gives this result:

    c:\cx>mcc -o3 db && db
    Compiling db.c to db.exe
    DeltaBlue C <S:> 1000x 0.439ms

    So, 40% faster, for a benchmark.

    Now, for a more practical test. First I will create an optimised version
    of my compiler via transpiling to C:

    c:\mx6>mc -opt mm -out:mmgcc
    M6 Compiling mm.m---------- to mmgcc.exe
    W:Invoking C compiler: gcc -m64 -O3 -ommgcc.exe mmgcc.c -s

    Now I run my normal compiler, self-hosted, on a test program 'fann4.m':

    c:\mx6>tm mm \mx\big\fann4 -ext
    Compiling \mx\big\fann4.m to \mx\big\fann4.exe
    TM: 0.99

    Now the gcc-optimised version:

    c:\mx6>tm mmgcc \mx\big\fann4 -ext
    Compiling \mx\big\fann4.m to \mx\big\fann4.exe
    TM: 0.78

    So it's 27% faster. Note that fann4.m is 740Kloc, so this represents compilation speed of just under a million lines per second.

    Some other stats:

    c:\mx6>dir mm.exe mmgcc.exe
    22/11/2024 14:43 393,216 mm.exe
    22/11/2024 14:37 651,776 mmgcc.exe

    So my product has a smaller EXE too. For more typical inputs, the
    differences are narrower:

    c:\mx6>copy mm.m bb.m

    c:\mx6>tm mm bb
    Compiling bb.m to bb.exe
    TM: 0.09

    c:\mx6>tm mmgcc bb -ext
    Compiling bb.m to bb.exe
    TM: 0.08

    gcc-O3 is 12% faster, saving 10ms in compile-time. Curious about how tcc
    would fare? Let's try it:

    c:\mx6>mc -tcc mm -out:mmtcc
    M6 Compiling mm.m---------- to mmtcc.exe
    W:Invoking C compiler: tcc -ommtcc.exe mmtcc.c c:\windows\system32\user32.dll -luser32 c:\windows\system32\kernel32.dll -fdollars-in-identifiers

    c:\mx6>tm mmtcc bb
    Compiling bb.m to bb.exe
    TM: 0.11

    Yeah, a tcc-compiled M compiler would take 0.03 seconds longer to build
    my 35Kloc compiler than a gcc-O3-compiled one; about 37% slower.

    One more point: when gcc builds my compiler, it can use whole-program optimisation because the input is one source file. So that gives it an
    extra edge compared with compiling individual modules.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Sat Nov 23 02:21:55 2024
    On 22/11/2024 15:19, Michael S wrote:
    On Fri, 22 Nov 2024 12:33:29 -0000 (UTC)
    antispam@fricas.org (Waldek Hebisch) wrote:

    Bart <bc@freeuk.com> wrote:

    Sure. That's when you run a production build. I can even do that
    myself on some programs (the ones where my C transpiler still
    works) and pass it through gcc-O3. Then it might run 30% faster.

    On fast machine running Dhrystone 2.2a I get:

    tcc-0.9.28rc 20000000
    gcc-12.2 -O 64184852
    gcc-12.2 -O2 83194672
    clang-14 -O 83194672
    clang-14 -O2 85763288

    so with -O2 this is more than 4 times faster. Dhrystone correlates
    reasonably with the runtime of tight compute-intensive programs.
    Compilers started to cheat on the original Dhrystone, so there are
    bigger benchmarks like SPEC INT. But Dhrystone 2 has modifications
    to make cheating harder, so I think it is still a reasonable
    benchmark. Actually, the difference may be much bigger; for example,
    in image processing both clang and gcc can use vector instructions,
    which may give an additional speedup of order 8-16.

    30% above means that you are much better than tcc or your program
    is behaving badly (I have programs that make intensive use of
    memory; there the effect of optimization would be smaller, but still
    of order 2).


    gcc -O is not what Bart was talking about. It is quite similar to -O1.

    "Similar" in this particular case being a synonym for "identical" :-)

    Try gcc -O0.
    With regard to speedup, I had run only one or two benchmarks with tcc,
    and my results were close to those of Bart. gcc -O0 is very similar to
    tcc in the speed of the exe, but compiles several times slower. The
    gcc -O2 exe is about 2.5 times faster.

    (Note that "gcc -O0" is still a vastly more powerful compiler than tcc
    in many ways.)

    I'd guess I could construct a case where gcc successfully vectorizes
    some floating-point loop calculation and shows a 10x speed-up vs tcc on
    modern Zen5 hardware. But that would not be typical.


    The effect you get from optimisation depends very much on the code in question, the exact compiler flags, and also on the processor you are using.

    Fairly obviously, if your code spends a lot of time in system calls,
    waiting for external events (files, networks, etc.), or calling code in
    other separately compiled libraries, then optimisation of your code will
    make almost no difference. Something that does a lot of calculations
    and data manipulation, on the other hand, can be much faster. Even
    then, however, it depends on what you are doing.

    Beyond simple "-O3" flags, things like "-march=native" and "-ffast-math"
    (if you have floating point calculations, and you are sure this does not affect the correctness of the code!) can make a huge difference by
    allowing more re-arrangements, vector/SIMD processing, using more
    instructions on newer processors, and having a more accurate model of scheduling.

    And the type of processor is also very important. x86 processors are
    tuned to running crappy code, since a lot of the time they are used to
    run old binaries made by old tools, or binaries made by people who don't
    know how to use their tools well. So they have features like extremely
    local data caches to hide the cost of using the stack for local
    variables instead of registers. And often it doesn't matter if you do
    one instruction or a dozen instructions, because you are waiting for
    memory anyway. If you are looking at microcontrollers, on the other
    hand, optimisation can make a huge difference for a lot of real-world code.

    There is also another substantial difference in code efficiency that is
    missed out in these sorts of pretend benchmarks. When efficiency really matters, top-shelf compilers give you features and extensions to help.
    You can use intrinsics, or vector extensions, or pragmas, or attributes,
    or "builtins", to give the compiler more information and work with it to
    give more opportunities for optimisation. Many of these are not
    portable (or of limited portability), and getting top speed from your
    code is not an easy job, but you certainly have possibilities with a
    tool like gcc or clang that you can never have with tcc or other tiny compilers.
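
    As a flavour of what those extensions look like, here is a minimal
    sketch using gcc/clang's vector_size attribute and __builtin_expect;
    both are compiler extensions (exactly the kind of thing tcc does not
    provide), and the code is only an illustration, not taken from any
    project discussed here:

    #include <stdio.h>

    /* gcc/clang extension: a 16-byte vector holding four floats. */
    typedef float v4f __attribute__((vector_size(16)));

    static v4f scale(v4f x, float k) {
        v4f kk = {k, k, k, k};
        return x * kk;            /* element-wise multiply, emitted as SIMD */
    }

    int main(void) {
        v4f a = {1.0f, 2.0f, 3.0f, 4.0f};
        v4f b = scale(a, 2.0f);
        if (__builtin_expect(b[3] != 8.0f, 0))   /* branch hinted as unlikely */
            puts("unexpected result");
        printf("%g %g %g %g\n", b[0], b[1], b[2], b[3]);
        return 0;
    }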


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Sat Nov 23 03:06:19 2024
    On 22/11/2024 12:05, Bart wrote:
    On 21/11/2024 13:00, David Brown wrote:
    On 20/11/2024 21:17, Bart wrote:

    Your development process sounds bad in so many ways it is hard to know
    where to start. I think perhaps the foundation is that you taught
    yourself a bit of programming in the 1970's,

    I did a CS degree actually. I also spent a year programming, working for
    the ARC and SRC (UK research councils).

    But since you are being so condescending, I think /your/ problem is in having to use C. I briefly mentioned that a 'better language' can help.


    I use better languages than C, when there are better languages than C
    for the task. And as you regularly point out, I don't program in
    "normal" C, but in a subset of C limited by (amongst many other things)
    a choice of gcc warnings, combined with compiler extensions.

    My programming and thinking is not limited to C. But I believe I have a better general understanding of that language than you do (though there
    are some aspects you no doubt know better than me). I can say that
    because I have read the standards, and make a point of keeping up with
    them. I think about the features of C - I don't simply reject half of
    them because of some weird prejudice (and then complain that the
    language doesn't have the features you want!). I learn what the
    language actually says and how it is defined - I don't alternate between pretending it is all terrible, and pretending it works the way I'd like
    it to work.

    While I don't claim that my language is particularly safe, mine is
    somewhat safer than C in its type system, and far less error prone in
    its syntax and its overall design (for example, a function's details are always defined in exactly one place, so less maintenance and fewer
    things to get wrong).

    So, half the options in your C compilers are to help get around those shortcomings.

    What is your point? Are you trying to say that your language is better
    than C because your language doesn't let you make certain mistakes that
    a few people sometimes make in C? So what? Your language doesn't let
    people make mistakes because no one else uses it. If they did, I am
    confident that it would provide plenty of scope for getting things wrong.

    People can write good quality C with few mistakes. They have the tools available to help them. If they don't make use of the tools, it's their
    fault - not the fault of the language. If they write bad code - as bad programmers do in any language, with any tools - it's their fault.



    You also seem proud that in this example:

    int F(int n) {
        if (n==1) return 10;
        if (n==2) return 20;
    }

    You can use 'unreachable()', a new C feature, to silence compiler
    messages about running into the end of the function, something I
    considered a complete hack.

    I don't care what you consider a hack. I appreciate being able to write
    code that is safe, correct, clear, maintainable and efficient. I don't
    really understand why that bothers you. Do you find it difficult to
    write such code in C?


    My language requires a valid return value from the last statement. In
    that it's similar to the Rust example I posted 9 hours ago.

    If you are not able to use a feature such as "unreachable()" safely and correctly, then I suppose it makes sense not to have such a feature in
    your language.

    Personally, I have use for powerful tools. And I like that those
    powerful tools also come with checks and safety features.

    Of course there is a place for different balances between power and
    safety here - there is a reason there are many programming languages,
    and why many programmers use different languages for different tasks. I
    would not expect many C programmers to have much use for "unreachable()".


    Yet the gaslighting here suggested what I chose to do was completely wrong.

    And presumably you also advise doing so on a bargain basement
    single-core computer from at least 15 years ago?

    Another example of you acknowledging that compilation speed can be a problem. So a brute force approach to speed is what counts for you.


    No, trying to use a long-outdated and underpowered computer and then complaining about the speed is a problem.

    But if I felt that compiler speed was a serious hindrance to my work, and alternatives did not do as good a job, I'd get a faster computer (within reason). That's the way things work for professionals. (If I felt that expensive commercial compilers did a better job than gcc for my work,
    then I'd buy them - I've tested them and concluded that gcc is the best
    tool for my needs, regardless of price.)

    If you found that it took several hours to drive 20 miles from A to B,
    your answer would be to buy a car that goes at 300mph, rather than doing endless detours along the way.

    Presumably, in your analogy, the detours are useful.


    Or another option is to think about each journey extremely carefully,
    and then only do the trip once a week!


    That sounds a vastly better option, yes.

    Certainly it is better than swapping out the car with an electric
    scooter that can't do these important "detours".


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Kaz Kylheku@3:633/280.2 to All on Sat Nov 23 05:10:50 2024
    On 2024-11-22, Bart <bc@freeuk.com> wrote:
    You also seem proud that in this example:

    int F(int n) {
        if (n==1) return 10;
        if (n==2) return 20;
    }

    You can use 'unreachable()', a new C feature, to silence compiler
    messages about running into the end of the function, something I
    considered a complete hack.

    Unreachable assertions are actually a bad trade if all you are looking
    for is to suppress a diagnostic. Because the behavior is undefined
    if the unreachable is actually reached.

    That's literally the semantic definition! "unreachable()" means,
    roughly, "remove all definition of behavior from this spot in the
    program".

    Whereas falling off the end of an int-returning function only
    becomes undefined if the caller obtains the return value,
    and of course in the case of a void function, it's well-defined.

    You are better off with:

    assert(0 && "should not be reached");
    return 0;

    if asserts are turned off with NDEBUG, the function does something that
    is locally safe, and offers the possibility of avoiding a disaster.

    The only valid reason for using unreachable is optimization: you're
    introducing something unsafe in order to get better machine code. When
    the compiler is informed that the behavior is always undefined when some
    code is reached, it can just delete that code and everything dominated
    by it (reachable only through it).

    The above function does not need a function return sequence to be
    emitted for the fall-through case that is not expected to occur,
    if the situation truly does not occur. Then if it does occur, hell
    will break loose since control will fall through to whatever bytes
    follow the abrupt end of the function.
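
    To make the trade concrete, here is a minimal sketch of the two
    patterns side by side (assuming a C23 compiler; older gcc and clang
    spell the first one __builtin_unreachable()):

    #include <assert.h>
    #include <stddef.h>   /* C23: the unreachable() macro */

    int F_unreachable(int n) {
        if (n == 1) return 10;
        if (n == 2) return 20;
        unreachable();    /* promise: n is always 1 or 2; otherwise behaviour is undefined */
    }

    int F_assert(int n) {
        if (n == 1) return 10;
        if (n == 2) return 20;
        assert(0 && "should not be reached");
        return 0;         /* locally safe fallback when NDEBUG removes the assert */
    }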

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Sat Nov 23 06:29:59 2024
    Bart <bc@freeuk.com> wrote:
    On 22/11/2024 12:33, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    Sure. That's when you run a production build. I can even do that myself
    on some programs (the ones where my C transpiler still works) and pass
    it through gcc-O3. Then it might run 30% faster.

    On fast machine running Dhrystone 2.2a I get:

    tcc-0.9.28rc 20000000
    gcc-12.2 -O 64184852
    gcc-12.2 -O2 83194672
    clang-14 -O 83194672
    clang-14 -O2 85763288

    so with 02 this is more than 4 times faster. Dhrystone correlated
    resonably with runtime of tight compute-intensive programs.
    Compiler started to cheat on original Dhrystone, so there are
    bigger benchmarks like SPEC INT. But Dhrystone 2 has modifications
    to make cheating harder, so I think it is still reasonable
    benchmark. Actually, difference may be much bigger, for example
    in image processing both clang and gcc can use vector intructions,
    with may give additional speedup of order 8-16.

    30% above means that you are much better than tcc or your program
    is badly behaving (I have programs that make intensive use of
    memory, here effect of optimization would be smaller, but still
    of order 2).

    The 30% applies to my typical programs, not benchmarks. Sure, gcc -O3
    can do a lot of aggressive optimisations when everything is contained
    within one short module and most runtime is spent in clear bottlenecks.

    Real apps, like say my compilers, are different. They tend to use
    globals more, program flow is more disseminated. The bottlenecks are
    harder to pin down.

    But, OK, here's the first sizeable benchmark that I thought of (I can't
    find a reliable Dhrystone one; perhaps you can post a link).

    First Google hit for Dhrystone 2.2a

    https://homepages.cwi.nl/~steven/dry.c

    (I used this one).

    Compiled in two steps like:

    gcc -c -O -o dry.o dry.c
    gcc -o dry2 -DPASS2 -O dry.c dry.o

    If you want something practical, I have the following C function:

    #include <stdint.h>
    void inner_mul(uint32_t * x, uint32_t * y, uint32_t * z,
                   uint32_t xdeg, uint32_t ydeg, uint32_t zdeg, uint32_t p) {
        if (ydeg < xdeg) {
            uint32_t * tmpp = x;
            uint32_t tmp = xdeg;
            x = y;
            xdeg = ydeg;
            y = tmpp;
            ydeg = tmp;
        }
        if (zdeg < xdeg) {
            xdeg = zdeg;
        }
        if (zdeg < ydeg) {
            ydeg = zdeg;
        }
        uint64_t ss;
        long i;
        long j;
        for(i=0; i<=xdeg; i++) {
            ss = z[i];
            for(j=0; j<=i; j++) {
                ss += ((uint64_t)(x[i-j]))*((uint64_t)(y[j]));
            }
            z[i] = ss%p;
        }
        for(i=xdeg+1; i<=ydeg; i++) {
            ss = z[i];
            for(j=0; j<=xdeg; j++) {
                ss += ((uint64_t)(x[j]))*((uint64_t)(y[i-j]));
            }
            z[i] = ss%p;
        }
        for(i=ydeg+1; i<=zdeg; i++) {
            ss = z[i];
            for(j=i-xdeg; j<=ydeg; j++) {
                ss += ((uint64_t)(x[i-j]))*((uint64_t)(y[j]));
            }
            z[i] = ss%p;
        }
    }

    and the following test driver:

    #include <stdio.h>
    #include <stdint.h>
    #include <sys/time.h>

    extern void inner_mul(uint32_t * x, uint32_t * y, uint32_t * z,
                          uint32_t xdeg, uint32_t ydeg, uint32_t zdeg, uint32_t p);

    int main(void) {
        uint32_t x[85], y[85], z[169];
        int i;
        for(i=0;i<85;i++) {
            x[i] = 1;
            y[i] = 1;
        }

        struct timeval tv1, tv2;
        gettimeofday(&tv1, 0);
        int j;
        for(j=0; j < 100000; j++) {
            for(i=0;i<169; i++) {
                z[i] = 1;
            }
            inner_mul(x, y, z, 84, 84, 168, 1000003);
        }
        gettimeofday(&tv2, 0);
        for(i=0;i<12; i++) {
            printf(" %u,", z[i]);
        }
        putchar('\n');
        long tt = tv2.tv_sec - tv1.tv_sec;
        tt *= (1000*1000);
        tt += (tv2.tv_usec - tv1.tv_usec);
        printf("Time: %ld us\n", tt);
        return 0;
    }

    At least for gcc and clang, put them in separate files to avoid
    simplifying the task too much ('inner_mul' is supposed to work
    with variable data; here we feed it the same thing several times).
    Of course, the test driver is silly, but 'inner_mul' is doing
    important computation and, as long as 'inner_mul' is compiled
    without knowledge of the actual parameters, the test should be fair.
    My results are:

    clang -O3 -march=native 126112us
    clang -O3 222136us
    clang -O 225855us
    gcc -O3 -march=native 82809us
    gcc -O3 114365us
    gcc -O 287786us
    tcc 757347us

    There is some irregularity in timings, but this shows that
    factor of order 9 is possible.

    Notes:
    - this code is somewhat hard to vectorize, but clang
    and gcc manage to do this,
    - vectorized code is sensitive to alignment of the data, some
    variation may be due to this,
    - modern processors dynamically change clock frequency; the
    times seem to be high enough to trigger a switch to maximal
    frequency (initially I used a smaller number of iterations
    but timings were less regular),
    - most of code is portable, but for timing we need timer with
    sufficient resolution, so I use Unix 'gettimeofday'.

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Sat Nov 23 10:30:31 2024
    On 22/11/2024 19:29, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:
    On 22/11/2024 12:33, Waldek Hebisch wrote:

    But, OK, here's the first sizeable benchmark that I thought of (I can't
    find a reliable Dhrystone one; perhaps you can post a link).

    First Google hit for Dhrystone 2.2a

    https://homepages.cwi.nl/~steven/dry.c

    (I used this one).

    There was no shortage of them, there were just too many. All seemed to
    need some Linux script to compile them, and all needed Linux anyway
    because only that has sys/times.h.

    I eventually find one for Windows, and that goes to the other extreme
    and needs CL (MSVC) with these options:

    cl /O2 /D "WIN32" /D "_DEBUG" /D "_CONSOLE" /D "_MBCS" /MD /W4 /Wp64 /Zi
    /TP /EHsc /Fa /c dhry264.c dhry_264.c

    Plus it uses various ASM routines written in MASM syntax. I was partway
    through getting it to work with my compiler, when I saw your post.

    Your version is much simpler to get going, but still not straightforward because of 'gettimeofday', which is available via gcc, but is not
    exported by msvcrt, which is what tcc and my product use.

    I changed it to use clock().
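
    A minimal sketch of that change; the run_benchmark() here is a
    hypothetical stand-in for the driver's timing loop, and clock() is
    standard C, so it works with gcc, tcc and msvcrt alike:

    #include <stdio.h>
    #include <time.h>

    /* Stand-in for the driver's inner loop; not the real benchmark. */
    static void run_benchmark(void) {
        volatile unsigned long s = 0;
        for (unsigned long i = 0; i < 100000000UL; i++) s += i;
    }

    int main(void) {
        clock_t t1 = clock();
        run_benchmark();
        clock_t t2 = clock();
        /* clock() ticks CLOCKS_PER_SEC times per second; millisecond
           resolution is plenty once the loop runs long enough. */
        printf("Time: %ld ms\n", (long)((t2 - t1) * 1000 / CLOCKS_PER_SEC));
        return 0;
    }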

    The results then are like this (I tried two sizes of matrix element):

                  uint32_t   uint64_t

    gcc -O0           2165       2180   msec
    gcc -O3            282        470

    tcc               2572       2509

    cc                2165       2243
    mcc -opt           720        720

    The mcc product keeps some local variables in registers, a minor
    optimisation I will apply to cc in due course. It's not a priority,
    since usually it makes little difference on real applications. Only on benchmarks like this.

    gcc -O3 seems to enable some SIMD instructions, but only for u32. With
    u64 elements, then gcc -O3 is only about 50% faster than my compiler.

    If I try -march=native, then the 282 sometimes gets down to 235, and the
    470 to 420.

    (When functions like this were needed in my programs during 80s and 90s,
    I used inline assembly. Most code wasn't that critical.)



    - most of code is portable, but for timing we need timer with
    sufficient resolution, so I use Unix 'gettimeofday'.

    Why? Just make the task take long enough.

    BTW I also ported your program to my 'M' language. The timing however
    was about the same as mcc-opt.

    The source is below if interested.

    -------------------------------

    type T=u32

    proc inner_mul(ref[0:]T x, y, z, int xdeg, ydeg, zdeg, p) =
        u64 ss

        if ydeg<xdeg then
            swap(x, y)
            swap(xdeg, ydeg)
        fi

        xdeg min:=zdeg
        ydeg min:=zdeg

        for i in 0..xdeg do
            ss:=z[i]
            for j in 0..i do
                ss +:=u64(x[i-j]) * u64(y[j])
            od
            z[i]:=ss rem p
        od

        for i in xdeg+1..ydeg do
            ss:=z[i]
            for j in 0..xdeg do
                ss +:=u64(x[j]) * u64(y[i-j])
            od
            z[i]:=ss rem p
        od

        for i in ydeg+1..zdeg do
            ss:=z[i]
            for j in i-xdeg .. ydeg do
                ss +:=u64(x[i-j]) * u64(y[j])
            od
            z[i]:=ss rem p
        od

    end

    proc main=
        [0:85]T x, y, z
        int tv1, tv2

        for i in x.bounds do
            x[i]:=y[i]:=1
        od

        tv1:=clock()

        to 100'000 do
            for i in 0..168 do
                z[i]:=1
            od
            inner_mul(&x,&y,&z, 84, 84, 168, 1'000'003)
        od

        tv2:=clock()
        for i in 0..11 do
            print z[i], $
        od
        println

        println "Time:",tv2-tv1,"ms"
    end






    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Sun Nov 24 00:28:14 2024
    On 22/11/2024 19:10, Kaz Kylheku wrote:
    On 2024-11-22, Bart <bc@freeuk.com> wrote:
    You also seem proud that in this example:

    int F(int n) {
        if (n==1) return 10;
        if (n==2) return 20;
    }

    You can use 'unreachable()', a new C feature, to silence compiler
    messages about running into the end of the function, something I
    considered a complete hack.

    Unreachable assertions are actually a bad trade if all you are looking
    for is to suppress a diagnostic. Because the behavior is undefined
    if the unreachable is actually reached.


    You should only use "unreachable()" in places where it is /never/
    actually reached - thus it is perfectly safe if you use it correctly.
    (I'm not aware of any features of any language that are safe to use /incorrectly/.)

    That's literally the semantic definition! "unreachable()" means,
    roughly, "remove all definition of behavior from this spot in the
    program".

    Yes. So that's fine, as long as execution never reaches it. That's the
    whole point - you are telling the compiler that this thing cannot
    happen. Compilers optimise all the time on the basis of what they know
    can and cannot happen - this just lets the programmer specify it.


    Whereas falling off the end of an int-returning function only
    becomes undefined if the caller obtains the return value,
    and of course in the case of a void function, it's well-defined.

    All true - but so what?


    You are better off with:

    assert(0 && "should not be reached");
    return 0;

    if asserts are turned off with NDEBUG, the function does something that
    is locally safe, and offers the possibility of avoiding a disaster.

    Asserts - or other temporary checks resulting in stopping the program
    with a useful message - can be very helpful in debugging. If you are
    not entirely sure that code execution can never reach a particular
    point, then either don't use "unreachable()" there, or if you prefer,
    put a conditional check there. "assert" is not magic - you can do the
    same thing yourself:

    #include <stdio.h>    /* printf */
    #include <stdlib.h>   /* exit */
    #include <stddef.h>   /* unreachable(), C23 */

    #ifdef CHECK_UNREACHABLES
    #define Unreachable() \
        do { \
            printf("Unreachable hit on line %i of file %s\r\n", \
                   __LINE__, __FILE__); \
            exit(1); \
        } while (0)
    #else
    #define Unreachable() unreachable()
    #endif


    Adjust it to suit your taste.


    In released code, hitting a false assertion is a bug in your code that
    should never happen. Hitting an "unreachable()" is a bug in your code
    that should never happen. Either way, you've screwed up. And unless
    you have good reason to believe that the user will actually give you all
    the critical information you need to duplicate the situation and find
    the bug, the assert is no better than the unreachable(). It is,
    however, less efficient and it means you are adding extra code ("return
    0;") that is of no use, and is in no way testable.

    So to me, unreachable() is better than an assert that is never
    triggered. And an assert that /could/ be triggered is not something I
    would ever want in released code of the kind of program I write
    (embedded systems).


    The only valid reason for using unreachable is optimization: you're introducing something unsafe in order to get better machine code. When
    the compiler is informed that the behavior is always undefined when some
    code is reached, it can just delete that code and everything dominated
    by it (reachable only through it).


    "unreachable()" is not unsafe unless you are using it incorrectly. /Everything/ is unsafe if you are using it incorrectly.

    The above function does not need a function return sequence to be
    emitted for the fall-through case that is not expected to occur,
    if the situation truly does not occur. Then if it does occur, hell
    will break loose since control will fall through to whatever bytes
    follow the abrupt end of the function.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Sun Nov 24 01:17:36 2024
    On 22/11/2024 19:29, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    clang -O3 -march=native 126112us
    clang -O3 222136us
    clang -O 225855us
    gcc -O3 -march=native 82809us
    gcc -O3 114365us
    gcc -O 287786us
    tcc 757347us

    You've omitted -O0 for gcc and clang. That timing probably won't be too
    far from tcc, but compilation time for larger programs will be
    significantly longer (eg. 10 times or more).

    The trade-off then is not worth it unless you are running gcc for other reasons (eg. for deeper analysis, or to compile less portable code that
    has only been tested on or written for gcc/clang; or just an irrational
    hatred of simple tools).


    There is some irregularity in timings, but this shows that
    factor of order 9 is possible.

    That's an extreme case, for one small program with one obvious
    bottleneck where it spends 99% of its time, and with little use of
    memory either.

    For simply written programs, the difference is more like 2:1. For more complicated C code that makes much use of macros that can expand to lots
    of nested function calls, it might be 4:1, since it might rely on
    optimisation to inline some of those calls.

    Again, that would be code written to take advantage of specific compilers.

    But that is still computationally intensive code working on small
    amounts of memory.

    I have a text editor written in my scripting language. I can translate
    its interpreter to C and compile with both gcc-O3 and tcc.

    Then, yes, you will notice twice as much latency with the tcc
    interpreter compared with gcc-O3, when doing things like
    deleting/inserting lines at the beginning of a 1000000-line text file.

    But typically, the text files will be 1000 times smaller; you will
    notice no difference at all.

    I'm not saying no optimisation is needed, ever, I'm saying that the NEED
    for optimisation is far smaller than most people seem to think.

    Here are some timings for that interpreter, when used to run a script to compute fib(38) the long way:

    Interp   Built with   Timing

    qc       tcc          9.0 secs   (qc is C transpiled version)
    qq       mm           5.0        (-fn; qq is original M version)

    qc       gcc-O3       4.0
    qq       mm           1.2        (-asm)

    (My interpreter doesn't bother with faster switch-based or computed-goto
    based dispatchers. The choice is between a slower function-table-based
    one, and an accelerated threaded-code version using inline ASM.

    These are selected with -fn/-asm options. The -asm version is not JIT;
    it is still interpreting one bytecode at a time).

    So the fastest version here doesn't use compiler optimisation, and it's
    3 times the speed of gcc-O3. My unoptimised HLL code is also only 25%
    slower than gcc-O3.

    That is for this test, but that's also one that is popular for language benchmarks.
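
    For anyone not familiar with the dispatcher styles mentioned above,
    here is a minimal sketch in C; the three-opcode bytecode is made up
    purely for illustration, and the second form relies on the gcc/clang
    labels-as-values extension:

    #include <stdio.h>

    enum { OP_INC, OP_DEC, OP_HALT };

    static long acc;
    static void op_inc(void) { acc++; }
    static void op_dec(void) { acc--; }

    /* Function-table dispatch: one indirect call per bytecode. */
    static void run_table(const unsigned char *code) {
        static void (*ops[])(void) = { op_inc, op_dec };
        for (; *code != OP_HALT; code++)
            ops[*code]();
    }

    /* Computed-goto (threaded) dispatch: one indirect jump per bytecode. */
    static void run_goto(const unsigned char *code) {
        static void *labels[] = { &&l_inc, &&l_dec, &&l_halt };
        goto *labels[*code];
    l_inc:  acc++; goto *labels[*++code];
    l_dec:  acc--; goto *labels[*++code];
    l_halt: return;
    }

    int main(void) {
        unsigned char prog[] = { OP_INC, OP_INC, OP_DEC, OP_HALT };
        acc = 0; run_table(prog); printf("%ld\n", acc);
        acc = 0; run_goto(prog);  printf("%ld\n", acc);
        return 0;
    }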



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Sun Nov 24 03:45:47 2024
    Bart <bc@freeuk.com> wrote:
    On 22/11/2024 19:29, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:
    On 22/11/2024 12:33, Waldek Hebisch wrote:

    But, OK, here's the first sizeable benchmark that I thought of (I can't
    find a reliable Dhrystone one; perhaps you can post a link).

    First Google hit for Dhrystone 2.2a

    https://homepages.cwi.nl/~steven/dry.c
    (I used this one).

    There was no shortage of them, there were just too many. All seemed to
    need some Linux script to compile them, and all needed Linux anyway
    because only that has sys/times.h.

    I eventually find one for Windows, and that goes to the other extreme
    and needs CL (MSVC) with these options:

    cl /O2 /D "WIN32" /D "_DEBUG" /D "_CONSOLE" /D "_MBCS" /MD /W4 /Wp64 /Zi
    /TP /EHsc /Fa /c dhry264.c dhry_264.c

    Plus it uses various ASM routines written MASM syntax. I was partway
    through getting it to work with my compiler, when I saw your post.

    Your version is much simpler to get going, but still not straightforward because of 'gettimeofday', which is available via gcc, but is not
    exported by msvcrt, which is what tcc and my product use.

    I changed it to use clock().

    The results then are like this (I tried two sizes of matrix element):

    uint32_t uint64_t

    gcc -O0 2165 2180 msec
    gcc -O3 282 470

    tcc 2572 2509

    cc 2165 2243
    mcc -opt 720 720

    The mcc product keeps some local variables in registers, a minor optimisation I will apply to cc in due course. It's not a priority,
    since usually it makes little difference on real applications. Only on benchmarks like this.

    gcc -O3 seems to enable some SIMD instructions, but only for u32. With
    u64 elements, then gcc -O3 is only about 50% faster than my compiler.

    If I try -march=native, then the 282 sometimes gets down to 235, and the
    470 to 420.

    (When functions like this were needed in my programs during 80s and 90s,
    I used inline assembly. Most code wasn't that critical.)

    FYI, ATM I have a version compiling via Lisp; with bounds checking
    on it takes 0.58s, with bounds checking off it takes 0.43s
    on my machine. The reason to look at a C version is to do better.
    Taken together, your and my timings indicate that your 'cc' will
    give me less speed than going via Lisp. 'mcc -opt' probably would
    give an improvement, but not compared to 'gcc'. BTW, below are times
    on a slower machine (a 5-year-old cheap laptop):

    gcc -O3 -march=native 1722910us
    gcc -O3 1720884us
    gcc -O 1642328us
    tcc 7661992us

    via Lisp, checking 5.29s
    via Lisp, no checking 4.27s

    With -O3 gcc vectorizes inner loops, but apparently on this machine
    it backfires and execution time is longer than without vectorization.

    In both cases 'tcc' gives slower code than going via Lisp with
    array bounds checking on, so ATM using 'tcc' for this application
    is rather unattractive.

    I may end up using inline assembly, but this is a mess: code for
    a fast machine will not run on older ones, and on some machines
    non-vectorized code is faster. So I would need multiple versions
    of assembler just to cover x86_64. And I have other targets.
    And this is just one of the critical routines. I have probably about
    10 such critical routines now and it may grow to about 50.
    To get good speed I am experimenting with various variants.
    So going the assembler way I could be forced to write several
    thousands of lines of optimized assembler (most of that to
    throw out, but before writing them I would not know which
    ones are the best). That would be much more work than just
    passing various options to 'gcc' and 'clang' and measuring
    execution time.

    - most of code is portable, but for timing we need timer with
    sufficient resolution, so I use Unix 'gettimeofday'.

    Why? Just make the task take long enough.

    Well, Windows 'clock' looks OK, but some old style timing routines
    have really low resolution and would lead to excessive run
    time (I need to run rather large number of tests).

    BTW I also ported your program to my 'M' language. The timing however
    was about the same as mcc-opt.

    The source is below if interested.

    AFAICS you have assign-op combinations like 'min:='. ATM I am
    undecided about similar operations. I mean, in a language which,
    like C, applies operators only to base types, they give some gain.
    But I want operators working on a large variety of types, and then
    it is not clear how to define them.

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Sun Nov 24 09:36:14 2024
    On 23/11/2024 16:45, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    FYI, ATM is have a version compiling via Lisp, with bounds checking
    on it takes 0.58s, with bounds checking off it takes 0.43s
    on my machine. The reason to look at C version is to do better.
    Taken together, your and my timing indicate that your 'cc' will
    give me less speed than going via Lisp. 'mcc -opt' pobably would
    give an impovement, but not compared to 'gcc'. BTW, below times
    on a slower machine (5 years old cheap laptop):

    gcc -O3 -march=native 1722910us
    gcc -O3 1720884us
    gcc -O 1642328us
    tcc 7661992us

    via Lisp, checking 5.29s
    via Lisp, no checking 4.27s

    With -O3 gcc vectorizes inner loops, but apparently on this machine
    it backfires and execution time is longer than without vectorization.

    In both cases 'tcc' gives slower code than going via Lisp with
    array bounds checking on, so ATM using 'tcc' for this application
    is rather unattractive.

    Lisp is a rather mysterious language which can apparently be and do
    anything: it can be interpreted or compiled. Statically typed or
    dynamic. Imperative or functional.

    It can also apparently be implemented in a few dozen lines of code.

    Forth has similar claims.

    So Lisp being as fast or faster than C is not surprising!



    I may end up using inline assembly, but this is a mess: code for
    fast machine will not run on older ones, on some machines
    non-vectorized code is faster. So I would need mutiple versions
    of assembler just to cover x86_64. And I have other targets.
    And this is just one of critical routines. I have probably about
    10 such critical routines now and it may grow to about 50.
    To get good speed I am experimeting with various variants.
    So going assembler way I could be forced to write several
    thousends of lines of optimized assembler (most of that to
    throw out, but before writing them I would not know which
    ones are the best). That would be much more work than just
    passing various options to 'gcc' and 'clang' and measuring
    execution time.

    Using assembly to get speed is not as easy as it used to be. Most such attempts seem to generate slower code. It only pays off for certain apps such as interpreters, but there you are dealing with a bigger picture than one particular bottleneck.


    - most of code is portable, but for timing we need timer with
    sufficient resolution, so I use Unix 'gettimeofday'.

    Why? Just make the task take long enough.

    Well, Windows 'clock' looks OK, but some old style timing routines
    have really low resolution and would lead to excessive run
    time (I need to run rather large number of tests).

    I've tried all sorts, from Windows' high performance routines, down to
    x64's RDTSC instruction. They all gave unreliable, variable results. Now
    I just use 'clock', but might turn off all other apps for extra consistency.


    BTW I also ported your program to my 'M' language. The timing however
    was about the same as mcc-opt.

    The source is below if interested.

    AFAICS you have assign-op combinations like 'min:='. ATM I am
    undecided about similar operations. I mean, in a language which
    like C applies operator only to base types they give some gain.
    But I want operators working on large variety of types, and then
    it is not clear how to define them.


    An assignment that in C syntax might be written as:

    x op= y;

    would be the equivalent of this, when x has type T:

    T* p;
    p = &x;
    *p = op(*p, (T)y);

    If 'op' is not defined for operands of T, then it just won't work.
    (Arithmetic ops won't work on most usertypes, but the language still
    allows x += y.)

    However the IL I use directly supports min/max including augmented
    assignment (in-place) versions.
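
    For readers following along in C: a rough analogue of that in-place
    'min:=' can be sketched as a macro (plain C has no user-defined
    augmented operators, and the right-hand side gets evaluated twice
    here, so this is illustration only):

    /* MIN_ASSIGN(xdeg, zdeg); behaves like: if (zdeg < xdeg) xdeg = zdeg; */
    #define MIN_ASSIGN(x, y) do { if ((y) < (x)) (x) = (y); } while (0)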


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Sun Nov 24 11:24:30 2024
    Bart <bc@freeuk.com> wrote:
    On 22/11/2024 19:29, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    clang -O3 -march=native 126112us
    clang -O3 222136us
    clang -O 225855us
    gcc -O3 -march=native 82809us
    gcc -O3 114365us
    gcc -O 287786us
    tcc 757347us

    You've omitted -O0 for gcc and clang. That timing probably won't be too
    far from tcc, but compilation time for larger programs will be
    significantly longer (eg. 10 times or more).

    The trade-off then is not worth it unless you are running gcc for other reasons (eg. for deeper analysis, or to compile less portable code that
    has only been tested on or written for gcc/clang; or just an irrational hatred of simple tools).

    I have tried to use 'tcc' for one of the projects that I mentioned
    before. It appears to work; the real time for a build is essentially
    the same (actually some fraction of a second longer, but that is
    within measurement noise), and CPU time _may_ be shorter by 1.6%.
    This confirms my earlier estimates that for that project C
    compile time has a very small impact on overall compile time
    (most compilations are not C compilations). In this project
    I use '-O', which is likely to give better runtime speed
    (I do not bother with '-O2' or '-O3'). Also, I use '-O' for
    better diagnostics.

    In a second project, '-O2' is used for an image processing library;
    this takes significant time to compile, but this library is
    performance-critical code.

    There is some irregularity in timings, but this shows that
    factor of order 9 is possible.

    That's an extreme case, for one small program with one obvious
    bottleneck where it spends 99% of its time, and with little use of
    memory either.

    For simply written programs, the difference is more like 2:1. For more complicated C code that makes much use of macros that can expand to lots
    of nested function calls, it might be 4:1, since it might rely on optimisation to inline some of those calls.

    Again, that would be code written to take advantage of specific compilers.

    But that is still computationally intensive code working on small
    amounts of memory.

    I have a text editor written in my scripting language. I can translate
    its interpreter to C and compile with both gcc-O3 and tcc.

    Then, yes, you will notice twice as much latency with the tcc
    interpreter compared with gcc-O3, when doing things like
    deleting/inserting lines at the beginning of a 1000000-line text file.

    But typically, the text files will be 1000 times smaller; you will
    notice no difference at all.

    I'm not saying no optimisation is needed, ever, I'm saying that the NEED
    for optimisation is far smaller than most people seem to think.

    There is also question of disc space. 'tcc' compiled by itself is
    404733 bytes (code + data) (0.024s compile time), by gcc (default) is
    340950 (0.601s compile time), by gcc -O is 271229 (1.662s compile
    time), by gcc -Os is 228855 (2.470s compile time), by gcc -O2
    is 323392 (3.364s compile time), gcc -O3 is 407952 (4.627s compile
    time). As you can see gcc -Os can save quite a bit of disc space
    for still moderate compile time.

    And of course, there is the question of why a program whose runtime
    does not matter is written in a low-level language. Experience
    shows that using a higher-level language is easier, and a higher-level
    language compiled to bytecode can give significantly smaller
    code than gcc -Os from low-level code. Several programs for
    early micros used bytecode because this was the only way to
    fit the program into available memory.

    Here are some timings for that interpreter, when used to run a script to compute fib(38) the long way:

    Interp Built with Timing

    qc tcc 9.0 secs (qc is C transpiled version)
    qq mm 5.0 (-fn; qq is original M version)

    qc gcc-O3 4.0
    qq mm 1.2 (-asm)

    (My interpreter doesn't bother with faster switch-based or computed-goto based dispatchers. The choice is between a slower function-table-based
    one, and an accelerated threaded-code version using inline ASM.

    These are selected with -fn/-asm options. The -asm version is not JIT;
    it is still interpreting a bytecode at a time).

    So the fastest version here doesn't use compiler optimisation, and it's
    3 times the speed of gcc-O3. My unoptimised HLL code is also only 25%
    slower than gcc-O3.

    Well, most folks would "not bother" with inline ASM and instead use
    the fastest version that C can give, which would likely involve
    gcc -O2 or gcc -O3.

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Sun Nov 24 12:36:44 2024
    On 24/11/2024 00:24, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    I'm not saying no optimisation is needed, ever, I'm saying that the NEED
    for optimisation is far smaller than most people seem to think.

    There is also question of disc space. 'tcc' compiled by itself is
    404733 bytes (code + data) (0.024s compile time), by gcc (default) is
    340950 (0.601s compile time), by gcc -O is 271229 (1.662s compile
    time), by gcc -Os is 228855 (2.470s compile time), by gcc -O2
    is 323392 (3.364s compile time), gcc -O3 is 407952 (4.627s compile
    time). As you can see gcc -Os can save quite a bit of disc space
    for still moderate compile time.


    I thought David Brown said that disk space is irrelevant? Anyway this is
    the exact copy of what I tried just now, compiling a 5-line hello.c
    program. I hadn't used these compilers since earlier today:

    c:\c>tm gcc hello.c
    TM: 5.80

    c:\c>tm tcc hello.c
    TM: 0.19

    c:\c>tm gcc hello.c
    TM: 0.19

    c:\c>tm tcc hello.c
    TM: 0.03

    From cold, gcc took nearly 6 seconds (if you've been used to instant
    feedback all day, it can feel like an age). tcc took 0.2 seconds.

    Doing it a second time, now gcc takes 0.2 seconds, and tcc takes 0.03
    seconds! (It can't get much faster on Windows.)

    gcc is just a lumbering giant, a 870MB installation, while tcc is 2.5MB.
    As for sizes:

    c:\c>dir hello.exe
    24/11/2024 00:44 2,048 hello.exe

    c:\c>dir a.exe
    24/11/2024 00:44 91,635 a.exe (48K with -s)

    (At least that's one good thing of gcc writing out that weird a.exe each
    time; I can compare both exes!)

    As for mine (however it's possible I used it more recently):

    c:\c>tm cc hello
    Compiling hello.c to hello.exe
    TM: 0.04

    c:\c>dir hello.exe
    24/11/2024 00:52 2,560 hello.exe

    My installation is 0.3MB (excluding windows.h which is 0.6MB). Being self-contained, I can trivially apply UPX compression to get a 0.1MB
    compiler, which can be easily copied to a memory stick or bundled in one
    of my apps. However compiling hello.c now takes 0.05 seconds.

    (I don't use UPX because my apps are already tiny; it's just to marvel
    at how much redundancy they still contain, and how much tinier they
    could be.)

    I know none of this will cut any ice; for various reasons you don't want
    to use tcc.

    One of them being that your build process involves N slow stages so
    speeding up just one makes little difference.

    This however is very similar to my argument about optimisation; a
    running app consists of lots of parts which take up execution time, not
    all of which can be speeded up by a factor of 9. The net benefit will be
    a lot less, just like your reduced build time.

    And of course, there is a question why program with runtime that
    does not matter is written in a low level language?

    I mean it doesn't matter if it's half the speed. It might matter if it
    was 40 times slower.

    There's quite a gulf between even unoptimised native code and even a
    fast dynamic language interpreter.

    People seem to think that the only choices are the fastest possible C
    code at one end, and slow CPython at the other:

    gcc/O3-tcc-----------------------------------------------------CPython

    On this scale, gcc/O3 code and tcc code are practically the same!

    So the fastest version here doesn't use compiler optimisation, and it's
    3 times the speed of gcc-O3. My unoptimised HLL code is also only 25%
    slower than gcc-O3.

    Well, most folks would "not bother" with inline ASM and instead use
    fastest wersion that C can give. Which likely would involve
    gcc -O2 or gcc -O3.

    But in this case, it works by giving me a product that, even using a non-optimising compiler, makes an application faster than using gcc-O3.





    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Sun Nov 24 12:45:34 2024
    On 24/11/2024 01:36, Bart wrote:
    On 24/11/2024 00:24, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    And of course, there is a question why program with runtime that
    does not matter is written in a low level language?

    I mean it doesn't matter if it's half the speed. It might matter if it
    was 40 times slower.

    There's quite a gulf between even unoptimised native code and even a
    fast dynamic language interpreter.

    People seem to think that the only choices are the fastest possible C
    code at one end, and slow CPython at the other:

    gcc/O3-tcc-----------------------------------------------------CPython

    On this scale, gcc/O3 code and tcc code are practically the same!

    (I wasn't able to post results earlier because CPython hadn't finished.
    But for a JPEG decoder test on an 85Mpixel image, all using the same algorithm:

    gcc-O3      2.2 seconds
    mm6-opt     3.3 seconds   (My older compiler with the register optim.)
    mm7         5.7 seconds   (My unoptimising new one)
    cc          6.0 seconds   (Unoptimising)
    tcc         8.1 seconds
    PyPy         43 seconds   (Uses JIT to optimise hot loops to native code)
    CPython     386 seconds)



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Sun Nov 24 16:03:17 2024
    Bart <bc@freeuk.com> wrote:
    On 24/11/2024 00:24, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    I'm not saying no optimisation is needed, ever, I'm saying that the NEED >>> for optimisation is far smaller than most people seem to think.

    There is also question of disc space. 'tcc' compiled by itself is
    404733 bytes (code + data) (0.024s compile time), by gcc (default) is
    340950 (0.601s compile time), by gcc -O is 271229 (1.662s compile
    time), by gcc -Os is 228855 (2.470s compile time), by gcc -O2
    is 323392 (3.364s compile time), gcc -O3 is 407952 (4.627s compile
    time). As you can see gcc -Os can save quite a bit of disc space
    for still moderate compile time.


    I thought David Brown said that disk space is irrelevant?

    I am not David Brown.

    Anyway this is
    the exact copy of what I tried just now, compiling a 5-line hello.c
    program. I hadn't used these compilers since earlier today:

    c:\c>tm gcc hello.c
    TM: 5.80

    c:\c>tm tcc hello.c
    TM: 0.19

    c:\c>tm gcc hello.c
    TM: 0.19

    c:\c>tm tcc hello.c
    TM: 0.03

    From cold, gcc took nearly 6 seconds (if you've been used to instant feedback all day, it can feel like an age). tcc took 0.2 seconds.

    Doing it a second time, now gcc takes 0.2 seconds, and tcc takes 0.03 seconds! (It can't get much faster on Windows.)

    gcc is just a lumbering giant, a 870MB installation, while tcc is 2.5MB.

    Yes, but the exact size depends on which version you install and how you
    install it. I installed version 6.5 and removed debugging info from
    executables. The result is 177MB, large but significantly smaller
    than what you have. The Debian package for gcc-12.2 is something like
    144MB (+ about 8MB of libraries which are usable for other purposes but
    mainly for gcc), but it only gives the C compiler. To that one should
    add 'libc6-dev' (about 12MB), which is needed to create useful
    programs. C++ adds 36MB, Fortran 35MB, Ada 94MB, so my installation
    is something like 330MB. Note: my 177MB reuses probably about 50MB
    from the system installation and includes C and C++. Also, in both cases
    I do not count libc, which is about 13MB (but needed by almost
    anything in the system), the shell, kernel, etc.

    On Windows some space-saving tricks do not work, and traditionally
    programs ship their own libraries, so the size may be bigger.

    For me it is problematic that each gcc language and each extra
    target adds a lot of space. I have extra targets (not counted in
    the size above) and together this is closer to 1G. In this aspect
    LLVM is somewhat better: it gives me more targets than I have
    installed for gcc, for a total "cost" of something like 210MB (plus
    about 50MB shared with gcc).

    As for sizes:

    c:\c>dir hello.exe
    24/11/2024 00:44 2,048 hello.exe

    c:\c>dir a.exe
    24/11/2024 00:44 91,635 a.exe (48K with -s)

    (At least that's one good thing of gcc writing out that weird a.exe each time; I can compare both exes!)

    AFAICS this is one-time Windows overhead + default layout rules for
    the linker. On Linux I get 15952 bytes by default, 14472 after
    stripping. However, the actual code + data size is 1904, and even
    most of this is crap needed to support extra features of the C library.

    In other words, this is mostly irrelevant, as people who want to
    get the size down can link with different options to get a smaller
    executable. The actual hello world code size is 99 bytes when compiled
    by gcc (default options) and 64 bytes by tcc. Again, gcc adds things
    like exception handling which increase the size for tiny files, but
    do not add much in a bigger file.

    I did
    hebisch@komp:~/kompi$ gcc -c hell2.c
    hebisch@komp:~/kompi$ tcc -o hell2.gcc hell2.o
    hebisch@komp:~/kompi$ tcc -c hell2.c
    hebisch@komp:~/kompi$ tcc -o hell2.tcc hell2.o
    hebisch@komp:~/kompi$ ls -l hell2.gcc hell2.tcc
    -rwxr-xr-x 1 hebisch hebisch 3680 Nov 24 04:21 hell2.gcc
    -rwxr-xr-x 1 hebisch hebisch 3560 Nov 24 04:21 hell2.tcc

    As you can see, when using tcc as a linker there is a small size
    difference due to extra exception handling code put there by gcc.
    This size difference will vanish in the noise when there is
    bigger real code. And when you are really determined, linker
    tricks can completely remove the exception handling code (AFAICS
    it is not needed for simple programs).

    As for mine (however it's possible I used it more recently):

    c:\c>tm cc hello
    Compiling hello.c to hello.exe
    TM: 0.04

    c:\c>dir hello.exe
    24/11/2024 00:52 2,560 hello.exe

    My installation is 0.3MB (excluding windows.h which is 0.6MB). Being self-contained, I can trivally apply UPX compression to get a 0.1MB compiler, which can be easily copied to a memory stick or bundled in one
    of my apps. However compiling hello.c now takes 0.05 seconds.

    (I don't use UPX because my apps are already tiny; it's just to marvel
    at how much redundancy they still contain, and how much tinier they
    could be.)

    I know none of this will cut any ice; for various reasons you don't want
    to use tcc.

    Well, I tried to use tcc when it first appeared. Unfortunately it
    could not compile some valid C code that I passed to it. I filed
    a bug report, but it was not fixed for several years. Shortly after
    that I got an AMD-64 machine and configured it as 64-bit only (one
    reason to do this was to avoid bloat due to having both 64-bit
    and 32-bit libraries). At that time and in the following several
    years tcc did not support 64-bit code, so it was not usable for me.
    Later IIRC it got 64-bit support, but I also needed ARM (and
    on ARM a faster compiler would make more difference).

    There is a question of trust: when what I reported remained unfixed,
    I lost faith in the quality of tcc. I still need to check if it is
    fixed now, but at least now tcc seems to have some development.

    One of them being that your build process involves N slow stages so
    speeding up just one makes little difference.

    Yes.

    This however is very similar to my argument about optimisation; a
    running app consists of lots of parts which take up execution time, not
    all of which can be speeded up by a factor of 9. The net benefit will be
    a lot less, just like your reduced build time.

    If I do not have good reasons to write a program in C, then likely I
    will write it in some higher-level language. One good reason
    to use C is to code performance-critical routines.

    And of course, there is a question why program with runtime that
    does not matter is written in a low level language?

    I mean it doesn't matter if it's half the speed. It might matter if it
    was 40 times slower.

    If you code bottlenecks in C, then 40 times slower may be OK for the
    rest. And there are compiled higher-level languages: you pay for
    higher-level features, but the overhead is much lower, closer to your
    half speed (and that is mostly due to a simpler code generator).

    There's quite a gulf between even unoptimised native code and even a
    fast dynamic language interpreter.

    People seem to think that the only choices are the fastest possible C
    code at one end, and slow CPython at the other:

    gcc/O3-tcc-----------------------------------------------------CPython

    On this scale, gcc/O3 code and tcc code are practically the same!

    There is OCaml, which offers an interpreter (faster than Python) and a
    compiler (which probably gives faster code than your 'mcc -opt').
    There are Lisp compilers. There are Java and C# (I am avoiding
    them as they depend on a sizeable runtime and due to proprietary
    games played by the vendors).

    IME the big productivity boost comes from garbage collection. But
    nobody knows how to make cooperating garbage collectors. So
    each garbage-collected runtime forms its own island which has
    trouble reusing code from other garbage-collected environments.
    ATM Python is the biggest kind-of garbage-collected environment, so
    people are attracted to it to reuse existing code.

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Sun Nov 24 22:46:04 2024
    Bart <bc@freeuk.com> wrote:
    On 22/11/2024 12:51, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    int main(void) {
        int a;
        int* p = 0;
        a = *p;
    }

    Here's what happens with my C compiler when told to interpret it:

    c:\cx>cc -i c
    Compiling c.c to c.(int)
    Error: Null ptr access

    Here's what happens with gcc:

    c:\cx>gcc c.c
    c:\cx>a
    <crashes>

    Is there some option to insert such a check with gcc? I've no idea; most >>> people don't.

    I would do

    gcc -g c.c
    gdb a.out
    run

    and gdb would show me the place with the bad access. Things like
    bounds-checking array accesses or overflow checking make a big
    difference. Null pointer access is reliably detected by hardware,
    so no big deal. Say what your 'cc' will do with the following function:

    int
    foo(int n) {
        int a[10];
        int i;
        int res = 0;
        for(i = 0; i <= 10; i++) {
            a[i] = n + i;
        }
        for(i = 0; i <= 10; i++) {
            res += a[i];
        }
        return res;
    }

    Here gcc at compile time says:

    foo.c: In function ‘foo’:
    foo.c:15:17: warning: iteration 10 invokes undefined behavior [-Waggressive-loop-optimizations]
    15 | res += a[i];
    | ~^~~
    foo.c:14:18: note: within this loop
    14 | for(i = 0; i <= 10; i++) {
    | ~~^~~~~

    My 'cc -i' wouldn't detect it. The -i tells it to run an interpreter on
    the intermediate code. Within the interpreter, some things are easily checked, but bounds info on arrays doesn't exist. (The IL supports only pointer operations, not high level array ops.)

    That would need intervention at an earlier stage, but even then, the
    design of C makes that difficult. First, because array types like
    int[10] decay to simple pointers, and ones represented by types like
    int* don't have bounds info at all. (I don't support int[n] params and
    few people use them anyway.)

    There is a well-known technique of "fat pointers": the pointer keeps
    info about the area + the actual pointer, so 3 machine words instead
    of 1. This has some trouble when you convert between pointers and
    integers, but the program above is not doing this.

    In the program above one could use simple compile-time checking:
    keep info about the array declaration (which you need anyway to
    implement 'sizeof'), and when the array "decays" to
    a pointer during an access, keep info about the bounds. Using VMT that could be
    extended to the whole program (of course it would fail when the
    user passes pointers in the traditional C way, but it would work
    for "well behaved" programs).

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Sun Nov 24 22:47:58 2024
    Bart <bc@freeuk.com> wrote:
    On 24/11/2024 01:36, Bart wrote:
    On 24/11/2024 00:24, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    And of course, there is a question why program with runtime that
    does not matter is written in a low level language?

    I mean it doesn't matter if it's half the speed. It might matter if it
    was 40 times slower.

    There's quite a gulf between even unoptimised native code and even a
    fast dynamic language interpreter.

    People seem to think that the only choices are the fastest possible C
    code at one end, and slow CPython at the other:

    gcc/O3-tcc-----------------------------------------------------CPython

    On this scale, gcc/O3 code and tcc code are practically the same!

    (I wasn't able to post results earlier because CPython hadn't finished.
    But for a JPEG decoder test on an 85Mpixel image, all using the same algorithm:

    gcc-O3      2.2 seconds
    mm6-opt     3.3 seconds   (My older compiler with the register optim.)
    mm7         5.7 seconds   (My unoptimising new one)
    cc          6.0 seconds   (Unoptimising)
    tcc         8.1 seconds
    PyPy         43 seconds   (Uses JIT to optimise hot loops to native code)
    CPython     386 seconds)

    That looks like an example of a program that should use an optimizing
    compiler.

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Sun Nov 24 23:18:39 2024
    Bart <bc@freeuk.com> wrote:
    On 23/11/2024 16:45, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    FYI, ATM I have a version compiling via Lisp; with bounds checking
    on it takes 0.58s, with bounds checking off it takes 0.43s
    on my machine. The reason to look at a C version is to do better.
    Taken together, your and my timings indicate that your 'cc' will
    give me less speed than going via Lisp. 'mcc -opt' probably would
    give an improvement, but not compared to 'gcc'. BTW, below are times
    on a slower machine (a 5-year-old cheap laptop):

    gcc -O3 -march=native 1722910us
    gcc -O3 1720884us
    gcc -O 1642328us
    tcc 7661992us

    via Lisp, checking 5.29s
    via Lisp, no checking 4.27s

    With -O3 gcc vectorizes inner loops, but apparently on this machine
    it backfires and execution time is longer than without vectorization.

    In both cases 'tcc' gives slower code than going via Lisp with
    array bounds checking on, so ATM using 'tcc' for this application
    is rather unattractive.

    Lisp is a rather mysterious language which can apparently be and do anything: it can be interpreted or compiled.

    If a parser generates a parse tree, then you can use it as input
    to an actual compiler. Or you can interpret it. That applies
    to almost any language.

    Statically typed or
    dynamic.

    Normal Lisp data is tagged, so one can use dynamic typing. But
    Lisp also has type declarations which basically say to the compiler
    "trust me, this will always be an integer" (when needed, replace
    integer by some other type). Lisp has a subset which is similar
    to Fortran 77: there are arrays, conditionals, loops etc. Arrays
    may be specialized, say so that they can keep only machine integers
    or doubles. Lisp declarations work in a similar way to Fortran 77
    declarations: they tell the compiler to use machine instructions
    for the specified type. The difference is that, lacking declarations,
    Lisp will use dynamic typing. Anyway, it is possible to translate
    Fortran 77 into Lisp, and the speed of the resulting code depends mainly
    on the quality of the code generator.

    This approach could be used for a lot of different languages;
    it is just that many language implementations do not bother to provide
    a compiler, and then declarations have little effect.

    Imperative or functional.

    It can also apparently be implemented in a few dozen lines or code.

    A few dozen lines is a minimal old Lisp implemented in Lisp. The smallest
    implementation in C is very minimal and has about 500 lines.
    There is a Lisp standard; to implement it you probably need about
    20000 lines. Anyway, modern Lisp implementations are much larger
    than your languages.

    - most of code is portable, but for timing we need timer with
    sufficient resolution, so I use Unix 'gettimeofday'.

    Why? Just make the task take long enough.

    Well, Windows 'clock' looks OK, but some old-style timing routines
    have really low resolution and would lead to excessive run
    time (I need to run a rather large number of tests).

    I've tried all sorts, from Windows' high performance routines, down to the
    x64 RDTSC instruction. They all gave unreliable, variable results. Now
    I just use 'clock', but might turn off all other apps for extra consistency.

    On Linux 'gettimeofday' reliably gives real time with good
    resolution. There is one problem: CPUs now have a variable-frequency
    clock; they use a slow clock when load is low and switch
    to the fast clock only under heavier load. One way to solve it
    is to use a utility which pins the clock to a specific frequency.
    Another is to run long enough to switch to the higher frequency,
    so that the lower-frequency part does not matter much.
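
    [A minimal sketch of the kind of timing harness being discussed, using the
    POSIX 'gettimeofday' mentioned above (on Windows, 'clock' or
    QueryPerformanceCounter would take its place); the workload here is a
    made-up placeholder:

    #include <stdio.h>
    #include <sys/time.h>               /* POSIX gettimeofday */

    static long long usec_now(void) {   /* wall-clock time in microseconds */
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return (long long)tv.tv_sec * 1000000 + tv.tv_usec;
    }

    int main(void) {
        volatile double s = 0.0;
        long long t0 = usec_now();
        for (int i = 0; i < 50000000; i++)  /* run long enough that the CPU
                                               leaves its idle clock speed */
            s += i * 0.5;
        long long t1 = usec_now();
        printf("%lld us (s = %g)\n", t1 - t0, (double)s);
        return 0;
    }
    ]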

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Sun Nov 24 23:20:20 2024
    On 24/11/2024 05:03, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    gcc is just a lumbering giant, an 870MB installation, while tcc is 2.5MB.

    Yes, but the exact size depends on which version you install and how you
    install it. I installed version 6.5 and removed debugging info from the
    executables. The result is 177MB, large but significantly smaller
    than what you have.

    Most of a gcc installation is hundreds of header and archive (.a) files
    for various libraries. There might be 32-bit and 64-bit versions. I
    understand that. But it also makes it hard to isolate the core compiler.

    I might try copying the directory tree to a pen-drive, but if there was
    one essential file missing out of 1000s, I wouldn't know. Test-running
    it from the pen-drive wouldn't work as, on Windows, it will likely be
    picking up those files from the original installation.

    In fact, it's quite hard to run two or more gcc versions on Windows,
    since they use the OS's list of search paths to look for support files (cc1.exe etc), rather than use a path relative to the location of the
    gcc.exe that was launched.

    A single-file compiler doesn't have that problem, as there are no
    auxiliary files!

    As for sizes:

    c:\c>dir hello.exe
    24/11/2024 00:44 2,048 hello.exe

    c:\c>dir a.exe
    24/11/2024 00:44 91,635 a.exe (48K with -s)

    (At least that's one good thing of gcc writing out that weird a.exe each
    time; I can compare both exes!)

    AFAICS this is one-time Windows overhead + default layout rules for
    the linker. On Linux I get 15952 bytes by default, 14472 after
    stripping. However, the actual code + data size is 1904, and even
    of this most is crap needed to support extra features of the C library.

    In other words, this is mostly irrelevant, as people who want to
    get size down can link with different options to get a smaller
    executable. The actual hello world code size is 99 bytes when compiled
    by gcc (default options) and 64 bytes by tcc.

    I get a size of 3KB for tcc compiling hello.c under WSL.

    On Windows, my cc compiler has the option of generating my private
    binary format called 'MX':

    c:\c>cc -mx hello
    Compiling hello.c to hello.mx

    c:\c>dir hello.mx
    24/11/2024 11:58 194 hello.mx

    Then the size is 194 bytes (most of that is a big header and list of
    default DLL files to import). However that requires a one-off launcher
    (12KB compiled as C) to run it:

    c:\c>runmx hello
    Hello, World!

    (In practice, MX files are bigger than equivalent EXEs since they
    contain more reloc info. I developed the format before I had options for PIC/relocatable code, which is necessary for OBJ/DLL formats.)



    I know none of this will cut any ice; for various reasons you don't want
    to use tcc.

    Well, I tried to use tcc when it first appeared.

    Up until 0.9.26 it was quite poor. That was the time I started my C
    compiler project (2017). At one point, I had a program (a lexing
    benchmark) which ran slightly faster under my dynamic language
    interpreter than using tcc-compiled native code!

    This was because of its poor implementation of 'switch' which figured
    heavily in my test.

    But later that year, 0.9.27 came out, which fixed such issues, and was a
    much better, complete and conforming C compiler all round than my product.


    There is a question of trust: when what I reported remained unfixed
    I lost faith in the quality of tcc. I still need to check if it is
    fixed now, but at least now tcc seems to have some development.

    One of them being that your build process involves N slow stages so
    speeding up just one makes little difference.

    Yes.

    This however is very similar to my argument about optimisation; a
    running app consists of lots of parts which take up execution time, not
    all of which can be speeded up by a factor of 9. The net benefit will be
    a lot less, just like your reduced build time.

    If I do not have good reasons to write a program in C, then I will likely
    write it in some higher-level language. One good reason
    to use C is to code performance-critical routines.

    It can also do manipulations that are harder in a 'softer', safer HLL.
    (My scripting language however can still do most of those underhand things.)




    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Mon Nov 25 02:00:17 2024
    Bart <bc@freeuk.com> wrote:
    On 24/11/2024 05:03, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    As for sizes:

    c:\c>dir hello.exe
    24/11/2024 00:44 2,048 hello.exe

    c:\c>dir a.exe
    24/11/2024 00:44 91,635 a.exe (48K with -s)

    (At least that's one good thing of gcc writing out that weird a.exe each time; I can compare both exes!)

    AFAICS this is one-time Windows overhead + default layout rules for
    the linker. On Linux I get 15952 bytes by defauls, 14472 after
    striping. However, the actual code + data size is 1904 and even
    in this most is crap needed to support extra features of C library.

    In other words, this is mostly irrelevant, as people who want to
    get size down can link it with different options to get smaller
    size down. Actual hello world code size is 99 bytes when compiled
    by gcc (default options) and 64 bytes by tcc.

    I get a size of 3KB for tcc compiling hello.c under WSL.

    That more or less agrees with the file size that I reported. I
    prefer to look at what 'size' reports and at the .o
    files, as this is more relevant when scaling to larger
    programs. Simply, 10000 programs with 16kB overhead each
    is 160MB of overhead. When it matters, I am likely to
    have much less than 10000 executables; 100 executables
    of 10MB each are more likely. Note that there is an old Unix
    trick of putting multiple programs into a single file
    (executable). The executable appears in the filesystem under,
    say, 100 names and performs different things depending on
    the name. There is dispatching code, something like 40
    bytes per name, so there is overhead, but much lower than
    having independent executables. So, per-program
    overhead can be quite small.
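
    [A minimal sketch of that trick; the applet names and bodies are made up
    for illustration (busybox is the best-known real example):

    #include <stdio.h>
    #include <string.h>

    static int tool_hello(void) { puts("hello"); return 0; }
    static int tool_bye(void)   { puts("bye");   return 0; }

    int main(int argc, char **argv) {
        (void)argc;
        /* dispatch on the name we were invoked under (strip any path;
           a Windows build would also strip '\\' and ".exe") */
        const char *name = strrchr(argv[0], '/');
        name = name ? name + 1 : argv[0];

        if (strcmp(name, "hello") == 0) return tool_hello();
        if (strcmp(name, "bye")   == 0) return tool_bye();
        fprintf(stderr, "%s: unknown applet\n", name);
        return 1;
    }

    Installed once and hard-linked under each applet name (ln box hello;
    ln box bye), each extra name costs only its dispatch line, roughly the
    kind of per-name overhead mentioned above.]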

    The larger size (16kB) is due to page alignment of program parts,
    which has some benefits. So there are trade-offs, and when
    size matters there are ways to save disc space. OTOH if the
    actual code takes a lot of space, then there is no easy
    solution.

    On Windows, my cc compiler has the option of generating my private
    binary format called 'MX':

    c:\c>cc -mx hello
    Compiling hello.c to hello.mx

    c:\c>dir hello.mx
    24/11/2024 11:58 194 hello.mx

    Then the size is 194 bytes (most of that is a big header and list of
    default DLL files to import). However that requires a one-off launcher
    (12KB compiled as C) to run it:

    c:\c>runmx hello
    Hello, World!

    (In practice, MX files are bigger than equivalent EXEs since they
    contain more reloc info. I developed the format before I had options for PIC/relocatable code, which is necessary for OBJ/DLL formats.)

    In Linux the typical filesystem block size is 4kB, so anything bigger
    than 0 takes at least 4kB. So super-small executables (I think the
    record is below 200 bytes) do not really save space. And they
    actually need more RAM, as the system first loads the program file into
    buffers. If the program is properly organized, it can be executed
    directly from the file buffer. But a super-small executable needs an extra
    copy, to put its parts in separate pages. So there is a compromise
    between memory use and disc space, and usually a moderate increase
    in disc use is considered worth the lower memory use.

    If I do not have good reasons to write program in C, then likely I
    will write it in some higher-level language. One good reason
    to use C is to code performance-critical routines.

    It can also do manipulations that are harder in a 'softer', safer HLL.
    (My scripting language however can still do most of those underhand things.)

    Anything computational can be done in a HLL. You may wish to
    play tricks to save time. Or possibly some packing tricks to
    save memory. But packing tricks can be done in a HLL (say by
    treating the whole memory as a big array of u64), so this really
    boils down to speed.
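
    [A minimal sketch of that packing trick, written here in C but doable in
    any language that exposes a flat array of 64-bit integers:

    #include <stdint.h>
    #include <stdio.h>

    /* pack two 32-bit values into one u64 slot and pull them back out */
    static uint64_t pack2(uint32_t lo, uint32_t hi) { return ((uint64_t)hi << 32) | lo; }
    static uint32_t lo32(uint64_t w) { return (uint32_t)w; }
    static uint32_t hi32(uint64_t w) { return (uint32_t)(w >> 32); }

    int main(void) {
        uint64_t mem[4] = {0};        /* "whole memory" as an array of u64 */
        mem[0] = pack2(123, 456);
        printf("%u %u\n", (unsigned)lo32(mem[0]), (unsigned)hi32(mem[0]));
        return 0;
    }
    ]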

    You may wish to write an OS or to interact with hardware, but
    here I usually want optimization. Maybe not as aggressive
    as modern gcc, but at least of the order of gcc-1 (which
    would probably have compile times tens of times lower than modern
    gcc).

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Mon Nov 25 04:50:36 2024
    On 24/11/2024 15:00, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:
    On 24/11/2024 05:03, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    As for sizes:

    c:\c>dir hello.exe
    24/11/2024 00:44 2,048 hello.exe

    c:\c>dir a.exe
    24/11/2024 00:44 91,635 a.exe (48K with -s)

    (At least that's one good thing of gcc writing out that weird a.exe each time; I can compare both exes!)

    AFAICS this is one-time Windows overhead + default layout rules for
    the linker. On Linux I get 15952 bytes by defauls, 14472 after
    striping. However, the actual code + data size is 1904 and even
    in this most is crap needed to support extra features of C library.

    In other words, this is mostly irrelevant, as people who want to
    get size down can link it with different options to get smaller
    size down. Actual hello world code size is 99 bytes when compiled
    by gcc (default options) and 64 bytes by tcc.

    I get a size of 3KB for tcc compiling hello.c under WSL.

    That more or less agrees with file size that I reported. I
    prefer to look at what 'size' reports and at looking at .o
    files,

    Oh, I thought you were reporting sizes of 99 and 64 bytes, in response
    to tcc's 2048 bytes.

    So I'm not sure what you mean by 'actual' size, unless it is the same as
    that reported by my product here (comments added):

    c:\cx>cc -v hello
    Compiling hello.c to hello.exe
    Code size: 34 bytes # .text
    Idata size: 15 # .data
    Code+Idata: 49
    Zdata size: 0 # .bss
    EXE size: 2,560

    So at 49 bytes, I guess I win! But in terms of actual file-size, since
    both tcc/cc can run programs from source, then all that's needed is
    hello.c, 53 bytes minimum.


    It can also do manipulations that are harder in a 'softer', safer HLL.
    (My scripting language however can still do most of those underhand things.)

    Anything computational can be done in a HLL. You may wish to
    play tricks to save time. Or possible some packing tricks to
    save memory. But packing tricks can be done in HLL (say by
    treating whole memory as a big array of u64), so this really
    boils down to speed.

    I'm sure that with Python, say, pretty much anything can be done given
    enough effort. Even if it means cheating by using external add-on
    modules to get around language limitations, like using the ctypes module,
    which you will likely find uses C code.

    This is different from having things as part of the core language so they
    become effortless and natural.

    But, everything you've said seems to have backed up my remark that
    people only seem to consider two possibilities:

    * Either a scripting language where it doesn't matter that it's 1-2
    magnitudes slower than native code

    * Or a compiled language where it absolutely MUST be at least as fast as gcc/clang-O3. Only 20 times faster than CPython is not enough!

    (In my JPEG timings I posted earlier today, CPython was 175 times slower
    than gcc-O3, and 48-64 times slower than unoptimised C.

    Applying the simplest optimisation (which I can tell you adds only 10% to
    compilation time) made native code over 100 times faster than CPython,
    and only 50 times slower than gcc-O3. This was on a deliberately large input.

    Basically, if you are generating even the worst native code, then it
    will already wipe the floor with any scripting language, when comparing
    them both executing the same algorithm.)



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Mon Nov 25 06:35:37 2024
    Bart <bc@freeuk.com> wrote:
    On 24/11/2024 15:00, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:
    On 24/11/2024 05:03, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    As for sizes:

    c:\c>dir hello.exe
    24/11/2024 00:44 2,048 hello.exe

    c:\c>dir a.exe
    24/11/2024 00:44 91,635 a.exe (48K with -s)

    (At least that's one good thing of gcc writing out that weird a.exe each time; I can compare both exes!)

    AFAICS this is one-time Windows overhead + default layout rules for
    the linker. On Linux I get 15952 bytes by defauls, 14472 after
    striping. However, the actual code + data size is 1904 and even
    in this most is crap needed to support extra features of C library.

    In other words, this is mostly irrelevant, as people who want to
    get size down can link it with different options to get smaller
    size down. Actual hello world code size is 99 bytes when compiled
    by gcc (default options) and 64 bytes by tcc.

    I get a size of 3KB for tcc compiling hello.c under WSL.

    That more or less agrees with file size that I reported. I
    prefer to look at what 'size' reports and at looking at .o
    files,

    Oh, I thought you were reporting sizes of 99 and 64 bytes, in response
    to tcc's 2048 bytes.

    So I'm not sure what you mean by 'actual' size, unless it is the same as this reported by my product here (comments added):

    c:\cx>cc -v hello
    Compiling hello.c to hello.exe
    Code size: 34 bytes # .text
    Idata size: 15 # .data
    Code+Idata: 49
    Zdata size: 0 # .bss
    EXE size: 2,560

    So at 49 bytes, I guess I win!

    It looks so. Yes, I mean code + data size; if you have multiple
    functions this adds up, while the constant overhead remains constant.
    On Linux each program is supposed to have a header, and that
    puts an absolute lower bound on the size of the program (no header =>
    the OS considers it invalid). In modern programs you are
    supposed to have a separate code area, read-only data area and
    mutable data area. In a running program each of them consists
    of an integral number of pages. If you arrange them so that the OS
    can load them most easily, you get something like 12kB or 16kB
    (actually a bit smaller, as normally the file will not contain
    the unused part of the last page). But if you add more code or
    data the size will grow only slightly or not at all: you
    will see growth on the last page, and when one of the inner pages
    overflows you need to start a new page.
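
    [The binutils 'size' tool is the usual way to see that code + data figure;
    a small illustration, with the numbers omitted since they vary by system
    and C library:

    $ gcc hello.c -o hello
    $ size hello
       text    data     bss     dec     hex filename

    'text' is code, 'data' is initialised data, 'bss' is zero-initialised
    data; the file on disc is larger because of headers and page alignment.]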

    It can also do manipulations that are harder in a 'softer', safer HLL.
    (My scripting language however can still do most of those underhand things.)

    Anything computational can be done in a HLL. You may wish to
    play tricks to save time. Or possible some packing tricks to
    save memory. But packing tricks can be done in HLL (say by
    treating whole memory as a big array of u64), so this really
    boils down to speed.

    I'm sure that with Python, say, pretty much anything can be done given enough effort. Even if it means cheating by using external add-on
    modules to get around language limitations, like using Ctypes module,
    which you will likely find uses C code.

    I did not look at how Python does its things. In one system that
    I use there is a rather general routine written in assembler which
    can call routines using the C calling convention. The assembler routine
    performs simple data conversions, like removing tags, so that C
    sees raw machine integers or floats. It also knows which arguments
    are supposed to go on the stack and which should be in registers.
    There is a less complete routine which allows callbacks from C;
    this one abuses C (it is invalid C which happens to work OK in
    all C compilers used to compile the system). There is a bunch of
    other assembler support routines, like access to arbitrary
    bitstrings, byte copy (used to copy arrays when needed), etc.

    The rest is in the language itself: the code generator knows about
    references to bitstrings, and in simple cases generates inline
    code and passes the general case to assembler support. There are
    language-defined data structures to represent external pointers
    and functions. At a higher level there is a parser for C
    declarations which can generate code to repack data structures
    from the C version to the internal one and back.

    Concerning cheating, of course Python is cheating a lot. It has
    several routines which work on sizeable pieces of data. Those
    routines are coded in C or C++, so you get optimized C speed
    when you call them.

    This is different from having things part of the core language so they become effortless and natural.

    But, everything you've said seems to have backed up my remark that
    people only seem to consider two possibilities:

    * Either a scripting language where it doesn't matter that it's 1-2 magnitudes slower than native code

    * Or a compiled language where it absolutely MUST be at least as fast as gcc/clang-O3. Only 20 times faster than CPython is not enough!

    You ignored what I wrote about compiled higher-level languages:
    they exist, have speed competitive with your low-level language,
    and some people use them. The majority seem to go with interpreted
    languages. Note that interpreted languages frequently have a
    large library of performance-critical routines written in a
    lower-level language. Do not be surprised that people want an
    optimizing compiler for those routines.

    (In my JPEG timings I posted earlier today, CPython was 175 times slower than gcc-O3. and 48-64 times slower than unoptimised C.

    Applying the simplest optimsation, which I can tell you adds only 10% to compilation time) made native code over 100 times faster than CPython,
    and only 50 slower than gcc-O3. This was on a deliberately large input

    Basically, if you are generating even the worst native code, then it
    will already wipe the floor with any scripting language, when comparing
    them both executing the same algorithm.)

    But the competition is not fair; the other side is cheating. Note
    that using a low-level language the coding effort will be comparable
    to C. You may save some time if you get better diagnostics.
    There were studies claiming that stronger type checking reduces the
    effort needed to write a correct program. But the main increase in
    productivity comes from higher-level constructs. Actually,
    probably the biggest gain is when you can reuse existing code, which
    means that popular languages have a very big advantage over less
    popular ones. You need rather strong advantages to
    overcome the popularity advantage of another language. Faster
    compilation, while nice, has a limited effect. And people have
    ways to mitigate long compile times. So the normal justification
    for using a low-level language is "I need runtime speed". And
    in such a case it is natural to use the compiler giving the fastest
    runtime speed.

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Keith Thompson@3:633/280.2 to All on Mon Nov 25 07:01:43 2024
    Bart <bc@freeuk.com> writes:
    [...]
    Most of a gcc installation is hundreds of header and archive (.a)
    files for various libraries. There might be 32-bit and 64-bit
    versions. I understand that. But it also makes it hard to isolate the
    core compiler.
    [...]

    That doesn't agree with my observations.

    Of course most of the headers and libraries are not part of gcc itself.
    As usual, you refer to the entire implementation as "gcc".

    I've built gcc 14.2.0 and glibc 2.40 from source on Ubuntu 22.04.5,
    installing each into a new directory.

    The gcc installation is about 5.6 GB, reduced to about 1.9 GB if I strip
    the executables.

    The glibc installation (libraries and headers) is about 199 MB, a small fraction of the size of the gcc installation.

    Of course there are other libraries that can be used with gcc, and they
    could take a lot of space -- but they're not part of gcc.

    These sizes might differ on Windows.

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: None to speak of (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Mon Nov 25 07:52:54 2024
    On 24/11/2024 20:01, Keith Thompson wrote:
    Bart <bc@freeuk.com> writes:
    [...]
    Most of a gcc installation is hundreds of header and archive (.a)
    files for various libraries. There might be 32-bit and 64-bit
    versions. I understand that. But it also makes it hard to isolate the
    core compiler.
    [...]

    That doesn't agree with my observations.

    Of course most of the headers and libraries are not part of gcc itself.
    As usual, you refer to the entire implementation as "gcc".

    I've built gcc 14.2.0 and glibc 2.40 from source on Ubuntu 22.04.5, installing each into a new directory.

    The gcc installation is about 5.6 GB, reduced to about 1.9 GB if I strip
    the executables.

    That's even huger than mine! So, what are those 3.7GB full of? What does
    the 1.9GB of executables do?


    The glibc installation (libraries and headers) is about 199 MB, a small fraction of the size of the gcc intallation.

    Is that included in one of those two divisions above?


    Of course there are other libraries that can be used with gcc, and they
    could take a lot of space -- but they're not part of gcc.

    So, what /is/ gcc? What's the minimum installation that can compile
    hello.c to hello.s for example?

    I've done that experiment on my TDM version, and the answer appears to
    be about 40MB in this directory structure:

    Directory of c:\tdm\bin
    24/07/2024 10:21 1,926,670 gcc.exe
    24/07/2024 10:21 2,279,503 libisl-23.dll
    24/07/2024 10:22 164,512 libmpc-3.dll
    24/07/2024 10:22 702,852 libmpfr-6.dll

    Directory of c:\tdm\libexec\gcc\x86_64-w64-mingw32\14.1.0
    24/07/2024 10:24 34,224,654 cc1.exe

    Directory of c:\tdm\x86_64-w64-mingw32\include
    17/01/2021 17:33 368 stddef.h
    27/03/2021 20:07 2,924 stdio.h

    7 File(s) 39,301,483 bytes

    Here I cheated a little and used the minimum std headers from my
    compiler, otherwise I could have spent an hour chasing down dozens of
    obscure nested headers that gcc's stdio.h likes to make use of.

    Is /this/ gcc then? Will you agree that it is by no means clear what
    'gcc' includes, or what to call the part of a gcc installed bundle that
    is not technically gcc?

    A more useful installation would of course need more standard headers,
    an assembler, linker, and whatever .a files are needed to provide the
    standard library.

    With clang, it is easier: apparently everything needed to do the above,
    other than header files, is contained with a 120MB executable clang.exe.

    However the full 2.8GB llvm/clang installation doesn't provide any
    headers, nor a linker. At least it doesn't use the provided 88MB (!)
    lld.exe; it expects to work on top of MSVC, which it has never managed
    to do.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Mon Nov 25 08:45:39 2024
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    Bart <bc@freeuk.com> writes:
    [...]
    Most of a gcc installation is hundreds of header and archive (.a)
    files for various libraries. There might be 32-bit and 64-bit
    versions. I understand that. But it also makes it hard to isolate the
    core compiler.
    [...]

    That doesn't agree with my observations.

    Of course most of the headers and libraries are not part of gcc itself.
    As usual, you refer to the entire implementation as "gcc".

    I've built gcc 14.2.0 and glibc 2.40 from source on Ubuntu 22.04.5, installing each into a new directory.

    The gcc installation is about 5.6 GB, reduced to about 1.9 GB if I strip
    the executables.

    That is much larger than what I got. On Debian 12.7 I used
    '--disable-multilib --enable-languages=c,c++,objc,obj-c++,fortran,ada,m2,go'.

    IIRC it was something like 2.4G originally and 1012176k after
    stripping. AFAICS with earlier versions the ARM compiler was much
    bigger than the x86_64 one, mainly because ARM had libraries for
    several variants of the architecture. Header files are not
    that big (but still several megabytes), but the libraries seem to
    be quite large (I did not check, but it is possible that the
    libraries still contain debug info).

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Keith Thompson@3:633/280.2 to All on Mon Nov 25 08:45:55 2024
    Bart <bc@freeuk.com> writes:
    On 24/11/2024 20:01, Keith Thompson wrote:
    Bart <bc@freeuk.com> writes:
    [...]
    Most of a gcc installation is hundreds of header and archive (.a)
    files for various libraries. There might be 32-bit and 64-bit
    versions. I understand that. But it also makes it hard to isolate the
    core compiler.
    [...]
    That doesn't agree with my observations.
    Of course most of the headers and libraries are not part of gcc
    itself.
    As usual, you refer to the entire implementation as "gcc".
    I've built gcc 14.2.0 and glibc 2.40 from source on Ubuntu 22.04.5,
    installing each into a new directory.
    The gcc installation is about 5.6 GB, reduced to about 1.9 GB if I
    strip
    the executables.

    That's even huger than mine! So, that are those 3.7GB full of? What
    does the 1.9GB of executables do?

    I installed compilers for multiple languages. A more typical
    installation likely won't include compilers for Ada, Go, Fortran,
    Modula-2, and Rust. There are a number of hard links to other files;
    for example c++, g++, x86_64-pc-linux-gnu-c++, and
    x86_64-pc-linux-gnu-g++ are all the same file. Apparently `du` is
    clever enough to count them only once.

    Here's the output of `ls -s` on the bin directory (sizes are in units of
    1024 bytes) :

    total 611908
    8828 c++ 8960 gm2 8828 x86_64-pc-linux-gnu-c++
    8820 cpp 8264 gnat 8828 x86_64-pc-linux-gnu-g++
    8828 g++ 13092 gnatbind 8820 x86_64-pc-linux-gnu-gcc
    8820 gcc 9556 gnatchop 8820 x86_64-pc-linux-gnu-gcc-14.2.0
    156 gcc-ar 12564 gnatclean 156 x86_64-pc-linux-gnu-gcc-ar
    156 gcc-nm 7864 gnatkr 156 x86_64-pc-linux-gnu-gcc-nm
    152 gcc-ranlib 8564 gnatlink 152 x86_64-pc-linux-gnu-gcc-ranlib
    8828 gccgo 12764 gnatls 8828 x86_64-pc-linux-gnu-gccgo
    8820 gccrs 13584 gnatmake 8820 x86_64-pc-linux-gnu-gccrs
    7784 gcov 12236 gnatname 8828 x86_64-pc-linux-gnu-gdc
    6324 gcov-dump 12308 gnatprep 8824 x86_64-pc-linux-gnu-gfortran
    6468 gcov-tool 11136 go 8960 x86_64-pc-linux-gnu-gm2
    8828 gdc 620 gofmt
    8824 gfortran 308740 lto-dump

    The glibc installation (libraries and headers) is about 199 MB, a small
    fraction of the size of the gcc intallation.

    Is that included in one of those two divisions above?

    Of course not. glibc is not part of gcc.

    Of course there are other libraries that can be used with gcc, and they
    could take a lot of space -- but they're not part of gcc.

    So, what /is/ gcc? What's the minimum installation that can compile
    hello.c to hello.s for example?

    Those are two separate questions. gcc by itself can't compile hello.c
    to hello.s. But it's always installed along with other tools that allow
    it to do so, as part of what the C standard calls an "implementation".

    You can't compile hello.c to hello.s without an OS kernel, but I presume
    you'd agree that the kernel isn't part of gcc. And hello.s isn't useful without an assembler, which is not treated as part of gcc.

    gcc is a compiler, or rather a compiler collection. (The "gcc" command
    is the C compiler component of the "gcc" compiler collection.) Since
    gcc does not provide <stdio.h>, I presume that a standalone gcc would
    not be able to compile hello.c without depending on a library, whether
    that library is installed separately or as part of a package like
    tdm-gcc (there's nothing wrong with either approach).

    I should also acknowledge that the "gcc" package, whether it's provided
    as source code or as binaries, provides some files that are not part of
    the compiler itself, for example library files that are closely tied to
    the compiler. Installable software packages don't have to follow any particular division between compiler, library, and other components.

    When I install gcc, binutils, and glibc from the Ubuntu package manager,
    the binaries are installed in common directories (/usr/bin, /usr/lib, et
    al). There's no "gcc directory" or "glibc directory". But the system
    keeps track of which files were installed from which packages.

    Perhaps you don't care what is or isn't part of "gcc". If that's the
    case, that's fine, but it would help if you'd stop referring to things
    as "gcc" without knowing what that means. You're using "gcc-tdm"; just
    call it that.

    I've done that experiment on my TDM version, and the answer appears to
    be about 40MB in this directory structure:

    Directory of c:\tdm\bin
    24/07/2024 10:21 1,926,670 gcc.exe
    24/07/2024 10:21 2,279,503 libisl-23.dll
    24/07/2024 10:22 164,512 libmpc-3.dll
    24/07/2024 10:22 702,852 libmpfr-6.dll

    Directory of c:\tdm\libexec\gcc\x86_64-w64-mingw32\14.1.0
    24/07/2024 10:24 34,224,654 cc1.exe

    Directory of c:\tdm\x86_64-w64-mingw32\include
    17/01/2021 17:33 368 stddef.h
    27/03/2021 20:07 2,924 stdio.h

    7 File(s) 39,301,483 bytes

    Here I cheated a little and used the minimum std headers from my
    compiler, otherwise I could have spent an hour chasing down dozens of
    obscure nested headers that gcc's stdio.h likes to make use of.

    Is /this/ gcc then? Will you agree that it is by no means clear what
    'gcc' includes, or what to call the part of a gcc installed bundle
    that is not technically gcc?

    It's not entirely clear, but it's much clearer than you make it out to
    be.

    One thing that should be obvious by now is that stdio.h is not part of
    "gcc", though it's probably part of "gcc-tdm". On my system, stddef.h
    is provided by libgcc-11-dev, which is closely associated with gcc. I'm
    not entirely sure why gcc-11 and libgcc-11-dev (the Ubuntu binary
    packages) are separate -- nor do I have to care, since the package
    management system is clever enough to recognize the dependencies and
    keep them in sync.

    A more useful installation would of course need more standard headers,
    an assembler, linker, and whatever .a files are needed to provide the standard library.

    Sure, those are all part of a C implementation, though they're not part
    of gcc.

    With clang, it is easier: apparently everything needed to do the
    above, other than header files, is contained with a 120MB executable clang.exe.

    That may be true for the "clang.exe" on your system. I'm fairly sure
    it's not true for the "/usr/bin/clang" on my system. Perhaps you
    installed some Windows package that provides the clang compiler and
    other components of a C implementation, similar to the way gcc-tdm
    provides gcc and other components.

    However the full 2.8GB llvm/clang installation doesn't provide any
    headers, nor a linker. At least it doesn't use the provided 88MB (!)
    lld.exe; it expects to work on top of MSVC, which it has never managed
    to do.

    I suspect others have managed it, but I haven't tried (I don't use
    llvm/clang on Windows other than via Cygwin and WSL). But apparently MS
    Visual Studio can be configured to use clang as its compiler.

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: None to speak of (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Mon Nov 25 09:21:59 2024
    Bart <bc@freeuk.com> wrote:
    On 24/11/2024 20:01, Keith Thompson wrote:
    Bart <bc@freeuk.com> writes:
    [...]
    Most of a gcc installation is hundreds of header and archive (.a)
    files for various libraries. There might be 32-bit and 64-bit
    versions. I understand that. But it also makes it hard to isolate the
    core compiler.
    [...]

    That doesn't agree with my observations.

    Of course most of the headers and libraries are not part of gcc itself.
    As usual, you refer to the entire implementation as "gcc".

    I've built gcc 14.2.0 and glibc 2.40 from source on Ubuntu 22.04.5,
    installing each into a new directory.

    The gcc installation is about 5.6 GB, reduced to about 1.9 GB if I strip
    the executables.

    That's even huger than mine! So, that are those 3.7GB full of? What does
    the 1.9GB of executables do?

    The 3.7GB is debug info which Keith removed. gcc is now written in
    C++, and when you compile with debug info on, about 90% of the executable
    is debug info.

    Of course there are other libraries that can be used with gcc, and they
    could take a lot of space -- but they're not part of gcc.

    So, what /is/ gcc? What's the minimum installation that can compile
    hello.c to hello.s for example?

    I've done that experiment on my TDM version, and the answer appears to
    be about 40MB in this directory structure:

    Directory of c:\tdm\bin
    24/07/2024 10:21 1,926,670 gcc.exe
    24/07/2024 10:21 2,279,503 libisl-23.dll
    24/07/2024 10:22 164,512 libmpc-3.dll
    24/07/2024 10:22 702,852 libmpfr-6.dll

    Directory of c:\tdm\libexec\gcc\x86_64-w64-mingw32\14.1.0
    24/07/2024 10:24 34,224,654 cc1.exe

    That is a reasonably good approximation to the compiler proper.
    More precisely, to compile you need 'cc1.exe' and the libraries
    it uses. On Linux I get:
    ldd /sklad0/p0/kompi/gcc_pp/usr_14.2.0/libexec/gcc/x86_64-pc-linux-gnu/14.2.0/cc1
    linux-vdso.so.1 (0x00007ffc8a8f2000)
    libmpc.so.3 => /lib/x86_64-linux-gnu/libmpc.so.3 (0x00007fa55e071000)
    libmpfr.so.6 => /lib/x86_64-linux-gnu/libmpfr.so.6 (0x00007fa55dfb7000)
    libgmp.so.10 => /lib/x86_64-linux-gnu/libgmp.so.10 (0x00007fa55df36000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fa55de57000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa55dc76000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fa55e0a9000)

    This is the list of libraries needed by 'cc1'. /lib64/ld-linux-x86-64.so.2,
    libc.so.6 and libm.so.6 are system libraries needed by almost all
    things. linux-vdso.so.1 is a virtual thing; IIUC there is nothing
    corresponding to it on the disc.

    Directory of c:\tdm\x86_64-w64-mingw32\include
    17/01/2021 17:33 368 stddef.h
    27/03/2021 20:07 2,924 stdio.h

    7 File(s) 39,301,483 bytes

    Here I cheated a little and used the minimum std headers from my
    compiler, otherwise I could have spent an hour chasing down dozens of obscure nested headers that gcc's stdio.h likes to make use of.

    Yes, besides the compiler proper you also need the headers used by the C
    file.

    Is /this/ gcc then? Will you agree that it is by no means clear what
    'gcc' includes, or what to call the part of a gcc installed bundle that
    is not technically gcc?

    A more useful installation would of course need more standard headers,
    an assembler, linker, and whatever .a files are needed to provide the standard library.

    Debian splits gcc into several packages. One of them is 'cpp-12'
    and this one gives you 'cc1' (that is, the compiler proper). There
    is 'gcc-12' which mainly provides extra features like
    lto (link-time optimization) and the sanitizers. It also provides
    things like 'collect2' (a wrapper around the linker to add extra
    features) and 'x86_64-linux-gnu-gcc-ar-12' (I do not know why
    this is needed). 'gcc-12' pulls in several dependencies:

    cpp-12, gcc-12-base, libcc1-0, binutils, libgcc-12-dev,
    libc6, libgcc-s1, libgmp10, libisl23, libmpc3, libmpfr6,
    libstdc++6, libzstd1, zlib1g

    binutils gives you the assembler and linker, libgcc-s1 is the shared
    support library (needed to run dynamically linked programs),
    libgcc-12-dev contains the startup files (needed to link any program)
    and a bunch of libraries and headers supporting extra features,
    and libgmp10, libmpc3, libmpfr6 (and of course libc6) are needed
    to run the compiler. I am not sure about libisl23, libstdc++6,
    libzstd1 and zlib1g.

    To get standard header files you need to install 'libc6-dev'.

    With clang, it is easier: apparently everything needed to do the above, other than header files, is contained with a 120MB executable clang.exe.

    Probably you mean things needed to run the compiler. clang-compiled executables need libraries too; on Debian these are shared with gcc.

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Mon Nov 25 11:00:59 2024
    On 24/11/2024 21:45, Keith Thompson wrote:
    Bart <bc@freeuk.com> writes:
    On 24/11/2024 20:01, Keith Thompson wrote:
    Bart <bc@freeuk.com> writes:
    [...]
    Most of a gcc installation is hundreds of header and archive (.a)
    files for various libraries. There might be 32-bit and 64-bit
    versions. I understand that. But it also makes it hard to isolate the
    core compiler.
    [...]
    That doesn't agree with my observations.
    Of course most of the headers and libraries are not part of gcc
    itself.
    As usual, you refer to the entire implementation as "gcc".
    I've built gcc 14.2.0 and glibc 2.40 from source on Ubuntu 22.04.5,
    installing each into a new directory.
    The gcc installation is about 5.6 GB, reduced to about 1.9 GB if I
    strip
    the executables.

    That's even huger than mine! So, that are those 3.7GB full of? What
    does the 1.9GB of executables do?

    I installed compilers for multiple languages. A more typical
    installation likely won't include compilers for Ada, Go, Fortran,
    Modula-2, and Rust. There are a number of hard links to other files;
    for example c++, g++, x86_64-pc-linux-gnu-c++, and
    x86_64-pc-linux-gnu-g++ are all the same file. Apparently `du` is
    clever enough to count them only once.

    Here's the output of `ls -s` on the bin directory (sizes are in units of
    1024 bytes) :

    total 611908
    8828 c++ 8960 gm2 8828 x86_64-pc-linux-gnu-c++
    8820 cpp 8264 gnat 8828 x86_64-pc-linux-gnu-g++
    8828 g++ 13092 gnatbind 8820 x86_64-pc-linux-gnu-gcc
    8820 gcc 9556 gnatchop 8820 x86_64-pc-linux-gnu-gcc-14.2.0
    156 gcc-ar 12564 gnatclean 156 x86_64-pc-linux-gnu-gcc-ar
    156 gcc-nm 7864 gnatkr 156 x86_64-pc-linux-gnu-gcc-nm
    152 gcc-ranlib 8564 gnatlink 152 x86_64-pc-linux-gnu-gcc-ranlib
    8828 gccgo 12764 gnatls 8828 x86_64-pc-linux-gnu-gccgo
    8820 gccrs 13584 gnatmake 8820 x86_64-pc-linux-gnu-gccrs
    7784 gcov 12236 gnatname 8828 x86_64-pc-linux-gnu-gdc
    6324 gcov-dump 12308 gnatprep 8824 x86_64-pc-linux-gnu-gfortran
    6468 gcov-tool 11136 go 8960 x86_64-pc-linux-gnu-gm2
    8828 gdc 620 gofmt
    8824 gfortran 308740 lto-dump

    The glibc installation (libraries and headers) is about 199 MB, a small
    fraction of the size of the gcc intallation.

    Is that included in one of those two divisions above?

    Of course not. glibc is not part of gcc.

    Of course there are other libraries that can be used with gcc, and they
    could take a lot of space -- but they're not part of gcc.

    So, what /is/ gcc? What's the minimum installation that can compile
    hello.c to hello.s for example?

    Those are two separate questions. gcc by itself can't compile hello.c
    to hello.s. But it's always installed along with other tools that allow
    it to do so, as part of what the C standard calls an "implementation".

    You can't compile hello.c to hello.s without an OS kernel, but I presume you'd agree that the kernel isn't part of gcc. And hello.s isn't useful without an assembler, which is not treated as part of gcc.

    gcc is a compiler, or rather a compiler collection. (The "gcc" command
    is the C compiler component of the "gcc" compiler collection.) Since
    gcc does not provide <stdio.h>, I presume that a standalone gcc would
    not be able to compile hello.c without depending on a library, whether
    that library is installed separately or as part of a package like
    tdm-gcc (there's nothing wrong with either approach).

    I should also acknowledge that the "gcc" package, whether it's provided
    as source code or as binaries, provides some files that are not part of
    the compiler itself, for example library files that are closely tied to
    the compiler. Installable software packages don't have to follow any particular division between compiler, library, and other components.

    When I install gcc, binutils, and glibc from the Ubuntu package manager,
    the binaries are installed in common directories (/usr/bin, /usr/lib, et
    al). There's no "gcc directory" or "glibc directory". But the system
    keeps track of which files were install from which packages.

    Perhaps you don't care what is or isn't part of "gcc". If that's the
    case, that's fine, but it would help if you'd stop referring to things
    as "gcc" without knowing what that means. You're using "gcc-tdm"; just
    call it that.

    I've done that experiment on my TDM version, and the answer appears to
    be about 40MB in this directory structure:

    Directory of c:\tdm\bin
    24/07/2024 10:21 1,926,670 gcc.exe
    24/07/2024 10:21 2,279,503 libisl-23.dll
    24/07/2024 10:22 164,512 libmpc-3.dll
    24/07/2024 10:22 702,852 libmpfr-6.dll

    Directory of c:\tdm\libexec\gcc\x86_64-w64-mingw32\14.1.0
    24/07/2024 10:24 34,224,654 cc1.exe

    Directory of c:\tdm\x86_64-w64-mingw32\include
    17/01/2021 17:33 368 stddef.h
    27/03/2021 20:07 2,924 stdio.h

    7 File(s) 39,301,483 bytes

    Here I cheated a little and used the minimum std headers from my
    compiler, otherwise I could have spent an hour chasing down dozens of
    obscure nested headers that gcc's stdio.h likes to make use of.

    Is /this/ gcc then? Will you agree that it is by no means clear what
    'gcc' includes, or what to call the part of a gcc installed bundle
    that is not technically gcc?

    It's not entirely clear, but it's much clearer than you make it out to
    be.

    One thing that should be obvious by now is that stdio.h is not part of
    "gcc", though it's probably part of "gcc-tdm". On my system, stddef.h
    is provided by libgcc-11-dev, which is closely associated with gcc. I'm
    not entirely sure why gcc-11 and libgcc-11-dev (the Ubuntu binary
    packages) are separate -- nor do I have to care, since the package
    management system is clever enough to recognize the dependencies and
    keep them in sync.

    A more useful installation would of course need more standard headers,
    an assembler, linker, and whatever .a files are needed to provide the
    standard library.

    Sure, those are all part of a C implementation, though they're not part
    of gcc.


    This seems to be a thing with Linux, where a big chunk of a C
    implementation is provided by the OS.

    That is, standard headers, libraries, possibly even 'as' and 'ld'
    utilities. On Windows, C compilers tend to be self-contained (except for
    Clang which appears to be parasitical: it used to piggy-back onto gcc,
    then it switched to MSVC).

    I'm not sure what the utility to compile C programs is called, if it is
    not 'gcc'. But this is a C group, I would expect people to know it is a
    C compiler, or the front end of one.

    However I use 'gcc' in other forums and everyone knows what I mean.

    What do /you/ call the C compiler that is invoked by gcc?




    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Mon Nov 25 11:19:14 2024
    On 24/11/2024 22:21, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:




    With clang, it is easier: apparently everything needed to do the above,
    other than header files, is contained with a 120MB executable clang.exe.

    Probably you means things needed to run the compiler. clang compiled executable need libraries too, on Debian this is shared with gcc.

    No, this was a standalone 119MB clang.exe. I had to give it a tweaked
    hello.c without stdio.h, and it produced only hello.s.

    My cc.exe is 1/400th the size (99.75% smaller) and it can convert
    hello.c (/with/ stdio.h) to hello.exe, or any of half-dozen options
    within the same package (eg. interpret or run).

    (cc.exe concentrates on single-file programs. To compile multi-module programs, it needs a 200-line script, and an extra 0.1MB utility, an assembler-linker. Then outputs are limited to EXE/DLL/OBJ/MX.

    Most of my needs for a C compiler however are for programs contained
    within one file.)

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Keith Thompson@3:633/280.2 to All on Mon Nov 25 14:17:11 2024
    Bart <bc@freeuk.com> writes:
    On 24/11/2024 21:45, Keith Thompson wrote:
    Bart <bc@freeuk.com> writes:
    [...]
    A more useful installation would of course need more standard headers,
    an assembler, linker, and whatever .a files are needed to provide the
    standard library.
    Sure, those are all part of a C implementation, though they're not
    part of gcc.

    This seems to be a thing with Linux, where a big chunk of a C
    implementation is provided by the OS.

    I'm not sure what you mean by "provided by the OS". Linux-based
    systems tend to be very modular, with almost everything provided by
    some installable binary package. Some of those packages have to
    be provided by default, for example any dynamic libraries relied
    on by most executables. Files that are needed for development,
    such as header files, compilers, and associated tools such as
    assemblers and linkers, may be optional.

    That is, standard headers, libraries, possibly even 'as' and 'ld'
    utilities.

    On my system (Ubuntu), the as and ld commands are provided by the
    binutils package ("binutils-x86-64-linux-gnu"). Some distributions
    may install these by default. Others do not, but they're easy
    to install.

    On Windows, C compilers tend to be self-contained (except
    for Clang which appears to be parasitical: it used to piggy-back onto
    gcc, then it switched to MSVC).

    I don't know what you mean by "piggy-back onto gcc".

    I'm not sure what the utility to compile C programs is called, if it
    is not 'gcc'. But this is a C group, I would expect people to know it
    is a C compiler, or the front end of one.

    However I use 'gcc' in other forums and everyone knows what I mean.

    What do /you/ call the C compiler that is invoked by gcc?

    I call it gcc.

    "gcc" is the name for several things. It's the "GNU Compiler
    Collection". It's the command invoked as the driver for any of
    several compilers that are part of the GNU Compiler Collection.
    It can refer specifically to the C compiler. It's mildly confusing
    for historical reasons, but most people don't have much of a
    problem with it, and don't pretend that it's more confusing than
    it really is.

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: None to speak of (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Mon Nov 25 20:30:46 2024
    On Sun, 24 Nov 2024 13:45:55 -0800
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

    Bart <bc@freeuk.com> writes:
    On 24/11/2024 20:01, Keith Thompson wrote:
    Bart <bc@freeuk.com> writes:
    [...]
    Most of a gcc installation is hundreds of header and archive (.a)
    files for various libraries. There might be 32-bit and 64-bit
    versions. I understand that. But it also makes it hard to isolate
    the core compiler.
    [...]
    That doesn't agree with my observations.
    Of course most of the headers and libraries are not part of gcc
    itself.
    As usual, you refer to the entire implementation as "gcc".
    I've built gcc 14.2.0 and glibc 2.40 from source on Ubuntu 22.04.5,
    installing each into a new directory.
    The gcc installation is about 5.6 GB, reduced to about 1.9 GB if I
    strip
    the executables.

    That's even huger than mine! So, that are those 3.7GB full of? What
    does the 1.9GB of executables do?

    I installed compilers for multiple languages. A more typical
    installation likely won't include compilers for Ada, Go, Fortran,
    Modula-2, and Rust. There are a number of hard links to other files;
    for example c++, g++, x86_64-pc-linux-gnu-c++, and
    x86_64-pc-linux-gnu-g++ are all the same file. Apparently `du` is
    clever enough to count them only once.

    Here's the output of `ls -s` on the bin directory (sizes are in units
    of 1024 bytes) :

    total 611908
    8828 c++ 8960 gm2 8828 x86_64-pc-linux-gnu-c++
    8820 cpp 8264 gnat 8828 x86_64-pc-linux-gnu-g++
    8828 g++ 13092 gnatbind 8820 x86_64-pc-linux-gnu-gcc
    8820 gcc 9556 gnatchop 8820 x86_64-pc-linux-gnu-gcc-14.2.0
    156 gcc-ar 12564 gnatclean 156 x86_64-pc-linux-gnu-gcc-ar
    156 gcc-nm 7864 gnatkr 156 x86_64-pc-linux-gnu-gcc-nm
    152 gcc-ranlib 8564 gnatlink 152 x86_64-pc-linux-gnu-gcc-ranlib
    8828 gccgo 12764 gnatls 8828 x86_64-pc-linux-gnu-gccgo
    8820 gccrs 13584 gnatmake 8820 x86_64-pc-linux-gnu-gccrs
    7784 gcov 12236 gnatname 8828 x86_64-pc-linux-gnu-gdc
    6324 gcov-dump 12308 gnatprep 8824 x86_64-pc-linux-gnu-gfortran
    6468 gcov-tool 11136 go 8960 x86_64-pc-linux-gnu-gm2
    8828 gdc 620 gofmt
    8824 gfortran 308740 lto-dump


    67% of the bin directory of the i386 gcc13 compiler that I compiled from source
    on msys2 a few months ago is a single huge executable: i386-elf-lto-dump.exe, 410,230,002 bytes with symbols, 28,347,904 bytes stripped.
    Copying such file is not instant, even on SSD. Certainly takes time
    over internet.

    It does not look like I have any use for it, stripped or not. When I
    want a dump, I use a smaller utility, i386-elf-objdump.exe (14,740,647
    bytes with symbols, 2,242,048 bytes stripped), which already does more
    than I would know how to use.

    The Arm gcc12 compiler for small embedded targets (arm-none-eabi-gcc) in
    the same msys2 environment, which I did not compile from source, also
    contains arm-none-eabi-lto-dump.exe, and it is also the biggest exe by
    far, but at least it is stripped and only 23,728,128 bytes.







    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Mon Nov 25 22:21:23 2024
    Bart <bc@freeuk.com> wrote:

    This seems to be a thing with Linux, where a big chunk of a C
    implementation is provided by the OS.

    That is, standard headers, libraries, possibly even 'as' and 'ld'
    utilities. On Windows, C compilers tend to be self-contained (except for Clang which appears to be parasitical: it used to piggy-back onto gcc,
    then it switched to MSVC).

    You know that at source level there are separate projects: gcc proper,
    binutils and libc. libc provides the C library; however, the headers should
    be matched to the library, so libc also provides the headers.

    Linux has distributions, which besides the bare OS provide a lot of packages.
    The binary C library is used by almost all programs, so it is provided even
    in a minimal install. Linux has package managers, so everything you
    install may be split into small packages, but for the user it is just
    a matter of knowing a few crucial names; the package manager will install all
    dependencies.

    AFAIK Windows alone does not have a package manager, and you apparently
    reject package managers provided by third parties. So the only
    viable approach is to install a big bundle (a "self-contained compiler").
    There is also a commercial aspect: even if it is a free download, a
    commercial entity normally does not want to pass "sales" to other
    parties. OTOH open-source projects cooperate and acknowledge the
    existence of other projects.

    I'm not sure what the utility to compile C programs is called, if it is
    not 'gcc'. But this is a C group, I would expect people to know it is a
    C compiler, or the front end of one.

    However I use 'gcc' in other forums and everyone knows what I mean.

    What do /you/ call the C compiler that is invoked by gcc?

    If you want to be technical you could say 'cc1'. But usually
    people know what you mean when you say 'gcc'.
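    If you are curious, a quick way to see the split on a typical Linux
    install is to ask the driver itself (the path below is only an
    example; it differs per distribution and gcc version):

        $ gcc -print-prog-name=cc1
        /usr/lib/gcc/x86_64-linux-gnu/12/cc1
        $ gcc -v hello.c
        ... (prints the cc1, as and collect2/ld commands the driver runs)

    The 'gcc' command is only the driver; cc1 is the program that does
    the actual C compilation.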

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Mon Nov 25 22:35:20 2024
    On 25/11/2024 03:17, Keith Thompson wrote:
    Bart <bc@freeuk.com> writes:
    On 24/11/2024 21:45, Keith Thompson wrote:
    Bart <bc@freeuk.com> writes:
    [...]
    A more useful installation would of course need more standard headers,
    an assembler, linker, and whatever .a files are needed to provide the
    standard library.
    Sure, those are all part of a C implementation, though they're not
    part of gcc.

    This seems to be a thing with Linux, where a big chunk of a C
    implementation is provided by the OS.

    I'm not sure what you mean by "provided by the OS". Linux-based
    systems tend to be very modular, with almost everything provided by
    some installable binary package. Some of those packages have to
    be provided by default, for example any dynamic libraries relied
    on by most executables. Files that are needed for development,
    such as header files, compilers, and associated tools such as
    assemblers and linkers, may be optional.


    Well, does a C compiler for Linux come with its own stdio.h, or does it
    share /usr/include/stdio.h along with other compilers?

    C compilers for Windows tend to be self-contained. Except for clang (see below). So each has its own stdio.h.

    The only thing the OS provides is msvcrt.dll, a library of C standard functions, one which probably started out for internal use but too many programs now rely on it.



    That is, standard headers, libraries, possibly even 'as' and 'ld'
    utilities.

    On my system (Ubuntu), the as and ld commands are provided by the
    binutils package ("binutils-x86-64-linux-gnu"). Some distributions
    may install these by default. Others do not, but they're easy
    to install.

    On Windows, C compilers tend to be self-contained (except
    for Clang which appears to be parasitical: it used to piggy-back onto
    gcc, then it switched to MSVC).

    I don't know what you mean by "piggy-back onto gcc".

    It relies on an existing gcc installation for things like header files,
    linkers and libraries.

    I used clang for 18 months before I realised this.

    Then they changed over to relying on MSVC for those facilities. This is
    when it started having trouble finding and syncing to MSVC, even when I
    had a working CL compiler.


    I'm not sure what the utility to compile C programs is called, if it
    is not 'gcc'. But this is a C group, I would expect people to know it
    is a C compiler, or the front end of one.

    However I use 'gcc' in other forums and everyone knows what I mean.

    What do /you/ call the C compiler that is invoked by gcc?

    I call it gcc.

    "gcc" is the name for several things. It's the "GNU Compiler
    Collection". It's the command invoked as the driver for any of
    several compilers that are part of the GNU Compiler Collection.
    It can refer specifically to the C compiler. It's mildly confusing
    for historical reasons, but most people don't have much of a
    problem with it, and don't pretend that it's more confusing than
    it really is.

    But you seem to like pointing out that gcc doesn't include header files, assemblers, linkers and libraries. And previously you claimed that:

    "gcc by itself can't compile hello.c to hello.s."

    So, what does it need? Is your point that it invokes a separate program
    like 'cc1.exe'? (Plus those 3 other binaries I listed in the case of tdm.)

    You also said:

    "You can't compile hello.c to hello.s without an OS kernel"

    I guess you mean that it needs an OS to provide a file system, a means to
    launch an executable like gcc.exe in the first place, and a display for messages?

    That's a rather silly one. (However my first compiler did run on bare
    metal!)

    And hello.s isn't useful without an assembler, which is not treated
    as part of gcc

    I deliberately stopped at the assembly file because I knew you would
    leap at that.

    I assume that turning a .c file into .s/.asm is the very definition of
    what a baseline C compiler is expected to do.
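    (To make that baseline concrete - a minimal sketch, nothing
    Windows- or gcc-specific about it - given a trivial source file:

        /* hello.c - the smallest useful test case */
        #include <stdio.h>

        int main(void) {
            printf("hello, world\n");
            return 0;
        }

    then "gcc -S hello.c" stops after compilation proper and leaves
    hello.s, while "gcc hello.c" carries on through assembling and
    linking to produce an executable.)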

    It seems you are still disputing this and causing confusion. C compilers
    for Windows such as lccwin32, Pelles C, DMC, tcc, and my mcc/cc are all self-contained and can turn hello.c all the way to hello.exe.

    It is gcc that is always the exception, in every way (like generating
    a.exe files by default, or thinking that HELLO.C must be a C++ file).



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Mon Nov 25 23:06:36 2024
    On 24/11/2024 21:01, Keith Thompson wrote:
    Bart <bc@freeuk.com> writes:
    [...]
    Most of a gcc installation is hundreds of header and archive (.a)
    files for various libraries. There might be 32-bit and 64-bit
    versions. I understand that. But it also makes it hard to isolate the
    core compiler.
    [...]

    That doesn't agree with my observations.

    Of course most of the headers and libraries are not part of gcc itself.
    As usual, you refer to the entire implementation as "gcc".

    I've built gcc 14.2.0 and glibc 2.40 from source on Ubuntu 22.04.5, installing each into a new directory.

    The gcc installation is about 5.6 GB, reduced to about 1.9 GB if I strip
    the executables.

    That sounds like a /very/ large size. A quick check of the pre-built
    Debian package for gcc-14 is about 90 MB installed. (That is for the C compiler - not binutils, or libraries.) C++ adds another 50% to that.
    Are you including the build directories with all the object files too?

    For a full gcc-based toolchain, I have lots of these for
    cross-compilation, each in individual directories. (Contrary to Bart's imagination, this is entirely possible - even on Windows. All it needs
    is appropriate configuration when building the toolchain.) A typical
    ARM toolchain is about 1 GB or so, including all the libraries, headers, debuggers, C and C++ support, binutils, gdb, documentation, and so on.
    Of that, maybe 250 MB is executable files and 650 MB is pre-built libraries optimised for 20+ device families.


    The glibc installation (libraries and headers) is about 199 MB, a small fraction of the size of the gcc installation.

    Of course there are other libraries that can be used with gcc, and they
    could take a lot of space -- but they're not part of gcc.

    These sizes might differ on Windows.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Mon Nov 25 23:17:13 2024
    On 25/11/2024 11:21, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    This seems to be a thing with Linux, where a big chunk of a C
    implementation is provided by the OS.

    That is, standard headers, libraries, possibly even 'as' and 'ld'
    utilities. On Windows, C compilers tend to be self-contained (except for
    Clang which appears to be parasitical: it used to piggy-back onto gcc,
    then it switched to MSVC).

    You know that at source level there are separate projects: gcc proper, binutils and libc.

    Actually, no I don't. I said more on this in my reply to Keith a short
    while ago.

    My experience of C compilers on Windows is that they provide a means to
    turn .c files into executable files. Such a compiler on Windows
    generally has to be self-contained, since very little is provided by the OS.

    How the source code is structured, or how it's organised internally, is
    of little concern to me. My source code for cc.exe is also structured
    into different components, but I don't expect users to know or care
    about that.

    Those terms are simply how Linux (and Unix I guess) has decided a C
    compiler should be organised.

    So from my point of view, gcc is the outlier.

    (See: https://github.com/sal55/langs/blob/master/CompilerSuite.md

    This describes my current set of tools. Each .exe file is
    self-contained; no other program and no other file is needed to get from
    the input to any of the outputs.

    Processing some outputs may need one of the other programs or an
    external tool, but that is by choice. Both mm.exe/cc.exe can go straight
    to EXE without any help.

    The only thing not included in cc.exe is windows.h, because it is so
    massive.)

    libc provides the C library; however, the headers should
    be matched to the library, so libc also provides the headers.

    There is no header that I can see for Windows' msvcrt.dll C runtime.
    (There was/is a Windows SDK, but that is a massive product mostly to do
    with WinAPI.)


    Linux has distributions, which besides the bare OS provide a lot of packages. The binary C library is used by almost all programs, so it is provided even
    in a minimal install. Linux has package managers, so everything you
    install may be split into small packages, but for the user it is just
    a matter of knowing a few crucial names; the package manager will install all
    dependencies.

    AFAIK Windows alone does not have a package manager, and you apparently
    reject package managers provided by third parties. So the only
    viable approach is to install a big bundle (a "self-contained compiler").

    Other C compilers I've used on Windows (excluding monsters like gcc,
    clang, msvc) either have their own install routine or the process is
    trivial, such as extracting files from a ZIP file.

    Mine doesn't even need installing: you just run the EXE from anywhere!




    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Mon Nov 25 23:45:28 2024
    On 25/11/2024 10:30, Michael S wrote:
    On Sun, 24 Nov 2024 13:45:55 -0800
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

    Bart <bc@freeuk.com> writes:
    On 24/11/2024 20:01, Keith Thompson wrote:
    Bart <bc@freeuk.com> writes:
    [...]
    Most of a gcc installation is hundreds of header and archive (.a)
    files for various libraries. There might be 32-bit and 64-bit
    versions. I understand that. But it also makes it hard to isolate
    the core compiler.
    [...]
    That doesn't agree with my observations.
    Of course most of the headers and libraries are not part of gcc
    itself.
    As usual, you refer to the entire implementation as "gcc".
    I've built gcc 14.2.0 and glibc 2.40 from source on Ubuntu 22.04.5,
    installing each into a new directory.
    The gcc installation is about 5.6 GB, reduced to about 1.9 GB if I strip
    the executables.

    That's even huger than mine! So, what are those 3.7GB full of? What
    does the 1.9GB of executables do?

    I installed compilers for multiple languages. A more typical
    installation likely won't include compilers for Ada, Go, Fortran,
    Modula-2, and Rust. There are a number of hard links to other files;
    for example c++, g++, x86_64-pc-linux-gnu-c++, and
    x86_64-pc-linux-gnu-g++ are all the same file. Apparently `du` is
    clever enough to count them only once.

    Here's the output of `ls -s` on the bin directory (sizes are in units
    of 1024 bytes) :

    total 611908
      8828 c++             8960 gm2            8828 x86_64-pc-linux-gnu-c++
      8820 cpp             8264 gnat           8828 x86_64-pc-linux-gnu-g++
      8828 g++            13092 gnatbind       8820 x86_64-pc-linux-gnu-gcc
      8820 gcc             9556 gnatchop       8820 x86_64-pc-linux-gnu-gcc-14.2.0
       156 gcc-ar         12564 gnatclean       156 x86_64-pc-linux-gnu-gcc-ar
       156 gcc-nm          7864 gnatkr          156 x86_64-pc-linux-gnu-gcc-nm
       152 gcc-ranlib      8564 gnatlink        152 x86_64-pc-linux-gnu-gcc-ranlib
      8828 gccgo          12764 gnatls         8828 x86_64-pc-linux-gnu-gccgo
      8820 gccrs          13584 gnatmake       8820 x86_64-pc-linux-gnu-gccrs
      7784 gcov           12236 gnatname       8828 x86_64-pc-linux-gnu-gdc
      6324 gcov-dump      12308 gnatprep       8824 x86_64-pc-linux-gnu-gfortran
      6468 gcov-tool      11136 go             8960 x86_64-pc-linux-gnu-gm2
      8828 gdc              620 gofmt
      8824 gfortran      308740 lto-dump


    67% of the bin directory of the i386 gcc13 compiler that I compiled from source
    on msys2 a few months ago is a single huge executable: i386-elf-lto-dump.exe, 410,230,002 bytes with symbols, 28,347,904 bytes stripped.
    Copying such file is not instant, even on SSD. Certainly takes time
    over internet.

    It does not look like I have any use for it, stripped or not. When I
    want a dump, I use a smaller utility, i386-elf-objdump.exe (14,740,647
    bytes with symbols, 2,242,048 bytes stripped), which already does more
    than I would know how to use.


    LTO object files are vastly different beasts from normal object files,
    so it does not surprise me that the dump utility is so much bigger. If
    you don't use LTO, then presumably you will not need the lto-dump
    utility. (It is not a tool I have ever looked at myself.)

    When people build gcc themselves, it is not uncommon that they want
    binaries with symbols for debugging, testing, profiling, objdumping, or whatever - after all, most users use pre-built binaries. So it is not unreasonable to have at least some symbols with the binaries. But it
    seems here that you have built them with full debugging information, not
    just symbols. That is only really useful if you intend to run gcc
    itself under gdb. Stripping the binaries isn't going to make them any
    faster (at least, not under Linux - maybe in Windows the whole file is loaded), but it would make copying the files faster.


    The Arm gcc12 compiler for small embedded targets (arm-none-eabi-gcc) in
    the same msys2 environment, which I did not compile from source, also
    contains arm-none-eabi-lto-dump.exe, and it is also the biggest exe by
    far, but at least it is stripped and only 23,728,128 bytes.





    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Tue Nov 26 02:55:09 2024
    On Mon, 25 Nov 2024 13:45:28 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    LTO object files are vastly different beasts from normal object
    files, so it does not surprise me that the dump utility is so much
    bigger. If you don't use LTO, then presumably you will not need the
    lto-dump utility. (It is not a tool I have ever looked at myself.)


    I am pretty sure that even if I ever want to use LTO with gcc I'd still
    have no need for lto-dump. What would matter for me in this case
    would be a final result (exe) rather than object files. And in order to
    look at exe I'd still use a normal objdump.
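    For the record, the sort of use I mean is nothing exotic - roughly
    this (flag spellings are the usual binutils objdump ones):

        objdump -d prog.elf > prog.lst       # disassemble the executable sections
        objdump -d -S prog.elf > prog.lst    # same, interleaved with source if built with -g

    i.e. just disassembly of the final binary, not of intermediate
    object files.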

    The situation is not purely hypothetical. I regularly use LTCG with
    Microsoft tools. Never have I wanted to disassemble .obj files after
    LTCG compilation. When I occasionally wanted to look at asm after LTCG,
    it was always an exe.







    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Keith Thompson@3:633/280.2 to All on Tue Nov 26 03:27:46 2024
    Bart <bc@freeuk.com> writes:
    On 25/11/2024 11:21, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:
    This seems to be a thing with Linux, where a big chunk of a C
    implementation is provided by the OS.

    That is, standard headers, libraries, possibly even 'as' and 'ld'
    utilities. On Windows, C compilers tend to be self-contained (except for Clang which appears to be parasitical: it used to piggy-back onto gcc,
    then it switched to MSVC).
    You know that at source level there are separate projects: gcc
    proper, binutils and libc.

    Actually, no I don't. I said more on this in my reply to Keith a short
    while ago.

    You don't know that after it's been explained to you dozens of times?

    My experience of C compilers on Windows is that they provide a means
    to turn .c files into executable files. Such a compiler on Windows
    generally has to be self-contained, since very little is provided by
    the OS.

    Bart, can you explain the difference between a C compiler and a C implementation? Or do you believe they're the same thing? (Hint:
    They're not.)

    [...]

    So from my point of view, gcc is the outlier.

    And what's wrong with that?

    [...]

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: None to speak of (3:633/280.2@fidonet)
  • From Keith Thompson@3:633/280.2 to All on Tue Nov 26 03:32:06 2024
    David Brown <david.brown@hesbynett.no> writes:
    On 24/11/2024 21:01, Keith Thompson wrote:
    Bart <bc@freeuk.com> writes:
    [...]
    Most of a gcc installation is hundreds of header and archive (.a)
    files for various libraries. There might be 32-bit and 64-bit
    versions. I understand that. But it also makes it hard to isolate the
    core compiler.
    [...]
    That doesn't agree with my observations.
    Of course most of the headers and libraries are not part of gcc
    itself.
    As usual, you refer to the entire implementation as "gcc".
    I've built gcc 14.2.0 and glibc 2.40 from source on Ubuntu 22.04.5,
    installing each into a new directory.
    The gcc installation is about 5.6 GB, reduced to about 1.9 GB if I
    strip the executables.

    That sounds like a /very/ large size. A quick check of the pre-build
    Debian package for gcc-14 is about 90 MB installed. (That is for the
    C compiler - not binutils, or libraries.) C++ adds another 50% to
    that. Are you including the build directories with all the object
    files too?

    It is very large, partly because the executables are not stripped
    (that's the default when building from source), and partly because I
    configured it for multiple languages. No cross-compilers.

    No, I'm not including the build directories, just the directory
    specified with "./configure --prefix=...".

    I might try doing a stripped installation for C only, just to see how
    big it is.

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: None to speak of (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Tue Nov 26 04:25:42 2024
    On 25/11/2024 16:27, Keith Thompson wrote:
    Bart <bc@freeuk.com> writes:
    On 25/11/2024 11:21, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:
    This seems to be a thing with Linux, where a big chunk of a C
    implementation is provided by the OS.

    That is, standard headers, libraries, possibly even 'as' and 'ld'
    utilities. On Windows, C compilers tend to be self-contained (except for Clang which appears to be parasitical: it used to piggy-back onto gcc, then it switched to MSVC).
    You know that at source level there are separate projects: gcc
    proper, binutils and libc.

    Actually, no I don't. I said more on this in my reply to Keith a short
    while ago.

    You don't know that after it's been explained to you dozens of times?

    My experience of C compilers on Windows is that they provide a means
    to turn .c files into executable files. Such a compiler on Windows
    generally has to be self-contained, since very little is provided by
    the OS.

    Bart, can you explain the difference between a C compiler and a C implementation? Or do you believe they're the same thing? (Hint:
    They're not.)

    Well, I write language implementations, and I consider them largely the
    same thing.

    So who's right? Just because a C compiler works in a certain peculiar way
    on one OS doesn't mean that is the only way.

    Have a look at the 'CC' product described here, about half way down:

    https://github.com/sal55/langs/blob/master/CompilerSuite.md

    It is a single file that can turn source into native code, or it
    can run it directly, or it can interpret it.

    I call this 0.3MB program a 'compiler'. I also call it a C
    implementation (technically, a C subset). (What would /you/ call what
    this program does?)

    All it lacks that you might quibble over is an implementation of the C standard library. I use a library that is part of Windows, and also use
    that same library from two other languages, neither of which is C.

    Technically, a 'C' compiler only needs to turn C source into some
    next-level representation. Beyond that it's pretty much a compiler like
    any other, not specific to C. A compiler may consider its job done when
    it gets to IR, or ASM source, or it may continue all the way to a
    running binary, like mine do.

    As to what gcc does and how it's classified, I'm past caring. Does it eventually produce a binary? Then that's all that matters.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Scott Lurndal@3:633/280.2 to All on Tue Nov 26 04:30:10 2024
    Reply-To: slp53@pacbell.net

    Bart <bc@freeuk.com> writes:
    On 24/11/2024 21:45, Keith Thompson wrote:

    A more useful installation would of course need more standard headers,
    an assembler, linker, and whatever .a files are needed to provide the
    standard library.

    Sure, those are all part of a C implementation, though they're not part
    of gcc.


    This seems to be a thing with Linux, where a big chunk of a C
    implementation is provided by the OS.

    Actually, no. The OS provides the dynamic linker and some os-specific
    header files. Pretty much everything else comes from various
    third-party packages.


    That is, standard headers, libraries, possibly even 'as' and 'ld'
    utilities.

    None of those come from the OS. They come from separate packages
    produced by third parties (some, like gcc, binutils, etc come from
    the FSF, other libraries come from other sources).


    On Windows, C compilers tend to be self-contained (except for

    Leaving aside the fact that Windows has always been a toy
    environment, all the tools you complain about were developed
    on, and primarily for UNIX and derivatives. Not Windows.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: UsenetServer - www.usenetserver.com (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Tue Nov 26 04:50:10 2024
    On 25/11/2024 18:30, Scott Lurndal wrote:
    Bart <bc@freeuk.com> writes:
    On 24/11/2024 21:45, Keith Thompson wrote:

    A more useful installation would of course need more standard headers,
    an assembler, linker, and whatever .a files are needed to provide the
    standard library.

    Sure, those are all part of a C implementation, though they're not part
    of gcc.


    This seems to be a thing with Linux, where a big chunk of a C
    implementation is provided by the OS.

    Actually, no. The OS provides the dynamic linker and some os-specific
    header files. Pretty much everything else comes from various
    third-party packages.


    That is, standard headers, libraries, possibly even 'as' and 'ld'
    utilities.

    None of those come from the OS. They come from separate packages
    produced by third parties (some, like gcc, binutils, etc come from
    the FSF, other libraries come from other sources).


    And of course there are different standard C libraries available, as
    well as different C compilers, and you can mix and match - gcc with
    musl, clang with glibc, icc with newlib, etc. There has to be a certain degree of cooperation and compatibility for a compiler and a library to
    work together, but they can be (and usually are) separate projects from separate groups.


    On Windows, C compilers tend to be self-contained (except for

    Leaving aside the fact that Windows has always been a toy
    environment, all the tools you complain about were developed
    on, and primarily for UNIX and derivatives. Not Windows.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Tue Nov 26 04:54:29 2024
    On 25/11/2024 16:55, Michael S wrote:
    On Mon, 25 Nov 2024 13:45:28 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    LTO object files are vastly different beasts from normal object
    files, so it does not surprise me that the dump utility is so much
    bigger. If you don't use LTO, then presumably you will not need the
    lto-dump utility. (It is not a tool I have ever looked at myself.)


    I am pretty sure that even if I ever want to use LTO with gcc I'd still
    have no need for lto-dump.

    That is quite plausible. I only occasionally have use for objdump, and
    I suspect many programmers never use it at all. I doubt if I'd use the lto-dump version much if and when I start using LTO seriously.

    What would matter for me in this case
    would be a final result (exe) rather than object files. And in order to
    look at exe I'd still use a normal objdump.


    Again, I don't doubt you are correct.

    All I am saying is that it does not surprise me that the lto-dump
    program is significantly bigger than objdump. And presumably some
    people do find it useful.

    The situation is not purely hypothetical. I regularly use LTCG with
    Microsoft tools. Never have I wanted to disassemble .obj files after
    LTCG compilation. When I occasionally wanted to look at asm after LTCG,
    it was always an exe.





    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Tue Nov 26 05:49:27 2024
    Bart <bc@freeuk.com> writes:

    It's funny how nobody seems to care about the speed of compilers
    (which can vary by 100:1), but for the generated programs, the 2:1
    speedup you might get by optimising it is vital!

    I think most people would rather take this path (these times
    are actual measured times of a recently written program):

    compile time: 1 second
    program run time: ~7 hours

    than this path (extrapolated using the ratios mentioned above):

    compile time: 0.01 second
    program run time: ~14 hours


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Tue Nov 26 05:50:25 2024
    Bart <bc@freeuk.com> writes:

    On 25/11/2024 16:27, Keith Thompson wrote:

    Bart, can you explain the difference between a C compiler and a C
    implementation? Or do you believe they're the same thing? (Hint:
    They're not.)

    Well, I write language implementations, and I consider them largely
    the same thing.

    So who's right?

    In comp.lang.c, the C standard is right.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Tue Nov 26 06:46:45 2024
    On 25/11/2024 17:30, Scott Lurndal wrote:
    Bart <bc@freeuk.com> writes:
    On 24/11/2024 21:45, Keith Thompson wrote:

    A more useful installation would of course need more standard headers,
    an assembler, linker, and whatever .a files are needed to provide the
    standard library.

    Sure, those are all part of a C implementation, though they're not part
    of gcc.


    This seems to be a thing with Linux, where a big chunk of a C
    implementation is provided by the OS.

    Actually, no. The OS provides the dynamic linker and some os-specific
    header files. Pretty much everything else comes from various
    third-party packages.


    That is, standard headers, libraries, possibly even 'as' and 'ld'
    utilities.

    None of those come from the OS.

    So, if I install 5 distinct C compilers on Linux, will they each come
    with their own stdio.h, or will they use the common one in /usr/include?



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Tue Nov 26 07:19:04 2024
    On 25/11/2024 18:49, Tim Rentsch wrote:
    Bart <bc@freeuk.com> writes:

    It's funny how nobody seems to care about the speed of compilers
    (which can vary by 100:1), but for the generated programs, the 2:1
    speedup you might get by optimising it is vital!

    I think most people would rather take this path (these times
    are actual measured times of a recently written program):

    compile time: 1 second
    program run time: ~7 hours

    than this path (extrapolated using the ratios mentioned above):

    compile time: 0.01 second
    program run time: ~14 hours


    I'm trying to think of some computationally intensive app that would run non-stop for several hours without interaction.

    If you dig back through the thread, you will see that I am not against

    For such a task as your example might do, you would spend some time
    testing on shorter examples and getting the best algorithm. Once you
    feel it's the best, /then/ you can think about getting it optimised. It doesn't even matter how long it takes, if it's going to take hours anyway.


    I thought of one artificial example: a C program to display the
    Fibonacci sequence 1 to 100 using the recursive function for each fib(i).

    I compiled it with gcc-O3 and set it going. While it was doing that, I set
    up the same test using my interpreted language. It was much slower
    obviously. So I added memoisation. Now it showed all 100 values
    instantly (the C version meanwhile is in the low 50s).

    I noticed however that it overflowed the 64-bit range at around fib(93)
    (as the C version might do eventually). So I tweaked my 'slow' version
    to use bignum values. Then I tweaked it again to show the first 10,000
    values.
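    (For anyone who wants to reproduce the memoised version in plain C
    rather than in my language, a rough sketch - not my actual code, and
    only up to where signed 64-bit arithmetic runs out, as noted above:

        /* memofib.c - naive recursion plus a lookup table */
        #include <stdio.h>

        #define N 93                /* fib(93) already overflows signed 64 bits */
        static long long memo[N];

        static long long fib(int n) {
            if (n < 2) return n;
            if (memo[n] == 0)       /* 0 doubles as "not yet computed" here */
                memo[n] = fib(n - 1) + fib(n - 2);
            return memo[n];
        }

        int main(void) {
            for (int i = 1; i < N; i++)
                printf("fib(%d) = %lld\n", i, fib(i));
            return 0;
        }

    Without the memo table the same function is exponential in n, which
    is why the straight recursive C version takes so long.)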

    At this point, the optimised C was still in the mid 50s.

    The point is, for such a task as this, you do as much as you can to
    bring down the runtime, which could reduce it by a magnitude or two with
    the right choices.

    Adding -O3 at the end is a nice bonus speedup, but that's all it is.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Keith Thompson@3:633/280.2 to All on Tue Nov 26 07:32:01 2024
    Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
    Bart <bc@freeuk.com> writes:
    On 25/11/2024 16:27, Keith Thompson wrote:
    Bart, can you explain the difference between a C compiler and a C
    implementation? Or do you believe they're the same thing? (Hint:
    They're not.)

    Well, I write language implementations, and I consider them largely
    the same thing.

    So who's right?

    In comp.lang.c, the C standard is right.

    Agreed, but the C standard doesn't define the word "compiler",
    and uses it only in non-normative text (I searched N3096).

    What I consider to be a "compiler" is the program or programs that
    implement translation phases 1 through 7. (The 8th and final phase
    is linking.)

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: None to speak of (3:633/280.2@fidonet)
  • From Keith Thompson@3:633/280.2 to All on Tue Nov 26 07:51:48 2024
    Bart <bc@freeuk.com> writes:
    [...]
    So, if I install 5 distinct C compilers on Linux, will they each come
    with their own stdio.h, or will they use the common one in
    /usr/include?

    History does not suggest that you actually care about the answer,
    but I'll give you one anyway.

    It depends on how each compiler is configured. On my system,
    gcc, clang, and tcc all use /usr/include/stdio.h, but musl-gcc (a
    wrapper that invokes gcc with options to use musl, an alternative
    C library implementation) does not; it uses musl's own headers. Or I
    can invoke any of those compilers with options to use some other
    library implementation.

    Remember that typical Linux-based systems are very modular, with system
    files provided via the package manager. The files that make up a C implementation are provided by multiple different packages. Package dependencies are managed in such a way that installing a full C
    implementation is reasonably straightforward.

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: None to speak of (3:633/280.2@fidonet)
  • From Scott Lurndal@3:633/280.2 to All on Tue Nov 26 08:29:48 2024
    Reply-To: slp53@pacbell.net

    Bart <bc@freeuk.com> writes:
    On 25/11/2024 18:49, Tim Rentsch wrote:


    I'm trying to think of some computationally intensive app that would run non-stop for several hours without interaction.

    I can think of several - HDL simulators (vcs, et al), system simulators
    like Simh, Qemu, Synopsys Virtualizer, SIMICS, most HPC codes (e.g. fluid dynamics),
    Machine Learning training, et alia.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: UsenetServer - www.usenetserver.com (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Tue Nov 26 10:20:04 2024
    On 25/11/2024 21:29, Scott Lurndal wrote:
    Bart <bc@freeuk.com> writes:
    On 25/11/2024 18:49, Tim Rentsch wrote:


    I'm trying to think of some computationally intensive app that would run
    non-stop for several hours without interaction.

    I can think of several - HDL simulators (vcs, et al), system simulators
    like Simh, Qemu, Synopsys Virtualizer, SIMICS, most HPC codes (e.g. fluid dynamics),
    Machine Learning training, et alia.

    OK, good.

    So the only preparation you have to do to get those running at maximum
    speed is just to use -O3 on your compilers instead of -O0.

    Understood. You don't need to worry about anything else.


    However, I assume that has already been done when building products like
    LLVM (which apparently takes somewhat longer than one second to build), yet I keep seeing comments about it like this:

    "I think the biggest complaint is compile time."

    "but if you want fast compile times or just "O1" instead of "O3" level performance, it can feel like overkill."

    "ah this seems like two very different use cases. Stating the obvious:
    when debugging I want as fast builds as possible. When shipping I want
    as fast software as possible."

    Apparently this is not obvious to anybody here except me!



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Scott Lurndal@3:633/280.2 to All on Tue Nov 26 12:09:54 2024
    Reply-To: slp53@pacbell.net

    Bart <bc@freeuk.com> writes:
    On 25/11/2024 21:29, Scott Lurndal wrote:
    Bart <bc@freeuk.com> writes:
    On 25/11/2024 18:49, Tim Rentsch wrote:


    I'm trying to think of some computationally intensive app that would run non-stop for several hours without interaction.

    I can think of several - HDL simulators (vcs, et al), system simulators
    like Simh, Qemu, Synopsys Virtualizer, SIMICS, most HPC codes (e.g. fluid dynamics),
    Machine Learning training, et alia.

    OK, good.

    So the only preparation you have to do to get those running at maximum
    speed is just to use -O3 on your compilers instead of -O0.

    That appears to be your opinion. It is not shared by myself
    nor any programmer I've ever met.


    Understood. You don't need to worry about anything else.

    How do you conclude that based on a simple list of applications?

    Everything from the initial design proposal to the selection of
    implementation language to the characteristics of the data structures
    to the algorithms chosen are part of the process of creating a real-world application. The actual compiler flags are in the noise, for the
    most part.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: UsenetServer - www.usenetserver.com (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Tue Nov 26 12:28:47 2024
    On 26/11/2024 01:09, Scott Lurndal wrote:
    Bart <bc@freeuk.com> writes:
    On 25/11/2024 21:29, Scott Lurndal wrote:
    Bart <bc@freeuk.com> writes:
    On 25/11/2024 18:49, Tim Rentsch wrote:


    I'm trying to think of some computationally intensive app that would run non-stop for several hours without interaction.

    I can think of several - HDL simulators (vcs, et al), system simulators
    like Simh, Qemu, Synopsys Virtualizer, SIMICS, most HPC codes (e.g. fluid dynamics),
    Machine Learning training, et alia.

    OK, good.

    So the only preparation you have to do to get those running at maximum
    speed is just to use -O3 on your compilers instead of -O0.

    That appears to be your opinion. It is not shared by myself
    nor any programmer I've ever met.


    Understood. You don't need to worry about anything else.

    How do you conclude that based on a simple list of applications?

    Everything from the initial design proposal to the selection of implementation language to the characteristics of the data structures
    to the algorithms chosen are part of the process of creating a real-world application. The actual compiler flags are in the noise, for the
    most part.

    That's my point.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Tue Nov 26 23:29:55 2024
    Bart <bc@freeuk.com> writes:

    On 25/11/2024 18:49, Tim Rentsch wrote:

    Bart <bc@freeuk.com> writes:

    It's funny how nobody seems to care about the speed of compilers
    (which can vary by 100:1), but for the generated programs, the 2:1
    speedup you might get by optimising it is vital!

    I think most people would rather take this path (these times
    are actual measured times of a recently written program):

    compile time: 1 second
    program run time: ~7 hours

    than this path (extrapolated using the ratios mentioned above):

    compile time: 0.01 second
    program run time: ~14 hours

    I'm trying to think of some computationally intensive app that would
    run non-stop for several hours without interaction.

    The conclusion is the same whether the program run time
    is 7 hours, 7 minutes, or 7 seconds.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Wed Nov 27 00:31:30 2024
    On 26/11/2024 12:29, Tim Rentsch wrote:
    Bart <bc@freeuk.com> writes:

    On 25/11/2024 18:49, Tim Rentsch wrote:

    Bart <bc@freeuk.com> writes:

    It's funny how nobody seems to care about the speed of compilers
    (which can vary by 100:1), but for the generated programs, the 2:1
    speedup you might get by optimising it is vital!

    I think most people would rather take this path (these times
    are actual measured times of a recently written program):

    compile time: 1 second
    program run time: ~7 hours

    than this path (extrapolated using the ratios mentioned above):

    compile time: 0.01 second
    program run time: ~14 hours

    I'm trying to think of some computationally intensive app that would
    run non-stop for several hours without interaction.

    The conclusion is the same whether the program run time
    is 7 hours, 7 minutes, or 7 seconds.

    Funny you should mention 7 seconds. If I'm working on a single source file called sql.c for example, that's how long it takes for gcc to create an unoptimised executable:

    c:\cx>tm gcc sql.c #250Kloc file
    TM: 7.38

    Testing it might only take a second:

    c:\cx>type input
    select 2+2;

    c:\cx>sql <input
    4

    With a different compiler, the edit-run cycle can be a lot nippier:

    c:\cx>tm cc sql
    Compiling sql.c to sql.exe
    TM: 0.27

    If that is still onerous, I can try interpreting:

    c:\cx>tm cc -i sql <input
    Compiling sql.c to sql.(int)
    4
    TM: 0.19

    So compiling to IL, then interpreting that IL (which is 40 times slower
    than native code), /and/ running my test, takes 1/5th of a second in all.

    That's 40 times faster than the equivalent with gcc-O0 (despite the interpreted part being 40 times slower!):

    c:\cx\tm test.bat
    c:\cx>gcc sql.c -osql.exe && sql 0<input
    4
    TM: 7.74

    And 200 times faster than gcc-O2 which everyone here seems to be
    recommending:

    c:\cx>tm test.bat
    c:\cx>gcc sql.c -O2 -osql.exe && sql 0<input
    4
    TM: 38.60

    Some might advise not working with such a large single source module at
    all, but that is the task here. If trying to investigate why my cc
    product is failing, I might put tracing statements into the source,
    and compile and run with both compilers to compare the outputs. For such
    a purpose, -O2 or -O3 is utterly pointless.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Wed Nov 27 21:47:11 2024
    Bart <bc@freeuk.com> wrote:
    On 25/11/2024 17:30, Scott Lurndal wrote:
    Bart <bc@freeuk.com> writes:
    On 24/11/2024 21:45, Keith Thompson wrote:

    A more useful installation would of course need more standard headers,
    an assembler, linker, and whatever .a files are needed to provide the
    standard library.

    Sure, those are all part of a C implementation, though they're not part of gcc.


    This seems to be a thing with Linux, where a big chunk of a C
    implementation is provided by the OS.

    Actually, no. The OS provides the dynamic linker and some os-specific
    header files. Pretty much everything else comes from various
    third-party packages.


    That is, standard headers, libraries, possibly even 'as' and 'ld'
    utilities.

    None of those come from the OS.

    So, if I install 5 distinct C compilers on Linux, will they each come
    with their own stdio.h, or will they use the common one in /usr/include?

    It depends on the compiler. IIUC your compiler has its own stdio.h.
    There was the 'Tendra C compiler' (tcc for short) which had its own
    handling of headers. Basically, there was internal compiler
    magic to activate headers. I do not remember if the "real" headers were
    just part of the compiler executable or were kept in files. But the
    real header data were in a compiler-specific format. You could
    not look at stdio.h to see function declarations; I do not
    remember if stdio.h was present as a real file, but if it were
    present it would contain only some compiler-specific magic
    to activate the declarations. In other words, Tendra did not
    use system headers and its headers were unusable for other
    compilers.

    If you ask why, the reason was portability and standard compliance.
    Tendra was supposed to give you the same results on a wide
    selection of machines, provided that the machines supported
    the appropriate APIs. I do not remember how/if they handled the
    32 versus 64 bit issue, but their headers were claimed to be
    100% standard compliant, as opposed to vendor headers which
    often had various incompatibilities. They also provided
    wrapper libraries so that when you called their wrapper
    you got standard-specified behaviour (vendor libraries
    frequently violated standards). Concerning APIs, they
    went quite a bit beyond standard C and provided several
    industry standards.

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Thu Nov 28 10:23:32 2024
    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

    Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

    Bart <bc@freeuk.com> writes:

    On 25/11/2024 16:27, Keith Thompson wrote:

    Bart, can you explain the difference between a C compiler and a C
    implementation? Or do you believe they're the same thing? (Hint:
    They're not.)

    Well, I write language implementations, and I consider them largely
    the same thing.

    So who's right?

    In comp.lang.c, the C standard is right.

    Agreed, but the C standard doesn't define the word "compiler",
    and uses it only in non-normative text (I searched N3096).

    That makes no difference to my point, which is about word
    usage, not about what is or isn't C. It is clear that the
    C standard considers a compiler and an implementation to be
    two different things.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Thu Nov 28 16:18:09 2024
    Bart <bc@freeuk.com> writes:

    On 26/11/2024 12:29, Tim Rentsch wrote:

    Bart <bc@freeuk.com> writes:

    On 25/11/2024 18:49, Tim Rentsch wrote:

    Bart <bc@freeuk.com> writes:

    It's funny how nobody seems to care about the speed of compilers
    (which can vary by 100:1), but for the generated programs, the 2:1
    speedup you might get by optimising it is vital!

    I think most people would rather take this path (these times
    are actual measured times of a recently written program):

    compile time: 1 second
    program run time: ~7 hours

    than this path (extrapolated using the ratios mentioned above):

    compile time: 0.01 second
    program run time: ~14 hours

    I'm trying to think of some computationally intensive app that would
    run non-stop for several hours without interaction.

    The conclusion is the same whether the program run time
    is 7 hours, 7 minutes, or 7 seconds.

    Funny you should mention 7 seconds. If I'm working on single source
    file called sql.c for example, that's how long it takes for gcc to
    create an unoptimised executable:

    c:\cx>tm gcc sql.c #250Kloc file
    TM: 7.38

    Your example illustrates my point. Even 250 thousand lines of
    source takes only a few seconds to compile. Only people nutty
    enough to have single source files over 25,000 lines or so --
    over 400 pages at 60 lines/page! -- are so obsessed about
    compilation speed. And of course you picked the farthest-most
    outlier as your example, grossly misrepresenting any sort of
    average or typical case.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Thu Nov 28 23:37:15 2024
    On Wed, 27 Nov 2024 21:18:09 -0800
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Bart <bc@freeuk.com> writes:

    On 26/11/2024 12:29, Tim Rentsch wrote:

    Bart <bc@freeuk.com> writes:

    On 25/11/2024 18:49, Tim Rentsch wrote:

    Bart <bc@freeuk.com> writes:

    It's funny how nobody seems to care about the speed of compilers
    (which can vary by 100:1), but for the generated programs, the
    2:1 speedup you might get by optimising it is vital!

    I think most people would rather take this path (these times
    are actual measured times of a recently written program):

    compile time: 1 second
    program run time: ~7 hours

    than this path (extrapolated using the ratios mentioned above):

    compile time: 0.01 second
    program run time: ~14 hours

    I'm trying to think of some computationally intensive app that
    would run non-stop for several hours without interaction.

    The conclusion is the same whether the program run time
    is 7 hours, 7 minutes, or 7 seconds.

    Funny you should mention 7 seconds. If I'm working on single source
    file called sql.c for example, that's how long it takes for gcc to
    create an unoptimised executable:

    c:\cx>tm gcc sql.c #250Kloc file
    TM: 7.38

    Your example illustrates my point. Even 250 thousand lines of
    source takes only a few seconds to compile. Only people nutty
    enough to have single source files over 25,000 lines or so --
    over 400 pages at 60 lines/page! -- are so obsessed about
    compilation speed.

    My impression was that Bart is talking about machine-generated code.
    For machine-generated code, 250Kloc is not too much.
    I would think that in the field of compiled-code HDL simulation, people are interested in compiling sources as big as they can afford.

    And of course you picked the farthest-most
    outlier as your example, grossly misrepresenting any sort of
    average or typical case.

    I remember having a much shorter file (the core of a 3rd-party TCP protocol
    implementation) where compilation with gcc took several seconds.
    Looked at it now - only 22 Klocs.
    Text size in .o - 34KB.
    Compilation time on a much newer computer than the one I remembered, with
    a good SATA SSD and a 4 GHz Intel Haswell CPU - a little over 1 sec. That
    was with gcc 4.7.3. I would guess that if I tried gcc13 it would be 1.5 to 2
    times longer.
    So, in terms of Kloc/sec it seems to me that the time reported by Bart
    is not outrageous. Indeed, gcc is very slow when compiling any source
    several times above average size.
    In this particular case I cannot compare gcc to an alternative, because
    for a given target (Altera Nios2) there are no alternatives.




    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Fri Nov 29 01:27:25 2024
    On 28/11/2024 05:18, Tim Rentsch wrote:
    Bart <bc@freeuk.com> writes:

    On 26/11/2024 12:29, Tim Rentsch wrote:

    Bart <bc@freeuk.com> writes:

    On 25/11/2024 18:49, Tim Rentsch wrote:

    Bart <bc@freeuk.com> writes:

    It's funny how nobody seems to care about the speed of compilers
    (which can vary by 100:1), but for the generated programs, the 2:1
    speedup you might get by optimising it is vital!

    I think most people would rather take this path (these times
    are actual measured times of a recently written program):

    compile time: 1 second
    program run time: ~7 hours

    than this path (extrapolated using the ratios mentioned above):

    compile time: 0.01 second
    program run time: ~14 hours

    I'm trying to think of some computationally intensive app that would
    run non-stop for several hours without interaction.

    The conclusion is the same whether the program run time
    is 7 hours, 7 minutes, or 7 seconds.

    Funny you should mention 7 seconds. If I'm working on single source
    file called sql.c for example, that's how long it takes for gcc to
    create an unoptimised executable:

    c:\cx>tm gcc sql.c #250Kloc file
    TM: 7.38

    Your example illustrates my point. Even 250 thousand lines of
    source takes only a few seconds to compile. Only people nutty
    enough to have single source files over 25,000 lines or so --
    over 400 pages at 60 lines/page! -- are so obsessed about
    compilation speed. And of course you picked the farthest-most
    outlier as your example, grossly misrepresenting any sort of
    average or typical case.

    It's not atypical for me! I explained why I might use such a file.

    And for me, used to decades of sub-one-second response times, 7 seconds
    seems like forever. Especially when there is no feedback at all from gcc.

    When my tools had to compile multiple modules they would show a progress report as each one was processed.

    gcc says nothing (unless you use --verbose, and then it spews reams of junk
    for every file). Maybe after a few seconds it's 90% done, or maybe 10%;
    who knows?

    Also, you haven't really explained why someone should wait an extra 7
    seconds for a task that can clearly be accomplished in a fraction of a
    second, given that gcc-O0 generates equally poor code.

    Nor why a production version of gcc needs to be itself built with -O3
    anyway. Since it sounds like an unoptimised version would only ever take
    an extra second or two on any of your tiny inputs!

    (And with David Brown's projects where apparently the gcc compiler is
    either never invoked, or always finishes in milliseconds, it would make
    no measurable difference at all.)

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Fri Nov 29 02:25:48 2024
    On 28/11/2024 12:37, Michael S wrote:
    On Wed, 27 Nov 2024 21:18:09 -0800
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:


    c:\cx>tm gcc sql.c #250Kloc file
    TM: 7.38

    Your example illustrates my point. Even 250 thousand lines of
    source takes only a few seconds to compile. Only people nutty
    enough to have single source files over 25,000 lines or so --
    over 400 pages at 60 lines/page! -- are so obsessed about
    compilation speed.

    My impression was that Bart is talking about machine-generated code.
    For machine generated code 250Kloc is not too much.

    This file mostly comprises sqlite3.c which is a machine-generated
    amalgamation of some 100 actual C files.

    You wouldn't normally do development with that version, but in my
    scenario, where I was trying to find out why the version built with my compiler was buggy, I might try adding debug info to it and then building
    with a working compiler (e.g. gcc) to compare with.

    But, yes, when I used to do more transpilation to C, then the generated
    code would be a single C source file. That one could also require
    frequent recompiles as C, if there were bugs in the process.

    Then the differences in compile-time of the C are clear; here,
    generating qc.c from the original sources took 0.09 seconds:

    c:\qx>tm gcc qc.c GCC -O0
    TM: 2.28

    c:\qx>tm tc qc TCC from a script as it's messy
    c:\qx>tcc qc.c c:\windows\system32\user32.dll -luser32 c:\windows\system32\kernel32.dll -fdollars-in-identifiers
    TM: 0.23

    c:\qx>tm cc qc Using my C compiler
    Compiling qc.c to qc.exe
    TM: 0.11

    c:\qx>tm mm qc Compile original source to EXE
    Compiling qc.m to qc.exe
    TM: 0.09

    c:\qx>tm gcc -O2 qc.c GCC -O2
    TM: 11.02

    Usually tcc is faster than my product, but something about the generated
    C (maybe long, messy identifiers) is slowing it down. But it is still 10
    times faster than gcc-O0.

    The last timing is gcc generating optimised code; usually the only
    reason why gcc would be used. Then it takes 120 times longer to create
    the executable than my normal native build process.

    Tim isn't asking the right questions (or any questions!). WHY does gcc
    take so long to generate indifferent code when the task can clearly be
    done at least a magnitude faster?

    Whatever it is it's doing, why isn't there an option to skip that for a streamlined build? (Maybe you accidentally deleted the EXE and need to recreate it; it doesn't need the same analysis.)

    I've several times suggested that gcc should have an -O-1 option that
    runs a secretly bundled version of Tiny C.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Fri Nov 29 02:46:31 2024
    On Thu, 28 Nov 2024 15:25:48 +0000
    Bart <bc@freeuk.com> wrote:


    I've several times suggested that gcc should have an -O-1 option that
    runs a secretly bundled version of Tiny C.


    Hopefully, you are not serious about it.
    The differences between gcc and tcc go well beyond code
    analysis warnings or code generation. tcc does not even support full
    C99, although it is very close to it, much less the newer versions of
    the C standard. Also, while tcc supports a few gnu extensions, it
    certainly does not support all of them.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Fri Nov 29 04:28:06 2024
    On 28.11.2024 15:27, Bart wrote:
    [ compilation times ]

    And for me, used to decades of sub-one-second response times, 7 seconds
    seems like for ever. [...]

    Sub-seconds is very important in response times of interactive tools;
    I recall we've measured, e.g. for GUI applications, the exact timing,
    and we've taken into account results of psychological sciences. The
    accepted response times for our applications were somewhere around
    0.20 seconds, and even 0.50 seconds was by far unacceptable.

    But we're speaking about compilation times. And I'm a bit astonished
    about a sub-second requirement or necessity. I'm typically compiling
    source code after I've edited it, where the latter is by far the most dominating step. And before the editing there's usually the analysis
    of code, that requires even more time than the simple but interactive
    editing process. When I start the compile all the major time demanding
    tasks that are necessary to create the software fix have already been
    done, and I certainly don't need a sub-second response from compiler.

    Though I observed a certain behavior of programmers who use tools with
    a fast response time. Since it doesn't cost anything they just make a
    single change and compile to see whether it works, and, rinse repeat,
    do that for every _single_ change *multiple* times. My own programming
    habits got also somewhat influenced by that, though I still try to fix
    things in brain before I ask the compiler what it thinks of my change.
    This is certainly influenced by the mainframe days where I designed my algorithms on paper, punched my program on a stack of punch cards, and
    examined and fixed the errors all at once. The technical situation has
    changed (mostly improved) during the decades, but the habits (how often
    you start a compiler in the development process cycle) have, I think,
    also changed, but not necessarily improved.

    Yes, I understand that it seems to you that 7 seconds is like forever
    if you see the compiler as an instant-responder interactive tool.

    BTW; it may be worthwhile (for those who compile often, probably more
    often than necessary, and want the compilation results instantly) to
    consider tools that compile in parallel while editing their code.

    Janis


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Fri Nov 29 05:25:59 2024
    On 28/11/2024 17:28, Janis Papanagnou wrote:
    On 28.11.2024 15:27, Bart wrote:
    [ compilation times ]

    And for me, used to decades of sub-one-second response times, 7 seconds
    seems like for ever. [...]

    Sub-seconds is very important in response times of interactive tools;
    I recall we've measured, e.g. for GUI applications, the exact timing,
    and we've taken into account results of psychological sciences. The
    accepted response times for our applications were somewhere around
    0.20 seconds, and even 0.50 seconds was by far unacceptable.

    But we're speaking about compilation times. And I'm a bit astonished
    about a sub-second requirement or necessity. I'm typically compiling
    source code after I've edited it, where the latter is by far the most dominating step. And before the editing there's usually the analysis
    of code, that requires even more time than the simple but interactive
    editing process.

    You can make a similar argument about turning on the light switch when entering a room. Flicking light switches is not something you need to do
    every few seconds, but if the light took 5 seconds to come on (or even
    one second), it would be incredibly annoying.

    It would stop the fluency of whatever you were planning to do. You might
    even forget why you needed to go into the room in the first place.

    When I start the compile all the major time demanding
    tasks that are necessary to create the software fix have already been
    done, and I certainly don't need a sub-second response from compiler.

    Though I observed a certain behavior of programmers who use tools with
    a fast response time. Since it doesn't cost anything they just make a
    single change and compile to see whether it works, and, rinse repeat,
    do that for every _single_ change *multiple* times.

    Well, what's wrong with that? It's how lots of things already work, by
    doing things incrementally.

    If recompiling an entire program of any size really was instant, would
    you still work exactly the same way?

    People find scripting languages productive, partly because there is no discrete build step.

    My own programming
    habits got also somewhat influenced by that, though I still try to fix
    things in brain before I ask the compiler what it thinks of my change.
    This is certainly influenced by the mainframe days where I designed my algorithms on paper, punched my program on a stack of punch cards, and examined and fixed the errors all at once.

    I also remember using punched cards at college. But generally it was
    using an interactive terminal. Compiling and linking were still big
    deals when using mini- and mainframe computers.

    Oddly, it was only using tiny, underpowered microprocessor systems,
    that I realised how fast language tools really could be. At least the
    ones I wrote.

    Those ported from bigger computers would take minutes for the simplest program, as I later found. Mine took seconds, or a fraction of a second.
    Part of that was down to using a resident compile/IDE that kept things
    in memory as much as possible, since floppy disks were slow.

    Here's a test: how many times can you twiddle your thumbs while waiting
    for something to build? (That is, put your hands together with
    interlocked fingers, and rotate your thumbs around each other).

    I can only manage 3-4 - if building an artificial 1Mloc benchmark.
    Otherwise it's impossible to even put my hands together.

    In 7 seconds I can do nearly 25 twiddles. That's a really useful use of
    my time!



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Fri Nov 29 20:36:03 2024
    On 28.11.2024 19:25, Bart wrote:
    On 28/11/2024 17:28, Janis Papanagnou wrote:
    On 28.11.2024 15:27, Bart wrote:
    [ compilation times ]

    And for me, used to decades of sub-one-second response times, 7 seconds
    seems like for ever. [...]

    Sub-seconds is very important in response times of interactive tools;
    I recall we've measured, e.g. for GUI applications, the exact timing,
    and we've taken into account results of psychological sciences. The
    accepted response times for our applications were somewhere around
    0.20 seconds, and even 0.50 seconds was by far unacceptable.

    But we're speaking about compilation times. And I'm a bit astonished
    about a sub-second requirement or necessity. I'm typically compiling
    source code after I've edited it, where the latter is by far the most
    dominating step. And before the editing there's usually the analysis
    of code, that requires even more time than the simple but interactive
    editing process.

    You can make a similar argument about turning on the light switch when entering a room. Flicking light switches is not something you need to do every few seconds, but if the light took 5 seconds to come on (or even
    one second), it would be incredibly annoying.

    It is. (It was with flickering fluorescent lamps in the past and is
    with the contemporary energy saving lamps nowadays that need time to
    radiate in full glory.) - But I'm not making comparisons/parables;
    I made a concrete argument and coupled it with behavioral patterns
    and work processes in the context we were speaking about, compiling.


    It would stop the fluency of whatever you were planning to do. You might
    even forget why you needed to go into the room in the first place.

    When I start the compile all the major time demanding
    tasks that are necessary to create the software fix have already been
    done, and I certainly don't need a sub-second response from compiler.

    Though I observed a certain behavior of programmers who use tools with
    a fast response time. Since it doesn't cost anything they just make a
    single change and compile to see whether it works, and, rinse repeat,
    do that for every _single_ change *multiple* times.

    Well, what's wrong with that? It's how lots of things already work, by
    doing things incrementally.

    There's nothing "wrong" with it. (I just consider it non-ergonomic
    in the edit-compile-loop context I described.) You can (and should)
    do what you prefer and what works for you - unless you work and
    operate in a larger project context where efficient processes may
    (or may not) conflict with your habits.


    If recompiling an entire program of any size really was instant, would
    you still work exactly the same way?

    (I addressed that in my previous post.)


    People find scripting languages productive, partly because there is no discrete build step.

    (There are many reasons for using scripting languages; at least for
    those that I use. And there are reasons to not use them.)

    And there are reasons for using compiled and strongly typed languages.
    One I already mentioned in my previous post; you see all errors at
    once and can fix them in one iteration. - I seem to recall that you
    are somewhat familiar with Algol 68; its error messages foster an
    efficient error correction process.

    The point was and still is that it's inefficient to save seconds in
    compiling and spend much more time in your edit-compile iterations.

    The rest can be re-read if you missed that I wrote "I understand"
    your edit-compile habits as an effect of being used to instant
    responsive compilers [for the sort of code you are doing, in the
    project context you are working, with the software organization
    you have, and the development processes you apply].


    My own programming
    habits got also somewhat influenced by that, though I still try to fix
    things in brain before I ask the compiler what it thinks of my change.
    This is certainly influenced by the mainframe days where I designed my
    algorithms on paper, punched my program on a stack of punch cards, and
    examined and fixed the errors all at once.

    I also remember using punched cards at college. But generally it was
    using an interactive terminal. Compiling and linking were still big
    deals when using mini- and mainframe computers.

    I have (and also heard of) different experiences. (Like hitting the
    Enter key on an interactive terminal to start a job and instantly
    getting the prompt back.) Myself I worked with punch cards only on
    mechanical punch terminals and then put the stack of cards in a
    batch queue that got processed (with other jobs) at occasion; the
    build times, compiles/links, were not an issue anyway with those
    mainframes (TR, CDC, 360-clone). When we switched to interactive (non-mainframe) systems the processes got slower, much more time
    consuming.


    Oddly, it was only using tiny, underpowered microprocessor systems, that
    I realised how fast language tools really could be. At least the ones I wrote.

    Sure.

    Janis

    [...]



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Sat Nov 30 10:29:07 2024
    Bart <bc@freeuk.com> writes:

    On 28/11/2024 17:28, Janis Papanagnou wrote:

    On 28.11.2024 15:27, Bart wrote:

    [ compilation times ]

    And for me, used to decades of sub-one-second response times, 7
    seconds seems like for ever. [...]

    Sub-seconds is very important in response times of interactive
    tools; I recall we've measured, e.g. for GUI applications, the
    exact timing, and we've taken into account results of psychological
    sciences. The accepted response times for our applications were
    somewhere around 0.20 seconds, and even 0.50 seconds was by far
    unacceptable.

    But we're speaking about compilation times. And I'm a bit
    astonished about a sub-second requirement or necessity. I'm
    typically compiling source code after I've edited it, where the
    latter is by far the most dominating step. And before the editing
    there's usually the analysis of code, that requires even more time
    than the simple but interactive editing process.

    You can make a similar argument about turning on the light switch
    when entering a room. Flicking light switches is not something you
    need to do every few seconds, but if the light took 5 seconds to
    come on (or even one second), it would be incredibly annoying.

    This analogy sounds like something a defense attorney would say who
    has a client that everyone knows is guilty.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Sat Nov 30 13:46:18 2024
    On 30.11.2024 00:29, Tim Rentsch wrote:
    Bart <bc@freeuk.com> writes:
    On 28/11/2024 17:28, Janis Papanagnou wrote:

    But we're speaking about compilation times. [...]

    You can make a similar argument about turning on the light switch
    when entering a room. Flicking light switches is not something you
    need to do every few seconds, but if the light took 5 seconds to
    come on (or even one second), it would be incredibly annoying.

    This analogy sounds like something a defense attorney would say who
    has a client that everyone knows is guilty.

    Intentionally or not; it's funny to respond to an analogy with an
    analogy. :-}

    Janis


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Sat Nov 30 15:40:11 2024
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

    On 30.11.2024 00:29, Tim Rentsch wrote:

    Bart <bc@freeuk.com> writes:

    On 28/11/2024 17:28, Janis Papanagnou wrote:

    But we're speaking about compilation times. [...]

    You can make a similar argument about turning on the light switch
    when entering a room. Flicking light switches is not something you
    need to do every few seconds, but if the light took 5 seconds to
    come on (or even one second), it would be incredibly annoying.

    This analogy sounds like something a defense attorney would say who
    has a client that everyone knows is guilty.

    Intentionally or not; it's funny to respond to an analogy with an
    analogy. :-}

    My statement was not an analogy. Similar is not the same as
    analogous.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Sat Nov 30 16:03:17 2024
    Bart <bc@freeuk.com> writes:

    On 28/11/2024 05:18, Tim Rentsch wrote:

    Bart <bc@freeuk.com> writes:

    On 26/11/2024 12:29, Tim Rentsch wrote:

    Bart <bc@freeuk.com> writes:

    On 25/11/2024 18:49, Tim Rentsch wrote:

    Bart <bc@freeuk.com> writes:

    It's funny how nobody seems to care about the speed of
    compilers (which can vary by 100:1), but for the generated
    programs, the 2:1 speedup you might get by optimising it is
    vital!

    I think most people would rather take this path (these times
    are actual measured times of a recently written program):

    compile time: 1 second
    program run time: ~7 hours

    than this path (extrapolated using the ratios mentioned above):

    compile time: 0.01 second
    program run time: ~14 hours

    I'm trying to think of some computationally intensive app that
    would run non-stop for several hours without interaction.

    The conclusion is the same whether the program run time
    is 7 hours, 7 minutes, or 7 seconds.

    Funny you should mention 7 seconds. If I'm working on single
    source file called sql.c for example, that's how long it takes for
    gcc to create an unoptimised executable:

    c:\cx>tm gcc sql.c #250Kloc file
    TM: 7.38

    Your example illustrates my point. Even 250 thousand lines of
    source takes only a few seconds to compile. Only people nutty
    enough to have single source files over 25,000 lines or so --
    over 400 pages at 60 lines/page! -- are so obsessed about
    compilation speed. And of course you picked the farthest-most
    outlier as your example, grossly misrepresenting any sort of
    average or typical case.

    It's not atypical for me! [...]

    I can easily accept that it might be typical for you. My
    point is that it is not typical for almost everyone else.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Sat Nov 30 16:25:15 2024
    Michael S <already5chosen@yahoo.com> writes:

    On Wed, 27 Nov 2024 21:18:09 -0800
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Bart <bc@freeuk.com> writes:

    On 26/11/2024 12:29, Tim Rentsch wrote:

    Bart <bc@freeuk.com> writes:

    On 25/11/2024 18:49, Tim Rentsch wrote:

    Bart <bc@freeuk.com> writes:

    It's funny how nobody seems to care about the speed of
    compilers (which can vary by 100:1), but for the generated
    programs, the 2:1 speedup you might get by optimising it is
    vital!

    I think most people would rather take this path (these times
    are actual measured times of a recently written program):

    compile time: 1 second
    program run time: ~7 hours

    than this path (extrapolated using the ratios mentioned above):

    compile time: 0.01 second
    program run time: ~14 hours

    I'm trying to think of some computationally intensive app that
    would run non-stop for several hours without interaction.

    The conclusion is the same whether the program run time
    is 7 hours, 7 minutes, or 7 seconds.

    Funny you should mention 7 seconds. If I'm working on single
    source file called sql.c for example, that's how long it takes for
    gcc to create an unoptimised executable:

    c:\cx>tm gcc sql.c #250Kloc file
    TM: 7.38

    Your example illustrates my point. Even 250 thousand lines of
    source takes only a few seconds to compile. Only people nutty
    enough to have single source files over 25,000 lines or so --
    over 400 pages at 60 lines/page! -- are so obsessed about
    compilation speed.

    My impression was that Bart is talking about machine-generated code.
    For machine generated code 250Kloc is not too much. I would think
    that in field of compiled-code HDL simulation people are interested
    in compilation of as big sources as they can afford.

    Sure. But Bart is implicitly saying that such cases make up the
    bulk of C compilations, whereas in fact the reverse is true. People
    don't care about Bart's complaint because the circumstances of his
    examples almost never apply to them. And he must know this, even
    though he tries to pretend he doesn't.

    And of course you picked the farthest-most
    outlier as your example, grossly misrepresenting any sort of
    average or typical case.

    I remember having much shorter file (core of 3rd-party TCP protocol implementation) where compilation with gcc took several seconds.

    Looked at it now - only 22 Klocs.
    Text size in .o - 34KB.
    Compilation time on much newer computer than the one I remembered, with
    good SATA SSD and 4 GHz Intel Haswell CPU - a little over 1 sec. That
    with gcc 4.7.3. I would guess that if I try gcc13 it would be 1.5 to 2
    times longer.
    So, in terms of Kloc/sec it seems to me that the time reported by Bart
    is not outrageous. Indeed, gcc is very slow when compiling any source several times above average size.
    In this particular case I can not compare gcc to alternative, because
    for a given target (Altera Nios2) there are no alternatives.

    I'm not disputing his ratios on compilation speeds. I implicitly
    agreed to them in my earlier remarks. The point is that the
    absolute times are so small that most people don't care. For
    some reason I can't fathom Bart does care, and apparently cannot
    understand why most other people do not care. My conclusion is
    that Bart is either quite immature or a narcissist. I have tried
    to explain to him why other people think differently than he does,
    but it seems he isn't really interested in having it explained.
    Oh well, not my problem.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Sat Nov 30 21:00:30 2024
    On 30.11.2024 05:40, Tim Rentsch wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

    On 30.11.2024 00:29, Tim Rentsch wrote:

    Bart <bc@freeuk.com> writes:

    On 28/11/2024 17:28, Janis Papanagnou wrote:

    But we're speaking about compilation times. [...]

    You can make a similar argument about turning on the light switch
    when entering a room. Flicking light switches is not something you
    need to do every few seconds, but if the light took 5 seconds to
    come on (or even one second), it would be incredibly annoying.

    This analogy sounds like something a defense attorney would say who
    has a client that everyone knows is guilty.

    Intentionally or not; it's funny to respond to an analogy with an
    analogy. :-}

    My statement was not an analogy. Similar is not the same as
    analogous.

    It's of course (and obviously) not the same; it's just a
    similar term where the semantics of both terms have an overlap.

    (Not sure why you even bothered to reply and nit-pick here.
    But with your habit you seem to have just missed the point;
    the comparison of your reply-type with Bart's argumentation.)

    Janis


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Sat Nov 30 22:26:41 2024
    On 30/11/2024 05:25, Tim Rentsch wrote:
    Michael S <already5chosen@yahoo.com> writes:

    On Wed, 27 Nov 2024 21:18:09 -0800
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Bart <bc@freeuk.com> writes:

    On 26/11/2024 12:29, Tim Rentsch wrote:

    Bart <bc@freeuk.com> writes:

    On 25/11/2024 18:49, Tim Rentsch wrote:

    Bart <bc@freeuk.com> writes:

    It's funny how nobody seems to care about the speed of
    compilers (which can vary by 100:1), but for the generated
    programs, the 2:1 speedup you might get by optimising it is
    vital!

    I think most people would rather take this path (these times
    are actual measured times of a recently written program):

    compile time: 1 second
    program run time: ~7 hours

    than this path (extrapolated using the ratios mentioned above):

    compile time: 0.01 second
    program run time: ~14 hours

    I'm trying to think of some computationally intensive app that
    would run non-stop for several hours without interaction.

    The conclusion is the same whether the program run time
    is 7 hours, 7 minutes, or 7 seconds.

    Funny you should mention 7 seconds. If I'm working on single
    source file called sql.c for example, that's how long it takes for
    gcc to create an unoptimised executable:

    c:\cx>tm gcc sql.c #250Kloc file
    TM: 7.38

    Your example illustrates my point. Even 250 thousand lines of
    source takes only a few seconds to compile. Only people nutty
    enough to have single source files over 25,000 lines or so --
    over 400 pages at 60 lines/page! -- are so obsessed about
    compilation speed.

    My impression was that Bart is talking about machine-generated code.
    For machine generated code 250Kloc is not too much. I would think
    that in field of compiled-code HDL simulation people are interested
    in compilation of as big sources as they can afford.

    Sure. But Bart is implicitly saying that such cases make up the
    bulk of C compilations, whereas in fact the reverse is true. People
    don't care about Bart's complaint because the circumstances of his
    examples almost never apply to them. And he must know this, even
    though he tries to pretend he doesn't.

    And of course you picked the farthest-most
    outlier as your example, grossly misrepresenting any sort of
    average or typical case.

    I remember having much shorter file (core of 3rd-party TCP protocol
    implementation) where compilation with gcc took several seconds.

    Looked at it now - only 22 Klocs.
    Text size in .o - 34KB.
    Compilation time on much newer computer than the one I remembered, with
    good SATA SSD and 4 GHz Intel Haswell CPU - a little over 1 sec. That
    with gcc 4.7.3. I would guess that if I try gcc13 it would be 1.5 to 2
    times longer.
    So, in terms of Kloc/sec it seems to me that the time reported by Bart
    is not outrageous. Indeed, gcc is very slow when compiling any source
    several times above average size.
    In this particular case I can not compare gcc to alternative, because
    for a given target (Altera Nios2) there are no alternatives.

    I'm not disputing his ratios on compilation speeds. I implicitly
    agreed to them in my earlier remarks. The point is that the
    absolute times are so small that most people don't care. For
    some reason I can't fathom Bart does care, and apparently cannot
    understand why most other people do not care. My conclusion is
    that Bart is either quite immature or a narcissist. I have tried
    to explain to him why other people think differently than he does,
    but it seems he isn't really interested in having it explained.
    Oh well, not my problem.

    EVERYBODY cares about compilation speeds. Except in this newsgroup where people try to pretend that it's irrelevant.

    But then at the same time, they strive to keep those compile-times small:

    * By using tools that have themselves been optimised to reduce their
    runtimes, and where considerable resources have been expended to get the
    best possible code, which naturally also benefits the tool

    * By using the fastest possible hardware

    * By trying to do parallel builds across multiple cores

    * By organising source code into artificially small modules so that recompilation of just one module is quicker. So, relying on independent compilation.

    * By going to considerable trouble to define inter-dependencies between modules, so that a make system can AVOID recompiling modules. (Why on
    earth would it need to? Oh, because it would be slower!)

    * By using development techniques involving thinking deeply about what
    to change, to avoid a costly rebuild.

    Etc.

    All instead of relying on raw compilation speed and a lot of those
    points become less relevant.

    My conclusion is
    that Bart is either quite immature or a narcissist.

    I'd never bothered much about compile-speed in the past, except to
    ensure that an edit-run cycle was usually a fraction of second, except
    when I had to compile all modules of a project then it might have been a
    few seconds.

    My tools were naturally fast, even though unoptimised, through being
    small and simple. It's only recently that I took advantage of that
    through developing whole-program compilers.

    This normally needs language support (eg. a decent module scheme).
    Applying it to C is harder (if 50 modules of a project each use some
    huge, 0.5Mloc header, then it means processing it 50 times).

    I think it is possible without changing the language, but decided it
    wasn't worth the effort. I don't use it enough myself, and nobody else
    seems to care.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Rosario19@3:633/280.2 to All on Sun Dec 1 03:35:42 2024
    On Wed, 20 Nov 2024 12:31:35 -0000 (UTC), Dan Purgert wrote:

    On 2024-11-16, Stefan Ram wrote:
    Dan Purgert <dan@djph.net> wrote or quoted:
    if (n==0) { printf ("n: %u\n",n); n++;}
    if (n==1) { printf ("n: %u\n",n); n++;}
    if (n==2) { printf ("n: %u\n",n); n++;}
    if (n==3) { printf ("n: %u\n",n); n++;}
    if (n==4) { printf ("n: %u\n",n); n++;}
    printf ("all if completed, n=%u\n",n);

    above should be equivalent to this

    for(;n>=0&&n<5;++n) printf ("n: %u\n",n);
    printf ("all if completed, n=%u\n",n);


    My bad if the following instruction structure's already been hashed
    out in this thread, but I haven't been following the whole convo!

    I honestly lost the plot ages ago; not sure if it was either!


    In my C 101 classes, after we've covered "if" and "else",
    I always throw this program up on the screen and hit the newbies
    with this curveball: "What's this bad boy going to spit out?".

    Segfaults? :D


    Well, it's a blue moon when someone nails it. Most of them fall
    for my little gotcha hook, line, and sinker.

    #include <stdio.h>

    const char * english( int const n )
    { const char * result;
    if( n == 0 )result = "zero";
    if( n == 1 )result = "one";
    if( n == 2 )result = "two";
    if( n == 3 )result = "three";
    else result = "four";
    return result; }

    void print_english( int const n )
    { printf( "%s\n", english( n )); }

    int main( void )
    { print_english( 0 );
    print_english( 1 );
    print_english( 2 );
    print_english( 3 );
    print_english( 4 ); }

    oooh, that's way better at making a point of the hazard than mine was.

    ... almost needed to engage my rubber duckie, before I realized I was mentally auto-correcting the 'english()' function while reading it.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Sun Dec 1 09:07:49 2024
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

    On 16.11.2024 16:14, James Kuyper wrote:

    On 11/16/24 04:42, Stefan Ram wrote:
    ...

    [...]

    #include <stdio.h>

    const char * english( int const n )
    { const char * result;
    if( n == 0 )result = "zero";
    if( n == 1 )result = "one";
    if( n == 2 )result = "two";
    if( n == 3 )result = "three";
    else result = "four";
    return result; }

    That's indeed a nice example. Where you get fooled by the treacherous
    "trustiness" of formatting[*]. - In syntax we trust! [**]

    Misleading formatting is the lesser of two problems. A more
    significant bad design choice is writing in an imperative
    style rather than a functional style.
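
    One way to read that in C - an editor's sketch, not code from the
    thread - is to produce the value with a single conditional expression,
    so there is no half-assigned 'result' left to fall through a missing
    'else':

    const char * english( int const n )
    { return n == 0 ? "zero"
           : n == 1 ? "one"
           : n == 2 ? "two"
           : n == 3 ? "three"
           :          "four"; }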

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Sun Dec 1 23:41:03 2024
    Stefan Ram <ram@zedat.fu-berlin.de> wrote:

    My bad if the following instruction structure's already been hashed
    out in this thread, but I haven't been following the whole convo!

    In my C 101 classes, after we've covered "if" and "else",
    I always throw this program up on the screen and hit the newbies
    with this curveball: "What's this bad boy going to spit out?".

    Well, it's a blue moon when someone nails it. Most of them fall
    for my little gotcha hook, line, and sinker.

    #include <stdio.h>

    const char * english( int const n )
    { const char * result;
    if( n == 0 )result = "zero";
    if( n == 1 )result = "one";
    if( n == 2 )result = "two";
    if( n == 3 )result = "three";
    else result = "four";
    return result; }

    void print_english( int const n )
    { printf( "%s\n", english( n )); }

    int main( void )
    { print_english( 0 );
    print_english( 1 );
    print_english( 2 );
    print_english( 3 );
    print_english( 4 ); }


    That breaks two rules:
    - instructions conditioned by 'if' should have braces,
    - when we have the result we should return it immediately.

    Once those are fixed, the code works as expected...
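
    For instance, a minimal sketch applying those two rules to the function
    above (the editor's illustration: same strings, braces added, and each
    branch returning immediately):

    const char * english( int const n )
    {
        if( n == 0 ){ return "zero"; }
        if( n == 1 ){ return "one"; }
        if( n == 2 ){ return "two"; }
        if( n == 3 ){ return "three"; }
        return "four";
    }

    Written that way, a forgotten 'else' can no longer silently overwrite
    an earlier result.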

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Mon Dec 2 00:04:30 2024
    Bart <bc@freeuk.com> wrote:
    On 28/11/2024 12:37, Michael S wrote:
    On Wed, 27 Nov 2024 21:18:09 -0800
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:


    c:\cx>tm gcc sql.c #250Kloc file
    TM: 7.38

    Your example illustrates my point. Even 250 thousand lines of
    source takes only a few seconds to compile. Only people nutty
    enough to have single source files over 25,000 lines or so --
    over 400 pages at 60 lines/page! -- are so obsessed about
    compilation speed.

    My impression was that Bart is talking about machine-generated code.
    For machine generated code 250Kloc is not too much.

    This file mostly comprises sqlite3.c which is a machine-generated amalgamation of some 100 actual C files.

    You wouldn't normally do development with that version, but in my
    scenario, where I was trying to find out why the version built with my compiler was buggy, I might try adding debug info to it then building
    with a working compiler (eg. gcc) to compare with.

    Even in the context of developing a compiler I would not blindly run
    many compilations of a large file. At the first stage I would debug the
    compiled program, to find out what is wrong with it. That normally
    involves several runs of the same executable. A possible trick is
    to compile each file separately and link the files in various
    combinations, some compiled by gcc, some by my compiler.
    Normally that would locate the error to a single file.

    After that I would try to minimize the testcase, removing code which
    does not contribute to the bug. That involves several compilations
    of files with quickly decreasing sizes.

    Tim isn't asking the right questions (or any questions!). WHY does gcc
    take so long to generate indifferent code when the task can clearly be
    done at least a magnitude faster?

    The simple answer is: users tolerate long compile times. If users
    abandoned 'gcc' for some other compiler due to long compile times,
    then 'gcc' developers would notice. But the opposite has happened:
    'llvm' was significantly smaller and faster but produced slower code.
    'llvm' developers improved optimizations, in the process making
    their compiler bigger and slower.

    You need to improve your propaganda for faster C compilers...

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Mon Dec 2 00:19:54 2024
    Bart <bc@freeuk.com> wrote:
    On 30/11/2024 05:25, Tim Rentsch wrote:
    Michael S <already5chosen@yahoo.com> writes:

    On Wed, 27 Nov 2024 21:18:09 -0800
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Bart <bc@freeuk.com> writes:

    On 26/11/2024 12:29, Tim Rentsch wrote:

    Bart <bc@freeuk.com> writes:

    On 25/11/2024 18:49, Tim Rentsch wrote:

    Bart <bc@freeuk.com> writes:

    It's funny how nobody seems to care about the speed of
    compilers (which can vary by 100:1), but for the generated
    programs, the 2:1 speedup you might get by optimising it is
    vital!

    I think most people would rather take this path (these times
    are actual measured times of a recently written program):

    compile time: 1 second
    program run time: ~7 hours

    than this path (extrapolated using the ratios mentioned above): >>>>>>>>
    compile time: 0.01 second
    program run time: ~14 hours

    I'm trying to think of some computationally intensive app that
    would run non-stop for several hours without interaction.

    The conclusion is the same whether the program run time
    is 7 hours, 7 minutes, or 7 seconds.

    Funny you should mention 7 seconds. If I'm working on single
    source file called sql.c for example, that's how long it takes for
    gcc to create an unoptimised executable:

    c:\cx>tm gcc sql.c #250Kloc file
    TM: 7.38

    Your example illustrates my point. Even 250 thousand lines of
    source takes only a few seconds to compile. Only people nutty
    enough to have single source files over 25,000 lines or so --
    over 400 pages at 60 lines/page! -- are so obsessed about
    compilation speed.

    My impression was that Bart is talking about machine-generated code.
    For machine generated code 250Kloc is not too much. I would think
    that in field of compiled-code HDL simulation people are interested
    in compilation of as big sources as they can afford.

    Sure. But Bart is implicitly saying that such cases make up the
    bulk of C compilations, whereas in fact the reverse is true. People
    don't care about Bart's complaint because the circumstances of his
    examples almost never apply to them. And he must know this, even
    though he tries to pretend he doesn't.

    And of course you picked the farthest-most
    outlier as your example, grossly misrepresenting any sort of
    average or typical case.

    I remember having much shorter file (core of 3rd-party TCP protocol
    implementation) where compilation with gcc took several seconds.

    Looked at it now - only 22 Klocs.
    Text size in .o - 34KB.
    Compilation time on much newer computer than the one I remembered, with
    good SATA SSD and 4 GHz Intel Haswell CPU - a little over 1 sec. That
    with gcc 4.7.3. I would guess that if I try gcc13 it would be 1.5 to 2
    times longer.
    So, in terms of Kloc/sec it seems to me that the time reported by Bart
    is not outrageous. Indeed, gcc is very slow when compiling any source
    several times above average size.
    In this particular case I can not compare gcc to alternative, because
    for a given target (Altera Nios2) there are no alternatives.

    I'm not disputing his ratios on compilation speeds. I implicitly
    agreed to them in my earlier remarks. The point is that the
    absolute times are so small that most people don't care. For
    some reason I can't fathom Bart does care, and apparently cannot
    understand why most other people do not care. My conclusion is
    that Bart is either quite immature or a narcissist. I have tried
    to explain to him why other people think differently than he does,
    but it seems he isn't really interested in having it explained.
    Oh well, not my problem.

    EVERYBODY cares about compilation speeds. Except in this newsgroup where people try to pretend that it's irrelevant.

    But then at the same time, they strive to keep those compile-times small:

    * By using tools that have themselves been optimised to reduce their runtimes, and where considerable resources have been expended to get the best possible code, which naturally also benefits the tool

    * By using the fastest possible hardware

    * By trying to do parallel builds across multiple cores

    * By organising source code into artificially small modules so that recompilation of just one module is quicker. So, relying on independent compilation.

    * By going to considerable trouble to define inter-dependencies between modules, so that a make system can AVOID recompiling modules. (Why on
    earth would it need to? Oh, because it would be slower!)

    * By using development techniques involving thinking deeply about what
    to change, to avoid a costly rebuild.

    Etc.

    Those methods are effective and work. And one gets optimized
    binaries as a result.

    All instead of relying on raw compilation speed and a lot of those
    points become less relevant.

    If all other factors were the same, then using a "better" compiler
    would be nice. But other factors are not equal. You basically
    advocate that people give up features that they want/need in order
    to allow for simpler compilers; this is not going to happen.

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Mon Dec 2 02:13:35 2024
    On 01/12/2024 13:04, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:
    On 28/11/2024 12:37, Michael S wrote:
    On Wed, 27 Nov 2024 21:18:09 -0800
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:


    c:\cx>tm gcc sql.c #250Kloc file
    TM: 7.38

    Your example illustrates my point. Even 250 thousand lines of
    source takes only a few seconds to compile. Only people nutty
    enough to have single source files over 25,000 lines or so --
    over 400 pages at 60 lines/page! -- are so obsessed about
    compilation speed.

    My impression was that Bart is talking about machine-generated code.
    For machine generated code 250Kloc is not too much.

    This file mostly comprises sqlite3.c which is a machine-generated
    amalgamation of some 100 actual C files.

    You wouldn't normally do development with that version, but in my
    scenario, where I was trying to find out why the version built with my
    compiler was buggy, I might try adding debug info to it then building
    with a working compiler (eg. gcc) to compare with.

    Even in the context of developing a compiler I would not blindly run
    many compilations of a large file.

    Difficult bugs always occur in larger codebases, but with C, these are in a
    language that I can't navigate, and for programs which are not mine, and
    which tend to be badly written, bristling with typedefs and macros.

    It could take a week to track down where the error might be ...

    At the first stage I would debug the
    compiled program, to find out what is wrong with it.

    .... within the C program. Except there's nothing wrong with the C
    program! It works fine with a working compiler.

    The problem will be in the generated code, so in an entirely different
    program. So normal debugging tools are less useful when several sets of
    source code are involved, in different languages, or when the error occurs
    in the second-generation version of either the self-hosted tool, or of the program under test if it is to do with languages.

    (For example, I got tcc.c working at one point. My generated tcc.exe
    could compile tcc.c, but that second-generation tcc.c didn't work.)


    After that I would try to minimize the testcase, removing code which
    do not contribute to the bug.

    Again, there is nothing wrong with the C program, but in the code
    generated for it. The bug can be very subtle, but it usually turns out
    to be something silly.

    Removing code from 10s of 1000s of lines (or 250Kloc for sql) is not practical. Yet the aim is to isolate some code which can be used to recreate the issue in a smaller program.

    Debugging can involve comparing two versions, one working, the other
    not, looking for differences. And here there may be tracking statements
    added.

    If the only working version is via gcc, then that's bad news because it
    makes the process even more of a PITA.

    I added an interpreter mode to my IL, because I assumed that would give a solid, reliable reference implementation to compare against.

    It turned out to be even more buggy than the generated native code!

    (One problem was to do with my stdarg.h header which implements VARARGS
    used in function definitions. It assumes the stack grows downwards. In
    my interpreter, it grows upwards!)
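
    For illustration, a naive varargs scheme really does bake in a direction
    assumption. This is a hypothetical, generic sketch by the editor
    (my_va_start/my_va_arg are made-up names, not the actual header): it
    steps upward in memory from the last named parameter, which only works
    on targets where later arguments sit at higher addresses.

    /* 'ap' is a plain char * cursor; alignment and register passing
       are ignored for the sake of the sketch */
    #define my_va_start(ap, last) ((ap) = (char *)&(last) + sizeof(last))
    #define my_va_arg(ap, type) \
        (*(type *)(((ap) += sizeof(type)) - sizeof(type)))

    If the argument area grows the other way, as in the interpreter case
    described above, the same macros walk off in the wrong direction.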

    That involves several compilations
    of files with quickly decreasing sizes.

    Tim isn't asking the right questions (or any questions!). WHY does gcc
    take so long to generate indifferent code when the task can clearly be
    done at least a magnitude faster?

    The simple answer is: users tolerate long compile times. If users
    abandoned 'gcc' for some other compiler due to long compile times,
    then 'gcc' developers would notice.

    People use gcc. They come to depend on its features, or they might use (perhaps unknowingly) some extensions. On Windows, gcc includes some
    headers and libraries that belong to Linux, but other compilers don't
    provide them.

    The result is that if they were to switch to a smaller, faster compiler,
    their program may not work.

    They'd have to use it from the start. But then they may want to use
    libraries which only work with gcc ...


    You need to improve your propaganda for faster C compilers...

    I actually don't know why I care. I get the benefit of my fast tools
    every day; they're a joy to use. So I'm not bothered that other people
    are that tolerant of slow, cumbersome build systems.

    But then, people in this group do like to belittle small, fast products
    (tcc for example as well as my stuff), and that's where it gets annoying.

    So, how long to build LLVM again? It used to be hours. Here's my take on
    it being built from scratch:

    c:\px>tm mm pc
    Compiling pc.m to pc.exe
    TM: 0.08

    This standalone program takes a source file containing an IL program
    rendered as text. It can create an EXE, or run it, or interpret it.

    Let's try it out:

    c:\cx>cc -p lua # compile a C program to IL
    Compiling lua.c to lua.pcl

    c:\cx>\px\pc -r lua fib.lua # Now compile and run it in-memory
    Processing lua.pcl to lua.(run)
    Running: fib.lua
    1 1
    2 1
    3 2
    4 3
    5 5
    6 8
    7 13
    ...

    Or I can interpret it:

    c:\cx>\px\pc -i lua fib.lua
    Processing lua.pcl to lua.(int)
    Running: fib.lua
    1 1
    ...

    All that from a product that took 80ms to build and comprises a
    self-contained 180KB executable.

    If nobody here can appreciate the benefits of having such a baseline
    product, then there's nothing I can do about that.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Mon Dec 2 02:34:24 2024
    On 01.12.2024 13:41, Waldek Hebisch wrote:
    Stefan Ram <ram@zedat.fu-berlin.de> wrote:

    My bad if the following instruction structure's already been hashed
    out in this thread, but I haven't been following the whole convo!

    In my C 101 classes, after we've covered "if" and "else",
    I always throw this program up on the screen and hit the newbies
    with this curveball: "What's this bad boy going to spit out?".

    Well, it's a blue moon when someone nails it. Most of them fall
    for my little gotcha hook, line, and sinker.

    #include <stdio.h>

    const char * english( int const n )
    { const char * result;
    if( n == 0 )result = "zero";
    if( n == 1 )result = "one";
    if( n == 2 )result = "two";
    if( n == 3 )result = "three";
    else result = "four";
    return result; }

    void print_english( int const n )
    { printf( "%s\n", english( n )); }

    int main( void )
    { print_english( 0 );
    print_english( 1 );
    print_english( 2 );
    print_english( 3 );
    print_english( 4 ); }


    That breaks two rules:
    - instructions conditioned by 'if' should have braces,

    I suppose you don't mean

    if (n == value) { result = string; }
    else { result = other; }

    which I'd think doesn't change anything. - So what is it?

    Actually, you should just add explicit 'else' to fix the problem.
    (Here there's no need to fiddle with spurious braces, I'd say.)

    - when we have the result we should return it immediately.

    This would suffice to fix it, wouldn't it?


    Once those are fixed code works as expected...

    I find this answer - not wrong, but - problematic for two reasons.
    There's no accepted "general rules" that could get "broken"; it's
    just rules that serve in given languages and application contexts.
    And they may conflict with other "rules" that have been set up to
    streamline code, make it safer, or whatever.

    Janis


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Scott Lurndal@3:633/280.2 to All on Mon Dec 2 03:14:36 2024
    Reply-To: slp53@pacbell.net

    antispam@fricas.org (Waldek Hebisch) writes:
    Stefan Ram <ram@zedat.fu-berlin.de> wrote:

    My bad if the following instruction structure's already been hashed
    out in this thread, but I haven't been following the whole convo!

    In my C 101 classes, after we've covered "if" and "else",
    I always throw this program up on the screen and hit the newbies
    with this curveball: "What's this bad boy going to spit out?".

    Well, it's a blue moon when someone nails it. Most of them fall
    for my little gotcha hook, line, and sinker.

    #include <stdio.h>

    const char * english( int const n )
    { const char * result;
    if( n == 0 )result = "zero";
    if( n == 1 )result = "one";
    if( n == 2 )result = "two";
    if( n == 3 )result = "three";
    else result = "four";
    return result; }

    void print_english( int const n )
    { printf( "%s\n", english( n )); }

    int main( void )
    { print_english( 0 );
    print_english( 1 );
    print_english( 2 );
    print_english( 3 );
    print_english( 4 ); }


    That breaks two rules:
    - instructions conditioned by 'if' should have braces,
    - when we have the result we should return it immediately.

    Three rules
    - don't do something at runtime if you can do it at compile time.

    const static char *english_numbers[] =
        { "zero", "one", "two", "three", "four" };
    const static size_t num_english_numbers =
        sizeof(english_numbers)/sizeof(english_numbers[0]);

    const char *english(const int n)
    {
        return (n < num_english_numbers) ? english_numbers[n] : "Out-of-range";
    }

    I was doing a code review just last week where a junior programmer had
    to convert a small integer (0..5) to a text label, so the programmer
    created a function to return the corresponding label. That function
    creates a std::map and initializes it with the set of text labels each
    time the function is called, just to discard the map after looking up
    the argument.

    Needless to say, it didn't pass review.
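
    A rough C analogue of that review comment (an editor's sketch; the
    reviewed code was C++ with std::map, and the names here are made up):

    /* rejected shape: the table is rebuilt on every call */
    const char *label_slow(int n)
    {
        const char *t[6];
        t[0] = "zero";  t[1] = "one";  t[2] = "two";
        t[3] = "three"; t[4] = "four"; t[5] = "five";
        return (n >= 0 && n < 6) ? t[n] : "out-of-range";
    }

    /* preferred shape: the table exists once, fixed at compile time */
    static const char *const label_table[] =
        { "zero", "one", "two", "three", "four", "five" };

    const char *label_fast(int n)
    {
        return (n >= 0 && n < 6) ? label_table[n] : "out-of-range";
    }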




    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: UsenetServer - www.usenetserver.com (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Mon Dec 2 09:23:55 2024
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
    On 01.12.2024 13:41, Waldek Hebisch wrote:
    Stefan Ram <ram@zedat.fu-berlin.de> wrote:

    My bad if the following instruction structure's already been hashed
    out in this thread, but I haven't been following the whole convo!

    In my C 101 classes, after we've covered "if" and "else",
    I always throw this program up on the screen and hit the newbies
    with this curveball: "What's this bad boy going to spit out?".

    Well, it's a blue moon when someone nails it. Most of them fall
    for my little gotcha hook, line, and sinker.

    #include <stdio.h>

    const char * english( int const n )
    { const char * result;
    if( n == 0 )result = "zero";
    if( n == 1 )result = "one";
    if( n == 2 )result = "two";
    if( n == 3 )result = "three";
    else result = "four";
    return result; }

    void print_english( int const n )
    { printf( "%s\n", english( n )); }

    int main( void )
    { print_english( 0 );
    print_english( 1 );
    print_english( 2 );
    print_english( 3 );
    print_english( 4 ); }


    That breaks two rules:
    - instructions conditioned by 'if' should have braces,

    I suppose you don't mean

    if (n == value) { result = string; }
    else { result = other; }

    which I'd think doesn't change anything. - So what is it?

    Actually, you should just add explicit 'else' to fix the problem.
    (Here there's no need to fiddle with spurious braces, I'd say.)

    Lack of braces is a smokescreen hiding the second problem.
    Or to put it differently, due to the lack of braces the code
    immediately smells bad.

    - when we have the result we should return it immediately.

    This would suffice to fix it, wouldn't it?

    Yes (but see above).

    Once those are fixed code works as expected...

    I find this answer - not wrong, but - problematic for two reasons.
    There's no accepted "general rules" that could get "broken"; it's
    just rules that serve in given languages and application contexts.
    And they may conflict with other "rules" that have been set up to
    streamline code, make it safer, or whatever.

    No general rules, yes. But every sane programmer has _some_ rules.
    My point was that if you adopt reasonable rules, then whole classes
    of potential problems go away.

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Mon Dec 2 18:29:40 2024
    On 01.12.2024 23:23, Waldek Hebisch wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
    On 01.12.2024 13:41, Waldek Hebisch wrote:
    Stefan Ram <ram@zedat.fu-berlin.de> wrote:

    My bad if the following instruction structure's already been hashed
    out in this thread, but I haven't been following the whole convo!

    In my C 101 classes, after we've covered "if" and "else",
    I always throw this program up on the screen and hit the newbies
    with this curveball: "What's this bad boy going to spit out?".

    Well, it's a blue moon when someone nails it. Most of them fall
    for my little gotcha hook, line, and sinker.

    #include <stdio.h>

    const char * english( int const n )
    { const char * result;
    if( n == 0 )result = "zero";
    if( n == 1 )result = "one";
    if( n == 2 )result = "two";
    if( n == 3 )result = "three";
    else result = "four";
    return result; }

    void print_english( int const n )
    { printf( "%s\n", english( n )); }

    int main( void )
    { print_english( 0 );
    print_english( 1 );
    print_english( 2 );
    print_english( 3 );
    print_english( 4 ); }


    That breaks two rules:
    - instructions conditioned by 'if' should have braces,

    I suppose you don't mean

    if (n == value) { result = string; }
    else { result = other; }

    which I'd think doesn't change anything. - So what is it?

    Actually, you should just add explicit 'else' to fix the problem.
    (Here there's no need to fiddle with spurious braces, I'd say.)

    Lack of braces is a smokescreen hiding the second problem.
    Or to put it differently, due to lack of braces the code
    immediately smells bad.

    I know what you mean. Though in the given example it's not
    the braces that correct the code, and I also think that adding the
    braces doesn't remove the "bad smell" (here). - YMMV, of course. -
    For me the smell stems from the use of sequences of 'if' (instead
    of 'switch'), and the lacking 'else' keywords. - Note that the OP's
    original code *had* braces; it nevertheless had a "bad smell", IMO.

    Spurious braces may even make the code less readable; so it depends.
    And thus a "brace rule" can (IME) only be a "rule of thumb" and any
    "codified rule" (see below) should reflect that.


    - when we have the result we should return it immediately.

    This would suffice to fix it, wouldn't it?

    Yes (but see above).

    Once those are fixed the code works as expected...

    I find this answer - not wrong, but - problematic for two reasons.
    There's no accepted "general rules" that could get "broken"; it's
    just rules that serve in given languages and application contexts.
    And they may conflict with other "rules" that have been set up to
    streamline code, make it safer, or whatever.

    No general rules, yes. But every sane programmer has _some_ rules.
    My point was that if you adopt reasonable rules, then whole classes
    of potential problems go away.

    I associated the term "rule" with formal coding standards, so that
    I wouldn't call personal coding habits "rules" but rather "rules of
    thumb" (formal coding standards have both). But personal projects
    (and programmers' habits) are anyway not my major concern, while
    coding standards actually are. When you formulate coding standards
    (and I've done that for a couple languages) you often have to walk
    on the edge of what's possible and what's sensible.

    Janis


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Tue Dec 3 01:09:27 2024
    Bart <bc@freeuk.com> writes:

    On 30/11/2024 05:25, Tim Rentsch wrote:

    Michael S <already5chosen@yahoo.com> writes:
    [...]
    I remember having much shorter file (core of 3rd-party TCP protocol
    implementation) where compilation with gcc took several seconds.

    Looked at it now - only 22 Klocs.
    Text size in .o - 34KB.
    Compilation time on much newer computer than the one I remembered, with
    good SATA SSD and 4 GHz Intel Haswell CPU - a little over 1 sec. That
    with gcc 4.7.3. I would guess that if I try gcc13 it would be 1.5 to 2
    times longer.
    So, in terms of Kloc/sec it seems to me that time reported by Bart
    is not outrageous. Indeed, gcc is very slow when compiling any source
    several times above average size.
    In this particular case I can not compare gcc to alternative, because
    for a given target (Altera Nios2) there are no alternatives.

    I'm not disputing his ratios on compilation speeds. I implicitly
    agreed to them in my earlier remarks. The point is that the
    absolute times are so small that most people don't care. For
    some reason I can't fathom Bart does care, and apparently cannot
    understand why most other people do not care. My conclusion is
    that Bart is either quite immature or a narcissist. I have tried
    to explain to him why other people think differently than he does,
    but it seems he isn't really interested in having it explained.
    Oh well, not my problem.

    EVERYBODY cares about compilation speeds. [...]

    No, they don't. I accept that you care about compiler speed. What
    most people care about is not speed but compilation times, and as
    long as the times are small enough they don't worry about it.

    Another difference may be relevant here. Based on other comments of
    yours I have the impression that you frequently invoke compilations interactively. A lot of people never do that (or do it only very
    rarely). In a project I am working on now I do builds often,
    including full builds where every .c file is recompiled. But all
    the compilation times together are only a small fraction of the
    total, because doing a build includes lots of other steps, including
    running regression tests. Even if the total compilation time were
    zero the build process wouldn't be appreciably shorter.

    I understand that you care about compiler speed, and that's fine
    with me; more power to you. Why do you find it so hard to accept
    that lots of other people have different views than you do, and
    those people are not all stupid? Do you really consider yourself
    the only smart person in the room?

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Tue Dec 3 01:44:46 2024
    On 02/12/2024 14:09, Tim Rentsch wrote:
    Bart <bc@freeuk.com> writes:

    On 30/11/2024 05:25, Tim Rentsch wrote:

    EVERYBODY cares about compilation speeds. [...]

    No, they don't. I accept that you care about compiler speed. What
    most people care about is not speed but compilation times, and as
    long as the times are small enough they don't worry about it.

    Another difference may be relevant here. Based on other comments of
    yours I have the impression that you frequently invoke compilations interactively. A lot of people never do that (or do it only very
    rarely). In a project I am working on now I do builds often,
    including full builds where every .c file is recompiled. But all
    the compilation times together are only a small fraction of the
    total, because doing a build includes lots of other steps, including
    running regression tests. Even if the total compilation time were
    zero the build process wouldn't be appreciably shorter.

    But it might be appreciably longer if the compilers you used were a lot slower! Or needed to be invoked more. Then even you might start to care
    about it.

    You don't care because in your case it is not the bottleneck, and enough
    work has been put into those compilers to ensure they are not even slower.

    (I don't know why regression tests need to feature in every single build.)


    I understand that you care about compiler speed, and that's fine
    with me; more power to you. Why do you find it so hard to accept
    that lots of other people have different views than you do, and
    those people are not all stupid?

    You might also accept that for many, compilation /is/ a bottleneck in
    their work, or at least it introduces an annoying delay.

    Or are you suggesting that the scenario portrayed here:

    https://xkcd.com/303/

    is a complete fantasy?

    Do you really consider yourself
    the only smart person in the room?

    Perhaps the most impatient.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Tue Dec 3 05:19:48 2024
    On 02.12.2024 15:44, Bart wrote:
    On 02/12/2024 14:09, Tim Rentsch wrote:
    Bart <bc@freeuk.com> writes:
    On 30/11/2024 05:25, Tim Rentsch wrote:

    EVERYBODY cares about compilation speeds. [...]

    No, they don't. I accept that you care about compiler speed. What
    most people care about is not speed but compilation times, and as
    long as the times are small enough they don't worry about it.

    Another difference may be relevant here. Based on other comments of
    yours I have the impression that you frequently invoke compilations
    interactively. A lot of people never do that (or do it only very
    rarely). In a project I am working on now I do builds often,
    including full builds where every .c file is recompiled. But all
    the compilation times together are only a small fraction of the
    total, because doing a build includes lots of other steps, including
    running regression tests. Even if the total compilation time were
    zero the build process wouldn't be appreciably shorter.

    Yes, a compiler is not an interactive tool. (Even if some, or Bart, use it
    that way.) I've also mentioned that upthread already.

    I want to add that there are also other factors in professional projects
    that make absolute compilation times not the primary issue. Usually
    we organize our code in modules, components, subsystems, etc.

    The 'make' (or similar tools) will work on small subsets; results will (automatically) be part of a regression at unit-test level. Full builds
    will require more time, but the results will be part of a higher-level
    test (requiring yet more time).

    It just makes little sense to only compile (a single file or a whole
    project) if you don't at least test it.

    But also if you omit the tests, the compile's results are typically
    instantly available, since there are usually only a few unit instances
    compiled, each comparably small. If one compiles mostly monolithic
    software one gets worse response characteristics, of course.

    Multiple compiles for the same thing, as Bart seems to employ, make
    sense to fix compile-time (coding) errors after a significant amount
    of code has changed. That's where habits get relevant; Bart said that
    he likes the (IMO costly) piecewise incremental fix/compile cycles[*],
    and I understand that this way of working (with 'make' or triggered by
    hand) will lead to observable delays. Since Bart will likely not change
    his habits (or his code organization), the speed of a single compilation
    is relevant to him. - There's thus nothing we have left to discuss.

    [*] Where I (for example) prefer to fix, if not all, at least a larger
    set of errors in one go.


    But it might be appreciably longer if the compilers you used were a lot slower! Or needed to be invoked more. Then even you might start to care
    about it.

    You don't care because in your case it is not the bottleneck, and enough
    work has been put into those compilers to ensure they are not even slower.

    (I don't know why regression tests need to feature in every single build.)

    Tests are optional; they don't need to be done "every time".

    If all you want is to _sequentially_ process each single error in
    a source file you don't need a test; all you need is to get the
    error message, to start the editor, edit, and reiterate the compile
    (to get the next error message, and so on). - Very time consuming.

    But as soon as the errors are [all] fixed in a module... - what
    do you do with it? - ...you should test that what you've changed
    or implemented has been done correctly.

    So edit/compile-iterating a single source is more time-consuming
    than fixing it in, let's call it, "batch-mode". And once it's
    error-free the compile times are negligible in the whole process.


    I understand that you care about compiler speed, and that's fine
    with me; more power to you. Why do you find it so hard to accept
    that lots of other people have different views than you do, and
    those people are not all stupid?

    You might also accept that for many, compilation /is/ a bottleneck in
    their work, or at least it introduces an annoying delay.

    And there are various ways to address that.


    Or are you suggesting that the scenario portrayed here:

    https://xkcd.com/303/

    is a complete fantasy?

    It is a comic. - So, yes, it's fantasy. It's worth a scribbling
    on a WC wall but not suited as a sensible base for discussions.


    Do you really consider yourself
    the only smart person in the room?

    Perhaps the most impatient.

    Don't count on that.

    Janis


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Tue Dec 3 05:48:14 2024
    On 02/12/2024 18:19, Janis Papanagnou wrote:
    On 02.12.2024 15:44, Bart wrote:


    If all you want is to _sequentially_ process each single error in
    a source file you don't need a test; all you need is to get the
    error message, to start the editor, edit, and reiterate the compile
    (to get the next error message, and so on). - Very time consuming.

    But as soon as the errors are [all] fixed in a module... - what
    do you do with it? - ...you should test that what you've changed
    or implemented has been done correctly.

    So edit/compile-iterating a single source is more time-consuming
    than fixing it in, let's call it, "batch-mode". And once it's
    error-free the compile times are negligible in the whole process.

    I've struggled to find a suitable real-life analogy.

    All I can suggest is that people have gone to some lengths to justify
    having a car that can only travel at 3 mph around town, rather than 30
    mph (i.e. 5 vs 50 kph).

    Maybe their town is only a village, so the net difference is negligible.
    Or they rarely drive, or avoid doing so, another way to downplay the inconvenience of such slow wheels.

    The fact is that driving at 3 mph on a clear road is incredibly
    frustrating even when you're not in a hurry to get anywhere!

    Or are you suggesting that the scenario portrayed here:

    https://xkcd.com/303/

    is a complete fantasy?

    It is a comic. - So, yes, it's fantasy. It's worth a scribbling
    on a WC wall but not suited as a sensible base for discussions.

    I would disagree. The reason those work is that people can identify with
    them from their own experience, even if exaggerated for comic effect.

    Otherwise no one would get them.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Thu Dec 5 12:34:59 2024
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

    On 30.11.2024 05:40, Tim Rentsch wrote:

    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

    On 30.11.2024 00:29, Tim Rentsch wrote:

    Bart <bc@freeuk.com> writes:

    On 28/11/2024 17:28, Janis Papanagnou wrote:

    But we're speaking about compilation times. [...]

    You can make a similar argument about turning on the light switch
    when entering a room. Flicking light switches is not something you
    need to do every few seconds, but if the light took 5 seconds to
    come on (or even one second), it would be incredibly annoying.

    This analogy sounds like something a defense attorney would say who
    has a client that everyone knows is guilty.

    Intentionally or not; it's funny to respond to an analogy with an
    analogy. :-}

    My statement was not an analogy. Similar is not the same as
    analogous.

    It's of course (and obviously) not the same; it's just a
    similar term where the semantics of both terms have an overlap.

    (Not sure why you even bothered to reply and nit-pick here.

    It's because you thought it was just a nit-pick that I bothered
    to reply.

    But with your habit you seem to have just missed the point;
    the comparison of your reply-type with Bart's argumentation.)

    If you think they are the same then it is you who has missed the
    point.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Dan Purgert@3:633/280.2 to All on Thu Dec 5 21:51:51 2024
    On 2024-11-30, Rosario19 wrote:
    On Wed, 20 Nov 2024 12:31:35 -0000 (UTC), Dan Purgert wrote:

    On 2024-11-16, Stefan Ram wrote:
    Dan Purgert <dan@djph.net> wrote or quoted:
    if (n==0) { printf ("n: %u\n",n); n++;}
    if (n==1) { printf ("n: %u\n",n); n++;}
    if (n==2) { printf ("n: %u\n",n); n++;}
    if (n==3) { printf ("n: %u\n",n); n++;}
    if (n==4) { printf ("n: %u\n",n); n++;}
    printf ("all if completed, n=%u\n",n);

    above should be equivalent to this

    for(;n>=0&&n<5;++n) printf ("n: %u\n",n);
    printf ("all if completed, n=%u\n",n);

    Sure, but fir's original posting in
    MID <3deb64c5b0ee344acd9fbaea1002baf7302c1e8f@i2pn2.org>

    was a contrived sequence to the effect of
    if (n==0) { //do something }
    if (n==1) { //do something }
    if (n==2) { //do something }
    if (n==3) { //do something }
    if (n==4) { //do something }

    So, I merely took the contrived sequence, and made "do something" trip
    each condition.
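
    (For contrast, a minimal sketch of the same chain with 'else' added; the
    cascade then stops after the first match, so starting from n=0 it prints
    only "n: 0" followed by "all if completed, n=1":)

    if (n==0) { printf ("n: %u\n",n); n++; }
    else if (n==1) { printf ("n: %u\n",n); n++; }
    else if (n==2) { printf ("n: %u\n",n); n++; }
    else if (n==3) { printf ("n: %u\n",n); n++; }
    else if (n==4) { printf ("n: %u\n",n); n++; }
    printf ("all if completed, n=%u\n",n);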

    Stefan's example from a few posts back is better:

    Well, it's a blue moon when someone nails it. Most of them fall
    for my little gotcha hook, line, and sinker.

    #include <stdio.h>

    const char * english( int const n )
    { const char * result;
    if( n == 0 )result = "zero";
    if( n == 1 )result = "one";
    if( n == 2 )result = "two";
    if( n == 3 )result = "three";
    else result = "four";
    return result; }

    void print_english( int const n )
    { printf( "%s\n", english( n )); }

    int main( void )
    { print_english( 0 );
    print_english( 1 );
    print_english( 2 );
    print_english( 3 );
    print_english( 4 ); }

    --
    |_|O|_|
    |_|_|O| Github: https://github.com/dpurgert
    |O|O|O| PGP: DDAB 23FB 19FA 7D85 1CC1 E067 6D65 70E5 4CE7 2860

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Fri Dec 6 00:21:41 2024
    On 05.12.2024 02:34, Tim Rentsch wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
    On 30.11.2024 05:40, Tim Rentsch wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
    On 30.11.2024 00:29, Tim Rentsch wrote:
    Bart <bc@freeuk.com> writes:
    On 28/11/2024 17:28, Janis Papanagnou wrote:

    But we're speaking about compilation times. [...]

    You can make a similar argument about turning on the light switch
    when entering a room. Flicking light switches is not something you >>>>>> need to do every few seconds, but if the light took 5 seconds to
    come on (or even one second), it would be incredibly annoying.

    This analogy sounds like something a defense attorney would say who
    has a client that everyone knows is guilty.

    Intentionally or not; it's funny to respond to an analogy with an
    analogy. :-}

    My statement was not an analogy. Similar is not the same as
    analogous.

    It's of course (and obviously) not the same; it's just a
    similar term where the semantics of both terms have an overlap.

    (Not sure why you even bothered to reply and nit-pick here.

    It's because you thought it was just a nit-pick that I bothered
    to reply.

    But with your habit you seem to have just missed the point;
    the comparison of your reply-type with Bart's argumentation.)

    If you think they are the same then it is you who has missed the
    point.

    (After the nit-pick level you seem to have now reached the
    Kindergarten niveau of communication. - And no substance as so
    often in contexts where you cannot copy/paste a "C" standard
    text passage.)

    The point was; you were both making comparisons by expressing
    similarities - "a similar argument" [Bart] and "sounds like"
    [Tim]; you both expressed an opinion and backed that up by
    formulating similarities; Bart (unnecessarily leaving the well
    disputable IT context) by his light bulbs, and you (more on a
    personal behavior level, unsurprisingly) comparing his habits
    with [also a prejudice] other professions' habits (attorneys).

    (Again, I wondered why you even bothered to reply. My original
    reply wasn't even meant disrespectful; I was just amused. -
    But meanwhile, given your response habits, I better ignore you
    again, especially since you don't want to contribute but prefer
    playing the troll.)

    Janis


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Fri Dec 6 00:41:49 2024
    On 02.12.2024 19:48, Bart wrote:
    On 02/12/2024 18:19, Janis Papanagnou wrote:
    On 02.12.2024 15:44, Bart wrote:


    If all you want is to _sequentially_ process each single error in
    a source file you don't need a test; all you need is to get the
    error message, to start the editor, edit, and reiterate the compile
    (to get the next error message, and so on). - Very time consuming.

    But as soon as the errors are [all] fixed in a module... - what
    do you do with it? - ...you should test that what you've changed
    or implemented has been done correctly.

    So edit/compile-iterating a single source is more time-consuming
    than fixing it in, let's call it, "batch-mode". And once it's
    error-free the compile times are negligible in the whole process.

    I've struggled to find a suitable real-life analogy.

    To argue in the topical domain is always better than making up
    (typically non-fitting) real-life analogies.

    (The same with your light-bulb analogy; I was inclined to answer
    on that level, and could have even affirmed my point by it, but
    decided that it's not the appropriate way to discuss the simple
    processual issue that I tried to explain to you.)


    All I can suggest is that people have gone to some lengths to justify
    having a car that can only travel at 3 mph around town, rather than 30
    mph (i.e. 5 vs 50 kph).

    (You certainly meant km/h.)

    Since you like analogies, let me tell you that I recently got
    aware that on a city-highway(!) in my city they had introduced
    a speed limit of 30 km/h (about 20mph); for reasons.


    Maybe their town is only a village, so the net difference is negligible.
    Or they rarely drive, or avoid doing so, another way to downplay the inconvenience of such slow wheels.

    The fact is that driving at 3 mph on a clear road is incredibly
    frustrating even when you're not in a hurry to get anywhere!

    There are many more factors than frustration to be considered;
    safety, pollution, noise, and optimal throughput, for example.
    Similar as with development processes; if you have just one
    factor (speed?) on your scale you might miss the overall goals.

    (If you want to quickly get anywhere within the metropolitan
    boundaries you just take the bicycle or the public transport
    facilities. Just BTW. In other countries' cities there may be
    other situations, preconditions and regulations.)

    Janis

    [...]



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Keith Thompson@3:633/280.2 to All on Fri Dec 6 10:51:54 2024
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
    On 02.12.2024 19:48, Bart wrote:
    [...]
    All I can suggest is that people have gone to some lengths to justify
    having a car that can only travel at 3 mph around town, rather than 30
    mph (i.e. 5 vs 50 kph).

    (You certainly meant km/h.)

    Both "kph" and "km/h" are common abbreviations for "kilometers per
    hour". Were you not familiar with "kph"?

    [...]

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: None to speak of (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Fri Dec 6 11:24:10 2024
    Bart <bc@freeuk.com> writes:

    On 02/12/2024 14:09, Tim Rentsch wrote:

    Bart <bc@freeuk.com> writes:

    On 30/11/2024 05:25, Tim Rentsch wrote:

    EVERYBODY cares about compilation speeds. [...]

    No, they don't. I accept that you care about compiler speed.
    What most people care about is not speed but compilation times,
    and as long as the times are small enough they don't worry about
    it.

    Another difference may be relevant here. Based on other comments
    of yours I have the impression that you frequently invoke
    compilations interactively. A lot of people never do that (or do
    it only very rarely). In a project I am working on now I do
    builds often, including full builds where every .c file is
    recompiled. But all the compilation times together are only a
    small fraction of the total, because doing a build includes lots
    of other steps, including running regression tests. Even if the
    total compilation time were zero the build process wouldn't be
    appreciably shorter.

    But it might be appreciably longer if the compilers you used were
    a lot slower! Or needed to be invoked more. [...]

    I concede your point. If things were different they wouldn't
    be the same.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Sat Dec 7 10:30:40 2024
    Bart <bc@freeuk.com> wrote:
    On 01/12/2024 13:04, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:
    On 28/11/2024 12:37, Michael S wrote:
    On Wed, 27 Nov 2024 21:18:09 -0800
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:


    c:\cx>tm gcc sql.c #250Kloc file
    TM: 7.38

    Your example illustrates my point. Even 250 thousand lines of
    source takes only a few seconds to compile. Only people nutty
    enough to have single source files over 25,000 lines or so --
    over 400 pages at 60 lines/page! -- are so obsessed about
    compilation speed.

    My impression was that Bart is talking about machine-generated code.
    For machine generated code 250Kloc is not too much.

    This file mostly comprises sqlite3.c which is a machine-generated
    amalgamation of some 100 actual C files.

    You wouldn't normally do development with that version, but in my
    scenario, where I was trying to find out why the version built with my
    compiler was buggy, I might try adding debug info to it then building
    with a working compiler (eg. gcc) to compare with.

    Even in the context of developing a compiler I would not blindly run
    many compilations of a large file.
    Difficult bugs always occur in larger codebases, but with C, these are in a language that I can't navigate, and in programs which are not mine, and which tend to be badly written, bristling with typedefs and macros.

    It could take a week to track down where the error might be ...

    It could be. You could declare that the program is hopeless or do
    what is needed. Which frequently means effectively using available
    debugging features. For example, I got strange crash. Looking at
    data in the debugger suggested that data is malformed. So I used
    data breakpoints to figure out which instruction initialized the data.
    That needed several runs of the program, in each run looking at what
    happened to the suspected memory location. At the end I localized the
    problem and the rest was easy.
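
    (A sketch of that workflow in gdb, with hypothetical names; 'watch -l'
    sets a data breakpoint on the location itself, and the debugger then
    stops each time that memory is written:)

    (gdb) run                      # reproduce the crash, note the suspect object
    (gdb) watch -l obj->field      # data breakpoint on the memory location
    (gdb) run
    (gdb) bt                       # when it triggers, see which code wrote the value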

    Some problems are easy, for example a significant percentage of
    segfaults: you have something which is not a valid address,
    and frequently you immediately see why the address is wrong and
    how to fix this. Still, finding this usually takes longer
    than compilation.

    At the first stage I would debug the
    compiled program, to find out what is wrong with it.

    ... within the C program. Except there's nothing wrong with the C
    program! It works fine with a working compiler.

    The problem will be in the generated code, so in an entirely different program.

    Of course the problem is in the generated code. But debug info (I had
    at least _some_ debug info; apparently you do not have it) shows you
    which part of the source is responsible for given machine code. And you
    can see the data, so you can see what is happening in the generated program.
    And you have the C source so you can see what should happen. Once
    you know the place where "what is happening" differs from "what should
    happen" you can normally produce a quite small reproducing example.

    So normal debugging tools are useful when several sets of
    source code are involved, in different languages, or the error occurs
    in the second generation version of either the self-hosted tool, or the program under test if it is to do with languages.

    (For example, I got tcc.c working at one point. My generated tcc.exe
    could compile tcc.c, but that second-generation tcc.c didn't work.)

    Clearly, you work in stages: first you find out what is wrong with the second-generation tcc.exe. Then you find the piece of tcc.c that was miscompiled by the first-generation tcc.exe (producing the wrong second-
    generation compiler). Then you find the piece of tcc.c which was
    responsible for this miscompilation. And finally you look at why
    your compiler miscompiled this piece of tcc.c.

    Tedious, yes. It is easier if you have a good testsuite, that is, a
    collection of small programs that exercise various constructs
    and potentially problematic combinations.

    Anyway, most of the work involves executing programs in the debugger
    and observing critical things. Re-creating executables is rare
    in comparison. The main point where compiler speed matters is the time
    to run the compiler testsuite.

    After that I would try to minimize the testcase, removing code which
    does not contribute to the bug.

    Again, there is nothing wrong with the C program, but in the code
    generated for it. The bug can be very subtle, but it usually turns out
    to be something silly.

    Removing code from 10s of 1000s of lines (or 250Kloc for sql) is not practical. Yet the aim is to isolate some code which can be used to recreate the issue in a smaller program.

    If you have "good" version (say one produced by 'gcc' or by earlier
    worong verion of your compiler), then you can isolate problem by
    linking parts produced by different compilers. Even if you have
    one huge file, typically you can split it into parts (if it is one
    huge function normally it is possible to split it into smaller
    ones). Yes, it is work but getting quality product needs work.

    Debugging can involve comparing two versions, one working, the other
    not, looking for differences. And here there may be tracking statements added.

    If the only working version is via gcc, then that's bad news because it makes the process even more of a PITA.

    Well, IME tracking statements frequently produce too much or too little
    data. When dealing with C code I tend to depend more on the debugger,
    setting breakpoints in crucial places and examining data there. Extra
    printing functions can help; for example gcc has printing functions
    for its main data structures. Such functions can be called from the
    debugger and give nicer output than generic debugger functions.
    But even if you need extra printing functions you can put them
    in a separate file, compile once and use multiple times.
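
    (A minimal sketch of such a helper, kept in its own file; the structure
    and field names here are made up for illustration:)

    /* dump.c - compile once, link into debug builds, then call from the
       debugger, e.g. in gdb: call dump_nodes(list_head) */
    #include <stdio.h>

    struct node { int kind; const char *name; struct node *next; };

    void dump_nodes(const struct node *n)
    { for ( ; n != NULL; n = n->next)
          fprintf(stderr, "node %p kind=%d name=%s\n",
                  (void *)n, n->kind, n->name ? n->name : "(null)"); }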

    I added an interpreter mode to my IL, because I assume that would give a solid, reliable reference implementation to compare against.

    It turned out to be even more buggy than the generated native code!

    (One problem was to do with my stdarg.h header which implements VARARGS
    used in function definitions. It assumes the stack grows downwards.

    This is true on most machines, but not all.

    In
    my interpreter, it grows downwards!)

    You probably meant upwards? And handling such things is natural
    when you have portability in mind: either you parametrise stdarg.h
    so that it works for both stack directions, or you make sure that
    the interpreter and compiler use the same direction (the latter seems
    to be much easier). Actually, I think the most natural way is to
    have the data structure layout in the interpreter be as close as
    possible to the compiler's data layout. Of course, there are some
    unavoidable differences; the interpreter needs registers for its
    operation, so some variables that could be in registers in compiled
    code will end up in the stack frame.
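
    (A very rough sketch of the parametrisation idea, assuming a naive ABI
    where every variadic argument is passed contiguously on the stack and
    nothing is passed in registers - real stdarg.h implementations are far
    more involved; all names here are made up:)

    typedef char *my_va_list;

    #if STACK_GROWS_UPWARD
      /* later arguments sit at lower addresses than the named ones */
      #define my_va_start(ap, last)  ((ap) = (char *)&(last))
      #define my_va_arg(ap, T)       (*(T *)((ap) -= sizeof(T)))
    #else
      /* usual downward-growing stack: later arguments at higher addresses */
      #define my_va_start(ap, last)  ((ap) = (char *)&(last) + sizeof(last))
      #define my_va_arg(ap, T)       (*(T *)(((ap) += sizeof(T)) - sizeof(T)))
    #endif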

    That involves several compilations
    of files with quickly decreasing sizes.

    Tim isn't asking the right questions (or any questions!). WHY does gcc
    take so long to generate indifferent code when the task can clearly be
    done at least a magnitude faster?

    The simple answer is: users tolerate long compile time. If users
    abandoned 'gcc' for some other compiler due to long compile time,
    then 'gcc' developers would notice.

    People use gcc. They come to depend on its features, or they might use (perhaps unknowingly) some extensions. On Windows, gcc includes some
    headers and libraries that belong to Linux, but other compilers don't provide them.

    The result is that if they were to switch to a smaller, faster compiler, their program may not work.

    They'd have to use it from the start. But then they may want to use libraries which only work with gcc ...

    Well, you see that there are reasons to use 'gcc'. Long ago I
    produced an image processing DLL for Windows. The first version was
    developed on Linux using 'gcc' and then compiled on Windows
    using Borland C. It turned out that in Borland C 'setjmp/longjmp'
    did not work, so I had to work around this. Not nice, but
    manageable. At that time the C standard did not include a function
    to round floats to integers, and that proved to be problematic.
    The C default, that is truncation, produced artifacts that were not
    acceptable. So I used an emulation of rounding based on 'floor',
    which worked OK but turned out to be slow (something like 70%
    of the runtime went into rounding). So I replaced this with assembler
    code. With Borland C I had to call a separate assembler routine,
    which had some overhead.

    The next version was cross-compiled on Linux using gcc. This version
    used inline assembly for rounding and was significantly faster
    than what Borland C produced. Note: the images to process were
    largish (think of, say, 12000 by 20000 pixels) and speed was an
    important factor. So using gcc-specific code was IMO justified
    (this code was used conditionally; other compilers would get the
    slow portable version using 'floor').
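
    (For illustration, the truncation vs. floor-based rounding mentioned
    above, as a small sketch; today one would normally reach for C99's
    lrint() rather than assembler:)

    #include <math.h>

    int round_trunc(double x) { return (int)x; }              /* truncates toward zero   */
    int round_half (double x) { return (int)floor(x + 0.5); } /* round-half-up via floor */
    /* C99 and later: (int)lrint(x) rounds using the current rounding mode */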

    You need to improve your propaganda for faster C compilers...

    I actually don't know why I care. I get the benefit of my fast tools
    every day; they're a joy to use. So I'm not bothered that other people
    are that tolerant of slow, cumbersome build systems.

    But then, people in this group do like to belittle small, fast products
    (tcc for example as well as my stuff), and that's where it gets annoying.

    I tried tcc compiling TeX. Long ago it did not work due to limitations
    of tcc. This time it worked. A small comparison on the main file (19062
    lines):

    Command          time (s)   code size   data size
    tcc -g           0.017      290521      1188
    tcc              0.015      290521      1188
    gcc -O0 -g       0.440      248467      14
    gcc -O0          0.413      248467      14
    gcc -O -g        1.385      167565      0
    gcc -O           1.151      167565      0
    gcc -Os -g       1.998      142336      0
    gcc -Os          1.724      142336      0
    gcc -O2 -g       2.683      207913      0
    gcc -O2          2.257      207913      0
    gcc -O3 -g       3.510      255909      0
    gcc -O3          2.934      255909      0
    clang -O0 -g     0.302      232755      14
    clang -O0        0.189      232755      14
    clang -O -g      1.996      223410      0
    clang -O         1.683      223410      0
    clang -Os -g     1.693      154421      0
    clang -Os        1.451      154421      0
    clang -O2 -g     2.774      259569      0
    clang -O2        2.359      259569      0
    clang -O3 -g     2.970      280235      0
    clang -O3        2.537      280235      0

    I have duly provided both the time when using '-g' and without.
    Both are supposed to produce the same code (so code and data sizes
    are also the same), but you can see that '-g' measurably increases
    compile time. AFAIK compiler data structures contain slots for debug
    info even if '-g' is not given and the compiler generates no debug
    info. So the actual cost of supporting '-g' is higher than the
    difference; you pay part of this cost even if you do not use the
    capability.

    ATM I do not have data handy to compare runtimes (TeX needs
    extra data to do useful work), so I provide code and data
    size as a proxy. As you can see, even at -O0 gcc and clang
    manage to put almost all data into instructions (actually
    in tex.c _all_ initialized data is constant), while tcc
    keeps it as data, which requires extra instructions to
    access. gcc at -O and -Os and clang at -Os produce code
    which is about half the size of the tcc result. Some part
    of it may be due to using smaller instructions, but most
    is likely because the gcc and clang results simply have far
    fewer instructions. At higher optimization levels the code
    size grows; this is probably due to inlining and code
    duplication. This usually gives some small speedup at the
    cost of bigger code, but one would have to measure
    (sometimes attempts at optimization backfire and lead
    to slower code).

    Anyway, 19062 lines is much larger than the typical file that
    I work with, and even for such a size the compile time is reasonable.
    Maybe less typical is the modest use of include files: tex.c
    uses a few standard C headers and 1613 lines of project-specific
    headers. Still, there are macros, and the macro-expanded result
    is significantly bigger than the source.

    In the past TeX execution time correlated reasonably well with
    Dhrystone. On Dhrystone, tcc-compiled code is about 4 times
    slower than gcc/clang, so one can expect tcc-compiled TeX to
    be significantly slower than one compiled by gcc or clang.

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Sat Dec 7 21:58:49 2024
    On 06.12.2024 00:51, Keith Thompson wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
    On 02.12.2024 19:48, Bart wrote:
    [...]
    All I can suggest is that people have gone to some lengths to justify
    having a car that can only travel at 3 mph around town, rather than 30
    mph (i.e. 5 vs 50 kph).

    (You certainly meant km/h.)

    Both "kph" and "km/h" are common abbreviations for "kilometers per
    hour". Were you not familiar with "kph"?

    No. Must be a convention depending on cultural context of locality.
    ("kph", if anything, is "kilopond-hour, per standard.)

    So thanks for pointing that out. (I forget sometimes that in some
    countries there's a reluctance to use the [established] standards,
    and I certainly don't know about all the cultural peculiarities of
    the [many] existing countries, even if they are as dominating as
    the USA is [or other English speaking or influenced countries].)

    We're used to the SI units and metric form, although hereabouts
    some folks also (informally, but wrongly) pronounce it as "k-m-h".

    Janis


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Sat Dec 7 23:40:57 2024
    On 06/12/2024 23:30, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    (For example, I got tcc.c working at one point. My generated tcc.exe
    could compile tcc.c, but that second-generation tcc.c didn't work.)

    Clearly, you work in stages: first you find out what is wrong with the second-generation tcc.exe.

    Ha, ha, ha!

    While C /can/ be written reasonably clearly, tcc sources are more typical:
    very dense, mixed-up lower and upper case everywhere, apparent over-use
    of macros, e.g.:

    for_each_elem(symtab_section, 1, sym, ElfW(Sym)) {
        if (sym->st_shndx == SHN_UNDEF) {
            name = (char *) symtab_section->link->data + sym->st_name;
            sym_index = find_elf_sym(s1->dynsymtab_section, name);

    If I was looking to develop this product then it might be worth spending
    days or weeks learning how it all works. But it's not worth mastering
    this codebase inside out just to discover I wrote 0 instead of 1
    somewhere in my compiler.

    I need whatever error it is to manifest itself in a simpler way. Or have
    two versions (eg. one interpreted, the other native code) that give
    different results. The problem with this app is that those different
    results appear too far down the line; I don't want to trace a billion instructions first.

    So, when I get back to it, I'll test other open source C code. (The
    annoying thing though is that either it won't compile for reasons I've
    lost interest in, or it works completely fine.)

    In
    my interpreter, it grows downwards!)

    You probably meant upwards?

    Yes.

    And handling such things is natural
    when you have portablity in mind, either you parametrise stdarg.h
    so that it works for both stack directions, or you make sure that
    interpreter and compiler use the same direction (the later seem to
    be much easier).

    This is quite a tricky one actually. There is currently conditional code
    in my stdarg.h that detects whether the compiler has set a flag saying
    the result will be interpreted. But it doesn't always know that.

    For example, the compiler might be told to do -E (preprocess) and the
    result compiled later. The stack direction is baked into the output.

    Or it will do -p (generate discrete IL), where it doesn't know whether
    that will be interpreted.

    But this is not a serious issue; the interpreted option is for either debugging or novelty uses.


    Actually, I think that most natural way is to
    have data structure layout in the interpreter to be as close as
    possible to compiler data layout.

    I don't want my hand forced in this. The point of interpreting is to be independent of hardware. A downward growing stack is unnatural.

    They'd have to use it from the start. But then they may want to use
    libraries which only work with gcc ...

    Well, you see that there are reasons to use 'gcc'.

    Self-perpetuating ones, which are the wrong reasons.


    Next version was cross-compiled on Linux using gcc. This version
    used inline assembly for rounding and was significantly faster
    than what Borland C produced. Note: images to process were
    largish (think of say 12000 by 20000 pixels) and speed was
    important factor. So using 'gcc' specific code was IMO justified
    (this code was used conditionally, other compilers would get
    slow portable version using 'floor').

    I have a little image editor written entirely in interpreted code. (It
    was supposed to be a project that was mixed-language, but that's some way off.)

    However it is just about usable. Eg. inverting the colours (negative to positive etc) of a 6Mpix colour image takes 1/8th of a second. Splitting
    into separate R,G,B 8-bit planes takes half a second. This is with
    bytecode working on a pixel at a time.
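
    (What "inverting" amounts to per pixel is roughly this - names are
    hypothetical; the interpreter's cost comes from running such a loop
    body in bytecode for each of the ~18 million channel bytes:)

    /* data: packed 8-bit R,G,B bytes; npixels: pixel count (both hypothetical) */
    for (size_t i = 0; i < npixels * 3; i++)
        data[i] = (unsigned char)(255 - data[i]);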

    It uses no optimised code in the interpreter. Only a mildly accelerated dispatcher.


    You need to improve your propaganda for faster C compilers...

    I actually don't know why I care. I get the benefit of my fast tools
    every day; they're a joy to use. So I'm not bothered that other people
    are that tolerant of slow, cumbersome build systems.

    But then, people in this group do like to belittle small, fast products
    (tcc for example as well as my stuff), and that's where it gets annoying.

    I tried tcc compiling TeX. Long ago it did not work due to limitations
    of tcc. This time it worked. Small comparison on main file (19062
    lines):

    Command time size code size data
    tcc -g 0.017 290521 1188
    tcc 0.015 290521 1188
    gcc -O0 -g 0.440 248467 14
    gcc -O0 0.413 248467 14

    This is demonstrating that tcc is translating C code at over 1 million
    lines per second, and generating binary code at 17MB per second. You're
    not impressed by that?

    Here are a couple of reasonably substantial one-file programs that can
    be run, both interpreters:

    https://github.com/sal55/langs/blob/master/lua.c

    This is a one-file Lua interpreter, which I modified to take input from
    a file. (For original, see comment at start.)

    On my machine, these are typical results:

                   compile-time   size     runtime
    gcc -s -O3     14 secs        378KB    3.0 secs
    gcc -s -O0     3.3 secs       372KB    10.0 secs
    tcc            0.12 secs      384KB    8.5 secs
    cc             0.14 secs      315KB    8.3 secs

    The runtime refers to running this Fibonacci test (fib.lua):

    function fibonacci(n)
        if n<3 then
            return 1
        else
            return fibonacci(n-1) + fibonacci(n-2)
        end
    end

    for n = 1, 36 do
        f=fibonacci(n)
        io.write(n," ",f, "\n")
    end

    The other is a version of my interpreter, minus ASM acceleration,
    transpiled to C, and for Linux:

    https://github.com/sal55/langs/blob/master/qc.c

    Compile using for example:

    gcc qc.c -oqc -fno-builtin -lm -ldl
    tcc qc.c -oqc -fdollars-in-identifiers -lm -ldl

    The input there can be (fib.q):

    func fib(n)=
        if n<3 then
            1
        else
            fib(n-1)+fib(n-2)
        fi
    end

    for i to 36 do
        println i,fib(i)
    od

    Run like this:

    ./qc -nosys fib

    On my Windows machine, the gcc-O3-compiled version takes 4.1 seconds, and tcc
    is 9.3 seconds. The gap is narrower than with the Lua version, which uses a C
    style that depends more on function inlining. (Note that being in one file
    allows gcc to do whole-program optimisations.)

    My cc-compiled version runs in 5.1 seconds, so only 25% slower than
    gcc-O3. It also produces a 360KB executable, compared with gcc's 467KB,
    even with -s. tcc's code is about the same as gcc-O3.

    (My cc-compiler doesn't yet have the optimising pass that makes code
    smaller. The original-source qc project builds to 266KB with that pass
    enabled, while gcc's -Os on qc.c manages 280KB.)

    But my 266KB version runs faster than gcc's 280KB! And accelerated code
    runs 5 times as fast. (6 secs vs 1.22 secs.)



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)